Help with SED/AWK email parser
Thread Solved
I have a bunch of text files in different formats, each with email addresses somewhere in the file. I would like to be able to parse any number of these files for email addresses. Here are the types of input:
CFO: some_cfo@domain.com
misterman@domain.com
The Main Man mainman@domain.com
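To take care of the first two situations I have the following sed commands:
# Removes a leading title (everything up to and including the last colon)
sed -e 's/^.*://'
# Removes leading and trailing whitespace
# ([ ^t] in the one-liner lists stands for space and tab; typed literally it matches '^' and 't')
sed -e 's/^[[:space:]]*//;s/[[:space:]]*$//'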
Those are both really simple, but for the life of me I can't figure out how to remove normal text from before the email address. I either end up clobbering the whole thing, or including the extra text.
I just need something that keeps whatever is directly attached to the '@' and deletes everything before the preceding whitespace and after the following whitespace:
The Main Man mainman@domain.com
^^^^^^^^^^^^ not part of email
             ^^^^^^^ ^^^^^^^^^^ both parts of email
Any clues anyone?
Last edited by i686-linux; May 14th, 2004 at 2:56 pm. Reason: Formatting error
PARANOIA:
A healthy understanding of the way the universe works.
Originally Posted by i686-linux
I just need something that keeps whatever is directly attached to the '@' and deletes everything before the preceding whitespace and after the following whitespace:
The Main Man mainman@domain.com
^^^^^^^^^^^^ not part of email
             ^^^^^^^ ^^^^^^^^^^ both parts of email

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"
I haven't tested for many bugs/quirks in the results yet, but a few quick checks seemed to work fine.
If anyone has any further ideas though they would still be greatly appreciated!
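One quirk to watch for, shown here as a rough sketch: [[:graph:]] matches any visible character, so angle brackets or trailing punctuation glued to an address come along with it. The stricter pattern below is only an assumption about what a typical address looks like (local@domain.tld), not a complete address grammar, and the function names extract_loose and extract_strict are made up for illustration. It relies on grep -E and -o, which GNU and BSD grep both provide.

```shell
# Loose pattern from the post: any run of visible characters around an '@'.
extract_loose() {
    grep -o '[[:graph:]]*@[[:graph:]]*'
}

# Stricter sketch (assumed shape local@domain.tld); -E enables + and {2,}.
extract_strict() {
    grep -Eo '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'
}

# Same input through both patterns to show the difference.
printf 'Mail <boss@domain.com>, please.\n' | extract_loose
printf 'Mail <boss@domain.com>, please.\n' | extract_strict
```

The loose version prints <boss@domain.com>, (brackets and comma included), while the stricter one prints boss@domain.com. For the three input types in the original question both give the same result.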
Here are a few I've used in the past (dunno if they'll work for you since SED isn't my strong point):
# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'
# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'
# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
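As a sketch of how the return-address one-liners above chain together, the first sed picks the Reply-To: line (or the From: line if there is no Reply-To: before the first blank line) and the second strips everything but the address. The function name return_address and the sample message are hypothetical; the sed scripts themselves are the ones quoted above.

```shell
# Print the bare return address of a mail message read from stdin.
# Step 1: keep the Reply-To: header if present, otherwise the From: header.
# Step 2: drop a trailing "(comment)", drop '>' and anything after it,
#         then drop everything up to the last ':' or '<'.
return_address() {
    sed '/^Reply-To:/q; /^From:/h; /./d;g;q' |
        sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
}

# Hypothetical message: headers, a blank line, then the body.
printf 'From: Jane Doe <jane@example.com>\nSubject: hi\n\nbody text\n' | return_address
```

For the sample message this prints jane@example.com. Note that these scripts expect real mail headers, so they would not help with free-form lines such as "The Main Man mainman@domain.com".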
My Home Away from Home: Yet Another Linux Blog
I got them from my friend Josh who probably did pull them off that very site
hello,
I am working with:
cat filename.txt | grep @
and getting the output reduced to just the lines that contain an address. I am thinking that this will help. What I wonder is whether we can get grep to output only the matched expression instead of the whole dang line.
I am also wondering if AWK will do what you need.
Christian
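On the AWK question, a minimal sketch of one way it could work: awk already splits each line on whitespace, so printing every field that contains an '@' gives the same "whatever is attached to the @" result. The function name extract_with_awk is made up for illustration, and the approach assumes addresses never contain spaces.

```shell
# Print every whitespace-separated field that contains an '@'.
# With no arguments awk reads stdin; otherwise pass any number of files.
extract_with_awk() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /@/) print $i }' "$@"
}

# The three input types from the original question.
printf 'CFO: some_cfo@domain.com\nmisterman@domain.com\nThe Main Man mainman@domain.com\n' |
    extract_with_awk
```

This prints the three bare addresses, one per line. Like the loose grep pattern, it keeps any punctuation that is glued to the address.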
Originally Posted by kc0arf
cat filename.txt | grep @
and getting the output reduced to just the lines that contain an address. I am thinking that this will help. What I wonder is whether we can get grep to output only the matched expression instead of the whole dang line.
Christian
grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"
grep -o prints only the matched text instead of the whole matching line.
I realized that this can be cut down, since [:graph:] already includes every [:alnum:] character:
grep -o "[[:graph:]]*@[[:graph:]]*"
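Since the goal was to run this over any number of files, here is a hedged sketch of how that might look. It assumes a grep that supports -o and -h (GNU and BSD grep do; neither flag is in POSIX). The function name collect_emails and the demo file contents are hypothetical.

```shell
# Collect the unique addresses from any number of files.
# -o prints only the match; -h suppresses the "filename:" prefix that
# grep adds when it is given more than one file.
collect_emails() {
    grep -oh '[[:graph:]]*@[[:graph:]]*' "$@" | sort -u
}

# Demo with two throwaway files.
tmpdir=$(mktemp -d)
printf 'CFO: some_cfo@domain.com\nmisterman@domain.com\n' > "$tmpdir/a.txt"
printf 'The Main Man mainman@domain.com\nmisterman@domain.com\n' > "$tmpdir/b.txt"
collect_emails "$tmpdir/a.txt" "$tmpdir/b.txt"
rm -rf "$tmpdir"
```

The duplicate misterman@domain.com is listed once because of sort -u. Dropping -h would keep the file names in the output, which can be handy for tracking down where an address came from.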
Using sed, you could also print just the matched pattern, similar to grep -o:
sed -n 's/.*\(pattern\).*/\1/p' file
Does the * in your grep -o example mean "any character"? I've never used that in a grep command before...
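For reference, in a regular expression * means "zero or more of the preceding item" (here the [[:graph:]] bracket expression), not "any character". That matters for the sed template: dropping the grep pattern straight into it would let the greedy leading .* swallow the local part and leave only @domain.com. The sketch below is one possible way around that, under the assumption that addresses are delimited by whitespace; extract_with_sed is a made-up name.

```shell
# sed version of "print only the address".
# The optional first group soaks up everything through the last whitespace
# before the address, so the leading text cannot eat the local part.
# \2 is the run of non-space characters around the '@'.
extract_with_sed() {
    sed -n 's/^\(.*[[:space:]]\)\{0,1\}\([^[:space:]]*@[^[:space:]]*\).*/\2/p'
}

printf 'CFO: some_cfo@domain.com\nmisterman@domain.com\nThe Main Man mainman@domain.com\n' |
    extract_with_sed
```

Unlike grep -o, this prints at most one address per line (the last one if a line has several), so grep -o is probably the better fit when lines can hold more than one address.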