Help with SED/AWK email parser

Thread Solved

Join Date: Mar 2004
Posts: 209
Reputation: i686-linux is on a distinguished road 
Solved Threads: 12
i686-linux
Posting Whiz in Training

  #1
May 14th, 2004
I have a bunch of text files in different formats, each with email addresses somewhere in the file. I would like to parse any number of these files and pull out the email addresses. Here are the types of input:

CFO: some_cfo@domain.com

misterman@domain.com

The Main Man mainman@domain.com

To handle some of these cases I have the following sed commands:

# Removes an opening title (everything up to and including the last colon)
sed -e 's/^.*://'

# Removes leading and trailing whitespace (spaces and tabs)
sed -e 's/^[[:blank:]]*//;s/[[:blank:]]*$//'

Those are both really simple, but for the life of me I can't figure out how to remove normal text from before the email address. I either end up clobbering the whole thing, or including it.

I just need to keep whatever is directly attached to the '@' and delete anything before or after the surrounding whitespace.

The Main Man mainman@domain.com
"The Main Man" is not part of the email; "mainman" and "domain.com" are both parts of the email.

Any clues anyone?
Last edited by i686-linux; May 14th, 2004 at 2:56 pm. Reason: Formatting error
PARANOIA:
A healthy understanding of the way the universe works.
i686-linux
  #2
May 14th, 2004
Originally Posted by i686-linux
I just need to keep whatever is directly attached to the '@' and delete anything before or after the surrounding whitespace.

The Main Man mainman@domain.com
"The Main Man" is not part of the email; "mainman" and "domain.com" are both parts of the email.

And grep saves the day. Next time I'll RTFM better.

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

I haven't tested for many bugs/quirks in the results yet, but a few quick checks seemed to work fine.

If anyone has any further ideas though they would still be greatly appreciated!
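
For anyone wanting to try this, here is a minimal sketch that runs the pattern over the three sample lines from the first post. It assumes a grep that supports -o (GNU and BSD grep do; the option is not in POSIX), and the variable names sample and found are just placeholders for illustration.

```shell
#!/bin/sh
# Sample input: the three line formats from the first post.
sample='CFO: some_cfo@domain.com
misterman@domain.com
The Main Man mainman@domain.com'

# -o prints only the matched text, one match per output line.
# Assumption: a grep with -o support (GNU or BSD grep).
found=$(printf '%s\n' "$sample" | grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*")

printf '%s\n' "$found"
```

One quirk to keep in mind: [:graph:] also matches punctuation, so an address written as <user@domain.com>, would come back with the angle brackets and the comma still attached.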
Join Date: Jan 2004
Posts: 468
Reputation: TKS will become famous soon enough TKS will become famous soon enough 
Solved Threads: 18
TKS
Posting Pro in Training

  #3
May 14th, 2004
Here are a few I've used in the past (dunno if they'll work for you since SED isn't my strong point):

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
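
To see how these fit together, here is a rough sketch that feeds a made-up mail message through the scripts above. The message text, names, and addresses are all hypothetical, and the variable names (msg, header, addr, subject) exist only for this example.

```shell
#!/bin/sh
# Hypothetical sample message; names and addresses are made up.
msg='From: Jane Doe <jane@example.com>
Subject: Quarterly numbers
To: cfo@example.com

Body text with other@example.com in it.'

# Step 1: print the Reply-To: line if one appears in the headers,
# otherwise the From: line, stopping at the blank line that ends them.
header=$(printf '%s\n' "$msg" | sed '/^Reply-To:/q; /^From:/h; /./d;g;q')

# Step 2: strip "(comment)" text, anything from ">" onward, and
# everything up to the last ":" or "<", leaving the bare address.
addr=$(printf '%s\n' "$header" | sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//')

# Subject line with the "Subject: " prefix removed.
subject=$(printf '%s\n' "$msg" | sed '/^Subject: */!d; s///;q')

printf '%s\n%s\n' "$addr" "$subject"
```

Note that these scripts expect real mail headers, so they only pick up the return address and never look at addresses in the body.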
My Home Away from Home: Yet Another Linux Blog
i686-linux
  #4
May 14th, 2004
Those look awfully familiar. Did you grab those off of a "100 useful SED scripts" site?
TKS
  #5
May 14th, 2004
I got them from my friend Josh, who probably did pull them off that very site.
Join Date: Mar 2004
Posts: 1,620
Reputation: kc0arf is a jewel in the rough kc0arf is a jewel in the rough kc0arf is a jewel in the rough 
Solved Threads: 51
Team Colleague
kc0arf
Posting Virtuoso

  #6
May 14th, 2004
hello,

I am working with:

cat filename.txt | grep @

and getting the output reduced to just the lines that contain an address. I am thinking that this will help. What I wonder is if we can get grep to simply output the found expression instead of the whole dang line.

I am also wondering if AWK will do what you need.
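
On the AWK idea, one possible sketch (not tested beyond the sample lines from the first post) is to let awk split each line on whitespace and print only the fields that contain an "@". The variable names sample and awk_found are placeholders for this example.

```shell
#!/bin/sh
# Sample input: the three line formats from the first post.
sample='CFO: some_cfo@domain.com
misterman@domain.com
The Main Man mainman@domain.com'

# awk splits each line into whitespace-separated fields;
# print every field that contains an "@".
awk_found=$(printf '%s\n' "$sample" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /@/) print $i }')

printf '%s\n' "$awk_found"
```

This relies on the address being separated from other text by whitespace, which holds for all three sample formats.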

Christian
kc0arf
  #7
May 14th, 2004
Wow. Looks like a bunch of us working on it at the same time. Cool.

Christian
i686-linux
  #8
May 14th, 2004
Originally Posted by kc0arf
cat filename.txt | grep @

and getting the output reduced to just the lines that contain an address. I am thinking that this will help. What I wonder is if we can get grep to simply output the found expression instead of the whole dang line.
Christian

That is what I posted about:

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

grep -o prints only the matched text instead of the whole matching line.

I realized that this can be cut down, since [:graph:] already includes everything in [:alnum:]:

grep -o "[[:graph:]]*@[[:graph:]]*"
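
A quick way to convince yourself the two patterns behave the same is to run both over the same input and compare. This is only a sketch: the sample lines are made up, and it assumes a grep with -o support (GNU or BSD).

```shell
#!/bin/sh
# Hypothetical sample lines with digits, dots, plus signs and hyphens.
sample='first.last+tag@sub-domain.example.org
Contact: user_01@example.com, thanks'

# Original pattern with both classes in the bracket expression.
long=$(printf '%s\n' "$sample" | grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*")

# Shortened pattern using [:graph:] alone.
short=$(printf '%s\n' "$sample" | grep -o "[[:graph:]]*@[[:graph:]]*")

if [ "$long" = "$short" ]; then
  echo "same output"
else
  echo "different output"
fi
```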
TKS
  #9
May 14th, 2004
Using sed, you could also extract a pattern, similar to grep -o:

sed -n 's/.*\(pattern\).*/\1/p' file
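
One thing to watch when filling in the pattern: the leading .* is greedy, so dropping the grep expression straight into this template keeps only the part from the "@" onward. The sketch below shows that, plus one possible workaround (an assumption of mine, not something from the thread): prepend a space to each line and require whitespace right before the captured address.

```shell
#!/bin/sh
# Sample input: the three line formats from the first post.
sample='CFO: some_cfo@domain.com
misterman@domain.com
The Main Man mainman@domain.com'

# Naive version: the greedy leading .* swallows the local part,
# so only "@domain.com" is captured.
naive=$(printf '%s\n' "$sample" |
  sed -n 's/.*\([[:graph:]]*@[[:graph:]]*\).*/\1/p')

# Workaround: add a leading space to every line, then insist on
# whitespace immediately before the address so .* has to stop there.
fixed=$(printf '%s\n' "$sample" |
  sed -n 's/^/ /; s/.*[[:space:]]\([^[:space:]]*@[^[:space:]]*\).*/\1/p')

printf 'naive:\n%s\nfixed:\n%s\n' "$naive" "$fixed"
```

Unlike grep -o, this prints at most one address per line (the last one), so it is only a fit when each line holds a single address.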


Is the * in your grep -o example equal to any character? I've never used that in a grep command before...
i686-linux
  #10
May 14th, 2004
* = any number of matches (including zero) of the previous expression

For example:

[[:graph:]]* is really "Any printable and visible (non-space) character repeated any number of times"
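
Because "any number" includes zero, the pattern also matches a lone "@". The sketch below shows that, next to a stricter variant using + with grep -E; the stricter pattern is my own assumption for illustration, not something from the thread, and the input line is made up.

```shell
#!/bin/sh
# Hypothetical input with an "@" that is not part of an address.
line='meet @ noon'

# "*" allows zero repetitions, so the bare "@" still matches.
star=$(printf '%s\n' "$line" | grep -o "[[:graph:]]*@[[:graph:]]*")

# With -E, "+" requires at least one character on each side of the "@".
# "|| true" keeps grep's no-match exit status from aborting strict shells.
plus=$(printf '%s\n' "$line" | grep -oE "[^[:space:]@]+@[^[:space:]@]+" || true)

printf 'star: [%s]\nplus: [%s]\n' "$star" "$plus"
```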
©2003 - 2009 DaniWeb® LLC