Views: 10262 | Replies: 9 | Solved
Join Date: Mar 2004
Location: Rancho Santa Margarita, California
Posts: 209
i686-linux

Help with SED/AWK email parser

  #1  
May 14th, 2004
I have a bunch of text files in different formats, each of which contains email addresses somewhere in it. I would like to be able to parse any number of these files for email addresses. Here are the kinds of input lines:

CFO: some_cfo@domain.com

misterman@domain.com

The Main Man mainman@domain.com

To handle these cases I have the following sed commands:

# Removes a leading title such as "CFO:" (everything up to the last colon)
sed -e 's/^.*://'

# Removes leading and trailing whitespace
sed -e 's/^[[:space:]]*//;s/[[:space:]]*$//'

Those are both really simple, but for the life of me I can't figure out how to remove normal text from before the email address. I either end up clobbering the whole thing, or including it.

I just need to keep whatever is directly attached to the '@' and delete everything before or after the surrounding whitespace.

The Main Man mainman@domain.com
"The Main Man" is not part of the email; "mainman" and "domain.com" are both parts of it.

Any clues anyone?
Last edited by i686-linux : May 14th, 2004 at 1:56 pm. Reason: Formatting error
PARANOIA:
A healthy understanding of the way the universe works.

i686-linux

Re: Help with SED/AWK email parser

  #2  
May 14th, 2004
Originally Posted by i686-linux

I just need to keep whatever is directly attached to the '@' and delete everything before or after the surrounding whitespace.

The Main Man mainman@domain.com
"The Main Man" is not part of the email; "mainman" and "domain.com" are both parts of it.

And grep saves the day. Next time I'll RTFM better.

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

I haven't tested for many bugs/quirks in the results yet, but a few quick checks seemed to work fine.

If anyone has any further ideas though they would still be greatly appreciated!
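
As a quick check of the grep approach, here is a minimal sketch that runs the pattern from this post over the three sample lines from the first post. The function name extract_emails is made up for illustration, and -o is a GNU/BSD grep extension rather than a POSIX option, so this assumes one of those greps is installed.

```shell
# Hypothetical helper name; the pattern is the one quoted in the post.
extract_emails() {
    grep -o '[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*' "$@"
}

# The three sample input lines from the first post.
printf '%s\n' \
    'CFO: some_cfo@domain.com' \
    'misterman@domain.com' \
    'The Main Man mainman@domain.com' | extract_emails
```

One quirk to watch for: [:graph:] includes punctuation, so an address followed directly by a comma or period keeps that character in the match (for example, "bob@x.org," from "mail bob@x.org, please").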
Join Date: Jan 2004
Location: VA, USA
Posts: 458
TKS

Re: Help with SED/AWK email parser

  #3  
May 14th, 2004
Here are a few I've used in the past (dunno if they'll work for you since SED isn't my strong point):

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
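
To see how the return-address one-liners fit together, here is a rough sketch that chains them over a made-up message. The wrapper name return_address and the sample headers are assumptions for illustration, and the semicolon after q assumes a sed (such as GNU sed) that accepts it.

```shell
# Hypothetical wrapper: pick the Reply-To: (or else From:) header,
# then strip it down to the bare address.
return_address() {
    sed '/^Reply-To:/q; /^From:/h; /./d;g;q' "$@" |
        sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
}

# Made-up message: headers, a blank line, then the body.
printf '%s\n' \
    'From: The Main Man <mainman@domain.com>' \
    'Subject: test' \
    '' \
    'Body text.' | return_address
```

These scripts rely on mail-header syntax (a colon or angle bracket in front of the address), so a free-form line such as "The Main Man mainman@domain.com" would pass through the last command unchanged.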
My Home Away from Home: Yet Another Linux Blog

i686-linux

Re: Help with SED/AWK email parser

  #4  
May 14th, 2004
Those look awfully familiar. Did you grab those off of a "100 useful SED scripts" site?

TKS

Re: Help with SED/AWK email parser

  #5  
May 14th, 2004
I got them from my friend Josh, who probably did pull them off that very site.
Join Date: Mar 2004
Posts: 1,514
kc0arf

Re: Help with SED/AWK email parser

  #6  
May 14th, 2004
hello,

I am working with:

grep @ filename.txt

which reduces the output to just the lines that contain an address. I am thinking that this will help. What I wonder is whether we can get grep to output only the matched expression instead of the whole dang line.

I am also wondering if AWK will do what you need.

Christian
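
On the AWK question, one possible approach (a sketch, not something tested beyond the sample lines in this thread) is to let awk split each line on whitespace and print only the fields that contain an "@". The function name extract_emails_awk is made up for illustration.

```shell
# Hypothetical helper: print every whitespace-separated field containing "@".
extract_emails_awk() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /@/) print $i }' "$@"
}

# The three sample input lines from the first post.
printf '%s\n' \
    'CFO: some_cfo@domain.com' \
    'misterman@domain.com' \
    'The Main Man mainman@domain.com' | extract_emails_awk
```

Because awk's default field splitting already discards surrounding whitespace, no separate trimming step is needed. Punctuation glued to an address would still come along with it.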

kc0arf

Re: Help with SED/AWK email parser

  #7  
May 14th, 2004
Wow. Looks like a bunch of us working on it at the same time. Cool.

Christian

i686-linux

Re: Help with SED/AWK email parser

  #8  
May 14th, 2004
Originally Posted by kc0arf
grep @ filename.txt

which reduces the output to just the lines that contain an address. I am thinking that this will help. What I wonder is whether we can get grep to output only the matched expression instead of the whole dang line.
Christian

That is what I posted about:

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

grep -o prints only the matched text instead of the whole matching line.

I realized that this can be cut down to:

grep -o "[[:graph:]]*@[[:graph:]]*"
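
The shortening works because every [:alnum:] character is also a [:graph:] character, so listing both in one bracket expression adds nothing. A small sketch, using the sample lines from the first post and made-up variable names, compares the output of the two patterns:

```shell
# Variable names are illustrative; the patterns are the two from this post.
long='[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*'
short='[[:graph:]]*@[[:graph:]]*'
sample='CFO: some_cfo@domain.com
misterman@domain.com
The Main Man mainman@domain.com'

long_out=$(printf '%s\n' "$sample" | grep -o "$long")
short_out=$(printf '%s\n' "$sample" | grep -o "$short")

if [ "$long_out" = "$short_out" ]; then
    echo "both patterns produce the same matches"
fi
```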

TKS

Re: Help with SED/AWK email parser

  #9  
May 14th, 2004
Using sed, you could also extract a pattern much as grep -o does:

sed -n 's/.*\(pattern\).*/\1/p' file
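
One caveat when plugging the email pattern into that template: the leading .* is greedy, so it swallows the part of the address before the '@'. The sketch below shows the pitfall and one possible workaround that anchors the address to the preceding whitespace (or the start of the line). The function name extract_email_sed is made up for illustration, and this version yields at most one address per line.

```shell
line='The Main Man mainman@domain.com'

# Pitfall: the greedy leading .* consumes "mainman" as well.
greedy=$(printf '%s\n' "$line" |
    sed -n 's/.*\([[:graph:]]*@[[:graph:]]*\).*/\1/p')

# Hypothetical workaround: the optional first group must end in whitespace,
# so the second group captures the whole non-space run around the "@".
extract_email_sed() {
    sed -n 's/^\(.*[[:space:]]\)\{0,1\}\([^[:space:]]*@[^[:space:]]*\).*$/\2/p' "$@"
}
fixed=$(printf '%s\n' "$line" | extract_email_sed)

printf 'greedy: %s\nfixed:  %s\n' "$greedy" "$fixed"
```

Here the greedy version yields only "@domain.com", while the anchored version yields "mainman@domain.com".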


Does the * in your grep -o example mean "any character"? I've never used that in a grep command before...

i686-linux

Re: Help with SED/AWK email parser

  #10  
May 14th, 2004
* = zero or more repetitions of the previous expression

For example:

[[:graph:]]* really means "any printable, visible (non-space) character, repeated any number of times, including zero"
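
Since "any number of times" includes zero, the pattern also matches a lone "@" with nothing attached. A small sketch of the difference, using a made-up test line and the interval \{1,\} to require at least one character on each side:

```shell
line='meet @ noon'

# With *, zero characters on either side are allowed, so the bare "@" matches.
star=$(printf '%s\n' "$line" | grep -o '[[:graph:]]*@[[:graph:]]*')

# With \{1,\}, at least one visible character is required on both sides,
# so nothing matches here (|| true keeps the no-match exit status harmless).
plus=$(printf '%s\n' "$line" | grep -o '[[:graph:]]\{1,\}@[[:graph:]]\{1,\}' || true)

printf 'star: [%s]\nplus: [%s]\n' "$star" "$plus"
```

Whether the stricter form is preferable depends on the input; it is only a way to avoid picking up stray "@" signs.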