I have a bunch of text files with different formats that somewhere in the file have email addresses. I would like to be able to parse through any number of these files for email addresses. Here are the types of input:

CFO: some_cfo@domain.com

misterman@domain.com

The Main Man mainman@domain.com

To handle these situations I have the following seds:

#Removes an opening title (everything up to and including the colon)
sed -e 's/^.*://'

#Removes opening and closing whitespace (\t works in GNU sed; strict POSIX sed wants a literal tab)
sed -e 's/^[ \t]*//;s/[ \t]*$//'
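For instance, chaining the two on the first sample line (input simulated with printf; \t for a tab inside a bracket expression is a GNU sed extension):

```shell
# Strip the 'Title:' prefix, then trim leading/trailing whitespace.
printf 'CFO: some_cfo@domain.com\n' |
  sed -e 's/^.*://' -e 's/^[ \t]*//;s/[ \t]*$//'
# -> some_cfo@domain.com
```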

Those are both really simple, but for the life of me I can't figure out how to remove normal text from before the email address. I either end up clobbering the whole thing, or including it.

I just need to keep whatever is directly attached to the '@' and delete everything before or after the surrounding whitespace:

The Main Man mainman@domain.com
^not part of email. ^ and ^ are both parts of email.

Any clues anyone?

And grep saves the day. Next time I'll RTFM better. :)

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

I haven't tested for many bugs/quirks in the results yet, but a few quick checks seemed to work fine.
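A quick check against the three sample formats from the original post (again simulated with printf):

```shell
printf 'CFO: some_cfo@domain.com\nmisterman@domain.com\nThe Main Man mainman@domain.com\n' |
  grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"
# -> some_cfo@domain.com
#    misterman@domain.com
#    mainman@domain.com
```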

If anyone has any further ideas though they would still be greatly appreciated!

Here are a few I've used in the past (dunno if they'll work for you since SED isn't my strong point):

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
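To see what two of those do, here they are run on made-up header lines (the header contents are just placeholders):

```shell
# Subject header: keep only the matching line, strip the "Subject: " prefix.
printf 'Subject: Hello there\n' | sed '/^Subject: */!d; s///;q'
# -> Hello there

# Parse the address out of a 1-line return address header.
printf 'From: The Main Man <mainman@domain.com>\n' |
  sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
# -> mainman@domain.com
```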

Those look awfully familiar. Did you grab those off of a "100 useful SED scripts" site? :)

I got them from my friend Josh who probably did pull them off that very site :D

hello,

I am working with:

cat filename.txt | grep @

which reduces the file to just the lines containing email addresses. I am thinking that this will help. What I wonder is whether we can get grep to simply output the matched expression instead of the whole dang line.

I am also wondering if AWK will do what you need.

Christian

cat filename.txt | grep @

which reduces the file to just the lines containing email addresses. I am thinking that this will help. What I wonder is whether we can get grep to simply output the matched expression instead of the whole dang line.
Christian

That is what I posted about:

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

grep -o returns just the matched expression instead of the whole matching line.

I realized that this can be cut down to:

grep -o "[[:graph:]]*@[[:graph:]]*"

Using sed, you could also extract a pattern, similar to grep -o:

sed -n 's/.*\(pattern\).*/\1/p' file
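With the email pattern from above dropped in as an illustration (a space is prepended first so an address at the very start of a line still has whitespace in front of it for the greedy .* to stop at):

```shell
printf 'The Main Man mainman@domain.com\nmisterman@domain.com\n' |
  sed -n 's/^/ /; s/.*[[:space:]]\([[:graph:]]*@[[:graph:]]*\).*/\1/p'
# -> mainman@domain.com
#    misterman@domain.com
```

Note this only prints the last address on each line, since .* is greedy.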


Is the * in your grep -o example = to any character? I've never used that in a grep command before...

* = any number of matches (zero or more) of the previous expression

For example:

[[:graph:]]* is really "Any printable and visible (non-space) character repeated any number of times"
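A quick way to see that in action (GNU grep's -o skips the empty matches that * would otherwise allow):

```shell
# Prints each run of non-space printable characters on its own line.
printf 'foo bar\n' | grep -o "[[:graph:]]*"
# -> foo
#    bar
```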

I'm a grep fan and rarely use sed or awk. To pull the first and last name that often precede the address, you can use this:

grep -o "[[:alnum:]]*[[:blank:]][[:alnum:]]*[[:blank:]][[:graph:]]*@[[:graph:]]*"

This adds the first/last name preceding the address by looking for letters and numbers in any amount, followed by one horizontal space (space or tab), twice: once for the first name, then again for the last name. It then matches the actual address, including any brackets, hyphens, underscores, etc. This makes building a contact list from the typical email headers and forwarded headers very easy, and it returns each name/name/address match on a new line. You may want to switch [[:alnum:]] to [[:graph:]] in the name portions if you want names containing any non-blank characters to be included.
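For instance, on a made-up "First Last address" line:

```shell
printf 'John Smith jsmith@domain.com\n' |
  grep -o "[[:alnum:]]*[[:blank:]][[:alnum:]]*[[:blank:]][[:graph:]]*@[[:graph:]]*"
# -> John Smith jsmith@domain.com
```

Note that this variant requires two words before the address, so a bare misterman@domain.com line would not match at all.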
