I have a bunch of text files with different formats that somewhere in the file have email addresses. I would like to be able to parse through any number of these files for email addresses. Here are the types of input:

CFO: some_cfo@domain.com

misterman@domain.com

The Main Man mainman@domain.com

To take care of the situations I have the following seds:

#Removes an opening title (everything up to and including the colon)
sed -e 's/^.*://'

#Removes leading and trailing whitespace (spaces and tabs)
sed -e 's/^[[:blank:]]*//;s/[[:blank:]]*$//'

Those are both really simple, but for the life of me I can't figure out how to remove normal text from before the email address. I either end up clobbering the whole thing, or including it.

I just need to keep whatever is directly attached to the '@' and delete everything before the preceding whitespace and after the following whitespace.

The Main Man mainman@domain.com

Here "The Main Man" is not part of the email, while "mainman" and "domain.com" are both parts of the email.

Any clues anyone?

And grep saves the day. Next time I'll RTFM better. :)

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

I haven't tested for many bugs/quirks in the results yet, but a few quick checks seemed to work fine.
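
Since the goal was to scan any number of files, the same grep can be handed several files at once. This is only a rough sketch, assuming GNU or BSD grep (-o and -h are not POSIX options); the function name extract_emails and the sample files are made up for illustration.

```shell
# Hypothetical helper: print every whitespace-delimited token containing '@'
# from the given files, with duplicates removed.
extract_emails() {
    # -o prints only the matched text; -h suppresses the "file:" prefix
    # that grep adds when it searches more than one file.
    grep -oh '[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*' "$@" | sort -u
}

# Throwaway sample files built from the inputs in the question
tmpdir=$(mktemp -d)
printf 'CFO: some_cfo@domain.com\n' > "$tmpdir/a.txt"
printf 'misterman@domain.com\nThe Main Man mainman@domain.com\n' > "$tmpdir/b.txt"
extract_emails "$tmpdir"/*.txt
rm -r "$tmpdir"
```

The sort -u at the end is optional; it is there because the same address tends to show up in more than one file.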

If anyone has any further ideas though they would still be greatly appreciated!

Member Avatar for TKSS

Here are a few I've used in the past (dunno if they'll work for you since SED isn't my strong point):

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
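
To see how the return-address script and the address-parsing script chain together, here is a small sketch run against a made-up message (the name and address are invented, and I've only reasoned this through for GNU sed):

```shell
# Sample message; the blank line marks the end of the headers.
msg='From: John Doe <jdoe@example.com>
Subject: hello

body text'

# Step 1: stop at a Reply-To: line if one appears in the headers, otherwise
#         print the last From: line seen once the first blank line is reached.
# Step 2: strip "(comment)" text, the closing ">" and the header/display name.
addr=$(printf '%s\n' "$msg" \
    | sed '/^Reply-To:/q; /^From:/h; /./d;g;q' \
    | sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//')
echo "$addr"    # jdoe@example.com
```

Note that step 1 relies on the blank line after the headers; on a file that is nothing but header lines it prints nothing.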

Those look awfully familiar. Did you grab those off of a "100 useful SED scripts" site? :)

I got them from my friend Josh who probably did pull them off that very site :D

hello,

I am working with:

cat filename.txt | grep @

and getting the output reduced to just the lines that contain an address. I am thinking that this will help. What I wonder is if we can get grep to simply output the matched expression instead of the whole dang line.

I am also wondering if AWK will do what you need.
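
On the awk idea, a rough sketch of how it might look, assuming the addresses are always separated from the surrounding text by spaces or tabs (awk_emails is just a name picked for this example):

```shell
# Hypothetical helper: print each whitespace-separated field that contains '@'.
# With no file arguments it reads standard input.
awk_emails() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /@/) print $i }' "$@"
}

printf 'CFO: some_cfo@domain.com\nThe Main Man mainman@domain.com\n' | awk_emails
```

Because awk splits each line into fields on whitespace by default, the title or name in front of the address simply lands in other fields and gets skipped.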

Christian

Wow. Looks like a bunch of us working on it at the same time. Cool.

Christian

That is what I posted about:

grep -o "[[:alnum:][:graph:]]*@[[:alnum:][:graph:]]*"

grep -o prints only the part of each line that matched the expression, instead of the whole matching line.

I realized that, since [:graph:] already includes everything in [:alnum:], this can be cut down to:

grep -o "[[:graph:]]*@[[:graph:]]*"

Using SED...you could also find a pattern similar to the grep -o

sed -n 's/.*\(pattern\).*/\1/p' file
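
One thing to watch with that template: the leading .* is greedy, so when the pattern is allowed to match a shorter string, sed hands as much as it can to .* instead. Dropping the [[:graph:]]*@[[:graph:]]* expression in as the pattern leaves only "@domain.com". Below is a sketch of one way around it, assuming the address is set off from other text by blanks; sed_emails is a made-up name.

```shell
line='The Main Man mainman@domain.com'

# Naive substitution: the greedy .* swallows the part before the '@'.
naive=$(printf '%s\n' "$line" | sed -n 's/.*\([[:graph:]]*@[[:graph:]]*\).*/\1/p')
echo "$naive"    # @domain.com

# Hypothetical helper: on lines containing '@', drop everything up to the
# blank in front of the address, then drop everything from the next blank on.
sed_emails() {
    sed -n '/@/{
        s/^.*[[:blank:]]\([^[:blank:]]*@\)/\1/
        s/[[:blank:]].*$//
        p
    }' "$@"
}

fixed=$(printf '%s\n' "$line" | sed_emails)
echo "$fixed"    # mainman@domain.com
```

Unlike grep -o, this keeps at most one address per line (the last one), so grep is still the better fit when a line can hold several.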


Does the * in your grep -o example mean "any character"? I've never used that in a grep command before...

* = any number of matches (including zero) of the previous expression

For example:

[[:graph:]]* is really "Any printable and visible (non-space) character repeated any number of times"
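
A side effect of "any number of times" including zero is that a lone '@' also counts as a match. A quick sketch of the difference, using the interval form \{1,\} to ask for at least one character on each side (the sample line is made up):

```shell
# A lone '@' between spaces still matches, because * allows zero repeats.
loose=$(printf 'meet @ noon\n' | grep -o '[[:graph:]]*@[[:graph:]]*')
echo "$loose"    # @

# \{1,\} requires at least one character on each side, so nothing matches here.
# The "|| true" only keeps grep's no-match exit status from aborting a script
# that runs under "set -e".
strict=$(printf 'meet @ noon\n' | grep -o '[[:graph:]]\{1,\}@[[:graph:]]\{1,\}' || true)
echo "strict: '$strict'"    # strict: ''
```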

I'm a grep fan and rarely use sed or awk. To also pull the first and last name that often precede the address, you can use this:

grep -o "[[:alnum:]]*[[:blank:]][[:alnum:]]*[[:blank:]][[:graph:]]*@[[:graph:]]*"

This adds the first and last name preceding the address by matching any run of letters and digits followed by a single horizontal whitespace character (space or tab), twice: once for the first name, then again for the last name. Then it matches the actual address, including any brackets, hyphens, underscores, etc. This makes building a contact list from typical email headers and forwarded headers very easy. Each name/name/address match is printed on its own line. You may want to switch alnum to graph for the names if you want any non-blank delimiter to be included.
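
A quick sketch of what that pattern returns on the sample lines from the first post, assuming a grep that supports -o:

```shell
pattern='[[:alnum:]]*[[:blank:]][[:alnum:]]*[[:blank:]][[:graph:]]*@[[:graph:]]*'

# Three-word name: only the two words next to the address are kept.
named=$(printf 'The Main Man mainman@domain.com\n' | grep -o "$pattern")
echo "$named"    # Main Man mainman@domain.com

# A bare address has no two blank-separated words in front of it, so the
# pattern does not match at all ("|| true" hides grep's no-match status).
bare=$(printf 'misterman@domain.com\n' | grep -o "$pattern" || true)
echo "bare: '$bare'"    # bare: ''
```

So this one is best run alongside the plain address pattern rather than instead of it, since lines with only an address are skipped.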
