learning re module

Question

krystosan 0 Junior Poster

12 Years Ago

I am learning the re module.

For practice I have taken an export of my phone address book, which is a comma-separated text file containing
"First Name","Mobile Phone","Home Phone","Company","E-mail Address","Company Main Phone","Business Fax","Birthday"

For now I am mostly interested in the first name, mobile phone number and email address.
In my phone book the first names use the characters a-z in lower case as well as capitals, with - in places, and some also contain numbers or special characters, so almost everything is included.

So I built a search pattern like this: namePattern = re.compile('[a-zA-Z _@-]+')

This works, but it also picks up characters from the company name, email address and birthday entries. What I need is to search only the field containing First Name. Similarly, I want to separate out the email and the number so the searches don't get mixed up.

So far this is what I have:

import re

# phile holds the path to the exported address book file
with open(phile, "r") as ph:
    lines = ph.readlines()

# the header is the first line; readline() after readlines() would return ''
firstLine = lines[0] if lines else ''
print(firstLine)
namePattern = re.compile(r'[a-zA-Z _@-]+')

for index, line in enumerate(lines):
    # print every run of name-like characters found anywhere in the line
    for namMatch in namePattern.finditer(line):
        print(index, namMatch.group())

How can I get only the first names?

python regex

All 6 Replies

Gribouillis 1,391 Programming Explorer

12 Years Ago

If it is a csv file, it would be easier to read it with the csv module. You would select the first entry of each row to get the first name. Can you post a few rows of your file (with real names and addresses replaced by similar made-up data)?

Edited 12 Years Ago by Gribouillis

Gribouillis 1,391 Programming Explorer

12 Years Ago

Well, here is how to read it with the csv module

#!/usr/bin/env python
# -*-coding: utf8-*-
from __future__ import unicode_literals, print_function, division

from collections import namedtuple
import csv

csv.register_dialect('bookdialect',
    delimiter = str(','),
    quoting = csv.QUOTE_ALL,
    doublequote = False,
    quotechar = str('"'),
    escapechar = str('\\'),
)

record = namedtuple("record", "name phmobile phhome company email phcompany fax birthday")

with open("book.csv", newline="") as ifh:  # on Python 3 the csv module wants text mode with newline=""
    reader = csv.reader(ifh, dialect = 'bookdialect')
    records = list(record(*(x.strip() for x in row)) for row in reader)

print(records)
print("="*20)
print(records[3])
print("="*20)
print(records[4].name, records[4].phhome, records[4].email)

"""my output -->
[record(name='First Name', phmobile='Mobile Phone', phhome='Home Phone', company='Company', email='E-mail Address', phcompany='Company Main Phone', fax='Business Fax', birthday='Birthday'), record(name='121', phmobile='121', phhome='', company='', email='', phcompany='', fax='', birthday=''), record(name='Abha Garg', phmobile='08600746256', phhome='', company='', email='', phcompany='', fax='', birthday=''), record(name='Bakh Bagla(a G)', phmobile='+91932424617', phhome='', company='', email='', phcompany='0188424242', fax='', birthday=''), record(name='Dad', phmobile='+91945334045', phhome='+9188743428', company='', email='eresfsfdra@yrdl.com', phcompany='+91353423449', fax='', birthday=''), record(name='tailor master', phmobile='9357310498', phhome='', company='', email='', phcompany='', fax='', birthday=''), record(name='taruna', phmobile='09015561619', phhome='+918968391049', company='', email='tarunthegreat43@gmail.com', phcompany='', fax='', birthday=''), record(name='Kanika Jain TL@ Pf', phmobile='9967504886', phhome='', company='', email='', phcompany='', fax='', birthday=''), record(name='', phmobile='+918968554786', phhome='', company='', email='', phcompany='', fax='', birthday=''), record(name='', phmobile='9167228454', phhome='', company='', email='', phcompany='', fax='', birthday=''), record(name='VAS Act/Deact', phmobile='12116', phhome='', company='', email='', phcompany='', fax='', birthday='')]
====================
record(name='Bakh Bagla(a G)', phmobile='+91932424617', phhome='', company='', email='', phcompany='0188424242', fax='', birthday='')
====================
Dad +9188743428 eresfsfdra@yrdl.com
"""

There are also solutions if the file is encoded in unicode.
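
As a rough sketch of one such solution on Python 3, the file can be decoded while it is read. This assumes the export is UTF-8, which the thread does not confirm, and the sample bytes and the name in them are made up for illustration.

```python
import csv
import io

# Stand-in for a UTF-8 encoded export (made-up data). A real file would be
# opened with open("book.csv", encoding="utf-8", newline="") instead.
raw_bytes = '"First Name","Mobile Phone"\n"Zoë","  0123456789"\n'.encode("utf-8")

# Decode the bytes as text while reading, then let csv split the fields.
with io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8", newline="") as ifh:
    rows = [[field.strip() for field in row] for row in csv.reader(ifh)]

print(rows[1][0] == "Zoë")  # True
print(rows[1][1])           # 0123456789
```

If the export uses another encoding, only the encoding argument should need to change.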

Edited 12 Years Ago by Gribouillis


krystosan 0 Junior Poster · Answer 1 · 2012-12-24T13:47:13+00:00

Here is an example of the different columns from the file. I have changed the numbers and email addresses.

"First Name","Mobile Phone","Home Phone","Company","E-mail Address","Company Main Phone","Business Fax","Birthday"
"121"," 121",,,,,,
"Abha Garg","   08600746256",,,,,,
"Bakh Bagla(a G)"," +91932424617",,,,"  0188424242",,
"Dad"," +91945334045"," +9188743428",,"eresfsfdra@yrdl.com","   +91353423449",,
"tailor master","   9357310498",,,,,,
"taruna","  09015561619","  +918968391049",,"tarunthegreat43@gmail.com",,,
"Kanika Jain TL@ Pf","  9967504886",,,,,,
,"  +918968554786",,,,,,
,"  9167228454",,,,,,
"VAS Act/Deact","   12116",,,,,,

krystosan 0 Junior Poster · Answer 2 · 2012-12-31T16:50:22+00:00

but i wanted to learn re module...

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 3 · 2012-12-31T17:52:48+00:00

but i wanted to learn re module...

You can start with the chapter about the re module in diveintopython.

The first rule is to always use raw strings for regular expressions, e.g. re.compile(r"foo") (notice the r).
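
A small sketch of why the r matters (the sample text is made up for illustration): in a normal string literal "\b" is the backspace character, while in a raw string it reaches the re module as the word-boundary escape.

```python
import re

text = "call dad or grandad"  # made-up sample text

# Normal string: "\b" is the backspace character (\x08), so nothing matches.
plain = re.findall("\bdad\b", text)

# Raw string: r"\b" reaches re as a word boundary, so only the whole word matches.
raw = re.findall(r"\bdad\b", text)

print(plain)  # []
print(raw)    # ['dad']
```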

snippsat 661 Master Poster · Answer 4 · 2013-01-01T05:22:17+00:00

but i wanted to learn re module..

Start with something simpler.
An example could be something like this:
match a phone number that starts with +.

>>> import re
>>> s = '"taruna","  09015561619","  +918968391049",,"tarunthegreat43@gmail.com",,,'
>>> re.findall(r'\+\d+', s)
['+918968391049']

So \+ matches a literal +.
+ on its own has a special meaning (it matches 1 or more of the preceding token).
\d matches any digit character (0-9).
The final + repeats \d, so we get the whole number.
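
Building on the same idea, here is a hedged sketch that also picks up numbers without a leading +. It assumes every phone number sits alone inside a quoted field, possibly padded with spaces, as in the sample rows posted above.

```python
import re

s = '"taruna","  09015561619","  +918968391049",,"tarunthegreat43@gmail.com",,,'

# A quoted field that holds only optional spaces, an optional "+", and digits.
# The capture group keeps the number and drops the quotes and padding.
numbers = re.findall(r'"\s*(\+?\d+)"', s)
print(numbers)  # ['09015561619', '+918968391049']
```

Requiring the surrounding quotes is what keeps the 43 inside the email address from being reported as a number.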

Email address.
\w+ matches one or more word characters.
Then match @, then another \w+, an escaped dot \. (an unescaped . would match any character), and a final \w+.
So will this work?

>>> s = '"taruna","  09015561619","  +918968391049",,"tarunthegreat43@gmail.com",,,'
>>> re.findall(r'\w+@\w+\.\w+', s)
['tarunthegreat43@gmail.com']
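
Coming back to the original question, one way to get only the first names with re might look like the sketch below. It assumes the first name is always the first quoted field on the line and never contains a double quote; the rows are copied from the sample data posted in this thread, and first_field is just an illustrative name.

```python
import re

rows = [
    '"Dad"," +91945334045"," +9188743428",,"eresfsfdra@yrdl.com","   +91353423449",,',
    '"Kanika Jain TL@ Pf","  9967504886",,,,,,',
    ',"  +918968554786",,,,,,',
]

# ^ anchors at the start of the line, so only the first field can match.
# [^"]* takes everything up to the closing quote, whatever characters it holds.
first_field = re.compile(r'^"([^"]*)"')

names = []
for row in rows:
    m = first_field.match(row)
    names.append(m.group(1) if m else "")  # rows with no name start with a comma
print(names)  # ['Dad', 'Kanika Jain TL@ Pf', '']
```

Anchoring on the position of the field, rather than on which characters a name may contain, is what stops the company, email and birthday columns from matching.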

Read the Python documentation for the re module.
There are also many online tools that can help with regex:
http://www.gskinner.com/RegExr/
http://osteele.com/tools/rework/
http://regexlib.com/?AspxAutoDetectCookieSupport=1
http://rubular.com/
