Problem parsing a text file (e.g. vCard)

Question

M.S. 53 Light Poster

13 Years Ago

I have tried many times with no luck. I want to parse a text file in vCard format (exported from my phone contacts).

The text file looks like this:

BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD

I pasted three different records. The information I want to extract is each contact's name and phone number.
The output I would like is a dict like this:
name_numbers = {'Maj':'09120000000', 'Ali jahan':'09120000001', 'Eqlimi Mostafa':'09120000002'}

The code I have so far is:

data = open('contacts.vcf', 'r')
name = ''
number = ''
if data:
    for l in data:
        if l.startwith('N;'):
            name = l.split(':')[1].strip(';')
        if l.startwith('TEL'):
            number = l.split(':')[1]
    print "%s: %s"%(name, number)

P.S.: I'm using Python on my phone, and the version there is Python 2.2.2.

By the way, I need this for a free application for Symbian Nokia phones; it's not homework.

Thanks in advance for any help.

python


Gribouillis 1,391 Programming Explorer

13 Years Ago

There are Python modules for parsing vCards. I pip-installed a module called vobject and it works:

#!/usr/bin/env python
# -*-coding: utf8-*-

import vobject

cards = ["""BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD""",

"""BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD""",

"""BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD"""
]

if __name__ == "__main__":
    result = dict()
    for c in cards:
        v = vobject.readOne( c )
        # v.prettyPrint()
        result[str(v.n.value).strip()] = v.tel.value
    print result

""" my output -->
{'Ali jahan': u'09120000001', 'Maj': u'09120000000', 'Mostafa  Eqlimi': u'+989120000002'}
"""

I'm not sure it will work on your phone, however.


snippsat 661 Master Poster

13 Years Ago

I see that you have solved it, that's good.
I put something together with regex when I saw the post.
Here it is; not the cleanest solution, but it should work OK.

import re

text = '''\
BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
'''

#Find names
name_re = []
for match in re.finditer(r"name:;.*|name:.*", text):
     name_re.append(match.group())
temp = re.sub(r'name|[:;]', ' ', ','.join(name_re))
temp = temp.strip().split(',')
names = [i.strip() for i in temp]

#Find voices
voice = []
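for v in re.finditer(r"VOICE:(.*)|1:(.*)", text):
    # Only one of the two groups matches; the other one is None
    voice.append(v.group(1))
    voice.append(v.group(2))
# Drop the None entries and keep the first three numbers (one per card in this sample)
voice = [i for i in voice if i is not None][:3]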


#Zip list together
result = dict(zip(names, voice))
print result
"""Output-->
{'Ali jahan': '09120000001', 'Eqlimi Mostafa': '+989120000002', 'Maj': '09120000000'}
"""

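The name-extraction step in the regex answer above can also be written with a single pattern anchored on the N line, followed by a plain split on the semicolons. This is only a sketch under assumptions: `extract_names` and `sample` are hypothetical names, and the fields are assumed to come in the order last name, first name, as in the thread's data. It has not been tried on Python 2.2.

```python
import re

# N, an optional ";PARAM=..." part, a colon, then the value up to end of line
N_LINE = re.compile(r"^N(?:;[^:\n]*)?:(.*)$", re.M)

def extract_names(text):
    """Return one display name per N line found in the vCard text."""
    names = []
    for value in N_LINE.findall(text):
        # The value looks like "Last;First;;;" so keep the non-empty fields
        parts = [p.strip() for p in value.split(';') if p.strip()]
        names.append(' '.join(parts))
    return names

sample = """BEGIN:VCARD
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
END:VCARD
BEGIN:VCARD
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
END:VCARD
"""

names = extract_names(sample)
```

Unlike the join/sub/split approach, this does not break when a name happens to contain a comma or the word "name".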

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2012-02-01T11:43:54+00:00

If you insist on doing it yourself, the method is startswith, not startwith:

data = open('contacts.vcf', 'r')
name = ''
number = ''
if data:
    for line in data:
        if line.startswith('N;'):
            name = line.split(':')[1].strip(' ;:\n')
        if name and line.startswith('TEL'):
            number = line.split(':')[1].rstrip()
            print "%s: %s"%(name, number)
            name = number = ''

Output:

Maj: 09120000000
Ali jahan: 09120000001
Eqlimi;Mostafa: +989120000002
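To get the dict asked for in the question rather than printed lines, the same line-by-line approach can collect its results as it goes. The following is a minimal sketch, not the code from the answer above: `parse_vcards` and `sample` are hypothetical names, only the first TEL line of each card is kept, and the name fields are joined with a space. It avoids newer syntax, but it has not been tested on Python 2.2.

```python
def parse_vcards(lines):
    """Map each contact name to the first TEL number on its card."""
    result = {}
    name = ''
    number = ''
    for line in lines:
        line = line.strip()
        if line.startswith('BEGIN:VCARD'):
            # New card: forget whatever the previous card left behind
            name = ''
            number = ''
        elif line.startswith('N;') or line.startswith('N:'):
            # The value looks like "Last;First;;;" so keep the non-empty fields
            value = line.split(':', 1)[1]
            parts = [p.strip() for p in value.split(';') if p.strip()]
            name = ' '.join(parts)
        elif line.startswith('TEL') and not number:
            # Only the first TEL line of a card is kept
            number = line.split(':', 1)[1]
        elif line.startswith('END:VCARD') and name and number:
            result[name] = number
    return result

sample = """BEGIN:VCARD
VERSION:2.1
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
TEL;CELL;2:09390000004
END:VCARD
"""

name_numbers = parse_vcards(sample.split('\n'))
```

Because the function only needs an iterable of lines, an open file object such as `open('contacts.vcf', 'r')` can be passed in place of the split string.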

M.S. 53 Light Poster · Answer 2 · 2012-02-01T11:47:30+00:00

Thank you, Gribouillis, for your kind reply, but unfortunately it doesn't work on my mobile.

Regex may help, but I'm totally unfamiliar with regular expressions.

@PyTony: this code of mine is very poor; it can't handle a file with multiple vCards, or vCards with extra lines like the third one I pasted in the first post.

M.S. 53 Light Poster · Answer 3 · 2012-02-01T12:23:56+00:00

Oh, thanks PyTony, I didn't notice the other changes you made to that code.
I'll try your revision on the whole 250-vCard file and report the result.

M.S. 53 Light Poster · Answer 4 · 2012-02-01T12:38:34+00:00

YES! It worked great, thank you, bros.

M.S. 53 Light Poster · Answer 5 · 2012-02-02T07:05:34+00:00

@snippsat:

Thank you, dear snippsat.
I changed the thread back to unsolved to ask just one more question.
The regex method is great, but can you please tell me how to change the regular expression to extract the first CELL number of every vCard (instead of the VOICE numbers)?

Thanks again.

snippsat 661 Master Poster · Answer 6 · 2012-02-02T20:06:03+00:00

Something like this should do it.

import re

text = '''\
BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
'''

for match in re.finditer(r"CELL;\d:(.*)", text):
    print match.group(1)

'''Output-->
+989120000002
09390000004
'''
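The loop above prints every CELL number in the text, so a card with two CELL lines contributes two results. If only the first CELL number of each card is wanted, one option is to split the text into cards first and search each card separately. This is a sketch under assumptions: `first_cell_numbers`, `CARD_RE`, `CELL_RE` and `sample` are hypothetical names, and cards without a CELL line are reported as None.

```python
import re

# Everything between BEGIN:VCARD and END:VCARD, across line breaks
CARD_RE = re.compile(r"BEGIN:VCARD(.*?)END:VCARD", re.S)
# A TEL line whose parameters start with CELL; the number follows the colon
CELL_RE = re.compile(r"^TEL;CELL[^:\n]*:(.*)$", re.M)

def first_cell_numbers(text):
    """Return one entry per vCard: its first CELL number, or None."""
    numbers = []
    for card in CARD_RE.findall(text):
        match = CELL_RE.search(card)
        if match:
            numbers.append(match.group(1).strip())
        else:
            numbers.append(None)
    return numbers

sample = """BEGIN:VCARD
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
END:VCARD
BEGIN:VCARD
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
TEL;CELL;2:09390000004
END:VCARD
"""

cells = first_cell_numbers(sample)
```

Keeping one entry per card, including None, means the list stays aligned with a list of names extracted in the same card order.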