Problem parsing a text file (e.g. vCard)
I have tried many times with no luck. I want to parse a text file in vCard format (exported from my phone contacts).
The text file looks like this:
BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
I pasted three different vCards. From each one I want to extract the name and the first phone number.
The output I would like is a dict like this:
name_numbers = {'Maj':'09120000000', 'Ali jahan':'09120000001', 'Eqlimi Mostafa':'09120000002'}
The code I have so far is:
data = open('contacts.vcf', 'r')
name = ''
number = ''
if data:
    for l in data:
        if l.startwith('N;'):
            name = l.split(':')[1].strip(';')
        if l.startwith('TEL'):
            number = l.split(':')[1]
            print("%s: %s" % (name, number))
P.S.: I'm using Python on my phone, and the Python version there is 2.2.2.
By the way, I need this for a free application for Symbian Nokia phones; it's not homework.
Thanks in advance for any help.
M.S.
Junior Poster in Training
56 posts since Jul 2011
Reputation Points: 28
Solved Threads: 7
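All three cards in the question share one line format: a property name, optional `;PARAM` parts, a colon, then the value. Below is a minimal sketch of splitting such a line, using hypothetical helpers `split_content_line` and `name_fields`. It splits at the first colon, which assumes no parameter contains a colon (true for every line pasted here). For the N property the value fields are, in order, family name, given name, additional names, prefix and suffix, so `;Maj;;;` has an empty family name and `Eqlimi;Mostafa;;;` has family name Eqlimi.

```python
# Hypothetical helpers, a sketch only. Assumption: the first ':' on a line
# separates the property (plus its parameters) from the value.

def split_content_line(line):
    """Return (name, params, value) for a line like 'TEL;CELL;1:+98...'."""
    head, value = line.rstrip('\r\n').split(':', 1)
    parts = head.split(';')
    return parts[0], parts[1:], value


def name_fields(value):
    """Split an N value into its ';'-separated fields:
    family, given, additional names, prefix, suffix."""
    return value.split(';')


name, params, value = split_content_line(
    'N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;')
print(name)                # N
print(params)              # ['X-EPOCCNTMODELLABEL0=Last name', 'X-EPOCCNTMODELLABEL1=First name']
print(name_fields(value))  # ['Eqlimi', 'Mostafa', '', '', '']
```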
There are Python modules to parse vCards. I pip-installed a module called vobject, and it works:
#!/usr/bin/env python
# -*-coding: utf8-*-
import vobject
cards = ["""BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD""",
"""BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD""",
"""BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD"""
]
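if __name__ == "__main__":
    result = dict()
    for c in cards:
        v = vobject.readOne(c)
        # v.prettyPrint()  # uncomment to dump the whole parsed card
        # v.n.value is the parsed name; v.tel is the card's first TEL line
        result[str(v.n.value).strip()] = v.tel.value
    print(result)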
""" my output -->
{'Ali jahan': u'09120000001', 'Maj': u'09120000000', 'Mostafa Eqlimi': u'+989120000002'}
"""
I'm not sure it will work on your phone, however.
Gribouillis
Posting Maven
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
If you insist on doing it yourself, the method is startswith, not startwith:
data = open('contacts.vcf', 'r')
name = ''
number = ''
if data:
    for line in data:
        if line.startswith('N;'):
            name = line.split(':')[1].strip(' ;:\n')
        if name and line.startswith('TEL'):
            number = line.split(':')[1].rstrip()
            print("%s: %s" % (name, number))
            # reset, so only the first TEL line of each card is printed
            name = number = ''
Output:
Maj: 09120000000
Ali jahan: 09120000001
Eqlimi;Mostafa: +989120000002
pyTony
pyMod
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
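pyTony's version prints one `name: number` line per card and leaves the third name as `Eqlimi;Mostafa`. If you want the dict from the question instead, here is a minimal sketch in the same line-by-line style. The function name `first_numbers` is hypothetical; it assumes each card is delimited by BEGIN:VCARD/END:VCARD lines, that no lines are folded, and that the name should be the non-empty N fields joined with a space. It has not been tested on Python 2.2.

```python
# Hypothetical helper, a sketch only: map each card's name to its first
# TEL number. Assumes BEGIN:VCARD/END:VCARD markers and no folded lines.
def first_numbers(lines):
    result = {}
    name = number = None
    for raw in lines:
        line = raw.rstrip('\r\n')   # phone exports often use CRLF endings
        if line == 'BEGIN:VCARD':
            name = number = None    # start a fresh card
        elif line.startswith('N;') or line.startswith('N:'):
            value = line.split(':', 1)[1]
            # keep the non-empty N fields: 'Eqlimi;Mostafa;;;' -> 'Eqlimi Mostafa'
            name = ' '.join([f for f in value.split(';') if f])
        elif line.startswith('TEL') and number is None:
            number = line.split(':', 1)[1]   # only the first TEL line counts
        elif line == 'END:VCARD' and name and number:
            result[name] = number
    return result


sample = [
    'BEGIN:VCARD',
    'VERSION:2.1',
    'N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;',
    'TEL;VOICE:09120000000',
    'X-CLASS:private',
    'END:VCARD',
    'BEGIN:VCARD',
    'VERSION:2.1',
    'N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;',
    'TEL;CELL;1:+989120000002',
    'TEL;VOICE:09180000003',
    'X-CLASS:private',
    'TEL;CELL;2:09390000004',
    'X-CLASS:private',
    'END:VCARD',
]
print(first_numbers(sample))

# With the real file from the question:
# print(first_numbers(open('contacts.vcf')))
```

Resetting at BEGIN:VCARD is what lets one loop handle a file of many cards, and the `number is None` check makes the third card report its CELL;1 number rather than the later VOICE or CELL;2 lines.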
Thank you, Gribouillis, for your kind reply, but unfortunately it doesn't work on my phone.
Regex may help, but I'm totally unfamiliar with regular expressions.
@pyTony: this code of mine is very poor; it can't handle a file with multiple vCards, or vCards with extra lines like the third one I pasted in my first post.
M.S.
Oh, thanks pyTony, I didn't notice the other changes you made to that code.
I'll try your revision on my whole 250-vCard file and report the result.
M.S.
YES! It worked great. Thank you, bros.
M.S.
I see that you have solved it; that's good.
I put something together with regex when I saw the post.
Here it is. It's not the cleanest solution, but it should work OK.
import re
text = '''\
BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
'''
#Find names
name_re = []
for match in re.finditer(r"name:;.*|name:.*", text):
    name_re.append(match.group())
temp = re.sub(r'name|[:;]', ' ', ','.join(name_re))
temp = temp.strip().split(',')
names = [i.strip() for i in temp]
#Find voices
voice = []
for v in re.finditer(r"VOICE:(.*)|1:(.*)", text):
    # only one of the two groups matches on each line; the other is None
    voice.extend(v.groups())
# drop the None entries; [:3] keeps the first three numbers, which lines up
# with the three names only for this sample (card 3 also has a VOICE line)
voice = [i for i in voice if i is not None][:3]
#Zip list together
result = dict(zip(names, voice))
print(result)
"""Output-->
{'Ali jahan': '09120000001', 'Eqlimi Mostafa': '+989120000002', 'Maj': '09120000000'}
"""
snippsat
Practically a Posting Shark
808 posts since Aug 2008
Reputation Points: 353
Solved Threads: 294
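The version above collects all names and all numbers separately and lines them up with `[:3]`, which only works for these three cards. A variation that keeps each name with its own number is to cut the text into cards first and then search inside each card. This is a sketch, not snippsat's code; `first_tel_per_card` and the three pattern names are hypothetical, and it assumes one N line per card.

```python
import re

# Hypothetical names, a sketch only: split into cards first, then search
# each card on its own so every name stays paired with its own number.
CARD_RE = re.compile(r'BEGIN:VCARD(.*?)END:VCARD', re.DOTALL)
# N line, with or without parameters; group 1 is the value after the colon
NAME_RE = re.compile(r'^N(?:;[^:\r\n]*)?:(.*?)\r?$', re.MULTILINE)
# first TEL line of any type; group 1 is the number
TEL_RE = re.compile(r'^TEL[^:\r\n]*:(.*?)\r?$', re.MULTILINE)


def first_tel_per_card(text):
    result = {}
    for card in CARD_RE.findall(text):
        name = NAME_RE.search(card)
        tel = TEL_RE.search(card)
        if name and tel:
            # 'Eqlimi;Mostafa;;;' -> 'Eqlimi Mostafa'
            fields = [f for f in name.group(1).split(';') if f]
            result[' '.join(fields)] = tel.group(1)
    return result


text = '''\
BEGIN:VCARD
VERSION:2.1
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
'''
print(first_tel_per_card(text))
```

Because each search is limited to one card, a card with extra TEL lines no longer shifts the numbers of the cards after it.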
@snippsat:
Thank you, dear snippsat.
I marked the thread as unsolved again to ask just one more question.
The regex method is great, but can you please tell me how to change the regular expression to extract the first CELL number of every vCard (instead of the VOICE numbers)?
Thanks again.
M.S.
Something like this should do it.
import re
text = '''\
BEGIN:VCARD
VERSION:2.1
REV:20110913T095232Z
UID:aac119d5fe3bc9dc-00e17913379d6cc8-3
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083215Z
UID:aac119d5fe3bc9dc-00e17b0693898c98-4
N;X-EPOCCNTMODELLABEL1=First name:;Ali jahan;;;
TEL;VOICE:09120000001
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
REV:20110228T083510Z
UID:aac119d5fe3bc9dc-00e17b069df653a0-5
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
'''
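# this prints every numbered CELL line (CELL;1 and CELL;2 alike),
# not only the first one of each card
for match in re.finditer(r"CELL;\d:(.*)", text):
    print(match.group(1))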
'''Output-->
+989120000002
09390000004
'''
snippsat
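The pattern above prints every numbered CELL line, so the third card gives both `+989120000002` and `09390000004`. To get only the first CELL number of each card, as asked, one option is to split the text into cards and stop at the first TEL line whose parameters include CELL. A minimal sketch; `first_cell_per_card` is a hypothetical name, and it assumes CELL is written as a bare TEL parameter as in these Nokia exports (not, for example, as TYPE=CELL).

```python
import re

# Hypothetical helper, a sketch only: first CELL number of each card,
# skipping cards that have no CELL line at all.
CARD_RE = re.compile(r'BEGIN:VCARD(.*?)END:VCARD', re.DOTALL)
# group 1: the parameters between 'TEL;' and ':', group 2: the number
TEL_RE = re.compile(r'^TEL;([^:\r\n]*):(.*?)\r?$', re.MULTILINE)


def first_cell_per_card(text):
    cells = []
    for card in CARD_RE.findall(text):
        for m in TEL_RE.finditer(card):
            if 'CELL' in m.group(1).upper().split(';'):
                cells.append(m.group(2))
                break   # ignore CELL;2 and later lines in this card
    return cells


text = '''\
BEGIN:VCARD
VERSION:2.1
N;X-EPOCCNTMODELLABEL1=First name:;Maj;;;
TEL;VOICE:09120000000
X-CLASS:private
END:VCARD
BEGIN:VCARD
VERSION:2.1
N;X-EPOCCNTMODELLABEL0=Last name;X-EPOCCNTMODELLABEL1=First name:Eqlimi;Mostafa;;;
TEL;CELL;1:+989120000002
TEL;VOICE:09180000003
X-CLASS:private
TEL;CELL;2:09390000004
X-CLASS:private
END:VCARD
'''
for number in first_cell_per_card(text):
    print(number)
```

On this sample it prints only `+989120000002`: the first card has no CELL line, and CELL;2 in the second card is skipped.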