How to split a string on ',' while ignoring commas between quotes
I have a string like:
'par1=val1,par2=val2,par3="some text, again some text, again some text",par4="some text",par5=val5'
I need to split it into parts like:
par1=val1
par2=val2
par3="some text, again some text, again some text"
par4="some text"
par5=val5
I use this code:
a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'.split(',')
newList = []
for b in a:
    if '=' in b:
        # a piece containing '=' starts a new parameter
        newList.append(b)
    else:
        # no '=': this piece continues the previous quoted value
        newList[-1] += ',' + b
print(newList)
I'm looking for a better solution. Can anybody suggest one?
Thank you in advance!
_neo_
Junior Poster in Training
64 posts since Aug 2010
Reputation Points: 26
Solved Threads: 3
griswolf
Veteran Poster
1,165 posts since Apr 2010
Reputation Points: 344
Solved Threads: 256
I could not parse it with csv, but if your string looks like Python code, you can use Python's own tokenizer:
# python 2 and 3
import sys
if sys.version_info < (3,):
    from cStringIO import StringIO
else:
    from io import StringIO
    xrange = range
from tokenize import generate_tokens

a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'

def parts(a):
    """Split a python-tokenizable expression on comma operators"""
    compos = [-1] # compos stores the positions of the relevant commas in the argument string
    compos.extend(t[2][1] for t in generate_tokens(StringIO(a).readline) if t[1] == ',')
    compos.append(len(a))
    return [ a[compos[i]+1:compos[i+1]] for i in xrange(len(compos)-1)]

print(parts(a))
""" my output -->
['par1=val1', 'par2=val2', 'par3="some text1, again some text2, again some text3"', 'par4="some text"', 'par5=val5']
"""
The other alternative is to use regular expressions.
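A minimal sketch of that regex route, assuming values use only double quotes, contain no escaped quotes, and that no field is empty (an empty field would be silently dropped). The pattern and the name split_fields are only an illustration:

```python
import re

# One field is a run of pieces, where each piece is either a character
# that is neither a comma nor a double quote, or a complete
# double-quoted chunk (which may itself contain commas).
FIELD = re.compile(r'(?:[^,"]|"[^"]*")+')

def split_fields(text):
    """Return the comma-separated fields of text, keeping quoted commas."""
    return FIELD.findall(text)

a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'
print(split_fields(a))
```

Because findall collects matches rather than splitting on separators, this is short but less forgiving than a real parser when the input is malformed.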
Gribouillis
Posting Maven
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
Thank you, Gribouillis. I'm using your snippet.
Thank you too, griswolf. Your link will solve my other problem ;)
_neo_
Here is a version with regex. It should work even if the data string contains newlines.
# python 2 and 3
import re

regex = re.compile(r"\\.|[\"',]", re.DOTALL)

def parts(data):
    delimiter = ''
    compos = [-1]
    for match in regex.finditer(data):
        g = match.group(0)
        if delimiter == '':
            if g == ',':
                compos.append(match.start())
            elif g in "\"'":
                delimiter = g
        elif g == delimiter:
            delimiter = ''
    # you may uncomment the next line to catch errors
    #if delimiter: raise ValueError("Unterminated string in data")
    compos.append(len(data))
    return [ data[compos[i]+1:compos[i+1]] for i in range(len(compos)-1)]

if __name__ == "__main__":
    a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'
    print(parts(a))
""" my output -->
['par1=val1', 'par2=val2', 'par3="some text1, again some text2, again some text3"', 'par4="some text"', 'par5=val5']
"""
Gribouillis