
How to split a string on ',' while ignoring commas between quotes

I have a string like:
'par1=val1,par2=val2,par3="some text, again some text, again some text",par4="some text",par5=val5'

I have to split it into parts like:
par1=val1
par2=val2
par3="some text, again some text, again some text"
par4="some text"
par5=val5

I use this code:

a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'.split(',')
newList = []
for b in a:
    if '=' in b:
        # a piece containing '=' starts a new "name=value" part
        newList.append(b)
    else:
        # otherwise it continues a quoted value, so re-attach it with its comma
        newList[-1] += ',' + b
print(newList)


I'm looking for a better solution. Can anybody suggest one?

Thank you in advance!

_neo_
Junior Poster in Training
64 posts since Aug 2010
Reputation Points: 26
Solved Threads: 3
 

This is a solved problem. Look at the csv module.
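A minimal sketch of what the csv module does with this kind of input, for reference. The sample strings are shortened versions of the one in the question and are only illustrative. csv.reader treats a quote as special only at the very start of a field, so a quote that follows `par3=` is kept as an ordinary character and the embedded commas still split the field; the commas are protected only when the whole field is quoted.

```python
import csv

line = 'par1=val1,par3="some text, again some text",par5=val5'

# The quote after "par3=" is mid-field, so csv keeps it as a literal
# character and still splits on the comma inside the quotes.
naive = next(csv.reader([line]))

# When the entire field is wrapped in quotes, the embedded comma survives
# and the surrounding quotes are removed.
quoted = 'par1=val1,"par3=some text, again some text",par5=val5'
protected = next(csv.reader([quoted]))

print(naive)
print(protected)
```

So csv fits data where whole fields are quoted; the name="value" layout from the question needs a different approach or some preprocessing.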

griswolf
Veteran Poster
1,165 posts since Apr 2010
Reputation Points: 344
Solved Threads: 256
 

I could not parse it with csv, but if your string looks like Python code, you can use Python's own tokenizer:

# python 2 and 3
import sys
if sys.version_info < (3,):
    from cStringIO import StringIO
else:
    from io import StringIO
    xrange = range
from tokenize import generate_tokens


a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'

def parts(a):
    """Split a python-tokenizable expression on comma operators"""
    compos = [-1] # compos stores the positions of the relevant commas in the argument string
    compos.extend(t[2][1] for t in generate_tokens(StringIO(a).readline) if t[1] == ',')
    compos.append(len(a))
    return [ a[compos[i]+1:compos[i+1]] for i in xrange(len(compos)-1)]

print(parts(a))

""" my output -->
['par1=val1', 'par2=val2', 'par3="some text1, again some text2, again some text3"', 'par4="some text"', 'par5=val5']
"""

Another alternative is to use regular expressions.
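As a rough sketch of the regular-expression route, assuming values are only ever wrapped in double quotes and contain no escaped quotes: match each part as a run of non-comma, non-quote characters or complete quoted strings. The names PART and split_outside_quotes are made up for this example.

```python
import re

# One part is a run of pieces, each of which is either a single character
# that is neither a comma nor a double quote, or a complete "..." string.
PART = re.compile(r'(?:[^,"]|"[^"]*")+')

def split_outside_quotes(data):
    """Return the comma-separated parts of data, keeping quoted commas."""
    return PART.findall(data)

a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'
print(split_outside_quotes(a))
```

Note that this sketch silently drops empty parts (for example the one between two adjacent commas) and does not report unterminated quotes, so it only suits well-formed input.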

Gribouillis
Posting Maven
Moderator
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
 

Thank you, Gribouillis. I'll use your snippet.
Thank you too, griswolf. Your link will solve my other problem ;)

_neo_
 

Here is a version with regex. It should work even if the data string contains newlines.

# python 2 and 3
import re
regex = re.compile(r"\\.|[\"',]", re.DOTALL)

def parts(data):
    delimiter = ''
    compos = [-1]
    for match in regex.finditer(data):
        g = match.group(0)
        if delimiter == '':
            if g == ',':
                compos.append(match.start())
            elif g in "\"'":
                delimiter = g
        elif g == delimiter:
            delimiter = ''
    # you may uncomment the next line to catch errors
    #if delimiter: raise ValueError("Unterminated string in data")
    compos.append(len(data))
    return [ data[compos[i]+1:compos[i+1]] for i in range(len(compos)-1)]

if __name__ == "__main__":
    a = 'par1=val1,par2=val2,par3="some text1, again some text2, again some text3",par4="some text",par5=val5'
    print(parts(a))

""" my output -->
['par1=val1', 'par2=val2', 'par3="some text1, again some text2, again some text3"', 'par4="some text"', 'par5=val5']
"""
Gribouillis
 
