Split string except inside brackets or quotes

TrustyTony

Sometimes we do not want to split on every space character, only on those outside brackets and quotes. This lets us, for example, pass a quoted string as a single argument to a command.
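For the quotes-only case, the standard library already offers `shlex.split`. The short comparison below is only a sketch to show the problem being solved; the sample `command` string is made up for illustration. Note that `shlex.split` strips the quotes and knows nothing about brackets, which is why this snippet uses its own generator instead.

```python
import shlex

# Hypothetical command line, used only for illustration.
command = 'copy "My Documents/report final.txt" backup'

# Plain whitespace splitting breaks the quoted file name apart.
naive = command.split()

# shlex.split keeps the quoted part together (and removes the quotes).
quote_aware = shlex.split(command)

print(naive)
print(quote_aware)
```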

EDIT:

  1. Added hierarchical nesting of the same kind of brackets.
  2. Multiple consecutive separators are bunched into a single separator; sep=None splits on any run of whitespace.
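To make the second EDIT item concrete, here is how the built-in `str.split` treats repeated separators, next to one way of collapsing them. This is a sketch with plain strings and `re.split`, not the snippet's own code; the sample `text` and `sep` values are assumptions.

```python
import re

text = "a,,b,,,c"
sep = ","

# With an explicit separator, str.split keeps the empty fields between repeats.
plain = text.split(sep)

# Collapsing each run of separators into a single break.
collapsed = re.split("(?:%s)+" % re.escape(sep), text)

# With no separator, str.split already treats any whitespace run as one break.
whitespace = "a  b\t\n c".split()

print(plain)       # ['a', '', 'b', '', '', 'c']
print(collapsed)   # ['a', 'b', 'c']
print(whitespace)  # ['a', 'b', 'c']
```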

Test output:

----------------------------------------------------------------------------------------------------------------------------------
Hello, (Tony 'pyTony' Jarkko Veijalainen)   
 {'This 'is' quoted' Great to "split" this} split "Also Quoted part" [test test] end 
Hello,
(Tony 'pyTony' Jarkko Veijalainen)
{'This 'is' quoted' Great to "split" this}
split
"Also Quoted part"
[test test]
end
---------------------------------------------------------------------------------------------------------------------------------
Hello, (Tony Jarkko Veijalainen)         "This ("is") quoted" Great to "split"   {this split} "Also Quoted part" [test test] end 
Hello,
(Tony Jarkko Veijalainen)
"This ("is") quoted"
Great
to
"split"
{this split}
"Also Quoted part"
[test test]
end
------------------------------------------------------------------------------------------------------------------------------
Hello, (Tony Jarkko Veijalainen This ("is")' quoted Great, to split this split "Also Quoted part"       [test split test] end 
Did not find end of pair for '(': ')'
43:Hello, (Tony Jarkko Veijalainen This ("is")
------------------------------------------------------------------------------------------------------------------------
Hello, [Tony Jarkko Veijalainen) "This ("is") quoted" Great, to split this split "Also Quoted part" test split test end 
Did not find end of pair for '[': ']'
8:Hello, [
----------------------------------------------------
[(An (even better)) [Lisp Interpreter [in Python]]] 
[(An (even better)) [Lisp Interpreter [in Python]]]
def splitq(seq, sep=None, pairs=("()", "[]", "{}"), quote='"\''):
    """Split seq by sep, treating parts inside pairs or quotes as unbreakable.

       A pair has different start and end characters; a quote uses the same
       character at the beginning and the end.
       Use itertools.islice if you want only some of the splits.
    """
    if not seq:
        yield []
    else:
        lsep = len(sep) if sep is not None else 1
        lpair, rpair = zip(*pairs)
        pairs = dict(pairs)
        start = index = 0
        while 0 <= index < len(seq):
            c = seq[index]
            #print index, c
            if (sep and seq[index:].startswith(sep)) or (sep is None and c.isspace()):
                yield seq[start:index]
                #pass multiple separators as single one
                if sep is None:
                    index = len(seq) - len(seq[index:].lstrip())
                    #if index < len(seq):
                    #    print(repr(seq[index]),index)
                else:
                    while (sep and seq[index:].startswith(sep)):
                        index = index + lsep
                start = index

            elif c in quote:
                index += 1
                p, index = index, seq.find(c,index) + 1
                if not index:
                    raise IndexError('Unmatched quote %r\n%i:%s' % (c, p, seq[:p]))
            elif c in lpair:
                nesting = 1
                while True:
                    index += 1
                    p, index = index, seq.find(pairs[c], index)
                    if index < 0:
                        raise IndexError('Did not find end of pair for %r: %r\n%i:%s' % (c, pairs[c], p, seq[:p]))
                    # every opener seen before this closer adds a level, the closer removes one
                    nesting += seq[p:index].count(c) - 1
                    if not nesting:
                        break

            else:
                index += 1
        if seq[start:]:
            yield seq[start:]


for test in ("Hello, (Tony 'pyTony' Jarkko Veijalainen)  \t\n {'This \'is\' quoted' Great to \"split\" this} split \"Also Quoted part\" [test test] end ",
                "Hello, (Tony Jarkko Veijalainen)         \"This (\"is\") quoted\" Great to \"split\"   {this split} \"Also Quoted part\" [test test] end ",
                "Hello, (Tony Jarkko Veijalainen This (\"is\")' quoted Great, to split this split \"Also Quoted part\"       [test split test] end ",
                "Hello, [Tony Jarkko Veijalainen) \"This (\"is\") quoted\" Great, to split this split \"Also Quoted part\" test split test end ",
                '[(An (even better)) [Lisp Interpreter [in Python]]] '):
    print('-' * len(test))
    print(test)
    try:
        print('\n'.join(splitq(test)))
    except IndexError as e:
        print(e)
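The nesting idea can also be expressed with a stack of expected closing characters. The following is a minimal sketch under stated assumptions, not the function above: `split_top_level` is a hypothetical name, it splits on whitespace only, and it is stricter than `splitq` because different kinds of brackets must nest properly inside each other. It also shows the `itertools.islice` usage mentioned in the docstring.

```python
from itertools import islice


def split_top_level(text, pairs=("()", "[]", "{}"), quotes="\"'"):
    """Yield whitespace-separated parts, keeping bracketed or quoted parts whole."""
    closer_for = dict(pairs)  # maps '(' to ')', '[' to ']', '{' to '}'
    stack = []                # closers we are still waiting for
    quote = None              # the quote character currently open, if any
    start = None              # index where the current part began
    for i, ch in enumerate(text):
        if quote:
            # Inside a quote only the matching quote character matters.
            if ch == quote:
                quote = None
        elif stack:
            # Inside brackets: track deeper openers and the expected closer.
            if ch == stack[-1]:
                stack.pop()
            elif ch in closer_for:
                stack.append(closer_for[ch])
        elif ch.isspace():
            # Top-level whitespace ends the current part; runs count as one break.
            if start is not None:
                yield text[start:i]
                start = None
            continue
        elif ch in quotes:
            quote = ch
        elif ch in closer_for:
            stack.append(closer_for[ch])
        if start is None:
            start = i
    if quote or stack:
        raise ValueError("Unclosed %r" % (quote or stack[-1]))
    if start is not None:
        yield text[start:]


example = 'say (a (b c)) "d e" [f g] end'
parts = list(split_top_level(example))
first_two = list(islice(split_top_level(example), 2))
print(parts)
print(first_two)
```

Because the function is a generator, `islice` stops it after the requested number of parts, so an unclosed bracket later in the text is only reported if the iteration actually reaches it.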
krystosan 0 Junior Poster

That's a good one.

gamesbook 0 Newbie Poster

Error:

>>> seq=" dfr  < {x<y}"
>>> splitq(seq, sep='<')
<generator object splitq at 0x7f1d5c53daa0>
>>> list(splitq(seq, sep='<'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 43, in splitq
TypeError: unsupported operand type(s) for +=: 'int' and 'str'