I've been looking for a way to parse a simple XML-like language for use as a type of data storage. I've looked at formats like JSON and XML, but I don't want to use them because they are quite slow. I just need a simple way to parse this:

[stuff]
  [key1]data[/key1]
  [key2]data[/key2]
[/stuff]

And map it into a dictionary, like this:

{"stuff":{"key1":"data", "key2":"data"}}

I've written a generator that writes out a dictionary according to these syntax rules:

def generate_di(self, item):
    assert isinstance(item, dict)
    for key in item:
        if isinstance(item[key], dict):
            self.puts(self.strtag % key)
            # descend into a nested branch
            self.rlevel += 1
            self.generate_di(item[key])
            # finished generating that branch
            self.rlevel -= 1
            self.puts(self.endtag % key)
        else:
            self.puts(self.strtag % key + str(item[key]) + self.endtag % key)

It takes a dictionary, say {"deltas":{"key":"data", "key2":"data"}}, and writes this to the stream:

[deltas]
  [key]data[/key]
  [key2]data[/key2]
[/deltas]

As you can see, this is very XML-like but leaves out attributes and the complex tree handling. Any help would be appreciated!
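For reference, the generator can be sketched as a standalone function that does not depend on the self.puts, self.strtag, self.endtag and self.rlevel attributes, which are not shown above. The name generate_lines, the two-space indent step and the [key] / [/key] tag templates are assumptions inferred from the sample output, not the original class:

```python
def generate_lines(item, indent=0, step=2):
    """Return the lines of the bracket format for a (possibly nested) dict.

    Hypothetical standalone variant of generate_di: it returns a list of
    lines instead of writing to a stream.
    """
    if not isinstance(item, dict):
        raise TypeError("expected a dict")
    pad = " " * indent
    lines = []
    for key, value in item.items():
        if isinstance(value, dict):
            # nested dict: opening tag, indented children, closing tag
            lines.append("%s[%s]" % (pad, key))
            lines.extend(generate_lines(value, indent + step, step))
            lines.append("%s[/%s]" % (pad, key))
        else:
            # plain value: everything on a single line
            lines.append("%s[%s]%s[/%s]" % (pad, key, value, key))
    return lines


text = "\n".join(generate_lines({"deltas": {"key": "data", "key2": "data"}}))
```

On Python 3.7 or later, where dicts keep insertion order, text should match the [deltas] block shown above.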

No, it wouldn't, because sometimes the configuration options are deeply nested within one another, e.g.:

[options]
  [user]
    [name]John Doe[/name]
    [age]Age[/age]
  [/user]
  [packages]
    [pkg]
      [version]1.2[/version]
      [summary]Something[/summary]
      [author]John Doe[/author]
    [/pkg]
  [/packages]
[/options]

Replace [user] with "user":{ and [/user] with },. In the same way, replace [name] with "name":" and [/name] with ",. Do this for all the tags, then call eval() on the result.

The problem is that the '"name":{content}' part would be invalid. I need a way to check whether it's a tree of tags or just a tag on its own. The content being parsed would also be dynamic.
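One way to tell the two cases apart, as a rough sketch: treat a tag as a leaf when its opening and closing tags enclose text with no further brackets, and rewrite those first; whatever tags remain must be containers. The function name to_dict and the regular expressions are illustrative assumptions. ast.literal_eval is swapped in for eval() so that only literals are evaluated. The sketch assumes tag names consist of word characters and that data never contains [ or ].

```python
import ast
import re

# [key]data[/key] where data contains no brackets: a leaf
LEAF = re.compile(r"\[(\w+)\]([^\[\]]*)\[/\1\]")
# any remaining closing or opening tag belongs to a container
CLOSE = re.compile(r"\[/\w+\]")
OPEN = re.compile(r"\[(\w+)\]")


def to_dict(text):
    # 1. leaves become  'key': 'data',  (%r quotes and escapes the strings)
    text = LEAF.sub(lambda m: "%r: %r," % (m.group(1), m.group(2)), text)
    # 2. remaining closing tags end a nested dict
    text = CLOSE.sub("},", text)
    # 3. remaining opening tags start a nested dict
    text = OPEN.sub(lambda m: "%r: {" % m.group(1), text)
    # trailing commas are legal in Python dict literals
    return ast.literal_eval("{" + text + "}")


result = to_dict("[stuff]\n  [key1]data[/key1]\n  [key2]data[/key2]\n[/stuff]")
```

One known limitation of this sketch: a container with nothing but whitespace between its tags is indistinguishable from a leaf and comes back as a string.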

Here is a small parser. It could be improved with stronger input validation.

#!/usr/bin/env python
#-*-coding: utf8-*-

START, END, DATA, EMPTY = range(4)

class ParseError(Exception):
    pass

class Parser(object):
    def __init__(self):
        pass

    def error(self, lineno):
        raise ParseError("Invalid Syntax at line %s" %  str(lineno))

    def parse(self, lines):
        L = [dict()]
        for i, line in enumerate(lines, 1):
            # "kind" avoids shadowing the built-in type()
            kind, key, data = self.classify(line, i)
            if kind == START:
                L[-1][key] = D = {}
                L.append(D)
            elif kind == END:
                del L[-1]
            elif kind == DATA:
                L[-1][key] = data
        return L[0]

    def classify(self, line, lineno):
        line = line.strip()
        if not line:
            return (EMPTY, '', '')
        if not(len(line) >= 3 and line[0] == '[' and line[-1] == ']'):
            self.error(lineno)
        if line[1] == '/':
            return (END, line[2:-1], '')
        else:
            i = line.find(']')
            if i == len(line) - 1:
                return (START, line[1:-1], '')
            else:
                # the closing tag "[/key]" is i + 2 characters long
                return (DATA, line[1:i], line[i+1:-(i+2)])

if __name__ == '__main__':
    example_data = """
[options]
    [user]
        [name]John Doe[/name]
        [age]Age[/age]
    [/user]
    [packages]
        [pkg]
            [version]1.2[/version]
            [summary]Something[/summary]
            [author]John Doe[/author]
        [/pkg]
    [/packages]
[/options]
    """
    try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    from pprint import pprint
    p = Parser()
    f = StringIO(example_data)
    result = p.parse(f)
    pprint(result)

"""my output -->
 {'options': {'packages': {'pkg': {'author': 'John Doe',
                                  'summary': 'Something',
                                  'version': '1.2'}},
             'user': {'age': 'Age', 'name': 'John Doe'}}}
"""
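As an illustration of the stronger input validation mentioned above, here is a minimal sketch that also checks that every closing tag matches the tag it closes and that nothing is left open at the end. The names parse_strict and StrictParseError and the regular expression are assumptions made for this example, not part of the parser above:

```python
import re

# [key], [/key] or [key]data[/key] on one line
LINE = re.compile(r"^\[(/?)(\w+)\](?:(.*)\[/(\w+)\])?$")


class StrictParseError(Exception):
    pass


def parse_strict(lines):
    root = {}
    stack = [(None, root)]  # (tag name, dict being filled)
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        m = LINE.match(line)
        if m is None:
            raise StrictParseError("invalid syntax at line %d" % lineno)
        closing, key, data, end_key = m.groups()
        if data is not None:
            # leaf: the opening and closing names must agree
            if closing or end_key != key:
                raise StrictParseError("mismatched tag at line %d" % lineno)
            stack[-1][1][key] = data
        elif closing:
            # closing tag: must match the innermost open tag
            if len(stack) == 1 or stack[-1][0] != key:
                raise StrictParseError("unexpected [/%s] at line %d" % (key, lineno))
            stack.pop()
        else:
            # opening tag: start a nested dict
            child = {}
            stack[-1][1][key] = child
            stack.append((key, child))
    if len(stack) != 1:
        raise StrictParseError("unclosed tag [%s]" % stack[-1][0])
    return root
```

It accepts any iterable of lines, so it can be called the same way as Parser.parse, for example parse_strict(StringIO(example_data)).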

Thanks dude! Though the code is barely understandable. I do not understand why you'd have to create the START, END, DATA, EMPTY variables.
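On the START, END, DATA, EMPTY question, a possible reading: the line START, END, DATA, EMPTY = range(4) just binds the four names to the integers 0 to 3, so they act as readable labels for the kind of line that classify() found. The sketch below uses a deliberately simplified classify function (an assumption for illustration, without the error checks of the parser above) to show the labels in use:

```python
# Four named labels; the numeric values themselves carry no meaning.
START, END, DATA, EMPTY = range(4)  # START == 0, END == 1, DATA == 2, EMPTY == 3


def classify(line):
    """Simplified stand-in for Parser.classify: returns only the label."""
    line = line.strip()
    if not line:
        return EMPTY            # blank line
    if line.startswith("[/"):
        return END              # [/key]
    if line.endswith("]") and line.count("]") == 1:
        return START            # [key]
    return DATA                 # [key]data[/key]


kinds = [classify(l) for l in ["[user]", "  [name]John Doe[/name]", "[/user]", ""]]
```

Comparing against kind == START reads more clearly than comparing against a bare 0, which is presumably why the names were introduced.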
