Hi guys and gals,

I'm writing a file parser, which reads "source code -ish" text from files. The text is usually in this kind of format:

Module: {
    Submodule: {
        Property: Value
        ...
    }

    Submodule: {

    }
}

Module: {
    Property: Value
}

I've written a simple but hazardous reader that keeps reading until the next known name is found. The problem with this approach is that if the expected module / submodule / property name is not found, the program reads past the end of the file and crashes.

So, the question is: what is the best approach for this kind of reading and parsing? I thought of using a "brace stack", which would simply keep track of opening braces, so the reader always knows when a module ends. But that isn't enough on its own, and I'm clueless about how to actually proceed and read the text safely.

Thanks in advance for all the help and tips :)

This almost looks like JSON. There are some good parsers that you can get for JSON with C#.

If you have control of the source text files, I suggest switching to the JSON format and using one of the existing parsers.

Otherwise, XML will do this perfectly and is much easier to work with in code.

If you absolutely must keep this format, then I would parse the brace blocks. Find the top "node" of [Name]:{ } and then parse what's inside the braces.
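A minimal sketch of finding where a brace block ends, assuming braces never appear inside property values. It is essentially the "brace stack" idea reduced to a depth counter. The names BraceScanner and FindMatchingBrace are made up for illustration.

```csharp
using System;

static class BraceScanner
{
    // Returns the index of the '}' that closes the '{' at openIndex,
    // or -1 if the text ends before the block is closed.
    public static int FindMatchingBrace(string text, int openIndex)
    {
        if (openIndex < 0 || openIndex >= text.Length || text[openIndex] != '{')
            throw new ArgumentException("openIndex must point at a '{'.");

        int depth = 0;
        for (int i = openIndex; i < text.Length; i++)
        {
            if (text[i] == '{') depth++;                          // entering a nested block
            else if (text[i] == '}' && --depth == 0) return i;    // back at the starting level
        }
        return -1; // unbalanced input: report it instead of reading past the end
    }
}
```

Because the loop is bounded by text.Length, a missing closing brace shows up as -1 rather than as a read past the end of the data.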

You should write your code so that each set of braces can be parsed by the same method, such as a ParseBlock(String braceBlock), and recurse each time you reach a nested block. How you do this depends on what output you want. I imagine you're doing some kind of configuration or object setup. I suggest reflection for this purpose :)
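Here is a rough sketch of the recursive idea, not a definitive implementation. It reads line by line instead of taking a string, and it assumes every block opens with "Name: {" on one line, closes with "}" on its own line, and that everything else is a "Key: Value" line (a literal "..." placeholder line would be rejected). Block and BlockParser are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical node type: a named block with properties and child blocks.
// Lists are used because the same name (e.g. "Submodule") can repeat.
class Block
{
    public string Name = "";
    public List<KeyValuePair<string, string>> Properties = new List<KeyValuePair<string, string>>();
    public List<Block> Children = new List<Block>();
}

static class BlockParser
{
    public static Block Parse(TextReader reader)
    {
        var root = new Block { Name = "(root)" };
        ParseBlock(reader, root, isRoot: true);
        return root;
    }

    // Fills 'block' until its closing brace (or end of input for the root).
    static void ParseBlock(TextReader reader, Block block, bool isRoot)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue;               // skip blank lines
            if (line == "}")
            {
                if (isRoot) throw new FormatException("Unexpected '}' at top level.");
                return;                                   // this block is finished
            }

            int colon = line.IndexOf(':');
            if (colon < 0) throw new FormatException("Expected 'Name: ...' but got: " + line);

            string name = line.Substring(0, colon).Trim();
            string rest = line.Substring(colon + 1).Trim();

            if (rest == "{")
            {
                var child = new Block { Name = name };
                ParseBlock(reader, child, isRoot: false); // recurse into the nested block
                block.Children.Add(child);
            }
            else
            {
                block.Properties.Add(new KeyValuePair<string, string>(name, rest));
            }
        }
        // End of input: fine at top level, an error inside an open block.
        if (!isRoot) throw new FormatException("Block '" + block.Name + "' is missing its closing '}'.");
    }
}
```

Once the whole file is a tree, looking for a module is just a search such as root.Children.Find(b => b.Name == "ModuleName"), which returns null when the name is missing instead of running to the end of the file.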

If you are using a StreamReader and reading a line at a time, you can use something like this:

using System.IO;

// Read the file one line at a time; ReadLine() returns null at end of file.
using (StreamReader sr = new StreamReader("File.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // your code
    }
}

Depending on how you are parsing, it may or may not help. Worth a shot :)

hericles: Thanks, but the problem isn't about loading and reading the file, but in the parsing of the data. :)

Ketsuekiame: I have to keep the format intact, because it's an output from another program. The recursive method seems to be the one I need, but it's still a bit hazy for me how to do the actual block parsing nice and clean. I'll try to figure something out, but still, further pointers are appreciated :)

Hericles: Ah, my mistake, didn't mean it that way. What I tried to say was that when searching for a certain module with name "ModuleName" that doesn't actually exist, it would try to find it until it reached the end of file, and then it would've just skipped all the other data while trying to find the one module. :)

My opinion is that you're doing it in reverse.

Your only reliable knowledge is the state of the object before parsing and what the state of the object should be after parsing.

You cannot be certain of the data in the object, only whether the data is valid and that it contains all the data required. This is what you know :)

So you need to look at the parsing from the flip side. Rather than assuming that the file contains all the data you need (not a reliable assumption, as you don't control the output), you should parse the file, find the key/value attribute pairs, and then find the object property that relates to each of them.

So, for example: you start parsing a block of data and you come across 'Name:Ketsuekiame'.
At this point, you can make two choices:
1. Reflect your attribute name (in this case Name) and attempt to set the property value.
2. Have a switch statement text look-up that will set the value of the property.

An example of number 2 would be:

switch(propertyName)
{
    case "Name":
        myObject.Name = propertyValue;
        break;
    default:
        throw new Exception("Key not known!");
}

My personal preference is to use reflection, but I don't know how accurately your datafile could be mapped to an object.
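For comparison, option 1 (reflection) could look roughly like the sketch below. Person and PropertyBinder are hypothetical names, and it assumes the target properties are simple types that Convert.ChangeType can produce from text (string, int, double, bool and so on).

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Hypothetical target type; the property names are assumptions for illustration.
class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class PropertyBinder
{
    // Tries to set target.<propertyName> from text. Returns false when no
    // writable public instance property with that name exists, so the caller
    // decides whether an unknown key is an error or can be ignored.
    public static bool TrySet(object target, string propertyName, string propertyValue)
    {
        PropertyInfo prop = target.GetType().GetProperty(
            propertyName, BindingFlags.Public | BindingFlags.Instance);
        if (prop == null || !prop.CanWrite) return false;

        // Converts the text to the property's type; throws FormatException
        // when the text does not fit (e.g. "abc" for an int property).
        object converted = Convert.ChangeType(
            propertyValue, prop.PropertyType, CultureInfo.InvariantCulture);
        prop.SetValue(target, converted, null);
        return true;
    }
}
```

Unlike the switch, this needs no new case per property, but it only works if the key names in the file match the property names exactly.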
