Hey,

I'm writing an XML parser in C++. Currently it works, but too much of what needs to be done is left up to the end user. I'm trying to figure out a way to have a clean, more encapsulated interface for the parser, but I can't seem to think of one that I like.

This is the declaration of the parser:

class XMLParser
{
public:
	XMLParser();
	~XMLParser();

	int OpenFile(std::string filename);
	void CloseFile();

	int ReadTag(std::string* name,TagType* tag_type,bool* attributes);
	int ReadAtribute(std::string* name,std::string* value,TagType* tag_type,bool* moreAttributes);
	int ReadData(std::string* data);
private:
	std::string m_filename;
	std::ifstream m_fin;
};

Definition of TagType:

enum TagType
{
	Unknown,
	Open,
	Close,
	StandAlone
};

The parser works with three main methods. The first is ReadTag(). It returns 1 on success, fills name with the name of the tag and tag_type with the type of the tag (if not known, it's set to 0, i.e. Unknown), and fills attributes with whether or not there are any attributes to be read (if attributes is true, tag_type is always Unknown).

ReadAttribute() (spelled ReadAtribute() in the declaration above) should only be called directly after a call to ReadTag() that sets attributes to true, or after a call to ReadAttribute() that says there are more attributes. It gives you the name of the attribute, its value, the type of tag (again, if known), and whether or not there are more attributes (again, if true, tag_type is always Unknown).

ReadData() should be called only after the attribute flag (of either ReadTag() or ReadAttribute()) is false. It fills the string with data from the file until it encounters the beginning of another tag, at which point you should call ReadTag() and start the cycle over. Here is an example of how these methods may be used:

#include <cstdlib>//for system()
#include <iostream>
#include <string>
#include "ReznebXML.h"
using namespace std;

int main()
{
	cout << "XML test" << "\n\n";

	XMLParser* parse;
	parse = new XMLParser;
	parse->OpenFile("test.xml");

	string str1;
	string str2;
	bool attributes = false;//initialized so a failed read can't leave garbage
	TagType type = Unknown;

	string root;
	if(!parse->ReadTag(&str1,&type,&attributes))//parse root element...
		return 0;
	root = str1;//...and store it
	cout << "Root: " << root << "\n";
	while(attributes)//while there are still attributes to be read...
	{
		if(!parse->ReadAtribute(&str1,&str2,&type,&attributes))
			return 0;

		cout << "\t" << str1 << ": " << str2 << "\n";//...output them
	}
	if(type != Open)
		return 0;

	if(!parse->ReadData(&str1))
		return 0;

	cout << "Data: " << str1 << "\n";

	str1 = "";
	//Read elements
	while(true)
	{
		if(!parse->ReadTag(&str1,&type,&attributes))
			return 0;
		
		if(str1 == root)
		{
			if(!attributes && type == Close)
				break;
			else
				return 0;
		}
		else
		{
			cout << "Sub element: " << str1 << "\n";
			while(attributes)
			{
				if(!parse->ReadAtribute(&str1,&str2,&type,&attributes))
					return 0;

				cout << "\t" << str1 << ": " << str2 << "\n";
			}
			if(!parse->ReadData(&str1))
				return 0;

			cout << "Data : " << str1 << "\n";
		}
	}

	delete parse;//release the parser allocated with new above

	system("PAUSE");
	return 0;
}

As you can see, there's a lot of user-dependency (the user being the programmer who uses the parser). Can anyone help me think of a better interface, probably one that wraps around these three methods?
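
To make the question concrete, here is a rough sketch of the kind of wrapper I mean. It only folds the ReadTag()/ReadAtribute() loop into one call. The names Tag, ReadFullTag and ScriptedParser are made up for this sketch, and ScriptedParser is just an in-memory stand-in with the same call protocol so the snippet compiles without ReznebXML.h:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum TagType { Unknown, Open, Close, StandAlone }; // repeated from above

// One tag with all of its attributes already collected (hypothetical type).
struct Tag
{
	std::string name;
	TagType type;
	std::map<std::string, std::string> attributes;
};

// Hypothetical helper: calls ReadTag() once, then ReadAtribute() until the
// "more attributes" flag goes false. Works with any type that exposes
// those two methods with the signatures from the declaration above.
template <class Parser>
bool ReadFullTag(Parser& parser, Tag* out)
{
	bool more = false;
	out->type = Unknown;
	out->attributes.clear();
	if(!parser.ReadTag(&out->name, &out->type, &more))
		return false;
	while(more)
	{
		std::string key, value;
		if(!parser.ReadAtribute(&key, &value, &out->type, &more))
			return false;
		out->attributes[key] = value;
	}
	return true;
}

// Stand-in for the real parser (assumption, for demonstration only):
// serves one opening tag named "color" plus the scripted attributes.
struct ScriptedParser
{
	std::vector<std::string> attrs; // alternating name, value
	std::size_t next;
	ScriptedParser() : next(0) {}

	int ReadTag(std::string* name, TagType* type, bool* attributes)
	{
		*name = "color";
		*attributes = !attrs.empty();
		*type = *attributes ? Unknown : Open;
		return 1;
	}
	int ReadAtribute(std::string* name, std::string* value, TagType* type, bool* more)
	{
		if(next + 1 >= attrs.size())
			return 0;
		*name = attrs[next];
		*value = attrs[next + 1];
		next += 2;
		*more = next < attrs.size();
		*type = *more ? Unknown : Open;
		return 1;
	}
};
```

The caller would still have to decide when to call ReadData(), since that depends on the tag type, so this only removes part of the bookkeeping.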

Thanks.

You can have a look at my definition for an XML parser. I've stored the contents of each tag/dir in an XmlDir class, which contains the tag name, a list of XmlDirs and a map of attributes, so that together they form a kind of directory tree. The user interfaces are the public member functions in either class.
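
A minimal sketch of that kind of tree node, assuming C++11. This is not the actual class: the member names and the addChild/findChild helpers are invented here just to illustrate the shape (tag name, child list, attribute map):

```cpp
#include <list>
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for the XmlDir idea described above.
struct XmlDir
{
	std::string name;                               // tag name
	std::map<std::string, std::string> attributes;  // attribute name -> value
	std::list<std::unique_ptr<XmlDir>> children;    // nested tags, owned by this node

	explicit XmlDir(const std::string& n) : name(n) {}

	// Appends a child with the given tag name and returns a pointer to it.
	XmlDir* addChild(const std::string& childName)
	{
		children.push_back(std::unique_ptr<XmlDir>(new XmlDir(childName)));
		return children.back().get();
	}

	// Returns the first direct child with the given tag name, or nullptr.
	XmlDir* findChild(const std::string& childName) const
	{
		for(const auto& child : children)
			if(child->name == childName)
				return child.get();
		return nullptr;
	}
};
```

With something like this, the parser fills the tree once and the user only ever walks XmlDir nodes instead of calling the low-level read methods in the right order.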

An example of usage:

XmlParser doc;

doc.loadFile("c:/colourdefs.xml");

list<XmlDir*> saturatedRedGreens;

// supply a list to get multiple dirs
doc.getDir("/palette/colors/color<@enabled=\"true\">/red<255>/../green<255>/../name", &saturatedRedGreens);

// saturatedRedGreens now contains the name of all enabled (as per attribute field) colours having red = 255, green = 255
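
To give an idea of how such a path lookup can work, here is a heavily simplified sketch. It is not the getDir() implementation: Node and findAll are hypothetical names, and it only follows plain '/'-separated tag names, without the <...> filters or the ".." steps used in the example above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Bare-bones tree node for the sketch (children are non-owning for brevity).
struct Node
{
	std::string name;
	std::vector<Node*> children;
};

// Follows the '/'-separated tag names in path, starting at root, and
// returns every node that matches the full path. The first segment has
// to match the root node itself.
std::vector<const Node*> findAll(const Node& root, const std::string& path)
{
	std::vector<const Node*> current;
	std::istringstream segments(path);
	std::string segment;
	bool first = true;
	while(std::getline(segments, segment, '/'))
	{
		if(segment.empty())
			continue; // skips the empty piece before a leading '/'

		std::vector<const Node*> next;
		if(first)
		{
			if(root.name == segment)
				next.push_back(&root);
			first = false;
		}
		else
		{
			// keep every child of the current matches whose name fits
			for(const Node* node : current)
				for(const Node* child : node->children)
					if(child->name == segment)
						next.push_back(child);
		}
		current.swap(next);
	}
	return current;
}
```

Each path segment narrows the set of matching nodes by one level, which is why a list is needed to receive the result when several elements share a name.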

Look at TinyXML or something, it's nicely done.

Now that makes me want to code an XML parser... perhaps with XPath support...

Um, sorry, what exactly is a dir? A directory?

Dir... yes, it's a directory. I'm not well versed in XML-speak, so I likened each tag to a directory in a file system. They're actually called 'elements', which is what I'll call them in future...