I have a bit of a baffling problem! I'm writing a tag matcher for XML, and whenever I run the procedure below, I get a garbage value after the name of every tag except the first one. When I add the array text to the parameter list (I had it there during initial debugging), the garbage value goes away, even though text isn't referenced anywhere in the procedure.

Another thing is that, while reading a tag, it doesn't seem to recognize spaces. For example, <img src="img.png"/> reads as imgsrc="imgpng" instead of img, but only when another tag is beside it on the same line. For example:

<head>
<img src="blah.jpg"/>
</head>

reads the tag name fine, while

<head><img src="blah.jpg"/></head>

results in strange output.

Full code: http://pastebin.com/kEEBmipy
Main problem code:

/* ReadTagName
* When a tag is detected, finds the name of the tag in the form <name ... /> or </name ...> etc.
* ReadTagName terminates when whitespace, '/' or '>' is detected after the name has started.
* Tag names which do not start with a letter return an error, and tag names which contain illegal characters also return an error.
* Outputs: The tag name to the char* tagName.
*/
void ReadTagName(char *tagName, char ch, int *tagNameCount, int *doneReadingName){
  if(*tagNameCount > MAX_TEXT_LEN){
    fprintf(stderr,"%s: %d: Tag name exceeded MAX_TEXT_LEN of %d characters.\n",g_fileName,g_lineNumber,MAX_TEXT_LEN);
  }
  if(*tagNameCount == 0){
    if(isalpha(ch)){
      tagName[*tagNameCount] = ch;
    }
    else if(ch == '/'){
      return;
    }
    else{
      fprintf(stderr,"%s: %d: Tag name must start with a letter.\n",g_fileName,g_lineNumber);
      exit(1);
    }
  }
  if(*tagNameCount > 0){
    if(isalnum(ch)){
      tagName[*tagNameCount] = ch;
    }
    else if(ch == '>' || ch == '/' || isspace(ch)){
      *doneReadingName = 1;
      tagName[*tagNameCount] = '\0';
      return;
    }
    else{
      fprintf(stderr,"%s: %d: Tag name contains illegal symbol.\n",g_fileName, g_lineNumber);
      exit(1);
    }
  }
  *tagNameCount = *tagNameCount + 1;
}

If you can point me in the right direction, that would be awesome!

Finite-state machines are your friend for parsing data like this! For example:

  1. First '<' == "start tag" event - transition to the "reading tag_name" state.
  2. Handler == "read data until '/' or '>' is found".
  3. If '/' is found, peek at the next char. If the next char is '>', skip it and go to the "end_tag" state; if not, go to the "error" state. If '>' is found, go to the "read_data" state.
    .
    .
    .
    I leave the rest to you to determine. FWIW, I have written a number of XML parsers that are used in major commercial software systems. You can also use one of the common open source parser libraries, such as Xerces. They will do the "heavy lifting" for you.
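To make the state idea a bit more concrete, here is a minimal sketch in C of a character-at-a-time scanner that only extracts tag names. It is not a drop-in replacement for ReadTagName: the names TagScanner, scanner_init, scanner_feed and collect_tag_names, the state names, and the 64-byte name buffer are all made up for illustration, and it assumes attribute values never contain a '>' character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states for this sketch (not taken from the posted code). */
typedef enum {
    ST_TEXT,      /* outside any tag */
    ST_TAG_OPEN,  /* just saw '<', name not started yet */
    ST_TAG_NAME,  /* accumulating the tag name */
    ST_TAG_REST,  /* name finished, skipping attributes until '>' */
    ST_ERROR      /* malformed input; the scanner stays here */
} State;

typedef struct {
    State  state;
    char   name[64];  /* assumed limit, stands in for MAX_TEXT_LEN */
    size_t len;
    int    closing;   /* 1 if the tag started with "</" */
} TagScanner;

void scanner_init(TagScanner *s)
{
    s->state = ST_TEXT;
    s->len = 0;
    s->closing = 0;
    s->name[0] = '\0';
}

/* Feed one character. Returns 1 exactly when a tag name was just completed. */
int scanner_feed(TagScanner *s, char c)
{
    unsigned char uc = (unsigned char)c;  /* ctype functions need this range */

    switch (s->state) {
    case ST_TEXT:
        if (c == '<') {          /* reset all per-tag state on every new tag */
            s->len = 0;
            s->closing = 0;
            s->name[0] = '\0';
            s->state = ST_TAG_OPEN;
        }
        return 0;
    case ST_TAG_OPEN:
        if (c == '/' && !s->closing) { s->closing = 1; return 0; }
        if (isalpha(uc)) {
            s->name[s->len++] = c;
            s->state = ST_TAG_NAME;
            return 0;
        }
        s->state = ST_ERROR;     /* name must start with a letter */
        return 0;
    case ST_TAG_NAME:
        if (isalnum(uc)) {
            if (s->len + 1 >= sizeof s->name) { s->state = ST_ERROR; return 0; }
            s->name[s->len++] = c;
            return 0;
        }
        if (c == '>' || c == '/' || isspace(uc)) {
            s->name[s->len] = '\0';               /* always terminate */
            s->state = (c == '>') ? ST_TEXT : ST_TAG_REST;
            return 1;
        }
        s->state = ST_ERROR;     /* illegal symbol in the name */
        return 0;
    case ST_TAG_REST:
        if (c == '>') s->state = ST_TEXT;  /* assumes no '>' inside attributes */
        return 0;
    default:
        return 0;
    }
}

/* Writes the names found in xml into out, comma-separated, with a leading
 * '/' on closing tags. Returns the number of names written. */
size_t collect_tag_names(const char *xml, char *out, size_t cap)
{
    TagScanner s;
    size_t count = 0, used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    scanner_init(&s);
    for (; *xml != '\0'; ++xml) {
        if (!scanner_feed(&s, *xml))
            continue;
        size_t need = s.len + (s.closing ? 1 : 0) + (count ? 1 : 0);
        if (used + need >= cap)
            break;                         /* output buffer is full */
        if (count) out[used++] = ',';
        if (s.closing) out[used++] = '/';
        memcpy(out + used, s.name, s.len);
        used += s.len;
        out[used] = '\0';
        ++count;
    }
    return count;
}
```

Calling collect_tag_names on <head><img src="blah.jpg"/></head> fills the buffer with head,img,/head, whether or not the tags are on separate lines. Note that the length and the terminator are reset on every '<'. State left over from the previous tag is one common source of garbage after a name, though I can't tell from the posted snippet alone whether that is what is happening in your caller.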