Parsing a String into Tokens Using sscanf

Dave Sinkula

strtok is often recommended for parsing a string, but I don't care for it. Why?

  • It modifies the incoming string, so it cannot be used with string literals or other constant strings.
  • The identity of the delimiting character is lost.
  • It keeps its parsing position in hidden static state between calls, so it's not reentrant.
  • It does not correctly handle "empty" fields -- that is, where two delimiters are back-to-back and meant to denote the lack of information in that field.
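
To make the first and last points concrete, here is a small sketch. count_strtok_tokens is a hypothetical helper, not part of the original snippet; it copies the text into a local buffer (128 bytes, an arbitrary size for the demo) because strtok overwrites the delimiters it finds, then counts the tokens strtok hands back.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): counts the tokens strtok
   finds in text. The copy is needed because strtok writes '\0' over
   each delimiter, so it cannot be pointed at a string literal. */
size_t count_strtok_tokens(const char *text, const char *delims)
{
   char copy[128]; /* assumed to be large enough for the demo input */
   size_t count = 0;
   char *tok;

   assert(text != NULL && delims != NULL);

   strncpy(copy, text, sizeof copy - 1);
   copy[sizeof copy - 1] = '\0';

   for ( tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims) )
   {
      printf("token = \"%s\"\n", tok);
      ++count;
   }
   return count;
}
```

Calling count_strtok_tokens("a;;b", ";") reports two tokens even though the string holds three fields; the empty middle field is silently dropped.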

This snippet shows a way to use sscanf to parse a string into fields delimited by a character (a semicolon in this case, but commas or tabs or others could be used as well).

Thanks to figo2476 for pointing out an issue with a previous version!
Thanks to dwks for asking why not to use strtok.

#include <stdio.h>

int main(void)
{
   const char line[] = "2004/12/03 12:01:59;info1;info2;info3";
   const char *ptr = line;
   char field [ 32 ];
   int n;
   while ( sscanf(ptr, "%31[^;]%n", field, &n) == 1 )
   {
      printf("field = \"%s\"\n", field);
      ptr += n; /* advance the pointer by the number of characters read */
      if ( *ptr != ';' )
      {
         break; /* didn't find an expected delimiter, done? */
      }
      ++ptr; /* skip the delimiter */
   }
   return 0;
}

/* my output
field = "2004/12/03 12:01:59"
field = "info1"
field = "info2"
field = "info3"
*/

waldchr -2 Junior Poster in Training

Dave Sinkula, how do I edit that code so that the input can be changed?
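
One possible approach, as a sketch only: move the loop into a function that takes the line and the delimiter as parameters. print_fields and its parameters are hypothetical names, not from the original post. It keeps the behavior of the original loop, including stopping at the first empty field, and builds the scanset format at run time.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): prints each field of
   line, split on delim, and returns how many fields were printed.
   Like the original loop, it stops at the first empty field. */
int print_fields(const char *line, char delim)
{
   char format[16];
   char field[32];
   const char *ptr = line;
   int n, count = 0;

   assert(line != NULL && delim != '\0');

   /* Build e.g. "%31[^;]%n" with the requested delimiter in the scanset. */
   snprintf(format, sizeof format, "%%31[^%c]%%n", delim);

   while ( sscanf(ptr, format, field, &n) == 1 )
   {
      printf("field = \"%s\"\n", field);
      ++count;
      ptr += n; /* advance the pointer by the number of characters read */
      if ( *ptr != delim )
      {
         break; /* no delimiter follows, so this was the last field */
      }
      ++ptr; /* skip the delimiter */
   }
   return count;
}
```

The line itself can then come from anywhere, for example a buffer filled by fgets(buffer, sizeof buffer, stdin) with the trailing newline removed before the call.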

sanushks 25 Light Poster

Hi,

This code does not work when there are successive delimiters with nothing between them, for example:
2007/09/15 12:34:23;;info1;info2;
The only field printed is 2007/09/15 12:34:23
The rest of the string is ignored.

Please check
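
The reason, as far as I can tell, is that a %[ conversion has to match at least one character. When ptr sits on a delimiter, sscanf stores nothing, returns 0, and never reaches %n, so the while condition fails. A small probe shows this; first_field_result is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: returns what sscanf reports for the first
   semicolon-delimited field of text, and prints what it stored. */
int first_field_result(const char *text)
{
   char field[32] = "";
   int n = -1; /* stays -1 if sscanf never reaches %n */
   int rc;

   assert(text != NULL);

   rc = sscanf(text, "%31[^;]%n", field, &n);
   printf("rc = %d, n = %d, field = \"%s\"\n", rc, n, field);
   return rc;
}
```

With "info1;info2" the result is 1, but with ";info1;info2" it is 0 and both field and n are left untouched, which is why any fix has to check the return value before using n.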

acawhiz 0 Newbie Poster

To get around successive delimiters, replace the statement ++ptr; /* skip the delimiter */ (line 17 of the snippet) with:

      while ( *ptr == ';' )
      {
         ++ptr; /* skip the delimiter */
      }

sheshas 0 Newbie Poster

Even with this, you can parse to the end of the string, but it will not indicate whether a field is empty. Here is a snippet to overcome that.

    while (*ptr != '\0') {
        int items_read;
        field[0] = '\0'; /* clear first, so an empty field prints as "" */
        items_read = sscanf(ptr, "%31[^;]%n", field, &n);
        printf("field = \"%s\"\n", field);
        if (items_read == 1)
            ptr += n; /* advance the pointer by the number of characters read */
        if ( *ptr != ';' ) {
            break; /* didn't find an expected delimiter, done? */
        }
        ++ptr; /* skip the delimiter */
    }

=======
output
=======
field = "2004/12/03 12:01:59"
field = "info1"
field = ""
field = "info2"
field = "info3"

ladookie4343 0 Newbie Poster

How would you store each field into a different character array?

DBCepull 0 Newbie Poster

Is this what you are looking for? This example handles a line from a file with tab delimiters, and it properly deals with empty fields in the line.

#include <stdio.h>
#include <string.h>

#define FIELDS 10
  /* fragment: assumes fileHandle is a FILE * already opened for reading */
  char inputLine[512];
  char *ptr;
  char fields[FIELDS][51];
  int n, f;

  for (f=0; f<FIELDS; f++) {
    strcpy(fields[f], "");
  }

  if (fgets(inputLine, sizeof inputLine, fileHandle) == NULL) {  //load a record from the file
    inputLine[0] = '\0';         //end of file or read error: parse an empty line
  }

  ptr = inputLine;
  for (f=0; f<FIELDS; f++) {
    if (sscanf(ptr, "%50[^\t\n]%n", fields[f], &n) == 1) {
      ptr += n;                  //advance only when a field was read, so n is never stale
    }
    if (*ptr != '\t') break;
    ptr += 1;                    //skip the delimiter
    while (*ptr == '\t') {
      ptr += 1; f++;             //skip any back-to-back delimiters (and the next field)
    }
  }
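
For anyone who wants the same idea as a self-contained function, here is a rough sketch. split_fields, MAX_FIELDS and FIELD_SIZE are hypothetical names and limits chosen for illustration; the function takes the line as a parameter instead of reading it from a file, and returns how many fields it stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 10  /* assumed limits, chosen for illustration */
#define FIELD_SIZE 51

/* Hypothetical helper: copies each tab-separated field of line into
   fields[i] and returns the number of fields stored. Empty fields are
   stored as "". A field longer than 50 characters is truncated and
   ends the parse, because no delimiter follows the truncated text. */
int split_fields(const char *line, char fields[][FIELD_SIZE], int max_fields)
{
   const char *ptr = line;
   int count = 0;

   assert(line != NULL && fields != NULL && max_fields > 0);

   /* Clear every slot up front so unused fields read as "". */
   memset(fields, 0, (size_t)max_fields * FIELD_SIZE);

   while ( count < max_fields )
   {
      int n = 0;
      if ( sscanf(ptr, "%50[^\t\n]%n", fields[count], &n) == 1 )
      {
         ptr += n; /* advance only when a field was actually read */
      }
      ++count;
      if ( *ptr != '\t' )
      {
         break; /* end of line (or truncated field): done */
      }
      ++ptr; /* skip the delimiter */
   }
   return count;
}
```

With the input "a\t\tb\n" this stores "a", "" and "b" and returns 3. A trailing tab counts as one more empty field, and an empty line counts as a single empty field.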

anti_neoliberal 0 Newbie Poster

For those curious about the quotation in Dave's signature, I recommend The Shock Doctrine by Naomi Klein. Friedman helped Pinochet use quite a lot of force.
