I started off reading a file with fscanf. I figured I could use fscanf since the file was consistent, with just two columns, but I got very strange output. Can someone please explain why? When I switched over to fgets with sscanf it worked perfectly, and I am curious why. The only changes I made were switching the fscanf to fgets and adding the sscanf line.

//while (fscanf(trans1_fp, "%c %s",  &char_storage, nam) !=EOF)
while(fgets(line1, 80, trans1_fp) != NULL)
{
    sscanf(line1, "%c %s", &char_storage, nam);
    trans1_char[counter] = char_storage;
    trans1_char_table[counter] = malloc(strlen(nam)+1);
    strcpy(trans1_char_table[counter], nam);
    printf(" trans1_char[counter] is %c \n", trans1_char[counter]);
    printf(" trans1_char_table[counter] is %s \n", trans1_char_table[counter]);
    printf(" passing through second while loop \n");
    counter++;
    memset(nam, 0, 80);
    char_storage = 0;
    //memset(char_storage, 0, 80);
    val = 0; 
}

This is the output I was getting

 trans1_char[counter] is R
 trans1_char_table[counter] is XXXX
 passing through second while loop
 trans1_char[counter] is

 trans1_char_table[counter] is R
 passing through second while loop
 trans1_char[counter] is
 trans1_char_table[counter] is DISNEY
 passing through second while loop
 trans1_char[counter] is

 trans1_char_table[counter] is
 passing through second while loop

Here is the output I expect.

 trans1_char[counter] is R
 trans1_char_table[counter] is XXXX
 passing through second while loop
 trans1_char[counter] is R
 trans1_char_table[counter] is DISNEY
 passing through second while loop

This is because fgets effectively consumes the newline for you and fscanf doesn't.

I assume your file is

R XXXX
R DISNEY

Remember that at the end of each line is a newline character, so the file can be represented as the character stream

R XXXX\nR DISNEY\n

This is an important point: a newline starting a new line is just a visual cue for the humans using the computer. To the program it is just another character value (or two, on platforms that use \r\n) in the file stream. Some functions, such as fgets, give it a little special meaning, but otherwise it is unremarkable.

With fscanf you read a character then a white space delimited string so

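First loop read R, XXXX (skip the space)
File stream contains \nR DISNEY\n
Second loop read \n (it is a character), R (it is a string)
File stream contains ' 'DISNEY\n (the space after R is still there)
Third loop read ' ' (it is a character), DISNEY
File stream contains \n
Fourth loop read \n (it is a character), then hit end of file before finding a string; fscanf returns 1 rather than EOF, so the loop body runs once more with an empty name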
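
To make the walk-through above concrete, here is a minimal sketch that replays it on an in-memory string. The helper name read_records and its parameters are made up for illustration; it uses sscanf with %n to advance through the text the way repeated fscanf calls advance through a stream. Unlike the loop in the question it stops as soon as fewer than two fields convert, so the fourth, half-empty pass does not show up. It also shows one common fix, assuming you want to stay with a scanf-style call: put a space before %c so that leading white space (including the leftover newline) is skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "<char> <word>" records out of text, the way
   repeated fscanf calls would consume a stream. With skip_ws == 0 it uses
   the format from the question; with skip_ws != 0 it puts a space before
   %c so leading white space is skipped. Returns the number of records. */
static int read_records(const char *text, int skip_ws,
                        char codes[], char names[][80], int max)
{
    assert(text != NULL && codes != NULL && names != NULL);
    int count = 0;

    while (count < max) {
        char c;
        char word[80];
        int used = 0;   /* characters consumed by this pass, set by %n */
        int got = skip_ws
            ? sscanf(text, " %c %79s%n", &c, word, &used)
            : sscanf(text, "%c %79s%n", &c, word, &used);

        if (got != 2)   /* stop on a partial record or end of input */
            break;

        codes[count] = c;
        strcpy(names[count], word);
        count++;
        text += used;   /* continue where this pass stopped */
    }
    return count;
}
```

Called on "R XXXX\nR DISNEY\n" with skip_ws set to 0, this yields three records: ('R', "XXXX"), ('\n', "R") and (' ', "DISNEY"), which is the strange output from the question. With skip_ws set to 1 it yields the two expected records.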

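When you use fgets followed by sscanf you read line by line. sscanf stops once %s has matched, so the newline at the end of the buffer is never parsed, and the next fgets call overwrites it. Because it is always just white space at the end of the current line, it cannot leak into the next record.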

First loop parse R XXXX\n to R and XXXX ignore \n
Second loop parse R DISNEY\n to R and DISNEY ignore \n
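
As a rough sketch of the per-line parse, assuming each line has already been read with fgets: the function name parse_record is invented here, and the 80-byte name buffer mirrors the size used in the question. It keeps the original "%c %s" shape but adds a field width and checks the sscanf return value, so a blank or short line is reported instead of silently reusing stale data.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses one line such as "R DISNEY\n" that fgets has
   already stored. Returns 1 if both the character and the word were found,
   0 otherwise. The trailing newline is simply never looked at. */
static int parse_record(const char *line, char *code, char name[80])
{
    assert(line != NULL && code != NULL && name != NULL);

    /* %79s leaves room for the terminating '\0' in an 80-byte buffer */
    return sscanf(line, "%c %79s", code, name) == 2;
}
```

In the loop from the question this would sit right after the fgets call, with the rest of the body skipped when it returns 0.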

Using fgets provides 2 advantages

  1. More control: you know you have exactly 1 line, so there are no extra characters interfering with your parsing of the data
  2. You limit the amount of data read to your buffer size, so the read itself cannot run past the end of the array


More control, you know you have exactly 1 line, there are no extra characters interfering with your parsing of the data

With the caveat that your second advantage interferes with the first.

You limit the amount of data read to your buffer size so there is no chance of an out of bounds array access

If the line is longer than the buffer, multiple calls to fgets are required to get all of it. Ignoring stream errors, fgets stops when it has stored one character fewer than the buffer size (the second argument), when it has stored a newline, or when it reaches end-of-file. Robust code would account for all of those possibilities:

char *line = NULL;
char buf[BUFSIZ];

while (fgets(buf, sizeof buf, in) != NULL)
{
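    // append_str is a helper that is not shown here; it is assumed to append
    // buf to line and return the index of the last character stored
    size_t end = append_str(&line, buf);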

    // Did we read a full line?
    if (line[end] == '\n' || feof(in))
    {
        break;
    }
}

if (line == NULL)
{
    // Handle no input
}
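
The snippet above leaves append_str undefined. One possible version, purely as an assumption about what it is meant to do, is sketched below: it grows the destination with realloc, appends the new chunk, and returns the index of the last character so that a line[end] == '\n' test works. The (size_t)-1 failure value is my own choice, not something stated in the post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed contract: append src to the heap string *dst (which may be NULL)
   and return the index of the last character of the result. On allocation
   failure *dst is left unchanged and (size_t)-1 is returned. */
static size_t append_str(char **dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t old_len = *dst ? strlen(*dst) : 0;
    size_t add_len = strlen(src);

    char *grown = realloc(*dst, old_len + add_len + 1);
    if (grown == NULL)
        return (size_t)-1;

    /* copy the new chunk together with its terminating '\0' */
    memcpy(grown + old_len, src, add_len + 1);
    *dst = grown;

    size_t total = old_len + add_len;
    return total ? total - 1 : 0;
}
```

The caller owns the result and should free it once the line has been processed.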

What happens in this case? It was able to deal with the \n.

    while (fscanf(init_dat_fp, "%s %d",  nam, &val) !=EOF)
    {
        //sscanf(line1, "%s %s %x", label, mneumonic ,&start_address);
        init_name_table[counter] = malloc(strlen(nam)+1);
        strcpy(init_name_table[counter], nam);
        init_value_table[counter] = val;
        strcpy(table[counter].id, nam);
        table[counter].value = val;
        printf(" init_name_table[counter] is %s \n", init_name_table[counter]);
        printf(" init_value_table[counter] is %d \n", init_value_table[counter]);
        counter++;
        memset(nam, 0, 80);
        val = 0;
        table_size = counter;  
    }

The problem stems from the %c format specifier, which reads exactly one character without skipping anything first; since newline is a character, %c can read it. Most other format specifiers, %s, %d and so on, skip leading white space, newlines included, before they convert anything.

What happens is that on the first read the newline after the number read by %d is left in the input stream, but it is then skipped on the second read while looking for the first non-white-space character for the %s.

First loop read R, XXXX
File stream contains \nR DISNEY\n
Second loop skip \n (it is white space), read R DISNEY

Although of course now you are reading a number so the input data must have changed but you get the idea.
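
Here is a small sketch of the "%s %d" case, again replayed on a string with sscanf and %n rather than on a real file. The helper name read_pairs and the sample data "ALPHA 1\nBETA 2\n" are invented for illustration, since the actual input file was not posted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "<word> <number>" records out of text the way
   repeated fscanf(fp, "%s %d", ...) calls would. Returns the record count. */
static int read_pairs(const char *text, char names[][80], int values[], int max)
{
    assert(text != NULL && names != NULL && values != NULL);
    int count = 0;

    while (count < max) {
        char word[80];
        int val = 0;
        int used = 0;   /* characters consumed by this pass, set by %n */

        /* %s skips leading white space, so the '\n' left behind by the
           previous pass is discarded before the next word is read */
        if (sscanf(text, "%79s %d%n", word, &val, &used) != 2)
            break;

        strcpy(names[count], word);
        values[count] = val;
        count++;
        text += used;
    }
    return count;
}
```

On the sample data this gives two records, ("ALPHA", 1) and ("BETA", 2), with no stray entries. Comparing the return value against 2 instead of EOF also ends the loop on a malformed line, where a conversion fails without the end of input being reached.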
