help with C code

Question

hariza 0 Light Poster

18 Years Ago

Hi Guys,

I wonder if someone can help me out with this. I'm running the same C code in diffferent platforms (windows/unix) but getting different result. For instance the code is:

fgets(line,LINE_BUF,fp); 
          while (!feof(fp))     {
        Len=strlen(line) - 1;
      if ( (newNode=malloc(sizeof(ListNode))) == NULL)
      {
         fprintf(stderr,"\nListInsert: Memory Allocation failed! Aborting.\n");
         break;
      }
      current = head;
      previous = NULL;
    /* Search to find where in insert new list node */
      while (current != NULL )
      {
         previous = current;
         current = current->next;
      }
   if ( Len > 0)  {
      strncpy(newNode->word,line,Len);
      }
      newNode->next = current;
       fgets(line,LINE_BUF,fp); 
      if (previous == NULL)
      {
         head = newNode;
      }
      else
      {
         previous->next = newNode;
      }
  }

Basically what I'm trying to do is load a linklist but for example here is the output of the program executed from Unix:
written Len: 7 Number : 3917
wrong Len: 5 Number : 3918
wrote Len: 5 Number : 3919
year Len: 4 Number : 3920
years Len: 5 Number : 3921
yellow Len: 6 Number : 3922
yes Len: 3 Number : 3923
yesterday Len: 9 Number : 3924
yet Len: 3 Number : 3925
you Len: 3 Number : 3926
young Len: 5 Number : 3927
your Len: 4 Number : 3928
yours Len: 5 Number : 3929
yourself Len: 8 Number : 3930
zero Len: 4 Number : 3931

And here is the output of the same program executed on Windows:
written Len: 7 Number : 3917
wrong☺= Len: 7 Number : 3918
wrote☺= Len: 7 Number : 3919
yearx☺= Len: 7 Number : 3920
years☺= Len: 7 Number : 3921
yellow= Len: 7 Number : 3922
yes Len: 3 Number : 3923
yesterday Len: 9 Number : 3924
yet Len: 3 Number : 3925
you Len: 3 Number : 3926
young☺= Len: 7 Number : 3927
your♦= Len: 7 Number : 3928
yours♦= Len: 7 Number : 3929
yourself Len: 8 Number : 3930
zero°☻= Len: 7 Number : 3931

Just in case the print statement is the same in both platforms and it looks like this :

printf("%s Len: %d Number : %d\n", current->word,Len1,count);

Please any help on what should I do to overcome this problem would be highly appreciated. thanks.

harby

c

7 Contributors
18 Replies
151 Views
1 Day Discussion Span
Latest Post 18 Years Ago Latest Post by Dave Sinkula

Ancient Dragon 5,243 Achieved Level 70

18 Years Ago

fgets(line,LINE_BUF,fp); 
          while (!feof(fp))     {

The behavior you see is probably because the above is not the correct way to code that loop. Here is the correct, and more efficient, way. Not how eof() function is not needed.

while( fgets(line,LINE_BUF,fp) != NULL)
{
   // blabla
}

You did not post the declaration of ListNode, but I assume member word is an array of characters and not a pointer. Your program might be more efficient if you make word a pointer and allocate the required length. Sure, strncpy() will help minimize buffer overflows, but if the source string is longer than the amount of space allocated for the destination string, the result destination string will not be NULL terminated, and that will make other standard string operations (such as strlen) unpredictable.

Dave Sinkula 2,398 long time no c

18 Years Ago

MS-Windows fgets() appends a '\n' at the end of the line, which you must remove.
while (fgets( // blabla ) )
{
   if( line[strlen(line)-1] == '\n')
      line[strlen(line)-1] = 0;
}

I wouldn't say it appends it, but rather leaves it there if in fact it is there. And with very long strings, it seems a waste to calculate the length twice. (I mentioned this to a seasoned programmer in a slightly different light once long ago, so that is why I mention it again here. That problem involved nested loops and strlen and severe performance issues, which is why I feel it is worthy of mention -- it's not a good habit to perpetuate.)

Dave Sinkula 2,398 long time no c

18 Years Ago

I agree -- in actual programs I calculate the length only once and probably should have shown that method here too.

Out of curiosity, since you were around for my DevShed thread, what is it? On Cprog some of us had a similar debate once, so I'm just wondering which one you prefer? The strlen to an index, the strlen to a pointer, the strchr, the strrchr, the strcspn, or the strtok, other?

Edited 12 Years Ago by TrustyTony because: fixed formatting

Dave Sinkula 2,398 long time no c

18 Years Ago

Since '\n' is always the last character in the buffer there is no point using those functions -- they are a waste of time and will do nothing more than slow down the program if used frequently. My opinion is that it is a misuse of those functions.
strlen() to an index or pointer are one in the same -- I prefer index method as in the example I posted

While that might seem to be the "obvious" case, it is not necessarily true.

On Cprog some of us had a similar debate once,

Case in point: a reply to my post in which strrchr was the fastest.

Assumptions about library implementations may be unreliable.

Dave Sinkula 2,398 long time no c

18 Years Ago

Now I'm intrigued by it. Could anyone post a little example of re-writting the fgets function?

There are links in User Input: Strings and Numbers [C] about 2/3 of the way down, called "Read a Line of Text from the User" and "Safe Version of gets()", for more examples to look at.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

John A 1,896 Vampirical Lurker Team Colleague · Answer 1 · 2007-01-16T08:18:01+00:00

I can't read your code. Please use code tags.

It's kind of hard to see what the loops do without proper indentation (code tags fix this problem) so I can't quite see what's exactly happening in the program. One thing:

fgets(line,LINE_BUF,fp); 
          while (!feof(fp))     {
        Len=strlen(line) - 1;
      if ( (newNode=malloc(sizeof(ListNode))) == NULL)
      {
         fprintf(stderr,"\nListInsert: Memory Allocation failed! Aborting.\n");
         break;
      }
      current = head;
      previous = NULL;
    /* Search to find where in insert new list node */
      while (current != NULL )
      {
         previous = current;
         current = current->next;
      }
   if ( Len > 0)  {
      strncpy(newNode->word,line,Len);
      }
      newNode->next = current;
        fgets(line,LINE_BUF,fp);

Why do you have 2 fgets() statements? A properly-written file reading function should only need 1.

Another thing: is your file encoding messed up between platforms? Often operating systems store newlines and other charecters differently, which can confuse functions like fgets() and cause problems.

hariza 0 Light Poster · Answer 2 · 2007-01-16T08:56:30+00:00

Hi There,

I don't the file is messed up or corrupted because I was able to open the file on windows , copy alll the lines and then paste it into a new file on unix and still getting the same result. I was wondering maybe something that need to be set like environment variable at the windows level. Don't know not sure what could be.

Thanks.

hariza 0 Light Poster · Answer 3 · 2007-01-16T09:37:30+00:00

Hi There,

Thanks a lot for your advice. In fact yes the code looks now neat and tidy after the recommendation you gave me but still getting the same funny characters on windows.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 4 · 2007-01-16T09:45:32+00:00

MS-Windows fgets() appends a '\n' at the end of the line, which you must remove.

while (fgets( // blabla ) )
{
   if( line[strlen(line)-1] == '\n')
      line[strlen(line)-1] = 0;
}

If the above doesn't fix the problem, look at the file with Notepad.exe and see if those characters are there. If they are, then the problem is in the data file and not your program.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 5 · 2007-01-16T10:25:14+00:00

I wouldn't say it appends it, but rather leaves it there if in fact it is there. And with very long strings, it seems a waste to calculate the length twice. (I mentioned this to a seasoned programmer in a slightly different light once long ago, so that is why I mention it again here. That problem involved nested loops and strlen and severe performance issues, which is why I feel it is worthy of mention -- it's not a good habit to perpetuate.)

I agree -- in actual programs I calculate the length only once and probably should have shown that method here too.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 6 · 2007-01-16T17:50:15+00:00

Since '\n' is always the last character in the buffer there is no point using those functions -- they are a waste of time and will do nothing more than slow down the program if used frequently. My opinion is that it is a misuse of those functions.

strlen() to an index or pointer are one in the same -- I prefer index method as in the example I posted

Ravalon 62 Posting Whiz in Training · Answer 7 · 2007-01-16T20:07:43+00:00

Since '\n' is always the last character in the buffer there is no point using those functions -- they are a waste of time and will do nothing more than slow down the program if used frequently. My opinion is that it is a misuse of those functions.

But does it really matter which one is faster? I think the one that makes the intention clear is the best choice, and any speed concerns are probably misplaced paranoia. Do you think that micro optimizations like this will make any difference compared to algorithm changes or that they will nullify the slowness of I/O and other external interaction? ;)

I think strchr() is the most clear because it says exactly what you're trying to do: find a '\n' and replace it with '\0'.

char *p = strchr( str, '\n' );

if ( p )
  *p = '\0';

But strlen() is pretty clear too: find the end of the string and replace the last character with '\0' if it's '\n'. But it's also longer to describe and that makes it less clear than strchr() in my book.

size_t n = strlen( str );

if ( str[n - 1] == '\n' )
  str[n - 1] = '\0';

The others are all tricky enough to warrant a comment, and that makes them unclear choices. But I probably wouldn't use any of them regularly. I would use a wrapper function or write my own version of fgets to suit my application. That way the code is really clear. :)

Salem 5,248 Posting Sage · Answer 8 · 2007-01-16T23:51:01+00:00

> Since '\n' is always the last character in the buffer there is no point using those functions
No it isn't.
If the line is longer than the buffer, then there will be no \n in there at all, so just trashing the character before the \0 is wrong.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 9 · 2007-01-17T01:36:36+00:00

>>But does it really matter which one is faster?
Yes it might, depending on how often it is used and whether it is a real-time program or not. How many programs have you used that are sluggish and show to respond -- one reason is for careless programming and programmers not paying attention to efficiency.

It would appear from your comments that you have never written a real-time program where you have to sequeeze every last nano-second you can out of the program. No reflection on you but what you stated is a common belief about program speed/efficiency versus program maintainability. Sometimes ease of maintainance issues have to be sacrificed for program efficiency.

>>I would use a wrapper function or write my own version of fgets to suit my application.
Yes, that is the best solution if there are a lot of fgets()'s scannered throughout the program.

> Since '\n' is always the last character in the buffer there is no point using those functions
No it isn't.
If the line is longer than the buffer, then there will be no \n in there at all, so just trashing the character before the \0 is wrong.

you are of course correct but I made no mention of that. If it exists it will always always be the last byte just before the null terminator.

Aia 1,977 Nearly a Posting Maven · Answer 10 · 2007-01-17T03:05:33+00:00

I have been researching for some days about this problem with the '\n'
left behind.

This have been posted:

>>I would use a wrapper function or write my own version of fgets to suit my application.

Now I'm intrigued by it. Could anyone post a little example of re-writting the fgets function?.
From Ravalon I heard for first time about wrapper but I'm not sure I follow what it means.

Until I found this forum all that I did was to trash the last character
before '\0' like Ancient Dragon initially posted. Now I'm looking for a better way that I can understand.
Thank you guys, for your guidance.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 11 · 2007-01-17T03:44:06+00:00

to write a wrapper means to write your own function that will eventually call some standard library function

Now, instead of calling fgets() call myfgets()

char* myfgets(char *buf, size_t bufsize, FILE* fp)
{
    chr* ptr = NULL;
    // validate parameters
    if( buf != NULL && bufsize > 0 && fp != NULL)
    {          
        ptr = fgets(buf,bufsize,fp);
        if(ptr != NULL)
       {
            size_t len = strlen(buf);
            if( buf[len-1] == '\n')
               buf[len-1] = 0;
       }
    }
    return ptr;
}

John A 1,896 Vampirical Lurker Team Colleague · Answer 12 · 2007-01-17T05:32:51+00:00

Not to be nitpicky, but...

chr* ptr = NULL;

Something doesn't look right. :D

Ravalon 62 Posting Whiz in Training · Answer 13 · 2007-01-17T06:07:28+00:00

>>But does it really matter which one is faster?
Yes it might, depending on how often it is used and whether it is a real-time program or not. How many programs have you used that are sluggish and show to respond -- one reason is for careless programming and programmers not paying attention to efficiency.

I think there's a difference between careless programming and optimizing without good cause. Sacrificing clarity for dubious gains in speed sounds like a poor tradeoff to me. This applies to real-time programming as well. You don't waste your time optimizing things that may not have a noticeable effect. You profile the application, marvel at how the bottleneck is not even close to where you predicted, optimize it, and repeat until the application is fast enough. ;)

It would appear from your comments that you have never written a real-time program where you have to sequeeze every last nano-second you can out of the program. No reflection on you but what you stated is a common belief about program speed/efficiency versus program maintainability. Sometimes ease of maintainance issues have to be sacrificed for program efficiency.

You're absolutely right. I don't have experience with real-time programs, but I do know that they're a different ball game altogether. If I were trying to squeeze every cycle out of a real-time program that does a lot of I/O with long strings, I wouldn't be doing it by calling fgets and removing the newline every time. I would probably use a more speed conscious string model like length prefix, and write an fgets that works with it.

The problem is that too many people go to the extreme. On the one extreme, they fool themselves into believing that their programs are efficient by micro optimizing everything and ignoring the parts that really matter. On the other extreme, they fool themselves into believing that the speed of computers makes a mind for efficiency obsolete. In both cases, they're fooling themselves. Efficiency isn't just a matter of using the code with the fewest cycles. It's a great deal harder and more subtle, and lacking experience with real-time programs doesn't mean I'm not aware of that.