Hi guys!

I'm trying to rid my words of newline characters (and eventually punctuation as well). The words are in a two-dimensional array declared as words[MAXWORD][MAXLEN], where MAXWORD is currently 8000, and MAXLEN is currently 20. So up to 8000 words of 20 characters each.

To do this, I am parsing the array character by character, and only copying the character into a new array cleanwords if it *isn't* '\n'. Then I print out the cleanwords array, and it *should* be rid of '\n'. However, it is coming out exactly the same.

I put in some counters to keep track of how many operations are occurring, k for the number of times a character *isn't* a '\n', and l for the number of times a character *is* a '\n'. To my surprise, k is 6602 and l is 38. Now 38 is the number of lines in the text file I'm using, so I figure I must be on the right track -- the error must lie somewhere within the copying process -- for some reason, the '\n' are still migrating to the clean array.

I have been looking at this for some time, and I just can't see the problem. If you can tell me what I'm doing wrong, then I'd be very grateful.

Thanks and much love,
sd

char  words[MAXWORD][MAXLEN];
char  cleanwords[MAXWORD][MAXLEN];
char  buff[BUFSIZ];
int   ntokens = 0;
int   i, j, k, l;

...

for (i = 0; i < ntokens; i++) {
    for (j = 0; j < sizeof(words[i]); j++) {
        if (words[i][j] != '\n') {
            cleanwords[i][j] = words[i][j];
            k++;
        }
        else {
            l++;
        }
    }
}

If all that you want to do is to remove the new line you could always look at the character before the nul terminator.

if ( words[i][ strlen( words[i] ) - 1] == '\n' ) {
    words[i][ strlen( words[i] ) - 1] = '\0';
}

However, since you want to check for other characters as well, let's see what you have.
I'm assuming that ntokens (declared as int ntokens = 0;) is set to something greater than 0 before the loop runs. Take the string "Hello\n" as an example:

words[0][0] = 'H';
words[0][1] = 'e';
words[0][2] = 'l';
words[0][3] = 'l';
words[0][4] = 'o';
words[0][5] = '\n';
words[0][6] = '\0';
words[0][7] = /* garbage lives here */
.
.
.
words[0][19] = /* garbage lives here */

After the loop:

cleanwords[0][0] = 'H';
cleanwords[0][1] = 'e';
cleanwords[0][2] = 'l';
cleanwords[0][3] = 'l';
cleanwords[0][4] = 'o';
cleanwords[0][5] = /* never written, garbage stays (was a '\n' in the original) */
cleanwords[0][6] = '\0';
cleanwords[0][7] = /* garbage copied here */
.
.
.
cleanwords[0][19] = /* garbage copied here */

That's what your loops are doing, courtesy of sizeof(words[i]) and if (words[i][j] != '\n').
sizeof is giving you the size of the array, not the length of the string, so the loop keeps working on the characters lying beyond the end of the string you want.
The if block skips the '\n' in the source string, but because the same index j is used for both arrays, it also skips that position in the destination and leaves whatever was already there.
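One way to fix both problems is to stop at the nul terminator instead of at sizeof, and to keep a separate write index that only advances when a character is actually kept. Here is a minimal sketch; the function name copy_without_newlines and its parameters are made up for illustration and are not from the original code, and it assumes dst is at least as large as src.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst, dropping every '\n'.
 * Assumes dst has room for at least as many bytes as src,
 * terminator included. Returns the length of the cleaned string. */
size_t copy_without_newlines(char *dst, const char *src)
{
    size_t r;        /* read index into src */
    size_t w = 0;    /* write index into dst */

    assert(dst != NULL && src != NULL);

    /* Stop at the terminator, not at the size of the array. */
    for (r = 0; src[r] != '\0'; r++) {
        if (src[r] != '\n') {
            dst[w++] = src[r];   /* w advances only when a char is kept */
        }
    }
    dst[w] = '\0';               /* terminate the cleaned string */
    return w;
}
```

In the original loop this would be called once per word, something like copy_without_newlines(cleanwords[i], words[i]). Dropping punctuation later should only need a wider test in the if, for example ispunct() from <ctype.h> with the argument cast to unsigned char.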

If all that you want to do is to remove the new line you could always look at the character before the nul terminator.

if ( words[i][ strlen( words[i] ) - 1] == '\n' ) {
    words[i][ strlen( words[i] ) - 1] = '\0';
}

Keeping the possibility of looooooong strings in mind, and the compiler taking you at your word, there is no reason to traverse the same looooooong string twice: call strlen() once and keep the result. FWIW.
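For what it's worth, a single-pass version of the snippet above might look like the sketch below. The function name strip_trailing_newline is hypothetical. It also guards against an empty string, where strlen() returns 0 and subtracting 1 would index outside the array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove one trailing '\n' from s, if present.
 * strlen() is called once and the result is reused. */
void strip_trailing_newline(char *s)
{
    size_t len;

    assert(s != NULL);

    len = strlen(s);                      /* walk the string only once */
    if (len > 0 && s[len - 1] == '\n') {  /* len > 0 guards the empty string */
        s[len - 1] = '\0';
    }
}
```

With the arrays from the question, the call would be strip_trailing_newline(words[i]) for each word.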
