text files and binary files
I am trying to open a text file which contains a dictionary of English words. Each word and its definition are on the same line, and the entries are delimited by a newline. My question: if you open a text file using fopen() in "rt" mode, do the newlines show up as \r\n or just \n? In binary mode, does the newline get read as \r\n or just \n? Massive confusion!
anumash
Junior Poster in Training
51 posts since Jan 2011
Reputation Points: 10
Solved Threads: 0
Skill Endorsements: 0
From your question, I assume you are using a Windows system? Do you know if the files are in MS, or in Unix/Linux format?
rubberman
Posting Maven
2,572 posts since Mar 2010
Reputation Points: 365
Solved Threads: 305
Skill Endorsements: 52
I am using a Windows system; the file is a .txt file.
anumash
In text mode the platform's native newline sequence will be converted to '\n'; this is true for any platform. In binary mode you're on your own: no translation will occur, so on Windows you need to look for and handle newlines in the form of CRLF.
But it's problematic because you can get a text file formatted using POSIX newlines (just LF rather than CRLF). So if you just look for CRLF or rely on text mode translation the lines might not be split correctly. Fun, huh? ;)
deceptikon
Challenge Accepted
3,445 posts since Jan 2012
Reputation Points: 822
Solved Threads: 473
Skill Endorsements: 57
Ok. On Windows, a text newline ('\n') IS stored in the file as a carriage-return+linefeed ('\r\n') combination. You would only need to deal with that two-character representation yourself if you were reading the Windows file on a Unix/Linux system. In your program on Windows, it is still seen as '\n'. I.e., don't sweat it unless you are reading a file from one system type on another and have not passed the file through a filter to convert newlines accordingly, which a tool like ftp will normally do for you if the transfer is specified as text-type. There are also other tools which will convert newlines for you; this is a very common problem.
So, if you execute the function fprintf(outfile, "Hello World.\n"); on Windows, with outfile opened in text mode, the file will contain a '\r\n' terminator on the line. On Linux/Unix, it would contain only a linefeed ('\n'). Reading back, the same code should work appropriately on either system, which makes programming applications that are intended to work on both types of systems much easier. Again, problems only occur when you are processing data written on one system type on the other.
And welcome to cross-platform programming and all the little warts you will encounter in that endeavor! :-)
rubberman
What should I do if I want to detect a newline?
anumash
In text mode, look for '\n'. In binary mode, look for '\r' followed immediately by '\n'.
deceptikon
I did that and I keep getting stuck in an infinite loop. I'll post my code in a minute...
anumash
/* Program used to determine the number of characters, and in turn the number of bytes, in an
   alphabet entry, i.e. the number of bytes in 'A', 'B', etc.
   This program also searches for the longest entry in the database, i.e. the maximum
   number of bytes for a given word and its definition, which are found on the same line.
   The program gets stuck in an infinite loop and I don't know why.
*/
#include<stdio.h>
#include<stdlib.h>
int main(){
    FILE *fp;
    fp=fopen("database.txt","rt");
    if(fp==NULL)
    {
        printf("Error opening file!");
        exit(1);
    } // File open and error checking
    char ch;
    ch= fgetc(fp);
    char alphabet='A';
    unsigned long countal[26]; // to store the number of bytes for a particular entry (dictionary is sorted)
    int size1=0;
    short i=0;
    int size=0;
    while(alphabet<='Z') // A through Z, looping through the entire file until EOF.
    {
        unsigned long chars=0;
        if(ch==alphabet) /* if found then increment the number of bytes and check the size
                            of a given entry */
        {
            while(ch!='\n') // infinite loop??
            {
                chars++;
                size++;
                ch=fgetc(fp);
            }
        }
        else
        {
            while(ch!='\n')
            {
                size++;
                ch=fgetc(fp);
            }
        }
        if(size>size1)
            size1=size;
        size=0;
        ch=fgetc(fp);
        countal[i]=chars;
        i++;
        if(ch==EOF)
        {
            alphabet++;
            rewind(fp);
        }
    }
    printf("Largest directory entry: %d\n",size1);
    char abcd='A';
    for(i=0;i<26;i++)
    {
        printf("%c= ",abcd);
        printf("%u bytes\n",countal[i]);
        abcd++;
    }
    fclose(fp);
    return 0;
}
anumash
When you have a string of words (here a word and then its definition) on the same line, you want to use fgets() and put the entire line into a char array (I use "buffer") all at once.
The newline will be included on the end of the buffer (space permitting), so now using strlen(buffer) you can get the full size. Easy smeazy.
while((fgets(buffer, sizeof(buffer), filePointer))!= NULL) {
//your other code in here
}
Remember to make buffer longer than any possible line of text, and you're good to go. A word, plus a definition, may be a line longer than 200 chars - so think 500 for starters.
Adak
Posting Virtuoso
1,640 posts since Jun 2008
Reputation Points: 456
Solved Threads: 196
Skill Endorsements: 7
Hey thanks for your valuable suggestion! :):)
anumash
Does the function fgets() increment the file pointer internally to point to the next line??
anumash
Does the function fgets() increment the file pointer internally to point to the next line??
Yes, all of the standard I/O functions that read or write adjust the file position accordingly.
deceptikon