Sep 23rd, 2008
0

The Fastest way to read a .txt File

I am reading large comma-delimited .txt files (about 50 MB).
Currently I use the method below to step through the lines in the file.
Another application reads the exact same .txt file and reaches the end of it in 5 seconds, while my method below takes 50 seconds. (I don't know what method that application uses.)

So I wonder whether there is a more efficient way to read the file than mine.
I have heard and read that opening the file in binary mode is more efficient, but I don't know how to do this or what data I will get from the ifstream.
The lines in the .txt file look like this:

Monday,1,2
Tuesday,2,3
Wednesday,3,4


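std::string Text1;
double Number1 = 0;
double Number2 = 0;
char Comma;

ifstream LargeFile("C:\\LargeFile.txt");

// Each line looks like: Text1,Number1,Number2
while( getline(LargeFile, Text1, ',') )   // read the text field up to the first comma
{
    LargeFile >> Number1;   // first number
    LargeFile >> Comma;     // skip the comma between the numbers
    LargeFile >> Number2;   // second number
    LargeFile.get();        // consume the newline
}
MessageBox::Show("File has Reached End");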
Last edited by Jennifer84; Sep 23rd, 2008 at 6:19 pm.
Reputation Points: 10
Solved Threads: 1
Posting Pro
Jennifer84 is offline Offline
563 posts
since Feb 2008
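On the binary-mode question: one possible approach, sketched here under assumptions and not taken from any reply in this thread, is to open the ifstream with std::ios::binary, read the whole file into memory with a single read() call, and parse the text from memory afterwards. The helper names read_whole_file and count_lines are made up for illustration, and count_lines only stands in for real parsing. In binary mode no newline translation is done, so on Windows the data will typically still contain "\r\n" line endings. Whether this closes the gap to the 5-second application would have to be measured.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read an entire file into memory with one bulk read.
// Returns an empty string if the file cannot be opened or is empty.
std::string read_whole_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return std::string();

    // Find the file size by seeking to the end, then rewind.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size <= 0)
        return std::string();

    std::string data(static_cast<std::size_t>(size), '\0');
    in.read(&data[0], static_cast<std::streamsize>(size));
    data.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was actually read
    return data;
}

// Hypothetical stand-in for real parsing: count the lines in the in-memory text.
std::size_t count_lines(const std::string& text)
{
    std::size_t lines = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == '\n')
            ++lines;
    if (!text.empty() && text[text.size() - 1] != '\n')
        ++lines;  // last line has no trailing newline
    return lines;
}
```

Calling read_whole_file("C:\\LargeFile.txt") would return the file's bytes as one std::string, so a 50 MB file needs roughly 50 MB of memory.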
Sep 23rd, 2008
0

Re: The Fastest way to read a .txt File

The complete example should look like this instead of the one in the previous post.
I both read from and write to a .txt file.

std::string Text1;
double Number1 = 0;
double Number2 = 0;
char Comma;

ofstream OutPut;
OutPut.open("C:\\OutPut.txt");

ifstream LargeFile("C:\\LargeFile.txt");

while( getline(LargeFile, Text1, ',') )
{
    LargeFile >> Number1;
    LargeFile >> Comma;
    LargeFile >> Number2;
    LargeFile.get();

    // write the line back out in the same comma-delimited form
    OutPut << Text1 << ',' << Number1 << ',' << Number2 << '\n';
}

MessageBox::Show("File has Reached End");
Last edited by Jennifer84; Sep 23rd, 2008 at 7:51 pm.
Reputation Points: 10
Solved Threads: 1
Posting Pro
Jennifer84 is offline Offline
563 posts
since Feb 2008
Sep 23rd, 2008
0

Re: The Fastest way to read a .txt File

Try a C-style implementation. I don't know whether it's faster or slower, so you will have to test it with your huge file.
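#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;

int main ()
{
    char text[80];
    int n1, n2;
    FILE* fp = fopen("..\\TextFile1.txt", "r");
    if(fp)
    {
        while( fgets(text, sizeof(text), fp) )
        {
            // strtok replaces each comma with '\0', so text ends up holding only the first field
            strtok(text, ",");
            char* p = strtok(NULL, ",");
            if(!p) continue;            // skip lines without a second field
            n1 = atoi(p);
            p = strtok(NULL, ",");
            if(!p) continue;            // skip lines without a third field
            n2 = atoi(p);
            cout << text << " " << n1 << " " << n2 << "\n";
        }
        fclose(fp);                     // close only if the file was opened
    }
    return 0;
}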
Sponsor
Team Colleague
Featured Poster
Reputation Points: 5608
Solved Threads: 2282
Retired and Enjoying Life
Ancient Dragon is offline Offline
21,953 posts
since Aug 2005
Sep 23rd, 2008
-1

Re: The Fastest way to read a .txt File

Since you are not doing anything with those integers except writing them to another file, there is no reason to convert them from char* to int.
#include <cstdio>
#include <cstring>
#include <iostream>
using namespace std;

int main ()
{
    char text[80];
    FILE* fp = fopen("..\\TextFile1.txt", "r");
    if(fp)
    {
        while( fgets(text, sizeof(text), fp) )
        {
            // remove the trailing newline left by fgets
            size_t len = strlen(text);
            if(len > 0 && text[len-1] == '\n')
                text[len-1] = 0;
            char* p1 = strtok(text,",");
            char* p2 = strtok(NULL, ",");
            char* p3 = strtok(NULL, ",");
            if(p1 && p2 && p3)          // skip lines with fewer than three fields
                cout << p1 << "," << p2 << "," << p3 << "\n";
        }
        fclose(fp);                     // close only if the file was opened
    }
    return 0;
}
Sponsor
Team Colleague
Featured Poster
Reputation Points: 5608
Solved Threads: 2282
Retired and Enjoying Life
Ancient Dragon is offline Offline
21,953 posts
since Aug 2005
Sep 23rd, 2008
1

Re: The Fastest way to read a .txt File

What about using fread() to read a large buffer and then writing it out?
Reputation Points: 161
Solved Threads: 43
Posting Whiz
stilllearning is offline Offline
309 posts
since Oct 2007
Sep 23rd, 2008
0

Re: The Fastest way to read a .txt File

Yes, that can be done in a loop. If you are on MS-Windows, just call the Win32 API function CopyFile().
Sponsor
Team Colleague
Featured Poster
Reputation Points: 5608
Solved Threads: 2282
Retired and Enjoying Life
Ancient Dragon is offline Offline
21,953 posts
since Aug 2005
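The fread() loop mentioned above could look roughly like this. It is a sketch only: the function name copy_file_chunked and the 64 KB chunk size are arbitrary choices for illustration, not something recommended in the thread. Passing 1 as the element size makes fread return the number of bytes actually read, so a short final chunk is handled naturally.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: copy src to dst in fixed-size chunks.
// Returns the number of bytes copied, or -1 on any error.
long copy_file_chunked(const char* src, const char* dst)
{
    std::FILE* in = std::fopen(src, "rb");
    if (!in)
        return -1;
    std::FILE* out = std::fopen(dst, "wb");
    if (!out) {
        std::fclose(in);
        return -1;
    }

    std::vector<char> buffer(64 * 1024);  // chunk size is an arbitrary assumption
    long total = 0;
    bool ok = true;
    std::size_t got;

    // Element size 1, count buffer.size(): the return value is a byte count.
    while ((got = std::fread(&buffer[0], 1, buffer.size(), in)) > 0) {
        if (std::fwrite(&buffer[0], 1, got, out) != got) {
            ok = false;  // short write
            break;
        }
        total += static_cast<long>(got);
    }
    if (std::ferror(in))
        ok = false;      // the loop ended because of a read error, not end of file

    std::fclose(in);
    if (std::fclose(out) != 0)
        ok = false;      // flushing the last buffered data failed
    return ok ? total : -1;
}
```

Since long is 32 bits wide on Windows, the returned count is only meaningful for files below 2 GB, which covers the 50 MB files discussed here.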
Sep 24th, 2008
0

Re: The Fastest way to read a .txt File

The only addition to Ancient Dragon's C stream library method: add a setvbuf call after fopen:
const size_t BSZ = 1024*32; // or more
...
FILE* fp = fopen("..\\TextFile1.txt", "r");
if (fp)
{
    setvbuf(fp,0,_IOFBF,BSZ); // No need to free buffers explicitly
    ...
The default stream buffer size is too small for huge files. You will get much faster file reading. As usual, in VC++ C streams and data conversions are faster than C++ ones.
It's possible to accelerate C++ streams with proper streambuf declarations, but that's another story, and VC++'s slow getline absorbs the effect...
Reputation Points: 1234
Solved Threads: 347
Postaholic
ArkM is offline Offline
2,001 posts
since Jul 2008
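As an illustration of the streambuf remark above, the sketch below hands a larger buffer to an ifstream through pubsetbuf before the file is opened. This is an assumption about what is meant, not ArkM's code. The effect of pubsetbuf on a file stream is implementation-defined, so the call may be ignored entirely and any speed-up has to be measured on the compiler in question. The function name count_lines_buffered and the 4096-byte minimum are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: count lines using an ifstream that was offered a larger buffer.
// Returns 0 if the file cannot be opened.
std::size_t count_lines_buffered(const std::string& path, std::size_t buffer_size)
{
    if (buffer_size < 4096)
        buffer_size = 4096;  // arbitrary lower bound, also avoids a zero-length buffer

    // Declared before the stream so that it outlives the stream's use of it.
    std::vector<char> buffer(buffer_size);

    std::ifstream in;
    // Offer the buffer before opening; the library is allowed to ignore it.
    in.rdbuf()->pubsetbuf(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    in.open(path.c_str());
    if (!in)
        return 0;

    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line))
        ++lines;
    return lines;
}
```

A call such as count_lines_buffered("..\\TextFile1.txt", 1024 * 32) would mirror the buffer size from the setvbuf example.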
Sep 24th, 2008
0

Re: The Fastest way to read a .txt File

Hmm, I don't think calling setvbuf has any effect on reading a file. If you loop over fgets calls, the file is simply read line by line and the data is placed directly in the buffer you provide as a parameter (I think).
As far as I know, setvbuf sets the output buffer used when writing a file, and all file output operations are block-buffered by default with a buffer of the optimum size...
Your speed problem is that you read the file line by line. For optimum speed you should fread chunks of 512 or 1024 bytes (the optimum size would be your hard disk's cluster or sector size, or whatever) and do the processing in memory.
kux
Reputation Points: 66
Solved Threads: 11
Junior Poster
kux is offline Offline
119 posts
since Jan 2008
Sep 24th, 2008
0

Re: The Fastest way to read a .txt File

OK, I was actually curious to see the effect of setvbuf on fgets.

#include <stdio.h>
#include <iostream>
#include <assert.h>
using namespace std;

int main(int argc, char* argv[])
{
    const char* fname = "d:\\test162MB.txt";
    FILE *fp = fopen( fname, "rb" );
    assert( fp != NULL );   // check the stream before passing it to setvbuf

    int x = setvbuf(fp, (char *)NULL, _IOFBF, 512);
    assert( x == 0 );

    char mysmallbuf[20];
    while ( fgets( mysmallbuf, 20, fp ) )
    {
    }

    /*
    char mybigbuff[1024];
    while ( fread( mybigbuff, 1024, 1, fp ) )
    {
    }
    */
    fclose(fp);
    return 0;
}

Running the code above with fgets, the 162 MB file is read in 8 seconds; running it with fread, it is read in 2 seconds. So my conclusion is that no intermediate 512-byte buffer is used for reading large chunks of the file. I thought the first fgets call would read 512 bytes and store them in a buffer, and that the following fgets calls would read from that intermediate buffer rather than directly from the file, giving the same speed as the fread version. It seems not: setvbuf just has no effect on fgets; fgets is line-buffered by default.
kux
Reputation Points: 66
Solved Threads: 11
Junior Poster
kux is offline Offline
119 posts
since Jan 2008
Sep 24th, 2008
0

Re: The Fastest way to read a .txt File

The fread approach sounds interesting. I am used to VC++, so some of the calls here are new to me. First I will show exactly what the lines in the file look like:

Monday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Tuesday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Wednesday,1.1,1.2,1.3,1.4,1.5,1.6,1.7


Some questions:
In fread(), I understand that fp points to the file that will be read.
mybigbuff: I am not really sure what it stands for, but I think it should be a buffer where the data is stored?
The next argument, 1024, should be how many bytes are read each time?
I have put the number 8 next because I read 8 comma-delimited values, but I don't know what this number stands for.
I tried the number 1, as in the example in the previous post, but the program showed an error message that said: "Expression: nptr != NULL"
Also, I don't know what "rb" stands for in fopen(fname, "rb"); I know "r" stands for reading.
The second argument should be the mode.


However, if I use the code below to read this huge file (130 MB), the MessageBox shows after less than 0.5 seconds, which is very fast.
I try to use ofstream OutPut to write some values to a file, but nothing is written.
I wonder whether I am doing this correctly. I find this really interesting, as I will read thousands and thousands of these files.


ofstream OutPut;
OutPut.open("C:\\out.txt");

double n1, n2, n3, n4, n5, n6, n7;

const char* fname = "C:\\test130mb.txt";
FILE* fp = fopen(fname, "rb");

char mybigbuff[1024];
while( fread(mybigbuff, 1024, 8, fp) )
{
    char* p = strtok(text,",");
    p = strtok(NULL, ",");
    n1 = atol(p);
    p = strtok(NULL, ",");
    n2 = atol(p);
    p = strtok(NULL, ",");
    n3 = atol(p);
    p = strtok(NULL, ",");
    n4 = atol(p);
    p = strtok(NULL, ",");
    n5 = atol(p);
    p = strtok(NULL, ",");
    n6 = atol(p);
    p = strtok(NULL, ",");
    n7 = atol(p);

    OutPut << text << " " << n1 << "\n";  //This does not give any OutPut
}

fclose(fp);
MessageBox::Show("File has Reached End");
Last edited by Jennifer84; Sep 24th, 2008 at 12:25 pm.
Reputation Points: 10
Solved Threads: 1
Posting Pro
Jennifer84 is offline Offline
563 posts
since Feb 2008
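On the fread() questions above: the standard signature is fread(buffer, size, count, stream). It reads up to count elements of size bytes each into buffer and returns the number of complete elements read. So fread(mybigbuff, 1024, 8, fp) asks for 8 * 1024 = 8192 bytes, which is more than the 1024-byte array can hold. The "b" in "rb" opens the file in binary mode, meaning no newline translation. Note also that atol() stops at the decimal point, so "1.1" becomes 1. The sketch below is one possible way to combine chunked reading with in-memory parsing and is not code from the thread: the names Record, parse_line, add_line and read_records and the 64 KB chunk size are made up for illustration. A chunk can end in the middle of a line, so the unfinished part is carried over to the next chunk.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical record type: a label followed by its numeric fields.
struct Record {
    std::string label;
    std::vector<double> values;
};

// Parse one line such as "Monday,1.1,1.2" into a Record.
Record parse_line(const std::string& line)
{
    Record rec;
    std::size_t comma = line.find(',');
    rec.label = line.substr(0, comma);  // whole line if there is no comma
    while (comma != std::string::npos) {
        const char* start = line.c_str() + comma + 1;
        char* end = 0;
        double v = std::strtod(start, &end);  // keeps the fraction, unlike atol
        if (end != start)
            rec.values.push_back(v);          // only store fields that parsed as numbers
        comma = line.find(',', comma + 1);
    }
    return rec;
}

// Strip a trailing '\r' (binary mode keeps it) and store non-empty lines.
void add_line(std::string line, std::vector<Record>& out)
{
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
    if (!line.empty())
        out.push_back(parse_line(line));
}

// Read the file in 64 KB chunks and parse every complete line in memory.
std::vector<Record> read_records(const char* path)
{
    std::vector<Record> records;
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp)
        return records;  // empty result if the file cannot be opened

    std::vector<char> chunk(64 * 1024);
    std::string pending;  // unfinished line carried over from the previous chunk
    std::size_t got;
    while ((got = std::fread(&chunk[0], 1, chunk.size(), fp)) > 0) {
        pending.append(&chunk[0], got);
        std::size_t begin = 0;
        std::size_t nl;
        while ((nl = pending.find('\n', begin)) != std::string::npos) {
            add_line(pending.substr(begin, nl - begin), records);
            begin = nl + 1;
        }
        pending.erase(0, begin);  // keep only the partial last line
    }
    add_line(pending, records);   // final line without a trailing newline
    std::fclose(fp);
    return records;
}
```

Keeping every record of a 130 MB file in a vector costs a lot of memory; when each line only needs to be written out again, the line could be processed inside add_line instead of being stored.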
