The Fastest way to read a .txt File

Jennifer84
#1
Sep 23rd, 2008
I am reading large comma-delimited .txt files (about 50 MB).
Currently I am using the method below to step through the lines in the file.
Another application reads the exact same .txt file.
That application reaches the end of the file in 5 seconds, while my method below takes 50 seconds. (I don't know what method that application uses.)

So I wonder whether there is a more efficient way to read the file than the one I use.
I have heard and read that opening the file in binary mode is more efficient, but I don't know how to do this or what data I will get from ifstream.
The lines in the .txt file look like this:

Monday,1,2
Tuesday,2,3
Wednesday,3,4


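std::string Text1;
double Number1 = 0;
double Number2 = 0;
char Comma;

std::ifstream LargeFile("C:\\LargeFile.txt");

// Each pass handles one line: the text up to the first comma, then two numbers.
while( std::getline(LargeFile, Text1, ',') )
{
    LargeFile >> Number1;   // first number
    LargeFile >> Comma;     // the comma between the two numbers
    LargeFile >> Number2;   // second number
    LargeFile.get();        // consume the newline at the end of the line
}
MessageBox::Show("File has Reached End");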
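On the binary-mode question, here is a minimal sketch, assuming the whole file fits in memory: open the ifstream with std::ios::binary, pull the file into a std::string with a single read, and parse from memory afterwards. The helper name read_whole_file is hypothetical, and whether this beats the getline loop on a given compiler would have to be measured.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the raw bytes of the file, or an empty
// string if the file cannot be opened. Binary mode means no newline
// translation, so on Windows each line still ends in "\r\n".
std::string read_whole_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();

    // Find the file size by seeking to the end, then rewind.
    in.seekg(0, std::ios::end);
    std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size <= 0)
        return std::string();

    // One large read instead of many small formatted extractions.
    std::string data(static_cast<std::string::size_type>(size), '\0');
    in.read(&data[0], size);
    data.resize(static_cast<std::string::size_type>(in.gcount()));
    return data;
}
```

The returned string can then be split on '\n' and ',' without touching the stream again.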
Last edited by Jennifer84; Sep 23rd, 2008 at 6:19 pm.

Jennifer84
#2
Sep 23rd, 2008
The complete example should look like this, rather than the one in my previous post.
I both read from and write to a .txt file.

std::string Text1;
double Number1 = 0;
double Number2 = 0;
char Comma;

std::ofstream OutPut;
OutPut.open("C:\\OutPut.txt");

std::ifstream LargeFile("C:\\LargeFile.txt");

// Each pass reads one line and writes it straight back out.
while( std::getline(LargeFile, Text1, ',') )
{
    LargeFile >> Number1;   // first number
    LargeFile >> Comma;     // the comma between the two numbers
    LargeFile >> Number2;   // second number
    LargeFile.get();        // consume the newline at the end of the line

    OutPut << Text1 << ',' << Number1 << ',' << Number2 << '\n';
}

MessageBox::Show("File has Reached End");
Last edited by Jennifer84; Sep 23rd, 2008 at 7:51 pm.

Ancient Dragon
#3
Sep 23rd, 2008
Try a C-style implementation. I don't know whether it's faster or slower, so you will have to test it with your huge file.
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    using namespace std;

    int main()
    {
        char text[80];
        int n1, n2;
        FILE* fp = fopen("..\\TextFile1.txt", "r");
        if (fp)
        {
            while (fgets(text, sizeof(text), fp))
            {
                // strtok ends the first field in place, so text now holds only the day name.
                char* p = strtok(text, ",");
                p = strtok(NULL, ",");
                n1 = p ? atoi(p) : 0;     // guard against a missing field
                p = strtok(NULL, ",");
                n2 = p ? atoi(p) : 0;
                cout << text << " " << n1 << " " << n2 << "\n";
            }
            fclose(fp);                   // close only a stream that was actually opened
        }
        return 0;
    }
Don't PM me with questions -- you might get a nasty PM in response. If you have a question then post it in one of the forums.

Ancient Dragon
#4
Sep 23rd, 2008
Since you are not doing anything with those integers except output them to another file, there is no reason to convert them from char* to int.
    #include <cstdio>
    #include <cstring>
    #include <iostream>
    using namespace std;

    int main()
    {
        char text[80];
        FILE* fp = fopen("..\\TextFile1.txt", "r");
        if (fp)
        {
            while (fgets(text, sizeof(text), fp))
            {
                // Strip the trailing newline, if there is one.
                size_t len = strlen(text);
                if (len > 0 && text[len - 1] == '\n')
                    text[len - 1] = 0;
                char* p1 = strtok(text, ",");
                char* p2 = strtok(NULL, ",");
                char* p3 = strtok(NULL, ",");
                if (p1 && p2 && p3)       // skip lines with fewer than three fields
                    cout << p1 << "," << p2 << "," << p3 << "\n";
            }
            fclose(fp);                   // close only a stream that was actually opened
        }
        return 0;
    }

stilllearning
#5
Sep 23rd, 2008
What about using fread() to read a large buffer and then write it out?
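The idea can be sketched as a plain copy loop: fread a large block, fwrite exactly as many bytes as were read, and repeat until fread returns 0. The helper name copy_file_chunked and the 64 KB block size are assumptions chosen for illustration, not tuned values.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: copies src to dst in 64 KB blocks.
// Returns the number of bytes copied, or -1 if a file cannot be
// opened or a write fails.
long copy_file_chunked(const char* src, const char* dst)
{
    FILE* in = std::fopen(src, "rb");
    if (!in)
        return -1;
    FILE* out = std::fopen(dst, "wb");
    if (!out)
    {
        std::fclose(in);
        return -1;
    }

    std::vector<char> buffer(64 * 1024);
    long total = 0;
    bool ok = true;
    size_t n;
    // With an item size of 1, fread returns the number of bytes read,
    // so the final short block is copied as well.
    while ((n = std::fread(&buffer[0], 1, buffer.size(), in)) > 0)
    {
        if (std::fwrite(&buffer[0], 1, n, out) != n)
        {
            ok = false;
            break;
        }
        total += static_cast<long>(n);
    }

    std::fclose(in);
    if (std::fclose(out) != 0)   // a failed close can mean lost output
        ok = false;
    return ok ? total : -1;
}
```

This only helps when the data is copied unchanged; as soon as fields have to be parsed, the block still has to be split into lines in memory.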

Ancient Dragon
#6
Sep 23rd, 2008
Yes, that can be done in a loop. If you are on MS-Windows, just call the Win32 API function CopyFile().

ArkM
#7
Sep 24th, 2008
The only addition to Ancient Dragon's C stream library method: add a setvbuf call after fopen:
    const size_t BSZ = 1024*32; // or more
    ...
    FILE* fp = fopen("..\\TextFile1.txt", "r");
    if (fp)
    {
        setvbuf(fp,0,_IOFBF,BSZ); // No need to free buffers explicitly
        ...
The default stream buffer size is too small for huge files, so you will get much faster file reading. As usual in VC++, C streams and data conversions are faster than their C++ counterparts.
It's possible to accelerate C++ streams with proper streambuf declarations, but that's another story, and VC++'s slow getline absorbs the effect...
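A self-contained version of this suggestion might look like the sketch below. It takes the buffer size as a parameter so that different sizes can be compared; the helper name count_lines and the 256-byte line buffer are assumptions made for the example.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: counts newline-terminated lines using fgets on a
// stream whose buffer is set to buffer_size bytes.
// Returns -1 if the file cannot be opened or the buffer cannot be set.
long count_lines(const char* path, size_t buffer_size)
{
    FILE* fp = std::fopen(path, "r");
    if (!fp)
        return -1;

    // setvbuf has to be called before any other operation on the stream.
    // A null buffer pointer lets the library allocate and free the buffer.
    if (std::setvbuf(fp, NULL, _IOFBF, buffer_size) != 0)
    {
        std::fclose(fp);
        return -1;
    }

    char line[256];
    long lines = 0;
    while (std::fgets(line, sizeof line, fp))
    {
        // A piece without a trailing '\n' is part of an over-long line
        // (or an unterminated last line) and is not counted.
        size_t len = std::strlen(line);
        if (len > 0 && line[len - 1] == '\n')
            ++lines;
    }
    std::fclose(fp);
    return lines;
}
```

Calling it as count_lines("..\\TextFile1.txt", 1024 * 32) matches the BSZ value suggested above.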

kux
#8
Sep 24th, 2008
Originally posted by ArkM (#7): add a setvbuf call after fopen; the default stream buffer size is too small for huge files.
Hmm, I don't think calling setvbuf has any effect on reading a file. If you loop over fgets calls, the file is simply read line by line and the data is placed directly in the buffer you provide as a parameter (I think).
As far as I know, setvbuf sets the output buffer used when writing a file, and all file output operations are block-buffered by default with an optimally sized buffer...
Your speed problem is that you read the file line by line. For optimum speed you should fread chunks of 512 or 1024 bytes (the optimum size would be your HDD's cluster or sector size, or whatever) and do the processing in memory.

kux
#9
Sep 24th, 2008
OK, I was actually curious to see the effect of setvbuf on fgets.

    #include <stdio.h>
    #include <assert.h>

    int main(int argc, char* argv[])
    {
        const char* fname = "d:\\test162MB.txt";
        FILE *fp = fopen( fname, "rb" );
        assert( fp != NULL );            // check the stream before using it

        int x = setvbuf(fp, (char *)NULL, _IOFBF, 512);
        assert( x == 0 );

        char mysmallbuf[20];
        while ( fgets( mysmallbuf, 20, fp ) )
        {
        }

        /*
        char mybigbuff[1024];
        while ( fread( mybigbuff, 1024, 1, fp ) )
        {
        }
        */
        fclose(fp);
        return 0;
    }

Running the code above with fgets, the 162 MB file is read in 8 seconds.
Running it with fread, the file is read in 2 seconds. So my conclusion is that no intermediate 512-byte buffer is used to read large chunks of the file. I thought that the first fgets call would read 512 bytes into a buffer and that the following fgets calls would be served from that intermediate buffer rather than directly from the file, giving the same speed as the fread version. It seems not: setvbuf just has no effect on fgets, and fgets is line-buffered by default.
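For anyone who wants to repeat the comparison, a small timing harness along these lines may help. It is only a sketch: the function names are hypothetical, it needs a C++11 compiler for std::chrono, and the chunk size passed to read_with_fread is assumed to be greater than zero. Timings depend heavily on whether the file is already in the operating system's cache, so each variant should be run more than once.

```cpp
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

// Reads the file line by line with fgets, with the stream buffer set to
// stream_buf bytes. Returns the number of bytes seen, or -1 on open failure.
long long read_with_fgets(const char* path, size_t stream_buf)
{
    FILE* fp = std::fopen(path, "rb");
    if (!fp)
        return -1;
    std::setvbuf(fp, NULL, _IOFBF, stream_buf);
    char line[256];
    long long bytes = 0;
    while (std::fgets(line, sizeof line, fp))
        bytes += static_cast<long long>(std::strlen(line));
    std::fclose(fp);
    return bytes;
}

// Reads the file in blocks of chunk bytes with fread.
// Returns the number of bytes read, or -1 on open failure.
long long read_with_fread(const char* path, size_t chunk)
{
    FILE* fp = std::fopen(path, "rb");
    if (!fp)
        return -1;
    std::vector<char> buf(chunk);
    long long bytes = 0;
    size_t n;
    while ((n = std::fread(&buf[0], 1, buf.size(), fp)) > 0)
        bytes += static_cast<long long>(n);
    std::fclose(fp);
    return bytes;
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as time_ms([] { read_with_fgets("d:\\test162MB.txt", 32 * 1024); }) can then be compared against the same file read through read_with_fread, and against other buffer sizes.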

Jennifer84
#10
Sep 24th, 2008
The fread approach sounds interesting. I am used to VC++, so some of the calls here are new to me. First I will show exactly what the lines in the file look like:

Monday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Tuesday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Wednesday,1.1,1.2,1.3,1.4,1.5,1.6,1.7


Some questions:
In fread(), I understand that fp points to the file that will be read.
mybigbuff: I am not really sure what it stands for, but I think it is the buffer where the data is stored?
The next argument, 1024, should be how many bytes are read each time?
I have put the number 8 next because I read 8 comma-delimited values, but I don't know what this number stands for.
I tried the number 1 as in the example in the previous post, but the program showed an error message that said: "Expression: nptr != NULL"
I also don't know what "rb" stands for in fopen(fname, "rb"); I know "r" stands for reading.
The second argument should be the mode.
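On the fread arguments: the call is fread(buffer, item_size, item_count, stream). It tries to read item_count items of item_size bytes each into buffer and returns the number of complete items read, not the number of bytes. So 1024 and 8 together ask for 8 × 1024 bytes, which does not fit in a 1024-byte buffer. The "b" in "rb" opens the file in binary mode, meaning the bytes arrive untranslated (on Windows, "\r\n" line endings are not converted). The sketch below makes the return value visible; the helper name items_read and the fixed 4096-byte buffer are assumptions for the example.

```cpp
#include <cstdio>

// Hypothetical helper: calls fread once with the given item size and
// item count, and returns what fread returned: the number of COMPLETE
// items read. Requests that would not fit in the buffer are refused.
size_t items_read(const char* path, size_t item_size, size_t item_count)
{
    char buffer[4096];
    if (item_size * item_count > sizeof buffer)
        return 0;                      // would overflow the buffer

    FILE* fp = std::fopen(path, "rb"); // "rb": read, binary mode
    if (!fp)
        return 0;

    size_t got = std::fread(buffer, item_size, item_count, fp);
    std::fclose(fp);
    return got;
}
```

For a 23-byte file, items_read(path, 1, 4096) returns 23, while items_read(path, 1024, 1) returns 0 because no complete 1024-byte item exists. This is why a loop written as while (fread(buf, 1024, 1, fp)) silently skips the last partial block, and why an item size of 1 is usually the easier choice.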


However, if I use the code below to read this huge file (130 MB), the MessageBox shows after less than 0.5 seconds, which is very fast.
I try to use ofstream OutPut to write some values to a file, but nothing is written.
I wonder if I am doing this correctly. I find this really interesting, as I will read thousands and thousands of these files.


ofstream OutPut;
OutPut.open("C:\\out.txt");

double n1, n2, n3, n4, n5, n6, n7;
	
const char* fname = "C:\\test130mb.txt";
FILE* fp = fopen(fname, "rb");


    char mybigbuff[1024];
    while( fread(mybigbuff, 1024, 8, fp) )
    {
        char* p = strtok(text,",");
        p = strtok(NULL, ",");
        n1 = atol(p);
        p = strtok(NULL, ",");
        n2 = atol(p);
        p = strtok(NULL, ",");
        n3 = atol(p);
        p = strtok(NULL, ",");
        n4 = atol(p);
        p = strtok(NULL, ",");
        n5 = atol(p);
        p = strtok(NULL, ",");
        n6 = atol(p);
        p = strtok(NULL, ",");
        n7 = atol(p);

OutPut << text << " " << n1 << "\n";  //This does not give any OutPut
    }

  fclose(fp);
  MessageBox::Show("File has Reached End");
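Going only by the snippet as posted (this is a reading of the code, not a tested diagnosis), a few things stand out: the data is read into mybigbuff but strtok is called on text, which is not declared anywhere in the snippet; fread is asked for 8 × 1024 bytes while the buffer holds 1024; the buffer is never null-terminated before strtok runs; and atol drops the fractional part of values such as 1.1. A minimal sketch of a chunked reader that avoids these problems follows. The Row type and the function names are hypothetical, and a real program would probably process each row as it is parsed rather than store all of them.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical record type: the day name plus the numeric fields.
struct Row
{
    std::string day;
    std::vector<double> values;
};

// Parses one line such as "Monday,1.1,1.2,1.3".
Row parse_line(const std::string& line)
{
    Row row;
    std::string::size_type comma = line.find(',');
    row.day = line.substr(0, comma);
    while (comma != std::string::npos)
    {
        // strtod rather than atol: the fields are not integers.
        // It stops at the next ',' or at a trailing '\r' by itself.
        row.values.push_back(std::strtod(line.c_str() + comma + 1, 0));
        comma = line.find(',', comma + 1);
    }
    return row;
}

// Reads the file in 64 KB blocks and parses every non-empty line.
std::vector<Row> read_rows(const char* path)
{
    std::vector<Row> rows;
    FILE* fp = std::fopen(path, "rb");
    if (!fp)
        return rows;

    char chunk[64 * 1024];
    std::string pending;   // holds a line that straddles two blocks
    size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, fp)) > 0)
    {
        pending.append(chunk, n);
        std::string::size_type begin = 0, nl;
        while ((nl = pending.find('\n', begin)) != std::string::npos)
        {
            std::string line = pending.substr(begin, nl - begin);
            if (!line.empty() && line != "\r")
                rows.push_back(parse_line(line));
            begin = nl + 1;
        }
        pending.erase(0, begin);   // keep only the unfinished tail
    }
    if (!pending.empty())          // last line without a newline
        rows.push_back(parse_line(pending));

    std::fclose(fp);
    return rows;
}
```

With this layout, the fread item size is 1, so the return value is a byte count and the final partial block is handled like any other.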
Last edited by Jennifer84; Sep 24th, 2008 at 12:25 pm.