The fastest way to read a .txt file
Join Date: Feb 2008
Posts: 517
Solved Threads: 1
I am reading large comma-delimited .txt files (about 50 MB).
Currently I am using the method below to step through the lines in the file.
I have another application that reads the exact same .txt file.
That application reaches the end of the file in 5 seconds, while my method below takes 50 seconds. (I don't know what method that application uses.)
So I wonder if there is a more efficient way to read the file than the way I do it.
I have heard and read that opening the file in binary mode is more efficient, but I don't know how to do this or what data I will get from ifstream.
The lines in the .txt file that I read look like this:
Monday,1,2
Tuesday,2,3
Wednesday,3,4
#include <fstream>
#include <string>
using namespace std;

std::string Text1;
double Number1 = 0;
double Number2 = 0;
char Comma;
ifstream LargeFile("C:\\LargeFile.txt");
while( getline(LargeFile, Text1, ',') )   // day name, up to the first comma
{
    LargeFile >> Number1;
    LargeFile >> Comma;                    // the comma between the two numbers
    LargeFile >> Number2;
    LargeFile.get();                       // the '\n' at the end of the line
}
MessageBox::Show("File has Reached End");
Last edited by Jennifer84; Sep 23rd, 2008 at 6:19 pm.
Join Date: Feb 2008
Posts: 517
Solved Threads: 1
The complete example should look like this, instead of the code in my previous post.
I both read from and write to a .txt file.
std::string Text1;
double Number1 = 0;
double Number2 = 0;
char Comma;
ofstream OutPut;
OutPut.open("C:\\OutPut.txt");
ifstream LargeFile("C:\\LargeFile.txt");
while( getline(LargeFile, Text1, ',') )
{
    LargeFile >> Number1;
    LargeFile >> Comma;
    LargeFile >> Number2;
    LargeFile.get();
    OutPut << Text1 << ',' << Number1 << ',' << Number2 << '\n';
}
MessageBox::Show("File has Reached End");
Last edited by Jennifer84; Sep 23rd, 2008 at 7:51 pm.
Try a C-style implementation. I don't know whether it's faster or slower, so you will have to test it with your huge file.
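#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;

int main ()
{
    char text[80];
    int n1, n2;
    FILE* fp = fopen("..\\TextFile1.txt", "r");
    if(fp)
    {
        while( fgets(text, sizeof(text), fp) )
        {
            char* p = strtok(text,",");      // text now holds just the first field
            p = strtok(NULL, ",");
            if(!p) continue;                 // skip lines without a second field
            n1 = atol(p);
            p = strtok(NULL, ",");
            if(!p) continue;
            n2 = atol(p);
            cout << text << " " << n1 << " " << n2 << "\n";
        }
        fclose(fp);                          // only close a file that was opened
    }
    return 0;
}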
Since you are not doing anything with those integers except writing them to another file, there is no reason to convert them from char* to int.
#include <cstdio>
#include <cstring>
#include <iostream>
using namespace std;

int main ()
{
    char text[80];
    FILE* fp = fopen("..\\TextFile1.txt", "r");
    if(fp)
    {
        while( fgets(text, sizeof(text), fp) )
        {
            size_t len = strlen(text);
            if(len > 0 && text[len-1] == '\n')   // strip the newline kept by fgets
                text[len-1] = 0;
            char* p1 = strtok(text,",");
            char* p2 = strtok(NULL, ",");
            char* p3 = strtok(NULL, ",");
            if(p1 && p2 && p3)                  // skip malformed lines
                cout << p1 << "," << p2 << "," << p3 << "\n";
        }
        fclose(fp);
    }
    return 0;
}
The only addition to Ancient Dragon's C stream library method: add a setvbuf call after fopen.
The default stream buffer size is too small for huge files; with a larger buffer you should get much faster file reading. As usual, in VC++ the C streams and data conversions are faster than the C++ ones.
It's possible to accelerate C++ streams with proper streambuf declarations, but that's another story, and VC++'s slow getline absorbs the effect...
const size_t BSZ = 1024*32; // or more
...
FILE* fp = fopen("..\\TextFile1.txt", "r");
if (fp) {
    setvbuf(fp, 0, _IOFBF, BSZ); // No need to free buffers explicitly
    ...
Join Date: Jan 2008
Posts: 119
Solved Threads: 10
As far as I know, setvbuf sets the buffer used when writing a file, and all file output operations are block-buffered by default, with a buffer of optimal size...
Your speed problem is that you read the file line by line. For optimal speed you should fread chunks of 512 or 1024 bytes (the optimal size would be your hard disk's cluster or sector size, or similar) and do the processing in memory.
Join Date: Jan 2008
Posts: 119
Solved Threads: 10
OK, I was curious to see the effect of setvbuf on fgets.
Running the following code with fgets, the 162 MB file is read in 8 seconds.
Running it with fread instead, the file is read in 2 seconds. So my conclusion is that the 512-byte intermediate buffer is not used for reading large chunks of the file. (I thought the first fgets call would read 512 bytes into a buffer and the following fgets calls would take their data from that intermediate buffer rather than directly from the file, giving the same speed as the fread version. It seems not: setvbuf just has no effect on fgets, which is line-buffered by default.)
#include <stdio.h>
#include <assert.h>

int main(int argc, char* argv[])
{
    const char* fname = "d:\\test162MB.txt";
    FILE *fp = fopen( fname, "rb" );
    if ( fp == NULL ) return 1;              // check fp before setvbuf touches it
    int x = setvbuf(fp, (char *)NULL, _IOFBF, 512);
    assert( x == 0 );
    char mysmallbuf[20];
    while ( fgets( mysmallbuf, 20, fp ) ) { }
    /*
    char mybigbuff[1024];
    while ( fread( mybigbuff, 1024, 1, fp ) ) { }
    */
    fclose(fp);
    return 0;
}
Join Date: Feb 2008
Posts: 517
Solved Threads: 1
The fread approach sounds interesting. I am used to VC++, so some of these calls are new to me. First, here is exactly what the lines in the file look like:
Monday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Tuesday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Wednesday,1.1,1.2,1.3,1.4,1.5,1.6,1.7
Some questions I have:
In fread(), I understand that fp points to the file that will be read.
mybigbuff: I am not really sure what it stands for, but I think it is a buffer where the data is stored?
The next argument, 1024, should be how many bytes are read each time?
I put the number 8 next because I read 8 comma-delimited values, but I don't know what this number stands for.
I tried putting 1 there, as in the example in the previous post, but the program stopped with the error message: "Expression: nptr != NULL"
Also, I don't know what "rb" stands for in fopen(fname, "rb"); I know "r" stands for reading.
The second argument should be the mode.
However, if I use the code below to read this huge file (130 MB), the MessageBox shows after less than 0.5 seconds, which is very fast.
I try to use ofstream OutPut to write some values to a file, but nothing is written.
I wonder if I am doing this correctly. I find this really interesting, as I will read many thousands of these files.
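#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
using namespace std;

ofstream OutPut;
OutPut.open("C:\\out.txt");
double n[7] = { 0 };                           // the seven numbers after the day name
const char* fname = "C:\\test130mb.txt";
FILE* fp = fopen(fname, "rb");                 // "rb" = read, binary mode
if (fp)
{
    char mybigbuff[1024 + 1];                  // +1 for the terminating '\0'
    size_t kept = 0;                           // unfinished line carried over from the last chunk
    size_t got;
    // fread(buffer, element size, element count, file): with element size 1 the
    // count is in bytes and must not exceed the free space left in mybigbuff
    while ((got = fread(mybigbuff + kept, 1, 1024 - kept, fp)) > 0)
    {
        size_t len = kept + got;
        mybigbuff[len] = '\0';
        char* line = mybigbuff;
        char* nl;
        while ((nl = strchr(line, '\n')) != NULL)  // one complete line per pass
        {
            *nl = '\0';
            char* day = strtok(line, ",\r");
            for (int i = 0; i < 7; ++i)
            {
                char* p = strtok(NULL, ",\r");
                n[i] = p ? atof(p) : 0.0;      // atof keeps decimals; never pass NULL to atol/atof
            }
            if (day)
                OutPut << day << " " << n[0] << "\n";
            line = nl + 1;
        }
        kept = len - (size_t)(line - mybigbuff);   // bytes of a line cut off at the chunk end
        memmove(mybigbuff, line, kept);             // move them to the front for the next fread
    }
    // Assumes every line, including the last, ends with '\n' and is under 1024 bytes
    fclose(fp);
}
MessageBox::Show("File has Reached End");
Last edited by Jennifer84; Sep 24th, 2008 at 12:25 pm.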