C++ - I read a whole file (which is a list of words separated by 2 white s

Question

Despairy 0 Light Poster

11 Years Ago

I need to read a file with around 120k words, so I am trying to do it fast. I have seen the:

int x = setvbuf(fp, (char *)NULL, _IOFBF, BSZ);
assert( x == 0 && fp != NULL );

option, but it takes more than a second (1 MB file), so now I tried this method:

fopen_s (&pFile,DICT,"rb");
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);

// allocate memory to contain the whole file:
buffer = (char*) malloc (sizeof(char)*lSize);

// copy the file into the buffer:
result = fread (buffer,1,lSize,pFile);

How do I continue from here? The buffer holds a list of words, and I want to get them one by one as fast as possible because I'm building a multimap with those words.

It needs to be something like this:

while(token!=NULL){
        DB.insert(pair<unsigned int,string>((unsigned)strlen(token),token));
        //cout<< buffer;
        strtok_s(buffer,delims,&context);
    }

Just getting the words and inserting them into a database.

Thank you!

EDIT:
THIS WORKS :

    char* context   = NULL;
    char  delims[]  = " ,\t\n";
    char* token     = NULL;


    FILE * pFile;
    long lSize;
    char * buffer;
    size_t result;

    fopen_s (&pFile,DICT,"rb");
    if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

    // obtain file size:
    fseek (pFile , 0 , SEEK_END);
    lSize = ftell (pFile);
    rewind (pFile);

    // allocate memory to contain the whole file, plus one byte for the terminator:
    buffer = (char*) malloc (sizeof(char)*lSize + 1);
    if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

    // copy the file into the buffer:
    result = fread (buffer,1,lSize,pFile);
    if (result != (size_t)lSize) {fputs ("Reading error",stderr); exit (3);}
    buffer[lSize] = '\0';   // strtok needs a null-terminated string


    token = strtok (buffer," \n");
    while (token != NULL)
    {
        DB.insert(pair<unsigned int,string>((unsigned)strlen(token),token));
        token = strtok (NULL, " \n");
    }

But it takes too long... I need something that runs in under 0.5 seconds, approximately... the file size is only 1 MB.

c++

Edited 11 Years Ago by Despairy

All 2 Replies

ravenous 266 Posting Pro in Training

11 Years Ago

Reading in the file can be really fast (as it will be in the second example that you have). However, tokenizing the file will be slower. Have you tried a more C++ approach of using istream_iterator to read in and tokenize your file? Something like:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Open the file for reading
    std::ifstream in( "test.txt" );
    if ( ! in.is_open() )
    {
        std::cerr << "failed to open file" << std::endl;
        return 1;
    }

    // Make a vector to store things in
    std::vector< std::string > vs;

    // Read all the words from the file (operator>> splits on any whitespace)
    std::copy( std::istream_iterator<std::string>( in ), std::istream_iterator< std::string >(), std::back_inserter( vs ) );

    // Print the words out (just for fun, delete this if you have a lot of words!)
    std::copy( vs.begin(), vs.end(), std::ostream_iterator< std::string >( std::cout, "\n" ) );
}
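If the words still need to end up in a multimap keyed by length, as in the question, the same stream-based tokenizing could feed the container directly. The following is only a minimal sketch under that assumption; the helper name loadWordsByLength is made up for illustration and does not come from the thread, and it has not been timed against the strtok version.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not from the thread): reads whitespace-separated
// words from the file at `path` and stores each one under its length.
// Returns an empty multimap if the file cannot be opened.
std::multimap<std::size_t, std::string> loadWordsByLength(const std::string& path)
{
    std::multimap<std::size_t, std::string> db;

    std::ifstream in(path.c_str());
    if (!in.is_open())
    {
        return db;
    }

    std::string word;

    // operator>> skips any run of whitespace (spaces, tabs, newlines),
    // so words separated by two spaces are read the same as by one.
    while (in >> word)
    {
        db.insert(std::make_pair(word.length(), word));
    }
    return db;
}
```

Something like db.equal_range(5) would then give all the five-letter words, at the cost of one tree node allocation per word.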

It looks like you're using a std::multimap to look up the words by length as well. If you're not going to be inserting and deleting a lot of words all the time, then this might not be the fastest way to do it. You could consider using a std::vector instead, sorted when you've finished reading them in. You could make a functor struct to sort by length and then find the words using std::lower_bound. Something like:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Orders strings by length only (shorter words first)
struct LengthComparer
{
    bool operator()( const std::string& s1, const std::string& s2 ) const
    {
        return s1.length() < s2.length();
    }
};

int main()
{
    // Open the file for reading
    std::ifstream in( "test.txt" );
    if ( ! in.is_open() )
    {
        std::cerr << "failed to open file" << std::endl;
        return 1;
    }

    // Make a vector to store things in
    std::vector< std::string > vs;

    // Read all the words from the file
    std::copy( std::istream_iterator<std::string>( in ), std::istream_iterator< std::string >(), std::back_inserter( vs ) );

    // Sort the vector of words by length
    std::sort( vs.begin(), vs.end(), LengthComparer() );

    // Print the words out (just for fun, delete this if you have a lot of words!)
    std::copy( vs.begin(), vs.end(), std::ostream_iterator< std::string >( std::cout, "\n" ) );
}
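The lookup step itself isn't shown above, so here is a rough sketch of how it might go once the vector is sorted by length. The helper name wordsOfLength and the WordIt typedef are invented for this example, and it assumes a C++11 compiler for the lambdas. It pairs std::lower_bound with std::upper_bound to get the whole run of words of one length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string>::const_iterator WordIt;

// Hypothetical helper: returns the range [first, last) of words that are
// exactly `len` characters long.
// Precondition: `words` is already sorted by length (shortest first).
std::pair<WordIt, WordIt> wordsOfLength(const std::vector<std::string>& words,
                                        std::size_t len)
{
    // First word whose length is not less than len
    WordIt first = std::lower_bound(words.begin(), words.end(), len,
        [](const std::string& w, std::size_t n) { return w.length() < n; });

    // First word whose length is greater than len
    WordIt last = std::upper_bound(first, words.end(), len,
        [](std::size_t n, const std::string& w) { return n < w.length(); });

    return std::make_pair(first, last);
}
```

Calling wordsOfLength(vs, 5) on the sorted vector would give iterators over all the five-letter words; if there are none, both iterators are equal. Each lookup is two binary searches, and the words sit in contiguous storage rather than in separate tree nodes.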

Hope that's some help.

Edited 11 Years Ago by ravenous because: Added missing parentheses

Despairy commented: Good answer , with good explanation +0

Despairy 0 Light Poster · Answer 1 · 2012-05-14T14:40:46+00:00

This... is nice, I didn't think of using those iterators... so simple, and quite fast :)
Thanks a lot!
