I have to read a file with around 120k words, so I am trying to do it fast. I have seen the:

int x = setvbuf(fp, (char *)NULL, _IOFBF, BSZ);
assert( x == 0 && fp != NULL );

option, but it takes more than a second (1 MB file), so now I tried this method:

fopen_s (&pFile,DICT,"rb");
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);

// allocate memory to contain the whole file, plus a terminating '\0':
buffer = (char*) malloc (sizeof(char)*lSize + 1);

// copy the file into the buffer and terminate it so it can be tokenized:
result = fread (buffer,1,lSize,pFile);
buffer[result] = '\0';

How do I continue from here? The buffer holds a list of words, and I want to get them one by one as fast as possible because I'm building a multimap with those words.

It needs to be something like this:

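token = strtok_s(buffer,delims,&context);   // first call takes the buffer
while(token!=NULL){
        DB.insert(pair<unsigned int,string>((unsigned)strlen(token),token));
        //cout<< buffer;
        token = strtok_s(NULL,delims,&context);   // later calls continue from context
    }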

Just getting the words and inserting them into a database.

Thank you!

EDIT:
THIS WORKS :

    char* context   = NULL;
    char  delims[]  = " ,\t\n";
    char* token     = NULL;


    FILE * pFile;
    long lSize;
    char * buffer;
    size_t result;

    fopen_s (&pFile,DICT,"rb");
    if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

    // obtain file size:
    fseek (pFile , 0 , SEEK_END);
    lSize = ftell (pFile);
    rewind (pFile);

    // allocate memory to contain the whole file, plus a terminating '\0' for strtok:
    buffer = (char*) malloc (sizeof(char)*lSize + 1);
    if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

    // copy the file into the buffer:
    result = fread (buffer,1,lSize,pFile);
    if (result != (size_t)lSize) {fputs ("Reading error",stderr); exit (3);}
    buffer[lSize] = '\0';


    token = strtok (buffer," \n");
    while (token != NULL)
    {
        DB.insert(pair<unsigned int,string>((unsigned)strlen(token),token));
        token = strtok (NULL, " \n");
    }

But it takes too long... I need something that runs in under about 0.5 seconds... The file size is only 1 MB.

All 2 Replies

Reading in the file can be really fast (as it will be in the second example that you have). However, tokenizing the file will be slower. Have you tried a more C++ approach of using istream_iterator to read in and tokenize your file? Something like:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Open the file for reading
    std::ifstream in( "test.txt" );
    if ( ! in.is_open() )
    {
        std::cerr << "failed to open file" << std::endl;
        return 1;
    }

    // Make a vector to store things in
    std::vector< std::string > vs;

    // Read all the whitespace-separated words from the file
    std::copy( std::istream_iterator<std::string>( in ), std::istream_iterator< std::string >(), std::back_inserter( vs ) );

    // Print the words out (just for fun, delete this if you have a lot of words!)
    std::copy( vs.begin(), vs.end(), std::ostream_iterator< std::string >( std::cout, "\n" ) );
}
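
If you would rather keep the multimap from your original code, the same iterator idea could fill it directly. This is only a rough sketch: read_words_by_length, words_by_length_from_text and the WordsByLength typedef are names I made up for illustration, and the key is std::size_t rather than the unsigned int used in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical alias: words stored under their length.
typedef std::multimap< std::size_t, std::string > WordsByLength;

// Hypothetical helper: reads whitespace-separated words from `in`
// and inserts each one keyed by its length.
WordsByLength read_words_by_length( std::istream& in )
{
    WordsByLength db;
    std::istream_iterator< std::string > it( in ), end;
    for ( ; it != end; ++it )
    {
        // operator>> never yields an empty word on success (debug-only check).
        assert( !it->empty() );
        db.insert( std::make_pair( it->length(), *it ) );
    }
    return db;
}

// Hypothetical convenience wrapper for trying the helper on in-memory text.
WordsByLength words_by_length_from_text( const std::string& text )
{
    std::istringstream in( text );
    return read_words_by_length( in );
}
```

For the file case you would pass the std::ifstream from the example above to read_words_by_length. Whether this is fast enough for your 1 MB file is something you would have to measure.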

It looks like you're using a std::multimap to look up the words by length as well. If you're not going to be inserting and deleting words all the time, then this might not be the fastest way to do it. You could consider using a std::vector instead, sorted once you've finished reading the words in. You could make a functor struct to sort by length and then find the words of a given length using std::lower_bound. Something like:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Orders strings by length only; words of equal length compare as equivalent
struct LengthComparer
{
    bool operator()( const std::string& s1, const std::string& s2 ) const
    {
        return s1.length() < s2.length();
    }
};

int main()
{
    // Open the file for reading
    std::ifstream in( "test.txt" );
    if ( ! in.is_open() )
    {
        std::cerr << "failed to open file" << std::endl;
        return 1;
    }

    // Make a vector to store things in
    std::vector< std::string > vs;

    // Read all the whitespace-separated words from the file
    std::copy( std::istream_iterator<std::string>( in ), std::istream_iterator< std::string >(), std::back_inserter( vs ) );

    // Sort the vector of words by length
    std::sort( vs.begin(), vs.end(), LengthComparer() );

    // Print the words out (just for fun, delete this if you have a lot of words!)
    std::copy( vs.begin(), vs.end(), std::ostream_iterator< std::string >( std::cout, "\n" ) );
}
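
Once the vector is sorted by length, the std::lower_bound lookup mentioned above might look roughly like this. Treat it as a sketch under a few assumptions: ByLength just duplicates LengthComparer so the snippet compiles on its own, words_of_length and WordIt are hypothetical names, and the debug check uses std::is_sorted, which needs C++11 or later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Orders strings by length only (same idea as LengthComparer).
struct ByLength
{
    bool operator()( const std::string& a, const std::string& b ) const
    {
        return a.length() < b.length();
    }
};

typedef std::vector< std::string >::const_iterator WordIt;

// Hypothetical helper: returns the [first, last) range of words of length n.
// Precondition: vs has already been sorted with a length-only comparer.
std::pair< WordIt, WordIt > words_of_length( const std::vector< std::string >& vs, std::size_t n )
{
    // Debug-only precondition check; compiled out when NDEBUG is defined.
    assert( std::is_sorted( vs.begin(), vs.end(), ByLength() ) );

    // Any string of length n works as the key, since only lengths are compared.
    const std::string key( n, '?' );

    // First word that is not shorter than n ...
    WordIt first = std::lower_bound( vs.begin(), vs.end(), key, ByLength() );
    // ... and first word after that which is longer than n.
    WordIt last = std::upper_bound( first, vs.end(), key, ByLength() );
    return std::make_pair( first, last );
}
```

With the sorted vector from the example above, words_of_length( vs, 5 ) would give you a pair of iterators, and walking from .first to .second visits every five-letter word. An empty range (first equal to second) means there are no words of that length.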

Hope that's some help.


This... is nice, I didn't think of using those iterators... so simple, and quite fast :)
Thanks a lot!
