I have to read a file with around 120k words, so I'm trying to do it fast. I have seen the:

int x = setvbuf(fp, (char *)NULL, _IOFBF, BSZ);
assert( x == 0 && fp != NULL );

option, but it takes more than a second (1 MB file), so now I tried this method:

fopen_s (&pFile,DICT,"rb");
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);

// allocate memory to contain the whole file:
buffer = (char*) malloc (sizeof(char)*lSize);

// copy the file into the buffer:
result = fread (buffer,1,lSize,pFile);

How do I continue from here? buffer holds a list of words, and I want to get them one by one as fast as possible because I'm building a multimap with those words.

It needs to be something like this:

token = strtok_s(buffer,delims,&context);
while(token!=NULL){
        DB.insert(pair<unsigned int,string>((unsigned)strlen(token),token));
        //cout<< buffer;
        token = strtok_s(NULL,delims,&context);
    }

Just getting the words and inserting them into a database.

Thank you!

EDIT:
THIS WORKS :

    char* context   = NULL;
    char  delims[]  = " ,\t\n";
    char* token     = NULL;


    FILE * pFile;
    long lSize;
    char * buffer;
    size_t result;

    fopen_s (&pFile,DICT,"rb");
    if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

    // obtain file size:
    fseek (pFile , 0 , SEEK_END);
    lSize = ftell (pFile);
    rewind (pFile);

    // allocate memory to contain the whole file, plus a terminating '\0' for strtok:
    buffer = (char*) malloc (sizeof(char)*lSize + 1);
    if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

    // copy the file into the buffer:
    result = fread (buffer,1,lSize,pFile);
    if (result != (size_t)lSize) {fputs ("Reading error",stderr); exit (3);}
    buffer[lSize] = '\0';   // strtok needs a null-terminated string


    token = strtok (buffer," \n");
    while (token != NULL)
    {
        DB.insert(pair<unsigned int,string>((unsigned)strlen(token),token));
        token = strtok (NULL, " \n");
    }

But it takes too long... I need something that runs in under 0.5 seconds, approximately. The file size is only 1 MB.

Edited 4 Years Ago by Despairy

Reading in the file can be really fast (as it will be in the second example that you have). However, tokenizing the file will be slower. Have you tried a more C++ approach of using istream_iterator to read in and tokenize your file? Something like:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Open the file for reading
    std::ifstream in( "test.txt" );
    if ( ! in.is_open() )
    {
        std::cerr << "failed to open file" << std::endl;
        return 1;
    }

    // Make a vector to store things in
    std::vector< std::string > vs;

    // Read all the words from the file (operator>> splits on whitespace)
    std::copy( std::istream_iterator<std::string>( in ), std::istream_iterator< std::string >(), std::back_inserter( vs ) );

    // Print the words out (just for fun, delete this if you have a lot of words!)
    std::copy( vs.begin(), vs.end(), std::ostream_iterator< std::string >( std::cout, "\n" ) );
}

It looks like you're using a std::multimap to look up the words by length as well. If you're not going to be inserting and deleting a lot of words all the time, then this might not be the fastest way to do it. You could consider using a std::vector instead, sorted once you've finished reading the words in. You could make a functor struct to sort by length and then find the words using std::lower_bound. Something like:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Orders strings by length only (shorter strings first)
struct LengthComparer
{
    bool operator()( const std::string& s1, const std::string& s2 ) const
    {
        return s1.length() < s2.length();
    }
};

int main()
{
    // Open the file for reading
    std::ifstream in( "test.txt" );
    if ( ! in.is_open() )
    {
        std::cerr << "failed to open file" << std::endl;
        return 1;
    }

    // Make a vector to store things in
    std::vector< std::string > vs;

    // Read all the words from the file
    std::copy( std::istream_iterator<std::string>( in ), std::istream_iterator< std::string >(), std::back_inserter( vs ) );

    // Sort the vector of words by length
    std::sort( vs.begin(), vs.end(), LengthComparer() );

    // Print the words out (just for fun, delete this if you have a lot of words!)
    std::copy( vs.begin(), vs.end(), std::ostream_iterator< std::string >( std::cout, "\n" ) );
}
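
For the lookup step, one way to pull all the words of a given length out of the sorted vector is std::equal_range (effectively a std::lower_bound / std::upper_bound pair), searching with a probe string of the wanted length. A minimal sketch: words_of_length and WordIt are made-up names, and it assumes the vector has already been sorted with LengthComparer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Same ordering as the LengthComparer above: shorter strings first
struct LengthComparer
{
    bool operator()( const std::string& s1, const std::string& s2 ) const
    {
        return s1.length() < s2.length();
    }
};

typedef std::vector<std::string>::const_iterator WordIt;

// Hypothetical helper: returns the [first, last) range of words that have
// exactly `len` characters. Precondition: `words` is sorted with LengthComparer.
std::pair<WordIt, WordIt> words_of_length( const std::vector<std::string>& words, std::size_t len )
{
    // Only the probe's length matters to the comparer, not its contents
    const std::string probe( len, '?' );
    return std::equal_range( words.begin(), words.end(), probe, LengthComparer() );
}
```

If the two iterators in the returned pair are equal, there are no words of that length. Each lookup is a binary search, so it stays cheap even with 120k words.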

Hope that's some help.

Edited 4 Years Ago by ravenous: Added missing parentheses

Comments
Good answer , with good explanation

This... isssss nice, didn't think of using those iterators... so simple, and quite fast :)
Thanks a lot!
