| ||
| Mind-blowing problem here (At least for me) Okay, I'm making a program that puts six-word sequences into a hash table. For each sequence it takes the ASCII value of each letter, multiplies it by the letter's position, and adds the result to a running sum in a loop: num = num + int(firstName[nq])*(nq+1); My hash table is exactly as large as the total number of words. In theory, every six-word sequence should get a unique number and its own spot in the hash table, while identical six-word sequences should land in the same spot. However, when I run my program, I get far too many collisions. It reports 643 matching six-word sequences between two different files that don't share a single six-word sequence. Can anyone POSSIBLY help? Here is my code: int getdir (string dir, vector<string> &files) |
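A minimal sketch of the hashing step described in the post, for anyone following along. The names weightedSum, slotFor and tableSize are hypothetical; only the formula comes from the post, and the rest of the original program is not shown here. The sketch suggests why the "unique number" assumption does not hold: a position-weighted sum of character codes can be identical for different strings, and reducing it to a table slot (assumed here to be a modulus by a non-zero table size) can only merge more of them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper reproducing the formula from the post:
//   num = num + int(text[i]) * (i + 1)
unsigned long weightedSum(const std::string& text) {
    unsigned long num = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Character code times its 1-based position in the string.
        num += static_cast<unsigned long>(static_cast<unsigned char>(text[i])) * (i + 1);
    }
    return num;
}

// Assumed reduction of the sum to a slot index; tableSize must be non-zero.
std::size_t slotFor(const std::string& text, std::size_t tableSize) {
    return static_cast<std::size_t>(weightedSum(text) % tableSize);
}
```

For instance, weightedSum("ad") is 97 + 2*100 = 297 and weightedSum("cc") is 99 + 2*99 = 297, so two different strings share one number. Landing in the same slot therefore does not show that two sequences are equal; the stored text still has to be compared.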
| ||
| Re: Mind-blowing problem here (At least for me) You haven't told us how many words each of the two files has. |
| ||
| Re: Mind-blowing problem here (At least for me) A few points (sorry to preach): First, you allocate 33971 lists! It is much better to use a map: std::map<int,std::string>. If you want to record clashes, then use a multimap. Next: why open all the files at once? Have one ifstream: open a file, process it, then open the next file. Third: the loop from line 68 shows that you missed that strings can expose their char data. for(int i=0;i<SIZE;i++) Then I lost the will to live... I can't figure out what the "get six words" bit below is intended to do. Anyway, I hope this helps a bit. |
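A rough sketch of the multimap suggestion above, under a few assumptions: the names hashSequence, SequenceTable, addSequence and containsSequence are hypothetical, the hash is the position-weighted sum from the original post, and sequences are short enough that the sum fits in an int. The multimap keeps every sequence that shares a hash value, so a lookup can compare the stored text instead of trusting the number alone.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Position-weighted character sum, reading the characters straight from the string.
int hashSequence(const std::string& sequence) {
    int sum = 0;
    for (std::size_t i = 0; i < sequence.size(); ++i) {
        sum += static_cast<unsigned char>(sequence[i]) * static_cast<int>(i + 1);
    }
    return sum;
}

// Hash value -> every sequence that produced it (clashes are kept side by side).
typedef std::multimap<int, std::string> SequenceTable;

void addSequence(SequenceTable& table, const std::string& sequence) {
    table.insert(SequenceTable::value_type(hashSequence(sequence), sequence));
}

// True only if an entry has the same hash AND the same text.
bool containsSequence(const SequenceTable& table, const std::string& sequence) {
    auto range = table.equal_range(hashSequence(sequence));
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second == sequence) {
            return true;
        }
    }
    return false;
}
```

With this layout, table.count(hashSequence(s)) gives the number of entries sharing a hash value, while containsSequence reports only genuine matches.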
| ||
| Re: Mind-blowing problem here (At least for me) Quote:
|
| ||
| Re: Mind-blowing problem here (At least for me) Start from the beginning: List_3358<int> *newList=new List_3358<int>[33971]; This is a magic number (an explicit int constant > 2 is bad style). Possible solution: const size_t TABLESIZE = 33971; // what is it and why... An example of wrong code and irresponsible design: char tempo = *argv[argc-1]; It was just your imagination that the last parameter is a single digit. Suppose the program is started without parameters or with the /? parameter (a common convention: a user asks for help on parameters, and all good programs should support this feature). Alas, the next statement is wrong in any case. The atoi function wants a pointer to a C-string (a char array terminated by a null character). Of course, you get an unpredictable result here. Strictly speaking, it's the most likely cause of your troubles: this unpredictable wrong number becomes your SIZE value... By the way, the very strange concatenation of all the command-line parameters is very dangerous too. Avoid unusual redeclarations of common, well-known library names (rand, for example). It's possible, but it is bad style. To spread butter on butter: vector<string> files = vector<string>(); (a plain vector<string> files; does the same). The readdir function returns ALL directory entries: files, directories, links, sockets, etc. You need file entries only. Use the d_type field of struct dirent to recognize file entries. Consult your system docs for the right values of the d_type field. Correct these errors, then try again... |
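Two of the points above can be sketched as follows. This is only an illustration under stated assumptions: parsePositive and listRegularFiles are hypothetical names, the numeric argument is assumed to be a positive decimal integer, and the directory code assumes a system whose struct dirent has a d_type field with DT_REG and DT_UNKNOWN (Linux and the BSDs do; the field is not required everywhere). Where the file system reports DT_UNKNOWN, the sketch falls back to stat.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <vector>
#include <dirent.h>
#include <sys/stat.h>

// Parses a whole C-string as a positive decimal number.
// Returns false for empty input, trailing junk, overflow, or values <= 0.
bool parsePositive(const char* text, long& value) {
    if (text == nullptr || *text == '\0') {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const long parsed = std::strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || parsed <= 0) {
        return false;
    }
    value = parsed;
    return true;
}

// Appends the names of regular files in dir to files.
// Directories, sockets and symbolic links are skipped.
bool listRegularFiles(const std::string& dir, std::vector<std::string>& files) {
    DIR* handle = opendir(dir.c_str());
    if (handle == nullptr) {
        return false;
    }
    while (dirent* entry = readdir(handle)) {
        const std::string name = entry->d_name;
        bool regular = false;
        if (entry->d_type == DT_REG) {
            regular = true;
        } else if (entry->d_type == DT_UNKNOWN) {
            // Some file systems do not fill in d_type; ask stat instead.
            struct stat info;
            const std::string path = dir + "/" + name;
            regular = stat(path.c_str(), &info) == 0 && S_ISREG(info.st_mode);
        }
        if (regular) {
            files.push_back(name);
        }
    }
    closedir(handle);
    return true;
}
```

In main, one would check argc before touching argv, pass the whole argument (for example argv[1]) to parsePositive, and print a usage message when it returns false, rather than reading a single character of the last argument.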
©2003 - 2009 DaniWeb® LLC