I ran out of vector space?
Here is my code:
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <fstream>
#include <stdlib.h> // for atoi() and exit()
using namespace std;

void usage()
{
    cout << "Usage: <input1> <input2> <output>\n";
    cout << "\n see README for more details.\n";
    exit(1);
}

int main(int argc, char *argv[])
{
    cout << "\nshmoosh - concatenates and uniques wordlists into one\n";
    if (argc != 4) usage();

    vector<string> vec_wordlist_compilation;

    ///////////////////////////input1//////////////////////////////
    ifstream wordlistfile(argv[1]);
    if (!wordlistfile.is_open()) {
        cout << "\nError opening file '" << argv[1] << "'\n";
        exit(1);
    }
    int x = 0;
    string word;
    while (getline(wordlistfile, word)) {
        vec_wordlist_compilation.push_back(word);
        x++;
    }
    cout << x << " words loaded from file '" << argv[1] << "'\n";
    wordlistfile.close();

    ///////////////////////////input2//////////////////////////////
    ifstream wordlistfiletwo(argv[2]);
    if (!wordlistfiletwo.is_open()) {
        cout << "\nError opening file '" << argv[2] << "'\n";
        exit(1);
    }
    int v = 0;
    while (getline(wordlistfiletwo, word)) {
        vec_wordlist_compilation.push_back(word);
        v++;
    }
    cout << v << " words loaded from file '" << argv[2] << "'\n";
    wordlistfiletwo.close();

    ////////////////////////////sort//////////////////////////////
    cout << "\nsorting " << v + x << " words, removing duplicates...\n";
    // sort vector (least to greatest)...
    sort(vec_wordlist_compilation.begin(), vec_wordlist_compilation.end());
    // remove duplicates: unique() moves them to the end, resize() drops them
    vec_wordlist_compilation.resize(
        unique(vec_wordlist_compilation.begin(), vec_wordlist_compilation.end())
        - vec_wordlist_compilation.begin());
    cout << vec_wordlist_compilation.size() << " unique words remain.\n";

    ////////////////////////////output//////////////////////////////
    ofstream output(argv[3]);
    for (unsigned int c = 0; c < vec_wordlist_compilation.size(); c++)
        output << vec_wordlist_compilation[c] << "\n";
    return 0;
}
```
I made it to put two wordlists together and remove duplicates. The problem is that it crashes on large wordlists. I can't say exactly how many words it takes to make it crash, but at somewhere around 200 MB it crashes while loading the wordlists. I have 8 GB of RAM, so I know it's not running out of space. Is there a limit (in MB) on how much a C++ vector can hold? Is there a way around this? If not, does anybody know of a library that will let me do this?
I thought about making a version that writes a temporary file to the hard drive and scans the file for every new word to make sure it isn't already in there, but I figured this would be waaaaay too slow.
Can anybody help?
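A std::vector is guaranteed to keep its elements in one contiguous block of memory, which makes it a poor fit for really large data: when it grows past its capacity it must allocate a new, larger block and copy the elements over, so for a moment the old and the new block both have to exist.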
Use a std::deque instead. It looks much the same, but its data need not be stored contiguously, so it can handle a much larger amount of data (because it can work with the OS/compiler's memory management more flexibly).
BTW, you shouldn't be using atoi(), since it gives you no way to tell bad input from a genuine 0. Use a stringstream instead.
Untested!
```cpp
#include <sstream>
#include <stdexcept>
#include <string>

int myatoi( const std::string& s )
{
    int result;
    std::istringstream ss( s );
    // Fail if no number could be read, or if anything follows the number.
    if (!(ss >> result) || !ss.eof())
        throw std::runtime_error( "not an integer" );
    return result;
}
```
Hope this helps.
>>I have 8 gigs of ram, so I know it's not running out of space
32-bit programs cannot use all that memory at once. By default, each 32-bit Windows program is limited to about 2 GB of address space, no matter how much RAM is installed.
Don't PM me with questions -- you might get a nasty PM in response. If you have a question then post it in one of the forums.
I created a 1.2 GB text file containing randomly generated 10-character words, then tried to read it into a std::list. The program crashed after reading just over 24 million words. When I changed the program to use a deque instead of a list, it read even fewer words before crashing. (My computer runs Vista Home with 5 GB of RAM, and I used the VC++ 2008 Express compiler/IDE.)
Of course, it would have been easier to check by calling the container's max_size() method. For deque:

```
maxsize = 134217727
Press any key to continue . . .
```

max_size() is only a theoretical upper bound, though. When I changed the program to use try/catch, I got this:

```
23020000
23030000
Out of memory
// final size of the deque
size = 23031567
```
Last edited by Ancient Dragon; Aug 7th, 2009 at 11:16 am.
Curses! I forgot about the 2GB 32-bit limitation. I don't suppose there is any simple way to reconfigure my compiler (Microsoft Visual C++ 2008 Express Edition) to compile this code in 64-bit mode and thus enable it to access the necessary RAM? Would I have to re-write the code?






