
I ran out of vector space?

Here is my code:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <fstream>
#include <stdlib.h>//(for atoi to work)

using namespace std;

void usage()
{
	cout << "Usage: <input1> <input2> <output>\n";
	cout << "\n see README for more details.\n";
	exit(1);
}

int main(int argc, char *argv[])
{
	cout << "\nshmoosh - concatenates and uniques wordlists into one\n";

	if(argc!=4)
		usage();

	vector<string> vec_wordlist_compilation;

///////////////////////////input1//////////////////////////////

	ifstream wordlistfile(argv[1]);
	if(!wordlistfile.is_open())
	{
		cout<<"\nError opening file \'"<<argv[1]<<"\'\n";
		exit(1);
	}
	int x=0;
	string word;
	while(getline(wordlistfile,word)){
		vec_wordlist_compilation.push_back(word);
		x++;
	}
	cout << x << " words loaded from file \'"<<argv[1]<<"\'\n";

	wordlistfile.close();

///////////////////////////input2//////////////////////////////

	ifstream wordlistfiletwo(argv[2]);
	if(!wordlistfiletwo.is_open())
	{
		cout<<"\nError opening file \'"<<argv[2]<<"\'\n";
		exit(1);
	}
	int v=0;
	while(getline(wordlistfiletwo,word)){
		vec_wordlist_compilation.push_back(word);
		v++;
	}
	cout << v << " words loaded from file \'"<<argv[2]<<"\'\n";

	wordlistfiletwo.close();

////////////////////////////sort//////////////////////////////
	cout << "\nsorting " << v+x << " words, removing duplicates...\n";

	//sort vector (least to greatest)...
	sort(vec_wordlist_compilation.begin(),vec_wordlist_compilation.end());
	//remove duplicates...
	

	vec_wordlist_compilation.resize(unique(vec_wordlist_compilation.begin(),vec_wordlist_compilation.end())-vec_wordlist_compilation.begin());

	/*for(unsigned int c=0;c<vec_wordlist_compilation.size();c++)
		cout << vec_wordlist_compilation[c] << "\n";*/
	cout << vec_wordlist_compilation.size() << " unique words remain.\n";

////////////////////////////output//////////////////////////////

	ofstream output(argv[3]);
	for(unsigned int c=0;c<vec_wordlist_compilation.size();c++)
		output << vec_wordlist_compilation[c] << "\n";

return 0;
}


I made it to put two wordlists together and remove duplicates. The problem is that it crashes on large wordlists. I cannot say exactly how many words it takes without crashing, but somewhere around 200 megs, it crashes when loading the wordlists. I have 8 gigs of ram, so I know it's not running out of space. Is there a limitation (in MB) that a C++ vector can hold? Is there a way around this? If not, does anybody know of some library which will let me do this?

I thought about making a version that writes a temporary file to the hard drive and scans the file for every new word to make sure it is not in there already, but I figured this would be waaaaay too slow.

Can anybody help?

dzhugashvili
 

A std::vector is guaranteed to maintain contiguous storage, meaning it needs one unbroken block of address space and so cannot handle really large data.

Use a std::deque instead. It looks much the same, but the data need not be stored contiguously, so it can handle a much larger amount of data (because it can work with the OS/compiler's memory management more flexibly).

BTW, you shouldn't be using atoi(). Use a stringstream instead...

#include <sstream>
#include <stdexcept>
#include <string>

int myatoi( const std::string& s )
  {
  int result;
  std::istringstream ss( s );
  ss >> result;
  if (ss.fail() || !ss.eof()) throw std::runtime_error( "not an integer" );
  return result;
  }

Untested!

Hope this helps.

Duoas
 

>>I have 8 gigs of ram, so I know it's not running out of space

32-bit programs cannot access all of that memory at one time. Each 32-bit process is limited to about 2 GB of address space (the default on Windows).
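One quick way to tell whether a given build is affected is to look at the pointer size it was compiled with. This is only a sketch; the function names `pointer_bits` and `report_build` are made up for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <iostream>

// Width of a pointer in bits for the current build: 32 for a 32-bit
// program, 64 for a 64-bit one on the usual desktop platforms.
std::size_t pointer_bits()
{
    return sizeof(void*) * CHAR_BIT;
}

// Prints which kind of build this is.
void report_build()
{
    std::cout << "this program was built as " << pointer_bits() << "-bit\n";
}
```

A 32-bit result means the address space limit applies no matter how much RAM is installed.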

Ancient Dragon
Solved Threads: 2,341
 

I created a 1.2 GB text file of randomly generated 10-character words, then tried to read it into a std::list. The program crashed after reading just over 24 million words. I changed the program to use a deque instead of a list, and it read even fewer words before crashing. (My computer runs Vista Home, has 5 GB of RAM, and I used the VC++ 2008 Express compiler/IDE.)

Of course, it would have been easier to check by calling the container's max_size() method. For the deque:

maxsize = 134217727
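For reference, a query like that could be written roughly as follows. The names `theoretical_limit` and `print_limits` are illustrative only. Keep in mind that max_size() is a theoretical upper bound on the element count, so a program can run out of memory long before reaching it.

```cpp
#include <deque>
#include <iostream>
#include <list>
#include <string>
#include <vector>

// Returns the implementation's theoretical element limit for a container
// type. This is an upper bound only; real allocations can fail far sooner.
template <typename Container>
typename Container::size_type theoretical_limit()
{
    return Container().max_size();
}

// Prints the limit for the three containers discussed in this thread.
void print_limits()
{
    std::cout << "vector max_size = "
              << theoretical_limit<std::vector<std::string> >() << "\n";
    std::cout << "deque  max_size = "
              << theoretical_limit<std::deque<std::string> >() << "\n";
    std::cout << "list   max_size = "
              << theoretical_limit<std::list<std::string> >() << "\n";
}
```

The numbers printed depend on the compiler, the standard library and whether the build is 32-bit or 64-bit.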

Changed the program to use try/catch and got this:

23020000
23030000
Out of memory

// final size of the deque
size = 23031567

Ancient Dragon
Solved Threads: 2,341
 

Curses! I forgot about the 2GB 32-bit limitation. I don't suppose there is any simple way to reconfigure my compiler (Microsoft Visual C++ 2008 Express Edition) to compile this code in 64-bit mode and thus enable it to access the necessary RAM? Would I have to re-write the code?

dzhugashvili
 

Anybody? Can this code be compiled in 64-bit?

dzhugashvili
 

I will start another more appropriately labeled thread.

dzhugashvili
 
>>Curses! I forgot about the 2GB 32-bit limitation. I don't suppose there is any simple way to reconfigure my compiler (Microsoft Visual C++ 2008 Express Edition) to compile this code in 64-bit mode and thus enable it to access the necessary RAM? Would I have to re-write the code?

You cannot configure the Express edition to do that. You will have to buy a Pro edition (or maybe Standard). You also might want to check out GNU g++, because I think it will compile 64-bit programs.

Ancient Dragon
Solved Threads: 2,341
 
