Hello,

I am reading a large file (>7GB) from disk. The file is a set of attributes like this:
0.2,0.3,0.5,0.6,0.8
0.5,0.7,0.2,0.2,0.5
.
.
I want to read only the first column (i.e. 0.2, 0.5, ...) into a vector. I do not need the other columns. The problem is that when I use fstream, it buffers the whole file into memory, which leads to a segmentation fault. This is my code:

//Initialization

string line;
string token;
stringstream iss;
long col=0,row=0;
float token_value;
std::stringstream trimmer;

//Open File
fstream ratesfile(fpath);

if (!ratesfile.is_open())
{
    printf("Cannot open data file\n");
    return 1;
}

//read header
getline(ratesfile, line);
iss << line;
while(getline(iss, token, ','))
{
    trimmer << token;
    token.clear();
    trimmer >> token;
    this->header.push_back(token);
}
iss.clear();
row=0;

//Read file to multi-dimensions array
while(getline(ratesfile, line))
{

    iss << line;
    col = 0;
    while(getline(iss, token, ','))
    {

        token_value = atof(token.c_str());


        if(col==vid)
        {

            data_vector[row]=token_value;
        }
        col++ ;
    }//end inner while loop
    row++;

    //cout<<row<<"-";
    line="";
    iss.clear();

}//end outer while loop

If all you want is the first column from each line, why are you using a loop to iterate through all the columns of the row? Just call getline once, process the token, then read the next row, ignoring all the other columns in the row just read.

I don't see where you declared data_vector. Is it an array or a std::vector? In all the code I posted below I'm assuming it is an array that is large enough to hold all the numbers. If, however, it is a std::vector, then you will want to use push_back() to insert new values into the vector.

while(getline(ratesfile, line))
{
    istringstream iss(line); // fresh stream per row, so the unread columns are discarded
    getline(iss, token, ','); // get first column
    data_vector[row++]=atof(token.c_str());
}
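
A caveat if one stringstream object is reused for every row, as in the question's code: clear() only resets the stream's error flags. It does not discard text already written to the stream, so each iss << line appends to the same buffer. Over a multi-gigabyte file that buffer keeps growing, which may account for much of the memory use. A minimal sketch of resetting the stream per row follows; first_tokens is a hypothetical helper name, and taking the rows as a vector of strings is only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the first comma-separated token of each row
// while reusing a single stringstream for all rows.
std::vector<std::string> first_tokens(const std::vector<std::string>& rows)
{
    std::vector<std::string> out;
    std::stringstream iss;
    std::string token;

    for (const std::string& row : rows)
    {
        iss.str(row);   // replace the buffered text instead of appending to it
        iss.clear();    // reset eofbit/failbit left over from the previous row
        if (std::getline(iss, token, ','))
            out.push_back(token);
    }
    return out;
}
```

Calling str() with the new row before clear() means the stream never holds more than one row at a time.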

That can also be done without using a stringstream object:

    while(getline(ratesfile, line))
    {
        size_t pos = line.find(',');
        data_vector[row++]=atof(line.substr(0,pos).c_str());
    }

Because atof() stops converting at the first character that cannot be part of a number (here, the comma), the loop can be further simplified like this:

    while(getline(ratesfile, line))
    {
        data_vector[row++]=atof(line.c_str());
    }
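
One thing to keep in mind with atof(): it returns 0.0 when nothing can be converted, so a blank or malformed line is indistinguishable from a genuine 0. If that matters, strtof() reports where conversion stopped. A rough sketch, where parse_leading_float is a hypothetical helper and not part of the code above:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parses the number at the start of `line`.
// Returns false when no characters could be converted, a case in which
// atof() would silently return 0.
bool parse_leading_float(const std::string& line, float& value)
{
    const char* begin = line.c_str();
    char* end = nullptr;
    const float parsed = std::strtof(begin, &end);
    if (end == begin)
        return false;   // nothing converted: empty or non-numeric line
    value = parsed;     // conversion stopped at the first non-numeric character
    return true;
}
```

Inside the read loop, a row would then be stored only when the function returns true, so bad lines can be skipped or reported instead of being recorded as 0.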

Edited 3 Years Ago by Ancient Dragon
