
Parsing a CSV file separated by semicolons.

Member Avatar
Yaserk88
Light Poster
27 posts since Sep 2008
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hello! I am trying to open this CSV file separated by semicolons. I know how to open a text file, and I tried searching for a way to open a CSV, but most methods seemed extremely complicated.

Does anyone have a simple suggestion that would work with what I already have?

#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <sstream>

using namespace std;

int main()
{
	ifstream in("filename.csv");
	ofstream out(...............);

	int RID;
	int RID_2;

	string line;

	// operator>> expects whitespace between the two integers; with a line
	// such as "1;2" the second extraction fails at the ';'.
	while( in >> RID >> RID_2)
	{
		cout << RID << RID_2 << endl;
	}

	return 0;
}
Member Avatar
Nick Evan
Industrious Poster
4,827 posts since Oct 2006
Reputation Points: 4,005 [?]
Q&As Helped to Solve: 560 [?]
Skill Endorsements: 30 [?]
Team Colleague
Featured
 
0
 

I'd read the file one line at a time and then parse this line to search for semicolons. Getline with a delimiter should do the trick. Here's a sample:

#include <iostream>
#include <sstream>
#include <string>
#include <fstream>

using namespace std;

int main(){
    ifstream infile("c:/in.txt"); // for example
    string line = "";
    while (getline(infile, line)){
        stringstream strstr(line);
        string word = "";
        while (getline(strstr,word, ';')) cout << word << '\n';
    }
}
Member Avatar
Ancient Dragon
Achieved Level 70
27,632 posts since Aug 2005
Reputation Points: 5,232 [?]
Q&As Helped to Solve: 3,037 [?]
Skill Endorsements: 115 [?]
Team Colleague
Featured
Sponsor
 
0
 

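You could use getline() with the third parameter:

string word;
// Note: only ';' ends a field here, so a newline stays inside the field and
// the last field of one line runs into the first field of the next.
while( getline(infile, word, ';') )
{
    cout << word << "\n";
}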
Yaserk88

Ancient Dragon, your solution was simple and helpful. The only problem is that I don't understand how to separate the string into the separate variables I wanted.

It seems that the strtok() function can be a viable method?

The code below is something I found on how to implement strtok() function. Only I do not quite understand all of the steps that are taken.

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}
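
For readers wondering what each strtok() step does, here is a minimal C++ sketch with the steps commented. The helper name split_with_strtok is made up for illustration and is not part of any library. Note that strtok() treats a run of delimiters as a single separator, so empty fields are dropped.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on any character in `delims` using strtok().
std::vector<std::string> split_with_strtok(const std::string& text, const char* delims)
{
    // strtok() writes '\0' into the buffer it scans, so work on a private copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;

    // First call: pass the buffer. strtok() skips leading delimiters, finds the
    // end of the first token, overwrites the delimiter there with '\0', and
    // remembers where it stopped.
    char* token = std::strtok(&buffer[0], delims);
    while (token != NULL)
    {
        tokens.push_back(token);
        // Later calls: pass NULL to continue from the remembered position.
        token = std::strtok(NULL, delims);
    }
    return tokens;
}
```

Calling split_with_strtok("- This, a sample string.", " ,.-") should give the four words This, a, sample and string, which matches what the C program above prints.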
Nick Evan

The only problem is that I don't understand how to separate the string into the separate variables I wanted.

Why don't you try the piece of code I posted earlier? It does just that.
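
To get from printed words to the RID and RID_2 variables of the first post, the same getline() idea can feed typed variables. This is only a sketch: it assumes every line holds two integer columns such as "12;34" (the real file layout is not shown at this point in the thread), and parse_two_ints is a made-up helper name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read two semicolon-separated integers from one line,
// e.g. "12;34". Returns false if a field is missing or is not a number.
bool parse_two_ints(const std::string& line, int& rid, int& rid_2)
{
    std::istringstream fields(line);
    std::string first, second;
    if (!std::getline(fields, first, ';') || !std::getline(fields, second, ';'))
        return false;

    // Convert each text field through its own small stream.
    std::istringstream a(first), b(second);
    return (a >> rid) && (b >> rid_2);
}
```

Inside a while (getline(infile, line)) loop, a call like parse_two_ints(line, RID, RID_2) would then fill both variables for each line it accepts.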

It seems that the strtok() function can be a viable method?

The code below is something I found on how to implement strtok() function. Only I do not quite understand all of the steps that are taken.

The code you posted is C not C++. What language are you intending to use?

Ancient Dragon

The problem you posted in #4 is not the same as the problem you originally posted in #1. Are they the same or two different problems? If different problems then you need to tell us that to avoid confusion.

What different variables do you want? If each column of the CSV file represents a string, then just use an array of strings:

string line;
string arry[10];
int i;
while( getline(infile, line) )
{
      stringstream str(line);
      // reads at most 10 columns per line; each line overwrites the previous one
      for(i = 0; i < 10; i++)
           getline(str, arry[i], ';');
}

The above might have problems if there are blank columns, where two or more ; appear in a row, such as "one;;;two". If there are lines like that, then it becomes much more complicated and you can't use getline() with that third parameter.
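
A quick way to check the blank-column case is a small sketch like the one below (split_fields is a made-up helper name). As far as the standard library behaviour goes, std::getline with a delimiter returns an empty string for each pair of adjacent delimiters, so "one;;;two" comes back as four fields. The field that does get lost is an empty one at the very end of a line, such as the one after the last ';' in "a;b;".

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on ';' with std::getline.
std::vector<std::string> split_fields(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ';'))
        fields.push_back(field);   // empty fields between delimiters are kept
    return fields;
}
```

Checking split_fields("one;;;two").size() and split_fields("a;b;").size() on your own compiler is an easy way to confirm which case actually needs special handling.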

Nick Evan

The above might have problems if there are blank columns where two or more ; in a row, such as "one;;;two"

It will also stop at 10 words per line.
I would highly recommend using a vector in this case. I've made a small adjustment to the code I posted earlier:

#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <vector>

using namespace std;

int main(){
    ifstream infile("c:/in.txt");
    string line = "";
    vector<string> all_words;
    while (getline(infile, line)){
        stringstream strstr(line);
        string word = "";
        while (getline(strstr,word, ';')) all_words.push_back(word);
    }
}

After you run this code, all the words will be in the vector. To show the vector use something like:

for (unsigned i = 0; i < all_words.size(); i++)
        cout << all_words.at(i) << '\n';

Note that all this code is untested, so it might have a bug or two that I've missed.

Ancient Dragon

It will also stop at 10 words per line.
I would highly recommend using a vector in this case.

Yes, that is a better solution. But, like mine, it doesn't work when there are two or more adjacent semicolons (or some other column separator).

After testing, strtok() doesn't work right either because it also skips adjacent semicolons.

Yaserk88

Sorry it has taken me a while to come back to this problem; I was away for the weekend. I have decided to go with Nick Evan's vector-based solution because Ancient Dragon also seems to agree that it is better.

As far as the empty spaces and adjacent semicolons go, I did have some, but I redesigned my file to have the empty slots filled with "-1".

I have attached my file, so you can see the way it looks. I am trying to read the file and be able to call the numbers as numbers and the words as strings.

So here is the way things look right now. I use "infile.imbue(locale("german_germany.1252"));" to try to have the program read the commas as decimal points. This worked when I was reading the file the following way:

while( infile >> Var1 >> Var2 ...........), but when using getline, I cannot get this to work. Any suggestions?

#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <vector>
#include <locale>

using namespace std;

int main()
{
    ifstream infile("C:\\Dokumente und Einstellungen\\Yaser\\Eigene Dateien\\Internship\\C++_Code\\All_Data.csv");
    ofstream outfile("C:\\Dokumente und Einstellungen\\Yaser\\Eigene Dateien\\Internship\\C++_Code\\Output.txt");

    infile.imbue(locale("german_germany.1252"));

    string line = "";
    vector<string> all_words;

    while (getline(infile, line))
    {
        stringstream strstr(line);
        string word = "";
        while (getline(strstr,word, ';')) all_words.push_back(word);
    }
    infile.imbue(locale("german_germany.1252"));
    for (unsigned i = 0; i < all_words.size(); i++)
        outfile << all_words.at(i) << "\t" << endl;
}

Is there any way to set the file up as a two-dimensional array instead, so that I can have "all_words[....][....]"?

Your help is much appreciated.
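
One way to get the two-dimensional layout asked about here is a vector of rows, where each row is itself a vector of strings. The sketch below is untested against the attached file; Row, Table and read_table are names made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;    // one line of the file (hypothetical name)
typedef std::vector<Row>         Table;  // all lines (hypothetical name)

// Read every line of `in` into its own row, so a field is table[row][column].
Table read_table(std::istream& in)
{
    Table table;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        std::string field;
        Row row;
        while (std::getline(fields, field, ';'))
            row.push_back(field);
        table.push_back(row);
    }
    return table;
}
```

With an open ifstream named infile, Table all_words = read_table(infile); would then allow all_words[0][2]. Rows may end up with different lengths, so all_words.at(r).at(c) is the safer way to index while testing.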

Attachments Sample1.txt (7.48KB)
Ancient Dragon

The only problem with your program is that you need to set the locale for the stringstream object.

while (getline(infile, line))
{
    stringstream strstr(line);
    string word = "";
    strstr.imbue(locale("german_germany.1252"));
    while (getline(strstr,word, ';'))
        all_words.push_back(word);
}

Attached is the output file I got.

Attachments Output.txt (9.17KB)
Member Avatar
Duoas
Postaholic
2,039 posts since Oct 2007
Reputation Points: 1,022 [?]
Q&As Helped to Solve: 229 [?]
Skill Endorsements: 10 [?]
Featured
 
4
 

Sorry to respond to this late... but I wanted to post info also...

The getline() function has the obnoxious habit of returning a not good() stream for final blank fields...

For a single blank line at the end of input, that's fine... (there's no record) but for blank fields it makes a difference. You can get past the problem by checking the stream state before getting a line.

For simple CSV files (meaning you cannot use the ';' character [or whatever character you've chosen] in the field value) this is a working example:

#include <deque>
#include <iostream>
#include <sstream>
#include <string>

typedef std::deque <std::string> record_t;
typedef std::deque <record_t>    table_t;

std::istream& operator >> ( std::istream& ins, table_t& table )
  {
  std::string s;
  table.clear();

  while (std::getline( ins, s ))
    {
    std::istringstream ss( s );
    record_t           record;
    std::string        field;
    bool               final = true;

    while (std::getline( ss, field, ';' ))
      {
      record.push_back( field );
      final = ss.eof();
      }
    if (!final)
      record.push_back( std::string() );

    table.push_back( record );
    }

  return ins;
  }

This will allow you to read all seven fields in a record like:

one; two;three;four;;six;

Hope this helps.

Yaserk88

Hi Ancient Dragon. I see that this makes sense, but the commas in the numbers are not being replaced with decimal points. I think the reason is that the values are still strings and I need to convert them to integer and float values.

For instance, when I do all_words[0] + all_words[2] (all_words[0]=2 and all_words[2]=31), I get 231.

How do I convert each individual value of my file so that I can specify it as either int, float, or string?
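
The 231 result comes from operator+ joining two strings; a locale only matters once a stream extracts a number, which getline() into a string never does. A possible sketch for converting single fields follows. The names CommaDecimal, to_double and to_int are made up, and the small facet is used instead of a named locale so that the code does not depend on "german_germany.1252" being available.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Facet that makes ',' the decimal point for numeric extraction.
struct CommaDecimal : std::numpunct<char>
{
    char do_decimal_point() const { return ','; }
};

// Hypothetical helper: convert a text field such as "3,14" to a double.
// Returns false if the field does not start with a number.
bool to_double(const std::string& field, double& value)
{
    std::istringstream in(field);
    // The locale object takes ownership of the facet and deletes it later.
    in.imbue(std::locale(in.getloc(), new CommaDecimal));
    return !(in >> value).fail();
}

// Hypothetical helper: convert a text field such as "31" to an int.
bool to_int(const std::string& field, int& value)
{
    std::istringstream in(field);
    return !(in >> value).fail();
}
```

After to_int(all_words[0], a) and to_int(all_words[2], b), the sum a + b is an ordinary integer addition, and fields that fail both conversions can simply stay as strings.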

Yaserk88

I will actually mark this thread as solved and post it as a different problem, because it may be useful for other people. Thanks for the help everyone!

Question Answered as of 4 Years Ago by Ancient Dragon, Nick Evan and Duoas
Ancient Dragon

How do I convert each individual value of my file so that I can specify it as either int, float, or string?

Simple substitution -- call find() to locate the comma and replace it with a period.
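
A minimal sketch of that substitution, assuming each numeric field contains at most one comma and no thousands separators (comma_text_to_double is a made-up helper name):

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: turn "12,5" into 12.5 by replacing the comma first.
bool comma_text_to_double(std::string field, double& value)
{
    std::string::size_type comma = field.find(',');
    if (comma != std::string::npos)
        field[comma] = '.';               // only the first comma is replaced

    std::istringstream in(field);
    in.imbue(std::locale::classic());     // the classic "C" locale expects '.'
    return !(in >> value).fail();
}
```

The field is taken by value on purpose, so the original string in the vector keeps its comma and can still be written out unchanged.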
