Hello,
I am new to programming, and I need to read the elements of two csv files into two vectors and display the elements of the vectors.
The csv file is in the following format:
200, New york, -23.456, 23.455
201,Chicago,-34.5434,34.546
.....
.....
After that, I have to perform string matching on the elements of the two vectors.
I have used the following code to read the csv file into a vector:

#pragma warning(disable: 4786) // VC++ 6.0 disable warning about debug line too long
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;
typedef vector<string> LINE;

int main()
{
	string line;
	int pos;
	vector<LINE> array;

	ifstream in("log.csv");
	if(!in.is_open())
	{
		cout << "Failed to open file" << endl;
		return 1;
	}
	while( getline(in,line) )
	{
		LINE ln;
		while( (pos = line.find(',')) >= 0)
		{
			string field = line.substr(0,pos);
			line = line.substr(pos+1);
			ln.push_back(field);
		}
		array.push_back(ln);
	}
	return 0;
}

What modification should I make to this code so that I can display all the elements in the above vector?
Also, how can a particular element be accessed when necessary, i.e. how could I access the element Chicago?
Please give me any clues about the string matching of the elements of the two vectors as well, if possible.

I am stuck on this and I desperately need your help. Thank you.


You have a vector of a vector of strings, it appears, so to display I would have a nested loop that traverses through each vector:

for (size_t i = 0; i < array.size(); i++)
{
    // Reference to row i; avoids copying the whole row.
    const vector<string>& cityData = array.at(i);
    for (size_t j = 0; j < cityData.size(); j++)
    {
        cout << cityData.at(j) << ",";
    }
    cout << endl;
}

This will add an extra comma at the end of each line, so you may want to take that out.
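
One possible way to avoid the trailing comma is to write the separator before every field except the first. The sketch below is only an illustration; the helper names joinRow and printRows are made up for this example and are not part of the original code.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins the fields of one row with commas,
// with no comma after the last field.
std::string joinRow(const std::vector<std::string>& row)
{
    std::ostringstream out;
    for (std::size_t j = 0; j < row.size(); ++j)
    {
        if (j > 0)
            out << ",";   // separator goes before every field but the first
        out << row[j];
    }
    return out.str();
}

// Hypothetical helper: prints every row of the vector of vectors,
// one row per line.
void printRows(const std::vector<std::vector<std::string> >& rows)
{
    for (std::size_t i = 0; i < rows.size(); ++i)
        std::cout << joinRow(rows[i]) << '\n';
}
```

With the vector from the question, the call would be printRows(array).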

If you run this, you'll notice that the last bit of information is not getting into your vector. That's because your data doesn't end in a comma:

201,Chicago,-34.5434,34.546

but your code expects it to:

while( (pos = line.find(',')) >= 0)

You need to change your code slightly so that last bit of information is pushed into your array of strings.

I'm not sure what kind of string matching you are referring to here. Can you elaborate please?

Regarding how to access a certain element, the code I posted does that. You have a vector of vectors, so you need to specify which element of each vector you want. My code did that with the at function from the vector class. Chicago is in the second element of the second vector. Since element numbers start at 0, that's (1,1). So

array.at(1).at(1)

is "Chicago". "New York" would be:

array.at(0).at(1)
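
If the row number is not known in advance, the row can be looked up by the city name instead. This is a rough sketch, assuming the name is always the second field of a row; findRowByCity is a hypothetical name used only here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first row whose second
// field (index 1) equals name, or -1 if there is no such row.
// Rows with fewer than two fields are skipped.
int findRowByCity(const std::vector<std::vector<std::string> >& rows,
                  const std::string& name)
{
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
        if (rows[i].size() > 1 && rows[i][1] == name)
            return static_cast<int>(i);
    }
    return -1;
}
```

The comparison is exact, so with the sample line "200, New york, ..." the stored field is " New york" with a leading space, and a search for "New york" would not find it unless the spaces are trimmed first.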

Thank you very much.
I am able to display all the elements in the vector, but as you said, the last element of each line in the csv file is not being entered into the vector. I tried changing the condition below but couldn't correct it.

while((pos = line.find(',')) >= 0)

Since there will be a null at the end of the line, I tried adding a check for that to the condition as well, but I still couldn't display the last element.

Can you tell me what change I have to make to that condition?

And about the string matching that I have to do:

In the same way, I have to read two csv files into two separate vectors, say array1 and array2.
For example, array1 is in the format:
200, New york, -23.456, 23.455
201,Chicago,-34.5434,34.546
.....
array2 is in the format:
300,york new, -23.456, 23.455
301,Chic,-34.5434,34.546
.....

I have to compare every element of the second column of array1 with every element of the second column of array2, character by character. The comparison of the characters should be ordered (as I explain below).

For example, I have to compare "New york" from array1 with "york new", "Chic" and so on (with all the second-column elements of array2), character by character and in order, and count the number of matching characters for each.

Detailed explanation:
When "New york" is compared to "york new", the first character 'N' from "New york" is searched for within "york new". Since it is found in the fifth place, the second character from "New york", i.e. 'e', is searched for only after the fifth character in "york new", and it finds its match in the 6th place. Now 'w' from "New york" is searched for after the 6th place in "york new" and finds its match in the 7th position. The count is incremented each time, and the total count of 3 should be stored for this record.
In this way the comparison should be done for all the records in array2 and the counts stored.
Finally, I want the records with the top 1 (or top 10 if necessary) highest counts to be displayed.
In short: find the count of matching characters for each element of the second column, and finally display the records with the highest counts.

I wrote everything out in detail just to make it clear to you.
Thank you.


I don't think you need to make a change to the condition. Given your data, the condition is fine. I wouldn't change anything, but I would add to your code. You get out of the loop and you have some leftover data. There's nothing left to "find". The leftover data is your entire string and you want all of it, so after the loop, push the leftover string into the vector. So you're using the same code, but adding a line after the while loop (add line 8):

LINE ln;
while( (pos = line.find(',')) >= 0)
{
	string field = line.substr(0,pos);
	line = line.substr(pos+1);
	ln.push_back(field);
}
ln.push_back(line); // add this line
array.push_back(ln);
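
As a variation on the same fix, the splitting could be moved into a small function that compares the result of find against std::string::npos instead of relying on how that result converts to int. This is only a sketch: splitLine is a hypothetical name, and it does a plain split with no handling of quoted fields or surrounding spaces.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line on commas and also keeps
// the text after the last comma.
std::vector<std::string> splitLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = line.find(',', start)) != std::string::npos)
    {
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    fields.push_back(line.substr(start)); // leftover after the last comma
    return fields;
}
```

Inside the getline loop this would replace the inner while loop, e.g. array.push_back(splitLine(line)).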

Hey, thanks. It works.
I hadn't thought of this idea.

Now I will proceed with the string matching part.

In my previous post about string matching, I told you that I have to store the counts for all the matches. But I don't have any idea where I could store the counts so that I can finally display the maximum count.
Can I create a separate array for them, or can I insert them into the same array, array2?
That way, along with the maximum count, I could also display the matched records from array2 and array1.


These are separate tasks and should be treated as such. One task is to compare two strings ("New York" and "york new") and get the appropriate value of matching characters. In this case, as I understand your post, you should get 3 for this comparison. So I would write a function for this:

int MatchingChars (string string1, string string2)
{
     // code
}

Within this function, you could use the "find" and "substr" functions as you did when you set up your vectors. The code here would be very similar except that instead of doing a push_back every time you find something, you would add to a counter.
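
A rough sketch of such a function is below. It rests on a few assumptions read from the description in this thread rather than stated outright: matching ignores case (so 'N' matches 'n'), a character of the first string that cannot be found after the previous match is simply skipped, and the arguments are taken by const reference instead of by value. The toLower helper is made up for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lower-cases a copy of s so that 'N' and 'n' compare equal.
std::string toLower(std::string s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Counts the characters of string1 that can be found, in order, in string2.
// Each search starts just after the position of the previous match.
int MatchingChars(const std::string& string1, const std::string& string2)
{
    const std::string a = toLower(string1);
    const std::string b = toLower(string2);
    std::string::size_type start = 0;
    int count = 0;
    for (std::string::size_type i = 0; i < a.size(); ++i)
    {
        std::string::size_type pos = b.find(a[i], start);
        if (pos != std::string::npos)
        {
            ++count;          // found a match: count it
            start = pos + 1;  // next search begins after this match
        }
        // Assumption: an unmatched character is skipped, not treated as a stop.
    }
    return count;
}
```

Under these assumptions, MatchingChars("New york", "york new") gives 3 and MatchingChars("Chicago", "Chic") gives 4.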


So that's that part. As to where to store the results from this function, why not another vector?

vector <int> matches;

So if you have five cities, matches would contain five elements. You would call the MatchingChars function five times, once for each city. Each city would have two strings (the same letters in different order). You would pass this function those two strings and store the result in the matches vector. Once you have this vector, sort it from highest to lowest. You can either write your own sort (Bubble Sort is fairly easy to write) or you can use C++'s pre-written sort:

http://www.cplusplus.com/reference/algorithm/sort.html

From that sorted vector, display the top ten or top one or whatever. Keep in mind that it's a little more complicated than sorting integers since you need to retain the index numbers of the vector, so you're looking at "paired data". If you end up displaying 3, you also want to retain the information that 3 goes with New York. The example link I posted would need to be tweaked. In my opinion, it would be easier to write your own sort. Set up a struct. This struct will contain two pieces of data. One, the city name or the index number. Two, the number of matches. So a revision on my earlier suggestion. Change this:

vector <int> matches;

to this:

struct cityPair
{
     int vectorIndex;
     int charMatches;
};

vector <cityPair> cities;

Sort cities by charMatches.
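
If a hand-written sort is not wanted, one alternative is std::sort from <algorithm> with a comparison function that looks only at charMatches. This is a sketch, not the only way to do it; moreMatches and sortByMatches are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct cityPair
{
    int vectorIndex;
    int charMatches;
};

// Hypothetical comparator: a goes before b when it has more matching characters.
bool moreMatches(const cityPair& a, const cityPair& b)
{
    return a.charMatches > b.charMatches;
}

// Hypothetical helper: sorts in place, highest charMatches first.
// vectorIndex travels with its count, so the pairing is kept.
void sortByMatches(std::vector<cityPair>& cities)
{
    std::sort(cities.begin(), cities.end(), moreMatches);
}
```

After the sort, cities[0] holds the best match, and its vectorIndex says which row of the original vector it belongs to.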

This is one way to do it. There are many others. Regardless, make a function for the number of character matches between strings.

Thanks for your detailed reply. I will try to do that.

I have completed the string search by counting the number of characters matched.

But after getting the number of characters, I now want to enter the character_count, the names (strings) that were compared, and all the corresponding data of those strings, for all the records, into a new vector, so that I can access the matched character_count column, sort it and pick the top 5.

But I don't have any idea how to insert these elements into a vector.
I guess I can use the push_back() command, and the elements get inserted at the end of the vector.
But how can I enter elements into a new row of the vector (the second record)?

Please help me with this.
Thank you.

Many ways to do this. I actually now will sort of back away from my previous suggestion and suggest a re-organization of your struct(s)/vector(s). The way I suggested before will definitely work, but it sounds like you want all your information together, which isn't a bad idea. So I'll suggest, one, merging your original two string vectors into one vector. And two, merge the last vector I suggested:

struct cityPair
{
     int vectorIndex;
     int charMatches;
};

vector <cityPair> cities;

into this same vector. So you have one vector for the entire program. Again, if there are five cities, this vector would have five elements. It would be something like this:

struct city
{
     string citystring1;
     string citystring2;
     int charMatches;
};

vector <city> cities;

Note that vectorIndex is no longer needed since it pointed to the other vectors to keep them straight. With one vector, that's no longer necessary.

So, first step is to get your city string pairs into the cities vector. Next, go through the vector element by element, call the function that calculates the number of charMatches, and fill in that part of the struct. By the end of this, you have all your data. Now you sort it by, as I mentioned before, ordering it by charMatches.

So if you have two input files, each with two elements:

200, New york, -23.456, 23.455
201,Chicago,-34.5434,34.546

and

300,york new, -23.456, 23.455
301,Chic,-34.5434,34.546

you end up with this vector:

element 0: New york, york new, 0
element 1: Chicago, Chic, 0

The zeros represent the fact that charMatches haven't been filled in yet. Now fill them in by calling the function for each string pair:

(New york, york new) => 3
(Chicago, Chic) => 4

You end up with your vector like this:

element 0: New york, york new, 3
element 1: Chicago, Chic, 4

I don't know what those other numbers were in the input file. If you need them and that data is meaningful, add some more data members to your struct and put 'em in there. Otherwise, throw them out. The point is that now all of your data for each entry is together and will remain so. Your headaches of making the sorted vector point to the correct element of the unsorted vector are gone since there is only one vector. You sort them by charMatches and do whatever you want with them.

[EDIT]

As to the total number of push_back calls, that's one per element of the vector. So in this case, there are two cities (New York, Chicago), so two elements in the vector, so two push_back calls total.

[/EDIT]
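
To make the "one push_back per record" idea concrete, here is a minimal sketch. Each call adds one new "row" to the vector; addCity is a hypothetical helper name, and charMatches is assumed to start at 0 until it is filled in.

```cpp
#include <string>
#include <vector>

struct city
{
    std::string citystring1;
    std::string citystring2;
    int charMatches;
};

// Hypothetical helper: appends one record to the end of the vector.
// charMatches starts at 0 and is meant to be filled in later.
void addCity(std::vector<city>& cities,
             const std::string& name1, const std::string& name2)
{
    city aCity;
    aCity.citystring1 = name1;
    aCity.citystring2 = name2;
    aCity.charMatches = 0;
    cities.push_back(aCity);
}
```

Calling addCity(cities, "New york", "york new") and then addCity(cities, "Chicago", "Chic") leaves two elements; the second record is cities[1], and its count can be set later with cities[1].charMatches = 4.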

I am clear on the procedure, but I am not that familiar with the concept of structs. Can you please tell me how to store the elements from the function into the structure that you mentioned? How could I enter the necessary elements into a structure?


struct city
{
     string citystring1;
     string citystring2;
     int charMatches;
};

vector <city> cities;

If you have the above struct called city and the above vector called cities, you could declare an object of type city:

city aCity;

You have your first input file containing:

200, New york, -23.456, 23.455
201,Chicago,-34.5434,34.546

At some point you read in the value "New york" into the appropriate field of city:

file >> aCity.citystring1;

The same thing will happen later when you deal with the second input file:

300,york new, -23.456, 23.455
301,Chic,-34.5434,34.546

You'll read "york new" into aCity.citystring2.

file2 >> aCity.citystring2;

You have your function called MatchingChars, which takes two strings as arguments. Here's your function call:

aCity.charMatches = MatchingChars(aCity.citystring1, aCity.citystring2);

Does that answer your question? If not, please post your revised code, even if it does not work, along with the input files and where you are having problems.

I will do that. But in

file >> aCity.citystring1;

How should the variable 'file' be initialized?
And does calling this command read "New york" into citystring1 (since we haven't mentioned which field is to be read into citystring1)?

Sorry if I am asking too many questions. I am a bit confused. Thanks a lot.


I think I got mixed up between your thread and another thread I was commenting on. From post 1 of your thread:

#pragma warning(disable: 4786) // VC++ 6.0 disable warning about debug line too long
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;
typedef vector<string> LINE;

int main()
{
	string line;
	int pos;
	vector<LINE> array;

	ifstream in("log.csv");
	if(!in.is_open())
	{
		cout << "Failed to open file" << endl;
		return 1;
	}
	while( getline(in,line) )
	{
		LINE ln;
		while( (pos = line.find(',')) >= 0)
		{
			string field = line.substr(0,pos);
			line = line.substr(pos+1);
			ln.push_back(field);
		}
		array.push_back(ln);
	}
	return 0;
}

Your ifstream is named "in". I mistakenly remembered you calling your ifstream "file", but I think that was a different person in a different thread. "file" in my above post refers to the ifstream attached to the input data file where "New York" is stored. So in your case that appears to be called "in", not "file", so you can change the word "file" in my last post to "in". As to how to initialize it, you can initialize it as you did above in line 15.


Regarding how to get "New York" into the proper string, you may actually want to not use this line:

file >> aCity.citystring1;

Instead you can read it in as you do in line 21, then change lines 23 through 30. You are now dealing with putting your data into a struct rather than a string vector. So you need to decide whether you know the exact format of your data file, you need to decide what each bit of data represents, and you need to decide what you need/want to keep, and adjust your struct accordingly.

If you do it the way I suggested, using those names, you'd change line 23 above to:

city aCity;

In line 13, you are now going to have a vector of type city, rather than a vector of type LINE. If you name this vector cities, as I did, you would change line 13 to:

vector <city> cities;

and change line 30 to:

cities.push_back(aCity);

Lines 24 through 29 would be changed to reflect the new name. The code would be similar to what you have regarding "find" and "substr". Don't forget that you changed this a little later since you were dropping the value after the last comma.

Line 28 will have to change since you're no longer dealing with a vector of strings. That code needs to get the right data into each field of the city struct.
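
As one illustration of getting the right data into the struct, the sketch below pulls the second comma-separated field of a line into citystring1. It swaps in std::istringstream with std::getline and a ',' delimiter instead of find and substr, and it assumes the city name is always the second field. fillFromLine is a hypothetical name.

```cpp
#include <sstream>
#include <string>

struct city
{
    std::string citystring1;
    std::string citystring2;
    int charMatches;
};

// Hypothetical helper: reads the second comma-separated field of line
// into aCity.citystring1 and resets charMatches to 0.
// Returns false if the line has fewer than two fields.
bool fillFromLine(const std::string& line, city& aCity)
{
    std::istringstream fields(line);
    std::string id;    // first field, read and discarded here
    std::string name;  // second field, the city name
    if (!std::getline(fields, id, ',') || !std::getline(fields, name, ','))
        return false;
    aCity.citystring1 = name;
    aCity.charMatches = 0;
    return true;
}
```

Spaces are kept as they are in the file, so the sample line "200, New york, ..." would store " New york" with its leading space. Inside the getline loop, a successful fillFromLine(line, aCity) would be followed by cities.push_back(aCity).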
