Hi,

I have done a lot of regex work in Perl and feel like I am in no man's land in C++ :)
I want to parse a tab-delimited file and print specific fields. One more thing: the input file may not be sorted, and I want the final result sorted by the first field.

Suppose file is like this:

1111<tab>2222<tab>A<tab>word1"jack"<tab>word2"chicago"
3333<tab>4444<tab>B<tab>word1"jack"<tab>word2"chicago"
5555<tab>6666<tab>C<tab>word1"jack"<tab>word2"chicago"
7777<tab>8888<tab>D<tab>word1"john"<tab>word2"london"
9999<tab>11000<tab>E<tab>word1"john"<tab>word2"london"
12000<tab>13000<tab>F<tab>word1"peter"<tab>word2"berlin"
14000<tab>15000<tab>G<tab>word1"peter"<tab>word2"berlin"
16000<tab>17000<tab>H<tab>word1"austin"<tab>word2"texus"

I don't want to use Boost. The file to parse will be 1 GB to 5 GB at most. I don't want the word1 and " parts printed in the output file. It's very easy in Perl, but I don't know how to do it in C++.

Here is a very simple program I have written and I know it needs lots of modifications to do that kind of parsing.

From the above input, here is how I want the output file to look:
1111<tab>jack<tab>chicago<tab>Start
3333<tab>jack<tab>chicago<tab>Middle
5555<tab>jack<tab>chicago<tab>End
7777<tab>john<tab>london<tab>Start
9999<tab>john<tab>london<tab>End
12000<tab>peter<tab>berlin<tab>Start
14000<tab>peter<tab>berlin<tab>End
16000<tab>austin<tab>texus<tab>Single

Here is my simple starting program:-

#include<iostream>
#include<fstream>

using namespace std;

int main() {
     char buffer[256];
     ifstream myfile("test.txt");

     // Test the read itself: checking eof() first runs the loop once more after the last read fails.
     while (myfile.getline(buffer, sizeof buffer, '\t')) {
           cout << buffer;
     }
   return 0;
}

Thanks :)

You can use a function such as substr() to extract each word that you want into a new string and then output that string. In substr(x, y), x is the starting position (counting from 0) and y is the number of characters to take. For example, say the string pie holds "I have a pie". I can extract part of pie into a smaller string str:

str = pie.substr(5, 4); // str is now "e a " (positions start at 0)

That could be a simple way to solve your problem for now.
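For fields like word1"jack", a rough sketch of that idea could combine find() with substr(). The helper name extract_quoted is made up for this example, and it assumes the value you want always sits between the first pair of double quotes:

```cpp
#include <string>

// Returns the text between the first pair of double quotes in `field`,
// or the field unchanged if it does not contain two quotes.
// extract_quoted is a hypothetical helper name, not something from the thread.
std::string extract_quoted(const std::string& field) {
    std::string::size_type open = field.find('"');
    if (open == std::string::npos) return field;
    std::string::size_type close = field.find('"', open + 1);
    if (close == std::string::npos) return field;
    // substr(start, count): start just after the opening quote.
    return field.substr(open + 1, close - open - 1);
}
```

Calling extract_quoted on word1"jack" should give jack, and a plain field such as 1111 comes back untouched.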

Sorry I don't want to use Boost :)

I have come up with the following code. Can someone help me print specific fields? For example, suppose I want to print only fields 2 and 3. How can I do that?

Here is code so far:-

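#include<iostream>
#include<fstream>
#include<string>
using namespace std;

int main() {
     string str;
     ifstream myfile("textfile.txt");

     // Loop on getline itself; testing eof() first processes a stale token at the end.
     while (getline(myfile, str, '\t')) {
           // find() returns string::size_type; storing it in an int makes the npos test unreliable.
           string::size_type pos1 = str.find("word1\"");
           string::size_type pos2 = str.find("word2\"");

           // A field carries either the word1" or the word2" prefix, so test them separately.
           if (pos1 != string::npos)
                 str.erase(pos1, 6);   // drop the 6 characters of word1"
           else if (pos2 != string::npos)
                 str.erase(pos2, 6);   // drop the 6 characters of word2"

           cout << str << endl;
     }
   return 0;
}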

You can use getline with a delimiter:

istream& getline (char* s, streamsize n, char delim );

In your case '\t' would be the delimiter.

Ok I changed getline to below:-

getline(myfile,str,'\t');

Now I wonder how I am going to fetch the fields I want, for example if I want to print only field 1 and field 4.

Should I store the data in some sort of multimap (because I want to allow duplicates) and then print the fields?

Thanks :)

Maybe you can just count the fields as you read them with getline. Put an index variable in the loop where you call getline: start at 1 and add 1 on every iteration. When the index reaches 6, set it back to 1, and print the result of getline only when the index is 1 or 4.
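A variation on the same counting idea, in case it helps: read a whole line first and count fields inside that line, so the count cannot drift when a line has more or fewer fields than expected. This is only a sketch, and split_tabs is a name I made up for it:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line on tabs and returns the fields in order.
// split_tabs is a hypothetical helper name used only for this sketch.
// Note: an empty field after a trailing tab is not returned.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t'))
        fields.push_back(field);
    return fields;
}
```

You would call it on each line from getline(myfile, line), check that the vector has at least 4 elements, and then print element 0 and element 3 (fields 1 and 4).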

OK, I am doing something like this, and I think it should be fast and good for large files too. The problem now is that it prints my specified field many times. That is, if my vector size is 5 and the specified field contains the word Duplicate, it prints this 5 times:
Duplicate
Duplicate
Duplicate
Duplicate
Duplicate

How to tackle this problem ?

Here is my code

#include<iostream>
#include<fstream>
#include<string>
#include <vector>
using namespace std;

int main() {
     string str;
     vector <string> v;
     ifstream myfile("textfile.txt");
     
     while(getline(myfile,str,'\t')) 
        v.push_back(str); 
           
           for(int i =0; i < v.size(); i++)
           cout << v[3] << endl; 
     return 0;
}

Also I am able to parse only the first line in the file and not the remaining ones down under :(
Thanks

What you are doing is storing the content of the whole file in a vector. Why?
Then you print element 3 of the vector as many times as there are elements in the vector.

Maybe try something like this:

#include<iostream>
#include<fstream>
#include<string>

using namespace std;
 
int main() {
     string str;
     ifstream myfile("textfile.txt");

     int i = 0;   // position of the current field within its record
     while(getline(myfile,str,'\t')) {
          if ((i==0) || (i==3)) cout << str << '\t';
          i++;
          if (i==6) {   // assumes six tab-terminated fields per record
               i = 0;
               cout << endl;
          }
     }

     return 0;
}

Hi, this program works, but I am not sure if it's efficient for a big file.

#include<iostream>
#include<fstream>
#include<string>
using namespace std;

int main() {
     string str[6];
     ifstream myfile("textfile.txt");
     while (myfile) {
          for (int i = 0; i < 6; i++)
               getline(myfile, str[i], '\t');
          if (!myfile) break;   // a failed read means there is no further record to print
          // Valid indices are 0..5; str[6] would be out of bounds.
          cout << str[0]<<"\t"<<str[1]<<"\t"<<str[2]<<"\t"<<str[3]<<"\t"<<str[4]<<"\t"<<str[5];
     }
     return 0;
}

s.robert, can you give me an example of how to use a counter along with getline? I think that should be faster and better.
I guess it would be something like this:

int count = 1;
   while(getline(fileobj, str, '\t')) {
   count++;
  }

I need to break each line of the file into fields and then load them into the multimap for further processing, so it is important for me to get the desired fields :)

Do reply ..
Thanks

I think your code above is efficient on large files. Just store the str that you need in a map and don't print it.
A sample of counting the values is in my last post, but I don't think that there is a significant difference in speed.
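Storing the records could look roughly like the sketch below. Record, RecordMap and add_record are names I invented for it, and it assumes everything fits in memory, which may not hold for a file of several GB. The key is a number rather than a string because as strings "9999" would sort after "12000":

```cpp
#include <map>
#include <string>

// Record, RecordMap and add_record are hypothetical names for illustration.
struct Record {
    std::string name;
    std::string city;
};

// A multimap keeps its entries ordered by key and allows duplicate keys.
typedef std::multimap<long, Record> RecordMap;

void add_record(RecordMap& records, long key,
                const std::string& name, const std::string& city) {
    Record r;
    r.name = name;
    r.city = city;
    records.insert(RecordMap::value_type(key, r));
}
```

Iterating from records.begin() to records.end() then visits the entries in ascending order of the first field, so no separate sort step is needed.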

Thanks s.robert. One more question: with your getline logic I am unable to print the last field. Is there anything I can do?

Thanks

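The problem may be that there is no tab after the last field. With '\t' as the delimiter, getline then reads past the newline and merges the last field with the first field of the next line, or reaches the end of the file before it finds a tab. Try something like this: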

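for(int i =0; i < 6; i++) {
     if (i<5) getline(myfile, str[i], '\t');
     else getline(myfile,str[i]);   // the last field ends at the newline, not at a tab
}
     // Valid indices are 0..5; str[6] would be out of bounds.
     cout << str[0]<<"\t"<<str[1]<<"\t"<<str[2]<<"\t"<<str[3]<<"\t"<<str[4]<<"\t"<<str[5]<<"\n";
     }   // closes the enclosing while loop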

Thanks. It works. Just out of interest, I ran the program on a 10 MB file. Here are the results:

1) getline counter logic: - real 0m0.622s
2) array logic :- real 0m0.521s

So the array one is faster.

Thanks for all your help :)

Ahhh! Sorry, I was doing some additional string operations in the getline counter program.

Without them, the getline counter method is a bit faster:
1) getline counter logic: -real 0m0.528s
2) array logic :- real 0m0.544s

Thanks

Hi..

I have to convert a field (one of the strings in str) to an int so that I can use it in a map.

I have already copied a field into another string with assign(), and that works:

str2.assign(str[3]);

But I am unable to convert it to an int. Can I get some help?

Also, I want to check the field types: for example, str[1] should be an int and str[3] should be a string, and an error should be reported if not.

Thanks

I tried the atoi() function but had no success:

int1 = atoi(str[3]);

The compiler tells me it cannot convert std::string to const char*.

Thanks

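A string is not the same as a char array!

Use str.c_str() to get a C-style char array (const char*) from a string, e.g. atoi(str[3].c_str()).

atoi comes from C, although it is still available in C++ through <cstdlib>.
In C++ you would normally use a stringstream to convert a string to an integer.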
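A small sketch of the stringstream route that also covers the "report an error if it is not an int" check. The name to_int is my own, and treating leading or trailing whitespace as acceptable is an assumption you may want to change:

```cpp
#include <sstream>
#include <string>

// Converts `text` to an int and stores it in `value`.
// Returns false, leaving `value` untouched, if the text is not a whole number
// (for example "12x", "jack" or an empty string).
// to_int is a hypothetical helper name used only for this sketch.
bool to_int(const std::string& text, int& value) {
    std::istringstream in(text);
    int parsed = 0;
    if (!(in >> parsed)) return false;   // no number at the start
    char extra;
    if (in >> extra) return false;       // non-whitespace characters follow the number
    value = parsed;
    return true;
}
```

With this, a call like to_int(str[1], n) returning false would be the place to print your error message.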

OK, is this function all right? I have not tested it, so I am not sure it will work.

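#include <cctype>
#include <sstream>
#include <string>
using namespace std;

int strtoint(string String) { // strips every non-digit, then converts what is left
	int a = 0;                // stays 0 if no digits remain
	stringstream stream;
	for(string::size_type b = 0; b < String.size(); ) {
		if(!isdigit(static_cast<unsigned char>(String.at(b)))) {
			String.erase(b, 1); // do not advance: the next character has moved into position b
		} else {
			++b;
		}
	}
	stream<<String<<flush;
	stream>>a;
	return a;
}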

Or this one

int StringToInt(const char *str)   // assumes str contains only the digits 0-9
{
    int Total = 0;
    while (*str)
        Total = Total * 10 + (*(str++) - '0');
    return(Total);
}

I have not tested any of these yet. Any suggestions ?
Thanks

Ok I have done it :)

The simplest way is like this :-

#include <iostream>
#include <sstream>
#include <string>

int main( )
{
      std::string a("321");
      int b;
      std::stringstream ss(a);
      ss >> b;
      std::cout << b;
      return 0;
}

Thanks all for help :)
