
Tip: How to avoid string parsing, File I/O

A lot of people have questions about how to load a text file using file I/O. There seem to be two popular methods of loading a text file:

Use fstream's >> extraction operator. Simple enough: load the file directly into individual containers, a word at a time. But you lose your delimiting white space (assuming you need it), so some file integrity is lost unless you make the effort to reconstruct the file by re-inserting all the white space.

So instead, you decide to use getline(). You preserve all white space and load the file line by line... but now you have to parse each line of data into individual substrings, either by using string member functions, performing string[] array operations, or using strtok().

One alternative I would like to suggest: why not do both? It is possible to read a file in its entirety and also read the text a word at a time into individual containers, without having to do any string parsing:

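#include <fstream>
#include <limits>
#include <string>
using namespace std;

int main()
{
     const int MAX_LINES = 100;        //assumed limit, adjust for your file
     const int WORDS_PER_LINE = 4;     //assumes every line holds at least 4 words
     string lines[MAX_LINES];
     string words[MAX_LINES * WORDS_PER_LINE];
     int i = 0, word_count = 0;

     //Be sure to open your file in binary mode, so tellg()/seekg() positions are reliable
     //(with Windows line endings, each line then keeps a trailing '\r')
     ifstream infile("C:\\Users\\Dave\\Documents\\test.txt", ifstream::binary);
     if(!infile) return 1;

     //Position of the start of the first line
     streampos begin = infile.tellg();

     //Here you can load the file in its entirety
     while(i < MAX_LINES && getline(infile, lines[i]))
     {
          //Go back to the start of the line (clear() in case getline() hit end of file)
          infile.clear();
          infile.seekg(begin);

          //Now you can load the same data into individual containers
          for(int j=0; j<WORDS_PER_LINE && infile >> words[word_count]; j++)
          {
               word_count++;
          }

          //Discard any extra characters left behind
          infile.clear();
          infile.ignore(numeric_limits<streamsize>::max(), '\n');

          //Save current position, so you can go back to the beginning of next line
          begin = infile.tellg();

          i++;
     }

     return 0;
}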


So now you have the entire document preserved in lines[], and you have individual line contents stored in words[]. Depending on your application needs, this method might be a viable option in that you get the best of both worlds without having to do any string parsing.

Clinton Portis
Practically a Posting Shark
833 posts since Oct 2005
Reputation Points: 237
Solved Threads: 118
 

Don't forget that there are other methods too. For example, you can
extract the file character by character.

Here is an example of that:

#include<iostream>
#include<string>
#include<fstream>

using namespace std;


int main()
{
	ifstream iFile("test.txt");
	if(!iFile) return -1;

	char ch = 0;

	string content = "";

	while(iFile.get(ch) ){
		content += ch;		
	}

	cout << content << endl;

	return 0;
}


And if you want only the words in the file, a simple if statement that checks
whether the char is white space, plus a vector of strings to hold the words, will do.

firstPerson
Senior Poster
3,923 posts since Dec 2008
Reputation Points: 841
Solved Threads: 608
 

You could have just used stringstream to split each line into words without re-reading the file. It would also have been more useful to use a vector to hold the lines/words.

[edit]I don't see any reason at all for reading the file one character at a time. Too much work just to extract individual words.
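
Something along these lines is what the stringstream suggestion could look like. This is only a sketch: the FileText struct and load_lines_and_words() are names invented for the example, not anything from the original tip.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two views of the file
struct FileText {
    std::vector<std::string> lines;  // every line, white space intact
    std::vector<std::string> words;  // every word, in file order
};

// Reads each line once with getline(), then splits that same line
// in memory with an istringstream, so the file is never re-read.
FileText load_lines_and_words(std::istream& in)
{
    FileText result;
    std::string line;

    while (std::getline(in, line)) {
        result.lines.push_back(line);

        std::istringstream fields(line);
        std::string word;
        while (fields >> word)
            result.words.push_back(word);
    }

    return result;
}
```

Because the split happens on the in-memory line, there is no need for seekg()/tellg() or binary mode, and lines with any number of words (including empty lines) are handled.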

Ancient Dragon
Retired & Loving It
Team Colleague
30,049 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,343
 
Quote: "I don't see any reason at all for reading the file one character at a time. Too much work just to extract individual words."

Yeah, I know. Just throwing the possibility out there. Reading
character by character might be helpful in some cases though, such as
a character frequency counter.

firstPerson
Senior Poster
3,923 posts since Dec 2008
Reputation Points: 841
Solved Threads: 608
 
