Hi, it's me again,

How would I go about reading the same file twice? What I need to do is read it the first time and count how many lines there are between two words (somewhere in the middle of the file; see the example below), create an array of that size, and then go back and read those lines again and store them in the array I just created. Or is there a way of extending an array on the fly?

Sample Text Document

Line
Line
Line
Line
Keyword
Readthis
Readthis
readthiS
Keyword
Line
Line

So there should be a count of 3, so an array of size 3 should be made, and then the three 'readthis' lines should be added to the array.

I currently have it searching for the keywords the first time and counting the lines, then resetting back to the beginning (using myfile.seekg(0, ios::beg); ) and re-reading the file with pretty much the same code, but this time adding each line to the array I just created. It works, but it's doubled the amount of code I have and looks messy.
Any ideas??
Thanks
Mike.

If I understand your explanation well, do the following:

Read a line of your file.
Is line in array?
no: put it there. Increment a counter.
yes or else : read next line.
until EOF

No, what I want it to do is put it INTO an array.

Basically it's got to read in each line, but only the lines between two keywords/lines, and then add those lines to an array.

The lines being read in are just strings.

Here is a snippet of the code I have now

int count = 0;
string line;
if(myfile.is_open() && filesize!=0){

		cout << "File opened. Reading now." << endl;

			
		//while it is not at the end of the file
		while(!myfile.eof())
		{
			//read the contents into the string line
			getline(myfile,line);

			if(line=="<genes>")
				geneBool = true;
			if(line=="</genes>")
				geneBool = false;

			if(geneBool)
				count++;
		}
		//create the array (one string per counted line)
		string *geneArr = new string[count];
		//clear the EOF flag, then put back to the beginning
		myfile.clear();
		myfile.seekg(0, ios::beg);

		//start again, but now read it in
		int index = 0;		//separate index, count holds the array size
		geneBool = false;
		while(!myfile.eof())
		{
			//read the contents into the string line
			getline(myfile,line);

			if(line=="<genes>")
				geneBool = true;
			if(line=="</genes>")
				geneBool = false;

			if(geneBool && index < count){
				geneArr[index] = line;
				index++;
			}
		}
}

Is there a better way than this? I have lots more code inside the while(!myfile.eof()) loops, and I have to do this twice (once for Genes and once for Cells :S)

Thanks

Firstly you should use

while(getline(myfile, line)){}

rather than

while(!myfile.eof()){}

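This is because eof() only becomes true after a read has already run into the end of the file, so the loop body runs one extra time on a failed read. It also never becomes true if the stream fails for some other reason.

Secondly, look into the vector header. It's wonderful, and it's designed with this sort of thing in mind!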

Chris

Use a vector and do all this in one pass.
Vector containers are implemented as dynamic arrays.
Also: if line 14 is true you set geneBool to true, perfect!
Set geneBool to false in the first place so you can omit the if on line 16.

Will do!! I'll also look into the EOF thingy too!

Also: if line 14 is true you set geneBool to true, perfect!
Set geneBool to false in the first place so you can omit the if on line 16.

I do have the bool set to false to start with, but I need the line 16 bit too, because there are lines after the <genes> section that I do not want (hence the </genes> to stop it), and I also use the same process for other keywords (<cells> and </cells>).

But yeah, I'll look into this 'Vector Header' thing.

Thanks for all your help guys!!

Here's a small example using vectors. It's nothing special, just something I made to show you that the size does not need to be known in advance.

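#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>

int main(){
    std::vector<std::string> v;
    std::string text = "abcdefghijklmnopqrstuvwxyz";
    std::string temp = "";
    
    // seed the random number generator
    std::srand((unsigned)std::time(NULL));
    int x = std::rand() % 100;   // how many strings to create (0 to 99)
    int z = 0;
    
    for(int i = 0; i < x; i++){
            z = std::rand() % 20 + 1;   // length of this string (1 to 20)
            for(int q = 0; q < z; q++){
                    temp.push_back(text[std::rand()%26]);
            }
            v.push_back(temp);   // the vector grows as needed
            temp.clear();
    }
    
    // print every element; size() reports how many were stored
    for(std::size_t i = 0; i < v.size(); i++)
            std::cout << "v[" << i << "]: " << v[i] << std::endl;
   
    return 0;
}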

Chris

I like the vector solution to only read through the file once.

I'm presuming you would have 2 vectors, one for genes and one for cells.

Is it possible that genes or cells might occur more than once in the file?

If they do occur more than once, did you want to put all of the data from all of the occurrences in the same vector?

As the file looks to be an XML format, can you rely on the tokens being the only thing on the line?

i.e. will it always be:

<genes>
data
data
data
</genes>

or might you see

<genes>data</genes>

?

Below is an example file:

<gene-exp>
	<genes>
		BETA6
		CDC47
		CAP15
		CAP17
	</genes>
	<cells>
		kidney
		liver
		lung
	</cells>
	<matrix>
		1.4	0	0.5
		2.3	1.7	0.1
		0.8	0	0
		0.7	1.0	0.2
	</matrix>
</gene-exp>

I presume that if there is just one gene/cell value, it would still be on a separate line. So what I'm trying to do is put everything between the two <genes> tags into a gene array, and everything between the two <cells> tags into a cell array.

Do any of you have a specific link that you would recommend for me to look at for Vectors? I've found a few, but was wondering what you see to be most helpful.

Thanks for all of your help!!!! :D

How about... don't read the file twice! It's practices like this that cause excessive wear on hard drives. Why not load everything in the file into memory (an array/vector)? You will find that RAM has much better read/write times than your hard drive.

I thought that the vector code posted by Freaky_Chris was a great example of the parts of vector that you would be most interested in.

// declaring a vector
std::vector<std::string> genes;
// adding a string (genestring) to the vector
genes.push_back(genestring);
// accessing the second string 
//     (assumes there were at least 2 strings added)
std::cout << genes[1];

I have it sorted now, using vectors :)

Thanks for everyone's help :D
