I am creating a program which needs the ability to make the stream's get pointer jump to a specified line in the text file it has opened, for example line 153.
The lines have a variable length and contain only numbers.
Is there a way to do this and how?

Another part of the program creates the numbers in sequence. So if this is not possible, or if it is unnecessarily hard this way, could you suggest another way to get these numbers ordered differently?

Hm, I can imagine seekg() and seekp() or getline() would get the job done: either a while loop counting how many getlines you've done, or just jump right in with seekg()/seekp().

I'll give you a pointer to a good reference manual -> Input/Output usages

I already read the page you referred to. And I have these questions about your solutions.

A while loop counting how many getlines you've done

This way I would need to load all the lines before line 153 into memory. That would use up quite a lot of my resources because the file contains a lot of lines (going towards a couple of GB of text).
I could split up the text file when I am creating it, but I am really looking for a way to jump straight to the needed line.

just jump right in with seekg / seekp

I don't understand how you would make this one work. As far as I know, you can't specify a number of lines to skip directly with these functions, only a character position, and since the number of characters per line is variable I can't use them that way.
If I'm wrong, though, I would like this approach very much, since it seems like the easiest way.

Hm, I imagined using seek to look for newline characters. In any case, how does getline use up a lot of resources? It's not like you have to keep each line you go through in memory; you just overwrite it with the next one, right? Maybe I've misunderstood what getline does.

Just to be clear, I do not know of another way to go about this.

Just to be clear, I'm a starting programmer, so I could be wrong about the things I said.
But the resource problem I was talking about is that the program has to write and erase a couple of gigabytes of memory every time it runs, and the time that takes. However, it could be that I'm overestimating how long that takes.

has to write and erase a couple of gigabytes of memory every time it runs

My solution:
Read the file one burst at a time, say 1024 characters. Add 1 to a counter each time you reach a newline character, and also keep track of your position in the file (e.g. pos += 1024 after each read). When your newline counter reaches your desired line, the line's position in the file is your pos minus the distance from the end of the 1024-byte buffer to the spot where you read that newline. You can make this code slightly shorter by reading one character at a time, but that is not recommended unless you don't care about speed. And speaking of speed, declare the buffer and the two counters before your read loop.

Look for '\n' or whatever the newline character is on your OS.
Count them; they will tell you which line you are on.
When you reach the line you want, do whatever you want with it.
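
In case a concrete example helps, here is a minimal sketch of the burst-reading approach described above. The file name, the buffer size and the 0-based target line number are made-up values, and error handling is kept to a minimum:

// Minimal sketch: find where line `target` (0-based) starts by reading the
// file in 1024-byte bursts and counting '\n' characters, then seek there.
// "numbers.txt", BUF_SIZE and target are placeholder values.
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    const char* filename = "numbers.txt";
    const std::size_t BUF_SIZE = 1024;
    const long target = 153;                 // 0-based line we want

    std::ifstream in(filename, std::ios::binary);
    if (!in) { std::cerr << "cannot open file\n"; return 1; }

    char buf[BUF_SIZE];
    long newlines = 0;                       // '\n' characters seen so far
    std::streamoff pos = 0;                  // offset of the start of the current burst
    std::streamoff lineStart = 0;            // offset where line `target` begins
    bool found = (target == 0);              // line 0 starts at offset 0

    while (!found && in) {
        in.read(buf, BUF_SIZE);
        std::streamsize got = in.gcount();   // may be < BUF_SIZE on the last burst
        for (std::streamsize i = 0; i < got && !found; ++i) {
            if (buf[i] == '\n' && ++newlines == target) {
                lineStart = pos + i + 1;     // first character after that newline
                found = true;
            }
        }
        pos += got;
    }

    if (found) {
        in.clear();                          // clear eof/fail bits before seeking
        in.seekg(lineStart);
        std::string wanted;
        std::getline(in, wanted);
        std::cout << "line " << target << ": " << wanted << '\n';
    }
}

Binary mode keeps the counted offsets in sync with what seekg() expects; on Windows the line read back may still end in a '\r' that you have to trim.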

Use seekg(), tellg(), get(). You may also want to open the file in binary mode if you know the structure of the file.

I think using a buffer there will make it much faster since you are reading sequentially. If it were random access it probably wouldn't matter.

I used the getline way of doing this.
My files have a lot of lines (400 MB with 3 characters on each row).
I tested it just now and it worked, but it is slower than I would like.
I'm quite new to C++ programming, so could any of you explain how you search for a '\n' character (I'm using Windows), and how to use a buffer to do this?

I actually posted a code snippet the other day that implements a fast line search example. You still get the slowness, but only once when indexing the file. Subsequent searches are much faster.

Does that indexing need the files to be kept in memory?
Because all files together are 80 GB and that just won't work on my PC.
I just want a way to quickly jump to a line in the file.
So does searching for a newline character in a buffer make my program faster than with a getline loop?
If so, how do I do it?

Does that indexing need the files to be kept in memory?

No, at most you'll have one line and one file position object in memory at any given time for a single query of any line in the file. The file position object is small and constant, so the only issue is if your line is farking huge. ;)

I just want a way to quickly jump to a line in the file.

Then my snippet is exactly what you want.

So does searching for a newline character in a buffer make my program faster than with a getline loop?

Not for any reasonably large file. The problem is in the process, not the details. Let's assume you have an array {1, 2, 3, 4, 5}, that it takes ten seconds to move from one array index to the next, and that you can only get to an element from the previous element (i.e. sequential access). Searching for 1 would take ten seconds and searching for 5 would take fifty seconds. It really doesn't matter how you implement that algorithm; it'll retain the same properties and still be slow.

If your buffer is very large, you can minimize the slowness by limiting file accesses, but then you're using a large amount of memory.

If so, how do I do it?

Read my code snippet. What I did was loop through the file one time and write a second index file that stores the position of each line:

Index  Source
0      "This\n"
5      "is\n"
8      "a\n"
10     "test\n"

The index file now has position objects of equal size and you can seek directly to them using an offset, just like an array:

index[0 * sizeof pos_object] = 0
index[1 * sizeof pos_object] = 5
index[2 * sizeof pos_object] = 8
index[3 * sizeof pos_object] = 10

This process results in one seek and one read on the index file, which is a constant time operation. With the position object, you can then seek into the source file and read the line. This results in one seek and one read on the source file. So regardless of which line you're trying to read, the time it takes is relatively constant: two seeks, two reads.

Compare that to sequential access where the number of reads is proportional to the index of the line you're looking for. The first line only requires one read, but the 10,000th line requires 10,000 reads.
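
The snippet itself isn't reproduced in this thread, but a rough sketch of the same two-pass idea might look like this. The file names, the helper names buildIndex/readLine and the choice of a fixed 64-bit offset record are mine, and there is no bounds checking:

// buildIndex() scans the source once and appends each line's starting offset
// to an index file as a fixed-size 64-bit record; readLine() then needs one
// seek and one read on the index plus one seek and one read on the source.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

void buildIndex(const std::string& src, const std::string& idx)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(idx, std::ios::binary);

    std::string line;
    while (in) {
        const std::int64_t offset = in.tellg();   // where the next line starts
        if (!std::getline(in, line))
            break;                                // no more lines
        out.write(reinterpret_cast<const char*>(&offset), sizeof offset);
    }
}

std::string readLine(const std::string& src, const std::string& idx, std::int64_t n)
{
    std::ifstream index(idx, std::ios::binary);
    index.seekg(n * static_cast<std::streamoff>(sizeof(std::int64_t))); // one seek on the index
    std::int64_t offset = 0;
    index.read(reinterpret_cast<char*>(&offset), sizeof offset);        // one read on the index

    std::ifstream in(src, std::ios::binary);
    in.seekg(offset);                                                   // one seek on the source
    std::string line;
    std::getline(in, line);                                             // one read on the source
    return line;
}

int main()
{
    buildIndex("numbers.txt", "numbers.idx");   // the slow part, done only once
    std::cout << readLine("numbers.txt", "numbers.idx", 153) << '\n';
}

Once the index file exists, you can of course skip buildIndex() on later runs and go straight to readLine().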

There is no way to determine where in a file each line begins without reading the entire file (unless you're creating the file, in which case you can remember the line locations as you write them--but that's the same kind of operation as reading it).

So I don't see any way around reading at least as much of the file as needed to determine the location of the highest-numbered line you've ever requested. The following strategy seems to make sense:

1) Plan to create a vector with one element for each line. Each vector element is a seek pointer that corresponds to the location of the beginning of the first character of the correspondingly numbered line. As a micro-optimization, you might omit the first line of the file, because that is known to start at the beginning of the file; but I wouldn't recommend it.

2) Suppose you now want to locate line n in the file. If the vector has at least n elements, all you need to do is seek to the point that corresponds to the beginning of that line.

3) Therefore, the only hard part of the problem is what happens if the vector has fewer than n elements. At that point, you need to seek to the position corresponding to the last element of the vector, and keep reading lines until either the vector is big enough or you reach end of file.

This strategy requires one seek pointer per line in the file. If you don't have the memory for all of those pointers, you can reduce memory usage by storing a pointer to the beginning of only every n-th line. Then you have to read forward from the last stored position before the line you want until you reach the line itself.
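
For what it's worth, a minimal sketch of that strategy could look like the following. The class name LineIndex and its members are made up; lines are numbered from 0 and a request past the end of the file comes back as an empty string:

// Lazily grown in-memory index of line start positions, extended on demand.
#include <fstream>
#include <string>
#include <vector>

class LineIndex {
public:
    explicit LineIndex(const std::string& path)
        : in(path, std::ios::binary)
    {
        starts.push_back(0);                 // line 0 starts at the beginning of the file
    }

    std::string line(std::size_t n)
    {
        if (starts.size() <= n) {            // index too short: extend it on demand
            in.clear();
            in.seekg(starts.back());         // resume at the last known line start
            std::string skipped;
            while (starts.size() <= n && std::getline(in, skipped)) {
                std::streampos next = in.tellg();
                if (next == std::streampos(-1))
                    break;                   // hit end of file; no further lines
                starts.push_back(next);
            }
            if (starts.size() <= n)
                return std::string();        // the file has fewer than n + 1 lines
        }
        in.clear();
        in.seekg(starts[n]);                 // one seek ...
        std::string s;
        std::getline(in, s);                 // ... and one read for an already indexed line
        return s;
    }

private:
    std::ifstream in;
    std::vector<std::streampos> starts;      // starts[i] = offset where line i begins
};

Usage would then be something like LineIndex idx("numbers.txt"); std::cout << idx.line(153); and once a line has been indexed, fetching it again (or any earlier line) costs just one seek and one getline.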
