I'm revisiting a topic previously discussed in this thread: http://www.daniweb.com/techtalkforums/thread23030.html
The basic issue is this: when you use a StreamReader and ReadLine to process a text file, you cannot determine your position within the file.
This is because StreamReader doesn't read lines directly from the FILE; it fills a BUFFER from the file and hands out lines from that buffer. You can get the underlying stream's position from the .BaseStream.Position property, but that tells you how far the buffer has been filled, not where the last line began. So if you want to know the position, in the file, of the record you just read with StreamReader.ReadLine(), there is no way to get it.
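The effect is easy to reproduce outside .NET. The Python sketch below is an analogy, not StreamReader itself: an in-memory byte stream stands in for the file and `io.BufferedReader` stands in for the reader's buffer. After a single `readline()`, the underlying stream has already been read past the end of that line, which is the same reason BaseStream.Position cannot tell you where the line you just read began. Python's buffered reader happens to expose a corrected `tell()`; StreamReader offers no public equivalent, which is the gap described above.

```python
import io

# Three LF-terminated lines held in memory (a stand-in for the real file).
data = b"line one\nline two\nline three\n"
raw = io.BytesIO(data)                           # plays the role of BaseStream
reader = io.BufferedReader(raw, buffer_size=64)  # plays the role of the reader's buffer

first = reader.readline()

# Filling the buffer consumed more than one line from the underlying stream,
# so raw.tell() is already past the start of the next line.
print("line read:          ", first)
print("underlying position:", raw.tell())

# BufferedReader.tell() subtracts the bytes still sitting unread in its buffer;
# that corrected value is the number the post is asking StreamReader for.
print("logical position:   ", reader.tell())
```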
There has to be an elegant solution to this. I haven't found it.
I'm processing extremely large (4GB+) text files. They are actually PostScript printstreams, and I have a number of operations to perform on them. In fact, I need random file I/O. As an example, the printstream might mix invoices and credit memos, and I need to extract all credit memos to a separate file. Imagine they look exactly alike, except that the string "CREDIT MEMO" appears in the middle of page 1 of a credit memo.
I know when a "document" begins. I know when one ends. I know when a document I'm currently "reading" is a credit memo. I need to be able to REPOSITION the stream back to the starting record of the document, and extract until I reach the end of the document.
Specifically, I need to note the byte-position of a particular record, so that I can BaseStream.Seek() back to it.
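One way to get those byte positions, sketched in Python under stated assumptions: make a first pass in binary mode, where each line comes back as raw bytes including its own terminator, so a running sum of line lengths is the exact byte offset of every line. Note the offset where each document starts; when a document ends and it contained "CREDIT MEMO", keep its [start, stop) range. A second step seeks back to each range and copies it out. The `begin` and `end` markers are hypothetical parameters (the post doesn't say how document boundaries are recognized), and a plain substring search for the flag is also an assumption.

```python
import io
from typing import BinaryIO, List, Tuple


def find_flagged_documents(f: BinaryIO, begin: bytes, end: bytes,
                           flag: bytes = b"CREDIT MEMO") -> List[Tuple[int, int]]:
    """Return [start, stop) byte ranges of documents whose lines contain `flag`.

    `begin` and `end` are hypothetical prefixes marking a document's first
    and last lines.
    """
    ranges = []
    offset = 0        # byte offset of the line about to be read
    start = None      # byte offset of the current document's first line
    flagged = False
    for line in f:    # binary mode: `line` still ends with b"\n" or b"\r\n"
        if line.startswith(begin):
            start, flagged = offset, False
        if start is not None and flag in line:
            flagged = True
        offset += len(line)  # exact, because the terminator bytes are counted
        if start is not None and line.startswith(end):
            if flagged:
                ranges.append((start, offset))
            start = None
    return ranges


def copy_range(src: BinaryIO, dst: BinaryIO, start: int, stop: int,
               chunk_size: int = 1 << 20) -> None:
    """Seek back to `start` and copy bytes [start, stop) to dst in chunks."""
    src.seek(start)
    remaining = stop - start
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("file ended inside the requested range")
        dst.write(chunk)
        remaining -= len(chunk)


def extract(src: BinaryIO, dst: BinaryIO, begin: bytes, end: bytes) -> int:
    """Index flagged documents in one pass, then seek back and copy each one."""
    ranges = find_flagged_documents(src, begin, end)
    for start, stop in ranges:
        copy_range(src, dst, start, stop)
    return len(ranges)
```

The scan finishes before any seeking, so one file handle is enough; seeking in the middle of the scan would disturb the line iteration. For a real file, something like `open(path, "rb", buffering=1 << 20)` keeps the reads large (whether that size helps is worth measuring). The same structure carries over to .NET by reading bytes from the FileStream instead of strings from a StreamReader.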
I'm using a StreamReader because I am indeed reading a text file, and I do need the speed that buffering supplies. However, the buffer prevents me from knowing exactly where a particular record is within the file.
One idea is to add each record's Length to a counter. The problem is the line terminator: does the file use one byte (LF) or two (CR LF)? How can you know?
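A possible way around the one-or-two-bytes question, sketched in Python: count bytes rather than characters. A binary-mode `readline()` returns each line with whatever terminator it actually has, so `len(raw)` already includes one byte for LF or two for CR LF, even if the file mixes both. That also avoids any mismatch between a string's length in characters and its length in bytes. The helper names below are illustrative, not from any library. One caveat: Python's binary `readline()` splits only on `b"\n"`, so a file using bare CR terminators would need its own splitting.

```python
from typing import BinaryIO, Iterator, Tuple


def split_terminator(raw: bytes) -> Tuple[bytes, bytes]:
    """Split one raw line from a binary readline into (text, terminator)."""
    if raw.endswith(b"\r\n"):
        return raw[:-2], b"\r\n"
    if raw.endswith(b"\n"):
        return raw[:-1], b"\n"
    return raw, b""  # last line of a file with no trailing newline


def line_offsets(f: BinaryIO) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (byte_offset, text, terminator) for each line of binary file f."""
    offset = 0
    for raw in f:
        text, term = split_terminator(raw)
        yield offset, text, term
        offset += len(raw)  # equals len(text) + len(term); no guessing needed


def sniff_terminator(f: BinaryIO) -> bytes:
    """Report the first line's terminator, leaving f's position unchanged."""
    pos = f.tell()
    first = f.readline()
    f.seek(pos)
    return split_terminator(first)[1]
```

If you only want to know which convention a file uses, `sniff_terminator` checks the first line; for computing offsets, `line_offsets` doesn't need to know at all.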
Any other ideas?