OK, the eventual goal of my project is a kind of word insert/delete/replace. You read a file... "The quick brown fox jumps over the lazy dog." and you can delete all instances of "fox", replace "fox" with "cat", insert "giraffe" wherever you want, etc. Word processing stuff.

So I figured I'd have a big file and start reading it into a stream, parse that stream, then do whatever (replace, add, delete). Except I can't seem to get that far since it breaks with my simple little "insert" experiment program. The program below starts with a stream...

abcdefghijkmnopqr

The goal is to end up with this stream...

abcdefgzzzhijkmnopqr

What I get is this...

zzzdefghijkmnopqr

So I have at least two problems.

  1. "zzz" is being written at position 0, not position 7. Note the 7 in the seekg() call in Manipulate().
  2. "zzz" overwrites the characters in the stream. It doesn't insert.

Code is below. I'd like to know

  1. What I'm doing wrong.
  2. Is this (manipulating iostreams) a decent approach to the whole problem?

#include <iostream>
#include <sstream>
#include <cassert>
using namespace std;


void PrintStream(iostream& stream)
{
    int startPos = stream.tellg();
    assert(stream.good());
    cout << "Start position = " << startPos << endl;

    char c;

    while(stream.good())
    {
        c = stream.get();
        if(stream.good())
        {
            cout << c;
        }
    }

    cout << endl;

    stream.clear();
    stream.seekg(startPos);
    assert(stream.good());
}

void Manipulate(iostream& stream)
{
    stream.seekg(7, ios_base::beg);
    assert(stream.good());
    stream.write("zzz", 3);
    assert(stream.good());
}


int main()
{
    string aString = "abcdefghijkmnopqr";
    stringstream stream(aString);

    stream.seekg(0, ios_base::beg);
    assert(stream.good());
    PrintStream(stream);

    Manipulate(stream);

    stream.seekg(0, ios_base::beg);
    assert(stream.good());
    PrintStream(stream);

    cin.get();
    return 0;
}

Note. All "assert" statements succeed. Actual program output is the following.

Start position = 0
abcdefghijkmnopqr
Start position = 0
zzzdefghijkmnopqr

>stream.seekg(7, ios_base::beg);
seekg() is for adjusting the read pointer. You want seekp() to adjust the write pointer.
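
A minimal sketch of the difference, assuming a std::stringstream (which keeps separate read and write positions). OverwriteAt is a hypothetical helper name used only for illustration; it is not part of the original program.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: writes `word` starting at `pos` and returns the
// resulting contents. Nothing is inserted; existing characters are replaced.
std::string OverwriteAt(const std::string& text, std::streamoff pos,
                        const std::string& word)
{
    std::stringstream stream(text);
    stream.seekp(pos, std::ios_base::beg);   // move the put (write) position
    assert(stream.good());
    stream.write(word.c_str(), static_cast<std::streamsize>(word.size()));
    return stream.str();
}
```

With this, OverwriteAt("abcdefghijkmnopqr", 7, "zzz") gives "abcdefgzzzkmnopqr": the "zzz" lands at position 7 as intended, but it still overwrites "hij" rather than inserting.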

The function seekg() moves the get pointer from position to position.
seekp(), on the other hand, moves the put pointer.
I assume the put pointer is used for writing into a stream, whereas the get pointer is only for reading through it.

Seekp

This would explain why the data is being written over the first 3 characters: the put pointer was never moved, so it was still at position 0.

AFAIK it is only possible to overwrite the data in a stream with the write() function. I would really love to know how we can insert into it, though, so I can't help with that part.

Thanks on the seekg versus seekp. It now replaces words of the same length just fine (or at least fine as far as I've tested it. It probably requires some more testing to make sure I'm not missing anything). The code below replaces "cat" with "bat", then "bat" with "dog". So far this has been the easy case. It's replacement rather than deletion or insertion and the replacement word has the same number of letters.

Does anyone have any idea how to do the others (replacement with a different number of letters, insertion, deletion)? Am I OK to approach this by manipulating iostreams?

#include <iostream>
#include <sstream>
#include <cassert>
using namespace std;


void PrintStream(iostream& stream)
{
    int startPos = stream.tellg();
    assert(stream.good());
    cout << "Start position = " << startPos << endl;

    char c;

    while(stream.good())
    {
        c = stream.get();
        if(stream.good())
        {
            cout << c;
        }
    }

    cout << endl;

    stream.clear();
    stream.seekg(startPos);
    assert(stream.good());
}

void ReplaceWithSameLength(iostream& stream, const string str1, const string str2)
{
    int len1 = str1.length();
    int len2 = str2.length();
    assert(len1 == len2);
    assert(len1 > 0);

    stream.seekg(0, ios_base::beg);
    char c;
    int loc;

    while(stream.good())
    {
        loc = stream.tellg();
        c = stream.get();
        if(!stream.good())
        {
            continue;
        }
        if(str1[0] == c)
        {
            bool replace = true;
            for(int i = 1; i < len1; i++)
            {
                c = stream.get();
                if(!stream.good())
                {
                    replace = false;
                    continue;
                }
                if(str1[i] != c)
                {
                    replace = false;
                }
            }

            if(replace)
            {
                stream.seekp(loc);
                stream.write(str2.c_str(), len2);
            }

            stream.seekg(loc + 1);
        }
    }

    stream.clear();
}

How about this approach?
As your application has to go through the entire stream anyway, you could implement a buffer in which you modify the data, and then write the whole file back into the stream.

Consider this example:
Original stream: abcdefghijklmnopq
Word to be inserted: HELLO
Position to insert at: 9, right after 'i' (array indexes start from 0, as in a C/C++ array)
Size of the buffer: 8 characters

1) Read the stream into the buffer.
[abcdefgh]
2) As nothing has to be done here, it can be written back to the stream as is.
3) The next read will give
[ijklmnop] with the insertion point after i.
4) So the buffer will now hold
[iHELLOjk] with [lmnop] kept in another variable (whatever remains after the insertion).
5) The next buffer would be [lmnopq], the carried-over characters followed by the rest of the input, and the process continues until the rest of the file/stream has been fully overwritten.

As files/streams are contiguous sets of bytes, I find it very tough to insert data into them. Overwriting the complete stream is probably the only way to insert, AFAIK.
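
The carry-over idea from the steps above could be sketched roughly like this. It assumes a std::stringstream, which keeps separate read and write positions; a std::fstream shares one position and would need a seek between each read and write. InsertInPlace and its parameter names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: inserts `word` before the character at index `pos`
// by overwriting the rest of the stream one chunk at a time.
void InsertInPlace(std::stringstream& stream, std::size_t pos,
                   const std::string& word, std::size_t bufSize = 8)
{
    assert(bufSize > 0);
    stream.clear();
    stream.seekg(static_cast<std::streamoff>(pos), std::ios_base::beg);
    stream.seekp(static_cast<std::streamoff>(pos), std::ios_base::beg);
    assert(stream.good());

    std::string carry = word;            // text still waiting to be written
    std::string chunk(bufSize, '\0');    // fixed-size read buffer

    for (;;)
    {
        stream.read(&chunk[0], static_cast<std::streamsize>(bufSize));
        const std::size_t got = static_cast<std::size_t>(stream.gcount());
        carry.append(chunk, 0, got);

        if (!stream)                     // short read: end of stream reached
        {
            stream.clear();
            break;
        }

        // Overwrite the chunk just read with the front of the carry-over,
        // so the write position never passes the read position.
        stream.write(carry.data(), static_cast<std::streamsize>(got));
        carry.erase(0, got);
    }

    // Whatever remains extends the stream past its old end.
    stream.write(carry.data(), static_cast<std::streamsize>(carry.size()));
}
```

Calling InsertInPlace on a stream holding "abcdefghijklmnopq" with position 9 and "HELLO" leaves "abcdefghiHELLOjklmnopq" in the stream. Only one chunk plus the carry-over is held in memory at a time.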

EDIT:

It is almost like writing the data to two files, input and output. However, if the file is huge, the two-file approach would take loads of memory (twice over: the source file and the destination file) and almost the same amount of time as read, modify and write. The buffered approach would reduce the memory needed, with a slight increase in execution time.

Edited 5 Years Ago by Sky Diploma: Adding extra comments

Am I OK to approach this by manipulating iostreams?

I wouldn't go down that particular path unless it's an exercise in stream processing. Inserting and deleting with an inherently sequential source is tedious at best. One thing you can do with streams is work with a source and a sink (input and output stream), and keep an intermediate buffer for matching the search string.

If there's a pending match, keep adding to the buffer. If the buffer eventually matches, write the replacement string, otherwise write the buffer and then the character that failed to match. Something like so:

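#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    const string target("ghi");
    const string replacement("zzzzzz");

    istringstream in("abcdefghijklmnoqrstuvwxyz");
    string buffer; // pending partial match; always a prefix of target
    char ch;

    while (in.get(ch)) {
        buffer.push_back(ch);

        // Failed match: write out buffered characters until what is left
        // could still be the start of a match
        while (!buffer.empty() && target.compare(0, buffer.size(), buffer) != 0) {
            cout.put(buffer[0]);
            buffer.erase(0, 1);
        }

        // Complete match: write the replacement instead of the buffer
        if (buffer == target) {
            cout << replacement;
            buffer.clear();
        }
    }

    // Flush a partial match left over at the end of the input
    cout << buffer << '\n';
}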

As files/streams are contiguous sets of bytes.

This I did not know. So if I have a stream of 100,000,000 bytes, they're all contiguous in memory? I figured there was some jumping around in the internal memory. Streams have always been a big black box for me as far as what goes on inside. So adding "a" at the beginning would mean that 100,000,000 bytes have to be moved, just like in a 100,000,000 element array? If that's the case, it seems like I'm taking a pretty inefficient approach.

Narue's idea seems interesting. I think I'd have the output stream as something other than cout, but the point is that at the end, it would have the "correct" data in it. It'd be pretty easy to change for deletions too. Just substitute "" for the word.
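
For what it's worth, here is a rough sketch of Narue's source-and-sink idea with the output going to any ostream instead of cout. ReplaceAll is a hypothetical name, and the matching loop is just one way to do the buffering. Passing "" as the replacement deletes the word.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out`, replacing every occurrence of
// `target` with `replacement`. An empty replacement deletes the target.
void ReplaceAll(std::istream& in, std::ostream& out,
                const std::string& target, const std::string& replacement)
{
    assert(!target.empty());
    std::string buffer;   // pending partial match; always a prefix of target
    char ch;

    while (in.get(ch))
    {
        buffer.push_back(ch);

        // On a mismatch, release leading characters until the rest of the
        // buffer could still be the start of a match.
        while (!buffer.empty() &&
               target.compare(0, buffer.size(), buffer) != 0)
        {
            out.put(buffer[0]);
            buffer.erase(0, 1);
        }

        if (buffer == target)
        {
            out << replacement;
            buffer.clear();
        }
    }

    out << buffer;   // flush a partial match left at the end of the input
}
```

Used with an istringstream holding "The quick brown fox jumps over the lazy dog." and an ostringstream as the sink, replacing "fox" with "cat" leaves "The quick brown cat jumps over the lazy dog." in the sink.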

I'm going to take several steps back with this and maybe start over. I was assuming I could manipulate streams as easily as I could manipulate strings. That looks to not be the case.

So if I have a stream of 100,000,000 bytes, they're all contiguous in memory?

No. A more likely scenario is that chunks of those characters share the same parts of memory temporarily as they're consumed and the stream buffer is refilled. As far as you're concerned, a stream is a queue of characters with very limited out-of-queue operations like seeking.

Streams have always been a big black box for me as far as what goes on inside.

Yeah, that's the idea. And trust me when I say you really don't want to know. iostream internals is one hell of a can of worms. ;)

Edited 5 Years Ago by Narue: n/a

Comments
Good advice in this thread, as usual

Does anyone have any idea how to do the others (replacement with a different number of letters, insertion, deletion)?

How about this one...
For insertion...
1) First read the line from the file into a string. Insert the string "zzz" into that string.
2) Rewrite the line (the line in which you wanted to insert the text "zzz") with the modified string (the one that holds the inserted text along with the rest of the line).

For deletion...
1) Get the line in a string and delete the text from the string. Replace the line with this string.

Both make all the modifications in the string and then rewrite it to the file, rather than making the changes in the file itself.
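
A small sketch of that line-by-line idea, assuming the edited lines go to a separate output stream. EditLine and its parameters are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out` line by line. On line `lineNo`
// (0-based) it erases `eraseLen` characters at column `col`, then inserts
// `word` there. Every output line ends with '\n', even if the last input
// line did not.
void EditLine(std::istream& in, std::ostream& out, std::size_t lineNo,
              std::size_t col, std::size_t eraseLen, const std::string& word)
{
    std::string line;
    for (std::size_t n = 0; std::getline(in, line); ++n)
    {
        if (n == lineNo)
        {
            assert(col <= line.size());
            line.erase(col, eraseLen);   // deletion (0 deletes nothing)
            line.insert(col, word);      // insertion (empty inserts nothing)
        }
        out << line << '\n';
    }
}
```

Inserting "zzz" at column 7 of the line "abcdefghijkmnopqr" with eraseLen 0 produces "abcdefgzzzhijkmnopqr". A deletion is the same call with an empty word and a non-zero eraseLen.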

Correct me if I am wrong.
Hope this helps.

I think my post is similar to what sky diploma said.

Edited 5 Years Ago by Arbus: n/a

This I did not know. So if I have a stream of 100,000,000 bytes, they're all contiguous in memory? I figured there was some jumping around in the internal memory. Streams have always been a big black box for me as far as what goes on inside. So adding "a" at the beginning would mean that 100,000,000 bytes have to be moved, just like in a 100,000,000 element array? If that's the case, it seems like I'm taking a pretty inefficient approach.

Narue's idea seems interesting. I think I'd have the output stream as something other than cout, but the point is that at the end, it would have the "correct" data in it. It'd be pretty easy to change for deletions too. Just substitute "" for the word.

I'm going to take several steps back with this and maybe start over. I was assuming I could manipulate streams as easily as I could manipulate strings. That looks to not be the case.

@Vernon: Ah, here is where I have vouched for something that I heard from my C++ instructor, so I can't assure you of that. I might be wrong. But as far as editing the stream is concerned, I find no other option than rewriting the whole stream.

@Narue:
Your method is very clean :) and requires no finesse. The only hiccup is that
if the source is a very large file/stream, the sink would take up as much space as the source, right?
It's like creating an output file out of the input file, then deleting the input file and renaming the output file to the input file's name so that it looks as if the file was modified.
Is there any way to avoid that?
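
The write-then-rename pattern described here could be sketched as follows. RewriteFile is a hypothetical helper, the ".tmp" suffix is an arbitrary choice, and the replacement is done per line with std::string to keep the example short.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: rewrites the file at `path`, replacing `target` with
// `replacement` on every line, by way of a temporary output file.
bool RewriteFile(const std::string& path, const std::string& target,
                 const std::string& replacement)
{
    assert(!target.empty());
    const std::string tmpPath = path + ".tmp";   // assumed scratch name
    {
        std::ifstream in(path.c_str());
        if (!in)
            return false;
        std::ofstream out(tmpPath.c_str());
        if (!out)
            return false;

        std::string line;
        while (std::getline(in, line))
        {
            std::string::size_type at = 0;
            while ((at = line.find(target, at)) != std::string::npos)
            {
                line.replace(at, target.size(), replacement);
                at += replacement.size();   // skip the text just written
            }
            out << line << '\n';
        }
        if (!out.flush())
            return false;
    }   // both files are closed here, before the swap

    // Delete the original, then give the temporary file its name.
    if (std::remove(path.c_str()) != 0)
        return false;
    return std::rename(tmpPath.c_str(), path.c_str()) == 0;
}
```

While it runs, the source and the temporary file both exist on disk, which is exactly the space cost being asked about. Memory use stays at one line at a time.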

Comments
Thanks for the advice in this thread.

I imagine it's cheating, but I've been taking the coward's way out and reading the whole file into one gigantic string and manipulating that string. One of the things I'm doing is parsing a C++ program for comments so I have to find the starting /* and the ending */ and replace everything inside with spaces. Anyway, SO FAR all the files I've been playing with have been short enough so that works. String manipulation seems vastly less complicated than stream manipulation. But again, this all feels like cheating to me. The method ought to work for a 1000 page book.
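
A sketch of the comment-blanking step on a whole-file string might look like this. BlankBlockComments is a hypothetical name. As assumptions of this sketch, the /* and */ delimiters are blanked along with the contents, newlines inside a comment are kept, and string literals and // comments are not recognized.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every /* ... */ comment, delimiters
// included, with spaces. Newlines inside a comment are left alone so the
// line structure of the text is unchanged.
std::string BlankBlockComments(std::string text)
{
    std::string::size_type start = 0;
    while ((start = text.find("/*", start)) != std::string::npos)
    {
        std::string::size_type end = text.find("*/", start + 2);
        // An unterminated comment runs to the end of the text.
        end = (end == std::string::npos) ? text.size() : end + 2;

        for (std::string::size_type i = start; i < end; ++i)
        {
            if (text[i] != '\n')
                text[i] = ' ';
        }
        start = end;
    }
    return text;
}
```

Because the text keeps its length, any positions computed before the blanking are still valid afterwards.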

I'm going to leave this unsolved for now in case anyone else wants to weigh in. I appreciate everyone's advice. I'm still digesting some of it.

It's like creating an output file out of the input file, then deleting the input file and renaming the output file to the input file's name so that it looks as if the file was modified.
Is there any way to avoid that?

You could implement some kind of move setup for the source so that it's destructive rather than a copy, but there's no standard support for it, and you'd suffer the associated performance hits.

But the real question would be: is this really expected to be an issue? You'll only see it when both memory and storage are severely constrained, and there's no option to stream with alternative sources/destinations, and you have an unreasonably large file for the system that needs to be modified with insertions, deletions, and replacements at random positions. I don't see it as likely unless you're a developer who habitually makes poor decisions. ;)

@Narue:Agreed

Another question here: what if you actually insert elements into a file? Its size is expected to increase and (hypothetically) would overwrite some other programs/data that is stored after the file's end limit. Is that possible? Or would it go to clean/free memory?

Its size is expected to increase and (hypothetically) would overwrite some other programs/data that is stored after the file's end limit.

When you open a file and append to it for example, nothing gets overwritten, right? It's the same idea. If the file system allowed insertion, the effect would be the same.

Based on the code that Vernon wrote,
I managed to write an insert method :). It replaces a string in the stream with one of greater length.

Now I am thinking about how we could write one for a replacement whose length is smaller than the original's.

Check it out.

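void ReplaceWithDifferentLength(iostream& stream, const string& str1, const string& str2)
{
    const string::size_type len1 = str1.length();
    const string::size_type len2 = str2.length();
    assert(len1 > 0);
    // Only a replacement at least as long as the original is handled here;
    // a shorter one would leave stale characters at the end of the stream.
    assert(len2 >= len1);

    // Relies on the stream keeping separate read and write positions,
    // as a stringstream does.
    stream.seekg(0, ios_base::beg);
    stream.seekp(0, ios_base::beg);

    string window;  // last few characters read; may still turn into str1
    string pending; // finished output waiting to be written back
    char c;

    // Testing get() itself keeps a failed read from adding a bogus character
    // and from dropping a real one at EOF, so the full len1 characters are
    // replaced (no len1-1 adjustment needed).
    while (stream.get(c))
    {
        window += c;
        if (window == str1)
        {
            pending += str2;          // match: queue the replacement instead
            window.clear();
        }
        else if (window.size() == len1)
        {
            pending += window[0];     // no match can start at this character
            window.erase(0, 1);
        }

        // Write at most one character per character read, so the write
        // position never overtakes the read position.
        if (!pending.empty())
        {
            stream.put(pending[0]);
            pending.erase(0, 1);
        }
    }

    stream.clear(); // Clears the error flags raised at EOF.
    pending += window;
    stream.write(pending.c_str(), pending.size());
}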

Tests done here

int main()
{
    string aString = "abcdefghijkmnopqr";
    stringstream stream(aString);

    stream.seekg(0, ios_base::beg);
    assert(stream.good());
    PrintStream(stream);
    string a="efg";
    string b="SkyDiploma";
    ReplaceWithDifferentLength(stream,a,b);
    stream.seekg(0, ios_base::beg);
    assert(stream.good());
    PrintStream(stream);

    //cin.get();
    return 0;
}

The PrintStream() function is the same one that Vernon wrote. I was hoping you would give me some suggestions on this one :)

EditeD:
A small bug fix was done on

buffer.replace(buffer.find(str1), len1  , str2);

to

buffer.replace(buffer.find(str1), len1-1  , str2);

Edited 5 Years Ago by Sky Diploma: Bug in code
