
Copying a Microsoft Word doc

I have 2 binaries: a Java binary that requests a Microsoft Word doc from a C++ binary. The C++ binary opens the Word doc in binary mode, reads x chars at a time and returns them to the Java binary. The Java binary eventually receives all the data and writes it out using a file stream write. When I try to open the newly created file, the contents are not readable. The newly created file is exactly the same size as the original file read by the C++ server.

Should the Java and C++ binaries try to manipulate the Microsoft Word line feeds etc.?

shealy
Newbie Poster
14 posts since Jun 2009
Reputation Points: 6
Solved Threads: 0
 

I once tried to do something like this with ONLY C++: take all the contents of a document and recreate it. The problem is that some characters may not show up (they may not follow the ASCII character set) and there may be text that is not retrieved. Make sure you're getting all the text, so use a pointer:

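#include <cstring>
#include <fstream>
#include <iostream>

using namespace std;

// Reads the whole file into a heap buffer that the caller must delete[].
// Returns NULL if the file cannot be opened; size receives the byte count,
// because binary data may contain '\0' bytes.
char* readFile(const char* file, size_t& size)
{
    size_t capacity = 1;
    char* contents = new char[capacity];
    char ch;
    size = 0;
    ifstream fin(file, ios::binary);    // binary mode: no newline translation
    if(!fin)
    {
        delete[] contents;
        return NULL;
    }
    while(fin.get(ch))
    {
        if(size == capacity)
        {
            // double the buffer instead of reallocating for every character
            char* buffer = new char[capacity * 2];
            memcpy(buffer, contents, size);
            delete[] contents;          // arrays need delete[], not delete
            contents = buffer;
            capacity *= 2;
        }
        contents[size++] = ch;
    }
    return contents;
}

int main(int argc, char* argv[])
{
    if(argc < 2)
        return 1;
    size_t size = 0;
    char* contents = readFile(argv[1], size);
    if(!contents)
    {
        cout << "Error\n";
        return 1;
    }
    cout << "Read " << size << " bytes\n";
    delete[] contents;
    return 0;
}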
u8sand
Junior Poster
131 posts since Dec 2008
Reputation Points: 78
Solved Threads: 15
 

>>When I try to open the newly created file, the contents are not readable
It's because doc files are binary files, not text files. Those files contain a lot of formatting information, such as font, font color, font size, etc., that is only readable by MS-Word or a similar compatible program.
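One quick way to see that a .doc is a binary container rather than text is to look at its first bytes. The sketch below assumes the file is a legacy (Word 97-2003 style) .doc, which is stored as an OLE2 compound file and starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The function name looksLikeLegacyDoc is made up for this example, and it takes any std::istream so it can be tried on a std::ifstream opened with ios::binary or on an in-memory std::istringstream.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Returns true if the stream starts with the 8-byte OLE2 compound-file
// signature. looksLikeLegacyDoc is a hypothetical helper name.
bool looksLikeLegacyDoc(std::istream& in)
{
    static const unsigned char kSignature[8] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    char header[8];
    in.read(header, sizeof(header));
    if (in.gcount() != static_cast<std::streamsize>(sizeof(header)))
        return false;                   // too short to hold the signature
    return std::memcmp(header, kSignature, sizeof(kSignature)) == 0;
}
```

Running a check like this on both the original and the copy would at least show whether the first bytes survived the trip; it says nothing about the rest of the file.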

Binary files have to be opened in binary mode, e.g. ifstream fin(file, ios::binary);, and read with the stream's read() method.

ifstream fin(file, ios::binary);
ofstream out("newfile.doc", ios::binary);
char iobuffer[255];
// read() returns false for a short final block, so this loop
// only writes full 255-byte blocks
while( fin.read( iobuffer, sizeof(iobuffer) ) )
{
    // do something with this block of data
    size_t sz = fin.gcount();
    out.write( iobuffer, sz);
}
Ancient Dragon
Retired & Loving It
Team Colleague
30,049 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,343
 

I'm pretty sure that the text is being copied correctly insofar as one can using C++ filestream reads/writes and buffers. File sizes are the same also. Does one need to use microsoft apis to ensure that non-ascii chars are converted?

shealy
 

I missed this reply - sorry. I am treating the microsoft word doc in the C++ code as a binary doc and using fstreams to read/write the data. Then when I use microsoft word to open the newly copied file, the contents are not readable.
Is it possible to just read the contents of the microsoft doc file in binary form, write and open without doing any formatting of special chars?

shealy
 

>>Does one need to use microsoft apis to ensure that non-ascii chars are converted?

Huh? I didn't post anything specific to Microsoft, only standard C++ stuff. Binary files have to be opened in binary mode using the ios::binary option. If you don't do that then the destination file will be corrupt.
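If the goal is only a byte-for-byte copy, a sketch along these lines may be enough. It streams the input's buffer straight into the output, so no character is ever interpreted. The names copyStream and copyFile are invented for this example. Note that on Unix-like systems such as Solaris, text and binary mode behave the same for fstreams, so ios::binary mainly matters on platforms like Windows that translate line endings; passing it everywhere is still the safe habit.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies every byte from in to out without interpreting it.
// copyStream is a hypothetical helper name.
bool copyStream(std::istream& in, std::ostream& out)
{
    // operator<<(streambuf*) sets failbit when nothing is inserted,
    // so an empty input is handled separately.
    if (in.peek() == std::istream::traits_type::eof())
        return !in.bad();
    out << in.rdbuf();
    return !out.fail();
}

// File-based wrapper; both streams are opened in binary mode.
bool copyFile(const char* from, const char* to)
{
    std::ifstream in(from, std::ios::binary);
    std::ofstream out(to, std::ios::binary);
    if (!in || !out)
        return false;
    return copyStream(in, out);
}
```

A call such as copyFile("Input.doc", "Output.doc") would then report success or failure through its return value.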

Ancient Dragon
 

That is exactly what the code snippet I posted will do. It's just a standard file I/O operation, nothing special about it.

Ancient Dragon
 

To be clear - I am using fstreams and read and opening the microsoft word doc in binary mode. Ditto with the newly created file that gets the contents of the word doc. All this is done using C++ code. When I try to view the newly created doc with ms-word, the contents are not readable.

shealy
 

Hey, why has no-one told u8sand that it's int main() and not char* main() or void main() or ... ??

tux4life
Nearly a Posting Maven
2,350 posts since Feb 2009
Reputation Points: 2,134
Solved Threads: 243
 

When I actually tried it I had the same problem. I used a command prompt and found out that the two files differed by just a few bytes.

Well, the code I posted almost works. The problem is that the last few bytes do not get read or written.

#include <cstddef>
#include <fstream>
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    char iobuf[255];
    size_t total = 0;
    size_t sz = 0;
    ifstream fin("file1.doc", ios::binary);
    if( !fin.is_open() )
    {
        cout << "Can't open the file\n";
        return 1;
    }
    ofstream fout( "copy.doc", ios::binary);
    // copy all the full 255-byte blocks
    while( fin.read(iobuf, sizeof(iobuf) ))
    {
        sz = fin.gcount();
        total += sz;
        fout.write(iobuf, sz);
    }
    // the last read() failed, but gcount() still reports
    // how many bytes of the short final block were read
    sz = fin.gcount();
    if( sz > 0)
    {
        cout << "sz = " << sz << "\n";
        total += sz;
        fout.write(iobuf, sz);
    }
    fin.close();
    fout.close();
    cout << "Total = " << total << "\n";
    return 0;
}
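To illustrate why the trailing bytes went missing, here is a small side-by-side sketch. It is not the code from this thread, just two hypothetical helpers (copyFullBlocksOnly and copyAllBlocks) that work on any std::istream/std::ostream pair so they can be tried with in-memory streams. The first loops on the result of read(), which is false for a short final block; the second loops on gcount(), so the short block is written inside the loop and no separate tail handling is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Loops on the stream state: read() reports failure for a short final
// block, so the trailing bytes are never written.
std::size_t copyFullBlocksOnly(std::istream& in, std::ostream& out)
{
    char buf[255];
    std::size_t total = 0;
    while (in.read(buf, sizeof(buf)))
    {
        out.write(buf, in.gcount());
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}

// Loops on gcount() instead, so the short final block is written too.
// Once the stream has failed, the next read() extracts nothing and
// gcount() is 0, which ends the loop.
std::size_t copyAllBlocks(std::istream& in, std::ostream& out)
{
    char buf[255];
    std::size_t total = 0;
    while (in.read(buf, sizeof(buf)), in.gcount() > 0)
    {
        out.write(buf, in.gcount());
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}
```

With 600 bytes of input and a 255-byte buffer, the first helper copies two full blocks (510 bytes) and the second copies all 600.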
Ancient Dragon
 

Hey AD, in your post ( #3 ) you forgot an ending bracket on this line:
while( fin.read( iobuffer, sizeof(iobuffer) )
:P

tux4life
 

Nobody is perfect :)

Ancient Dragon
 

I have written code that can successfully read and write a Microsoft Word doc - that is, if the Word doc contains plain text only. If there are any headings or different fonts used, these are not copied successfully. Which brings me back to my original question - when copying Word docs does one need to manipulate the non-ASCII chars? And how is this done?

shealy
 
>>I have written code that can successfully read and write a microsoft word doc - that is if the word doc contains plain text only

That isn't a Microsoft Word doc, but a normal text file.

>>If there are any headings and different fonts used, these are not copied successfully. Which brings me back to my original question - when copying word docs does one need to manipulate the non-ascii chars? And how is this done?

See the code I already posted and which you quoted. If all you want to do is copy the file then the answer to your question is NO.

Ancient Dragon
 

The Microsoft Word doc is copied successfully - byte for byte. The copied file is the same size as the original. However, when I try to use Microsoft Word to open the copied file, the parts of its contents that are non-ASCII text are not readable. The ASCII text is readable. So the copied file is worthless to the end user if he/she cannot see the non-ASCII parts. I want to be able to copy the file AND open it and read it successfully using MS-Word.


shealy
 

Zip up the file you are trying to copy and post it so that I can test it. The doc file I tested is readable by MS-Word as expected, and it contains quite a bit of graphics and charts, so there is no reason the program should not work with any document.

Ancient Dragon
 

I can confirm AD's code works correctly.
(Also tested it on a couple of Word files, playing a bit with the formatting)
The file is loading correctly after copying.

To the OP:
Ensure that you're copying a file which isn't corrupted. Before copying, check whether the file you want to copy loads correctly in MS Word; otherwise you've already missed the boat before the copying process starts.
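Since two files can have the same size and still differ in content, a byte-by-byte comparison is a more reliable check than comparing sizes. The following is only a sketch; firstDifference is a name invented here, and it accepts any two std::istream objects (for real files, std::ifstream opened with ios::binary).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the offset of the first byte that differs, or -1 if both
// streams hold identical data. If one stream is a prefix of the other,
// the length of the shorter one is returned.
long firstDifference(std::istream& a, std::istream& b)
{
    const int eof = std::istream::traits_type::eof();
    long offset = 0;
    for (;;)
    {
        int ca = a.get();
        int cb = b.get();
        if (ca != cb)
            return offset;      // also covers one stream ending early
        if (ca == eof)
            return -1;          // both ended at the same point
        ++offset;
    }
}
```

Running something like this on the original and the copy, on the same machine, would show whether the copy step itself changed anything.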

tux4life
 

Attached are the input file, Input.doc, and the output file that is created, Output.doc. As you can see, the output file looks very different from the input file. I used the code that you provided to test this.

Attachments Input.doc (12.21KB) Output.doc (12.21KB)
shealy
 

The file is definitely not corrupted before copying. FYI, I am using a C++ binary on the Sun Solaris 2.8 operating system. I run the binary and it creates Output.doc from Input.doc. Output.doc is then FTP'd to my desktop, where I use Microsoft Word to open it.

Thanks for the help so far.


shealy
 

There might be an FTP problem. And I don't know what will happen if you try to copy an MS-Word doc file on your Solaris operating system.
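One possibility worth ruling out, offered as a guess rather than a diagnosis: FTP has an ASCII transfer type and a binary (image) transfer type, and an ASCII-type transfer may alter bytes in a binary file. Because the two copies live on different machines, a direct byte comparison is awkward, but a checksum computed on each side can be compared by eye. The sketch below uses the 64-bit FNV-1a hash purely as a simple, dependency-free fingerprint (it is not a cryptographic hash), and fnv1a64 is a name chosen for this example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// 64-bit FNV-1a hash of everything left in the stream.
// Assumes unsigned long long is exactly 64 bits wide, so the
// multiplication wraps modulo 2^64 as the algorithm expects.
unsigned long long fnv1a64(std::istream& in)
{
    unsigned long long hash = 14695981039346656037ULL;  // FNV offset basis
    char ch;
    while (in.get(ch))
    {
        hash ^= static_cast<unsigned char>(ch);
        hash *= 1099511628211ULL;                       // FNV prime
    }
    return hash;
}
```

If the value printed on Solaris for Output.doc differs from the value printed on the desktop after the transfer, the file changed in transit and the transfer mode would be the first thing to check.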

Ancient Dragon
 

This question has already been solved
