Copying a Microsoft Word doc

Question

shealy -4 Newbie Poster

14 Years Ago

I have two binaries: a Java binary that requests a Microsoft Word doc from a C++ binary. The C++ binary opens the Word doc in binary mode, reads x chars at a time, and returns them to the Java binary. The Java binary eventually receives all the data and writes it out using a file stream write. When I try to open the newly created file, the contents are not readable. The newly created file is exactly the same size as the original file read by the C++ server.

Should the Java and C++ binaries try to manipulate the Microsoft Word line feeds, etc.?

c++


Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

>>I have written code that can successfully read and write a microsoft word doc - that is if the word doc contains plain text only

That isn't a microsoft word doc, but a normal text file.

>>If there are any headings and different fonts used, these are not copied successfully. Which brings me back to my original question - when copying word docs does one need to manipulate the non-ascii chars? And how is this done?

See the code I already posted and which you quoted. If all you want to do is copy the file then the answer to your question is NO.

Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

>>if I ftp any Word doc in binary format or ASCII format and view the resulting file they are all fine. ftping the copied file that my binary creates does not work fine. So not too sure it's an ftp issue.
>>What o/s are you using to copy file?

I am using Microsoft Vista Home Premium and the VC++ 2008 Express compiler. No FTP involved. Are you using the code I posted or something you wrote?

Where did the input.doc file come from that you posted? Was it also FTP'd to a Windows machine before you posted it here? Or did you post it directly from a browser running on the Solaris OS?

Duoas 1,025 Postaholic

14 Years Ago

I agree with AD that the FTP is probably what is going wrong.

FTP defaults to text mode. You must explicitly set it to binary mode before copying files.

mvmalderen commented: Yes. +12
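To illustrate why a text-mode transfer damages a binary file, here is a minimal sketch that mimics the line-ending translation applied when a file travels from a Unix machine to a Windows machine in ASCII mode. It is a simulation only: toCrLf is a hypothetical helper name, and the sketch assumes the translation is the usual LF to CR LF rewrite rather than reproducing any particular FTP client.

```cpp
#include <string>

// Mimics what a text-mode (ASCII) transfer to a Windows machine does to
// the byte stream: every LF (0x0A) becomes CR LF (0x0D 0x0A).
// toCrLf is a hypothetical helper, not part of any FTP client.
std::string toCrLf(const std::string& bytes)
{
    std::string out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
    {
        if (bytes[i] == '\n')
            out += '\r';    // insert the extra carriage return
        out += bytes[i];
    }
    return out;
}
```

Any 0x0A byte inside the binary data gains a neighbour, so a file damaged this way normally changes size. A corrupted copy that is still exactly the same size as the original therefore points away from this particular failure.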

Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

The problem could also be one of endianness. Byte order depends on the CPU: x86 machines, which typically run Windows, are little-endian, while SPARC machines, which commonly run Solaris, are big-endian. A byte-for-byte copy should not be affected by that, but to rule it out, use your system's command-line copy command to make the copy, FTP it to Windows and check it with MS-Word.

Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

did you try my previous suggestion yet?


u8sand 68 Junior Poster · Answer 1 · 2009-07-02T07:24:57+00:00

I once tried to do something like this with ONLY C++. I tried to take all the contents of a document and recreate it from them. The problem, though, is that some characters may not show up (they may not be in the ASCII character set) and there may be text that is not being retrieved. Make sure you're getting all the text, so use a pointer:

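#include <cstddef>
#include <fstream>

using namespace std;

// Note: posted as char* main(char* file); main must return int, so the
// function is named readFile here.
// Returns a new[]-allocated, null-terminated copy of the file's bytes,
// or NULL if the file cannot be opened. The caller must delete[] it.
// A .doc file contains zero bytes, so strlen() will not give its length.
char* readFile(const char* file)
{
    ifstream fin(file, ios::binary);    // binary mode: no newline translation
    if(!fin)
        return NULL;

    char* contents = new char[1];       // start with an empty string
    contents[0] = '\0';
    size_t numOfChars = 0;
    char ch;
    while(fin.get(ch))
    {
        // grow the buffer by one char and append ch
        char* buffer = new char[numOfChars+2];
        for(size_t i = 0; i < numOfChars; i++)
            buffer[i] = contents[i];
        buffer[numOfChars] = ch;
        buffer[numOfChars+1] = '\0';
        delete[] contents;              // arrays need delete[], not delete
        contents = buffer;
        ++numOfChars;
    }
    fin.close();
    return contents;
}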

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 2 · 2009-07-02T07:31:32+00:00

>>When I try to open the newly created file, the contents are not readable
It's because doc files are binary files, not text files. Those files contain a lot of formatting information, such as font, font color, font size, etc., that is only readable by MS-Word or a similar compatible program.

Binary files have to be opened in binary mode, as in ifstream fin(file, ios::binary);, and read with the stream's read() method.

ifstream fin(file, ios::binary);
ofstream out("newfile.doc", ios::binary);
char iobuffer[255];
while( fin.read( iobuffer, sizeof(iobuffer) )
{
    // do something with this block of data
    size_t sz = fin.gcount();
    out.write( iobuffer, sz);
}

shealy -4 Newbie Poster · Answer 3 · 2009-07-02T15:12:32+00:00

I'm pretty sure that the text is being copied correctly insofar as one can using C++ filestream reads/writes and buffers. File sizes are the same also. Does one need to use microsoft apis to ensure that non-ascii chars are converted?


shealy -4 Newbie Poster · Answer 4 · 2009-07-02T15:17:11+00:00

I missed this reply - sorry. I am treating the microsoft word doc in the C++ code as a binary doc and using fstreams to read/write the data. Then when I use microsoft word to open the newly copied file, the contents are not readable.
Is it possible to just read the contents of the microsoft doc file in binary form, write and open without doing any formatting of special chars?

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 5 · 2009-07-02T15:17:49+00:00

>>Does one need to use microsoft apis to ensure that non-ascii chars are converted?

Huh? I didn't post anything specific to microsoft, only standard C++ stuff. Binary files have to be opened in binary mode using ios::binary option. If you don't do that then the destination file will be corrupt.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 6 · 2009-07-02T15:19:12+00:00

>>I missed this reply - sorry. I am treating the microsoft word doc in the C++ code as a binary doc and using fstreams to read/write the data. Then when I use microsoft word to open the newly copied file, the contents are not readable.
>>Is it possible to just read the contents of the microsoft doc file in binary form, write and open without doing any formatting of special chars?

That is exactly what the code snippet I posted will do. It's just a standard file I/O operation, nothing special about it.

shealy -4 Newbie Poster · Answer 7 · 2009-07-02T15:21:20+00:00

To be clear - I am using fstreams and read and opening the microsoft word doc in binary mode. Ditto with the newly created file that gets the contents of the word doc. All this is done using C++ code. When I try to view the newly created doc with ms-word, the contents are not readable.

mvmalderen 2,072 Postaholic · Answer 8 · 2009-07-02T15:22:05+00:00

>>char* main(char* file)

Hey, why has no-one told him that it's int main() and not char* main() or void main() or ... ??

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 9 · 2009-07-02T15:34:00+00:00

When I actually tried it I had the same problem. I used a command prompt and found out that the two files were just a few bytes different.

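Well, the code I posted almost works. The problem is that the last few bytes do not get read or written: read() reports failure on the final, partial block, so the loop body never sees it. Checking gcount() once more after the loop picks those bytes up:

#include <cstddef>
#include <fstream>
#include <iostream>

using namespace std;

int main()
{
    char iobuf[255];
    size_t total = 0;
    size_t sz = 0;
    ifstream fin("file1.doc", ios::binary);
    if( !fin.is_open() )
    {
        cout << "Can't open the file\n";
        return 1;
    }
    ofstream fout("copy.doc", ios::binary);
    // read() succeeds only when it fills the whole buffer
    while( fin.read(iobuf, sizeof(iobuf)) )
    {
        sz = static_cast<size_t>(fin.gcount());
        total += sz;
        fout.write(iobuf, sz);
    }
    // the final, partial block makes read() fail, but gcount() still
    // reports how many bytes were placed in the buffer
    sz = static_cast<size_t>(fin.gcount());
    if( sz > 0 )
    {
        cout << "sz = " << sz << "\n";
        total += sz;
        fout.write(iobuf, sz);
    }
    fin.close();
    fout.close();
    cout << "Total = " << total << "\n";
    return 0;
}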
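The same idea can be folded into a single loop by testing gcount() in the loop condition. The following is a minimal sketch rather than anyone's posted solution: copyBinary is a hypothetical helper name, and the 4096-byte buffer size is an arbitrary choice.

```cpp
#include <fstream>
#include <string>

// Copies src to dst byte for byte. Returns the number of bytes written,
// or -1 if either file cannot be opened or a write fails.
// copyBinary is a hypothetical helper name, not something from the thread.
long long copyBinary(const std::string& src, const std::string& dst)
{
    std::ifstream in(src.c_str(), std::ios::binary);
    std::ofstream out(dst.c_str(), std::ios::binary);
    if (!in || !out)
        return -1;

    char buf[4096];
    long long total = 0;
    // read() sets failbit on the final short block, so gcount() is tested
    // as well: the body must still run once for that last partial block.
    while (in.read(buf, sizeof buf) || in.gcount() > 0)
    {
        const std::streamsize n = in.gcount();
        out.write(buf, n);
        if (!out)
            return -1;
        total += n;
    }
    return total;
}
```

Once the stream has failed, the next read() extracts nothing and gcount() returns 0, so the loop ends after the partial block has been written. A call such as copyBinary("file1.doc", "copy.doc") should return the size of the original file.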

mvmalderen 2,072 Postaholic · Answer 10 · 2009-07-02T16:16:33+00:00

Hey AD, in your post (#3) you forgot an ending bracket on this line: while( fin.read( iobuffer, sizeof(iobuffer) ) :P

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 11 · 2009-07-02T16:36:08+00:00

>>Hey AD, in your post (#3) you forgot a bracket on this line: while( fin.read( iobuffer, sizeof(iobuffer) ) :P

Nobody is perfect :)

shealy -4 Newbie Poster · Answer 12 · 2009-07-04T03:19:29+00:00

I have written code that can successfully read and write a Microsoft Word doc, provided the Word doc contains plain text only. If any headings or different fonts are used, these are not copied successfully. Which brings me back to my original question: when copying Word docs, does one need to manipulate the non-ASCII chars? And how is this done?


shealy -4 Newbie Poster · Answer 13 · 2009-07-06T17:01:37+00:00

The microsoft word doc is copied successfully - byte per byte. Copied file is same size as original. However, when I try and use microsoft word to open the copied file, the copied file contents which have non-ascii text are not readable. The ascii text is readable. So the copied file is worthless to the end user if he/she cannot see the non-ascii parts. So I want to be able to copy the file AND open it and read it successfully using ms-word.

>>That isn't a microsoft word doc, but a normal text file.
>>See the code I already posted and which you quoted. If all you want to do is copy the file then the answer to your question is NO.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 14 · 2009-07-06T17:11:22+00:00

Zip up the file you are trying to copy and post it so that I can test it. The doc file I tested is readable by MS-Word as expected, and it contains quite a bit of graphics and charts, so there is no reason that program does not work with any document.

mvmalderen 2,072 Postaholic · Answer 15 · 2009-07-06T17:37:53+00:00

I can confirm AD's code works correctly.
(Also tested it on a couple of Word files, playing a bit with the formatting)
The file is loading correctly after copying.

To the OP:
Ensure that you're copying a file which isn't corrupted. Before copying, check whether the file you want to copy loads correctly in MS Word; otherwise you've already missed the boat before the copying process starts.

shealy -4 Newbie Poster · Answer 16 · 2009-07-06T18:10:43+00:00

Attached is the input file - Input.doc and the output file that is created - Output.doc. As you can see outfile looks very different to the inputfile. I used the code that you provided to test this.


shealy -4 Newbie Poster · Answer 17 · 2009-07-06T18:18:33+00:00

The file is definitely not corrupted before copying. FYI, I am using a C++ binary on the Sun Solaris 2.8 operating system. I run the binary, and it creates Output.doc from Input.doc. Output.doc is then FTP'd to the desktop, where I use Microsoft Word to open it.

Thanks for the help so far.


Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 18 · 2009-07-06T18:21:52+00:00

There might be an FTP problem. And I don't know what will happen if you try to copy an MS-Word doc file on your Solaris operating system.

shealy -4 Newbie Poster · Answer 19 · 2009-07-06T18:29:07+00:00

If I FTP any Word doc in binary format or ASCII format and view the resulting file, they are all fine. FTPing the copied file that my binary creates does not work. So I'm not too sure it's an FTP issue.

What OS are you using to copy the file?

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 20 · 2009-07-06T18:29:17+00:00

Oh, I just saw the attachments. Downloaded input.doc and my program copied it correctly, as expected. The output.doc file you attached is unreadable for me too, so the problem is either in the ftp or the solaris operating system. My guess is the ftp program is corrupting the output file.
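Because equal file sizes do not prove equal contents, one way to narrow this down is to compare the original and the copy byte by byte and see where they first diverge. The sketch below is only an illustration under that assumption: firstDifference is a hypothetical helper name, and the return codes are an arbitrary convention.

```cpp
#include <fstream>
#include <string>

// Returns the offset of the first byte at which the two files differ,
// -1 if they are identical, or -2 if either file cannot be opened.
// When one file is a prefix of the other, the offset returned is the
// length of the shorter file.
// firstDifference is a hypothetical helper name used for illustration.
long long firstDifference(const std::string& pathA, const std::string& pathB)
{
    std::ifstream a(pathA.c_str(), std::ios::binary);
    std::ifstream b(pathB.c_str(), std::ios::binary);
    if (!a || !b)
        return -2;

    long long offset = 0;
    char ca = 0;
    char cb = 0;
    for (;;)
    {
        const bool gotA = !a.get(ca).fail();
        const bool gotB = !b.get(cb).fail();
        if (!gotA && !gotB)
            return -1;          // both ended together: identical
        if (gotA != gotB || ca != cb)
            return offset;      // length mismatch or differing byte
        ++offset;
    }
}
```

Running something like firstDifference("Input.doc", "Output.doc") on the machine where the copy was made, and again after the transfer, would show which step changes the bytes.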

shealy -4 Newbie Poster · Answer 21 · 2009-07-07T02:34:42+00:00

I wrote two programs - one with an fstream read and write and the other has the code provided in the example above. Both produced the unreadable doc.

I really don't think it's anything to do with ftp. I can ftp any word doc from the server to the desktop and vice-versa in binary mode without any issues - the ftp'd file which has graphics, fonts etc can be opened successfully.

The original input file was ftp'd from the desktop on to the solaris server in binary format.

>>I am using Microsoft Vista Home Premium and the VC++ 2008 Express compiler. No FTP involved. Are you using the code I posted or something you wrote?
>>Where did the input.doc file come from that you posted? Was it also FTP'd to a Windows machine before you posted it here? Or did you post it directly from a browser running on the Solaris OS?

shealy -4 Newbie Poster · Answer 22 · 2009-07-07T02:54:06+00:00

To eliminate ftp as an issue. I ftp'd a doc from desktop to server. ftp'd the doc back to desktop to different location and doc opened successfully.

Copying a doc on server and then ftping is also successful when opening using ms-word.

Does this just leave us with sun solaris?

shealy -4 Newbie Poster · Answer 23 · 2009-07-08T02:11:48+00:00

Yes - see previous response - I did the unix copy command, ftp-d to windows and opened with ms-word successfully.

>>did you try my previous suggestion yet?

shealy -4 Newbie Poster · Answer 24 · 2009-07-08T03:03:05+00:00

Solved it. The issue is with the read() and write() methods. Using get() and put() works successfully.

Thanks to all who helped.
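The code that finally worked was not posted, so the following is only a guess at its shape: a minimal character-at-a-time copy using get() and put(). copyWithGetPut is a hypothetical name. Why read() and write() failed in the original program was never established in the thread; the block-based version was reported to work for other posters.

```cpp
#include <fstream>
#include <string>

// Copies src to dst one character at a time with get() and put().
// Returns true on success. copyWithGetPut is a hypothetical name.
bool copyWithGetPut(const std::string& src, const std::string& dst)
{
    std::ifstream in(src.c_str(), std::ios::binary);
    std::ofstream out(dst.c_str(), std::ios::binary);
    if (!in || !out)
        return false;

    char ch;
    while (in.get(ch))      // get(ch) does not skip whitespace or zero bytes
        out.put(ch);

    out.flush();            // push buffered bytes out before checking state
    // The loop should end because of end-of-file, not a read error,
    // and every put() must have succeeded.
    return in.eof() && out.good();
}
```

A loop like this has no byte count to keep track of, which removes one place where a block-based copy can go wrong, at the cost of one call per character.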

>>did you try my previous suggestion yet?
