I have two binaries: a Java binary that requests a Microsoft Word doc from a C++ binary. The C++ binary opens the Word doc in binary mode, reads x number of chars, and returns them to the Java binary. The Java binary eventually receives all the data and writes it out using a file stream write. When I try to open the newly created file, the contents are not readable. The newly created file is exactly the same size as the original file read by the C++ server.

Should the Java and C++ binaries try to manipulate the Microsoft Word line feeds etc.?

I once tried to do something like this with ONLY C++: take all the contents of a document and recreate it with the same contents. The problem, though, is that some characters may not show up (they may not follow the ASCII char set) and there may be text that is not being retrieved. Make sure you're getting all the text, so use a pointer:

#include <iostream>
#include <fstream>

using namespace std;

char* main(char* file)   // NOTE: a real entry point must be declared int main()
{
    char* contents = new char[1];   // start with an empty, NUL-terminated string
    contents[0] = '\0';
    char* buffer;
    int numOfChars = 0;
    char ch;
    ifstream fin(file);
    if(fin)
    {
        while(fin.get(ch))
        {
            // grow by one char: copy the old contents into buffer and append ch
            buffer = new char[numOfChars+2];
            for(int i = 0; i < numOfChars; i++)
                buffer[i] = contents[i];
            buffer[numOfChars] = ch;
            buffer[numOfChars+1] = '\0';
            delete[] contents;          // arrays from new[] need delete[]
            contents = new char[++numOfChars+1];
            for(int i = 0; i < numOfChars; i++)
                contents[i] = buffer[i];
            contents[numOfChars] = '\0';
            delete[] buffer;
        }
        fin.close();
    }
    else
    {
        delete[] contents;
        return "Error";
    }
    return contents;
}

>>When I try to open the newly created file, the contents are not readable
It's because doc files are binary files, not text files. Those files contain a lot of formatting information, such as font, font color, font size, etc., that is only readable by MS-Word or a similar compatible program.

Binary files have to be opened in binary mode, e.g. ifstream fin(file, ios::binary);, and read with the stream's read() method.

ifstream fin(file, ios::binary);
ofstream out("newfile.doc", ios::binary);
char iobuffer[255];
while( fin.read( iobuffer, sizeof(iobuffer) )
{
    // do something with this block of data
    size_t sz = fin.gcount();
    out.write( iobuffer, sz);
}

I'm pretty sure that the text is being copied correctly insofar as one can using C++ filestream reads/writes and buffers. File sizes are the same also. Does one need to use microsoft apis to ensure that non-ascii chars are converted?

I missed this reply - sorry. I am treating the microsoft word doc in the C++ code as a binary doc and using fstreams to read/write the data. Then when I use microsoft word to open the newly copied file, the contents are not readable.
Is it possible to just read the contents of the microsoft doc file in binary form, write and open without doing any formatting of special chars?

>>Does one need to use microsoft apis to ensure that non-ascii chars are converted?

Huh? I didn't post anything specific to microsoft, only standard C++ stuff. Binary files have to be opened in binary mode using ios::binary option. If you don't do that then the destination file will be corrupt.

That is exactly what the code snippet I posted will do. It's just a standard file i/o operation, nothing special about it.

To be clear - I am using fstreams and read and opening the microsoft word doc in binary mode. Ditto with the newly created file that gets the contents of the word doc. All this is done using C++ code. When I try to view the newly created doc with ms-word, the contents are not readable.

Hey, why has no one told him that it's int main() and not char* main() or void main() or ...?

When I actually tried it I had the same problem. I used a command prompt and found out that the two files were just a few bytes different.

Well, the code I posted almost works. The problem is that the last few bytes do not get read/written.

#include <fstream>
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    char iobuf[255];
    size_t total = 0;
    size_t sz = 0;
    ifstream fin("file1.doc", ios::binary);
    if( !fin.is_open() )
    {
        cout << "Can't open the file\n";
        return 1;
    }
    ofstream fout( "copy.doc", ios::binary);
    // read() evaluates to true only while a full buffer could be read
    while( fin.read(iobuf, sizeof(iobuf) ))
    {
        sz = fin.gcount();
        total += sz;
        fout.write(iobuf, sz);
        sz = 0;
    }
    // the final short read fails the loop test, but gcount() still
    // reports how many bytes were placed in the buffer
    sz = fin.gcount();
    if( sz > 0)
    {
        cout << "sz = " << sz << "\n";
        total += sz;
        fout.write(iobuf, sz);
    }
    fin.close();
    fout.close();
    cout << "Total = " << total << "\n";
    return 0;
}

Hey AD, in your post (#3) you forgot an ending bracket on this line: while( fin.read( iobuffer, sizeof(iobuffer) ) :P

Nobody is perfect :)

I have written code that can successfully read and write a microsoft word doc - that is if the word doc contains plain text only. If there are any headings and different fonts used, these are not copied successfully. Which brings me back to my original question - when copying word docs does one need to manipulate the non-ascii chars? And how is this done?


>>I have written code that can successfully read and write a microsoft word doc - that is if the word doc contains plain text only

That isn't a microsoft word doc, but a normal text file.

>>If there are any headings and different fonts used, these are not copied successfully. Which brings me back to my original question - when copying word docs does one need to manipulate the non-ascii chars? And how is this done?

See the code I already posted and which you quoted. If all you want to do is copy the file then the answer to your question is NO.
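
For a plain copy with no interpretation of the contents, the whole job can also be handed to the stream buffers. This is only a sketch of an alternative to the read()/write() loop quoted above, not code from this thread; copyFile is a hypothetical helper name.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies src to dst byte for byte.
// Returns false if either file cannot be opened or the write fails.
bool copyFile(const std::string& src, const std::string& dst)
{
    std::ifstream in(src.c_str(), std::ios::binary);
    if (!in)
        return false;

    std::ofstream out(dst.c_str(), std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    // operator<< on a stream buffer sets failbit when nothing is inserted,
    // so an empty source file is handled separately via peek().
    if (in.peek() != std::ifstream::traits_type::eof())
        out << in.rdbuf();

    out.close();
    return !out.fail();
}
```

Called as copyFile("file1.doc", "copy.doc"), it should leave copy.doc identical to file1.doc; no bytes are inspected or converted along the way.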

The microsoft word doc is copied successfully - byte for byte. The copied file is the same size as the original. However, when I try to use microsoft word to open the copied file, the parts of it that contain non-ascii text are not readable. The ascii text is readable. So the copied file is worthless to the end user if he/she cannot see the non-ascii parts. I want to be able to copy the file AND open it and read it successfully using ms-word.

Zip up the file you are trying to copy and post it so that I can test it. The doc file I tested is readable by MS-Word as expected, and it contains quite a bit of graphics and charts, so there is no reason that program does not work with any document.

I can confirm AD's code works correctly.
(Also tested it on a couple of Word files, playing a bit with the formatting)
The file is loading correctly after copying.

To the OP:
Ensure that the file you're copying isn't already corrupted. Before copying, check that it loads correctly in MS Word; otherwise you've missed the boat before the copying process even starts.

Attached is the input file - Input.doc and the output file that is created - Output.doc. As you can see outfile looks very different to the inputfile. I used the code that you provided to test this.
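
When two files have the same size but open differently, finding the first byte at which they diverge narrows things down quickly. The sketch below is one way to do that in standard C++; firstDifference and its return codes are assumptions made for illustration, not something used in this thread.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the offset of the first differing byte,
// -1 if the files are identical, or -2 if either file cannot be opened.
// If one file is shorter, the offset where it ends is returned.
long long firstDifference(const std::string& pathA, const std::string& pathB)
{
    std::ifstream a(pathA.c_str(), std::ios::binary);
    std::ifstream b(pathB.c_str(), std::ios::binary);
    if (!a || !b)
        return -2;

    const int eof = std::ifstream::traits_type::eof();
    long long offset = 0;
    for (;;)
    {
        // get() returns the byte as 0..255, or eof at end of file,
        // so a 0xFF byte is never mistaken for end of file
        const int ca = a.get();
        const int cb = b.get();
        if (ca != cb)
            return offset;
        if (ca == eof)
            return -1;
        ++offset;
    }
}
```

Running firstDifference("Input.doc", "Output.doc") and looking at the bytes around the reported offset in a hex viewer would show whether bytes were changed, dropped, or replaced.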

The file is definitely not corrupted before copying. FYI, I am using a C++ binary on the Sun Solaris 2.8 operating system. I run the binary and it creates Output.doc from Input.doc. Output.doc is then ftp'd to the desktop, where I use Microsoft Word to open it.

Thanks for the help so far.

There might be an ftp problem. And I don't know what will happen if you try to copy an MS-Word doc file on your Solaris operating system.

If I ftp any Word doc in binary format or ASCII format and view the resulting file, they are all fine. Ftp'ing the copied file that my binary creates does not work. So I'm not too sure it's an ftp issue.

What o/s are you using to copy file?

Oh, I just saw the attachments. Downloaded input.doc and my program copied it correctly, as expected. The output.doc file you attached is unreadable for me too, so the problem is either in the ftp or the solaris operating system. My guess is the ftp program is corrupting the output file.

I am using Microsoft Vista Home Premium and the VC++ 2008 Express compiler. No FTP involved. Are you using the code I posted or something you wrote?

Where did the input.doc file come from that you posted? Was it also FTPd to Windows machine before you posted it here? Or did you post it directly from a browser running on Solaris os ?

I agree with AD that the FTP is probably what is going wrong.

FTP defaults to text mode. You must explicitly set it to binary mode before copying files.

I wrote two programs - one with an fstream read and write and the other has the code provided in the example above. Both produced the unreadable doc.

I really don't think it's anything to do with ftp. I can ftp any word doc from the server to the desktop and vice-versa in binary mode without any issues - the ftp'd file which has graphics, fonts etc can be opened successfully.

The original input file was ftp'd from the desktop on to the solaris server in binary format.

The problem could also be one of endianness. The byte order on Windows and *nix computers can be reversed, depending on the CPU. Maybe your os is writing the bytes out in reverse order during the copy process. To test that, use your system's command-line copy function to make the copy, FTP it to Windows and check it with MS-Word.
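
As a point of reference for the endianness idea: byte order only comes into play when a program interprets several bytes as one integer. A loop that moves a file one char or one block at a time keeps the bytes in file order on any CPU. The sketch below shows the distinction; hostIsLittleEndian and readLe32 are hypothetical names, and it assumes a compiler that provides <cstdint>.

```cpp
#include <cstdint>
#include <cstring>

// True on a little-endian host (x86, for example), false on a big-endian one.
bool hostIsLittleEndian()
{
    const std::uint32_t value = 1;
    unsigned char first;
    std::memcpy(&first, &value, 1);   // look at the lowest-addressed byte
    return first == 1;
}

// Decodes 4 raw bytes stored least-significant byte first.
// The result is the same on every host because only shifts are used.
std::uint32_t readLe32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

A copy program never needs either function; they would matter only if the code parsed fields inside the .doc file instead of passing the bytes through untouched.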

To eliminate ftp as an issue. I ftp'd a doc from desktop to server. ftp'd the doc back to desktop to different location and doc opened successfully.

Copying a doc on server and then ftping is also successful when opening using ms-word.

Does this just leave us with sun solaris?

did you try my previous suggestion yet?

Yes - see previous response - I did the unix copy command, ftp-d to windows and opened with ms-word successfully.

Solved it. The issue is with the read() and write() methods. Using get() and put() works successfully.
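
The thread does not include the final code, so the following is only a guess at what a get()/put() copy looks like; copyWithGetPut is a hypothetical name. It also does not explain why read() and write() failed in this case, since the block-based version earlier in the thread was reported to work by other posters.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies src to dst one char at a time.
// Returns false if a file cannot be opened or an I/O error occurs.
bool copyWithGetPut(const std::string& src, const std::string& dst)
{
    std::ifstream in(src.c_str(), std::ios::binary);
    std::ofstream out(dst.c_str(), std::ios::binary);
    if (!in || !out)
        return false;

    char ch;
    while (in.get(ch))   // stops at end of file or on a read error
        out.put(ch);

    out.close();
    // success means the loop ended because of end of file, not an error
    return in.eof() && !in.bad() && !out.fail();
}
```

Because each byte is written exactly once, there is no byte count to get wrong, which is one possible reason this form is harder to misuse than a block loop.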

Thanks to all who helped.
