>>When I try to open the newly created file, the contents are not readable
It's because doc files are binary files, not text files. They contain a lot of formatting information, such as font, font color, font size, etc., that is only readable by MS-Word or a compatible program.
Binary files have to be opened in binary mode (pass ios::binary when constructing the stream) and read with the stream's read() method.
ifstream fin(file, ios::binary);
ofstream out("newfile.doc", ios::binary);
char iobuffer[255];
while( fin.read( iobuffer, sizeof(iobuffer) )
{
// do something with this block of data
size_t sz = fin.gcount();
out.write( iobuffer, sz);
}
Ancient Dragon
Retired & Loving It
30,049 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,343
>>Does one need to use microsoft apis to ensure that non-ascii chars are converted?
Huh? I didn't post anything specific to Microsoft, only standard C++ stuff. Binary files have to be opened in binary mode using the ios::binary option. If you don't do that, the destination file will be corrupt.
Ancient Dragon
I missed this reply - sorry. I am treating the microsoft word doc in the C++ code as a binary doc and using fstreams to read/write the data. Then when I use microsoft word to open the newly copied file, the contents are not readable.
Is it possible to just read the contents of the microsoft doc file in binary form, write and open without doing any formatting of special chars?
That is exactly what the code snippet I posted will do. It's just a standard file I/O operation, nothing special about it.
Ancient Dragon
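For a plain byte-for-byte copy, a shorter alternative is to stream the input's buffer straight into the output. This is only a sketch, not the code posted in this thread: copy_file_raw is a made-up helper name, and it assumes a C++11 compiler (for the std::string file names).

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies src to dst without interpreting any bytes.
// Returns false if a file could not be opened or the write failed.
bool copy_file_raw(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary);
    if (!in)
        return false;

    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    // operator<< sets failbit on 'out' when nothing is inserted,
    // so an empty source file is handled separately.
    if (in.peek() == std::ifstream::traits_type::eof())
        return true;

    out << in.rdbuf();   // hand the whole input buffer to the output stream
    return static_cast<bool>(out);
}
```

Calling copy_file_raw("file1.doc", "copy.doc") would then do the whole copy without a hand-written read loop. No character conversion happens at any point, which is why non-ASCII bytes need no special treatment.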
I once tried to do something like this with ONLY C++. I tried to take all the contents of a document and recreate it with the same contents. The problem, though, is that some characters may not show up (they may not follow the ASCII character set) and there may be text that is not being retrieved. Make sure you're getting all the text, so use a pointer:
#include <iostream>
#include <fstream>
using namespace std;
char* main(char* file)
{
    char* contents = nullptr;   // stays null if the file is empty
    char* buffer;
    int numOfChars = 0;
    char ch;
    ifstream fin(file);
    if(fin)
    {
        while(fin.get(ch))
        {
            // copy the old contents plus the new character into buffer
            buffer = new char[numOfChars+2];
            for(int i = 0; i < numOfChars; i++)
                buffer[i] = contents[i];
            buffer[numOfChars] = ch;
            buffer[numOfChars+1] = '\0';
            delete[] contents;
            // grow contents by one and copy everything back
            contents = new char[++numOfChars+1];
            for(int i = 0; i < numOfChars; i++)
                contents[i] = buffer[i];
            contents[numOfChars] = '\0';
            delete[] buffer;
        }
        fin.close();
    }
    else
        return "Error";
    return contents;
}
Hey, why has no one told him that it's int main() and not char* main() or void main() or ... ??
tux4life
Nearly a Posting Maven
2,350 posts since Feb 2009
Reputation Points: 2,134
Solved Threads: 243
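If the goal is just to get every byte of the file into memory, the character-by-character reallocation can be avoided altogether. A minimal sketch, assuming C++11; read_all is a hypothetical helper name, and it is meant to be called from a normal int main():

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file into 'contents'.
// Returns false if the file could not be opened.
bool read_all(const std::string& path, std::vector<char>& contents)
{
    std::ifstream fin(path, std::ios::binary);   // binary: no translation
    if (!fin)
        return false;

    // istreambuf_iterator reads raw bytes, including NUL bytes and
    // anything outside the ASCII range.
    contents.assign(std::istreambuf_iterator<char>(fin),
                    std::istreambuf_iterator<char>());
    return true;
}
```

Because the size is kept by the vector rather than by a terminating '\0', embedded NUL bytes (which a Word document is full of) do not cut the data short.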
When I actually tried it I had the same problem. I used a command prompt and found out that the two files were just a few bytes different.
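Well, the code I posted almost works. The problem is that the last few bytes do not get read or written: read() fails when fewer than sizeof(iobuf) bytes are left, so the loop exits before the final partial block is written. This version handles that: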
#include <fstream>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    char iobuf[255];
    size_t total = 0;
    size_t sz = 0;
    ifstream fin("file1.doc", ios::binary);
    if( !fin.is_open() )
    {
        cout << "Can't open the file\n";
        return 1;
    }
    ofstream fout( "copy.doc", ios::binary);
    // copy every full 255-byte block
    while( fin.read(iobuf, sizeof(iobuf) ))
    {
        sz = fin.gcount();
        total += sz;
        fout.write(iobuf, sz);
        sz = 0;
    }
    // the last read() fails when fewer than 255 bytes remain,
    // but gcount() still reports how many bytes it stored in iobuf
    sz = fin.gcount();
    if( sz > 0)
    {
        cout << "sz = " << sz << "\n";
        total += sz;
        fout.write(iobuf, sz);
    }
    fin.close();
    fout.close();
    cout << "Total = " << total << "\n";
    return 0;
}
Ancient Dragon
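The separate "tail" step can also be folded into the loop condition. The sketch below relies on the fact that a read() on an already-failed stream extracts nothing and leaves gcount() at 0, so the loop stops one pass after the short block. copy_in_blocks is a made-up name, and returning 0 for a file that cannot be opened is just a simplification for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: copies src to dst in 255-byte blocks and
// returns the number of bytes copied (0 if a file could not be opened).
std::size_t copy_in_blocks(const std::string& src, const std::string& dst)
{
    std::ifstream fin(src, std::ios::binary);
    std::ofstream fout(dst, std::ios::binary);
    if (!fin || !fout)
        return 0;

    char iobuf[255];
    std::size_t total = 0;

    // read() reports failure for the final short block, but gcount()
    // still says how many bytes were stored, so keep going while
    // either the read succeeded or some bytes arrived.
    while (fin.read(iobuf, sizeof iobuf) || fin.gcount() > 0)
    {
        const std::streamsize n = fin.gcount();
        fout.write(iobuf, n);
        total += static_cast<std::size_t>(n);
    }
    return total;
}
```

For a 600-byte file this writes blocks of 255, 255 and 90 bytes, which is the same result as the loop-plus-tail version.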
Hey AD, in your post ( #3 ) you forgot an ending bracket on this line:
while( fin.read( iobuffer, sizeof(iobuffer) )
:P
tux4life
>>Hey AD, in your post ( #3 ) you forgot a bracket on this line:
>>while( fin.read( iobuffer, sizeof(iobuffer) )
>>:P
Nobody is perfect :)
Ancient Dragon
I have written code that can successfully read and write a Microsoft Word doc - that is, if the Word doc contains plain text only
That isn't a Microsoft Word doc, but a normal text file. If there are any headings and different fonts used, these are not copied successfully. Which brings me back to my original question - when copying Word docs does one need to manipulate the non-ASCII chars? And how is this done?
See the code I already posted and which you quoted. If all you want to do is copy the file then the answer to your question is NO.
Ancient Dragon
The Microsoft Word doc is copied successfully - byte for byte. The copied file is the same size as the original. However, when I try to open the copied file in Microsoft Word, the parts of its contents that are non-ASCII text are not readable. The ASCII text is readable. So the copied file is worthless to the end user if he/she cannot see the non-ASCII parts. So I want to be able to copy the file AND open it and read it successfully using MS-Word.
Zip up the file you are trying to copy and post it so that I can test it. The doc file I tested is readable by MS-Word as expected, and it contains quite a bit of graphics and charts, so there is no reason that program does not work with any document.
Ancient Dragon
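Equal file sizes alone do not prove the copy is identical. One way to check is to compare the two files byte for byte; the following is only a sketch, with files_identical as a made-up helper name:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true only if both files open and contain
// exactly the same sequence of bytes.
bool files_identical(const std::string& a, const std::string& b)
{
    std::ifstream fa(a, std::ios::binary);
    std::ifstream fb(b, std::ios::binary);
    if (!fa || !fb)
        return false;

    std::istreambuf_iterator<char> ia(fa), ib(fb), end;

    // Walk both files in step and stop at the first difference.
    while (ia != end && ib != end)
    {
        if (*ia != *ib)
            return false;
        ++ia;
        ++ib;
    }
    // Identical only if both files ended at the same position.
    return ia == end && ib == end;
}
```

If files_identical("file1.doc", "copy.doc") returns true and the copy still will not open, the copy step is not the culprit and the original file is the thing to look at.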
I can confirm AD's code works correctly.
(Also tested it on a couple of Word files, playing a bit with the formatting)
The file is loading correctly after copying.
To the OP:
Ensure that you're copying a file which isn't corrupted. Before copying, you should check whether the file you want to copy loads correctly in MS Word; otherwise you've already missed the boat before the copying process starts.
tux4life
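Opening the file in Word is the real test, but a quick programmatic sanity check is possible too. Legacy .doc files are OLE compound files, which start with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The sketch below only looks at those 8 bytes; has_legacy_doc_signature is a hypothetical helper name, the check does not apply to .docx files (those are ZIP archives), and passing it does not prove the rest of the file is intact.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: true if the file starts with the OLE compound
// file signature used by legacy .doc files.
bool has_legacy_doc_signature(const std::string& path)
{
    static const unsigned char magic[8] =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    std::ifstream fin(path, std::ios::binary);
    char head[8];
    if (!fin.read(head, sizeof head))   // unreadable or shorter than 8 bytes
        return false;

    return std::memcmp(head, magic, sizeof magic) == 0;
}
```

Running this on the source file before copying, and on the copy afterwards, at least shows whether the header survived whatever route the file took to get there.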
There might be an FTP problem; for example, transferring the file in ASCII mode instead of binary mode would corrupt it. And I don't know what will happen if you try to copy an MS-Word doc file on your Solaris operating system.
Ancient Dragon