
How to load a .doc file into a RichTextBox

Hi guys,
How can I load document formats other than .txt and .rtf (which the LoadFile() function supports natively), such as .java, .cs (C#), and .doc, into a RichTextBox object?

Any help appreciated. Thanks in advance.

MxDev
Junior Poster
141 posts since Sep 2007
Reputation Points: 8
Solved Threads: 3
 

You can read the contents of any text file into a text box with the StreamReader class:

using (System.IO.StreamReader sr = new System.IO.StreamReader("TestFile.txt"))
{
    RichTextBox1.Text = sr.ReadToEnd();
}


But .doc files will not load as plain text, because they use a special format created by Microsoft Word. If Microsoft Word is installed on the computer that runs your C# app, you can add a reference to the Microsoft Word Object Library and create an object for reading .doc files.

The StreamReader will read most plain-text formats, though.

Diamonddrake
Master Poster
724 posts since Mar 2008
Reputation Points: 442
Solved Threads: 89
 

Thanks for your reply, but I think there's no need to use the StreamReader class here, because richTextBox.LoadFile() supports this inherently.

I don't think this method will load a .doc file, though, because the file may contain embedded objects that it won't read correctly.

So it's still unsolved.

Thanks again

MxDev
Junior Poster
141 posts since Sep 2007
Reputation Points: 8
Solved Threads: 3
 

I don't think you caught most of my post. The StreamReader class will let you load text that LoadFile won't.

.DOC is a proprietary format!
This means that to edit it you need access to the software it belongs to. Microsoft lets you call into its Office suite from other applications through interop.

I.e., if you have Microsoft Office installed on the computer you run your C# application on, then you can open .doc files. Otherwise you cannot!

Assuming you do have Microsoft Office installed on your development machine, you must first add a reference to the Microsoft Word Object Library, then use this code:

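// Word.ApplicationClass gives access to the Word application
Word.ApplicationClass wordApp = new Word.ApplicationClass();

object file = path;
// Missing.Value tells Word to use its default for an optional parameter
object nullobj = System.Reflection.Missing.Value;

Word.Document doc = wordApp.Documents.Open(
    ref file, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj);

// Select the whole document and copy it to the clipboard
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();

// Read the copied text back from the clipboard
IDataObject data = Clipboard.GetDataObject();
txtFileContent.Text = data.GetData(DataFormats.Text).ToString();

doc.Close(ref nullobj, ref nullobj, ref nullobj);
// Quit Word as well, otherwise the WINWORD.EXE process keeps running
wordApp.Quit(ref nullobj, ref nullobj, ref nullobj);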


NOTE: legacy .doc files use a binary container format (it is the newer .docx files that are ZIP-compressed XML). Getting either right manually would be virtually impossible, because there are hundreds of special formatting constructs. So even though you could technically read and parse the document files yourself, you would spend months getting it to work right. Better to just target machines with MS Word installed.
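
For what it's worth, the two container formats can be told apart by their first bytes. The following is a minimal sketch, not a full validator: the class and method names (WordFormatSniffer, Detect) are made up for illustration. The OLE2 signature is shared by other legacy Office files (.xls, .ppt) and the ZIP signature by any ZIP archive, so a match only tells you the container type, not that the file is really a Word document.

```csharp
using System;
using System.IO;

static class WordFormatSniffer
{
    public enum Kind { Unknown, LegacyBinaryDoc, ZipPackage }

    // Signature of an OLE2 compound file (legacy .doc, .xls, .ppt, ...)
    static readonly byte[] Ole2 = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // Local file header signature of a ZIP archive (.docx, .zip, ...)
    static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };

    public static Kind Detect(Stream stream)
    {
        // Read up to 8 bytes; a single Read call may return fewer than requested
        byte[] head = new byte[8];
        int read = 0;
        while (read < head.Length)
        {
            int n = stream.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (StartsWith(head, read, Ole2)) return Kind.LegacyBinaryDoc;
        if (StartsWith(head, read, Zip)) return Kind.ZipPackage;
        return Kind.Unknown;
    }

    static bool StartsWith(byte[] data, int length, byte[] prefix)
    {
        if (length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Typical use would be something like WordFormatSniffer.Detect(File.OpenRead(path)) before deciding whether to hand the file to Word or to a plain-text reader.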

Diamonddrake
Master Poster
724 posts since Mar 2008
Reputation Points: 442
Solved Threads: 89
 

I had to do something similar about a week ago and ended up with something very close to what Diamonddrake suggested. However, this method loses all the formatting of the text that was in the .doc file.

Is there any way to preserve that formatting when it's copied to the RichTextBox? If you manually select text, copy, and paste, the text pasted into the RichTextBox usually retains the formatting...
I think it has something to do with GetData(DataFormats.???), but I'm not sure.
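
On the GetData(DataFormats.???) question: the WinForms clipboard format for rich text is DataFormats.Rtf, and RichTextBox has an Rtf property that accepts it. Here is a minimal sketch, assuming Word put an RTF representation on the clipboard when the selection was copied (it normally does, but check with GetDataPresent first). The helper name TryPasteFormatted is made up for illustration.

```csharp
using System;
using System.Windows.Forms;

static class ClipboardRtf
{
    // Returns true if RTF was available and assigned to the target;
    // falls back to plain text (and returns false) otherwise.
    public static bool TryPasteFormatted(IDataObject data, RichTextBox target)
    {
        if (data == null) return false;

        if (data.GetDataPresent(DataFormats.Rtf))
        {
            // Keeps fonts, bold/italic, colors and so on
            target.Rtf = (string)data.GetData(DataFormats.Rtf);
            return true;
        }

        if (data.GetDataPresent(DataFormats.Text))
        {
            // No RTF on the clipboard: formatting is lost
            target.Text = (string)data.GetData(DataFormats.Text);
        }
        return false;
    }
}
```

In the interop code this would be called as ClipboardRtf.TryPasteFormatted(Clipboard.GetDataObject(), richTextBox1) right after Selection.Copy(). Calling richTextBox1.Paste() is a simpler alternative that should also keep the formatting.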

jatin24
Junior Poster in Training
75 posts since Aug 2009
Reputation Points: 31
Solved Threads: 21
 

Well, you can use this in a button click event handler:

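OpenFileDialog f = new OpenFileDialog();
f.Title = "Open file as...";
f.Filter = "Doc Files|*.doc|Java Files|*.java|C# Files|*.cs|All Files|*.*"; // other formats can be added to the filter in the same way
DialogResult dr = f.ShowDialog();
if (dr == DialogResult.OK)
{
    s1 = f.FileName; // s1 and open are assumed to be fields of the form
    // LoadFile(path) on its own expects RTF; plain-text files need an explicit stream type
    RichTextBoxStreamType type = s1.EndsWith(".rtf", StringComparison.OrdinalIgnoreCase)
        ? RichTextBoxStreamType.RichText
        : RichTextBoxStreamType.PlainText;
    richTextBox1.LoadFile(s1, type);
    open = true;
}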

With the help of this you can load other document formats as well.

avirag
Posting Whiz
313 posts since Jun 2009
Reputation Points: 31
Solved Threads: 36
 

Mark this thread as solved if it helped you.

avirag
Posting Whiz
313 posts since Jun 2009
Reputation Points: 31
Solved Threads: 36
 

The LoadFile approach will work for .cs files and other plain-text files, but .doc is a proprietary format. It WILL NOT WORK for Microsoft Word documents. (Naming a text file file.doc does not make it a .doc file; it just appears that way. True .doc files are binary Word documents, and the RichTextBox class will not parse them.)

Diamonddrake
Master Poster
724 posts since Mar 2008
Reputation Points: 442
Solved Threads: 89
 

Is it true that the StreamReader approach will read most formats? Look at this picture:
[img]http://www.file.si/files/g0f81hvm4fua5fiad6h1.jpg[/img]

It doesn't look like I can get any text out of a .doc file here.

Mitja Bonca
Nearly a Posting Maven
2,485 posts since May 2009
Reputation Points: 641
Solved Threads: 474
 

As mentioned before, .doc files are not plain text. So if you read the data of a .doc file using the StreamReader code I posted, it will not show you the text; it will show you the file's binary contents rendered as characters.

Sorry. The point of all my posts was to explain that Word uses a special format that requires Office interop to read.

Diamonddrake
Master Poster
724 posts since Mar 2008
Reputation Points: 442
Solved Threads: 89
 

That means, as you said in one of your posts, that I need to add a reference to the "Microsoft Word Object Library", right? http://www.vbforums.com/showpost.php?p=3114899&postcount=8 - this one?!
I was trying to add that reference, but I got an error on the Word reference: a yellow exclamation mark. As far as I've read, this happens if I don't have SP3 installed, or something like that. Right?

EDIT: I got the file which solves that problem. The Word reference no longer shows a yellow exclamation mark. So how do I get that "Word" prefix with this: "using Microsoft.Office.Interop.Word;"? With this I only get "Words".
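
On the "Word" prefix: a plain "using Microsoft.Office.Interop.Word;" imports the types directly, so they are written without a prefix (ApplicationClass, Document). The Word. prefix comes from a namespace alias instead. Here is a minimal sketch, assuming the project already references the Microsoft Word Object Library; the class name WordAliasExample is made up for illustration.

```csharp
// The alias makes "Word." stand for the whole interop namespace
using Word = Microsoft.Office.Interop.Word;

static class WordAliasExample
{
    public static Word.Application StartWord()
    {
        // Word.Application is the interface form. Word.ApplicationClass also
        // works, but as far as I know only when "Embed Interop Types" is
        // turned off on the reference.
        return new Word.Application();
    }
}
```

Use either the alias or the plain using directive in a file, and then write the type names consistently with whichever one you picked.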

Mitja Bonca
Nearly a Posting Maven
2,485 posts since May 2009
Reputation Points: 641
Solved Threads: 474
 

Where do you get:

Word.ApplicationClass wordApp = new Word.ApplicationClass();

I can only have:

ApplicationClass wordApp = new Word.ApplicationClass();

Mitja Bonca
Nearly a Posting Maven
2,485 posts since May 2009
Reputation Points: 641
Solved Threads: 474
 
object file = myFullPath;
object nullobj = System.Reflection.Missing.Value;
ApplicationClass wordApp = new ApplicationClass();
Document doc = wordApp.Documents.Open(
    ref file, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj);

doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
IDataObject data = Clipboard.GetDataObject();
String myGetString = data.GetData(DataFormats.Text).ToString();
//doc.Close();


I would like to know what those ref file and ref nullobj arguments are for.

Mitja Bonca
Nearly a Posting Maven
2,485 posts since May 2009
Reputation Points: 641
Solved Threads: 474
 

Essentially, the method called in that library expects a set of arguments passed by reference. To use the method you have to match its parameters, and since you don't have values for most of them, you pass a placeholder instead.

The first parameter needs to be a reference to an object holding the file path. The nullobj variable is not actually null: it holds System.Reflection.Missing.Value, which tells Word to use its default for that optional parameter. Because the parameters are declared ref, you can't pass a literal or null directly, so you create an object variable and pass that.

The ApplicationClass creates an object that represents the Word application, so now you have an object that you can grab the text from. That's how it works.

If you are wondering how we know what values to pass, that comes from the Microsoft Word documentation. Since it's not a managed DLL, it doesn't share that information. When Microsoft created the COM object responsible, they also wrote documentation for it. The original author of the code above got it from there, but that wasn't me.
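
To make the ref / Missing.Value mechanics concrete without needing Word installed, here is a small self-contained sketch. The Open method below is a hypothetical stand-in that only mimics the shape of a COM-style signature; it is not the real Documents.Open.

```csharp
using System;
using System.Reflection;

static class RefParameterDemo
{
    // Hypothetical stand-in for a COM-style method: every argument is "ref object",
    // and Missing.Value means "caller did not supply this, use the default".
    public static string Open(ref object fileName, ref object readOnly)
    {
        bool isReadOnly = (readOnly is bool) ? (bool)readOnly : false;
        return fileName + (isReadOnly ? " (read-only)" : " (read-write)");
    }

    public static string Demo()
    {
        // ref arguments must be assignable variables, so the values are boxed
        // into object variables first. Open(ref "a.doc", ref null) would not compile.
        object file = @"C:\docs\report.doc";
        object missing = Missing.Value; // a real object instance, not null

        return Open(ref file, ref missing);
    }
}
```

With C# 4 and later the compiler lets you omit optional arguments on COM interop calls, so the long list of ref nullobj arguments is mainly needed with older compilers.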

Diamonddrake
Master Poster
724 posts since Mar 2008
Reputation Points: 442
Solved Threads: 89
 

If nothing else, and if your document is a .docx (this does not work for legacy binary .doc files), rename it to .zip, open it, extract the XML files, and pull out the information with the XML classes. However, using the MS Word references should be much easier...
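
A rough sketch of that ZIP + XML route, for .docx files only. It assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive) and the standard WordprocessingML layout, where the body text lives in word/document.xml; the class name DocxText is made up for illustration. It returns plain paragraph text and ignores formatting, headers, footnotes and the like.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class DocxText
{
    // Main WordprocessingML namespace used inside word/document.xml
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string Extract(Stream docx)
    {
        // No renaming needed: ZipArchive reads the package straight from the stream
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; not a .docx package?");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);

                // One output line per w:p paragraph, built from its w:t text runs
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

The result could then be assigned to richTextBox1.Text, for example with DocxText.Extract(File.OpenRead(path)).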

papanyquiL
Junior Poster
168 posts since May 2009
Reputation Points: 55
Solved Threads: 18
 

I did manage to work it out. It works great. Thx guys.

Mitja Bonca
Nearly a Posting Maven
2,485 posts since May 2009
Reputation Points: 641
Solved Threads: 474
 

Glad to hear you got it!

Diamonddrake
Master Poster
724 posts since Mar 2008
Reputation Points: 442
Solved Threads: 89
 

Mitja Bonca,
If your problem is fixed now, please mark this thread as solved.

avirag
Posting Whiz
313 posts since Jun 2009
Reputation Points: 31
Solved Threads: 36
 
