Hi guys,

I have an issue reading a Word document. I can actually read the document, but I need to somehow split some of the data into an array. The data I want to split is in a table in Word. Is there an easy way to get that table data into an array? My big problem is that when I read the document, the structure of the result varies: sometimes I receive every row as separate lines, and other times I get the data as it is laid out in columns. In summary, the result I get from Word is not always the same, so I cannot work with it reliably. What I do know is that the data I want is in a table.

I know Office 2003 saves in a binary format and Office 2007 saves in OOXML. Does that make it any easier to work with? I have no problem converting all the files to the 2007 format and working with that.
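If the files are converted to .docx, the tables can be read without Word installed, because a .docx is a zip archive whose main part is `word/document.xml`. The following is only a rough sketch under some assumptions: the class and method names are made up for illustration, it looks only at tables that sit directly under the document body (not nested tables or tables inside text boxes or content controls), and merged cells will give rows with differing cell counts. On .NET Framework, `ZipFile` needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
static class DocxTables
{
    // WordprocessingML main namespace used by word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string[][] (rows x cells) per table found directly under w:body.
    public static List<string[][]> FromXml(XDocument doc)
    {
        XElement body = doc.Root?.Element(W + "body");
        if (body == null) return new List<string[][]>();

        return body.Elements(W + "tbl")
            .Select(tbl => tbl.Elements(W + "tr")
                .Select(tr => tr.Elements(W + "tc")
                    .Select(tc => CellText(tc))
                    .ToArray())
                .ToArray())
            .ToList();
    }

    // Joins the paragraphs of a cell with newlines; each paragraph is the
    // concatenation of its w:t text runs.
    static string CellText(XElement tc)
    {
        return string.Join("\n", tc.Elements(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value))));
    }

    // Opens a .docx file (a zip archive) and parses word/document.xml.
    public static List<string[][]> FromFile(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) return new List<string[][]>();
            using (Stream s = entry.Open())
                return FromXml(XDocument.Load(s));
        }
    }
}
```

With this, `DocxTables.FromFile(path)[0][r][c]` would be the text of row `r`, column `c` (zero-based) of the first table, provided the document has one.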

I hope you guys understand what my issue is thanks.

Thanks Serkan.
Based on your link, I came up with this code, which reads the table the way I want. Thanks again.

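// Requires a reference to Microsoft.Office.Interop.Word (and System.Windows.Forms for Clipboard).
Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.ApplicationClass();
app.Visible = false;
object nullobj = System.Reflection.Missing.Value;
object file = path;
Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(
    ref file, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj, ref nullobj);
// Word collections are 1-based, so Tables[1] is the first table in the document.
Microsoft.Office.Interop.Word.Table tbl = doc.Tables[1];
// Whole document as plain text via the clipboard (not used by the table loop below).
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
IDataObject data = Clipboard.GetDataObject();
string text = data.GetData(DataFormats.Text).ToString();
// With this I go through every single row; in my case the tables always have 5 columns.
// Rows and columns are 1-based, so both loops use <= to include the last row and column.
for (int i = 1; i <= tbl.Rows.Count; i++)
{
    for (int a = 1; a <= 5; a++)
    {
        textBox1.Text += tbl.Cell(i, a).Range.Text + Environment.NewLine;
    }
}
// Close the document and quit Word so no hidden WINWORD.EXE process is left behind.
doc.Close(ref nullobj, ref nullobj, ref nullobj);
app.Quit(ref nullobj, ref nullobj, ref nullobj);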
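
One thing to watch with `Range.Text` on a table cell: Word ends each cell's text with an end-of-cell marker, a carriage return followed by a bell character (`"\r\a"`), so the raw strings do not compare cleanly against expected values. A small helper along these lines could strip it before the values go into an array. The class and method names are hypothetical, and this sketch also trims surrounding whitespace, which may or may not be wanted.

```csharp
using System;

// Hypothetical helper for cleaning text read from tbl.Cell(r, c).Range.Text.
static class WordCellText
{
    public static string Clean(string raw)
    {
        if (raw == null) return string.Empty;
        // Remove the trailing end-of-cell marker ("\r\a") plus any trailing
        // empty paragraphs, then trim ordinary whitespace on both sides.
        return raw.TrimEnd('\r', '\a').Trim();
    }
}
```

In the loop above, that would mean using `WordCellText.Clean(tbl.Cell(i, a).Range.Text)` instead of the raw text.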

But now I have a second problem: when I open the document in Microsoft Word I can see there is a table, but the program does not recognize it. This only happens with some Word documents, not all. The thing is that I have thousands of Word documents that I have to go through to get the info I want.

Anyway, you did answer my first question, but I would like to hear any opinions about my second issue.

Regards, and thanks again.

If you are searching for a string in some Word documents, I think it is better to treat them as text files instead of Word documents. Open them as text files and search for the text you are looking for; if you find it, then open the document as a Word document to get the info you want. This way your search will be a lot faster.

Yes, I understand your point, but the problem is that I do not have any specific string to search for in the documents. The data could be anything. Like I said before, the only thing I know is that the data I want is in a table, and now it turns out the documents have different structures. What I am thinking of doing is going through every single Word document, checking whether it has a table, getting my data, and moving that document to another location. Then I can see how many Word documents are left and enter their data into the database manually.
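
That sorting pass could look roughly like the sketch below. Everything in it is an assumption for illustration: `BatchSorter`, `MoveProcessed`, and the `hasTable` callback are made-up names, and the callback stands in for whatever code actually opens the document and reads the table. Files whose names start with `~$` are skipped because Word creates such lock files next to open documents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for the "process, then move" workflow.
static class BatchSorter
{
    // hasTable should return true when the document at the given path has a
    // readable table (and the data was extracted). Returns the paths that
    // were left in sourceDir for manual entry.
    public static List<string> MoveProcessed(
        string sourceDir, string doneDir, Func<string, bool> hasTable)
    {
        Directory.CreateDirectory(doneDir);
        var remaining = new List<string>();

        // GetFiles returns a snapshot array, so moving files while looping is safe.
        foreach (string path in Directory.GetFiles(sourceDir, "*.doc*"))
        {
            string name = Path.GetFileName(path);
            if (name.StartsWith("~$")) continue; // Word lock file, not a document

            bool ok;
            try { ok = hasTable(path); }
            catch (Exception) { ok = false; } // unreadable: leave for manual review

            if (ok)
                File.Move(path, Path.Combine(doneDir, name)); // throws if the target exists
            else
                remaining.Add(path);
        }
        return remaining;
    }
}
```

The returned list is then the set of documents that still need to be handled by hand.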

Thanks again for your help.

So you don't know what you are searching for? How are you going to search, then?

Well, what I mean is that I do not have a specific string to search for. All the documents have data, or at least the header. What I do know is that in the table, column 1 = Drawing Revision, for example, column 2 = Paper Revision, and so on. That is why I am interested in going over every single row and column, because that makes it easier for me to retrieve the values. Sorry if I am not explaining this issue correctly, but that is it. Unluckily for me, there is no consistency in the Word documents, which makes reading them automatically kind of difficult.
Thanks.

To your second question: all I can think of is that there could be some hidden tables in the document, so when you index into the table collection you may get the wrong one. That may be why, although you can see the table visually, from code you access an empty one. Make sure that your document only has that particular table, not empty or invisible ones.
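
Rather than trusting a fixed index, one option is to look at every table in the document and pick the one whose first row has the expected headers. The sketch below assumes the tables have already been read into `string[][]` arrays (rows of cells), for example by looping from 1 to `doc.Tables.Count`; `TablePicker` and `FindByHeader` are made-up names, and the header values come from the description earlier in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for choosing the right table by its header row.
static class TablePicker
{
    // Returns the first table whose first row starts with the expected header
    // names (case-insensitive, surrounding whitespace ignored), or null.
    public static string[][] FindByHeader(
        IEnumerable<string[][]> tables, params string[] expectedHeaders)
    {
        foreach (string[][] table in tables)
        {
            if (table.Length == 0) continue; // skip empty tables
            string[] header = table[0];
            if (header.Length < expectedHeaders.Length) continue;

            bool match = expectedHeaders
                .Select((name, i) => string.Equals(
                    (header[i] ?? "").Trim(), name, StringComparison.OrdinalIgnoreCase))
                .All(x => x);
            if (match) return table;
        }
        return null;
    }
}
```

A call such as `TablePicker.FindByHeader(allTables, "Drawing Revision", "Paper Revision")` would then skip empty or unrelated tables, and a `null` result flags the document for manual review.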

Ok, thanks for your help.
