Hello,
I am working on a project that reads data from an MS Word document. At present I am using the HWPFDocument class to read the text from the document, but I need a way to identify the title of the document when retrieving the data. I have tried several approaches and searched around, without success. Can anybody help with this? I would greatly appreciate it. Thank you in advance.

All 4 Replies

Isn't the title of word documents just the first sentence in the file?

Yes, it is. But what about documents without a title? How can I determine whether a document has a title or not? Please share your ideas on this. I would really appreciate it. Thank you again.

Who knows what the author regards as the title - there are so many different places to put it, with no obligation to use any of them.

There may be a document properties field called "Title".
There may be some text formatted with the style called "Title". But what if that style is used more than once?
It might be the first line of the document.
It might be the text in the largest / bold / underlined font (not necessarily the first line).
It might be the text which is preceded by the word "title:"
Or it might simply have nothing which could reasonably be called a title.

If the first one isn't present, then you're basically reduced to guesswork in one way or another. In all the other cases, you need to look at the sample of documents you have and come up with some kind of heuristic that gives good results most of the time (and an error message when it doesn't).
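
As a rough illustration of that kind of heuristic, here is a minimal Java sketch. It assumes you have already extracted the "Title" document property and the paragraphs (text, style name, font size) with whatever library you are using; that extraction step is not shown. The names TitleGuesser, Para and guessTitle are hypothetical, and the order of the checks is just one possible policy, not something built into HWPF. The "first line of the document" guess is deliberately left out, because it would report a title for every non-empty document.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: tries the candidate title locations in a fixed order.
final class TitleGuesser {

    // Hypothetical, library-independent view of one paragraph.
    static final class Para {
        final String text;
        final String styleName; // e.g. "Title" or "Normal"; may be null
        final int fontSize;     // only compared relatively, so the unit does not matter

        Para(String text, String styleName, int fontSize) {
            this.text = text;
            this.styleName = styleName;
            this.fontSize = fontSize;
        }
    }

    private static boolean blank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns the guessed title, or Optional.empty() if nothing qualifies,
    // so the caller can report an error instead of a wrong guess.
    static Optional<String> guessTitle(String titleProperty, List<Para> paras) {
        // 1. The document property, if the author filled it in.
        if (!blank(titleProperty)) {
            return Optional.of(titleProperty.trim());
        }
        // 2. Text in the "Title" style. Assumed policy: the first use wins.
        for (Para p : paras) {
            if ("Title".equalsIgnoreCase(p.styleName) && !blank(p.text)) {
                return Optional.of(p.text.trim());
            }
        }
        // 3. Text preceded by the word "title:" (case-insensitive).
        for (Para p : paras) {
            if (blank(p.text)) {
                continue;
            }
            String t = p.text.trim();
            if (t.regionMatches(true, 0, "title:", 0, 6)) {
                String rest = t.substring(6).trim();
                if (!rest.isEmpty()) {
                    return Optional.of(rest);
                }
            }
        }
        // 4. The paragraph with the largest font, but only if it is strictly
        //    larger than every other non-blank paragraph.
        Para largest = null;
        boolean unique = false;
        int nonBlank = 0;
        for (Para p : paras) {
            if (blank(p.text)) {
                continue;
            }
            nonBlank++;
            if (largest == null || p.fontSize > largest.fontSize) {
                largest = p;
                unique = true;
            } else if (p.fontSize == largest.fontSize) {
                unique = false;
            }
        }
        if (largest != null && unique && nonBlank > 1) {
            return Optional.of(largest.text.trim());
        }
        return Optional.empty();
    }
}
```

An empty result is the "error message" case: none of the rules matched, so the document is treated as having no identifiable title. The rules and their order would need tuning against your own sample of documents.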

Yes, that seems fine, but isn't there a specific way to identify the title of a document, such as a method that takes some arguments and determines it? If there is an alternative, please let me know. Thank you, Salem. I am working on a heuristic model of my own, but I am not sure how well it will work, because I am still gathering facts for it. Thank you.