Need Help Reading the Title of a Word Document Using Java

Question

Zaad -9 Junior Poster in Training

15 Years Ago

Hello,
I am currently working on a project that involves reading data from an MS Word document. At present I am using the HWPFDocument class from Apache POI to read the text from the document, but I need a way to identify the title of the document when retrieving the data. I have tried many approaches and searched around, but with no result. Can anybody please help with this? I would greatly appreciate it. Thank you in advance.

java


All 4 Replies

Phaelax 52 Practically a Posting Shark

15 Years Ago

Isn't the title of word documents just the first sentence in the file?


Zaad -9 Junior Poster in Training · Answer 1 · 2010-01-18T00:51:01+00:00

Isn't the title of word documents just the first sentence in the file?

Yes, it is. But what about documents without a title? How can I determine whether a document has a title or not? Please give me your ideas on this; I would really appreciate it. Thank you again.

Salem 5,265 Posting Sage · Answer 2 · 2010-01-18T01:07:03+00:00

Who knows what the author regards as the title - there are so many different places to put it, with no obligation to use any of them.

There may be a document properties field called "Title".
There may be some text formatted with the style called "Title". But what if it's used more than once, what then?
It might be the first line of the document.
It might be the text in the largest / bold / underlined font (not necessarily the first line).
It might be the text which is preceded by the word "title:"
Or it might simply have nothing which could reasonably be called a title.

If the first one isn't present, then you're basically reduced to guess-work in one way or another. In all these other cases, you need to look at the sample of documents you have, and come up with some kind of heuristic which gives good results most of the time (and an error message when it doesn't).
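A minimal sketch of such a heuristic, in plain Java, might look like the following. Everything here is an assumption made for illustration: `Para` and `TitleGuesser` are hypothetical names, the order in which the clues are tried is just one reasonable choice, and the 120-character cutoff is arbitrary. Filling `Para` from the Word file is left out; with HWPF the properties title should be reachable through the document's summary information (`getSummaryInformation().getTitle()`), but check that against your POI version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical paragraph model; fill it from whatever your Word reader exposes. */
final class Para {
    final String text;
    final String styleName; // e.g. "Title" or "Normal"; may be null
    final int fontSize;     // any consistent unit works
    final boolean bold;

    Para(String text, String styleName, int fontSize, boolean bold) {
        this.text = text;
        this.styleName = styleName;
        this.fontSize = fontSize;
        this.bold = bold;
    }
}

final class TitleGuesser {
    // Assumed cutoff: longer lines are treated as body text, not as a title.
    private static final int MAX_TITLE_LENGTH = 120;

    /** Tries each clue in turn; an empty result means "no title found". */
    static Optional<String> guessTitle(String propertiesTitle, List<Para> paragraphs) {
        // 1. The document properties "Title" field is the only explicit answer.
        if (propertiesTitle != null && !propertiesTitle.trim().isEmpty()) {
            return Optional.of(propertiesTitle.trim());
        }

        List<Para> nonEmpty = new ArrayList<>();
        for (Para p : paragraphs) {
            if (p.text != null && !p.text.trim().isEmpty()) nonEmpty.add(p);
        }

        // 2. Text in the "Title" style; if it is used more than once, take the first use.
        for (Para p : nonEmpty) {
            if ("Title".equalsIgnoreCase(p.styleName)) return Optional.of(p.text.trim());
        }

        // 3. Text preceded by the word "title:" (case-insensitive).
        for (Para p : nonEmpty) {
            String t = p.text.trim();
            if (t.regionMatches(true, 0, "title:", 0, 6)) {
                String rest = t.substring(6).trim();
                if (!rest.isEmpty()) return Optional.of(rest);
            }
        }

        // 4. The first paragraph with the largest font, if it stands out and is short.
        Para best = null;
        int smallest = Integer.MAX_VALUE;
        for (Para p : nonEmpty) {
            smallest = Math.min(smallest, p.fontSize);
            if (best == null || p.fontSize > best.fontSize) best = p;
        }
        if (best != null && best.fontSize > smallest
                && best.text.trim().length() <= MAX_TITLE_LENGTH) {
            return Optional.of(best.text.trim());
        }

        // 5. The first line, but only if it is bold and short.
        if (!nonEmpty.isEmpty()) {
            Para first = nonEmpty.get(0);
            if (first.bold && first.text.trim().length() <= MAX_TITLE_LENGTH) {
                return Optional.of(first.text.trim());
            }
        }

        // Nothing convincing: let the caller report "no title" rather than guess.
        return Optional.empty();
    }
}
```

A caller would check `guessTitle(...).isPresent()` and show an error message when it is not. The order and thresholds should be tuned against the sample of documents you actually have.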

Zaad -9 Junior Poster in Training · Answer 3 · 2010-01-18T15:18:11+00:00

Yes, that seems fine, but isn't there a specific way to identify the title of a document, such as a method that takes some arguments and determines it? If there is an alternative, please let me know. And thank you, Salem. I'm working on a heuristic model of my own, but I'm not sure how well it will work, because I'm still gathering facts for it. If there is any alternative, please let me know. Thank you.
