Class to read any kind of file format, e.g. .doc, .pdf, .txt

Question

ndumbo 0 Newbie Poster

15 Years Ago

Hi,
My package reads files in .txt format only; it does not read .pdf or .doc files. Is there a single Java library that opens any kind of file format as a stream and then reads or manipulates its contents?

java

All 5 Replies

stultuske 1,116 Posting Maven

15 Years Ago

I doubt it, especially since .pdf is intended as formatted output, not as input.

Ezzaral 2,714 Posting Sage

15 Years Ago

The literal answer is yes: http://java.sun.com/javase/6/docs/api/java/io/package-summary.html

The actual answer to the question as you intend it:
No, and there never will be.
Any application can use whatever file format it wants, and what "manipulate its contents" means depends completely on the nature of that application. You cannot expect a completely generic solution to an inherently non-generic problem.
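To illustrate the "literal answer": java.io and java.nio will open any file as a stream of bytes, but the bytes mean nothing until something interprets the format. The sketch below is only an illustration under assumptions. The class and method names (RawBytes, head, looksLikePdf) are made up for this example, and the check relies on PDF files beginning with the marker "%PDF".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: the names here are illustrative, not from any library.
class RawBytes {
    /** Reads up to n leading bytes of any file, whatever its format. */
    static byte[] head(Path file, int n) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readNBytes(n); // InputStream.readNBytes(int) needs Java 11+
        }
    }

    /** True if the file starts with the "%PDF" marker. */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] magic = "%PDF".getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(head(file, magic.length), magic);
    }
}
```

Opening the stream is the easy, generic part. Everything after the first few bytes still needs format-specific code.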

ndumbo 0 Newbie Poster · Answer 1 · 2008-10-22T06:57:50+00:00

Hi,
Thank you both for your replies. If what you are saying is true, does it mean that there cannot be a Java application that reads any kind of file and counts the frequency of a given word in it?
1) I mean, can we not use Java to build an index of keywords?
2) The URL and URLConnection classes give me the HTML contents of a web page. Is there any way to get just the content of the page, such as its text?

Ezzaral 2,714 Posting Sage Team Colleague Featured Poster · Answer 2 · 2008-10-22T21:37:42+00:00

What it means is that you need to identify exactly what you want to read and use the appropriate APIs for that content. There are libraries for working with .doc files, .pdf files, etc., but there is no "read anything I might happen to come across" library, because that is a completely unrealistic expectation.

orko 36 Junior Poster · Answer 3 · 2008-10-23T03:04:21+00:00

There can be a Java application that reads any kind of file and counts the frequency of any given word in it. BUT when the application is given "any" kind of file, it has to recognize the type and treat each type differently while parsing data from the file. Once the data is parsed, it can all be treated the same way. An application dealing with "any kind of file" and a class dealing with "any kind of file" are NOT the same thing.
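A rough sketch of that split, assuming the type is recognized by file extension alone (a real application might inspect the content instead). TextExtractor and WordCounter are hypothetical names invented for this example. Only the .txt case is implemented, and it assumes UTF-8 text. Extractors for .pdf or .doc would be registered the same way, each backed by whatever library the application chooses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-format hook: turns one kind of file into plain text. */
interface TextExtractor {
    String extract(Path file) throws IOException;
}

class WordCounter {
    // Registry keyed by lower-case extension; only "txt" is registered by default.
    private final Map<String, TextExtractor> extractors = new HashMap<>();

    WordCounter() {
        register("txt", file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    /** Adds a format-specific extractor, e.g. one wrapping a PDF or Word library. */
    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Step 1: recognize the type (naively, by extension) and parse accordingly. */
    String extractText(Path file) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + name);
        }
        return extractor.extract(file);
    }

    /** Step 2: format-independent, case-insensitive whole-word count. */
    static int frequency(String text, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        int count = 0;
        // Split on runs of anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(target)) {
                count++;
            }
        }
        return count;
    }
}
```

The counting code never needs to know where the text came from. Only extractText and the registered extractors are format-specific.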
1. I think I already gave the answer: yes, you can.
2. So you are saying you want the data from the HTML file without the tags? Then parse the HTML file using DOM or SAX, iterate through the nodes (the tags), and get whatever data you want. Another way is to use a regular expression to remove the tags.
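For the regular-expression route, a minimal sketch might look like the following. HtmlText and stripTags are hypothetical names for this example. The approach is deliberately naive: it does not decode entities such as &amp;amp;, and it can be fooled by a ">" inside an attribute value or by comments. The DOM/SAX route has its own catch, since real-world HTML is often not well-formed XML and a strict XML parser may reject it.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a naive tag stripper, not a real HTML parser.
class HtmlText {
    // Whole <script>...</script> and <style>...</style> elements, contents included.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag, from "<" to the next ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the text of an HTML string with tags removed and whitespace collapsed. */
    static String stripTags(String html) {
        String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }
}
```

Replacing each tag with a space rather than an empty string keeps words in adjacent elements from running together, which matters if the result is then fed to a word counter.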
