Hi,
My package reads files in .txt format only; it does not read .pdf or .doc files. Is there any single Java library that opens any kind of file format as a stream and then reads or manipulates its contents?

I doubt it, especially since .pdf is intended as a formatted output format, not as input.

The literal answer is yes: http://java.sun.com/javase/6/docs/api/java/io/package-summary.html

The actual answer to the question as you intend it:
No, and there never will be.
Any application can use whatever file format it wants, and what "manipulate its contents" means depends entirely on the nature of that application. You cannot expect a completely generic solution to an inherently non-generic problem.
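To make the "literal yes" concrete, here is a minimal sketch (assuming Java 8 or later) showing that java.io will open any file as a plain stream of bytes. The class `HexPeek` and its `peekHex` method are hypothetical names made up for this example. It prints the first few bytes as hex, which also shows why this alone does not solve the problem: the stream hands you raw bytes, and what those bytes mean depends entirely on the file format.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HexPeek {
    // Hypothetical helper: reads up to n bytes from any stream and renders them as hex.
    // java.io neither knows nor cares which format the bytes belong to.
    public static String peekHex(InputStream in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = in.read(); // -1 signals end of stream
            if (b == -1) {
                break;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Works identically for .txt, .pdf, .doc, images, ...
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.println(peekHex(in, 16));
            }
        } else {
            // No file given: peek at an in-memory sample instead.
            byte[] sample = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
            System.out.println(peekHex(new ByteArrayInputStream(sample), 16));
        }
    }
}
```

Reading a .pdf this way just gives you its header bytes and compressed content streams, not its words.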

Hi,
Thank you both for your replies. If what you are saying is true, does it mean that there cannot be a Java application that reads any kind of file and counts the frequency of any given word in it?
1) I mean, can we not use Java to build an index of keywords?
2) The URL and URLConnection classes give me the HTML contents of a web page. Is there any way to get just the content, such as the text, of the page?

What it means is that you need to identify exactly what you want to read and use the appropriate APIs for that content. There are libraries for working with .doc files, .pdf files, etc., but there is no "read anything I might happen to come across" library, because that is a completely unrealistic expectation.
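One way to "identify exactly what you want to read" is to look at a file's leading bytes (its magic number) instead of trusting the extension. This is a minimal sketch, assuming Java 11+ for `InputStream.readNBytes(int)`; the class `FormatSniffer` and method `sniff` are hypothetical, and only three well-known signatures are checked: `%PDF-` for PDF, the OLE2 compound-file header used by legacy .doc files, and the ZIP header used by .docx files. Once the type is known, the file would be handed to a library that understands that format.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class FormatSniffer {
    // PDF files start with the ASCII text "%PDF-".
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    // Legacy .doc/.xls/.ppt files are OLE2 compound documents with this 8-byte header.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // .docx/.xlsx/.pptx (and .jar, .zip, ...) are ZIP archives starting with "PK\3\4".
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies a file header; "unknown" may well be plain text.
    public static String sniff(byte[] header) {
        if (startsWith(header, PDF)) return "pdf";
        if (startsWith(header, OLE2)) return "ole2";
        if (startsWith(header, ZIP)) return "zip";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FormatSniffer <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            byte[] header = in.readNBytes(8); // Java 11+: reads up to 8 bytes
            System.out.println(sniff(header));
        }
    }
}
```

Note that "zip" only tells you the container; a real .docx check would also look inside the archive.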

There can be a Java application that reads any kind of file and counts the frequency of any given word in it. BUT when the application is given that "any" kind of file, it has to recognize the file's type and handle each type differently while parsing data from it. Once the data is parsed, it can all be treated the same way. An application that deals with "any kind of file" and a single class that deals with "any kind of file" are NOT the same thing.
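The split described above (format-specific extraction first, format-independent counting afterwards) could be sketched like this, assuming Java 8+. `TextExtractor`, `extractorFor` and `countWords` are illustrative names, not an existing API, and only a plain-text extractor is implemented; extractors for .pdf or .doc would wrap a dedicated library (Apache PDFBox and Apache POI are common choices) and are left out here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordFrequency {
    // Format-specific step (hypothetical interface): turn a file into plain text.
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // The only extractor implemented in this sketch: UTF-8 plain text.
    static final TextExtractor PLAIN_TEXT =
        file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

    // Chooses an extractor by extension; PDF/DOC extractors would be added here.
    static TextExtractor extractorFor(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".txt")) {
            return PLAIN_TEXT;
        }
        throw new IllegalArgumentException("No extractor for " + name);
    }

    // Format-independent step: once we have text, counting is the same for every format.
    // Splits on runs of non-letter/non-digit characters (so "don't" becomes "don" + "t").
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields one empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java WordFrequency <file.txt>");
            return;
        }
        Path file = Paths.get(args[0]);
        String text = extractorFor(file).extract(file);
        System.out.println(countWords(text));
    }
}
```

A keyword index is the same idea, keyed per file: run `countWords` on each extracted text and store the results.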
1. I think I already answered that. Precisely: yes, you can.
2. So you want the data from the HTML file without the tags? Then parse the HTML file using a DOM or SAX-style parser, iterate through the nodes (the tags), and get whatever data you want. Another way is to use a regular expression to remove the tags.
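For point 2, here is a sketch of both approaches using only the JDK (the class name `HtmlText` is hypothetical). The first uses the Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`), which reports text between tags through callbacks, much like SAX does. An HTML-aware parser is used because real pages are often not well-formed XML, which a strict XML DOM or SAX parser would reject. The second is the naive regex: shorter, but fragile. It glues words together when a tag sat between them, and it is unaware of comments, scripts, or a `>` inside an attribute value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlText {
    // Callback-based (SAX-like) parse with the HTML parser bundled in the JDK's Swing module.
    public static String textOf(Reader html) throws IOException {
        StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags; separate runs with a space.
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(data);
            }
        };
        // true = ignore any charset declared in the document
        new ParserDelegator().parse(html, callback, true);
        return sb.toString();
    }

    // Crude regex alternative: deletes anything that looks like a tag.
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) throws IOException {
        // In practice this string would come from URLConnection.getInputStream().
        String page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.println(textOf(new StringReader(page)));
        System.out.println(stripTags(page));
    }
}
```

On the sample page the regex version runs the heading into the paragraph ("TitleHello world"), which is the kind of problem the parser-based version avoids.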
