Oct 20th, 2008

Class to read any kind of file format, e.g. .doc, .pdf, .txt

Hi,
My package reads files in .txt format only; it does not read .pdf or .doc files. Is there a single Java library that opens any kind of file format as a stream and then reads or manipulates its contents?
Posted by ndumbo (Newbie Poster)

Oct 21st, 2008

I doubt it, especially since .pdf is intended as formatted output, not as input.
Posted by stultuske (Nearly a Posting Maven)

Oct 21st, 2008

Quote originally posted by ndumbo:
Hi,
My package reads files in .txt format only; it does not read .pdf or .doc files. Is there a single Java library that opens any kind of file format as a stream and then reads or manipulates its contents?

The literal answer is yes: http://java.sun.com/javase/6/docs/ap...e-summary.html

The actual answer to the question as you intend it:
No, and there never will be.
Any application can use whatever file format it wants, and what it means to "manipulate its contents" depends entirely on the nature of that application. You cannot expect a completely generic solution to an inherently non-generic problem.
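
A small sketch of the distinction drawn above: the standard library will open any file as a stream of bytes, but deciding what those bytes mean is format-specific. The class name FormatSniffer and the format labels are illustrative assumptions, not something from this thread. The signatures checked are the well-known leading bytes of PDF files ("%PDF") and of OLE2 compound files, the container used by legacy .doc files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class FormatSniffer {
    // Leading bytes of a PDF file: the ASCII text "%PDF".
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};
    // Leading bytes of an OLE2 compound file (container for legacy .doc files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0
    };

    /** Guesses a format label from the first bytes of a file's content. */
    static String sniff(byte[] head) {
        if (startsWith(head, PDF_MAGIC)) return "pdf";
        if (startsWith(head, OLE2_MAGIC)) return "ole2";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Opens any file as a raw byte stream and reads up to 8 leading bytes. */
    static byte[] readHead(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = in.read(buf); // may return fewer than 8 bytes, or -1 if empty
            return Arrays.copyOf(buf, Math.max(n, 0));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass one or more file names on the command line.
        for (String name : args) {
            System.out.println(name + ": " + sniff(readHead(Paths.get(name))));
        }
    }
}
```

Reading the bytes is the easy, generic part. Anything beyond recognizing the signature (extracting text, editing content) needs code written for that particular format.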
Posted by Ezzaral (Moderator, Posting Genius)

Oct 21st, 2008

Hi,
Thank you both for your replies. If what you are saying is true, does it mean that there cannot be a Java application that reads any kind of file and counts the frequency of a given word in it?
1) In other words, can we not use Java to build an index of keywords?
2) The URL and URLConnection classes give me the HTML content of a web page. Is there any way to get just the content of the page, such as its text?
Posted by ndumbo (Newbie Poster)

Oct 22nd, 2008

What it means is that you need to identify exactly what you want to read and use the appropriate APIs for that content. There are libraries for working with .doc files, .pdf files, etc., but there is no "read anything I might happen to come across" library, because that is a completely unrealistic expectation.
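
One way to picture "use the appropriate API for each kind of content" is a small registry that maps a file extension to a format-specific extractor. This is only a sketch: ExtractorRegistry and TextExtractor are hypothetical names invented for illustration, only plain text is actually implemented, and real .doc or .pdf extractors would have to delegate to a library written for that format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtractorRegistry {
    /** Hypothetical interface: turns one specific file format into plain text. */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    boolean supports(String fileName) {
        return byExtension.containsKey(extensionOf(fileName));
    }

    /** Picks the extractor by extension; fails loudly for unknown formats. */
    String extract(Path file) throws IOException {
        String ext = extensionOf(file.getFileName().toString());
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new UnsupportedOperationException("No extractor registered for ." + ext);
        }
        return extractor.extract(file);
    }

    /** A registry that only knows .txt, read here as UTF-8 (an assumption). */
    static ExtractorRegistry withPlainText() {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("txt",
            file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        return registry;
    }
}
```

Each new format is added with another register call, so the code that consumes the extracted text never needs to know which format it came from.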
Posted by Ezzaral (Moderator, Posting Genius)

Oct 22nd, 2008

Quote originally posted by ndumbo:
Hi,
Thank you both for your replies. If what you are saying is true, does it mean that there cannot be a Java application that reads any kind of file and counts the frequency of a given word in it?
1) In other words, can we not use Java to build an index of keywords?
2) The URL and URLConnection classes give me the HTML content of a web page. Is there any way to get just the content of the page, such as its text?

There can be a Java application that reads any kind of file and counts the frequency of any given word in it. BUT when the application is given "any" kind of file, it has to recognize the type and handle each type differently while parsing data from the file. Once the data is parsed, it can all be treated the same way. An application dealing with "any kind of file" and a class dealing with "any kind of file" are NOT the same thing.
1. I think I have already given the answer: yes, you can.
2. So you are saying you want the data from the HTML file without the tags? Then parse the HTML file using DOM or SAX, iterate through the nodes (the tags), and get whatever data you want. Another way is to use a regular expression to remove the tags.
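
A minimal sketch of the regular-expression route combined with the word count, assuming the HTML is already in a String. WordFrequency and its method names are made up for this example. The tag pattern is deliberately crude: it leaves the contents of script and style elements in place, does not decode entities such as &amp;amp;, and can be fooled by a ">" inside an attribute value or comment.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class WordFrequency {
    /** Crude tag removal: replaces anything that looks like <...> with a space. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Counts case-insensitive occurrences of each word in already-extracted text. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Java reads <b>text</b>; Java counts text.</p></body></html>";
        Map<String, Integer> counts = count(stripTags(html));
        System.out.println("java -> " + counts.get("java"));
        System.out.println("text -> " + counts.get("text"));
    }
}
```

The count method knows nothing about HTML. It works on any plain text, which is the point made above: once each format has been parsed into text, everything can be treated the same way.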
Posted by orko (Junior Poster)
