Class to read any kind of file format, e.g. .doc, .pdf, .txt

ndumbo

  #1
Oct 20th, 2008
Hi,
My package reads only .txt files; it cannot read .pdf or .doc files. Is there a single Java library that opens any kind of file format as a stream and then reads or manipulates its contents?
stultuske

  #2
Oct 21st, 2008
I doubt it, especially since .pdf is intended as a formatted output format, not as input.
Ezzaral

  #3
Oct 21st, 2008
Originally posted by ndumbo:
Hi,
My package reads only .txt files; it cannot read .pdf or .doc files. Is there a single Java library that opens any kind of file format as a stream and then reads or manipulates its contents?
The literal answer is yes: http://java.sun.com/javase/6/docs/ap...e-summary.html

The actual answer to the question as you intend it:
No, and there never will be.
Any application can use whatever it wants for a file format, and what it means to "manipulate its contents" depends completely on the nature of that application. You cannot expect to have a completely generic solution to an inherently non-generic problem.
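
To make the "literal yes" concrete: the stream classes in java.io will hand back the bytes of any file, whatever its format. Here is a minimal sketch; the helper class name `RawBytes` is made up for illustration and is not part of the JDK or of anything mentioned in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
final class RawBytes {

    private RawBytes() {}

    // Drains any InputStream into a byte array. The stream could just as
    // well be a FileInputStream opened on a .txt, .doc or .pdf file.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Wrapped so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `RawBytes.readAll(new FileInputStream("report.pdf"))` would succeed for a readable file, but the result is the PDF's internal byte layout, not its text. Turning those bytes into words is the format-specific part that no generic stream class can do.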
ndumbo

  #4
Oct 21st, 2008
Hi,
Thank you both for your replies. If what you are saying is true, does it mean that there cannot be a Java application that reads any kind of file and counts the frequency of a given word in it?
1) I mean, can we not use Java to build an index of keywords?
2) The URL and URLConnection classes give me the HTML contents of a web page. Is there any way to get just the content, such as the text, of the web page?
Ezzaral

  #5
Oct 22nd, 2008
What it means is that you need to identify exactly what you want to read and use the appropriate APIs for that content. There are libraries for working with .doc files, .pdf files, etc., but there is no "read anything I might happen to come across" library, because that is a completely unrealistic expectation.
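
One way to picture "use the appropriate API for that content" is a small registry that picks a format-specific extractor by file extension. This is only a sketch under assumptions: `TextExtractor` and `ExtractorRegistry` are hypothetical names, only plain text is actually implemented, and UTF-8 is assumed for .txt files. Extractors for .doc or .pdf would have to delegate to a library that understands those formats (Apache POI and Apache PDFBox are examples of such libraries, though neither is named in this thread).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: one implementation per supported format.
interface TextExtractor {
    String extract(byte[] fileContents);
}

// Hypothetical registry mapping file extensions to extractors.
final class ExtractorRegistry {

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    ExtractorRegistry() {
        // Plain text needs no special library: decode the bytes (UTF-8 assumed).
        register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        // Extractors for "doc" or "pdf" would be registered here, each one
        // delegating to a library that understands that format.
    }

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Chooses an extractor from the file name's extension and fails loudly
    // for unregistered formats, since there is no generic fallback.
    TextExtractor forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor;
    }
}
```

The design choice here is that an unknown format is an error rather than a silent guess, which matches the point above: each format has to be handled deliberately. Note that a file extension is only a hint about the real format, so a sturdier version might also inspect the file's leading bytes.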
orko

  #6
Oct 22nd, 2008
Originally posted by ndumbo:
Hi,
Thank you both for your replies. If what you are saying is true, does it mean that there cannot be a Java application that reads any kind of file and counts the frequency of a given word in it?
1) I mean, can we not use Java to build an index of keywords?
2) The URL and URLConnection classes give me the HTML contents of a web page. Is there any way to get just the content, such as the text, of the web page?

There can be a Java application that reads any kind of file and counts the frequency of a given word in it. But when the application is given "any" kind of file, it has to recognize the type and handle each type differently while parsing data from the file. Once the data is parsed, it can all be treated the same way. An application dealing with "any kind of file" and a class dealing with "any kind of file" are NOT the same thing.
1. I think I have already answered this. To be precise: yes, you can.
2. So you are saying you want the data from the HTML file without the tags? Then parse the HTML file using DOM or SAX, iterate through the nodes (the tags), and get whatever data you want. Another way is to use a regular expression to remove the tags.
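
As a rough sketch of the regular-expression route and of the "once parsed, treat it all the same" step, here is one possible helper. The class name `WordStats` and both method names are invented for illustration. The tag removal is deliberately crude: it does not decode entities such as `&amp;` and does not skip the bodies of script or style elements.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class WordStats {

    private WordStats() {}

    // Crude tag removal: anything between '<' and '>' becomes a space.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Counts case-insensitive word occurrences in already-extracted text.
    // A "word" is assumed to be a run of letters and digits.
    static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, `WordStats.wordFrequencies(WordStats.stripTags("<p>Java reads <b>java</b> files</p>"))` yields `{files=1, java=2, reads=1}`. The counting step does not care where the text came from, so the same method works on text extracted from any format.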
A Perfect World
©2003 - 2009 DaniWeb® LLC