parsing html


server_crash, post #11, Feb 26th, 2006
I think I did the exact same thing once while writing a console web crawler. It took some tricky work with the indexOf() method.

Well, actually, it was the link I was after.
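
For anyone wondering what the indexOf() approach might look like, here is a minimal sketch. It is not the crawler code mentioned above; the class name LinkScanner and the method extractLinks are made up for illustration, and it assumes links are written as lowercase, double-quoted href="..." attributes. Real-world pages break those assumptions often, which is probably where the "tricky work" comes in.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkScanner {
    // Hypothetical helper: collects the values of double-quoted href
    // attributes using nothing but String.indexOf() and substring().
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        String marker = "href=\"";
        int pos = 0;
        while (true) {
            // Find the next href=" starting at the current position.
            int start = html.indexOf(marker, pos);
            if (start < 0) {
                break; // no more links
            }
            start += marker.length();
            // The link ends at the next double quote.
            int end = html.indexOf('"', start);
            if (end < 0) {
                break; // unterminated attribute, stop scanning
            }
            links.add(html.substring(start, end));
            pos = end + 1;
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://example.com/a\">A</a> <a href=\"/b\">B</a>";
        System.out.println(extractLinks(page)); // prints [http://example.com/a, /b]
    }
}
```

Single-quoted or unquoted attributes, uppercase HREF, and relative URLs would all need extra handling, so a real HTML parser is usually the safer route.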

sjoshi, post #12, Mar 22nd, 2006
Hi,

I am getting a javax.swing.text.ChangedCharSetException when I use the following code. Where do I set the property that you are talking about? (I have a meta tag that is causing the exception.)

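import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

try {
    Reader r = new FileReader("PJMData.htm");
    ParserDelegator parser = new ParserDelegator();
    HTMLEditorKit.ParserCallback callback = new PJMParser();
    // third argument: ignoreCharSet
    parser.parse(r, callback, false);
} catch (IOException e) {
    e.printStackTrace();
}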

Let me know. Thanks.

Phaelax, post #13, Mar 22nd, 2006
I'm not sure in your case, since you're using the parser callback whereas I read the HTML into a document. The property I mentioned is set on the document itself.

doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

Could you read your .htm file into a document first, then use the parser on it?
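
To make the suggestion concrete, here is a rough sketch of both routes. The class CharsetSafeParsing and its two method names are invented for this example. textViaDocument follows the document approach described above. textViaCallback is an alternative not mentioned in the thread: it passes true as the third argument (ignoreCharSet) of ParserDelegator.parse, which should stop the parser from raising ChangedCharSetException on a charset meta tag. As far as I can tell, the document property ends up controlling that same flag, but treat that as an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CharsetSafeParsing {
    // Document approach: set the property before reading so the charset
    // directive in the meta tag is ignored.
    public static String textViaDocument(Reader in)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    // Callback approach (not from the thread): keep the ParserDelegator
    // but pass ignoreCharSet = true as the third argument.
    public static String textViaCallback(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data); // collect plain text between tags
            }
        };
        new ParserDelegator().parse(in, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample page with the kind of meta tag that triggers the exception.
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-8\"></head>"
                + "<body><p>hello</p></body></html>";
        System.out.println(textViaDocument(new StringReader(html)).trim());
        System.out.println(textViaCallback(new StringReader(html)).trim());
    }
}
```

With a custom callback such as PJMParser, the second route needs the smallest change: only the last argument of parse() differs from the code posted in #12.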