
parsing html

Phaelax
Feb 19th, 2006
The problem isn't actually the parsing; I can't even get to that part yet. The webpage uses a different character set, "windows-1252". But even after setting the reader to use that charset (which exists on the system), I still get a ChangedCharSetException.


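import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// placeholder address; a real URL needs a scheme such as http://
String link = "myurl.com";

URL url = new URL(link);
URLConnection conn = url.openConnection();
Reader reader = new InputStreamReader(conn.getInputStream(), Charset.forName("windows-1252"));

EditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
// throws ChangedCharSetException here while reading
kit.read(reader, doc, 0);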

Here are the first few lines of the HTML file:
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta http-equiv="Pragma" content="no-cache">

Is there perhaps some way of reading the file but ignoring the meta data?
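
One approach that may help here, offered as a sketch rather than a confirmed fix for this particular page: HTMLEditorKit checks a document property named "IgnoreCharsetDirective" before parsing, and when it is set to Boolean.TRUE the parser skips the charset check on the Content-Type meta tag instead of throwing ChangedCharSetException. The class and method names below (IgnoreCharsetSketch, parse, textOf) are made up for illustration, and the HTML is an in-memory stand-in for the real page so the example runs without a network connection.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class IgnoreCharsetSketch {

    // Parses HTML from a reader that is assumed to decode with the correct charset already.
    public static HTMLDocument parse(Reader reader) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ask the parser not to act on the charset named in the meta tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(reader, doc, 0);
        return doc;
    }

    // Hypothetical convenience wrapper: parses in-memory HTML and returns its plain text.
    public static String textOf(String html) {
        try {
            HTMLDocument doc = parse(new StringReader(html));
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the page from the post, with the same Content-Type meta tag.
        String html = "<html><head>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">"
            + "</head><body><p>Hello</p></body></html>";
        System.out.println(textOf(html).trim());
    }
}
```

With a live page, the reader passed to parse would be the InputStreamReader from the snippet above, still constructed with Charset.forName("windows-1252") so the bytes are decoded correctly even though the directive is ignored.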