Scan web pages for attributes.


Tactical Fart (Newbie Poster, joined Nov 2009)

Nov 2nd, 2009
My current goal is to scan and parse an HTML page and get the attributes from its tags. Right now, I can take a page, scan through everything token by token and save it to a String. What I want to do is pick out certain tags and strip them down to get certain attributes. Here's an example of the kind of tag I mean (it's only an illustration and won't work as-is):
  <img class="awesome" src="http://www.thebestsiteonthefaceoftheplanet.com/image.jpg">
I know that I have to use something like:
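  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.util.Scanner;

  // yahoo is assumed to be a java.net.URL pointing at the page to read
  Scanner s = new Scanner(new BufferedReader(new InputStreamReader(yahoo.openStream())));
  String stuff;
  while (s.hasNextLine())   // hasNextLine() pairs with nextLine(); hasNext() checks for tokens
  {
      stuff = s.nextLine();
      System.out.println(stuff);
  }
  s.close();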
This reads through the source (the HTML file) line by line, saves each line to the string "stuff" and prints it.
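Because `nextLine()` hands back one line at a time, an `<img>` tag that is split across two lines of source would be missed by a line-by-line check. A minimal sketch, assuming a regular expression is good enough for this markup, lets `Scanner` search for whole tags instead with `findWithinHorizon` (the class `ImgTagFinder`, the method `findImgTags` and the sample HTML are hypothetical; for a live page you would pass in the `Scanner` built from `yahoo.openStream()` in the original code). One known limitation: `[^>]*` stops at the first `>`, so a `>` inside an attribute value would cut the tag short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class ImgTagFinder {
    // Matches an <img ...> start tag. [^>]* also matches newlines, so a tag
    // broken across several lines of source is still found as one match.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the raw text of every <img> tag the Scanner reads, in order.
    static List<String> findImgTags(Scanner s) {
        List<String> tags = new ArrayList<>();
        String tag;
        // A horizon of 0 means "search as far ahead in the input as needed";
        // findWithinHorizon returns null once no further match exists.
        while ((tag = s.findWithinHorizon(IMG_TAG, 0)) != null) {
            tags.add(tag);
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical sample page: the first tag spans two lines.
        String html = "<p>Hi</p>\n<img class=\"awesome\"\n     src=\"http://example.com/a.jpg\">\n<IMG src=\"b.png\">";
        for (String t : findImgTags(new Scanner(new StringReader(html)))) {
            System.out.println(t);
        }
    }
}
```

Each string this returns is a complete tag like the one in the question, ready to have its attributes pulled apart.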

Now I need to find a way to pick out a target tag and get certain attributes from it. I want to detect when the scanner reaches an img tag, harvest the attributes inside it and separate them into different strings. Using the tag above as an example, I want to find an <img> tag whose class attribute is "awesome" and get its src attribute.
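Once a single tag is in hand, its attributes can be split into separate strings. A minimal sketch, assuming attributes follow the usual `name="value"`, `name='value'` or `name=value` forms (the class `ImgAttributes` and the methods `attributes` and `srcIfClass` are hypothetical names), puts them into a map and then returns `src` only when the class list contains the wanted class:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgAttributes {
    // name = "value" | name = 'value' | name = value (unquoted)
    private static final Pattern ATTR = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    // Splits one start tag such as <img class="awesome" src="..."> into a map
    // of lower-cased attribute names to their values (first occurrence wins).
    static Map<String, String> attributes(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(tag);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3)
                         : m.group(4);
            attrs.putIfAbsent(m.group(1).toLowerCase(), value);
        }
        return attrs;
    }

    // Returns the tag's src if its class list contains wantedClass, else null.
    // class can hold several space-separated names, e.g. class="big awesome".
    static String srcIfClass(String tag, String wantedClass) {
        Map<String, String> attrs = attributes(tag);
        String cls = attrs.get("class");
        if (cls == null) return null;
        for (String c : cls.trim().split("\\s+")) {
            if (c.equals(wantedClass)) return attrs.get("src");
        }
        return null;
    }

    public static void main(String[] args) {
        String tag = "<img class=\"awesome\" src=\"http://www.thebestsiteonthefaceoftheplanet.com/image.jpg\">";
        System.out.println(attributes(tag));
        System.out.println(srcIfClass(tag, "awesome"));
    }
}
```

Matching one attribute at a time tolerates any attribute order and any quoting style, which a single regex for the whole `<img>` tag would not.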

I think I'm making this more difficult than it needs to be; there might be a simpler way of doing this that I'm not seeing. Also, once this works, I need to do something a bit more complex, but I'm taking baby steps for now. Does anyone know whether I should keep going along this line of thinking, or is there a better way to do this?
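One commonly suggested "better way" is to let a real HTML parser split tags and attributes instead of hand-written regular expressions. A minimal sketch using the parser bundled with the JDK, `javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`, is below; note that it is built on an old HTML 3.2 DTD, so it may handle modern markup less gracefully, and third-party libraries such as jsoup are a popular, more forgiving alternative. The class `AwesomeImgParser` and the method `imgSources` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class AwesomeImgParser {
    // Collects the src of every <img> whose class list contains wantedClass.
    static List<String> imgSources(Reader html, String wantedClass) throws IOException {
        List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no end tag, so the parser reports it as a "simple" tag.
                if (tag != HTML.Tag.IMG) return;
                Object cls = attrs.getAttribute(HTML.Attribute.CLASS);
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (cls == null || src == null) return;
                for (String c : cls.toString().trim().split("\\s+")) {
                    if (c.equals(wantedClass)) {
                        sources.add(src.toString());
                        break;
                    }
                }
            }
        };
        // true = ignore any charset declared in the page's <meta> tag
        new ParserDelegator().parse(html, callback, true);
        return sources;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><img class=\"awesome\" src=\"http://example.com/a.jpg\">"
                + "<img class=\"plain\" src=\"b.jpg\"></body></html>";
        System.out.println(imgSources(new StringReader(page), "awesome"));
    }
}
```

For a live page you could pass `new InputStreamReader(yahoo.openStream())` as the `Reader`, assuming `yahoo` is a `java.net.URL`. The parser then deals with quoting, tags spanning lines and attribute order, leaving only the filtering logic to write.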
©2003 - 2009 DaniWeb® LLC