Scan web pages for attributes.
My current goal is to scan and parse an HTML page and get the attributes from its tags. Right now, I can take a page, scan everything token by token, and save each token to a String. What I want to do is pick out certain tags and strip them down to certain attributes. Here's an example of the kind of tag I mean (this is just markup, not working code):

```html
<img class="awesome" src="http://www.thebestsiteonthefaceoftheplanet.com/image.jpg">
```

I know that I have to use something like:

```java
// Needs java.util.Scanner, java.io.BufferedReader and java.io.InputStreamReader.
// "yahoo" is a java.net.URL pointing at the page to scan.
Scanner s = new Scanner(new BufferedReader(new InputStreamReader(yahoo.openStream())));
String stuff;
while (s.hasNextLine()) {
    stuff = s.nextLine();
    System.out.println(stuff);
}
```

This will barrel through the source (the HTML file) one line at a time, save each line to the String "stuff", and print it.
Now I need a way to pick out a target tag and get certain attributes from it. I want to detect when the scan runs over an img tag and harvest the attributes inside it, separating them into different Strings. Using the tag shown above, I want to find an <img> tag with the class attribute "awesome" and get its src attribute.
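A rough sketch of what that could look like with plain regular expressions, assuming the whole page has already been read into one String (the Scanner loop would need to append each line to a buffer first). The class name `ImgSrcFinder` and the method `findSrc` are made up for illustration. It only handles quoted attribute values, compares the class attribute as an exact string, and will trip over a `>` inside an attribute value, so treat it as a starting point rather than a robust solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class ImgSrcFinder {
    // Matches one <img ...> tag; group 1 is the text between "<img" and ">".
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" or name='value'; group 3 or 4 holds the value.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns the src of every <img> whose class attribute equals wantedClass.
    static List<String> findSrc(String html, String wantedClass) {
        List<String> result = new ArrayList<>();
        Matcher tag = IMG_TAG.matcher(html);
        while (tag.find()) {
            String cls = null;
            String src = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String name = attr.group(1);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (name.equalsIgnoreCase("class")) {
                    cls = value;
                } else if (name.equalsIgnoreCase("src")) {
                    src = value;
                }
            }
            if (wantedClass.equals(cls) && src != null) {
                result.add(src);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>hi</p><img class=\"awesome\" "
                + "src=\"http://www.thebestsiteonthefaceoftheplanet.com/image.jpg\">";
        // prints [http://www.thebestsiteonthefaceoftheplanet.com/image.jpg]
        System.out.println(findSrc(html, "awesome"));
    }
}
```

Working on the whole page as one String matters here because a tag can be split across lines, and a line-by-line match would miss it.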
I think I'm making this more difficult than it needs to be. There might be a simpler way of doing this that I'm not seeing. Once this works, I need to do something a bit more complex, but I'm taking baby steps right now. Does anyone know whether I should keep going down this path, or is there a better way to do this?
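One possibly simpler route is to let an HTML parser do the tokenizing instead of scanning the text by hand. The sketch below uses the parser that ships with the JDK in `javax.swing.text.html`. The class name `ImgSrcParser` and the method `findSrc` are invented for this example, and the exact-match comparison on the class attribute is an assumption about what "class awesome" should mean. This parser is old and built around HTML 3.2, so it may behave oddly on modern pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser.
class ImgSrcParser {
    // Returns the src of every <img> whose class attribute equals wantedClass.
    static List<String> findSrc(Reader page, final String wantedClass) {
        final List<String> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // <img> has no closing tag, so the parser reports it as a simple tag.
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.IMG) {
                    return;
                }
                Object cls = attrs.getAttribute(HTML.Attribute.CLASS);
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (wantedClass.equals(cls) && src != null) {
                    result.add(src.toString());
                }
            }
        };
        try {
            // true = ignore charset declarations found inside the page
            new ParserDelegator().parse(page, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><img class=\"awesome\" "
                + "src=\"http://www.thebestsiteonthefaceoftheplanet.com/image.jpg\">"
                + "</body></html>";
        System.out.println(findSrc(new StringReader(html), "awesome"));
    }
}
```

Since `findSrc` takes a `Reader`, the `InputStreamReader` wrapped around `yahoo.openStream()` could be passed in directly instead of the `StringReader` used in `main`.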