Scan web pages for attributes.
My current goal is to scan and parse an HTML page and get the attributes from the tags. Right now, I can take a page, scan everything token by token, and save the tokens to a String. What I want to do is pick out certain tags and strip them down to get certain attributes. Here's an example (the following code does not work):

```html
<img class="awesome" src="http://www.thebestsiteonthefaceoftheplanet.com/image.jpg">
```

I know that I have to use something like:

```java
// s (a java.util.Scanner) and yahoo (a java.net.URL) are declared earlier.
s = new Scanner(new BufferedReader(new InputStreamReader(yahoo.openStream())));
String stuff;
while (s.hasNextLine()) {
    stuff = s.nextLine(); // one whole line per iteration
    System.out.println(stuff);
}
```

This will barrel through the source (the HTML file) line by line, since nextLine() returns a whole line rather than a single token, save each line to the string "stuff", and print it.

Now I need to find a way to pick out a target tag and get certain attributes. I want to detect when an img tag is run over and harvest the various attributes within, separating them into different strings. Using the tag demonstrated above, I want to find an <img> tag with the class attribute "awesome" and get the src attribute.

I think I'm making this more difficult than it needs to be. There might be a simpler way of doing this and I'm not seeing it. Also, once I'm done, I need to do something else that's a bit more complex, but I'm taking baby steps right now. Does anyone know whether I should keep on this line of thinking, or is there a better way to do this?
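For the img question above, here is a minimal sketch of one possible approach using only java.util.regex. It finds each <img> tag in a string of HTML, reads its attributes, and keeps the src of tags whose class list contains a given name. The class name ImgSrcFinder and the methods attribute, hasClass, and findSrc are hypothetical names made up for this illustration. A regex like this is fragile: it assumes quoted attribute values and no ">" inside a value, so a real HTML parser would likely be more robust on arbitrary pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original post.
class ImgSrcFinder {
    // One whole <img ...> tag, matched case-insensitively.
    // Assumption: no '>' character appears inside an attribute value.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // name="value" or name='value' pairs; unquoted values are not handled.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns the value of the named attribute in one tag, or null if absent.
    static String attribute(String tag, String name) {
        Matcher m = ATTRIBUTE.matcher(tag);
        while (m.find()) {
            if (m.group(1).equalsIgnoreCase(name)) {
                return m.group(3) != null ? m.group(3) : m.group(4);
            }
        }
        return null;
    }

    // True if the space-separated class list contains the wanted class.
    static boolean hasClass(String classList, String wanted) {
        return Arrays.asList(classList.trim().split("\\s+")).contains(wanted);
    }

    // Collects the src of every <img> whose class list contains cssClass.
    static List<String> findSrc(String html, String cssClass) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String cls = attribute(tag, "class");
            String src = attribute(tag, "src");
            if (cls != null && src != null && hasClass(cls, cssClass)) {
                result.add(src);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>hi</p>"
                + "<img class=\"awesome\" src=\"http://www.thebestsiteonthefaceoftheplanet.com/image.jpg\">"
                + "<img class=\"boring\" src=\"other.jpg\">";
        // prints [http://www.thebestsiteonthefaceoftheplanet.com/image.jpg]
        System.out.println(findSrc(html, "awesome"));
    }
}
```

To use it with the Scanner loop, the lines would first need to be appended to a single String (for example with a StringBuilder), because a tag can span more than one line; that String is then passed to findSrc.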





