Parsing HTML

Question

sfrider0 6 Junior Poster

15 Years Ago

I've been working on an applet that gets all the links from a webpage. So far, it fetches the page source. I have found some regular expressions that are supposed to parse the source, but applying them doesn't make any change to the original source.

I also found some examples using HTMLEditorKit, but haven't had any luck with them. I don't think that works with applets. Any ideas on how to extract all of the hyperlinks from the HTML source? I have the source saved as a string. I have done this before in C#, but I'm completely stuck with Java.
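One possible reason the regular expressions appear to do nothing (the original code isn't shown, so this is only a guess): Java strings are immutable, so methods such as `replaceAll` and `split` return a new value and leave the original string untouched. Below is a minimal sketch that collects `href` values with `java.util.regex` instead of modifying the source. The class name `RegexLinks`, the method `findLinks`, and the pattern itself are illustrative assumptions, and a regex like this will miss unquoted or otherwise unusual attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLinks {
    // Assumed pattern: href="..." or href='...', case-insensitive.
    // Group 1 is the opening quote, group 2 is the link itself.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the given HTML source.
    static List<String> findLinks(String source) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(source);
        while (matcher.find()) {
            links.add(matcher.group(2));
        }
        return links;
    }
}
```

The same immutability rule applies to replacements: `source.replaceAll(...)` on its own discards the result, whereas `source = source.replaceAll(...)` keeps it.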

html-css java


All 4 Replies

kay25 0 Newbie Poster

15 Years Ago

Hi, this link lists some open-source HTML parsers:

http://java-source.net/open-source/html-parsers


masijade 1,351 Industrious Poster Team Colleague Featured Poster · Answer 1 · 2010-05-25T14:11:38+00:00

Use an HTML parser of some sort. Google for one. Adding the parser's jar to the JNLP resources list or to the applet tag's archive attribute will make it available to an applet from the server.
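One parser that needs no extra jar is the one bundled with Swing, reached through `HTMLEditorKit.ParserCallback` and `ParserDelegator`. Here is a minimal sketch, assuming the page source is already held in a `String`. `LinkExtractor` and `extractLinks` are hypothetical names, and whether this runs inside a particular applet's sandbox is not something the thread confirms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class LinkExtractor {
    // Parses the HTML source and returns the href of every <a> tag, in document order.
    static List<String> extractLinks(String html) throws IOException {
        final List<String> links = new ArrayList<>();

        // The callback is invoked by the parser for each start tag it encounters.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in the page; the source is already decoded.
            new ParserDelegator().parse(reader, callback, true);
        }
        return links;
    }
}
```

Unlike a regular expression, the parser copes with attributes in any order and with sloppy markup, which is the main reason to prefer it for real pages.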

sfrider0 6 Junior Poster · Answer 2 · 2010-05-25T22:30:04+00:00

Thanks for the link, I'll check those out. I think one of my problems is that the source is read character by character and then just put into a string. When I try to split the string, it stays as it is apart from the letters or spaces being removed, and when I do a println, it prints one char per line. Would a StringBuilder or something similar be able to create a string from these characters?

sfrider0 6 Junior Poster · Answer 3 · 2010-05-25T23:14:44+00:00


StringBuilder worked!
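For anyone landing here with the same symptom, here is a minimal sketch of what collecting the characters with a `StringBuilder` might look like. The thread doesn't show the code that was actually used, so the class name `SourceReader`, the method `readAll`, and the 4096-char buffer size are assumptions.

```java
import java.io.IOException;
import java.io.Reader;

final class SourceReader {
    // Reads everything the reader supplies and returns it as one String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096]; // assumed buffer size
        int count;
        // read() returns the number of chars placed in the buffer, or -1 at end of stream.
        while ((count = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, count);
        }
        return sb.toString();
    }
}
```

The resulting string can then be split or handed to a parser as a whole, rather than being handled one character at a time.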
