I've been working on an applet that collects all the links from a web page. So far, it fetches the page source. I have found some regular expressions that are supposed to parse the links out of the source, but applying them doesn't change the original source at all.

I also found some examples that use HTMLEditorKit, but I haven't had any luck with them, and I don't think that approach works with applets. Any ideas on how to parse all of the hyperlinks out of the HTML source? I have the source saved as a string. I have done this before in C#, but I'm completely stuck in Java.

All 4 Replies

Use an HTML parser of some sort; a web search will turn up several. Adding the parser's jar to the JNLP resource list or to the applet tag's archive attribute will make it available to the applet from the server.
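For illustration, here is a minimal sketch of the parser approach using the HTML parser that ships with the JDK (javax.swing.text.html), so no extra jar is needed. The class and method names (LinkExtractor, extractLinks) are made up for this example, and I have not verified how it behaves inside an applet sandbox, so treat it as a starting point rather than a finished solution.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href value of every <a> tag in an HTML string.
final class LinkExtractor {

    static List<String> extractLinks(String html) throws IOException {
        final List<String> links = new ArrayList<>();

        // The parser calls handleStartTag once for each opening tag it encounters.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument (true) tells the parser to ignore charset directives in the page.
            new ParserDelegator().parse(reader, callback, true);
        }
        return links;
    }
}
```

The hrefs come back exactly as written in the page, so relative links such as /about are not resolved against the page URL; that step would have to be added separately.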

Thanks for the link, I'll check those out. I think one of my problems is that when I fetch the source, I read it character by character and then store it in a string. So when I try to split the string, it stays as it is apart from losing the letters or spaces I split on, and println prints one character per line. Would a StringBuilder or something similar be able to build a single string from these characters?

StringBuilder worked!
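For anyone who finds this thread later, here is a rough sketch of what reading the source into a StringBuilder can look like. The poster's actual code was not shown, so the names here (SourceReader, readAll) are hypothetical and this is only one way to do it.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads everything from a Reader into one String.
final class SourceReader {

    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        // read() returns the number of chars placed in the buffer, or -1 at end of stream.
        while ((count = in.read(buffer)) != -1) {
            // Append only the chars that were actually read, not the whole buffer.
            sb.append(buffer, 0, count);
        }
        return sb.toString();
    }
}
```

For a web page, the Reader would typically be an InputStreamReader wrapped around the URL's input stream, with the charset chosen to match the page (UTF-8 is a common assumption, but not guaranteed). The resulting String can then be split or handed to a parser as a whole.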
