Oct 15th, 2005

Parsing a String

Hi Everybody!

I'm trying to create my own web accelerator/browser. When you open a page, it will take all the links on that page and preload them. I have two questions: how do you retrieve the HTML source code of a page, and how do you parse that huge string to find everything inside the quotes of an <a></a> tag?

Just so you know, to make a link in HTML, you use the following code: <a href="WHAT I WANT TO PARSE">WHAT TEXT WILL BE DISPLAYED ON THE PAGE</a>

Thanks for all your help.
Reputation Points: 12
Solved Threads: 2
Posting Whiz
Ghost is offline Offline
352 posts
since Aug 2004
Oct 15th, 2005

Re: Parsing a String

You will need to get the complete HTML data anyway, or else you can't render it.

If the data is properly formatted XHTML, it's as easy as 1-2-3: just create a DOM parser, look for all "a" tags, and take the href attributes from those.
If it's not properly formatted XHTML, you're out of luck and will basically have to write something to do that yourself (and handle all the possible malformed variants, like uppercase tags and combinations of upper and lowercase).
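
A minimal sketch of the DOM approach described above, assuming the page really is well-formed XHTML. The class name XhtmlLinkFinder and the method findLinks are made up for illustration; only the standard javax.xml.parsers and org.w3c.dom classes are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is illustrative only.
class XhtmlLinkFinder {

    // Parses well-formed XHTML and returns the href value of every <a> element
    // that has one, in document order.
    static List<String> findLinks(String xhtml) {
        Document document;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            document = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // A strict XML parser rejects anything that is not well-formed.
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }

        List<String> links = new ArrayList<String>();
        NodeList anchors = document.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (anchor.hasAttribute("href")) {
                links.add(anchor.getAttribute("href"));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"http://example.com/one\">One</a>"
                + "<p><a href=\"/two\">Two</a></p>"
                + "</body></html>";
        System.out.println(findLinks(page));
    }
}
```

Note that a real page carrying a DOCTYPE may make the parser try to load the external DTD, so this sketch is probably best tried first on a small fragment like the one in main.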
Team Colleague
Reputation Points: 1658
Solved Threads: 331
duckman
jwenting is offline Offline
7,719 posts
since Nov 2004
Oct 15th, 2005

Re: Parsing a String

Well, I actually wrote a parser for finding links inside of HTML just the other day at work.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Returns the href value of the last <a ...> tag found, or null if there is none.
    public static String addTarget(String staticDetail)
    {
        String returnUrl = null;

        // Every run of '<' characters marks the start of a candidate tag.
        Pattern pattern = Pattern.compile("<+");
        Matcher matcher = pattern.matcher(staticDetail);

        while (matcher.find())
        {
            int lessIndex = matcher.start();
            int greatIndex = staticDetail.indexOf(">", lessIndex + 1);
            int aIndex = staticDetail.indexOf("a", lessIndex + 1);
            int hrefIndex = staticDetail.indexOf("href", aIndex + 1);
            if (aIndex != -1 && hrefIndex != -1)
            {
                // Both "a" and "href" must appear before the tag's closing '>'.
                if (aIndex < greatIndex && hrefIndex < greatIndex)
                {
                    int firstQuoteIndex = staticDetail.indexOf("\"", hrefIndex + 1);
                    int secondQuoteIndex = staticDetail.indexOf("\"", firstQuoteIndex + 1);
                    if (firstQuoteIndex != -1 && secondQuoteIndex != -1)
                    {
                        // Skip the opening quote so that only the URL is returned.
                        returnUrl = staticDetail.substring(firstQuoteIndex + 1, secondQuoteIndex);
                    }
                }
            }
        }
        return returnUrl;
    }

Note that I reworked some of the code above to fit your needs better, and I haven't tested it.

Regards,

Nate
Reputation Points: 11
Solved Threads: 8
Posting Whiz in Training
hooknc is offline Offline
216 posts
since Aug 2005
Oct 16th, 2005

Re: Parsing a String

Thanks, Nate! Does that return all of the links or just one of them? Also, how do you retrieve HTML from a web page?

Thanks

One more thing: I made a web browser (the code is below) but it doesn't work well on some sites. For example, it won't connect to GMail. If you have any suggestions I would appreciate them.

WEB BROWSER CODE:
    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import javax.swing.event.*;
    import java.io.*;

    public class Main extends JFrame
    {
        private JTextField enterField;
        private JButton goToURL;
        private JEditorPane contentsArea;
        private JPanel top;

        public Main()
        {
            super("Alpha Browser");
            setSize(500, 400);
            setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

            Container container = getContentPane();

            // Address bar: a text field plus a button that loads the typed URL.
            top = new JPanel();
            enterField = new JTextField(40);
            goToURL = new JButton("Visit"); // Swing JButton instead of the AWT Button
            goToURL.addActionListener(
                new ActionListener()
                {
                    public void actionPerformed(ActionEvent event)
                    {
                        loadPage(enterField.getText());
                    }
                }
            );

            top.add(enterField);
            top.add(goToURL);

            container.add(top, BorderLayout.NORTH);

            // Read-only page view; clicking a link loads the link's target.
            contentsArea = new JEditorPane();
            contentsArea.setEditable(false);
            contentsArea.addHyperlinkListener(
                new HyperlinkListener()
                {
                    public void hyperlinkUpdate(HyperlinkEvent event)
                    {
                        // getURL() can be null when the link target is not a valid URL.
                        if (event.getEventType() == HyperlinkEvent.EventType.ACTIVATED
                                && event.getURL() != null)
                            loadPage(event.getURL().toString());
                    }
                }
            );

            container.add(new JScrollPane(contentsArea), BorderLayout.CENTER);

            // Show the window only after every component has been added.
            setVisible(true);
        }

        private void loadPage(String loc)
        {
            try
            {
                contentsArea.setPage(loc);
                enterField.setText(loc);
            }
            catch (IOException ioException)
            {
                JOptionPane.showMessageDialog(null,
                    "Unable to contact URL.\n\nPossible reasons for error:\n" +
                    "1.) Server Timeout\n2.) Mis-typed URL\n3.) Internet connection error\n" + ioException.toString(),
                    "Error in Contacting Given URL",
                    JOptionPane.ERROR_MESSAGE);
            }
        }

        public static void main(String[] args)
        {
            // Build the UI on the event dispatch thread, as Swing expects.
            SwingUtilities.invokeLater(
                new Runnable()
                {
                    public void run()
                    {
                        new Main();
                    }
                }
            );
        }
    }
Reputation Points: 12
Solved Threads: 2
Posting Whiz
Ghost is offline Offline
352 posts
since Aug 2004
Oct 16th, 2005

Re: Parsing a String

Yes, the posted code will get every URL in an HTML document.

I also looked at your code and ran it on my machine (Java 1.5), and it seems to connect to Gmail just fine.

Regards,

Nate
Reputation Points: 11
Solved Threads: 8
Posting Whiz in Training
hooknc is offline Offline
216 posts
since Aug 2005
Oct 17th, 2005

Re: Parsing a String

Thanks Nate.

Two questions:
First, how do I get the HTML code into a String?
Second, what do you pass through your method and what does it return?

Thanks.
Reputation Points: 12
Solved Threads: 2
Posting Whiz
Ghost is offline Offline
352 posts
since Aug 2004
Oct 20th, 2005

Re: Parsing a String

Hooknc, would you mind answering my questions? Thanks.
Reputation Points: 12
Solved Threads: 2
Posting Whiz
Ghost is offline Offline
352 posts
since Aug 2004
Oct 21st, 2005

Re: Parsing a String

Quote originally posted by Ghost ...
Hooknc, would you mind answering my questions? Thanks.
Sure.

I actually don't know how to get the HTML. I tried about four different ways of getting it and wasn't good at any of them. (InputStreams really aren't my strong point.) I don't know what my problem was. I was hoping that the Textarea would return the HTML, but it removes A LOT of the HTML, so that isn't a good solution.
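
For the question of getting a page's HTML into a String, here is a minimal sketch using java.net.URL and a Reader. The class name PageFetcher and its method names are hypothetical, and the sketch assumes the page is encoded as UTF-8, which a real page may not be.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
class PageFetcher {

    // Reads every character from the reader into one String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    // Downloads the page at the given address.
    // Assumption: the response body is UTF-8 text.
    static String fetch(String address) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(address).openStream(), StandardCharsets.UTF_8))) {
            return readAll(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

The String returned by fetch is what would be passed to a link-finding method such as addTarget.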

The method as written actually needs to be reworked for your purpose. It should return a List, and at the point where returnUrl gets set, that URL should instead be added to the list.
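
A minimal sketch of the List-returning variant described above. It swaps the indexOf bookkeeping for a single regular expression, so it is not the original method; the class name LinkLister and the method findLinks are hypothetical. It only handles double-quoted href values and would likely be fooled by attributes whose names merely end in "href".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class LinkLister {

    // Matches an <a ...> tag with a double-quoted href, in any letter case.
    // Group 1 captures the text between the quotes.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href value of every matching <a> tag, in order of appearance.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher matcher = ANCHOR_HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"http://example.com/\">Home</a> "
                + "<A CLASS=\"nav\" HREF=\"/about\">About</A></p>";
        for (String link : findLinks(page)) {
            System.out.println(link);
        }
    }
}
```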

Regards,

Nate
Reputation Points: 11
Solved Threads: 8
Posting Whiz in Training
hooknc is offline Offline
216 posts
since Aug 2005
Oct 21st, 2005

Re: Parsing a String

A JTextPane will use a filter to format the text. That filter will probably (I've not tried) also be applied when retrieving the text.
Try a JEditorPane instead (maybe just casting it to JEditorPane and asking for the text will be enough), or try getting the text through the model instead of directly.
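
One way to go through the model, sketched under assumptions: instead of asking a text component for its text, load the HTML into Swing's own HTMLDocument with HTMLEditorKit and walk its <a> tags. Swing's HTML parser is lenient, so this should also cope with uppercase tags and other non-XHTML markup. The class name ModelLinkFinder and the method findLinks are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class; the name is illustrative only.
class ModelLinkFinder {

    // Loads the HTML into a Swing HTMLDocument and collects the href
    // attribute of every <a> tag that the document model reports.
    static List<String> findLinks(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument document = (HTMLDocument) kit.createDefaultDocument();
        // Keep the parser from aborting when the page declares its own charset.
        document.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), document, 0);

        List<String> links = new ArrayList<String>();
        for (HTMLDocument.Iterator it = document.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attributes = it.getAttributes();
            Object href = attributes.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<HTML><BODY><P>See <A HREF=\"/one\">one</A> and "
                + "<a href=\"/two\">two</a></BODY>";
        System.out.println(findLinks(page));
    }
}
```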
Team Colleague
Reputation Points: 1658
Solved Threads: 331
duckman
jwenting is offline Offline
7,719 posts
since Nov 2004
