Parsing a String


Post #1 by Ghost, Oct 15th, 2005
Hi Everybody!

I'm trying to create my own web accelerator/browser. When you open a page, it will take all the links on that page and preload them. I have two questions: how do you retrieve the HTML source code of a page, and how do you parse that huge string to find everything inside the quotes of an <a></a> tag's href attribute?
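The preload step described above can be sketched roughly as follows, assuming the links have already been extracted from the page and turned into absolute URLs. `Preloader`, the pool size of 4 and the `fetcher` function are hypothetical choices for illustration; the fetcher is passed in so the caching logic can be tried without a network connection.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class Preloader {
    // URL -> downloaded HTML; safe to read from the GUI thread while workers write
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical pool size: at most 4 pages are downloaded at the same time
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // Hypothetical page loader: takes a URL, returns that page's HTML
    private final Function<String, String> fetcher;

    Preloader(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Queues every link for background download; computeIfAbsent makes sure
    // a URL that is already cached (or listed twice) is only fetched once.
    void preload(List<String> links) {
        for (String link : links) {
            pool.submit(() -> cache.computeIfAbsent(link, fetcher));
        }
    }

    // Returns the preloaded HTML, or null if the page has not been fetched (yet).
    String cached(String link) {
        return cache.get(link);
    }

    // Lets queued downloads finish (up to 30 seconds), then stops the workers.
    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```

When the user later clicks a link, the browser could check `cached(url)` first and only go to the network on a miss.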

Just so you know, to make a link in HTML, you use the following code: <a href="WHAT I WANT TO PARSE">WHAT TEXT WILL BE DISPLAYED ON THE PAGE</a>
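As a quick first cut at the parsing half of the question, a regular expression can pull those href values out. This is a minimal sketch, assuming double-quoted href values; the class name `HrefRegex` is hypothetical, and a regex like this is easily fooled by unusual markup, so a real parser is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRegex {
    // "<a" followed by whitespace, then anything up to href="...", case-insensitive.
    // Group 1 is the URL between the double quotes. Single-quoted or unquoted
    // values are NOT handled by this sketch.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));   // every match is kept, not just the last one
        }
        return links;
    }
}
```

For example, `HrefRegex.extractHrefs("<A HREF=\"two.html\">Two</A>")` returns a list containing `two.html`; the `\s` after `<a` keeps tags such as `<abbr>` from matching.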

Thanks for all your help.
Post #2 by jwenting, Oct 15th, 2005
You will need to get the complete HTML data anyway, or else you can't render it.

If the data is properly formatted XHTML it's as easy as 1-2-3: just create a DOM parser, look for all "a" tags, then take the href attributes from those.
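For the well-formed case, a sketch using the JDK's built-in DOM parser might look like this. It assumes the page really is well-formed XML; `XhtmlLinks` is a hypothetical name. The `load-external-dtd` feature switched off below is specific to the Xerces-based parser bundled with the JDK: it stops the parser from downloading the W3C DTD named in an XHTML DOCTYPE, at the cost of rejecting DTD entities such as `&nbsp;`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class XhtmlLinks {
    // Returns the href of every <a> element; throws IllegalArgumentException
    // if the input is not well-formed XML (unclosed tags, bare '&', ...).
    public static List<String> hrefs(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JDK/Xerces-specific: do not fetch the external DTD over the network
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            // Not namespace-aware, so "a" matches even with xmlns="http://www.w3.org/1999/xhtml"
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

            NodeList anchors = doc.getElementsByTagName("a");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                if (a.hasAttribute("href")) {   // skip <a name="..."> anchors
                    links.add(a.getAttribute("href"));
                }
            }
            return links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
    }
}
```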
If it's not properly formatted XHTML you're out of luck and will basically have to write something to do that yourself, something that also copes with all the possible malformed variants (like uppercase tag names, or mixes of upper and lowercase).
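Before writing a tolerant parser from scratch, it may be worth knowing that the JDK already ships one: `javax.swing.text.html.parser.ParserDelegator`, the HTML 3.2-era parser behind Swing's `HTMLEditorKit`. It normalises tag and attribute names, so `<A HREF=...>` and `<a href=...>` arrive the same way. A sketch that collects href values through its callback (`LenientLinks` is a hypothetical name):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LenientLinks {
    public static List<String> hrefs(String html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Case differences in the source do not matter here
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {          // ignore <a name=...> anchors
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset named in a <meta> tag; we already have a String
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected when reading a String
        }
        return links;
    }
}
```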
Post #3 by hooknc, Oct 15th, 2005
Well, I actually wrote a parser for finding links inside HTML just the other day at work.

    // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    public static String addTarget(String staticDetail)
    {
        String returnUrl = null;

        // Visit every '<' that could start a tag
        Pattern pattern = Pattern.compile("<");
        Matcher matcher = pattern.matcher(staticDetail);

        while (matcher.find())
        {
            int lessIndex = matcher.start();
            int greatIndex = staticDetail.indexOf(">", lessIndex + 1);
            int aIndex = staticDetail.indexOf("a", lessIndex + 1);
            int hrefIndex = staticDetail.indexOf("href", aIndex + 1);
            if (greatIndex != -1 && aIndex != -1 && hrefIndex != -1)
            {
                // Only use "a" and "href" if both appear inside this tag
                if (aIndex < greatIndex && hrefIndex < greatIndex)
                {
                    int firstQuoteIndex = staticDetail.indexOf("\"", hrefIndex + 1);
                    int secondQuoteIndex = staticDetail.indexOf("\"", firstQuoteIndex + 1);
                    if (firstQuoteIndex != -1 && secondQuoteIndex != -1)
                    {
                        // +1 skips the opening quote; the end index is exclusive
                        returnUrl = staticDetail.substring(firstQuoteIndex + 1, secondQuoteIndex);
                    }
                }
            }
        }
        // Each match overwrites returnUrl, so only the last link found is returned
        return returnUrl;
    }

Note that I reworked some of the code above to fit your needs better, and I haven't tested it.

Regards,

Nate
Post #4 by Ghost, Oct 16th, 2005
Thanks, Nate! Does that return all of the links or just one of them? Also, how do you retrieve the HTML from a web page?

Thanks

One more thing: I made a web browser (the code is below) but it doesn't work well on some sites. For example, it won't connect to GMail. If you have any suggestions I would appreciate them.

WEB BROWSER CODE:
    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import javax.swing.event.*;
    import java.io.*;

    public class Main extends JFrame
    {
        private JTextField enterField;
        private JButton goToURL;          // Swing JButton instead of the AWT Button
        private JEditorPane contentsArea;
        private JPanel top;

        public Main()
        {
            super("Alpha Browser");
            setSize(500, 400);
            setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

            Container container = getContentPane();

            top = new JPanel();
            enterField = new JTextField(40);
            goToURL = new JButton("Visit");
            goToURL.addActionListener(new ActionListener()
            {
                public void actionPerformed(ActionEvent event)
                {
                    loadPage(enterField.getText());
                }
            });

            top.add(enterField);
            top.add(goToURL);
            container.add(top, BorderLayout.NORTH);

            // JEditorPane only understands roughly HTML 3.2 and runs no JavaScript,
            // so script-heavy sites will not work properly in it.
            contentsArea = new JEditorPane();
            contentsArea.setEditable(false);
            contentsArea.addHyperlinkListener(new HyperlinkListener()
            {
                public void hyperlinkUpdate(HyperlinkEvent event)
                {
                    // getURL() is null when the link cannot be turned into a URL
                    if (event.getEventType() == HyperlinkEvent.EventType.ACTIVATED
                            && event.getURL() != null)
                        loadPage(event.getURL().toString());
                }
            });

            container.add(new JScrollPane(contentsArea), BorderLayout.CENTER);

            // Show the frame only after all components have been added
            setVisible(true);
        }

        private void loadPage(String loc)
        {
            try
            {
                contentsArea.setPage(loc);
                enterField.setText(loc);
            }
            catch (IOException ioException)
            {
                JOptionPane.showMessageDialog(this,
                    "Unable to contact URL.\n\nPossible reasons for error:\n" +
                    "1.) Server Timeout\n2.) Mis-typed URL\n3.) Internet connection error\n" +
                    ioException.toString(),
                    "Error in Contacting Given URL",
                    JOptionPane.ERROR_MESSAGE);
            }
        }

        public static void main(String[] args)
        {
            // Build the GUI on the Swing event dispatch thread
            SwingUtilities.invokeLater(new Runnable()
            {
                public void run()
                {
                    new Main();
                }
            });
        }
    }
Post #5 by hooknc, Oct 16th, 2005
Yes, the posted code will find every URL in an HTML document (although, as written, each match overwrites returnUrl, so only the last one is returned).

I also looked at your code and ran it on my machine (Java 1.5), and it seems to connect to GMail just fine.

Regards,

Nate
Post #6 by Ghost, Oct 17th, 2005
Thanks Nate.

Two questions:
First, how do I get the HTML code into a String?
Second, what do you pass to your method and what does it return?

Thanks.
Post #7 by Ghost, Oct 20th, 2005
Hooknc, would you mind answering my questions? Thanks.
Post #8 by hooknc, Oct 21st, 2005
Originally Posted by Ghost
Hooknc, would you mind answering my questions? Thanks.
Sure.

I actually don't know how to get the HTML. I tried about four different ways of getting it and wasn't good at any of them. (InputStreams really aren't my strong point.) I don't know what my problem was. I was hoping that the text area would return the HTML, but it strips out A LOT of the markup, so that isn't a good solution.
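For reference, one common way to read a page's raw HTML into a String is to open the URL's stream and read it to the end. This sketch assumes the page is UTF-8 (a more careful version would take the charset from the `Content-Type` header); `PageFetcher` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Downloads the raw HTML of a page (assumes UTF-8 encoding).
    public static String fetchHtml(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    // Reads an entire stream into a String, keeping every character (line breaks too).
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder html = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            html.append(buffer, 0, n);
        }
        return html.toString();
    }
}
```

Usage: `String html = PageFetcher.fetchHtml("http://www.example.com/");` gives the untouched source, which can then be handed to any of the link-finding code in this thread.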

The method I posted actually needs to be reworked for your purpose. It should return a List, and wherever returnUrl gets set, that URL should be added to the list instead.
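Along the lines Nate describes, a reworked scanner can collect every link into a List instead of keeping one returnUrl. This is a sketch, not Nate's own code: `LinkFinder` and `findLinks` are hypothetical names, and like the original it assumes lowercase `a`/`href` and double-quoted values.

```java
import java.util.ArrayList;
import java.util.List;

class LinkFinder {
    public static List<String> findLinks(String staticDetail) {
        List<String> urls = new ArrayList<>();
        int lessIndex = staticDetail.indexOf('<');
        while (lessIndex != -1) {
            int greatIndex = staticDetail.indexOf('>', lessIndex + 1);
            if (greatIndex == -1) {
                break; // unterminated tag at the end of the text
            }
            // Text between '<' and '>', e.g.  a href="page.html"
            String tag = staticDetail.substring(lessIndex + 1, greatIndex);
            boolean isAnchor = tag.length() > 1 && tag.charAt(0) == 'a'
                    && Character.isWhitespace(tag.charAt(1));
            int hrefIndex = tag.indexOf("href");
            if (isAnchor && hrefIndex != -1) {
                int firstQuote = tag.indexOf('"', hrefIndex);
                int secondQuote = firstQuote == -1 ? -1 : tag.indexOf('"', firstQuote + 1);
                if (secondQuote != -1) {
                    // Add the URL instead of overwriting a single returnUrl
                    urls.add(tag.substring(firstQuote + 1, secondQuote));
                }
            }
            lessIndex = staticDetail.indexOf('<', greatIndex + 1);
        }
        return urls;
    }
}
```

So the method takes the page's HTML as a String and returns a List<String> of the href values, in document order.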

Regards,

Nate
Post #9 by jwenting, Oct 21st, 2005
A JTextPane will use a filter to format the text. That filter will probably (I've not tried) also be applied when retrieving the text.
Try a JEditorPane instead (maybe just casting it to JEditorPane and asking for the text will be enough), or try getting the text through the model instead of directly.
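Getting the links through the model might look like this sketch, assuming the pane is displaying HTML so that its document is an `HTMLDocument`. `HTMLDocument.getIterator(HTML.Tag.A)` walks the runs of text that belong to anchors; `ModelLinks` and its `parse` helper are hypothetical, and the helper only exists so the code can be tried without opening a window.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class ModelLinks {
    // Collects the href of every anchor stored in an HTMLDocument model.
    public static List<String> hrefs(HTMLDocument doc) {
        List<String> links = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();   // the <a> tag's attributes
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
        return links;
    }

    // Builds an HTMLDocument from a String without creating any visible component.
    public static HTMLDocument parse(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }
}
```

In a browser like the one in this thread, `ModelLinks.hrefs((HTMLDocument) contentsArea.getDocument())` is only meaningful once the page has finished loading, because `setPage` loads HTML asynchronously by default. The values come back exactly as written in the page, so relative links still need resolving against the page URL.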

©2003 - 2009 DaniWeb® LLC