Parsing a String
Hi Everybody!
I'm trying to create my own web accelerator/browser. When you open a page, it will take all the links on that page and preload them. I just have one question: how do you retrieve the HTML source code of a page, and how do you parse that huge string to find everything inside the quotes of an <a> tag?
Just so you know, to make a link in HTML, you use the following code: <a href="WHAT I WANT TO PARSE">WHAT TEXT WILL BE DISPLAYED ON THE PAGE</a>
Thanks for all your help.
You will need to get the complete HTML data anyway, or else you can't render it.
If the data is properly formatted XHTML, it's as easy as 1-2-3: just create a DOM parser and look for all "a" tags, then take the href attributes from those.
If it's not properly formatted XHTML, you're out of luck and will basically have to write something to do that yourself (handling all the possible malformed variants, such as uppercase tags and mixtures of upper and lowercase).
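As a rough illustration of the DOM approach for well-formed XHTML, here is a minimal sketch using the JDK's built-in `javax.xml.parsers` API. The class name `XhtmlLinks` and method name `extractHrefs` are made up for this example. It assumes the input is a well-formed XML string; a page with a DOCTYPE may cause the parser to try to download the DTD, and ordinary sloppy HTML will simply fail to parse.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for illustration only.
class XhtmlLinks {
    // Parses a well-formed XHTML string and returns the href of every <a> element that has one.
    static List<String> extractHrefs(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                if (anchor.hasAttribute("href")) {
                    hrefs.add(anchor.getAttribute("href"));
                }
            }
            return hrefs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Could not parse input as XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"one.html\">One</a> "
                + "<a href=\"two.html\">Two</a></body></html>";
        System.out.println(extractHrefs(page));
    }
}
```

Because XML is case-sensitive, this only finds lowercase `a` elements, which is what XHTML requires anyway.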

As people are clearly allowed to attack me but I'm not allowed to defend myself, I no longer post to this site.
Well, I actually just wrote a parser for finding links inside HTML the other day at work.
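    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Untested sketch: returns the href of the last <a ...> tag found, or null if there is none.
    public static String addTarget(String staticDetail) {
        String returnUrl = null;
        Pattern pattern = Pattern.compile("<+");
        Matcher matcher = pattern.matcher(staticDetail);
        while (matcher.find()) {
            int lessIndex = matcher.start();
            int greatIndex = staticDetail.indexOf(">", lessIndex + 1);
            int aIndex = staticDetail.indexOf("a", lessIndex + 1);
            int hrefIndex = staticDetail.indexOf("href", aIndex + 1);
            if (aIndex != -1 && hrefIndex != -1) {
                // Only accept "a" and "href" that occur before the tag's closing ">".
                if (aIndex < greatIndex && hrefIndex < greatIndex) {
                    int firstQuoteIndex = staticDetail.indexOf("\"", hrefIndex + 1);
                    int secondQuoteIndex = staticDetail.indexOf("\"", firstQuoteIndex + 1);
                    if (firstQuoteIndex != -1 && secondQuoteIndex != -1) {
                        // Skip the opening quote so only the URL itself is kept.
                        returnUrl = staticDetail.substring(firstQuoteIndex + 1, secondQuoteIndex);
                    }
                }
            }
        }
        return returnUrl;
    }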
Now, I re-did some of the code above to fit your needs better and I didn't test it out.
Regards,
Nate
Thanks, Nate! Does that return all of the links or just one of them? Also, how do you retrieve HTML from a web page?
Thanks
One more thing: I made a web browser (the code is below) but it doesn't work well on some sites. For example, it won't connect to GMail. If you have any suggestions I would appreciate them.
WEB BROWSER CODE:
    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import javax.swing.event.*;
    import java.io.*;

    public class Main extends JFrame {
        private JTextField enterField;
        private JButton goToURL; // JButton rather than java.awt.Button, to avoid mixing AWT and Swing widgets
        private JEditorPane contentsArea;
        private JPanel top;

        public Main() {
            super("Alpha Browser");
            setSize(500, 400);
            setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            Container container = getContentPane();

            // Address bar and "Visit" button along the top.
            top = new JPanel();
            enterField = new JTextField(40);
            goToURL = new JButton("Visit");
            goToURL.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent event) {
                    loadPage(enterField.getText());
                }
            });
            top.add(enterField);
            top.add(goToURL);
            container.add(top, BorderLayout.NORTH);

            // Read-only page view that follows clicked hyperlinks.
            contentsArea = new JEditorPane();
            contentsArea.setEditable(false);
            contentsArea.addHyperlinkListener(new HyperlinkListener() {
                public void hyperlinkUpdate(HyperlinkEvent event) {
                    if (event.getEventType() == HyperlinkEvent.EventType.ACTIVATED)
                        loadPage(event.getURL().toString());
                }
            });
            container.add(new JScrollPane(contentsArea), BorderLayout.CENTER);

            // Show the frame only after all components have been added.
            setVisible(true);
        }

        private void loadPage(String loc) {
            try {
                contentsArea.setPage(loc);
                enterField.setText(loc);
            } catch (IOException ioException) {
                JOptionPane.showMessageDialog(null,
                        "Unable to contact URL.\n\nPossible reasons for error:\n"
                        + "1.) Server Timeout\n2.) Mis-typed URL\n3.) Internet connection error\n"
                        + ioException.toString(),
                        "Error in Contacting Given URL", JOptionPane.ERROR_MESSAGE);
            }
        }

        public static void main(String[] args) {
            new Main();
        }
    }
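On the question of retrieving the raw HTML of a page, one common approach is to open a stream on the URL and read it into a string. The sketch below is only an illustration: the class name `PageFetcher` and its method names are invented here, the text is assumed to be UTF-8 (real pages declare their own encoding), and `http://example.com/` is just a placeholder address.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class PageFetcher {
    // Reads an entire stream into a String, line by line, assuming UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Opens a connection to the given address and returns the response body as text.
    static String fetch(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass a real URL as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(fetch(address));
    }
}
```

The string this returns is the unmodified markup sent by the server, so it can be handed straight to a link parser.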
Originally Posted by Ghost
Hooknc, would you mind answering my questions? Thanks.
I actually don't know how to get the HTML. I tried about four different ways of getting it and wasn't good at it. (InputStreams really aren't my strong point.) I don't know what my problem was. I was hoping that the text area would return the HTML, but it strips out a lot of the HTML, so that isn't a good solution.
The method I posted actually needs to be reworked for your purpose. It should return a List, and where returnUrl gets set, that URL should be added to the list instead.
Regards,
Nate
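A List-returning version along the lines Nate describes might look like the sketch below. Note that it swaps the index arithmetic of the posted method for a case-insensitive regular expression, so it is a variation rather than a direct rewrite. The names `LinkExtractor` and `extractLinks` are invented for this example, and it only handles double-quoted href values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class LinkExtractor {
    // Matches an <a ...> tag (any letter case) with a double-quoted href; group 1 is the URL.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href value of every matching anchor tag, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = ANCHOR_HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"one.html\">One</a> <A HREF=\"two.html\">Two</A></p>";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

A regular expression like this copes with mixed-case tags but is still not a real HTML parser; single-quoted or unquoted attributes, comments, and scripts can all confuse it.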
A JTextPane will use a filter to format the text. That filter will probably (I've not tried) also be applied when retrieving the text.
Try a JEditorPane instead (maybe just casting it to JEditorPane and asking for the text will be enough), or try getting the text through the model instead of directly.






