
Keyword search in a web page

Can anyone tell me how to search for a keyword in a web page in Java? Suppose I enter a query on a Google page; the results are displayed across many pages. On the first page, I want to search for a keyword www. ...

abar_sow
Light Poster
36 posts since Jul 2007
Reputation Points: 10
Solved Threads: 0
 

I suppose you would need to first parse the hrefs out of the links, then fetch the content from each of those URLs and search it for a match. Regular expression parsing could do both, or perhaps HTMLEditorKit:
http://java.sun.com/javase/6/docs/api/javax/swing/text/html/HTMLEditorKit.html
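As a rough illustration of the regular-expression route, here is a minimal sketch that pulls href targets out of an HTML string and checks text for a keyword. The class name `HrefExtractor`, its method names, and the sample HTML are assumptions made for this example; the pattern only handles quoted href attributes, so a real page may need the parser-based approach instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // Matches href="..." or href='...' in any letter case; group 2 is the link target.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the given HTML, in document order.
    public static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // Case-insensitive substring test for the keyword.
    public static boolean containsKeyword(String text, String keyword) {
        return text.toLowerCase().contains(keyword.toLowerCase());
    }

    public static void main(String[] args) {
        // Hypothetical sample input, standing in for a downloaded page.
        String html = "<a href=\"http://www.cdc.gov/hiv/testing.htm\">HIV testing</a>"
                    + "<a HREF='page2.html'>next</a>";
        for (String link : extractHrefs(html)) {
            System.out.println(link);
        }
        System.out.println(containsKeyword(html, "testing"));
    }
}
```

Each extracted link could then be fetched in turn and passed to `containsKeyword`.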

Ezzaral
Posting Genius
Moderator
15,986 posts since May 2007
Reputation Points: 3,250
Solved Threads: 847
 

I want to download the page first so that I can search it for the keyword. Can anyone locate the problem in the following?

code:

import java.io.*;
import java.net.*;
public class page
{
public static void main(String args[]) throws IOException
{
// Download the first results page and save it to testing1.htm.
BufferedInputStream in = new BufferedInputStream(new URL("http://www.google.co.in/search?q=Testing&hl=en&start=00&sa=N").openStream());
FileOutputStream fos = new FileOutputStream("testing1.htm");
BufferedOutputStream bout = new BufferedOutputStream(fos,1024);
byte data[] = new byte[1024];
int count;
// read() returns the number of bytes read, or -1 at end of stream.
while((count = in.read(data,0,1024))>=0)
{
// Write only the bytes actually read, not the whole buffer.
bout.write(data,0,count);
}
bout.close();
in.close();
}
}

Problem is:

C:\Program Files\Java\jdk1.5.0\bin>javac page.java

C:\Program Files\Java\jdk1.5.0\bin>java page
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.google.co.in/search?q=Testing&hl=en&start=00&sa=N
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1133)
        at java.net.URL.openStream(URL.java:1007)
        at page.main(page.java:8)

C:\Program Files\Java\jdk1.5.0\bin>
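
A 403 here most likely means the server is rejecting the default Java User-Agent header rather than anything in the stream handling. The sketch below sends a browser-like header through `HttpURLConnection` and copies only the bytes actually read. The class name `PageDownloader`, its methods, and the `"Mozilla/5.0"` value are assumptions for illustration, and whether the server then accepts the request is not guaranteed.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageDownloader {
    // Copies everything from in to out, writing only the bytes actually read.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
            total += count;
        }
        return total;
    }

    // Downloads the given address to a local file.
    public static void download(String address, String fileName) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        // Assumption: a browser-like User-Agent avoids the 403; the value is a placeholder.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream in = new BufferedInputStream(conn.getInputStream());
             OutputStream out = new BufferedOutputStream(new FileOutputStream(fileName))) {
            copy(in, out);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        download("http://www.google.co.in/search?q=Testing&hl=en&start=00&sa=N", "testing1.htm");
    }
}
```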

abar_sow
 
peter_budo
Code tags enforcer
Moderator
15,436 posts since Dec 2004
Reputation Points: 2,806
Solved Threads: 902
 

I am saving that web page to a text file, so that I can avoid that error as well. Can anyone send me code to find a keyword starting with www. and ending with .doc (or .htm/.pdf) in that text file? I need to store the URL in a temporary string. For example, if the text file contains a link such as www.cdc.gov/hiv/testing.htm, I want to extract cdc.gov/hiv/ and pass it into my URL string.

abar_sow
 

no, we're not going to do your (home)work for you.
That's pretty basic functionality, anyone should be able to figure it out for themselves.

Regular expressions to find URLs are scattered all over the web if you want them.
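
For illustration only, one such expression might look like the sketch below. It finds links that start with www. and end in .doc, .htm, or .pdf, and returns the host and directory part without the www. prefix, as in the cdc.gov/hiv/ example earlier in the thread. The class name `DocLinkFinder`, the method name, and the second sample link are hypothetical, and the pattern is deliberately simple rather than a complete URL grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocLinkFinder {
    // "www." + (host and directories, ending in "/") + file name + .doc/.htm/.pdf
    // Group 1 captures the host-and-directory part without the "www." prefix.
    private static final Pattern LINK = Pattern.compile(
        "www\\.([^\\s\"'<>]+/)[^\\s\"'<>/]*\\.(?:doc|htm|pdf)\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns the host-and-directory part of every matching link in the text.
    public static List<String> findBaseUrls(String text) {
        List<String> bases = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            bases.add(m.group(1));
        }
        return bases;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, standing in for the saved page.
        String text = "See www.cdc.gov/hiv/testing.htm and www.example.org/docs/report.pdf for details.";
        for (String base : findBaseUrls(text)) {
            System.out.println(base);
        }
    }
}
```

Links without a directory separator after the host, or with other extensions, are skipped by this pattern.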

jwenting
duckman
Team Colleague
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
 

OK, thank you.

abar_sow
 
