Wow, you guys have been very helpful! I appreciate each of your posts.
(Yes, the HTML page source is just marked up text)
Sorry, I left the other day before getting back to you about it. I had assumed it read straight from the source, but just wanted to make sure. Thanks for confirming that.
----This is some code I wrote for a custom crawler that gets images from a site. Just change the pattern and the code that stores the matched string (Bajador is a class that downloads the image in my case), and put everything in a loop that goes through all the URLs you wish to examine.
// Imports needed: java.io.*, java.net.URL, java.net.MalformedURLException,
// java.util.regex.*, java.util.logging.*
// Compile the pattern once, outside the loop; note the escaped dot before "jpg".
Pattern pattern = Pattern.compile("<img src=\"/(\\w*)\\.jpg\" alt=\"Picture\"/></div>");
try {
    URL url = new URL(baseURL + "/index.php?id=" + par);
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String line;
        while ((line = br.readLine()) != null) {
            Matcher matcher = pattern.matcher(line.trim());
            if (matcher.find()) {
                String nombre = matcher.group(1);                   // image name captured by (\\w*)
                String urlI = baseURL + "/file/" + nombre + ".jpg"; // full image URL (not used below)
                Bajador baj = new Bajador(baseURL, nombre, par);    // Bajador downloads the image
                baj.start();
                // baja(baseURL,nombre);
            }
        }
    }
} catch (MalformedURLException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException e) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, e);
}
Hope this helps you.
That helped me quite a bit. I read up on patterns, and while they seem a bit confusing, I understood enough to believe you are correct in saying I only need to match a-z and 0-9.
I am still a bit confused about some of your code, so if you don't mind, I'll ask my questions here to get a better understanding.
if (matchFound) {
    String nombre=matcher.group(1);
    String urlI=baseURL+"/file/"+nombre+".jpg";
    Bajador baj= new Bajador(baseURL,nombre,par);
    baj.start();
    // baja(baseURL,nombre);
}
I understand that your code is for pulling images off a website (which I might use later), and I also understand that the block that runs when a match is found is where I tell it what to do, but I'm a bit curious about your code here.
The Javadocs left me a bit confused about matcher.group. Would you mind explaining the piece of code above?
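On matcher.group, here is a minimal sketch of what the call returns. It assumes a simplified version of the crawler's pattern (the trailing </div> is dropped and the dot is escaped); the class name GroupDemo, the method imageName and the sample line are made up for illustration. Each pair of parentheses in a pattern is a capturing group: group(0) is the entire matched text and group(1) is only the part matched inside the first parentheses, which is why the crawler uses it to get the image name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Simplified form of the crawler's pattern; (\\w*) is capturing group 1.
    static final Pattern IMG =
            Pattern.compile("<img src=\"/(\\w*)\\.jpg\" alt=\"Picture\"/>");

    // Returns the text captured by group 1, or null when the line has no match.
    static String imageName(String line) {
        Matcher m = IMG.matcher(line);
        if (!m.find()) {
            return null;
        }
        // m.group(0) would be the whole <img .../> tag; m.group(1) is just the name.
        return m.group(1);
    }

    public static void main(String[] args) {
        String line = "<div><img src=\"/cat42.jpg\" alt=\"Picture\"/></div>";
        System.out.println(imageName(line)); // prints cat42
    }
}
```

So in the crawler, nombre ends up holding only the file name without the leading slash or the .jpg extension, and the code then rebuilds the full URL from it.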
Instead of Bajador, I would use a class that adds each match to an ArrayList and writes it to a file, unless there is a better option.
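The ArrayList idea could look roughly like the sketch below. It is only one way to do it, and it assumes the pattern has a single capturing group; the class name MatchCollector, its two methods and the example pattern id=([a-z0-9]+) are hypothetical and not part of the code posted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollector {
    // Collects group(1) of every match found in the given lines.
    static List<String> collect(List<String> lines, Pattern pattern) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            // while, not if: a single line may contain several matches
            while (m.find()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    // Writes one entry per line (UTF-8), replacing any existing file.
    static void save(List<String> found, Path out) throws IOException {
        Files.write(out, found);
    }
}
```

In the crawler, the lines would come from the BufferedReader loop, and save would be called once after all pages have been read, for example save(found, Paths.get("matches.txt")).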
----You could use HTML Parser, which is a Java library used to parse HTML in either a linear or nested fashion. It is an open source tool and can be found on SourceForge.
You could also use the Swing HTML parser.
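For the Swing parser, a rough sketch of pulling the src of every img tag might look like this. It uses ParserDelegator and HTMLEditorKit.ParserCallback from the JDK; the class name ImgSrcExtractor and the method imageSources are made up for the example, and the parser is lenient and fairly old, so results on messy real-world pages may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ImgSrcExtractor {
    // Returns the src attribute of every <img> tag, in document order.
    static List<String> imageSources(Reader html) throws IOException {
        final List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so it is reported as a simple tag
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        // true = ignore charset declarations found inside the document
        new ParserDelegator().parse(html, callback, true);
        return sources;
    }
}
```

The Reader can be the same InputStreamReader over url.openStream() that the regex version reads line by line, so no pattern has to be maintained when the page markup changes slightly.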
Thanks for pointing this out, this looks like it cuts out a lot of the work :] I'll check into this as well. However, I'm also looking to interact with the web page, such as by clicking buttons, and I'm not sure whether this would allow that.
----
Another question I have is whether I will be able to manipulate menus and the like with similar code. I am also looking to use the search feature on forums, so would I be able to click buttons and select things from drop-down menus?
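Reading the page source cannot click anything, but for an ordinary HTML form, clicking the submit button just makes the browser send an HTTP request containing the form's fields, and a drop-down contributes the value of its selected option under the select's name. So one possible approach is to build that request yourself. The sketch below only builds the encoded field string; the class name FormQuery and the field names "keywords" and "forum" are made up, and the real names have to be read from the form in the forum's page source. Pages that rely on JavaScript for their buttons would not work this way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormQuery {
    // Encodes form fields as name=value pairs joined by '&',
    // the format a browser sends for a standard form submission.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("keywords", "java regex"); // hypothetical text box name
        fields.put("forum", "12");            // hypothetical drop-down name and option value
        System.out.println(encode(fields));   // prints keywords=java+regex&forum=12
    }
}
```

For a form that uses GET, the resulting string goes after a "?" in the URL and the page can be read with url.openStream() as before; for POST it would be written to the output stream of an HttpURLConnection instead.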