
If you ask me, "Search Results" would be a better String.

In any case, you can always do this idiocy:

char quote = '"';
String search = "<h3 class=" + quote + "r" + quote + ">";

or

char[] text = { '<', 'h', '3', ' ', 'c', 'l', 'a', 's', 's', '=', '"', 'r', '"', '>' };
String search = new String(text);

but in any case, the simple

String search = "<h3 class=\"r\">";

should work without any problem.
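To make the comparison concrete, here is a minimal sketch showing that the concatenated form and the escaped form produce the same String, so indexOf behaves identically with either. The class name QuoteEscapeDemo and the sample page text are made up for illustration.

```java
class QuoteEscapeDemo {
    // Builds the search string by concatenating a quote character.
    static String viaConcat() {
        char quote = '"';
        return "<h3 class=" + quote + "r" + quote + ">";
    }

    // Builds the same search string with escaped quotes in the literal.
    static String viaEscape() {
        return "<h3 class=\"r\">";
    }

    public static void main(String[] args) {
        // Hypothetical page fragment, not taken from any real site.
        String page = "<div><h3 class=\"r\">Result</h3></div>";

        // Both forms are equal, so either one finds the same position.
        System.out.println(viaConcat().equals(viaEscape()));
        System.out.println(page.indexOf(viaEscape()));
    }
}
```

If the escaped form finds a match somewhere unexpected, the likelier cause is that the same tag appears more than once in the page, not the escaping itself.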

Edited by masijade: Oops added a three

0

String search = "<h3 class=\"r\">"; didn't work. When I printed the index it was found at, it was around the 35k-character mark, not the 20k mark, and when I printed substring(search,logpuller.length()), none of the results were in it.

I shall try the other methods.

Thanks,
-Austin

0

aanders,
A couple of weeks ago I was hired to do some data mining, and the mechanism at the core of my task was exactly what you are trying to accomplish.

Here is what I did; I believe it will help you, too.

I accessed remote files with Java's URL, BufferedReader, InputStream, and InputStreamReader classes and read the HTML source one line at a time. Every time I searched for a substring inside a line, the result was either -1 or something way less than 35k, because a single line of HTML is rarely that long.
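A rough sketch of that line-by-line approach, not the code from the actual project: the names LineSearch, findInLines, and open are invented here, and UTF-8 is assumed as the page encoding. The search takes any Reader so it can be tried on a String before pointing it at a real URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineSearch {
    // Returns "lineNumber:column" for every line that contains needle.
    // Line numbers start at 1; columns are the 0-based indexOf result.
    static List<String> findInLines(Reader source, String needle) {
        List<String> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                int index = line.indexOf(needle);
                if (index != -1) {
                    hits.add(lineNumber + ":" + index);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    // Opens a remote page as a Reader (assumes the page is UTF-8).
    static Reader open(String address) {
        try {
            URL url = URI.create(address).toURL();
            return new InputStreamReader(url.openStream(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Something like LineSearch.findInLines(LineSearch.open(address), "<h3 class=\"r\">") would then report each match relative to its own line, which keeps the indexes small.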

Additionally, since I had to do the same for various sites, I would first analyze the source code of the pages I was interested in and look for ways to reduce the part of each page that had to be parsed. That could help you too. The simplest example would be reading only the source between <body and </body instead of the entire page.
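As an illustration of narrowing the text before searching, here is a small sketch that keeps only the part between <body and </body. BodySlice and bodyOf are hypothetical names, and the sketch assumes the tags are written in lowercase, which real pages do not guarantee.

```java
class BodySlice {
    // Returns the text from "<body" up to (not including) "</body".
    // Falls back to the whole input when no opening tag is found,
    // and to the end of the input when no closing tag is found.
    static String bodyOf(String html) {
        int start = html.indexOf("<body");
        if (start == -1) {
            return html;
        }
        int end = html.indexOf("</body", start);
        if (end == -1) {
            end = html.length();
        }
        return html.substring(start, end);
    }
}
```

Any index found inside the returned slice is relative to the slice, not to the original page, so add the offset of <body back if the original position matters.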

One more point: when I looked for the index of a string, say <hr class=, and needed to extract whatever came after it, I would always add the length of the searched-for string to the returned index so that I did not extract the marker itself.

I believe you know this, but as a reminder, string.indexOf("<hr class=") returns the position of the < character, lest you forget.
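The offset arithmetic can be sketched like this. AfterMarker and between are made-up names, and the terminator parameter is an assumption added here so the example has somewhere to stop.

```java
class AfterMarker {
    // Returns the text that follows marker, up to the next terminator,
    // or null when either one is missing.
    static String between(String text, String marker, String terminator) {
        int markerAt = text.indexOf(marker);   // position of the marker's first character
        if (markerAt == -1) {
            return null;
        }
        // Skip past the marker so it is not part of the extracted value.
        int valueStart = markerAt + marker.length();
        int valueEnd = text.indexOf(terminator, valueStart);
        if (valueEnd == -1) {
            return null;
        }
        return text.substring(valueStart, valueEnd);
    }
}
```

For example, between("<hr class=\"sep\">", "<hr class=\"", "\"") skips the 11 characters of the marker and returns only the attribute value.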

I hope this was of some help.

0

Hey guys, I got it to work, and I can parse pretty much everything PERFECTLY. There is one small glitch I am still working on, as I do not get why it is reacting the way it is, but so far the search is good.

Thanks!
-Austin
