Hi,

I have a piece of code that downloads the HTML source of a webpage as a string. From that string I want to build an array in which I can look up data the user enters. However, I need to extract only the useful information and discard the rest. How do I search for two substrings in a string and copy the text in between?
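For the core question, here is a minimal sketch in Objective-C that finds a start marker, then the next end marker after it, and copies the text in between, using only NSString's rangeOfString: methods. StringBetween is a hypothetical helper name, and it returns only the first match:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): returns the text between the
// first occurrence of startMarker and the next occurrence of endMarker after
// it, or nil if either marker is missing.
static NSString *StringBetween(NSString *source,
                               NSString *startMarker,
                               NSString *endMarker) {
    NSRange startRange = [source rangeOfString:startMarker];
    if (startRange.location == NSNotFound) {
        return nil;
    }
    // Only search for the end marker after the start marker.
    NSUInteger contentStart = NSMaxRange(startRange);
    NSRange searchRange = NSMakeRange(contentStart, source.length - contentStart);
    NSRange endRange = [source rangeOfString:endMarker options:0 range:searchRange];
    if (endRange.location == NSNotFound) {
        return nil;
    }
    return [source substringWithRange:
               NSMakeRange(contentStart, endRange.location - contentStart)];
}
```

For example, StringBetween(@"<A HREF =\"3.htm\" target=\"main\">Iris</A>", @"target=\"main\">", @"</A>") returns @"Iris". To collect every occurrence, repeat the search starting just past the previous end marker.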

My code:

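NSString *googleString = @"http://www.mypage.com";
NSURL *googleURL = [NSURL URLWithString:googleString];
NSError *error = nil;
// Assumes the page is plain ASCII; pass the page's real encoding
// (e.g. NSUTF8StringEncoding) if it contains other characters.
NSString *googlePage = [NSString stringWithContentsOfURL:googleURL
                                                encoding:NSASCIIStringEncoding
                                                   error:&error];
if (googlePage == nil) {
    NSLog(@"Could not load %@: %@", googleURL, error);
}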

googlePage then contains something like:

<HTML>
<BODY>
<TABLE style="border: Solid 1px Black; border-collapse: collapse; font-family: arial; width: 100%;">
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="1.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="1.htm" target="main">Aanen </A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">

<A HREF ="1.htm" target="main">Joeri</A>
  </td>
</tr>
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">Ali </A>
  </td>

  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">Sohail</A>
  </td>
</tr>
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">Beerthuijzen </A>

  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">Iris</A>
  </td>
</tr>

and so on...

Maybe someone has a better idea? For each row I need the first name, the surname, and the page number (for example, for the last row above: Iris, Beerthuijzen, 3).
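A minimal sketch of pulling out those three values with plain substring searches. It assumes that every table row holds exactly three links in the order shown (a code such as H4A, then the surname, then the first name), that each link is written exactly like <A HREF ="3.htm" target="main">Iris</A>, and that the page number is the href's file name without ".htm". ExtractLinks, ExtractPeople and the dictionary keys are hypothetical names for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[href, linkText] for every
// <A HREF ="...">text</A> link, in document order.
// Assumes the link markup looks exactly like the sample page.
static NSArray<NSArray<NSString *> *> *ExtractLinks(NSString *html) {
    NSMutableArray *links = [NSMutableArray array];
    NSStringCompareOptions opts = NSCaseInsensitiveSearch;
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSUInteger position = 0;
    while (position < html.length) {
        NSRange rest = NSMakeRange(position, html.length - position);
        NSRange openRange = [html rangeOfString:@"<A HREF =\"" options:opts range:rest];
        if (openRange.location == NSNotFound) break;

        // The href value runs up to the next double quote.
        NSUInteger hrefFrom = NSMaxRange(openRange);
        NSRange quoteRange = [html rangeOfString:@"\"" options:opts
                                           range:NSMakeRange(hrefFrom, html.length - hrefFrom)];
        if (quoteRange.location == NSNotFound) break;
        NSString *href = [html substringWithRange:
                             NSMakeRange(hrefFrom, quoteRange.location - hrefFrom)];

        // The link text runs from the end of the opening tag up to </A>.
        NSUInteger tagFrom = NSMaxRange(quoteRange);
        NSRange tagEnd = [html rangeOfString:@">" options:opts
                                       range:NSMakeRange(tagFrom, html.length - tagFrom)];
        if (tagEnd.location == NSNotFound) break;
        NSUInteger textFrom = NSMaxRange(tagEnd);
        NSRange closeRange = [html rangeOfString:@"</A>" options:opts
                                           range:NSMakeRange(textFrom, html.length - textFrom)];
        if (closeRange.location == NSNotFound) break;
        NSString *text = [html substringWithRange:
                             NSMakeRange(textFrom, closeRange.location - textFrom)];

        // Surnames carry a trailing space in the sample ("Aanen "), so trim.
        [links addObject:@[href, [text stringByTrimmingCharactersInSet:ws]]];
        position = NSMaxRange(closeRange);
    }
    return links;
}

// Hypothetical helper: groups the links three per table row
// (code, surname, first name) and returns one dictionary per person.
static NSArray<NSDictionary<NSString *, NSString *> *> *ExtractPeople(NSString *html) {
    NSArray *links = ExtractLinks(html);
    NSMutableArray *people = [NSMutableArray array];
    for (NSUInteger i = 0; i + 2 < links.count; i += 3) {
        NSString *surname = links[i + 1][1];
        NSString *firstName = links[i + 2][1];
        // "3.htm" -> "3"
        NSString *page = [links[i][0] stringByDeletingPathExtension];
        [people addObject:@{@"firstName": firstName,
                            @"surname": surname,
                            @"page": page}];
    }
    return people;
}
```

Calling ExtractPeople(googlePage) on the sample above should give, as its last entry, firstName "Iris", surname "Beerthuijzen" and page "3". If a row ever has a different number of links, the three-per-row grouping will drift, so it is worth checking that links.count is a multiple of three.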

Thanks in advance!

Last Post by Prabakar

HTML pages are not required to follow strict XML syntax (only XHTML is), and many web pages do not. If a page is well-formed XML, you can use an XML parser to extract the data. The HTML you posted is close but not quite well-formed: the bare NOWRAP attributes have no value, which XML does not allow, so an XML parser would probably reject it as-is. If the page is not XML-compliant, you can use a regex pattern match to extract the data instead. I used this when I first started with regex.
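A minimal sketch of the regex route using Foundation's NSRegularExpression. The pattern is an assumption fitted to the sample links in the question, and the code again assumes three links per row (code, surname, first name); PeopleFromHTML and the dictionary keys are hypothetical names:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: matches links like <A HREF ="3.htm" target="main">Iris</A>.
// Group 1 captures the page number, group 2 the link text without
// surrounding whitespace.
static NSArray<NSDictionary<NSString *, NSString *> *> *PeopleFromHTML(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"<A HREF *= *\"(\\d+)\\.htm\"[^>]*>\\s*([^<]*?)\\s*</A>"
                             options:NSRegularExpressionCaseInsensitive
                               error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return @[];
    }
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];

    NSMutableArray *people = [NSMutableArray array];
    // Each table row holds three links: code (e.g. H4A), surname, first name.
    for (NSUInteger i = 0; i + 2 < matches.count; i += 3) {
        NSTextCheckingResult *codeMatch = matches[i];
        NSTextCheckingResult *surnameMatch = matches[i + 1];
        NSTextCheckingResult *firstNameMatch = matches[i + 2];
        [people addObject:@{
            @"page": [html substringWithRange:[codeMatch rangeAtIndex:1]],
            @"surname": [html substringWithRange:[surnameMatch rangeAtIndex:2]],
            @"firstName": [html substringWithRange:[firstNameMatch rangeAtIndex:2]],
        }];
    }
    return people;
}
```

A pattern like this breaks if the page's markup changes, but for a page generated in a fixed format it is usually less code than a full parser.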

