Hi,

I have a piece of code that fetches a webpage's HTML source as a string. From it I want to build an array in which I can look up data the user enters. To do that, I need to extract the useful information and discard the rest. How can I search a string for two substrings and copy the text between them?
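Here is a minimal sketch of the "find two substrings and copy what lies between them" step. It is written in plain C, which is also valid Objective-C because Objective-C is a strict superset of C. The name `extract_between` and its signature are hypothetical and chosen only for illustration. The function uses `strstr` to find the opening delimiter, then looks for the closing delimiter after it, and copies the bytes in between. In Cocoa, the same idea maps onto `NSString`'s `rangeOfString:options:range:` and `substringWithRange:`. To run the C version on a downloaded page, `[googlePage UTF8String]` gives you a C string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: finds the first `open` in `src`, then the next `close`
 * after it, and copies the text in between into `out` (NUL-terminated).
 * Returns a pointer just past `close` so the caller can continue scanning,
 * or NULL if a delimiter is missing or `out` is too small. */
static const char *extract_between(const char *src, const char *open,
                                   const char *close, char *out, size_t outsz)
{
    assert(src != NULL && open != NULL && close != NULL && out != NULL);

    const char *start = strstr(src, open);
    if (start == NULL)
        return NULL;
    start += strlen(open);              /* skip past the opening delimiter */

    const char *end = strstr(start, close);
    if (end == NULL)
        return NULL;

    size_t len = (size_t)(end - start);
    if (len + 1 > outsz)                /* need room for the terminating NUL */
        return NULL;

    memcpy(out, start, len);
    out[len] = '\0';
    return end + strlen(close);
}
```

On the sample row `<A HREF ="3.htm" target="main">Iris</A>`, calling it with `"\">"` and `"</A>"` as the delimiters copies `Iris`. The function returns a pointer just past the closing delimiter, so passing that pointer back in moves on to the next match.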

My code:

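NSString *googleString = @"http://www.mypage.com";
NSURL *googleURL = [NSURL URLWithString:googleString];
NSError *error = nil;
// Assumes the page is plain ASCII; pass the page's real encoding otherwise.
NSString *googlePage = [NSString stringWithContentsOfURL:googleURL
                                                encoding:NSASCIIStringEncoding
                                                   error:&error];
if (googlePage == nil) {
    // The download or decoding failed; error describes why.
    NSLog(@"Could not load %@: %@", googleURL, error);
}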

This returns something like:

<HTML>
<BODY>
<TABLE style="border: Solid 1px Black; border-collapse: collapse; font-family: arial; width: 100%;">
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="1.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="1.htm" target="main">Aanen </A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">

<A HREF ="1.htm" target="main">Joeri</A>
  </td>
</tr>
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">Ali </A>
  </td>

  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">Sohail</A>
  </td>
</tr>
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">Beerthuijzen </A>

  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">Iris</A>
  </td>
</tr>

and so on...

Maybe someone has a better idea? For each row I need three strings: the first name, the surname, and the page number (for the last row above: Iris, Beerthuijzen, 3).
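The sketch below applies the same `strstr` technique to whole rows, again in plain C so it can be dropped into an Objective-C project. It rests on several assumptions taken only from the sample above:

- each row is a lowercase `<tr>...</tr>`;
- each cell holds one `<A HREF ="N.htm" ...>text</A>` link;
- the columns are ordered code (such as H4A), surname, first name.

The names `struct person`, `read_link`, `parse_people` and `FIELD_MAX` are hypothetical. Surrounding whitespace is trimmed, because the sample has trailing spaces such as `Aanen `.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_MAX 64

/* Hypothetical record for one table row of the sample page. */
struct person {
    char first[FIELD_MAX];
    char last[FIELD_MAX];
    int  page;
};

/* Reads one <A HREF ="N.htm" ...>text</A> link starting at `a`.
 * Stores N in *page and the trimmed link text in `out`.
 * Returns a pointer past "</A>", or NULL if the link is malformed. */
static const char *read_link(const char *a, int *page, char *out, size_t outsz)
{
    const char *href = strstr(a, "HREF =\"");
    if (href == NULL)
        return NULL;
    *page = (int)strtol(href + strlen("HREF =\""), NULL, 10); /* "3.htm" -> 3 */

    const char *text = strchr(href, '>');   /* end of the opening tag */
    if (text == NULL)
        return NULL;
    text++;
    const char *close = strstr(text, "</A>");
    if (close == NULL)
        return NULL;

    const char *end = close;                /* trim spaces: "Aanen " -> "Aanen" */
    while (text < end && isspace((unsigned char)*text))
        text++;
    while (end > text && isspace((unsigned char)end[-1]))
        end--;

    size_t len = (size_t)(end - text);
    if (len + 1 > outsz)
        return NULL;
    memcpy(out, text, len);
    out[len] = '\0';
    return close + strlen("</A>");
}

/* Walks each <tr>...</tr> row, assuming the sample's column order
 * (code such as H4A, surname, first name). Fills up to `max` records
 * and returns how many were stored. */
static size_t parse_people(const char *html, struct person *out, size_t max)
{
    assert(html != NULL && out != NULL);
    size_t n = 0;
    const char *row = html;

    while (n < max && (row = strstr(row, "<tr>")) != NULL) {
        const char *row_end = strstr(row, "</tr>");
        if (row_end == NULL)
            break;

        char cells[3][FIELD_MAX];
        int pages[3];
        int count = 0;
        const char *p = row;
        const char *a;
        while (count < 3 && (a = strstr(p, "<A ")) != NULL && a < row_end) {
            p = read_link(a, &pages[count], cells[count], FIELD_MAX);
            if (p == NULL)
                return n;                   /* give up on malformed markup */
            count++;
        }

        if (count == 3) {
            strcpy(out[n].last, cells[1]);
            strcpy(out[n].first, cells[2]);
            out[n].page = pages[2];
            n++;
        }
        row = row_end + strlen("</tr>");
    }
    return n;
}
```

Grouping by `<tr>` instead of simply counting links in threes means a row with a missing cell is skipped, rather than shifting every later name into the wrong field.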

Thanks in advance!

HTML is not required to follow strict XML syntax (only XHTML is), and many web pages don't. If a page is well-formed XML, you can use an XML parser to extract the data. The HTML you posted is not well-formed XML, though. The bare NOWRAP attribute has no value, and XML requires every attribute to have one, so a strict XML parser would reject it. For a page like that, you may have to use regular-expression pattern matching instead. I used this when I first started with regex.
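Below is a rough sketch of the regex route, using the POSIX `<regex.h>` functions `regcomp`, `regexec` and `regfree` (part of POSIX, so available on macOS). In Cocoa code, Foundation's `NSRegularExpression` would be the higher-level alternative. The pattern assumes links are written exactly as in the sample: uppercase `A`, a space before `=`, and an `N.htm` target. The names `find_links`, `LINK_PATTERN` and `TEXT_MAX` are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define TEXT_MAX 64

/* POSIX extended regex for one link: group 1 = page number, group 2 = link text. */
static const char LINK_PATTERN[] =
    "<A HREF =\"([0-9]+)\\.htm\"[^>]*>([^<]*)</A>";

/* Stores the page number and link text of up to `max` links found in `html`.
 * Returns the number of links stored, or -1 if the pattern fails to compile. */
static int find_links(const char *html, int *pages, char texts[][TEXT_MAX], int max)
{
    assert(html != NULL && pages != NULL && texts != NULL);
    regex_t re;
    regmatch_t m[3];                        /* whole match + 2 groups */
    int n = 0;

    if (regcomp(&re, LINK_PATTERN, REG_EXTENDED) != 0)
        return -1;

    const char *p = html;
    while (n < max && regexec(&re, p, 3, m, 0) == 0) {
        pages[n] = (int)strtol(p + m[1].rm_so, NULL, 10);

        const char *text = p + m[2].rm_so;
        size_t len = (size_t)(m[2].rm_eo - m[2].rm_so);
        while (len > 0 && text[len - 1] == ' ')
            len--;                          /* drop trailing blanks: "Ali " -> "Ali" */
        if (len > TEXT_MAX - 1)
            len = TEXT_MAX - 1;             /* truncate very long text */
        memcpy(texts[n], text, len);
        texts[n][len] = '\0';

        n++;
        p += m[0].rm_eo;                    /* resume after this match */
    }
    regfree(&re);
    return n;
}
```

With the sample's layout, the links come back in column order, so every third link text is a first name. A pattern this specific breaks as soon as the markup changes, which is why a real parser is preferable when the page allows it.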
