
Objective-C: Find and copy substrings from NSString

hiddepolen

Hi,

I have a piece of code that fetches the HTML source of a webpage as a string. From that source I want to build an array in which I can look up data the user enters. To do that, I need to extract the useful information from the source and discard the rest. How do I search for two substrings in a string and copy the text in between?

My code:

// Download the page source synchronously (this blocks the calling thread).
NSString *googleString = @"http://www.mypage.com";
NSURL *googleURL = [NSURL URLWithString:googleString];
NSError *error = nil;
NSString *googlePage = [NSString stringWithContentsOfURL:googleURL
                                                encoding:NSASCIIStringEncoding
                                                   error:&error];

returns something like:

<HTML>
<BODY>
<TABLE style="border: Solid 1px Black; border-collapse: collapse; font-family: arial; width: 100%;">
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="1.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="1.htm" target="main">Aanen </A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">

<A HREF ="1.htm" target="main">Joeri</A>
  </td>
</tr>
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">Ali </A>
  </td>

  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="2.htm" target="main">Sohail</A>
  </td>
</tr>
<tr>  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">H4A</A>
  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">Beerthuijzen </A>

  </td>
  <td BGCOLOR="DCDCDC" NOWRAP style="border: Solid 1px Black; font-family: arial; padding: 2px; width: 100%;">
<A HREF ="3.htm" target="main">Iris</A>
  </td>
</tr>

and so on...

Maybe someone has a better idea? I need to get three strings per row: first name, surname, and page number (for example, from the last row: Iris, Beerthuijzen, 3).

Thanks in advance!

Prabakar

HTML is supposed to follow strict XML syntax (though not all web pages do). The HTML that you have posted does follow XML syntax, so you can use an XML parser to extract data from it. If the page is not XML compliant, you might have to use regex pattern matching to extract the data. I used this when I first started with regex
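To make the regex route concrete, here is a rough sketch rather than a definitive solution. It shows two things: a plain "copy the text between two markers" helper built on rangeOfString:, and an NSRegularExpression pass over the links in the posted table. The function names (SubstringBetween, ExtractLinks, ExtractPeople) are made up for this example, and the grouping step assumes every row contains exactly three links in the order class, surname, first name, as in the sample. One caveat on the XML route: a strict XML parser may reject the posted markup because of the valueless NOWRAP attribute, which is another reason this sketch leans on pattern matching.

```objc
#import <Foundation/Foundation.h>

// Returns the text between the first occurrence of `start` and the next
// occurrence of `end` after it, or nil if either marker is missing.
static NSString *SubstringBetween(NSString *text, NSString *start, NSString *end) {
    NSRange startRange = [text rangeOfString:start];
    if (startRange.location == NSNotFound) return nil;

    // Only look for the end marker after the start marker.
    NSUInteger from = NSMaxRange(startRange);
    NSRange rest = NSMakeRange(from, text.length - from);
    NSRange endRange = [text rangeOfString:end options:0 range:rest];
    if (endRange.location == NSNotFound) return nil;

    return [text substringWithRange:NSMakeRange(from, endRange.location - from)];
}

// Collects every link as @[pageNumber, linkText], in document order.
// The pattern is written for links shaped like the ones in the sample:
// <A HREF ="3.htm" target="main">Iris</A>
static NSArray *ExtractLinks(NSString *html) {
    NSString *pattern = @"<A\\s+HREF\\s*=\\s*\"(\\d+)\\.htm\"[^>]*>([^<]*)</A>";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) return @[];

    NSMutableArray *links = [NSMutableArray array];
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSRange whole = NSMakeRange(0, html.length);
    for (NSTextCheckingResult *match in [regex matchesInString:html options:0 range:whole]) {
        NSString *page = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *label = [html substringWithRange:[match rangeAtIndex:2]];
        // Trim stray spaces such as the one in "Beerthuijzen ".
        [links addObject:@[page, [label stringByTrimmingCharactersInSet:blank]]];
    }
    return links;
}

// ASSUMPTION: each table row holds exactly three links in the order
// class, surname, first name. Groups the links into one dictionary per row.
static NSArray *ExtractPeople(NSString *html) {
    NSArray *links = ExtractLinks(html);
    NSMutableArray *people = [NSMutableArray array];
    for (NSUInteger i = 0; i + 2 < links.count; i += 3) {
        [people addObject:@{ @"name":    links[i + 2][1],
                             @"surname": links[i + 1][1],
                             @"page":    links[i][0] }];
    }
    return people;
}
```

Calling ExtractPeople(googlePage) on the posted source should give one dictionary per table row; for the last row shown, that is name Iris, surname Beerthuijzen, page 3. If the real page has rows with a different number of links, the grouping step needs to change.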
