Hello,
I have not had a huge amount of experience with regular expressions.
Any help you can give is greatly appreciated.

I have an input file like this:

Name: Bob, Age: 20, Details: Likes chocolate
Hates lettuce, Location: London

Name: James, Age: 42, Details: Sometimes goes swimming, Location: New York

Name: Jenny, Age: 15, Details: Eats out every night
Dislikes cats
Loves Dogs, Location: Tokyo

Name: Henry, Age: 65, Details: Hate rain
Loves sunshine

Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris

I want to match some details using the following regular expression with preg_match_all:

/Name: (?<name>\w+), Age: (?<age>\d+),.*?Location: (?<location>.*?)/is

The idea is to capture the name, age and location, matching across multiple lines (the s flag) and case-insensitively (the i flag).

However, the above regular expression matches the two entries below as a single match: Henry has no location, so the ".*?" carries on into Julian's entry until it finds one.

Name: Henry, Age: 65, Details: Hate rain
Loves sunshine

Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris

Is it possible to stop the ".*?" from matching the string "Name" (which would prevent it from carrying on into another person's entry)?
e.g.:

/Name: (?<name>\w+), Age: (?<age>\d+),.*?[^Name]Location: (?<location>.*?)/is

However, I realise this is not correct, as [^Name] matches a single character that is not N, a, m or e.
I know how to exclude a single character, but how about a whole string?
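A quick way to see the difference (a minimal sketch, not code from the thread): a character class such as [^Name] always consumes exactly one character, while a negative lookahead such as (?!Name) checks a whole string at the current position without consuming anything. The test strings below are made up for illustration.

```php
<?php
// [^Name] is a character class: it matches exactly ONE character that is
// not N, a, m or e (in either case here, because of the /i flag).
$classOneChar  = preg_match('/^[^Name]$/i', 'x');  // 1: 'x' is not in the class
$classExcluded = preg_match('/^[^Name]$/i', 'A');  // 0: 'A' counts as 'a' under /i
$classTwoChars = preg_match('/^[^Name]$/i', 'xy'); // 0: a class never spans two characters

// (?!Name) is a negative lookahead: it asserts that the literal string
// "Name" does not start at the current position and consumes nothing.
$lookaheadOk      = preg_match('/^(?!Name)\w+$/', 'Location'); // 1
$lookaheadBlocked = preg_match('/^(?!Name)\w+$/', 'Names');    // 0: starts with "Name"
$lookaheadPartial = preg_match('/^(?!Name)\w+$/', 'Nam');      // 1: "Nam" alone is allowed
```

Repeating a lookahead before every consumed character, as in (?:(?!Name).)*, is what turns this single-position check into "any run of characters that never contains Name".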

Thank you for any help you can give,
Ralf

There's no need to use a regular expression for that

<?php
$string = 'Name: Bob, Age: 20, Details: Likes chocolate';
$properties = explode(",", $string);
$result = array();
foreach ($properties as $property) {
    $split = explode(":", $property);
    // trim() removes the space left after each "," and ":"
    $result[trim($split[0])] = trim($split[1]);
}
print_r($result);
/*
Array
(
    [Name] => Bob
    [Age] => 20
    [Details] => Likes chocolate
)
*/
?>

That's a good solution, thanks!
The only problem is if there were commas in the details section, eg:
"Likes chocolate, cake and fishfingers". Any ideas?

Also, out of interest, is explode more efficient than regular expressions? The examples I gave were only a small section of a larger page.

Thanks again,
Ralf

Well the following code should solve that for you:

<?php
$string = 'Name: Bob, Age: 20, Details: Likes chocolate, cake and fishfingers';
$properties = explode(",", $string);
$result = array();
// Glue every piece after Details (index 2) back onto Details
$joinid = 3;
while (isset($properties[$joinid])) {
    $properties[2] .= ',' . $properties[$joinid];
    unset($properties[$joinid]);
    $joinid++;
}
foreach ($properties as $property) {
    $split = explode(":", $property);
    $result[trim($split[0])] = $split[1];
}
print_r($result);
?>

Thanks for that.
If I've dry-run your code properly, one problem I can see is that everything after the third element is appended onto the third element (Details), so if there were a Location, it would be appended to Details as well.
Then, when that element is exploded by ":", the location value would end up in element 2 of the split, which is never copied into the results.

I had hoped the code could be kept simple by using a regular expression as Details is an unknown input and could potentially contain anything.

Thanks for your replies.
Ralf
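Since Details can contain commas, another option (a sketch, not something posted in the thread) is to split only on commas that are immediately followed by one of the known field labels. This assumes the labels are exactly Name, Age, Details and Location, and that none of those words followed by a colon ever appears inside a value; the function name parseEntry is hypothetical.

```php
<?php
// Hypothetical helper: split ONE entry into fields, breaking only on a comma
// that is followed (after optional whitespace) by a known label and a colon.
// ASSUMPTION: the labels are exactly Name, Age, Details and Location.
function parseEntry(string $entry): array
{
    $parts = preg_split('/,\s*(?=(?:Name|Age|Details|Location):)/i', trim($entry));
    $result = array();
    foreach ($parts as $part) {
        // Limit 2: a colon inside the value stays part of the value.
        list($key, $value) = array_pad(explode(':', $part, 2), 2, '');
        // Collapse line breaks inside multi-line values to single spaces.
        $result[trim($key)] = trim(preg_replace('/\s+/', ' ', $value));
    }
    return $result;
}

$entry = "Name: Bob, Age: 20, Details: Likes chocolate, cake and fishfingers, Location: London, New York";
print_r(parseEntry($entry));

// For a whole file: each entry starts with "Name:" at the beginning of a line.
$subject = "Name: Henry, Age: 65, Details: Hate rain\nLoves sunshine\n\n"
         . "Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris\n";
$entries = preg_split('/^(?=Name:)/m', $subject, -1, PREG_SPLIT_NO_EMPTY);
$people = array_map('parseEntry', $entries);
print_r($people);
```

Commas inside Details and Location are left alone, because a comma only splits when a label follows it.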

If you also want the Location field, the following should handle it, provided the field is still labelled "location:" (without the quotes; the check is case-insensitive).

<?php
$string = 'Name: Bob, Age: 20, Details: Likes chocolate, cake and fishfingers, Location: New York';
$properties = explode(",", $string);
$result = array();
$joinid = 3;
$join = 'details';
while (isset($properties[$joinid])) {
    if (!preg_match('/[ ]location\:/i', $properties[$joinid]) && $join == 'details') {
        $properties[2] .= ',' . $properties[$joinid];
        unset($properties[$joinid]);
    } else {
        if ($join == 'details') {
            $join = 'location';
            $properties[3] .= $properties[$joinid];
        } else {
            $properties[3] .= ',' . $properties[$joinid];
        }
        unset($properties[$joinid]);
    }
    $joinid += 1;
}
foreach ($properties as $property) {
    $split = explode(":", $property);
    $result[$split[0]] = $split[1];
}
echo "<xmp>";
print_r($result);
echo "</xmp>";
?>

Hope that helps answer it.

I had a play with the code and found one bug: the line

unset($properties[$joinid]);

in the else branch always runs. This is a problem when there are no commas in the Details section, because array element 3 is then the Location itself: it gets appended to itself and then unset, which deletes the Location, e.g.:

Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris

However, thank you very much for the code. Here is my final version:

<?php
$string = <<<EOF
Name: Bob, Age: 20, Details: Likes chocolate, sweets, 
cake and fishfingers, 
Location: London, New York

Name: Henry, Age: 65, Details: Hate rain,
Loves sunshine

Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris
EOF;

$allResults = array();
$i = -1;
$eachEntry = explode("Name: ", $string);
foreach ($eachEntry as $entry) {

   $entry = "Name: " . $entry;
   $joinid = 3;
   $join = true;
   $properties = explode(",", $entry);

   while (isset($properties[$joinid])) {
      if (!preg_match('/\s*Location\:/i', $properties[$joinid]) && $join) {
         // Still inside Details: glue this piece back onto Details
         $properties[2] .= ', ' . trim($properties[$joinid]);
         unset($properties[$joinid]);
      } else {
         if ($join) {
            // First Location piece: stop appending to Details
            $join = false;
            if ($joinid != 3) {
               // Element 3 was merged into Details and unset, so move Location into it
               $properties[3] = trim($properties[$joinid]);
               unset($properties[$joinid]);
            }
         } else {
            // The Location value itself contained commas: glue the rest back on
            $properties[3] .= ', ' . trim($properties[$joinid]);
            unset($properties[$joinid]);
         }
      }
      $joinid++;
   }

   $result = array();
   foreach ($properties as $property) {
      // Limit 2 so that a ":" inside a value does not cut it short
      $split = explode(":", $property, 2);
      $result[trim($split[0])] = $split[1];
   }

   // Skip the empty chunk before the first "Name: "
   if ($i != -1)
      $allResults[$i] = $result;
   $i++;
}

echo "<xmp>";
print_r($allResults);
echo "</xmp>";
?>

And gives the output

Array
(
    [0] => Array
        (
            [Name] =>  Bob
            [Age] =>  20
            [Details] =>  Likes chocolate, sweets, cake and fishfingers
            [Location] =>  London, New York
        )

    [1] => Array
        (
            [Name] =>  Henry
            [Age] =>  65
            [Details] =>  Hate rain, Loves sunshine
        )

    [2] => Array
        (
            [Name] =>  Julian
            [Age] =>  34
            [Details] =>  Has too many CDs
            [Location] =>  Paris
        )
)

It's a bit of a brute, but it does the job.

Thanks again for all your help!
Ralf

For those who are interested, I found a regular expression that matches the pattern I was looking for.

Old Expression:

/Name: (?<name>\w+), Age: (?<age>\d+),.*?Location: (?<location>.*?)/is

New Expression:

/Name: (?<name>\w+), Age: (?<age>\d+), (?:(?:(?!Name:\s).)*Location: (?<location>[^\n]*)\n)?/is

The new section can be shown more generically:

(?:(?:(?![STRING NOT ALLOWED]).)*[BOUNDARY BEFORE](?<[GROUP NAME]>.*?)[BOUNDARY AFTER])?

where you replace (including the [ and ]):
[STRING NOT ALLOWED] = the string you don't want the match to cross
[BOUNDARY BEFORE]    = the text or characters just before the information you want
[GROUP NAME]         = the name of the capture group
[BOUNDARY AFTER]     = the text or characters just after the information you want
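As a sketch of filling in that template from code (the helper optionalField is hypothetical, not from the thread), the three text parts can be passed through preg_quote() so they are treated as literal text rather than regex syntax. This assumes plain-text boundaries; a boundary that needs regex syntax would have to skip preg_quote().

```php
<?php
// Hypothetical helper: builds
//   (?:(?:(?!NOT_ALLOWED).)*BEFORE(?<GROUP>.*?)AFTER)?
// preg_quote() escapes metacharacters, so the text arguments match literally.
function optionalField(string $notAllowed, string $before, string $group, string $after): string
{
    return '(?:(?:(?!' . preg_quote($notAllowed, '/') . ').)*'
         . preg_quote($before, '/')
         . '(?<' . $group . '>.*?)'
         . preg_quote($after, '/') . ')?';
}

// "\n" is a real newline character; without the x flag it matches a newline.
$pattern = '/Name: (?<name>\w+), Age: (?<age>\d+), '
         . optionalField('Name: ', 'Location: ', 'location', "\n")
         . '/is';

$subject = "Name: Henry, Age: 65, Details: Hate rain\n\n"
         . "Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris\n";

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
$found = array();
foreach ($matches as $m) {
    // An optional group that did not take part may be absent, hence ??.
    $location = $m['location'] ?? '';
    $found[$m['name']] = $location;
    echo $m['name'], ' => ', $location !== '' ? $location : '(no location)', "\n";
}
```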

For those who wish to know how the expression works, read the details between the dashed lines.
-------------------------------------------------------------------------------------------------
The new section can be broken into four parts:

1. (?:
2. (?:(?!Name:\s).)*
3. Location: (?<location>[^\n]*)\n
4. )?

1. and 4. wrap the section in a non-capturing group, (?: ... ), followed by "?", so the whole group can occur 0 or 1 times (i.e. it is optional).
2. is a non-capturing group containing a negative lookahead: before each character is consumed, (?!Name:\s) checks that "Name:" followed by whitespace does not start at that position. The group repeats 0 or more times, so it consumes every character up to "Location: " but never steps into the next person's entry. - **Note 1**
3. matches "Location: ", captures zero or more characters that are not a newline into the named group "location" (http://www.regular-expressions.info/named.html), and then matches the newline ("\n") itself. - **Note 2**
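To see the difference the tempered part makes, here is a small comparison on Henry's and Julian's entries (a sketch; in the "old" pattern the location group is changed to [^\n]* so that it actually captures something, since a trailing ".*?" captures nothing).

```php
<?php
// Two entries: Henry has no Location, Julian does.
$subject = "Name: Henry, Age: 65, Details: Hate rain\nLoves sunshine\n\n"
         . "Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris\n";

// Plain lazy .*? : nothing stops it from running into the next entry.
$old = '/Name: (?<name>\w+), Age: (?<age>\d+),.*?Location: (?<location>[^\n]*)/is';

// Tempered (?:(?!Name:\s).)* : refuses to step over "Name: ".
$new = '/Name: (?<name>\w+), Age: (?<age>\d+), (?:(?:(?!Name:\s).)*Location: (?<location>[^\n]*)\n)?/is';

preg_match_all($old, $subject, $oldMatches, PREG_SET_ORDER);
preg_match_all($new, $subject, $newMatches, PREG_SET_ORDER);

// Old: a single match, pairing Henry with Julian's location.
echo count($oldMatches), ': ', $oldMatches[0]['name'], ' / ', $oldMatches[0]['location'], "\n";
// New: two matches; Henry's optional location group simply does not participate.
echo count($newMatches), ': ', $newMatches[0]['name'], ', ', $newMatches[1]['name'], "\n";
```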

**Note 1**
The "*" here is greedy: the group first runs as far as it can (up to the next "Name: " or the end of the text) and then backtracks, so it stops at the last "Location: " before the next entry.
Therefore, had an entry contained another Location field on a later line, this expression would capture that later location.
To capture the first location instead, use a lazy quantifier (add a "?" after the "*").
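A small sketch of Note 1, assuming an (unusual) entry that carries two Location fields: the greedy version captures the second one, the lazy version the first.

```php
<?php
// ASSUMPTION: a made-up entry with two Location fields.
$subject = "Name: Bob, Age: 20, Details: Likes chocolate, Location: London\n"
         . "Location: Paris\n";

// Greedy (?:...)* runs to the end, then backs off to the LAST "Location: ".
$greedy = '/Name: (?<name>\w+), Age: (?<age>\d+), (?:(?:(?!Name:\s).)*Location: (?<location>[^\n]*)\n)?/is';
// Lazy (?:...)*? grows one character at a time and stops at the FIRST "Location: ".
$lazy   = '/Name: (?<name>\w+), Age: (?<age>\d+), (?:(?:(?!Name:\s).)*?Location: (?<location>[^\n]*)\n)?/is';

preg_match($greedy, $subject, $g);
preg_match($lazy, $subject, $l);

echo $g['location'], "\n"; // Paris
echo $l['location'], "\n"; // London
```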

**Note 2**
In the generic expression I have used ".*?" (a lazy match), but in my actual expression I have used "[^\n]*".
This is because I know exactly what I want to stop at: a newline. The lazy version works for any boundary, but the negated character class (the caret inside [ ]) is more specific and therefore generally more efficient.
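A sketch comparing the options from Note 2 on a made-up two-line string. All three use the s flag, so "." also matches newlines: [^\n]* and the lazy .*? both stop at the first newline, while a plain greedy .* overshoots to the last one.

```php
<?php
$subject = "Name: Bob, Age: 20, Details: Likes chocolate, Location: London\nName: James\n";

// Negated class: can never consume a newline, so it stops at the first one.
preg_match('/Location: (?<location>[^\n]*)\n/s', $subject, $negated);
// Lazy: expands one character at a time until a newline can follow.
preg_match('/Location: (?<location>.*?)\n/s', $subject, $lazy);
// Greedy with /s: runs to the end, then backs off only as far as the LAST newline.
preg_match('/Location: (?<location>.*)\n/s', $subject, $greedy);

var_dump($negated['location'], $lazy['location'], $greedy['location']);
```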
-------------------------------------------------------------------------------------------------

The expression will still match and save the name and age of entries that do not have a location.
To skip them, use the PREG_SET_ORDER flag with preg_match_all() and then, when iterating over the matches, check whether location has been set, e.g.:

foreach($matches as $m) {
   if (!empty($m['location'])) { 
      echo "<br />Name     = {$m['name']}\n"; 
      echo "<br />Age      = {$m['age']}\n"; 
      echo "<br />Location = {$m['location']}\n"; 
      echo "<br />\n"; 
   }
}
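If you would rather collect the results than echo HTML straight away, a sketch along these lines keeps only the entries with a location. It reads the group with ?? because, as far as I know, a trailing optional group that did not take part can be missing from $m altogether rather than set to "". The sample text is a subset of the input from the question.

```php
<?php
$pattern = '/Name: (?<name>\w+), Age: (?<age>\d+), (?:(?:(?!Name:\s).)*Location: (?<location>[^\n]*)\n)?/is';
$subject = "Name: James, Age: 42, Details: Sometimes goes swimming, Location: New York\n\n"
         . "Name: Henry, Age: 65, Details: Hate rain\nLoves sunshine\n\n"
         . "Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris\n";

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$people = array();
foreach ($matches as $m) {
    $location = $m['location'] ?? '';
    if ($location === '') {
        continue; // e.g. Henry: no Location field
    }
    $people[] = array('name' => $m['name'], 'age' => (int) $m['age'], 'location' => $location);
}
print_r($people);
```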

So, my final expression and code:
(I added the "x" flag after the closing "/". In extended mode, literal whitespace in the pattern is ignored, so the expression can be laid out over several lines; whitespace that must actually be matched is written as "\s".)
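A tiny sketch of what the x flag does to a literal space (the strings are made up for illustration). A space inside a character class such as [ ] is still significant in extended mode, so that is another way to match one.

```php
<?php
// With the x flag, the literal space in 'Age: 20' is ignored,
// so the pattern really means "Age:20".
$spaceIgnored = preg_match('/Age: 20/x', 'Age: 20'); // 0: the text has a space
$noSpace      = preg_match('/Age: 20/x', 'Age:20');  // 1
// Whitespace that must match has to be written explicitly:
$escaped = preg_match('/Age:\s20/x', 'Age: 20');     // 1: \s matches the space
$inClass = preg_match('/Age:[ ]20/x', 'Age: 20');    // 1: a space inside [ ] is kept
```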

<?php
$pattern = '/
  Name: \s    (?<name>     \w+   ),\s
  Age:  \s    (?<age>      \d+   ),\s
(?:
  (?:(?!Name:\s).)*
  Location:\s (?<location> [^\n]*)\n
)?
/isx';

$subject = <<<BLOCK
Name: Bob, Age: 20, Details: Likes chocolate
Hates lettuce, Location: London

Name: James, Age: 42, Details: Sometimes goes swimming, Location: New York

Name: Jenny, Age: 15, Details: Eats out every night
Dislikes cats
Loves Dogs, Location: Tokyo

Name: Henry, Age: 65, Details: Hate rain
Loves sunshine

Name: Julian, Age: 34, Details: Has too many CDs, Location: Paris

BLOCK;

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach($matches as $m) {
   if (!empty($m['location'])) { 
      echo "<br />Name     = {$m['name']}\n"; 
      echo "<br />Age      = {$m['age']}\n"; 
      echo "<br />Location = {$m['location']}\n"; 
      echo "<br />\n"; 
   }
}
?>

Gives the output:

<br />Name     = Bob
<br />Age      = 20
<br />Location = London
<br /> 
<br />Name     = James
<br />Age      = 42
<br />Location = New York
<br /> 
<br />Name     = Jenny
<br />Age      = 15
<br />Location = Tokyo
<br /> 
<br />Name     = Julian
<br />Age      = 34
<br />Location = Paris
<br />