Can I get a hand with Regex?

Question

cjohnweb 14 User Title? What's that?

13 Years Ago

Hello!

I've come across a little problem with regex, and I was hoping someone might spot it offhand.

Here is some sample content I am searching through. It's the GameShark code list for PEC:

#Ace Combat 3 Electrosphere#SLPS-02021#SLPS-02020
§N Joker Command
D00BF32E ????
"Infinite Time for battle
D0054332 2442
80054332 2402
#Name of Game#SLUS-Number#Second-Slus-num
"Code Folder/Name of Code (codes in hex)
abcdef01 2345
6789ABCD EF12
.This line is a comment

I am trying to write a PHP script to organize the codes. Right now you have to edit the 4MB+ text file by hand to add or remove codes, which is a pain. I was hoping to make life easier for myself and anyone else who might want to upload an existing code list, add codes to a database, compile a custom code list, and save the custom file.

This project will be open source under a CC license!

I am working to import the existing code list at the moment. Here is what I have:

//Open code list
$file = "./codelist.inf";
$fh = fopen($file, 'r');
$codes = fread($fh, filesize($file));
fclose($fh);


//Match Game Title
$mg = "(#[A-Za-z0-9\-\ ]{1,200}(#[A-Za-z0-9\-\ ]{1,25}){1,3})";

//Match Codes
$mc = "[A-Fa-f0-9\ ]{13}";

//Matches Codes Names
$mcn = "\"[A-Za-z0-9\ \-\\\\]{1,}";



//Do Matches
preg_match_all($mg, $codes, $return);
print_r($return);

Here is the problem: I can match the game title, code, and code name expressions individually, but I can't seem to get them to work together.

I am open to other possibilities, though I would like to see how regex can do it. Ultimately I just need to separate the file at each game title into an array.

Goal output might look like this:

Array
(
[0] => Array(
#Ace Combat 3 Electrosphere#SLPS-02021#SLPS-02020
§N Joker Command
D00BF32E ????
"Infinite Time for battle
D0054332 2442
80054332 2402
)

[1] => Array(
#Name of Game#SLUS-Number#Second-Slus-num
"Code Folder/Name of Code (codes in hex)
abcdef01 2345
6789ABCD EF12
)

)

Right now I have:

preg_match_all('/(#[A-Za-z0-9\-\ ]{1,200}(#[A-Za-z0-9\-\ ]{1,25}){1,3})/ism', $codes, $return);

When I add a newline \n to the end (before the /ism), it no longer matches the game titles and instead returns a few empty arrays.

preg_match_all('/(#[A-Za-z0-9\-\ ]{1,200}(#[A-Za-z0-9\-\ ]{1,25}){1,3})\n/ism', $codes, $return);

Is there an alternative to \n, or am I using it wrong? I thought that I could do this:

//Match Game Title
$mg = "(#[A-Za-z0-9\-\ ]{1,200}(#[A-Za-z0-9\-\ ]{1,25}){1,3})";

//Match Codes
$mc = "[A-Fa-f0-9\ ]{13}";

//Matches Codes Names
$mcn = "\"[A-Za-z0-9\ \-\\\\]{1,}";



//Do Matches
preg_match_all("($mg)\n(($mc|$mcn)\n){2,}", $codes, $return);
print_r($return);

But, alas, to no avail!

How might I go about combining the 3 expressions so that it returns the game titles, code names and codes, etc?

Thanks for the help!

open-source php regex



All 8 Replies

diafol

13 Years Ago

How about reading the file into a string,
exploding the string on \n,
and then, for each line, using $ as the end anchor in the regex?

Build the output array as you seem fit within the loop.

Does that make sense?? I'm not that hot on regex. It's just I've been working on a project lately, which shares a little common ground with your problem.

Reading this again, how about explode on initial # and then on \n ?

Use preg_split?

$chunks = preg_split("/(\n#)/", $string, -1,PREG_SPLIT_DELIM_CAPTURE);

Dunno if that would work though


diafol

13 Years Ago

$str ='#Ace Combat 3 Electrosphere#SLPS-02021#SLPS-02020
§N Joker Command
D00BF32E ????
"Infinite Time for battle
D0054332 2442
80054332 2402
#Name of Game#SLUS-Number#Second-Slus-num
"Code Folder/Name of Code (codes in hex)
abcdef01 2345
6789ABCD EF12
.This line is a comment';


$chunks = preg_split("/(\n#)/", $str,-1, PREG_SPLIT_DELIM_CAPTURE);

echo "<pre>";
print_r($chunks);
echo "</pre>";

Gives this output:

Array
(
    [0] => #Ace Combat 3 Electrosphere#SLPS-02021#SLPS-02020
§N Joker Command
D00BF32E ????
"Infinite Time for battle
D0054332 2442
80054332 2402
    [1] => 
#
    [2] => Name of Game#SLUS-Number#Second-Slus-num
"Code Folder/Name of Code (codes in hex)
abcdef01 2345
6789ABCD EF12
.This line is a comment
)

OK, so the # stays on the first entry because there's no \n before it, but that's fine.
The PREG_SPLIT_DELIM_CAPTURE flag places the captured delimiter (\n#) into its own item. I didn't expect that; I thought it would just be tagged onto the item. Doh.
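
One way to keep the # attached to each chunk without capturing the delimiter is to split on a line break followed by a lookahead for #. The following is a minimal sketch, assuming the same file layout as the sample above; the game names and codes in $str are made up for illustration.

```php
<?php
// Hypothetical sample data in the same layout as the thread's code list.
$str = "#Game One#SLUS-00001\n\"Infinite lives\n8001E096 0007\n"
     . "#Game Two#SLPS-00002\n\"Max money\n800EA846 000F";

// Split at a line break only when the next character is '#'.
// The lookahead (?=#) is zero-width, so the '#' is not consumed and
// stays at the start of the following chunk. \r?\n accepts both
// Unix and Windows line endings.
$chunks = preg_split('/\r?\n(?=#)/', $str);

echo "<pre>";
print_r($chunks);
echo "</pre>";
```

With this pattern no flag is needed, and every chunk, including the first, should begin with its own #.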



cjohnweb 14 User Title? What's that? · Answer 1 · 2011-06-29T03:52:26+00:00

That sounds like it could work... I'll have to mess with it here for a little bit. Thanks for the input.

cjohnweb 14 User Title? What's that? · Answer 2 · 2011-06-29T04:13:23+00:00

Ah! That is brilliant! I never thought to do it that way.

ardav: I got a foreach loop to work as well, like so:

echo "<pre>";
echo "<div style=\"float:left; width:650px; padding:10px; margin:10px; border:1px solid #ff0000; overflow:auto;\">";
$file = "./codelist.inf";
$fh = fopen($file, 'r');
$codes = fread($fh, filesize($file));
fclose($fh);

// Remove all comments from file first (lines starting with ;*) 
preg_match_all("/(;\*).*?\n/is", $codes, $return);
foreach ($return[0] as $k => $v){
$codes = str_replace($v,"",$codes);
}
unset($return);


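$pcs = explode("\n",$codes);

$count = 0; // index of the current game block
$cs = array(); // collected blocks, one string per game
foreach ($pcs as $k => $v){

// If first char is a # (and the line is not empty), start a new block
if($v !== "" && $v[0] == "#"){$count++;}
// Initialize the block before appending to avoid an undefined-index notice
if(!isset($cs[$count])){$cs[$count] = "";}
$cs[$count] .= $v."\n";

}
print_r($cs);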

Which gives me:

Array
(
    [1] => #007 - Tomorrow never dies#SLUS-00975
"Infinite lives
8001E096 0007
"Infinite pk7 ammo
800EA846 000F
"Infinite assault rifle ammo
800EA8B2 000F
"Infinite medikits
800EA9DC 0007

    [2] => #007 - Tomorrow Never dies#SLPS-00000
"INFINITE LIVES
8001E07A 0063
"CAN SELECT ALL MISSION
8001E224 0101
8001E226 0101
8001E228 0101
8001E22A 0101
8001E22C 0101

//...etc

Thanks guys.

Does anyone have any idea why the newline won't work? I ended up trying \r\n, and it worked for the game titles but not the code titles.
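
One possible explanation, and it is only an assumption since the real file isn't shown, is that the code list uses Windows line endings (\r\n). In that case a \r sits between the last title character and the \n, and the character class in the title pattern does not allow it. A minimal sketch that normalizes the line endings before matching (the sample data is made up):

```php
<?php
// Hypothetical input with Windows-style line endings (an assumption
// about the real code list, which is not shown in the thread).
$codes = "#Game One#SLUS-00001\r\n\"Infinite lives\r\n8001E096 0007\r\n";

// The title pattern from the question with a trailing \n,
// anchored to the start of a line with ^ and the m modifier.
$pattern = '/^#[A-Za-z0-9\- ]{1,200}(?:#[A-Za-z0-9\- ]{1,25}){1,3}\n/m';

// Against the raw text the \r before \n blocks the match.
$before = preg_match_all($pattern, $codes);

// Normalize every line-ending style to a bare \n, then match again.
$normalized = str_replace(["\r\n", "\r"], "\n", $codes);
$after = preg_match_all($pattern, $normalized, $matches);

echo "Matches before normalizing: $before\n";
echo "Matches after normalizing: $after\n";
```

Normalizing once up front also means a later explode("\n", ...) would not leave a stray \r at the end of each line.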

diafol · Answer 3 · 2011-06-29T04:37:46+00:00

OK, perhaps this is easier?

$chunks = preg_split("/\n#/", $str);
$chk[] = $chunks[0];
for($x=1;$x<count($chunks);$x++){
	$chk[] = '#' . $chunks[$x];
}
echo "<pre>";
print_r($chk);
echo "</pre>";

cjohnweb 14 User Title? What's that? · Answer 4 · 2011-06-29T04:43:56+00:00

ardav, that works beautifully! I went ahead and finished the script with foreach loops and such; I'll post the code. I think I will also attempt the same thing with regex and see which one is less code, faster, etc. I'll have to run some tests. Thanks, everyone, for all the help!

<?php
echo "<pre>";
echo "<div style=\"float:left; width:650px; padding:10px; margin:10px; border:1px solid #ff0000; overflow:auto;\">";
//Open code file
$file = "./codelist.inf";
$fh = fopen($file, 'r');
$codes = fread($fh, filesize($file));
fclose($fh);

// Remove all comments from file first (lines starting with ;*) 
preg_match_all("/(;\*).*?\n/is", $codes, $return);
foreach ($return[0] as $k => $v){
$codes = str_replace($v,"",$codes);
}
unset($return);

// Explode at every line
$pcs = explode("\n",$codes);

//Set code count, used in foreach loop
$count_code_titles = 0;

// For each line...
foreach ($pcs as $k => $v){

// If first char is a #, Set game title
if($v[0] == "#"){$title = ucwords(strtolower(substr($v,1)));}

// If first char is ", set code name
if($v[0] == "\""){
$count_code_titles++;
$cs[$title][$count_code_titles]['name'] = ucwords(strtolower(substr($v,1)));
}

//If first char is ., set comment
if($v[0] == "."){ 
$cs[$title][$count_code_titles]['comment'] = substr($v,1);
}

//If not a ., " or #, save code
if($v[0] != "." && $v[0] != "\"" && $v[0] != "#"){$cs[$title][$count_code_titles][] = $v;}

}

//Print results!
print_r($cs);
echo "</div>";


echo "<div style=\"float:left; width:600px; padding:10px; margin:10px; border:1px solid #000000;\"><h2>Original</h2><br />";
echo $codes;
echo "</div>";
echo "</pre>";
?>

Here is a sample of real output:

Array
(
    [007 - Tomorrow Never Dies#slus-00975
] => Array
        (
            [1] => Array
                (
                    [name] => Infinite Lives
                    [0] => 8001E096 0007
                )

            [2] => Array
                (
                    [name] => Infinite Pk7 Ammo
                    [0] => 800EA846 000F
                )

            [3] => Array
                (
                    [name] => Infinite Assault Rifle Ammo
                    [0] => 800EA8B2 000F
                )

            [4] => Array
                (
                    [name] => Infinite Medikits
                    [0] => 800EA9DC 0007
                )

        )

    [007 - Tomorrow Never Dies#slps-00000
] => Array
        (
            [5] => Array
                (
                    [name] => Infinite Lives
                    [0] => 8001E07A 0063
                )

            [6] => Array
                (
                    [name] => Can Select All Mission
                    [0] => 8001E224 0101
                    [1] => 8001E226 0101
                    [2] => 8001E228 0101
                    [3] => 8001E22A 0101
                    [4] => 8001E22C 0101
                )

        )

...
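
For comparison, a regex-only version of the grouping step might look like the sketch below. It is not the code used in this thread, and the sample data is made up. It matches a title line plus every following line that does not start with #, which gives one match per game. Note that the pattern is wrapped in / delimiters: the combined pattern in the original question has none, so PHP probably treats its first pair of parentheses as the delimiters and rejects the rest as modifiers.

```php
<?php
// Hypothetical sample data in the thread's format.
$codes = "#Game One#SLUS-00001\n\"Infinite lives\n8001E096 0007\n"
       . "#Game Two#SLPS-00002\n\"Max money\n800EA846 000F\n800EA848 000F\n";

// One match per game: a line starting with '#', followed by every
// subsequent line that does NOT start with '#'.
$block = '/^#[^\r\n]*(?:\r?\n(?!#)[^\r\n]*)*/m';
preg_match_all($block, $codes, $games);

$result = [];
foreach ($games[0] as $game) {
    // Split each block into its lines; the first line is the title.
    // rtrim() drops a trailing line break so no empty last line appears.
    $result[] = preg_split('/\r?\n/', rtrim($game));
}

print_r($result);
```

Each entry of $result is then an array of lines for one game, close to the goal output in the question.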

diafol · Answer 5 · 2011-06-29T23:26:25+00:00

I know this is solved, but had another thought:

$lines = explode("\n",$str);
$x=-1;
foreach($lines as $line){
	switch(true){
		case preg_match("/^#/",$line):
			$i=0;
			$x++;
			$key='user';
			$output[$x][$key] = $line;
			break;
		case preg_match("/^\"/",$line):
			$key = 'name';
			$output[$x]['titles'][][$key] = $line;	
			$i++;
			break;
		case preg_match('/^\./',$line):
			$key = "comment";
			$output[$x][$key] = $line;
			break;
		default:
			$key = "item";
			$output[$x]['titles'][$i][] = $line;
			break;	
	}
}

echo "<pre>";
print_r($output);
echo "</pre>";

Probably a bit mashed but the output isn't too far off. However, there are probably other 'types' to check. For example the '§N Joker Command' line ends up as an 'item' - which is probably wrong. The 'default' needs some work. :(
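
One way to tighten the default case is to accept a line as a code only when it actually looks like one, and to flag everything else. The sketch below uses a hypothetical helper, classify_line(), that is not part of the code above; the assumed code shape (8 hex digits, a space, then 4 hex digits or ? placeholders) is inferred from the sample lines only.

```php
<?php
// classify_line() is a hypothetical helper, not part of the thread's code.
function classify_line(string $line): string
{
    // Drop any trailing line break left over from explode().
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        return 'blank';
    }
    switch ($line[0]) {
        case '#': return 'title';
        case '"': return 'name';
        case '.': return 'comment';
    }
    // Assumed code shape: 8 hex digits, a space, then 4 hex digits
    // (or '?' placeholders, as in "D00BF32E ????").
    if (preg_match('/^[0-9A-Fa-f]{8} [0-9A-Fa-f?]{4}$/', $line)) {
        return 'code';
    }
    // Anything else, such as the '§N Joker Command' line, is flagged.
    return 'unknown';
}

foreach (['#Game One#SLUS-00001', '"Infinite lives', 'D00BF32E ????', '§N Joker Command'] as $sample) {
    echo $sample . ' => ' . classify_line($sample) . "\n";
}
```

Lines reported as unknown can then be logged or handled separately instead of silently ending up as items.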

cjohnweb 14 User Title? What's that? · Answer 6 · 2011-06-30T00:06:48+00:00

That's a really nice and easy way to do it for sure! That's sooo simple to read, I really like that. I'll mess with it here in a little bit. Thanks for the help! I am going to publish the tool at http://pec-code-manager.iluvjohn.com/
