Here is my issue: I am really perplexed when it comes to string parsing and manipulation in PHP. I am writing a database-driven inventory using MySQL and PHP. One of the issues is that multiple people are able (by company rule) to add to this database. One of the items we track is calibration standards. We use a company serial as the serial for the database (which is the PK).

Some people enter the serial in the format "EU#######", but the way it is supposed to be entered is "EU ########".

After the form grabs the serial number, I need to parse that string and see if there is a space after the EU. If not, I need to add it.

The regexp comes in when finding that particular substring at the beginning of the string and determining whether a space is already there.

Then I also need to know how to replace that string, or simply add a space between the U and the numeric value.

If anyone could help me or point me to the right functions to make it work, that would help a lot. Also, if anyone has recommendations for good, understandable reading on regexps and string manipulation, I would be grateful, as those are weak areas for me.


That should be simple enough. Question: are the hashes meant to be numbers, or can they be letters too?

How about something like:

echo format_serial('EU12345678'); // EU 12345678
echo format_serial('eu abcd1234'); // EU ABCD1234
echo format_serial('EUz34jaMLA'); // EU Z34JAMLA

function format_serial($serial)
{
    // The /e modifier was removed in PHP 7, so a callback does the uppercasing
    return preg_replace_callback('/^(eu) ?([a-z\d]{8})$/i', function ($m) {
        return strtoupper("$m[1] $m[2]");
    }, $serial);
}

They are supposed to be numbers. This is good for the second part. For the first part, though, I need to check whether "EU" is at the beginning, because not all of our serial numbers come from this sort of company; sometimes the serials are all numeric and do not need a space added. So I need to check if EU is at the beginning of the string first.

What would be more beneficial is a place to go to learn this stuff on my own; you know, the whole give-a-fish vs. teach-to-fish idea.

Kudos for the desire to learn. I was answering previously when on a break, so just fired off a response.

Let's start again then. So you're looking to:
1. Validate whether the serial number contains "EU" at the beginning, or is entirely numeric;
2. If it does contain "EU" at the beginning, reformat it to contain a space;
3. Ensure the serial number is in the format "EU ########", i.e. EU followed by 8 digits.

If the serial number is entirely numeric, will it always contain the same number of digits, or do you want to just validate that it does contain entirely digits?

We'll work from the latter premise for the time being.

So below is some revised code, and then we'll step through it piece by piece:

/**
 * Validate serial number format.
 *
 * @param string $serial
 * @return boolean|string
 */
function validate_serial($serial)
{
    // Check if serial matches EU ######## format (with or without a space)
    if (preg_match('/^(eu) ?(\d{8})$/i', $serial, $matches)) {
        return strtoupper("{$matches[1]} {$matches[2]}");
    }

    // Check if serial matches ######## format
    if (preg_match('/^\d+$/', $serial)) {
        return $serial;
    }

    // Otherwise return false
    return false;
}

validate_serial('EU12345678');  // EU 12345678
validate_serial('EA12345678');  // false
validate_serial('eu87654321');  // EU 87654321
validate_serial('24561357');    // 24561357
validate_serial('1234abcd');    // false

The validate_serial function will take a serial number, validate its format against the two aforementioned formats, and return either the formatted serial or false.
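Since the function returns either a string or false, the calling code needs to tell the two apart. Below is a minimal sketch of how that might look when handling the form input; the $input value is a made-up stand-in for whatever the form submitted, and the function is repeated only so the snippet runs on its own.

```php
<?php
// Same function as above, repeated so this sketch is self-contained.
function validate_serial($serial)
{
    if (preg_match('/^(eu) ?(\d{8})$/i', $serial, $matches)) {
        return strtoupper("{$matches[1]} {$matches[2]}");
    }
    if (preg_match('/^\d+$/', $serial)) {
        return $serial;
    }
    return false;
}

// $input stands in for the submitted form field (hypothetical value).
$input  = ' eu12345678 ';
$serial = validate_serial(trim($input));

// Compare with === so that a valid all-numeric serial such as '0'
// is not mistaken for a failure.
if ($serial === false) {
    echo "Invalid serial number\n";
} else {
    echo "Serial to store: $serial\n"; // Serial to store: EU 12345678
}
```

The strict comparison matters because PHP treats the string '0' as falsy, so a loose check like if (!$serial) would reject it.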

How does it do this?

  1. Use a regular expression to check if the format matches EU ########, with or without the space as explained - /^(eu) ?(\d{8})$/i:

The slashes at the beginning and end of the string are delimiters to show where the regex starts and ends.

The ^ and $ are used to indicate that the entire string must match the defined pattern, not just a sub section of the string.

The ( ) are used to capture matches. Any part of the string that matches a sub-pattern enclosed within the brackets will be added to the $matches array, after the full match at index 0.

The eu means literally the characters e and u. It doesn't matter that they're lowercase because of the i at the end of the regex. That means case insensitive.

The space followed by ? means literally a space character, which is optional. The question mark means that the preceding single character is optional. Multiple spaces will fail.

The \d means any numeric character, i.e. 0 - 9. You can also write this [0-9], but \d is shorter.

The {8} means that we're expecting exactly 8 characters. Not more, not fewer, not a range. You could say 8 to 10 using {8,10}, 8 or more using {8,}, or up to 8 using {0,8}. Hopefully you get the idea.

If this regex matches, it'll reformat the serial by explicitly adding a space between the first and second matches (the two bits enclosed in brackets).
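To see the pieces of this pattern at work, here is a small sketch that runs it against a few inputs. The sample strings are made up purely to exercise one part of the pattern each.

```php
<?php
$pattern = '/^(eu) ?(\d{8})$/i';

// preg_match returns 1 on a match and 0 on no match.
var_dump(preg_match($pattern, 'EU 12345678'));  // int(1): one space is allowed
var_dump(preg_match($pattern, 'EU  12345678')); // int(0): two spaces fail
var_dump(preg_match($pattern, 'EU1234567'));    // int(0): only 7 digits
var_dump(preg_match($pattern, 'xEU12345678'));  // int(0): ^ rejects leading text

// On a match, $matches[0] holds the whole match,
// and [1] and [2] hold the two bracketed parts.
preg_match($pattern, 'eu12345678', $matches);
print_r($matches);

echo strtoupper("{$matches[1]} {$matches[2]}"), "\n"; // EU 12345678
```

The print_r call shows three entries: the full string eu12345678, then eu, then 12345678.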

  2. Use another regular expression to check if the format matches ######## (numbers), as explained - /^\d+$/:

So we've covered the /, ^ and $. In this regex, we don't need to collect the matches to use them later, so none of the pattern is enclosed in brackets.

The \d again means any numeric character. This time it's followed by +. This means one or more. We could have written this {1,} as mentioned above, but again + is shorter.

If this regex matches, it'll return the serial as is.
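As a quick check that + and {1,} really behave the same way, the sketch below runs both spellings over a handful of made-up inputs and records the results.

```php
<?php
// Hypothetical sample inputs: two numeric strings, an empty string, and a mixed one.
$samples = array('24561357', '7', '', '1234abcd');
$results = array();

foreach ($samples as $s) {
    $plus  = preg_match('/^\d+$/', $s);    // one or more digits
    $brace = preg_match('/^\d{1,}$/', $s); // same thing, spelled with braces
    $results[$s] = array($plus, $brace);
    echo "'$s': + gives $plus, {1,} gives $brace\n";
}
```

Both patterns accept the two numeric strings and reject the empty string and the mixed one, since "one or more" needs at least one digit and nothing else.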

Finally, if neither regex matches, the function will return false to indicate the serial number is invalid.

Does that make sense? Hopefully it wasn't too much detail.

Tried your code out and with a little tweaking I got it to work the way I needed. It looks like preg_replace automatically checks the serial for me, so it solves both issues.

My final version is:

function format_serial($serial)
{
    return preg_replace_callback('/^(eu) ?([a-z\d].+)$/i', function ($m) { return strtoupper("$m[1] $m[2]"); }, $serial);
}

I changed the 8 to "any number of characters" because the ending numbers are sometimes longer and sometimes shorter.
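The reason the replace approach also covers the all-numeric serials is that a pattern that does not match leaves the input string untouched. Here is a minimal sketch of that behaviour, under the assumption that the part after EU is always digits of varying length. The function name format_serial_digits and the \d+ pattern are illustrative, not taken from the code above, and preg_replace_callback is used because the /e modifier is not available in PHP 7 and later.

```php
<?php
// Hypothetical stricter variant: "EU", an optional space, then one or more digits.
function format_serial_digits($serial)
{
    return preg_replace_callback('/^(eu) ?(\d+)$/i', function ($m) {
        // $m[1] is the "eu" prefix, $m[2] is the run of digits
        return strtoupper($m[1]) . ' ' . $m[2];
    }, $serial);
}

echo format_serial_digits('EU12345678'), "\n"; // EU 12345678
echo format_serial_digits('eu 123456'), "\n";  // EU 123456
echo format_serial_digits('24561357'), "\n";   // 24561357 (no match, returned unchanged)
echo format_serial_digits('EUROPE1'), "\n";    // EUROPE1 (letters after EU, left alone)
```

One thing to be aware of with the looser [a-z\d].+ pattern: it would also rewrite an input like EUROPE1 to EU ROPE1, so the digits-only version may be the safer choice if the serials really are numeric after the prefix.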