I'm trying to cleanse a db field which stores UK postcodes. Postcodes can be 7 or 8 characters long but they will always have a space character in the 4th position from the end. The last 3 will be 1 digit plus 2 letters.

Customers on entering their address will frequently leave out the space (SW1A2AA instead of SW1A 2AA) so I was trying to firstly remove any space characters then re-inserting it at the 4th position from the end.

Remove the space

$query = "UPDATE db set new_postcode = replace(old_postcode, ' ' , '' ) ";$result = mysql_query($query) or die (mysql_error() . "<br />Couldn't execute query: $query");

Add the space

$new_postcode = explode(" ", $new_postcode, -3);

But I think I am not fully grasping using the explode to insert the space.

Member Avatar

You could try this:

$pc = " sa18 1da ";

//for clarity, but you could use \d etc.
echo (preg_match('/^([A-Z]{1,2}[0-9]{1,2})[ ]?([0-9][ABDEFGHJLNPQRSTUWXYZ]{2})$/', strtoupper(trim($pc)),$bits)) ? $bits[1] . ' ' . $bits[2] : 'Invalid Postcode';

I've tested it and it seems to work for uppercase/lowercase mixes and variable first set of letters (1 or 2) and digits (1 or 2). It also works with whether a space is included or not. There's probably a nicer solution though, I try to avoid regex unless it's a last resort as the pregs tend to be a bit slow/clunky.

You seem to want to to the replacement into SQL? Regex (AFAIK) is not that well supported in MySQL - for searching yes, but not for replacement.
When a record gets added you can use the regex above to transform before INSERT.

Just noticed this: http://www.postcode-info.co.uk/about-uk-postcodes.html
It has some postcode formats I've not seen before - LNL NLL, LLNL NLL

SO this:

$pc = "sa1m2qb");
echo (preg_match('/^([A-Z]{1,2}[0-9]{1,2}|[A-Z]{1,2}[0-9][A-Z])[ ]?([0-9][ABDEFGHJLNPQRSTUWXYZ]{2})$/', strtoupper(trim($pc)),$bits)) ? $bits[1] . ' ' . $bits[2] : 'Invalid Postcode'; 

Thanks for that but the problem is cleaning a existing db. We get an orders file from a large ecommmerce site that starts with an "e" and ends with "bay". They allow the customers to put in the postcode in any format - lower case, upper case, no space, 2 spaces or even spaces in the wrong place.

I had considered Regex but it tends to be used in the way you suggested which is to warn when the input is incorrect. However when we get the orders file, the errors are already there.

I think what I will do is create a dummy field where all the spaces are removed. This is so I can compare the pure string with another similar table.

Member Avatar

SO your DB data is already mashed - no way to intercept or modify? If so perhaps you could use a trigger to run an update sql? I think you may have to engineer a SUBSTRING clause to reformat the ppostcode in that case. Creating additional tables seems pretty contrived. If you add a column to the main table such as pc_verified (tinyint, 0/1) then that may be sufficient to flag bad formats.

When you say orders file, what do you mean? In what format is this - a record in a table or a physical file?


Here's some update code I knocked up - you'll need to test it thoroughly though. I coulnd't make it throw a wobbly, but that's only 2 minutes testing:

UPDATE postcodes SET pc = 
    UCASE(IF(LENGTH(TRIM(REPLACE(pc,' ','')))= 7,
        CONCAT(SUBSTRING(TRIM(pc),1,4),' ',SUBSTRING(TRIM(pc),-3)),
    IF(LENGTH(TRIM(REPLACE(pc,' ','')))= 6,
        CONCAT(SUBSTRING(TRIM(pc),1,3),' ',SUBSTRING(TRIM(pc),-3)),
    IF(LENGTH(TRIM(REPLACE(pc,' ','')))= 5,
        CONCAT(SUBSTRING(TRIM(pc),1,2),' ',SUBSTRING(TRIM(pc),-3)),