diafol

Something odd is going on with str_replace. When I replace a two-character string with another two-character string, strlen() reports the text as 1 longer for each replacement.

$gw = array("ll","ch");
$gw2 = array("lž","cž");
$cw = array("â","`");
$cw2 = array("â","à");

$s = str_replace($cw,$cw2,$s);
$s = preg_replace('~[^\\pL\d]+~u', '-', $s);
$s = trim($s);
if (function_exists('iconv')){
	$s = iconv('utf-8', 'us-ascii//TRANSLIT', $s);
}
$s = preg_replace('~[^-\w]+~', '', $s);
// strlen() increases by 1 for every occurrence replaced below:
$out = str_replace($gw,$gw2,$s);
//text displays correct number of characters, but when a replacement is made, a check with strlen() gives an increase of 1.

This is driving me bonkers!

echo strlen($s);//for llanelli: output = 8
$out = str_replace($gw,$gw2,$s);//replace 2 x ll with 2 x lž 
echo strlen($out);//for lžanelži: output = 10

Don't ask why I'm doing this - long story! Am I missing something really basic? [Page encoding = utf-8]

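UTF-8 is a multibyte character encoding.
Characters in the 7-bit US-ASCII range (0-127) take 1 byte; everything above that takes 2, 3 or 4 bytes (the original design allowed up to 6, but current UTF-8 stops at 4).
That makes it backwards compatible with non-UTF-8 applications, which display garbage for those characters, but it's ASCII garbage and they don't crash.
strlen() counts bytes, not characters. If the mbstring extension is available, mb_strlen($out, 'UTF-8') counts characters instead.
here is an explanation in wikipedia
So that result is correct: the displayed text is N characters, and strlen() reports one extra byte for each character that is encoded in 2 bytes. ž is one of those, so llanelli (8 bytes) becomes lžanelži (8 characters, 10 bytes).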
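To see the byte/character difference directly, here is a minimal sketch using the llanelli example from the question. The helper char_length() is a hypothetical name made up for this illustration, not something from the thread; it assumes PHP 7+ (for the "\u{17E}" escape, which is ž) and falls back to a Unicode-aware regex when mbstring is not loaded.

```php
<?php
// Hypothetical helper (not from the thread): count characters, not bytes.
function char_length(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: count UTF-8 code points with a Unicode-aware regex.
    return (int) preg_match_all('/./us', $s);
}

$gw  = array("ll", "ch");
$gw2 = array("l\u{17E}", "c\u{17E}"); // lž, cž (ž is U+017E, 2 bytes in UTF-8)

$s   = 'llanelli';
$out = str_replace($gw, $gw2, $s);

echo strlen($s), "\n";        // 8  (all ASCII: 1 byte per character)
echo strlen($out), "\n";      // 10 (two ž, each 1 byte longer than the l it replaced)
echo char_length($out), "\n"; // 8  (same number of characters as before)
```

So if the check in your code is meant to be "how many characters", swap strlen() for a character count like the one above; if it is "how many bytes will this take to store", strlen() is already giving the right answer.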

diafol

I had a creeping suspicion it was an ascii issue - but know v. little about it. AB - thanks again. Much appreciated. Will check my code to ensure that I haven't made other mistakes with non-ascii chars.

UTF-8 is the first time I have seen 'new' stuff that didn't actively kill 'old' stuff.
It's a good idea, and it works.

it must have been an accident

diafol

Care to elaborate?

diafol

Ah, OK, brain freeze. I understand now. Solved.