Something odd is going on with str_replace(). When I replace one two-character sequence with another two-character sequence, strlen() reports the text as 1 longer for each replacement.

$gw = array("ll","ch");
$gw2 = array("lž","cž");
$cw = array("â","`");
$cw2 = array("â","à");

$s = str_replace($cw,$cw2,$s);
$s = preg_replace('~[^\\pL\d]+~u', '-', $s);
$s = trim($s);
if (function_exists('iconv')){
	$s = iconv('utf-8', 'us-ascii//TRANSLIT', $s);
}
$s = preg_replace('~[^-\w]+~', '', $s);
//strlen() goes up by 1 for every occurrence replaced below:
$out = str_replace($gw,$gw2,$s);
//the text displays the correct number of characters, but after a replacement strlen() reports an increase of 1.

This is driving me bonkers!

echo strlen($s);//for llanelli: output = 8
$out = str_replace($gw,$gw2,$s);//replace 2 x ll with 2 x lž 
echo strlen($out);//for lžanelži: output = 10

Don't ask why I'm doing this - long story! Am I missing something really basic? [Page encoding = utf-8]


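UTF-8 is a multibyte character encoding.
Characters in the 128-character US-ASCII set take 1 byte each; everything above that takes 2, 3 or 4 bytes (the original design allowed up to 6, but the current standard stops at 4).
That makes it backwards compatible with non-UTF-8 applications: they display garbage for the multibyte characters, but plain ASCII text comes through unchanged and they don't crash.
strlen() counts bytes, not characters.
There is an explanation on Wikipedia.
So that result is correct: the displayed text is N characters long, and strlen() reports one extra for each character that is encoded in 2 bytes. ž is one of those, so lžanelži is 8 characters but 10 bytes.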
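
To make the byte vs. character difference concrete, here is a minimal sketch. It assumes the string is valid UTF-8. The helper name utf8_char_count() is made up for this example (it is not from the thread); it uses mb_strlen() when the mbstring extension is loaded and falls back to a PCRE match with the /u modifier otherwise. The ž is written as its two UTF-8 bytes (\xC5\xBE) so the result does not depend on how the source file is saved.

```php
<?php
// Hypothetical helper (not from the thread): count UTF-8 characters, not bytes.
function utf8_char_count($s)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: with the /u modifier, "." matches one whole code point.
    return preg_match_all('/./us', $s, $matches);
}

$gw  = array("ll", "ch");
$gw2 = array("l\xC5\xBE", "c\xC5\xBE"); // "lž", "cž" written as raw UTF-8 bytes

$s   = 'llanelli';
$out = str_replace($gw, $gw2, $s);

echo strlen($s), "\n";            // 8  (8 bytes, 8 characters)
echo strlen($out), "\n";          // 10 (each ž takes 2 bytes)
echo utf8_char_count($out), "\n"; // 8  (still 8 characters)
```

If the number you care about is what the user sees, count characters; if it is storage or a byte limit, strlen() is already giving the right answer.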


I had a creeping suspicion it was an ascii issue - but know v. little about it. AB - thanks again. Much appreciated. Will check my code to ensure that I haven't made other mistakes with non-ascii chars.
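
For that kind of check, one rough approach is to flag any string that contains bytes outside the 7-bit ASCII range, since those are the only strings where strlen() and the visible character count can disagree. This is only a sketch; has_non_ascii() is a made-up name, not a built-in.

```php
<?php
// Hypothetical helper: true if $s contains any byte outside 7-bit ASCII.
// No /u modifier here, so the pattern is applied byte by byte.
function has_non_ascii($s)
{
    return preg_match('/[^\x00-\x7F]/', $s) === 1;
}

var_dump(has_non_ascii('llanelli'));                // bool(false)
var_dump(has_non_ascii("l\xC5\xBEanel\xC5\xBEi"));  // bool(true), "lžanelži"
```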

UTF-8 is the first time I have seen 'new' stuff that didn't actively kill 'old' stuff
it's a good idea, it works

it must have been an accident

Care to elaborate?
