0

Hello,

I have a problem with htmlentities(). I'm missing something here, but I don't know what it is.
In my webpage, which is set to utf-8, the result of

echo htmlentities('éè')

is &Atilde;&copy;&Atilde;&uml;.
Why isn't the result &eacute;&egrave;?
The thing is: I want to put the name 'Hélène' in my database. After stripslashes() and validating the string, I prepare the string as follows:

$name = htmlentities(mysqli_real_escape_string($dbc, trim($name)));

In my database, I have set the column "name" to VARCHAR(30). If I enter 'éè' in this column, it is stored as &Atilde;&copy;&Atilde;&uml;.
In other words, once it has been through htmlentities(), the name 'Hélène' is too long to fit in a column set to VARCHAR(30).
Can someone tell me what I did wrong here?
Thanks a lot.
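
For anyone reproducing this, here is a minimal sketch of what is probably happening. It assumes a PHP version older than 5.4, where htmlentities() treats its input as ISO-8859-1 unless a charset is passed. The two calls differ only in that third argument, and the byte escapes spell out the UTF-8 form of 'éè' so the result does not depend on how the file is saved.

```php
<?php
// UTF-8 bytes of 'éè', written as escapes so the file encoding does not matter.
$s = "\xC3\xA9\xC3\xA8";

// Input treated as ISO-8859-1: every single byte becomes its own entity.
$asLatin1 = htmlentities($s, ENT_QUOTES, 'ISO-8859-1');

// Input treated as UTF-8: each two-byte sequence is read as one character.
$asUtf8 = htmlentities($s, ENT_QUOTES, 'UTF-8');

echo $asLatin1, "\n"; // &Atilde;&copy;&Atilde;&uml;
echo $asUtf8, "\n";   // &eacute;&egrave;
```

Passing the charset explicitly is the safer habit either way, since the default depends on the PHP version and configuration.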

Last Post by Geertc
0

Try this code I found a while back:

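function charset_decode_utf_8($string) {
    // Nothing to do for plain ASCII. (ereg() and the /e modifier no longer exist in PHP 7+.)
    if (!preg_match('/[\x80-\x9F\xA1-\xFF]/', $string)) {
        return $string;
    }
    // 3-byte UTF-8 sequences -> numeric entity
    $string = preg_replace_callback('/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/', function ($m) {
        return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
    }, $string);
    // 2-byte UTF-8 sequences -> numeric entity
    $string = preg_replace_callback('/([\xC0-\xDF])([\x80-\xBF])/', function ($m) {
        return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
    }, $string);
    return $string;
}
echo charset_decode_utf_8('é'); // prints &#233; when the file is saved as UTF-8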
0

Thanks, Metalix!

Does this mean that, if your web pages are UTF-8 encoded, you don't really need to use the htmlentities() function?

0

I've had problems with multibyte characters since before I could walk!

Firstly, ensure everything is set to UTF-8 (head encoding and DB charset).
Make sure that your files are saved as "UTF-8 without BOM". You can check this with a free-use editor like Notepad++.

Non-US-ASCII (or whatever you want to call them) characters take up 2 bytes (or even 3 for some Asian sets), so if you're using UTF-8 throughout, your table field sizes should be 3x as long as you envisaged. UTF-8 also has 4-byte characters, so perhaps we should amend our max sizes to 4x. VARCHAR will be preferable to CHAR datatypes for this, otherwise you'll end up wasting space.

Don't store text in HTML-encoded form (e.g. &copy;). That'll really mess up your field widths.

When performing string functions, ensure that you use the multibyte versions if they are available: mb_strlen() as opposed to strlen() and mb_substr() as opposed to substr(). There are also multibyte 'count' functions, such as mb_substr_count().
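
To illustrate the difference, a small sketch, assuming the mbstring extension is loaded and the string is UTF-8 (the name is written with byte escapes so the file encoding does not matter):

```php
<?php
// 'Hélène' as UTF-8 bytes: 6 characters, 8 bytes.
$name = "H\xC3\xA9l\xC3\xA8ne";

$bytes = strlen($name);             // counts bytes
$chars = mb_strlen($name, 'UTF-8'); // counts characters

// substr() cuts on byte boundaries and can split a character in half;
// mb_substr() cuts on character boundaries.
$byteCut = substr($name, 0, 2);             // "H" plus the first byte of "é"
$charCut = mb_substr($name, 0, 2, 'UTF-8'); // "Hé"

echo $bytes, ' bytes, ', $chars, " characters\n"; // 8 bytes, 6 characters
echo mb_check_encoding($byteCut, 'UTF-8') ? 'valid' : 'broken', "\n"; // broken
echo $charCut, "\n";
```

The half-character left behind by substr() is the kind of thing that later shows up as a stray question mark or box in the browser.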


0

Hey again.
Yes, you definitely need to use htmlentities() when outputting any user data.
Especially if that data is going inside a tag, you will need to use htmlentities($value, ENT_QUOTES) so it doesn't break your site.
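
A minimal sketch of escaping on output, assuming a UTF-8 page and a value that ends up inside an attribute. The $name value and the surrounding markup are made up for illustration; the explicit 'UTF-8' argument matters on older PHP versions, where the default charset was ISO-8859-1.

```php
<?php
// Hypothetical user input containing a double quote that tries to
// close the attribute and add an event handler.
$name = "H\xC3\xA9l\xC3\xA8ne\" onmouseover=\"alert(1)";

// ENT_QUOTES escapes both single and double quotes, so the value
// cannot break out of the attribute it is placed in.
$safe = htmlentities($name, ENT_QUOTES, 'UTF-8');

echo '<input type="text" name="name" value="' . $safe . '">', "\n";
```

On a page that really is UTF-8 throughout, htmlspecialchars() with the same arguments would escape the quotes and angle brackets and leave 'é' and 'è' as they are.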

0

If you use mysqli_real_escape_string(), you've got all the quotes covered.
htmlentities() or strip_tags() can be used when outputting, to protect yourself from scripting. I don't think you necessarily need that for input. As I've mentioned previously, if everything is in UTF-8, I can't see the benefit in using htmlentities() just to store non-ASCII characters. Anybody have any different ideas?

0

Is it necessary to manipulate user input with htmlentities() before inserting it into the database?

$name = strip_tags($_POST['name']);
// now check if the content of $name is valid... and then prepare to insert in db
$name = htmlentities(mysqli_real_escape_string($dbc, trim($name)));
// next insert data in db

I've seen bits of code where 'htmlentities' is used and 'strip_tags' is not.
(And not 'stripslashes', as I wrongly mentioned in my original question.)
Metalix, you are talking about 'outputting any user data'. I assume to a browser?

0

Hey Ardav,

That's the point. In my case, if I use htmlentities before storing data in my db, I have to change the size of my db-fields to 3 times the size they have now.
If I don't use htmlentities, I save a lot of memory...

0

Yes, I'd leave out the htmlentities() personally. However, take heed of the need to increase your field lengths 3x anyway, as some multibyte chars can take 3 bytes. If you use varchar(18) instead of varchar(6) and the data ends up being just 6 chars, you don't lose out. The problem comes when you use char(18) instead of char(6) - now that really does bite.

Iñtërnâtiônàlizætiøn is stored as 27 bytes, although it's only 20 chars.

A varchar(20) would store: Iñtërnâtiônàliz (obviously not enough)
A varchar(27) would store: Iñtërnâtiônàlizætiøn (on the button!)

The htmlentities('Iñtërnâtiônàlizætiøn') gives a whopping 118 bytes - over 4x the amount req'd by multibyte storage.
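
These figures can be checked with a short sketch. Two assumptions: the string is UTF-8 (spelled with byte escapes below), and the 118 comes from htmlentities() treating the input as ISO-8859-1, which was the default before PHP 5.4. With an explicit 'UTF-8' charset the entity form is shorter, though still more than twice the raw size.

```php
<?php
// 'Iñtërnâtiônàlizætiøn' as UTF-8 bytes.
$s = "I\xC3\xB1t\xC3\xABrn\xC3\xA2ti\xC3\xB4n\xC3\xA0liz\xC3\xA6ti\xC3\xB8n";

$chars      = mb_strlen($s, 'UTF-8');                             // 20
$rawBytes   = strlen($s);                                         // 27
$latin1Ents = strlen(htmlentities($s, ENT_QUOTES, 'ISO-8859-1')); // 118
$utf8Ents   = strlen(htmlentities($s, ENT_QUOTES, 'UTF-8'));      // 64

printf("%d chars, %d bytes raw, %d / %d bytes as entities\n",
    $chars, $rawBytes, $latin1Ents, $utf8Ents);
```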

