1

Sooo ... I learned during my schema migration that our webpages use UTF-8 encoding but our MySQL database (all InnoDB tables) uses latin1. After finally running some utf8_decode() php magic, I was finally able to successfully get all posts to display correctly again.

However, now that I realize there's an issue, I want to do something about it moving forward. I'm not so sure it's a good idea to continue to be throwing UTF-8 encoded characters into a latin1 column in the database.

While we are an English-speaking site, a lot of users will sometimes use non-English characters within their posts/usernames, not to mention we have a lot of curly quotes that were copy/pasted from Microsoft Word.

We use the CodeIgniter PHP framework, and one of its functions is called ascii_to_entities(). I could run all user-submitted data through this function, and essentially it would be converted to HTML entities (i.e. © is the copyright symbol) where it can be stored nicely in both UTF-8 and latin1 databases and also display nicely on webpages. However, is this really something that I want to do? Is this the better alternative than converting MySQL to UTF-8?

If I were to go ahead and convert MySQL to UTF-8, would it correctly fix all of the UTF-8 characters currently stored in the latin1 database? (They look like gibberish when browsing through phpMyAdmin, for example, although they render just fine in the UTF-8 website.)

Being the SEO that I am, I'm also concerned about the search engine repercussions of my decision. Suppose a staff writer writes an article which makes heavy use of MS Word's "smart quotes". When I View the HTML source of that webpage, either I'm going to see curly quotes or I'm going to see a whole bunch of HTML entities. Neither are standard English characters. Which does googlebot prefer? ;)

Trying to make an educated decision about which is the best route moving forward.

Edited by Dani

2
Contributors
1
Reply
2
Views
5 Years
Discussion Span
Last Post by smantscheff
1

You will definitively have less problems if all your stuff - database, scripts, html templates etc. - are in utf-8. If this is a hassle with phpMyAdmin, use a fine database editor like Navicat or HeidiSQL.
If your site has the proper "encoding" attributes in header and meta taga, then the search engines will correctly interpret the typographical quotes as such and not index them.

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.