Accessing Unicode Data From a MySQL Database With PHP

Thread Solved

Join Date: Sep 2009
Posts: 22
Reputation: Alba Ra is an unknown quantity at this point 
Solved Threads: 0
Alba Ra Alba Ra is offline Offline
Newbie Poster

Accessing Unicode Data From a MySQL Database With PHP

 
0
  #1
20 Days Ago
Hello,

I am a training developer in .NET and web 2.0 technologies and this one is an exercise from one of our first lessons in PHP: accessing a MySQL database from a PHP script.

The problem I've encountered is that while database is thought to be in Unicode (i.e. the values contain accented characters, as this exercise is in French), the PHP script result in garbage characters (actually just one, a enlongated rectangle).

The exercise consists of a PHP script and a MySQL database with two tables where the script queries all entry data from one column in just one table (so no joins necessary).

The MySQL table is called "Restaurant" - well, actually this side isn't that much important yet. (Which means I'll come back to that later.)

The PHP script is
  1. <?php
  2. mysql_connect("localhost","root","");
  3. mysql_select_db("Restaurant");
  4. $myQuery = "SELECT LibPlat FROM Plats";
  5. $rs = mysql_query($myQuery);
  6. while ($row = mysql_fetch_array($rs)) {
  7. echo ($row['LibPlats']."<br />\n");
  8. }
  9. ?>

My setup consists of Eclipse with PDT and Zend Debugger (latest version, i.e. 3.5.1 and just updated). I use Zend Server CE with its MySQL install (phpMyAdmin).

Basically the script does just what I've been asking it to do, except that the French accents (like in thé or café) don't appear (there's this rectangle instead).

Shawn Olson (or rather Andrew Penry on Olson's website) says that storing data in UTF-8 is not enough on MySQL: "[y]ou must also tell MySQL that the data you are passing into it is UTF-8 otherwise it will assume it is in its default encoding."

His code looks like this:
  1. <?php
  2. $DB = new mysqli('localhost', 'user', 'root', 'dbname');
  3. $DB->query("SET NAMES 'utf8'");
  4. if (!empty($_POST['ta'])) {
  5. $DB->query("UPDATE document SET unicodeText='{$_POST['ta']}' WHERE ID=1");
  6. }
  7. $result = $DB->query("SELECT unicodeText FROM document WHERE ID=1");
  8. $return = $result->fetch_object();
  9. $result->close();
  10. ?>

I just have a little bit of a problem decyphering his code. I'm pretty sure that I can ignore his mysqli line. And $DB->query(""); looks pretty much the same way like my two-liner $myQuery = ""; mysql_query($myQuery); but then I run into some code I can't my hands nor tails of.

Does anyone know how to adapt Andrew's code snippet for my exercise? Or does anyone have a better idea of how to achieve that goal, and, as Andrew wrote, configuring the server may not always be an option, yet he hints that his code should eventually do the same thing as if I would set the server up to use unicode as the default character encoding.

Thank you!

Source:
Unicode Data with PHP 5 and MySQL 4.1
Alba Ra
C# (coming from C and C++)
PHP/XHTML/CSS
plans: Java, VB and Python
Reply With Quote Quick reply to this message  
Join Date: May 2007
Posts: 431
Reputation: Atli is on a distinguished road 
Solved Threads: 55
Atli's Avatar
Atli Atli is offline Offline
Posting Pro in Training
 
0
  #2
20 Days Ago
Hey.

Originally Posted by Alba Ra View Post
Basically the script does just what I've been asking it to do, except that the French accents (like in thé or café) don't appear (there's this rectangle instead).
Do you set the charset for the resulting page?
By default, most browsers use the ISO charset, so if you plan on printing Unicode characters into it, you must specify that in the request.
Try adding this to the top of the page and see what happens.
  1. header('content-type: text/html; charset=utf8');
You can also swap charsets in your browser, just to check it out.
If you use Firefox you do: View->Character Encoding->[new encoding]
If you use Internet Explorer... get Firefox! IE is the bane of web development
Please do not ask for help in a PM. Use the forums.
And use [code] tags!
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 22
Reputation: Alba Ra is an unknown quantity at this point 
Solved Threads: 0
Alba Ra Alba Ra is offline Offline
Newbie Poster
 
0
  #3
20 Days Ago
I am sorry that I didn't mention it.

The PHP is embedded in an HTML page. Doctype and everything is declared and the resulting HTML checks out as it should as XHTML 1.0 Strict on the W3C validator. Already when the PHP parser writes the HTML code the accented characters come out garbled, so this isn't the reason. Thanks anyway! Any help is appreciated (and if only to rule out some possible error sources)!
Alba Ra
C# (coming from C and C++)
PHP/XHTML/CSS
plans: Java, VB and Python
Reply With Quote Quick reply to this message  
Join Date: May 2007
Posts: 431
Reputation: Atli is on a distinguished road 
Solved Threads: 55
Atli's Avatar
Atli Atli is offline Offline
Posting Pro in Training
 
0
  #4
20 Days Ago
Originally Posted by Alba Ra View Post
Shawn Olson (or rather Andrew Penry on Olson's website) says that storing data in UTF-8 is not enough on MySQL: "[y]ou must also tell MySQL that the data you are passing into it is UTF-8 otherwise it will assume it is in its default encoding."
Well, I don't know what they are on about there, but I have never come across a situation where I explicitly needed to specify a charset when sending a query, and I use UTF-8 exclusively in my PHP apps.
Not beyond specifying a charset in the CREATE TABLE commands, anyways.

You might want to check this manual entry out tho: 9.1.4. Connection Character Sets and Collations

Originally Posted by Alba Ra View Post
I just have a little bit of a problem decyphering his code. I'm pretty sure that I can ignore his mysqli line. And $DB->query(""); looks pretty much the same way like my two-liner $myQuery = ""; mysql_query($myQuery); but then I run into some code I can't my hands nor tails of.
See if this makes more sense:
  1. <?php
  2. // Open a new MySQL connection
  3. $DB = new mysqli('localhost', 'user', 'root', 'dbname');
  4.  
  5. // Execute a query that sets the charset to be used in the following queries.
  6. $DB->query("SET NAMES 'utf8'");
  7.  
  8. // Checks if a POST value named 'ta' was passed or not.
  9. if (!empty($_POST['ta'])) {
  10. // Update a row in the 'document' table using the passed data.
  11. $DB->query("UPDATE document SET unicodeText='{$_POST['ta']}' WHERE ID=1");
  12. }
  13. // Fetch a row from the 'document', the same row we just updated.
  14. $result = $DB->query("SELECT unicodeText FROM document WHERE ID=1");
  15.  
  16. // Assign the result from the previous query to the $return variable.
  17. // I assume it will be used later in the code. Otherwise this line is pointless.
  18. $return = $result->fetch_object();
  19.  
  20. // Close the result set from the previous query.
  21. $result->close();
  22. ?>
Please do not ask for help in a PM. Use the forums.
And use [code] tags!
Reply With Quote Quick reply to this message  
Join Date: May 2007
Posts: 431
Reputation: Atli is on a distinguished road 
Solved Threads: 55
Atli's Avatar
Atli Atli is offline Offline
Posting Pro in Training
 
0
  #5
20 Days Ago
Originally Posted by Alba Ra View Post
I am sorry that I didn't mention it.

The PHP is embedded in an HTML page. Doctype and everything is declared and the resulting HTML checks out as it should as XHTML 1.0 Strict on the W3C validator. Already when the PHP parser writes the HTML code the accented characters come out garbled, so this isn't the reason. Thanks anyway! Any help is appreciated (and if only to rule out some possible error sources)!
Ahh ok.

Is the MySQL table created to use UTF-8?
If you are not sure, try using the SHOW CREATE TABLE command. It should provide a DEFAULT CHARSET clause or two.

And how do you insert the data into MySQL?
You mention using phpMyAdmin. I've had problems inserting Unicode data via phpMyAdmin in the past. It seemed to only want to send ISO chars, even into a table clearly specified as UTF8.

Try applying the utf8_encode function to your data before printing it:
  1. echo utf8_encode($row['LibPlats']) ."<br />\n";
Last edited by Atli; 20 Days Ago at 8:32 pm. Reason: Spell-checker phail.
Please do not ask for help in a PM. Use the forums.
And use [code] tags!
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 22
Reputation: Alba Ra is an unknown quantity at this point 
Solved Threads: 0
Alba Ra Alba Ra is offline Offline
Newbie Poster
 
0
  #6
19 Days Ago
Thank you - both ways work.

Using a query to set the charset for a query:
  1. <?php
  2. mysql_connect("localhost", "root", "");
  3. mysql_select_db("restaurant");
  4. $myQuery = "SET NAMES 'utf8'";
  5. mysql_query($myQuery);
  6.  
  7. if (!empty($_POST['ta'])) {
  8. mysql_query("UPDATE Plats SET LibPlats='{$_POST['ta']}'");
  9. }
  10. $myQuery = "SELECT LibPlats, PrixPlats FROM Plats";
  11. $rs = mysql_query($myQuery);
  12. while ($row = mysql_fetch_array($rs)) {
  13. print ($row['LibPlats']."<br />\n&euro;".$row['PrixPlats']."<br />\n");
  14. }
  15. ?>
And setting the charset for print/echo:
  1. <?php
  2. mysql_connect("localhost", "root", "");
  3. mysql_select_db("restaurant");
  4. $myQuery = "SELECT LibPlats, PrixPlats FROM Plats";
  5. $rs = mysql_query($myQuery);
  6. while ($row = mysql_fetch_array($rs)) {
  7. print utf8_encode($row['LibPlats']."<br />\n&euro;".$row['PrixPlats']."<br />\n");
  8. }
  9. ?>
What I don't understand yet is the if-condition, especially with the $_POST array. Why 'ta'? (Of course, this lack of understanding doesn't prevent me from using the code!)

And: setting the charset for print seems the quicker way, so what may be the drawback of its use over setting the charset for the query?
Alba Ra
C# (coming from C and C++)
PHP/XHTML/CSS
plans: Java, VB and Python
Reply With Quote Quick reply to this message  
Join Date: May 2007
Posts: 431
Reputation: Atli is on a distinguished road 
Solved Threads: 55
Atli's Avatar
Atli Atli is offline Offline
Posting Pro in Training
 
0
  #7
19 Days Ago
Originally Posted by Alba Ra View Post
What I don't understand yet is the if-condition, especially with the $_POST array. Why 'ta'?
The $_POST array represents the data sent via a HTML <form>, using the POST method. So the data in $_POST['ta'] is just the data in whatever form element was named "ta".

The if statement checks to see if there is any data in that element, and if there is, updates the database with that value.

Originally Posted by Alba Ra View Post
And: setting the charset for print seems the quicker way, so what may be the drawback of its use over setting the charset for the query?
I'm not sure, to be honest. I've never seen a setup that required the charset to be explicitly set like that.
The obvious downside is that by using the utf_encode function, you need to encode each field individually. By setting it for the queries you can just set it and forget it.

I still suspect there is something wrong with the charset configuration in your MySQL server. If the tables are correctly set up to use UTF-8, PHP should be getting the data UTF-8 encoded.
I recommend you check your CREATE TABLE statements. See what charset you are actually using on them.
Please do not ask for help in a PM. Use the forums.
And use [code] tags!
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 22
Reputation: Alba Ra is an unknown quantity at this point 
Solved Threads: 0
Alba Ra Alba Ra is offline Offline
Newbie Poster
 
0
  #8
19 Days Ago
Okay, I verified, recreated, exported, re-recreated my tables so everything is in UTF-8 character set:
  1. -- phpMyAdmin SQL Dump
  2. -- version 3.2.0
  3. -- http://www.phpmyadmin.net
  4. --
  5. -- Host: localhost
  6. -- Generation Time: Nov 05, 2009 at 09:01 PM
  7. -- Server version: 5.1.35
  8. -- PHP Version: 5.3.0
  9.  
  10. SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
  11.  
  12. --
  13. -- Database: `restaurant`
  14. --
  15. CREATE DATABASE `restaurant` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
  16. USE `restaurant`;
  17.  
  18. -- --------------------------------------------------------
  19.  
  20. --
  21. -- Table structure for table `categorie`
  22. --
  23.  
  24. CREATE TABLE IF NOT EXISTS `categorie` (
  25. `IdCategorie` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  26. `LibCategorie` varchar(255) NOT NULL,
  27. `OrdreCategorie` tinyint(3) unsigned DEFAULT NULL,
  28. PRIMARY KEY (`IdCategorie`)
  29. ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=5;
  30.  
  31. --
  32. -- Dumping data for table `categorie`
  33. --
  34.  
  35. INSERT INTO `categorie` (`IdCategorie`, `LibCategorie`, `OrdreCategorie`) VALUES
  36. (1, 'Entrées', 1),
  37. (2, 'Plats', 2),
  38. (3, 'Desserts', 3),
  39. (4, 'Boissons', NULL);
  40.  
  41. -- --------------------------------------------------------
  42.  
  43. --
  44. -- Table structure for table `plats`
  45. --
  46.  
  47. CREATE TABLE IF NOT EXISTS `plats` (
  48. `IdPlats` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  49. `LibPlats` varchar(255) NOT NULL,
  50. `PrixPlats` decimal(5,2) DEFAULT NULL,
  51. `IdCategorie` smallint(6) NOT NULL,
  52. PRIMARY KEY (`IdPlats`)
  53. ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15 ;
  54.  
  55. --
  56. -- Dumping data for table `plats`
  57. --
  58.  
  59. INSERT INTO `plats` (`IdPlats`, `LibPlats`, `PrixPlats`, `IdCategorie`) VALUES
  60. (1, 'soup chinois', '2.99', 1),
  61. (2, 'soup russe', '2.49', 1),
  62. (3, 'bienenstich', '2.79', 3),
  63. (4, 'steak haché', '6.99', 2),
  64. (5, 'gratin dauphin', '6.39', 2),
  65. (6, 'mohnpielen', '2.75', 3),
  66. (7, 'canard aigre-doux', '7.29', 2),
  67. (8, 'coca zéro', '1.59', 4),
  68. (9, 'coca light', '1.59', 4),
  69. (10, 'coca cola', '1.79', 4),
  70. (11, 'perri-air', '18.99', 4),
  71. (12, 'thé', '2.19', 4),
  72. (13, 'café', '3.39', 4),
  73. (14, 'eau plat', '1.29', 4);
Yet the PHP code still produces the rectangles instead of the correct character set.

So I think that in the case of unicode (UTF-8) characters, the queries have to be declared UTF-8. (Yeah, it makes sense to declare the charset encoding as early and globally as possible.)

So for the time being all that's need is:
  1. <?php
  2. mysql_connect("localhost", "root", "");
  3. mysql_select_db("restaurant");
  4. $myQuery = "SET NAMES 'utf8'";
  5. mysql_query($myQuery);
  6.  
  7. $myQuery = "SELECT LibPlats, PrixPlats FROM Plats";
  8. $rs = mysql_query($myQuery);
  9. while ($row = mysql_fetch_array($rs)) {
  10. print ($row['LibPlats']."<br />\n&euro;".$row['PrixPlats']."<br />\n");
  11. }
  12. ?>

Should the if-condition testing $_POST's contents be necessary at some later point, I'll ask again.
Last edited by Alba Ra; 19 Days Ago at 9:12 am.
Alba Ra
C# (coming from C and C++)
PHP/XHTML/CSS
plans: Java, VB and Python
Reply With Quote Quick reply to this message  
Reply

Tags
mysql, php, unicode

This thread has been marked solved.
Perhaps start a new thread instead?
Message:


Thread Tools Search this Thread



About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC