SELECT getdatabaseencoding(), current_setting('client_encoding'), current_setting('lc_ctype'), current_setting('lc_collate'), version();

returns

UTF8 UNICODE C C PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit

select lower('ČĘĖĮ')

returns ČĘĖĮ

Actually what I want is to seach surnames with ILIKE which is not working for those chars. WHy this can be?

Member Avatar for LastMitch

UTF8 UNICODE C C PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu, compiled by gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit

May I ask what is the default encoding on PostgreSQL? I think it is not recognizing it.

Try this:

psql my_database -c 'SHOW SERVER_ENCODING'

Read this (it has a function which you can used to test it out before you can test out your query):

http://www.postgresql.org/docs/8.1/static/multibyte.html

I executed

SHOW SERVER_ENCODING

in navicat and it shows UTF8. It probably deos not matter if I use navicat and not psql, right?

Member Avatar for LastMitch

in navicat and it shows UTF8. It probably deos not matter if I use navicat and not psql, right?

I never heard or used navicat until you mention it.

This is more than just an query issue I think it has something to do with configuration.

Read these short instructions (I assume you have read this if not you can take a look at it, it's very short):

http://wiki.navicat.com/wiki/index.php/How_to_display_special_characters_properly_in_MySQL%3F

http://wiki.navicat.com/wiki/index.php/Why_I_can%27t_change_the_character_set_and_collation_for_field,_table_and_database%3F

http://wiki.navicat.com/wiki/index.php/How_can_I_display_Unicode_in_Linux%3F

and also look at this tutorial (work around):

http://supergeekery.com/geekblog/comments/mamp-and-navicat-and-character-set-hell

and also look at this navicat manuel (go to the page 60 or look for the word conversion and click on it and read and follow the instructions):

http://www.navicat.com/manual/pdf_manual/en/navicat/mac_manual/PostgreSQL_Database_Object_Management.pdf

those links are just about set up on navicat. But think I need in general. I mean seeting up navicat will not make in php it to work well.

And it does not matter how it shows in navicat those chars, the problem is that they are not converted to lowercase

Member Avatar for LastMitch

the problem is that they are not converted to lowercase

Try these query:

SELECT LOWER('word') FROM SomeTable WHERE word ILIKE 'ČĘĖĮ%';

SELECT * FROM SomeTable where LOWER('word') ILIKE 'ČĘĖĮ%';

To me this word ČĘĖĮ, those letters have accents so I think it's not changing the word in lower case?

Have you try just using plain word without an accent. For example Work:

SELECT id FROM SomeTable WHERE LOWER(word)=LOWER('WORK')

SELECT * FROM sometable WHERE LOWER(word) ILIKE (LOWER('WORK') || '%');

If the query works from above then try this:

SELECT * FROM SomeTable WHERE LOWER(word)=LOWER('ČĘĖĮ')
SELECT LOWER('word') FROM SomeTable WHERE word ILIKE 'ČĘĖĮ%';

you meant

lower(word)

probably.

-----------

SELECT LOWER(surname) FROM managers WHERE surname ILIKE 'ČĘĖĮ%';



returns ČĘĖĮ

--------------

this

SELECT * FROM managers where LOWER(surname) ILIKE 'ČĘĖĮ%';

returns row, but its ok. WHen we change to 'ČĘĖį' (one letter is lower) then it does not return row.

----------

SELECT id FROM managers WHERE LOWER(surname)=LOWER('WORK')

this returns a row, when real surname is word 'WORK'

--------

SELECT * FROM sometable WHERE LOWER(word) ILIKE (LOWER('WORK') || '%');

this upper code row also returns.

--------

If the query works from above then try this:

SELECT * FROM SomeTable WHERE LOWER(word)=LOWER('ČĘĖĮ')

works. Also as in earlier - it works because fucntion lower returns uppercase those uccented letters and so they are equal. If I change one letter to lowerse

LOWER('ČĘĖį')
  • then it does not return row.
Member Avatar for LastMitch

returns row, but its ok. WHen we change to 'ČĘĖį' (one letter is lower) then it does not return row.

SELECT * FROM managers where LOWER(surname) ILIKE 'ČĘĖĮ%';

I think that's a good thing.

What language is this word ČĘĖĮ?

Since it did lower one letter then I think it has something to do with accents preventing it to lower case and character sets UTF8 encoding on your navicat / PostgreSQL is not readable (compatible) with that language.

I think UTF8 encoding might not be right for the language you are using.

Have you thought about another encoding? If you did what was your second encoding?

What language is this word ČĘĖĮ?

Its just random letters. Those letters are from lithuanian language.

it would be very weird that UTF-8 would be bad. Every site which uses UTF-8 works well with lituanian chars. I did not try another encoding.

Member Avatar for LastMitch

Is it really still a bug or what the hell. On mysql tried - works like charm.

That link is pretty old but useful.

There's nothing wrong with the UTF8 encoding so disregard what I mention.

You're right it would be wierd not to have that.

Read this (it's more updated and more detail about the issue you are having):

http://postgresql.1045698.n5.nabble.com/Concerning-about-Unicode-aware-string-handling-td5709258.html

I read and find something that you can used as a workaround:

Convert Lithuanian alphabet to ASCII then I think you can used lower() function.

The query in the link used Russian alphabet but you can used it for Lithuanian alphabet:

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#from_Russian_alphabet_.28Cyrillic.29_to_ASCII

Yeah, converting to ASCII might help, but then there is no convertion back. But for search purpose it might do the job -

convert to ascii, then what user enters in the for also convert to ascii, both lower case, and serach will find.

Then ILIKE will work as well so I will not even need function lower().

here is what I made:

CREATE OR REPLACE FUNCTION transliterate(my_text VARCHAR) RETURNS varchar AS $$
DECLARE 
  text_out VARCHAR DEFAULT '';
BEGIN
   text_out := my_text;
   text_out := translate(text_out, 'àâäåáăąãā', 'aaaaaaaaa');
   text_out := replace(text_out, 'æ', 'ae');
   text_out := translate(text_out, 'çċćčĉ', 'ccccc');
   text_out := translate(text_out, 'éèėëêēĕę', 'eeeeeee');
   text_out := translate(text_out, 'îïìíīį', 'iiiii');
   text_out := translate(text_out, 'ñ', 'n');
   text_out := translate(text_out, 'ôöøõō', 'ooooo');
   text_out := replace(text_out, 'œ', 'oe');
   text_out := translate(text_out, 'ùúüûūų', 'uuuuuu');
   text_out := translate(text_out, 'ýÿỳ', 'yyy');
   text_out := replace(text_out, 'ß', 'ss');
     text_out := translate(text_out, 'š', 's');
   text_out := translate(text_out, 'ž', 'z');
   text_out := translate(text_out, 'ÀÂÄÅÁĂĄÃĀ', 'AAAAAAAAA');
   text_out := replace(text_out, 'Æ', 'AE');
   text_out := translate(text_out, 'ÇĊĆČĈ', 'CCCCC');
   text_out := translate(text_out, 'ÉÈĖËÊĒĔĘ', 'EEEEEEE');
   text_out := translate(text_out, 'ÎÏÌÍĪĮ', 'IIIII');
   text_out := translate(text_out, 'Ñ', 'N');
   text_out := translate(text_out, 'ÔÖØÕŌ', 'OOOOO');
   text_out := replace(text_out, 'Œ', 'OE');
   text_out := translate(text_out, 'ÙÚÜÛŪŲ', 'UUUUUU');
   text_out := translate(text_out, 'ÝŸỲ', 'YYY');
     text_out := translate(text_out, 'Š', 'S');
   text_out := translate(text_out, 'Ž', 'Z');

   RETURN text_out;
END;
$$ LANGUAGE plpgsql;

this function is from some page, I just added few more chars of lithuanian language. It will be ok. This will not work for example with russian chars, because there are no those yet, but if we will need we can update this function.

Still not perfect solution when it needs manual letters update, but at least better will be. Now I use surname search like this:

$surname_condition = "AND transliterate(surname) ILIKE '%" . $this->db->escape_like_str($surname_like) . "%'";

thanks for help :)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.