Hello! I'm trying to create a dictionary that can handle Latin characters like (Å), µ and so on. Apparently I need some sort of encoding to Unicode. Is there a way to handle this problem? See the code below.

wordDic = {
        '_chemical_formula_moiety':          'chemical formula                    ',
        '_chemical_formula_weight':          'Fw                                     ',
        '_symmetry_space_group_name_H-M':    'space group                            ',
        '_cell_length_a':                    'a (Å)"                                ',
        '_cell_length_b':                    'b (Å)"                                 ',
        '_cell_length_c':                    'c (Å)"                                ',
        '_cell_angle_alpha':                 'α (deg)                               ',
        '_cell_angle_beta':                  'β (deg)                               ',
        '_cell_angle_gamma':                 'γ (deg)                               ',
        '_cell_volume':                      'V (Å3)                          ',
        '_cell_formula_units_Z':             'Z                                         ',
        '_cell_measurement_temperature':     'T (K)                                       ',
        '_exptl_crystal_density_diffrn':     'calcd (g cm-3)                           ',
        '_exptl_absorpt_coefficient_mu':     ' µ (mm-1)                                     ',
        '_diffrn_radiation_wavelength':      'wavelength (Å)                         ',
        '_diffrn_reflns_theta_min':          'θ range (deg)                        ',
        '_refine_ls_R_factor_all':           'R1 [all data]                          ',
        '_refine_ls_R_factor_gt':            'R1a [I > 2s(I)]                        ',
        '_refine_ls_wR_factor_ref':          'wR2 [all data]                         ',
        '_refine_ls_wR_factor_gt':           'wR2b [I > 2s(I)]                         ',
        '_refine_ls_goodness_of_fit_ref':    'GOF                                        ',
        'ship': 'slip'}
wordDic = unicode(wordDic,'latin-1')

It seems that placing # coding: latin-1 at the beginning of my code solves the problem with Latin characters, but I still have a problem with Greek ones. Is there a universal way to handle both at the same time? I also need to be able to write these Greek characters to a text file!

Any help??

Edited 6 Years Ago by deonis: n/a

You can generate any Unicode code point (in any language) by looking up the character's code in the Unicode charts and keying the number into the string. For example, the code for Greek upper-case Pi is 03A0 (the values are in hexadecimal) and lower-case pi is 03C0.
So you can write:

# this is for Python 2.6
x = u'The lower case of \u03a0 is \u03c0'
print x
# in Python 3.1 all strings are Unicode, so:
x = 'The lower case of \u03a0 is \u03c0'
print(x)

You can find the code values at
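
If you know a character's official name but not its number (or the other way round), the standard unicodedata module can do the lookup for you. This is only a minimal sketch; the helper name code_point is my own invention, not part of any library:

```python
import unicodedata

def code_point(ch):
    """Return the code point of a single character as a 4-digit hex string."""
    return '%04X' % ord(ch)

# Name -> character
pi_upper = unicodedata.lookup('GREEK CAPITAL LETTER PI')
pi_lower = unicodedata.lookup('GREEK SMALL LETTER PI')

# Character -> official Unicode name
angstrom_name = unicodedata.name(u'\u00c5')  # the A-with-ring used for angstrom
micro_name = unicodedata.name(u'\u00b5')     # the micro sign

print(code_point(pi_upper), code_point(pi_lower))
print(angstrom_name)
print(micro_name)
```

The hex string returned by code_point is exactly what goes after \u in a string literal.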

Now, the catch is that you must run your code in an environment that can actually display the characters you select. The Pi example above works fine in the interactive window of the pywin32 editor, which runs in a Windows GUI window. But if I try the same thing interactively in a 'DOS' command window (whose code page, cp437, has no upper-case Pi), I get:

>>> print(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python31\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03a0' in position 18: character maps to <undefined>
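
If you just want to see which characters a given console can show, you can encode the string yourself and let Python substitute a placeholder for anything the code page lacks. A rough sketch, assuming the console uses cp437 as in the traceback above (the variable names are only for illustration):

```python
x = u'The lower case of \u03a0 is \u03c0'

# 'replace' substitutes '?' for every character that cp437 cannot
# represent, instead of raising UnicodeEncodeError.
console_bytes = x.encode('cp437', 'replace')

# Decode again to see which characters would survive on such a console.
survivors = console_bytes.decode('cp437')
```

cp437 happens to contain the lower-case pi but not the upper-case one, so only the capital is replaced by a question mark.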

Good Luck.

Have you tried # -*- coding: utf-8 -*- ? According to this Unicode HOWTO page the UTF-8 encoding can handle any unicode code point. As vernondcole says, when testing you can't rely on IDLE or DOS to display the characters correctly, no matter what encoding you use. To test, write the output to a text file and open it with a text editor that can handle utf-8 encoded files. I use ActiveState's Komodo Edit.
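
To make that concrete, here is a small sketch that writes a few of the labels from the first post to a UTF-8 file. It assumes Python 2.6 or later; the file name labels.txt, the temp-directory location and the shortened dictionary are all just for illustration. io.open behaves the same way in Python 2 and 3 and does the encoding for you:

```python
import io
import os
import tempfile

# Unicode escapes keep the source file pure ASCII, so the encoding
# used by the code editor no longer matters.
word_dic = {
    u'_cell_length_a': u'a (\u00c5)',                    # A with ring (angstrom)
    u'_cell_angle_alpha': u'\u03b1 (deg)',               # Greek small alpha
    u'_exptl_absorpt_coefficient_mu': u'\u00b5 (mm-1)',  # micro sign
}

# Hypothetical output path; the temp directory keeps the example self-contained.
out_path = os.path.join(tempfile.gettempdir(), 'labels.txt')

# io.open encodes each unicode string as UTF-8 on the way out.
with io.open(out_path, 'w', encoding='utf-8') as out:
    for key in sorted(word_dic):
        out.write(u'%s\t%s\n' % (key, word_dic[key]))

# Read the file back to check that nothing was lost.
with io.open(out_path, 'r', encoding='utf-8') as src:
    round_trip = src.read()
```

Open labels.txt in an editor that understands UTF-8 to see the Latin and Greek characters side by side.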

Thanks, guys, I very much appreciate your help! In the end my problem was with the text editor I use to write my code. Apparently DrPython does not support Greek characters and generates an error when running:

x = u'The lower case of \u03a0 is \u03c0'
print x

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03a0' in position 18: ordinal not in range(128)
At the same time, running the same code in IDLE displays the following output:
"The lower case of Π is π"
Thanks a lot once again!
