You can use a one-to-one dictionary
"ą" --> "a"
or key it by the ord() (decimal code point) value if there are codec conversion problems. A decimal-value conversion dictionary would be something like:
conv_dict = {}
# use unicode literals: ord() on a multi-byte UTF-8 byte string raises TypeError
conv_dict[ord(u"ą")] = "a"
#
# and to convert
test_ch = u"ą"
if ord(test_ch) in conv_dict:
    test_ch = conv_dict[ord(test_ch)]
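An ord-keyed dictionary like this is also the shape that the translate() method of unicode strings expects, so a whole string can be converted in one call. The following is only a minimal sketch: the name strip_polish and the choice of letters are assumptions, not something from the posts above, and the text must already be a unicode string (it is written to run on Python 2.6+ and 3.3+).

```python
# -*- coding: utf-8 -*-
# Ord-keyed conversion table, extended to the usual Polish letters.
conv_dict = {
    ord(u"ą"): u"a", ord(u"ć"): u"c", ord(u"ę"): u"e",
    ord(u"ł"): u"l", ord(u"ń"): u"n", ord(u"ó"): u"o",
    ord(u"ś"): u"s", ord(u"ź"): u"z", ord(u"ż"): u"z",
    ord(u"Ą"): u"A", ord(u"Ć"): u"C", ord(u"Ę"): u"E",
    ord(u"Ł"): u"L", ord(u"Ń"): u"N", ord(u"Ó"): u"O",
    ord(u"Ś"): u"S", ord(u"Ź"): u"Z", ord(u"Ż"): u"Z",
}


def strip_polish(text):
    """Hypothetical helper: replace Polish letters in a unicode string."""
    # translate() looks up each character's ordinal in the dictionary
    # and leaves characters without an entry unchanged.
    return text.translate(conv_dict)


print(strip_polish(u"Zażółć gęślą jaźń"))
```

Characters that have no entry in the dictionary pass through untouched, so plain ASCII text is not affected.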
woooee
Nearly a Posting Maven
2,454 posts since Dec 2006
Reputation Points: 777
Solved Threads: 714
Gribouillis
Posting Maven
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
What is your platform: Windows, Linux, or other? Also, what is your version of Python, 2.6?
A good thing to do is to go to http://pypi.python.org/pypi/setuptools, then download and install setuptools by following the installation instructions. Then, in a console, simply type
easy_install unidecode
This works for all the modules in the Python Package Index (PyPI, http://pypi.python.org). Then you can use the module.
Also note that the module name is not unicode but unidecode!
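Once the module is installed, using it is a single function call, unidecode(), on a unicode string. The sketch below wraps that call in a hypothetical helper named to_ascii; the fallback branch is an assumption added only so the snippet still runs when unidecode is missing, and it is weaker than the real module (it drops letters such as "ł" that have no decomposed form).

```python
# -*- coding: utf-8 -*-
import unicodedata

try:
    # Third-party module, installed with: easy_install unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(text):
    """Hypothetical helper: transliterate a unicode string to ASCII."""
    if unidecode is not None:
        return unidecode(text)
    # Assumed fallback: split letters from their accents, then drop
    # everything that is not ASCII. Letters like "ł" are lost this way.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii(u"ąęó"))
```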
Gribouillis
Try to run this
import sys
def main():
    text = raw_input("Enter some words with Polish letters:")
    print repr(text)
    print sys.version_info
    print sys.platform
    print sys.executable
main()
and post the output here between code tags.
Could you download and run unidecode?
Gribouillis
That can't be right, it's very easy to install. Try this:
>>> import setuptools
>>>
Does this work?
If it works:
1) open a Windows console
2) type: easy_install unidecode
Look at the printed messages. If that succeeds, restart a Python shell and try
>>> import unidecode
If import setuptools doesn't work, we will have to install setuptools first.
Gribouillis
I tested the procedure on Windows XP. I was able to install unidecode by running this in a cmd.exe terminal:
C:\Python26\python.exe C:\Python26\Lib\site-packages\easy_install.py unidecode
Gribouillis
Analysis of the problem:
# -*- coding: utf-8 -*-
import string
x = []
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ' ## UTF-8 byte string: each Polish letter here takes 2 bytes
##a='boeslzzcnAOESLZZCN' # ASCII-only variant (maps b to a) to test the rest of the code
b='aoeslzzcnAOESLZZCN'
print a,len(a),b,len(b) ## the lengths differ (36 vs 18): here is why
t = string.maketrans(a,b) ## fails: maketrans needs arguments of equal length
text = raw_input("> ").translate(t)
print text
If you can use the premade module mentioned above, fine. Otherwise, loop over the two conversion strings and build a dictionary that stores the mapping string1 -> string2. Store the strings as real unicode objects, so that each letter is one element no matter how many bytes it takes in UTF-8; that simplifies the solution.
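The length mismatch can be checked directly by comparing the unicode string with its UTF-8 encoding. This is only an illustrative sketch (the variable name encoded is an assumption); it runs on Python 2.6+ and 3.3+.

```python
# -*- coding: utf-8 -*-
a = u"ąóęśłżźćńĄÓĘŚŁŻŹĆŃ"   # unicode: one element per letter
b = u"aoeslzzcnAOESLZZCN"   # ASCII replacements, same order

# Encoding to UTF-8 gives the byte string that the original script saw.
encoded = a.encode("utf-8")

print("%d letters, %d UTF-8 bytes, %d replacements"
      % (len(a), len(encoded), len(b)))
```

Every letter in this particular string needs two bytes in UTF-8, which is why the byte string is twice as long as the replacement string.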
Greetings, Tony Veijalainen
pyTony
pyMod
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
Here is some dict lookup code:
# -*- coding: utf-8 -*-
import string
x = []
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ' ## utf8 has variable number of bytes
b='aoeslzzcnAOESLZZCN'
au=unicode(a,'utf8') ## decode so that each letter is one element
unpolish=dict()
for i in range(len(au)):
    unpolish[au[i]]=b[i]
print unpolish
def unp(a):
    ## a is a UTF-8 byte string; decode it before the lookup
    a=unicode(a,'utf-8')
    t=''
    for i in a:
        if i in unpolish:
            t+=unpolish[i]
        else:
            t+=i
    return t
print
print 'Unpolishing the conversion string'
print a,'unpolished is',unp(a)
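The same lookup can be written more compactly with zip() and dict.get(). This is a sketch of one possible variant, not the code from the post: the names SRC and DST are assumptions, and unlike the version above it expects text that is already unicode, so it does no decoding itself.

```python
# -*- coding: utf-8 -*-
SRC = u"ąóęśłżźćńĄÓĘŚŁŻŹĆŃ"
DST = u"aoeslzzcnAOESLZZCN"

# Pair the letters position by position: SRC[i] maps to DST[i].
unpolish = dict(zip(SRC, DST))


def unp(text):
    """Replace mapped letters in a unicode string; keep all others."""
    return u"".join(unpolish.get(ch, ch) for ch in text)


print(unp(u"Łódź"))
```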
pyTony
P.S.:
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ'.decode('utf8') ## now a unicode string, one element per letter
fixes the len, but maketrans still cannot be used: it only builds tables for single-byte characters, so translate would not work either.
Tony
pyTony
OK, last try: can you *run* this file and *post* its output in this thread?
# run_me.py
import os, sys, os.path
class Command(object):
    """Run a command and capture its output string, error string and exit status"""
    def __init__(self, command):
        self.command = command
    def run(self, shell=True):
        import subprocess as sp
        process = sp.Popen(self.command, shell = shell, stdout = sp.PIPE, stderr = sp.PIPE)
        self.pid = process.pid
        self.output, self.error = process.communicate()
        self.failed = process.returncode
        return self
    @property
    def returncode(self):
        return self.failed
def main():
    try:
        import easy_install
    except ImportError:
        print("Could not import easy_install")
        return
    # run easy_install with the same interpreter that runs this script
    com = Command("%s %s unidecode" % (sys.executable, easy_install.__file__)).run()
    if com.failed:
        print com.output
        print com.error
    else:
        try:
            import unidecode
        except ImportError:
            print("Could not import unidecode")
        else:
            print("Module unidecode is installed")
main()
Gribouillis