problems

Question

kur3k -3 Light Poster

14 Years Ago

Hello

My English is not good, so please try to understand me ;-)

I have two questions about Python ;)

1. I'm from Poland, where we have letters such as ą and ź alongside a and z. Can I remove the Polish letters?

Example:

input - jacek bąk

output - jacek bak

I could use Unicode, but how? Or must I create a tuple ('krotka'), like (1,2,3)?

2. How can I quickly create a list of lists?

Example:

[[1,2,3],[1,2,3]]

The user creates this list.

Thanks ;-)


jice 53 Posting Whiz in Training

14 Years Ago

Problem 1 (done with python 2.4)

# -*- coding: latin-1 -*-
# The first line is important and must be consistent with your file encoding (maybe utf8)
import string
t=string.maketrans('àéèëêôöîïù', 'aeeeeooiiu') 
print "lévitèrent à l'ïstrùmen".translate(t)

for python 3.1

# -*- coding: latin-1 -*-
t=str.maketrans('àéèëêôöîïù', 'aeeeeooiiu')
print ("lévitèrent à l'ïstrùmen".translate(t))

problem 2

mylist=[]
for i in range(10):
    mylist.append([])
    for j in range(1,9,2):
        mylist[i].append(j)
print mylist

# same result with list comprehension
print [[j for j in range(1,9,2)] for i in range(10)]
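The question says the user creates the list, so here is a minimal sketch of building the nested list from typed rows. It assumes Python 3 and a comma-separated input format; `parse_rows` is a hypothetical helper name, not something from the posts above.

```python
def parse_rows(lines):
    """Turn text rows such as '1,2,3' into a list of lists of ints."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:          # skip blank rows
            continue
        rows.append([int(item) for item in line.split(",")])
    return rows

# Rows as a user might type them, one per line.
typed = ["1,2,3", "1, 2, 3"]
print(parse_rows(typed))  # [[1, 2, 3], [1, 2, 3]]
```

In an interactive program the rows could come from repeated `input()` calls instead of the `typed` list.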

Gribouillis 1,391 Programming Explorer

14 Years Ago

I found a funny module on the web, called unidecode http://code.zemanta.com/tsolc/unidecode/ . Here is my result

>>> from unidecode import unidecode
>>> unidecode(u'ąóęśłżźćńĄÓĘŚŁŻŹĆ')
'aoeslzzcnAOESLZZC'

nice, isn't it ?

Edit: it's also in pypi: http://pypi.python.org/pypi/Unidecode
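If installing a third-party module is not an option, a similar effect can be sketched with the standard library alone. This is an alternative technique, not how unidecode works internally: it assumes Python 3, decomposes each letter with `unicodedata.normalize` and drops the combining marks. The letter ł has no Unicode decomposition, so it is mapped by hand; `strip_accents` and `EXTRA` are names made up for this example.

```python
import unicodedata

# Letters with no Unicode decomposition must be mapped by hand.
EXTRA = str.maketrans({"ł": "l", "Ł": "L"})

def strip_accents(text):
    """Drop combining marks after NFKD decomposition (standard library only)."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("jacek bąk"))  # jacek bak
```

Unlike unidecode, this only handles letters that decompose into a base letter plus marks, so other scripts and symbols pass through unchanged.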


TrustyTony 888 pyMod

14 Years Ago

Here is some dict lookup code:

# -*- coding: utf-8 -*-

import string

x = []
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ' ## utf8 has variable number of bytes
b='aoeslzzcnAOESLZZCN'

au=unicode(a,'utf8')

unpolish=dict()

for i in range(len(au)):
    unpolish[au[i]]=b[i]

print unpolish

def unp(a):
    a=unicode(a,'utf-8')
    t=''
    for i in a:
        if i in unpolish:
            t+=unpolish[i]
        else:
            t+=i
    return t

print
print 'Unpolishing the conversion string'
print a,'unpolished is',unp(a)

Gribouillis 1,391 Programming Explorer

14 Years Ago

Ok, last trial, can you *run* this file and *post* its output in this thread ?

# run_me.py
import os, sys, os.path

class Command(object):
    """Run a command and capture its output string, error string and exit status"""

    def __init__(self, command):
        self.command = command 

    def run(self, shell=True):
        import subprocess as sp
        process = sp.Popen(self.command, shell = shell, stdout = sp.PIPE, stderr = sp.PIPE)
        self.pid = process.pid
        self.output, self.error = process.communicate()
        self.failed = process.returncode
        return self

    @property
    def returncode(self):
        return self.failed

def main():
    try:
        import easy_install
    except ImportError:
        print("Could not import easy_install")
        return
    com = Command("%s %s unidecode" % (sys.executable, easy_install.__file__)).run()
    if com.failed:
        print com.output
        print com.error
    else:
        try:
            import unidecode
        except ImportError:
            print("Could not import unidecode")
        else:
            print("Module unidecode is installed")

main()

TrustyTony commented: Neat Command object I could use to put command output to Tk text window +1

jice 53 Posting Whiz in Training

14 Years Ago

So I explored my solution a little more (it may not be the simplest).
In your case, you have to use an encoding that encodes every Polish character with the same number of bytes. iso-8859-13 seems to fit, as mentioned here:
standard-encodings
So my example becomes:

# -*- coding: utf-8 -*-
import string
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ' ## utf8 has variable number of bytes
b='aoeslzzcnAOESLZZCN'

t = string.maketrans(a.decode("utf8").encode('iso8859_13'),b.decode("utf8").encode('iso8859_13'))

print 'ąóęśłżźćńĄÓĘŚŁŻŹĆŃ'.decode("utf8").encode('iso8859_13').translate(t).encode("utf8")
# .decode("utf8").encode('iso8859_13') : decodes the preceding string using the utf-8 encoding and encodes it in a fixed-length encoding that defines the letters you need
# .translate(t) translates the string using the rule you mentioned
# .encode("utf8") re-encodes (if needed) the result (here encoded in iso8859_13) in utf8

When you're not sure which encoding you should use, you can try this code:

# These are all the standard encodings
encs=['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8']
for e in encs:
    try:
        t = string.maketrans(a.decode("utf8").encode(e),b.decode("utf8").encode(e))
        print "%s : %s" % (e, 'ąóęśłżźćńĄÓĘŚŁŻŹĆŃ'.decode("utf8").encode(e).translate(t).encode("utf8"))
    except:
        print "error %s" % e


Gribouillis commented: Nice. I didn't think it was possible with translate. +3


kur3k -3 Light Poster · Answer 1 · 2010-04-09T18:35:54+00:00

thanks ;)

# -*- coding: utf-8 -*-

import string

x = []
text = raw_input()
x.append(text)

t = string.maketrans('ąóęśłżźćńĄÓĘŚŁŻŹĆŃ', 'aoeslzzcnAOESLZZCN')

for i in x:
    print i.translate(t),

what is wrong?

kur3k -3 Light Poster · Answer 2 · 2010-04-09T20:05:49+00:00

# -*- coding: utf-8 -*-

import string

x = []

t = string.maketrans('ąóęśłżźćńĄÓĘŚŁŻŹĆŃ', 'aoeslzzcnAOESLZZCN')

text = raw_input(u"> ")
for i in x:
    x.append(i.translate(t))

print x

Other code, but it doesn't run ... the encoding is ... xD

jice 53 Posting Whiz in Training · Answer 3 · 2010-04-09T21:33:24+00:00

What's the error message?
Probably a decoding/encoding problem...
Try something like

t = string.maketrans('ąóęśłżźćńĄÓĘŚŁŻŹĆŃ'.decode("utf8").encode("latin1"), 'aoeslzzcnAOESLZZCN'.decode("utf8").encode("latin1"))

woooee 814 Nearly a Posting Maven · Answer 4 · 2010-04-10T01:29:03+00:00

You can use a one to one dictionary
"ą" --> "a"
or use the ord/decimal value if there are codec conversion problems. A decimal value conversion dictionary would be something like:

conv_dict = {}
conv_dict[ord("ą")] = "a"
#
# and to convert
test_ch = "ą"
if ord(test_ch) in conv_dict:
    test_ch = conv_dict[ord(test_ch)]
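The same one-to-one dictionary idea is sketched below for Python 3 (an assumption; the thread itself uses Python 2.6). There `str.translate` accepts exactly this kind of mapping from `ord` values, and `str.maketrans` builds it from two equal-length strings. `unpolish` is a hypothetical helper name.

```python
POLISH = "ąóęśłżźćńĄÓĘŚŁŻŹĆŃ"
PLAIN = "aoeslzzcnAOESLZZCN"

# str.maketrans builds the {ord(polish): ord(plain)} dictionary for us.
table = str.maketrans(POLISH, PLAIN)

def unpolish(text):
    """Replace each Polish letter with its plain counterpart."""
    return text.translate(table)

print(unpolish("jacek bąk"))  # jacek bak
```

Characters that are not in the table are left untouched, so no `if ... in conv_dict` check is needed.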

kur3k -3 Light Poster · Answer 5 · 2010-04-10T21:49:13+00:00

jice, this example is no better, it shows many errors ...

woooee, thanks, but I am looking for short, dynamic code, so if anyone knows something about this problem, please write ... ; ]

kur3k -3 Light Poster · Answer 6 · 2010-04-11T03:38:50+00:00

Yes, nice, but do I need extra modules? Is unicode a built-in module in Python?

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 7 · 2010-04-11T03:56:10+00:00

What is your platform: Windows, Linux, other? Also, what is your version of Python, 2.6?
A good thing to do is to go to http://pypi.python.org/pypi/setuptools, then download and install setuptools by following the installation instructions. Then, in a console, simply type

easy_install unidecode

This works for all the modules in the Python package index (Pypi http://pypi.python.org). Then you can use the module.

Also note that the module name is not unicode but unidecode !

kur3k -3 Light Poster · Answer 8 · 2010-04-11T17:47:49+00:00

Can I use the replace function? For example

text = raw_input()

text.replace("ą", "a")

This code doesn't work ... why? What can I use replace for ... ?
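One likely reason nothing seems to change is that `replace` returns a new string and leaves the original alone, so the result has to be stored. A minimal sketch, assuming Python 3; under Python 2.6 the console encoding discussed elsewhere in this thread can get in the way as well.

```python
original = "jacek bąk"

original.replace("ą", "a")           # returned string is discarded
fixed = original.replace("ą", "a")   # returned string is kept

print(original)  # jacek bąk
print(fixed)     # jacek bak
```

With one `replace` call per letter this gets long quickly, which is why the translation-table approaches in this thread are usually preferred.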

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 9 · 2010-04-11T19:38:54+00:00

Try to run this

import sys
def main():
    text = raw_input("Enter some words with polish letters:")
    print repr(text)
    print sys.version_info
    print sys.platform
    print sys.executable
main()

and post the output here between code tags.
Could you download and run unidecode ?

kur3k -3 Light Poster · Answer 10 · 2010-04-11T20:37:40+00:00

Could you download and run unidecode ?

I downloaded the unidecode module and installed it, but IDLE gives me an error saying I don't have this module ...

I use Windows (I downloaded the installer for Windows, Python 2.6)

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 11 · 2010-04-11T20:56:43+00:00

It's impossible, it's very easy to install. Try this

>>> import setuptools
>>>

does this work ?
If this works
1) open a windows console
2) type: easy_install unidecode
look at the printed messages. If it works, then restart a python shell and try

>>> import unidecode

if import setuptools doesn't work we are going to install setuptools first.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 12 · 2010-04-12T13:42:26+00:00

I tested the procedure on windows XP. I was able to install unidecode like this

C:\Python26\python.exe C:\Python26\Lib\site-packages\easy_install.py unidecode

In a cmd.exe terminal.

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 13 · 2010-04-12T14:55:05+00:00

The problem analysis:

# -*- coding: utf-8 -*-

import string

x = []
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ' ## utf8 has variable number of bytes
##a='boeslzzcnAOESLZZCN' #only changes b to a to test the end of the solution
b='aoeslzzcnAOESLZZCN'

print a,len(a),b,len(b)  ## here is why
t = string.maketrans(a,b)

text = raw_input("> ").translate(t)

print text

If you can use the premade module mentioned above, fine. Otherwise loop over the two conversion strings and store the mapping string1 -> string2 in a dictionary. Keep the strings as real unicode, so that every letter takes the same amount of space, which simplifies the solution.

Greetings, Tony Veijalainen

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 14 · 2010-04-12T15:40:27+00:00

P.S:
a='ąóęśłżźćńĄÓĘŚŁŻŹĆŃ'.decode('utf8') ## utf8 has variable number of bytes
fixes the len, but the variable number of bytes per letter still confuses maketrans (and translate would not work)

Tony
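The length mismatch that `print a,len(a),b,len(b)` exposes can be reproduced as follows. This is a small sketch assuming Python 3, where a `str` holds characters and `encode()` gives the bytes; `iso8859_2` is picked here only as one example of a single-byte encoding that covers the Polish letters.

```python
polish = "ąóęśłżźćńĄÓĘŚŁŻŹĆŃ"
plain = "aoeslzzcnAOESLZZCN"

print(len(polish), len(plain))           # 18 18 (characters)
print(len(polish.encode("utf-8")))       # 36 (two bytes per letter in UTF-8)
print(len(polish.encode("iso8859_2")))   # 18 (one byte per letter)
```

A byte-based maketrans needs both arguments to have the same length, which is why the UTF-8 byte string (36 bytes) cannot be paired with the 18 plain letters.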

kur3k -3 Light Poster · Answer 15 · 2010-04-13T00:16:52+00:00

This is my code http://wklej.org/id/314014/

This is an SMS-sending 'simulator'. I must remove the Polish letters, but I don't know how. I have read many replies and I still don't know.

Please, I must finish this homework in a few days ...

kur3k (sorry, my English is not good)

kur3k -3 Light Poster · Answer 16 · 2010-04-13T04:32:58+00:00

Could not import unidecode

Why? I installed it for Windows (the exe file, for Python 2.6). I have Linux in a virtual machine, but I don't like that OS (these days xD), so what must I do now?

I tried in the cmd console, but I feel it doesn't work ;<

I am going to my teacher today, I will ask ...

My next question is about matrices: I must transpose one (the Polish word is 'transponowac'). In math it is written, for example, A(T). I am looking for an algorithm for this problem ;-)

Thanks for help all ;-)
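For the transpose question, one common approach for a rectangular list of lists is sketched below. It assumes Python 3; `transpose` is a name made up for this example.

```python
def transpose(matrix):
    """Return the transpose of a rectangular list of lists."""
    # zip(*matrix) pairs up the first items of every row, then the second, ...
    return [list(column) for column in zip(*matrix)]

print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

If the rows have different lengths, `zip` stops at the shortest one, so ragged input would need separate handling.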

jice 53 Posting Whiz in Training · Answer 17 · 2010-04-13T14:24:38+00:00

Sorry for not having been here for a few days...
Gribouillis' solution is probably better.
I have used the one I showed you a few times, but I always have trouble adjusting the encoding and decoding parameters...
To help you, I need your error messages...

kur3k -3 Light Poster · Answer 18 · 2010-04-13T23:16:23+00:00

The first code doesn't work, it returns an error

Traceback (most recent call last):
  File "C:\Users\Konrad\Desktop\testowy.py", line 6, in <module>
    t = string.maketrans(a.decode("utf8").encode('iso8859_13'),b.decode("utf8").encode('iso8859_13'))
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 0: unexpected code byte

The second code runs, thanks - give me 40 minutes, I will use this code ... ;]

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 19 · 2010-04-13T23:22:03+00:00

I suffered a lot with these encodings in my anagram program. We don't have accented characters, but we do have the characters åäö here in Finland.

The problem is that when the program runs in a text console in Windows, Finnish normally uses the cp850 encoding. When the program runs in a windowed environment, it gets a different encoding. From my experiments I ended up with two versions of my code and data file, with encodings cp850 and latin1 (or iso8859_15, which I think is almost the same plus the euro symbol). A Unix environment can always work in utf-8 and is simpler that way.

So my suggestion is to give up Linux's sensible encoding if the program must run in a Windows text terminal.

For more confusion see:
http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using

The console application (appropriate for raw_input use) becomes a little simpler:

Run cmd.exe and type chcp. Put the reported encoding in the header and in the code wherever I put cp850 (by search and replace, usually Ctrl-H in editors). I tested the code myself with cp850 and only åäö (which, by the way, is a stupid encoding choice, as cp437 is completely capable of displaying the Finnish letters and works with text programs that use characters to draw boxes. But cp850 is usually forced on cmd.exe by Microsoft.)

I have to use my own language here; please adapt it to yours.

# -*- coding: cp850 -*-
import string
a='åäö' ## cp850 has fixed number of bytes
b='aao'

t = string.maketrans(a,b)

print 'åäö'.translate(t)

a=raw_input('Hyödyttömästi ääkkösiä, kiitos!\n')

print a.translate(t)

IDLE test!

IDLE 2.6.4
>>>
aao
Hy”dytt”m„sti „„kk”si„, kiitos!Tarkoitatko hyödyttömästi?
Tarkoitatko hyödyttömästi?
>>>

Hey, it almost works! Input from raw_input seems to adapt to the cp850 declared in the code; only the printing of the prompt did not. So when printing your own literals, it looks better to make them unicode. Let's test:

# -*- coding: cp850 -*-
import string
a='åäö' ## cp850 has fixed number of bytes
b='aao'

t = string.maketrans(a,b)

print 'åäö'.translate(t)

a=raw_input(u'Hyödyttömästi ääkkösiä, kiitos!\n')

print a.translate(t)

IDLE OUTPUT:

>>>
aao
Hyödyttömästi ääkkösiä, kiitos!
Hei, nyt näkyy hyödyllisiä aakkosia!
Hei, nyt näkyy hyödyllisiä aakkosia!
>>>

Then we test same in cmd:

D:\Tony>t.py
aao
Traceback (most recent call last):
  File "D:\Tony\t.py", line 10, in <module>
    a=raw_input(u'Hyödyttömästi ääkkösiä, kiitos!\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 2: ordinal not in range(128)
D:\Tony>

Not quite there for console. How about the other version (not u", but "):

D:\Tony>notepad2 t.py
D:\Tony>t
aao
Hyödyttömästi ääkkösiä, kiitos!
Hämmästyttävää, täytyy myöntää!
Hammastyttavaa, taytyy myontaa!
D:\Tony>

Works in console!

So one version (unicode) works in IDLE, and the other (8-bit chars) works in the console.

Does anybody have the magic to make one version work everywhere, without an if/else over two strings or two versions of the input part?

If not, maybe I should try to make another version of raw_input, my_input!

Greeting,
Tony
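One way to avoid keeping two versions is to decode the incoming bytes with whatever encoding the environment reports and do the translation on Unicode text. This is a sketch under assumptions, not a tested fix for IDLE versus cmd.exe: it assumes Python 3, where the table holds characters instead of bytes, and `strip_from_bytes` is a hypothetical helper. In a real program the `encoding` argument might come from `sys.stdin.encoding`.

```python
TABLE = str.maketrans("åäöÅÄÖ", "aaoAAO")

def strip_from_bytes(raw, encoding):
    """Decode console bytes first, then translate in Unicode."""
    return raw.decode(encoding).translate(TABLE)

word = "Hämmästyttävää"
for encoding in ("cp850", "latin1", "utf-8"):
    # The same table works whichever encoding the bytes arrived in.
    print(encoding, strip_from_bytes(word.encode(encoding), encoding))
```

Because the table is defined on characters, the number of bytes per letter in the source encoding no longer matters.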

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 20 · 2010-04-13T23:31:20+00:00


@tonyjv Did you try the unidecode module for Finnish ?

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 21 · 2010-04-14T00:40:04+00:00

@tonyjv Did you try the unidecode module for Finnish ?

Yes it worked OK.

More often than not, though, using that kind of module does more harm than good.

An earlier anagram program had a similar utility always switched on. That makes sense if the vocabulary used does not have those special characters. But anagrams from a Finnish vocabulary with those letters changed do not make any sense.

By the way, I was testing and posting at the same time. If you look carefully, the IDLE test failed for input. That means another version of the code, with latin1 in the first line, is needed for success in IDLE:

# -*- coding: latin1 -*-
import string

a='åäöÅÄÖ' ## latin1 has fixed number of bytes
b='aaoAAO'

t = string.maketrans(a,b)

print 'Take out äöå:'
print 'åäö'.translate(t)

a=raw_input('Hyödyttömästi ääkkösiä, kiitos!\n')
print 'You said:',a

a=a.translate(t)
print a

>>>
Take out äöå:
aao
Hyödyttömästi ääkkösiä, kiitos!
Ällöttävä testi!
You said: Ällöttävä testi!
Allottava testi!
>>>

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 22 · 2010-04-14T03:27:43+00:00

More exactly:

from unidecode import unidecode as ud
a=raw_input('Give text: ')
print 'unidecoded text:',ud(a)

>>>
Give text: hölmöläiset HÖLMÖLÄISET Åland
unidecoded text: holmolaiset HOLMOLAISET Aland
>>>

python -m says that unidecode is a package and can not be executed.

jice 53 Posting Whiz in Training · Answer 23 · 2010-04-14T13:48:20+00:00

give me 40 minutes, I will use this code ... ;]

This second code is only there to help find the right encoding (this is the hardest part)... not to translate your whole file... It takes each encoding in turn and tries to translate. If there is an exception, it writes "error".
The output shows you which encodings succeed in encoding your string.

If the first program raises an exception, something is certainly wrong:
- the encoding line (# -*- coding: utf-8 -*-) MUST be consistent with the file encoding and the .decode() part of the commands.
- you may have some characters that were not in the string I used for my test and that are not supported by "iso8859_13". If so, you can use the second part of my code to try to find another encoding...

Hope this helps... Encoding is really tricky...
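The traceback earlier in the thread complains about byte 0xb9 at position 0. A quick way to see what such a byte could be is to try a few candidate encodings, as sketched below. It assumes Python 3; `decodes_as` is a hypothetical helper and the list of candidates is only a guess at what a Polish Windows editor might use.

```python
def decodes_as(raw, encodings):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# The byte the utf8 codec rejected in the traceback.
print(decodes_as(b"\xb9", ["utf-8", "cp1250", "iso8859_2"]))
```

In cp1250 this byte is the letter ą, which suggests (but does not prove) that the file was saved in the Windows code page while the coding line and `.decode("utf8")` claimed UTF-8.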
