I have this URL (UTF-8 encoded):
file://///companyweb/pr%C3%B8ve.doc

which I need transformed into this URL (ISO-8859-1 encoded):
file://///companyweb/pr%F8ve.doc

The difference is the character 'ø', which is encoded as '%C3%B8' in the first case and as '%F8' in the second.

In Java I would be able to do this conversion very easily with something like this:

String utf8UrlEncoded = "file://///companyweb/pr%C3%B8ve.doc";
String decoded = URLDecoder.decode(utf8UrlEncoded, "UTF-8");
String iso88591UrlEncoded = URLEncoder.encode(decoded, "ISO-8859-1");

..but I am very new to Python and I'm sort of stuck. I have tried several combinations of the string decode() and encode() methods together with urllib's quote() and unquote() functions, without much luck. I can't figure out how to decode the URL or how to re-encode it.

Can anyone out there give me a clue?

Last Post by OriginalZeroth
Tricky question, mostly because Python writes escaped bytes differently than a URL does.
Where the URL has %C3, a Python string literal has \xC3, and Python rejects an incomplete \x escape outright. Calling decode() on the URL as given does nothing useful, because %C3%B8 is just six ASCII characters until the percent-escapes are turned into real bytes. In Python 2, urllib.unquote() does that conversion; once the bytes are there, decoding them is straightforward.

And, here is how to do it with Python-style literals:

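loc = "file://///companyweb/pr\xC3\xB8ve.doc"  # UTF-8 bytes of the character written as \x escapes
decoded = unicode(loc, 'utf-8')  # decode the UTF-8 bytes into a unicode object
print repr(decoded)  # repr() shows the escaped form instead of the rendered character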

The result is:
u'file://///companyweb/pr\xf8ve.doc'
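
For the percent-encoded URL from the question, a minimal sketch might look like the following. It assumes Python 3 and its urllib.parse module (the snippet above is Python 2), and the helper name reencode_url and its src/dst parameters are made up for illustration. It reads the escapes as UTF-8 and writes them back as ISO-8859-1, leaving ':' and '/' untouched.

```python
from urllib.parse import quote, unquote


def reencode_url(url, src="utf-8", dst="iso-8859-1"):
    """Hypothetical helper: decode percent-escapes as `src`, re-escape as `dst`."""
    # Turn the %XX escapes into bytes and decode those bytes using `src`.
    text = unquote(url, encoding=src)
    # Re-escape non-ASCII characters using `dst`; keep ':' and '/' unescaped
    # so the scheme and the path separators survive the round trip.
    return quote(text, safe=":/", encoding=dst)


converted = reencode_url("file://///companyweb/pr%C3%B8ve.doc")
print(converted)
```

Swapping the src and dst arguments should convert in the other direction.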
