Hi. I'm having a little difficulty understanding character sets in Python. Basically I'm trying to write a function that will substitute a non-ASCII character with a similar ASCII equivalent. So if given a string like 'ÂBÇD', the function would iterate through the string object, replacing selected characters to return a fully ASCII string, 'ABCD'. It would substitute A for Â and C for Ç, while leaving the other characters alone. (The point of all this is to write EXIF metadata in JPEG images, where some fields only allow ASCII characters. So I want to use the real, partially non-ASCII names for some metadata fields and "safe" ASCIIfied versions for others.)
I've tried this a number of ways and all of them have failed. Judging from the debugging output I added, the issue seems to be a character set problem. I've tried adding encode(), decode() and unicode() calls, but the more I read about character sets in Python, the more confused I get. Right now the solution I'm working on involves a list of tuples like this:

    translations = [('Ä', 'A'), ('Å', 'A'), ...]

The program accepts a user-supplied string (called nameString) that may contain non-ASCII characters as an argument on the command line. It passes this nameString to the convertToAscii() function. That function uses the translations list of tuples to swap characters where needed:

    def convertToAscii(nameString):
        global translations
        res = ''
        for character in nameString:
            found = False
            for translation in translations:
                # translation is a (non-ASCII, ASCII) pair
                if character == translation[0]:
                    res = res + translation[1]  # replace with ASCII equivalent
                    found = True
                    break
            if not found:
                res = res + character  # just use original character
        return res

Except it doesn't work. The names all come out as they went in, so for some reason Python isn't matching the non-ASCII characters in nameString with the strings in translations. Adding debugging print statements shows that it is reading the translations list and its component tuples, however. It's just not recognizing matches. If anyone knows why, I'd be much obliged. In case it's helpful, it's Python 2.5.2 on Linux.
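
One possible explanation, sketched under assumptions that are not confirmed above: the source file and the terminal both use UTF-8, and the strings involved are Python 2 byte strings (str) rather than unicode objects. In that case 'Ä' is stored as two bytes, iterating over nameString yields one byte at a time, and a single byte can never equal a two-byte string. The sketch below shows the length difference and a variant of the function that works on Unicode text. TRANSLATIONS and convert_to_ascii are hypothetical names, and the unicodedata fallback is a substitute technique, not the approach from the code above.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical mapping with Unicode keys (escapes used to avoid
# depending on the encoding of this source file).
TRANSLATIONS = {
    u'\xc4': u'A',  # Ä
    u'\xc5': u'A',  # Å
}


def convert_to_ascii(text):
    """Replace accented characters in a Unicode string with unaccented ones."""
    result = []
    for character in text:
        if character in TRANSLATIONS:
            result.append(TRANSLATIONS[character])
        else:
            # Fallback: decompose the character (e.g. Â -> A + combining
            # circumflex) and drop the combining marks.
            decomposed = unicodedata.normalize('NFKD', character)
            result.append(u''.join(
                c for c in decomposed if not unicodedata.combining(c)))
    return u''.join(result)


# One character as Unicode text, but two bytes once encoded as UTF-8.
print(len(u'\xc4'))                  # 1
print(len(u'\xc4'.encode('utf-8')))  # 2

print(convert_to_ascii(u'\xc2B\xc7D'))  # ABCD
```

On Python 2, the command-line argument would need to be decoded first, for example nameString.decode('utf-8') if the terminal really uses UTF-8. Characters that have no decomposition (such as ß or ø) pass through the fallback unchanged, so they would still need explicit entries in the mapping.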