Hi all, my question is as follows:

If I use the code as below:

text = '中文'  # Chinese text meaning "Chinese"

with open('file.txt', 'w', encoding='UTF-8') as f:
    f.write(text)  # the with block closes the file, so no explicit close() is needed

Windows Notepad then reports file.txt as saved in UTF-8.

However, if the text above is changed to values within the ASCII range (English letters, digits, etc.), Notepad reports the file as ANSI.

I need the file to be saved as UTF-8 because it is a configuration file that might contain international characters (for localization).

Tyvm.

All 6 Replies

I think you will find that the problem lies less with your program than with the way Notepad interprets the bytes. I don't know all of the details of how Notepad decides on an encoding, but if I read what you are saying correctly, it basically only checks whether the first few characters are in the ASCII range: if they are not, it interprets the file as UTF-8; if they are, it reports the file as ANSI (most likely the system code page, typically Windows-1252, which is close to Latin-1, also known as ISO-8859-1). The stream of bytes you are writing is correct; it is Notepad's interpretation of them that is at fault.
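To illustrate why the bytes themselves are fine, here is a minimal sketch (the sample strings and variable names are made up for illustration). ASCII-only text encodes to exactly the same bytes in UTF-8 and in Windows-1252, so nothing in the file content distinguishes the two, whereas the Chinese text becomes multi-byte UTF-8 sequences:

```python
# Illustrative sample strings; any ASCII-only text behaves the same way.
ascii_text = 'hello 123'
chinese_text = '中文'

# ASCII-only text: identical bytes under UTF-8 and Windows-1252 (cp1252).
ascii_utf8 = ascii_text.encode('utf-8')
ascii_ansi = ascii_text.encode('cp1252')

# Non-ASCII text: each Chinese character becomes three bytes in UTF-8.
chinese_utf8 = chinese_text.encode('utf-8')

print(ascii_utf8 == ascii_ansi)  # True
print(chinese_utf8)              # b'\xe4\xb8\xad\xe6\x96\x87'
```

In other words, an ASCII-only file written with encoding='UTF-8' is already valid UTF-8; an editor simply has no way to tell that from the bytes alone.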

If anyone knows how to force Notepad to interpret a file as UTF-8, please speak up. This video purports to explain how, but it is not a trivial process, and appears to involve registry changes.

In the meanwhile, I would instead use an editor that supports multiple encodings, such as Notepad++ or TextPad.

However, if the real issue is permitting your user base to edit the file manually, then that is a serious problem with no obvious solution.
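One possible workaround that avoids the registry, offered only as a sketch: Python's 'utf-8-sig' codec writes a byte order mark (BOM) at the start of the file, and Notepad treats a leading BOM as a UTF-8 signature even when the rest of the content is pure ASCII. The file name and sample text below are hypothetical, and this assumes that whatever reads the configuration file can cope with the BOM (reading with 'utf-8-sig' strips it again):

```python
import os
import tempfile

# Hypothetical ASCII-only configuration content.
text = 'language=en'

# Hypothetical file location; a temp directory keeps the sketch self-contained.
path = os.path.join(tempfile.gettempdir(), 'file_with_bom.txt')

# 'utf-8-sig' prepends the UTF-8 BOM (EF BB BF) when writing.
with open(path, 'w', encoding='utf-8-sig') as f:
    f.write(text)

# Inspect the raw bytes to confirm the BOM is present.
with open(path, 'rb') as f:
    raw = f.read()

# Reading with 'utf-8-sig' removes the BOM, so the text round-trips unchanged.
with open(path, 'r', encoding='utf-8-sig') as f:
    round_trip = f.read()

print(raw[:3])             # b'\xef\xbb\xbf'
print(round_trip == text)  # True
```

The trade-off is that a reader using plain 'utf-8' will see the BOM as a stray '\ufeff' character at the start of the file, so both the writer and the reader need to agree on 'utf-8-sig'.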

However, when I run this code, I get an error and a crash from IDLE. Only when I force a save from Notepad is the situation fixed.

This works from the IPython Notebook ...
(I read the file with Windows 8.1 Notepad; some other editors will not read it properly)

import codecs

# Text is in Chinese, whereby text = 'chinese'
text = '中文' 

with codecs.open('test.txt', encoding='utf-8', mode='w') as fp:
    fp.write(text)

I think Schol-R-LEA is correct.

Not sure if this helps ...

I used IDLEX installed on Python341 and it worked fine; I could also read the resulting file. The Chinese characters displayed properly.

Again, same code as above.

Same holds true for regular IDLE and Python33.

OS = Windows 8.1

You can get IDLEX from:
http://idlex.sourceforge.net/download.html

Thanks all and this IDLEX helps.

Apparently this issue has to do with Windows Notepad and is not a Python issue.
