How to force the text to save in UTF-8

Question

samuel1991 0 Newbie Poster

10 Years Ago

Hi all, my question is as follows:

If I use the code as below:

text = '中文'  # Chinese text (the word itself means 'Chinese')

with open('file.txt', 'w', encoding='UTF-8') as f:
    f.write(text)  # the with block closes the file, so no f.close() is needed

When I open file.txt in Windows Notepad, it shows the file as saved in UTF-8.

However, if I change the text above to values within the ASCII range (English letters, digits, etc.), Notepad shows it as saved in ANSI.

I need the file to be saved as UTF-8 because it is a configuration file that might contain international characters (for localization purposes).

Tyvm.

python

Schol-R-LEA 1,446 Commie Mutant Traitor

10 Years Ago

I think you will find that the problem lies less with your program than with the way Notepad interprets the bytes. I don't know all of the details of how Notepad decides on an encoding, but if I read what you are saying correctly, it basically only checks whether the first few characters are in the ASCII range: if they are not, it interprets the file as UTF-8, but if they are, it interprets the file as ANSI (on Windows, most likely a code page such as Windows-1252 rather than true ASCII). Note that pure ASCII text encodes to exactly the same bytes in UTF-8 as in those code pages, so such a file carries nothing to tell the encodings apart. The stream of bytes you are writing is correct; it is Notepad's interpretation of them that is at fault.
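To illustrate the point about the byte stream, here is a minimal sketch comparing what the encoders produce. It assumes that "ANSI" on the machine in question means Windows-1252 (Python's cp1252 codec); on a differently localized Windows the code page may be another one, and the variable names are only for illustration.

```python
# Sketch: pure ASCII text yields identical bytes in UTF-8 and in cp1252,
# so a file containing only ASCII looks the same under either encoding.
ascii_text = "hello 123"
chinese_text = "中文"

utf8_ascii = ascii_text.encode("utf-8")
ansi_ascii = ascii_text.encode("cp1252")  # assumption: cp1252 is the "ANSI" code page
same_bytes = utf8_ascii == ansi_ascii

# Non-ASCII text becomes multi-byte sequences in UTF-8.
utf8_chinese = chinese_text.encode("utf-8")

print(same_bytes)    # True
print(utf8_chinese)  # b'\xe4\xb8\xad\xe6\x96\x87'
```

So the file written by the original code is valid UTF-8 either way; an editor simply has no way of telling the two encodings apart when every byte is below 128.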

If anyone knows how to force Notepad to interpret a file as UTF-8, please speak up. This video purports to explain how, but it is not a trivial process, and appears to involve registry changes.

In the meanwhile, I would instead use an editor that supports multiple encodings, such as Notepad++ or TextPad.

However, if the real issue is permitting your user base to edit the file manually, then that is a serious problem with no obvious solution.

Edited 10 Years Ago by Schol-R-LEA

vegaseat 1,735 DaniWeb's Hypocrite

10 Years Ago

I think Schol-R-LEA is correct.

Not sure if this helps ...

I used IDLEX with Python 3.4.1 and it worked fine; I could also read the resulting file, and the Chinese characters displayed properly.

Again, same code as above.

The same holds true for regular IDLE and Python 3.3.

OS = Windows 8.1

You can get IDLEX from:
http://idlex.sourceforge.net/download.html

Edited 10 Years Ago by vegaseat


samuel1991 0 Newbie Poster · Answer 1 · 2014-10-27T12:53:42+00:00

However, when I run this code from IDLE, it crashes with an error. Only when I force-save it from Notepad is the situation fixed.

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 2 · 2014-10-27T18:31:16+00:00

You might also want to check
http://python.org/dev/peps/pep-0263/

Edited 10 Years Ago by vegaseat

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 3 · 2014-10-27T22:59:30+00:00

This works out of the IPython Notebook ...
(I read the file with Windows 8.1 Notepad; some other editors may not read it properly)

import codecs

# Text is in Chinese, whereby text = 'chinese'
text = '中文' 

with codecs.open('test.txt', encoding='utf-8', mode='w') as fp:
    fp.write(text)

samuel1991 0 Newbie Poster · Answer 4 · 2014-11-02T13:32:41+00:00

Thanks all, IDLEX helps.

Apparently this is an issue with Windows Notepad and not with Python.
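For anyone who still wants Notepad to label an ASCII-only file as UTF-8, one option not mentioned in the replies is to write a byte order mark (BOM) with Python's built-in 'utf-8-sig' codec. This is only a sketch: the file name is made up for the example, and whether a BOM is acceptable depends on whatever program reads the configuration file, since some parsers do not expect the three extra bytes at the start.

```python
import os
import tempfile

# ASCII-only content, the case Notepad would otherwise report as ANSI.
text = "plain ASCII settings"

# Hypothetical path, used only for this sketch.
path = os.path.join(tempfile.gettempdir(), "file_with_bom.txt")

# 'utf-8-sig' writes the UTF-8 BOM (EF BB BF) before the text.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(text)

# Inspect the raw bytes to confirm the BOM is present.
with open(path, "rb") as f:
    raw = f.read()

# Reading with 'utf-8-sig' strips the BOM again.
with open(path, "r", encoding="utf-8-sig") as f:
    round_trip = f.read()

print(raw[:3])
print(round_trip == text)
```

If the file is read back with plain 'utf-8' instead, the BOM shows up as a leading '\ufeff' character in the first string, so the reader side should use 'utf-8-sig' as well.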
