decode string

Question

Zeinab_1 0 Newbie Poster

9 Years Ago

I have written some simple code in Python:

s = b'B1=A\xF1adir+al+carrito\n'.decode('latin-1')
print(s)
with open ('lat.txt','wb') as f:
    f.write(bytes(s,'latin-1'))

The output is B1=Añadir+al+carrito, and the content of the file is the same.

But when I try to read from a file (with the content B1=A\xF1adir+al+carrito):

for lines in open('mytxt1.txt','rb'):
    print(lines)
    s = lines.decode('latin-1')
    print(s)
    with open('lat1.txt','wb') as f:
        f.write(bytes(s,'latin-1'))

I don't get the output B1=Añadir+al+carrito. Instead I get B1=A\xF1adir+al+carrito,
and the file is empty.
Any idea what I should do?
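One possible explanation, offered as an assumption rather than something the thread confirms: mytxt1.txt may hold the four literal characters backslash, x, F, 1 instead of the single byte 0xF1. A minimal sketch of the difference, where the raw bytes below stand in for one line of that file:

```python
import codecs

# Assumption: the line contains a literal backslash followed by "xF1",
# not the single byte 0xF1.
raw = b'B1=A\\xF1adir+al+carrito\n'

# Plain latin-1 decoding leaves the backslash sequence untouched.
plain = raw.decode('latin-1')

# The 'unicode_escape' codec interprets \xF1 as the code point U+00F1.
unescaped = codecs.decode(raw, 'unicode_escape')

print(len(plain) - len(unescaped))                 # four characters collapse into one
print(unescaped == 'B1=A\xf1adir+al+carrito\n')    # compare against the intended text
```

Separately, the loop in the question reopens lat1.txt in 'wb' mode for every line, which truncates the file each time, so only the last line read survives. If the input ends with a blank line, that could be why the output looks empty.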

python

Edited 9 Years Ago by Zeinab_1 because: couldn't insert code at first place

5 Contributors
5 Replies
269 Views
23 Hours Discussion Span
Latest Post 9 Years Ago Latest Post by vegaseat

All 5 Replies

woooee 814 Nearly a Posting Maven

9 Years Ago

Use the encoding parameter of codecs.open() when reading and writing. You can encode and decode manually yourself, but I find that this method works without problems. Also note that how the text prints depends on the default encoding of your OS.

import codecs

s = b'B1=A\xF1adir+al+carrito\n'.decode('latin-1')
# codecs.open() encodes the str to latin-1 bytes on write
with codecs.open('lat.txt', mode="wb", encoding='latin-1') as fp:
    fp.write(s)

# and decodes the bytes back to str on read
with codecs.open('lat.txt', "r", encoding='latin-1') as fp:
    r = fp.read()

print(s)
print(r)

Edited 9 Years Ago by woooee

snippsat 661 Master Poster

9 Years Ago

Also, strings in Python 3 are unicode so encode and decode are not necessary.

That's only true if the text is already inside Python 3.
Here we are talking about taking text from outside into Python 3,
so we must specify an encoding such as utf-8 or latin-1,
or it will give an error or become a byte string.

Because we must read with the correct encoding when taking text into Python 3,
open() has new parameters such as errors='ignore' and errors='replace':

with open('some_file', 'r', encoding='utf-8', errors='ignore') as f:
    print(f.read())

So, about this statement:
In Python 3, all strings are sequences of Unicode characters.
Yes, this is true,
but all text taken in from outside must first be decoded with the correct encoding on its way into Python 3.
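A small sketch of what those error handlers do. It uses bytes.decode() rather than a file so it is self-contained; open() accepts the same errors values. The sample bytes are an assumption chosen for illustration.

```python
data = b'A\xf1adir'  # "Añadir" encoded as latin-1

# Correct codec: the byte 0xF1 maps to U+00F1.
good = data.decode('latin-1')

# Wrong codec, errors='ignore': the undecodable byte is silently dropped.
ignored = data.decode('utf-8', errors='ignore')

# Wrong codec, errors='replace': the bad byte becomes U+FFFD.
replaced = data.decode('utf-8', errors='replace')

# Wrong codec, default errors='strict': decoding raises an exception.
try:
    data.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ascii() keeps the output printable on any terminal encoding.
print(ascii(good), ascii(ignored), ascii(replaced), strict_failed)
```

Note that 'ignore' and 'replace' hide the problem rather than fix it; only the correct encoding recovers the original character.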

Edited 9 Years Ago by snippsat

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 1 · 2015-07-23T16:52:14+00:00

Note that in Python 3, the built-in function open() has an encoding parameter, so you don't need to use codecs.open(). For code that must run on both Python 2 and Python 3, use io.open() :)

Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
>>> import codecs
>>> codecs.open is open
False
>>> import io
>>> io.open is open
True
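As a rough illustration of that point, here is the round trip from the question written with the built-in open() and its encoding parameter. The temporary directory and file name are placeholders, not anything from the thread.

```python
import os
import tempfile

text = b'B1=A\xF1adir+al+carrito\n'.decode('latin-1')

# Hypothetical scratch path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), 'lat.txt')

# Text mode with an explicit encoding: open() encodes on write...
with open(path, 'w', encoding='latin-1', newline='') as fp:
    fp.write(text)

# ...and decodes on read, so no bytes() or decode() calls are needed.
with open(path, 'r', encoding='latin-1', newline='') as fp:
    round_trip = fp.read()

# On disk, the n-with-tilde is stored as the single byte 0xF1.
with open(path, 'rb') as fp:
    raw = fp.read()

print(round_trip == text)
print(raw)
```

Passing newline='' is only there to keep the bytes on disk identical across platforms; it is not required for the encoding to work.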

woooee 814 Nearly a Posting Maven · Answer 2 · 2015-07-24T01:37:30+00:00

Also, strings in Python 3 are unicode so encode and decode are not necessary.

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 3 · 2015-07-24T14:49:28+00:00

FYI ...

# get your current locale encoding
import locale
print(locale.getpreferredencoding(False))  # e.g. US-ASCII

As of Python 3.4.3, these are the options for open() ...
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

If you leave encoding at its default of None, your current locale encoding is applied. Also, the default mode is really 'r' combined with text mode 't', i.e. 'rt'.
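A short sketch of how those defaults show up on a file object. The scratch path is a placeholder, and the default encoding printed will vary from machine to machine, which is the reason to pass encoding explicitly.

```python
import locale
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Explicit encoding: the bytes written do not depend on the locale.
with open(path, 'w', encoding='latin-1') as fp:
    fp.write('A\xf1adir')

# No mode and no encoding: text mode 'r', encoding picked from the environment.
with open(path) as fp:
    default_mode = fp.mode
    default_encoding = fp.encoding

# Explicit encoding on read: the same result on every machine.
with open(path, encoding='latin-1') as fp:
    explicit_encoding = fp.encoding
    text = fp.read()

print(locale.getpreferredencoding(False), default_encoding)
print(default_mode, explicit_encoding)
```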
