OK, so every time I write this:
f=open('file.html')
p=f.read()

this comes out and I do not understand it. Help!
p=f.read()
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8171: ordinal not in range(128)


I am trying to open a .html file (not from a website, but a file on my computer).
I know you use open() for .txt files. Or is this not the code to open .html files?
And if not, what is it?

Or is there a way to convert a .html file into a .txt file?

All 2 Replies

Like this for Python 3. There have been big changes in urllib for Python 3 (urllib and urllib2 were joined together).
But I guess you haven't used Python 2, so the changes don't matter for you.

or is there a way to convert a .html file into a .txt file?

With the code below you get the source code (HTML), which is then just a string (text).
You cannot really convert it to txt, because HTML is already plain text. You can save it with a .txt extension (but what is the point of that?).

One example of what you can do:
once you have read in the source code (HTML), you can parse out the info you want with a good parser like BeautifulSoup or lxml.
Or just use Python string tools like find, slicing, and split to take out the info you need.
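
As a rough sketch of the string-tools approach, here is one way to pull the page title out with find and slicing. The html string and the extract_title helper are made up for illustration. This only handles a plain <title> tag with no attributes, so a real parser is the safer choice for messy pages.

```python
# A made-up HTML string standing in for text read from a page or a file.
html = "<html><head><title>My Books</title></head><body><p>Hello</p></body></html>"

def extract_title(source):
    """Return the text between <title> and </title>, or None if either tag is missing."""
    start = source.find("<title>")
    if start == -1:
        return None
    start += len("<title>")            # move past the opening tag
    end = source.find("</title>", start)
    if end == -1:
        return None
    return source[start:end].strip()   # slice out the text between the tags

print(extract_title(html))
```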

#python 3
import urllib.request

page = urllib.request.urlopen('http://homepage.mac.com/s_lott/books/index.html')
text = page.read().decode("utf8")
print(text)

The same in python 2.x

#python 2.x
import urllib

page = urllib.urlopen('http://diveintopython3.org/').read()
print page
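
For the file on disk from the original question, the same decoding step can be handed to open() through its encoding argument in Python 3. This is only a sketch: it assumes the file is UTF-8, which the byte 0xc3 in the error hints at but does not prove, and the read_html helper and the demo file are invented for the example.

```python
import os
import tempfile

def read_html(path, encoding="utf-8"):
    """Read a local HTML file as text, decoding with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()

# Demo: write a small file containing a non-ASCII character
# (é is stored as the bytes 0xc3 0xa9 in UTF-8), then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<p>caf\u00e9</p>")
    text = read_html(path)

# ascii() escapes the é so the output is safe on any terminal.
print(ascii(text))
```

If the file turns out not to be UTF-8, passing a different encoding name (for example "latin-1") to read_html is the thing to try next.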

Straightforward, buddy ;)
