How can I get the source code of a web page with Python?
I am using the code block below, but it doesn't work correctly: when I open the saved HTML file (site.html) in a web browser, some characters disappear.

import urllib.request

req = urllib.request.Request('https://www.google.com')
response = urllib.request.urlopen(req)
the_page = response.read()  # response.read() returns bytes, not str
a = open('site.html', 'w+')
a.write(str(the_page))  # str() of a bytes object gives its repr, "b'...'"
a.close()


This code block solves our problem:

from urllib.request import urlretrieve
url = 'https://www.google.com'
urlretrieve(url, 'site1.html')

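You probably have to open the file in binary mode ("wb"). response.read() returns bytes, so you can write the_page directly without converting it. str(the_page) writes the repr of the bytes (b'...', with escape sequences such as \n and \xc3) instead of the page itself.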
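A minimal sketch of the binary-mode fix, assuming you want the raw bytes saved exactly as the server sent them. save_page is a hypothetical helper name, not part of urllib:

```python
import urllib.request


def save_page(url: str, path: str) -> int:
    """Download url and write the raw response body to path unchanged."""
    with urllib.request.urlopen(url) as response:
        the_page = response.read()  # bytes, not str
    # "wb" writes the bytes as received, so the file keeps the page's
    # original encoding and the browser can decode it as usual.
    with open(path, "wb") as f:
        return f.write(the_page)  # number of bytes written
```

For example, save_page('https://www.google.com', 'site.html') should give a file that renders like the original page, because no str() conversion or re-encoding happens in between.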

Otherwise use urlretrieve, as you already found out.
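If you need the page as a Python str (for example, to process it before saving), a text-mode variant can decode the body first. This sketch assumes the server declares a charset in its Content-Type header and falls back to UTF-8 when it does not; fetch_text and save_text are hypothetical helper names:

```python
import urllib.request


def fetch_text(url: str) -> str:
    """Download url and decode the body using the charset the server declares."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        # errors="replace" substitutes U+FFFD for undecodable bytes
        # instead of raising UnicodeDecodeError.
        return response.read().decode(charset, errors="replace")


def save_text(text: str, path: str, encoding: str = "utf-8") -> None:
    """Write text to path with an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
```

Writing the text back as UTF-8 only displays correctly if the page's own <meta charset> agrees or is absent. When in doubt, the binary approach is simpler.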
