Simplest web browser

Question

10 Years Ago

Hi all,
I am working on the simplest possible web browser:

import socket

mysock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
mysock.connect(('www.py4info.com',80))
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')
while True:
    data=mysock.recv(512)
    print data


mysock.close()

Sometimes it returns error 408 (Request Timeout); at other times it runs smoothly, but the output is incomplete:
HTTP/1.1 200 OK
Date: Sun, 14 Mar 2010 23:52:41 GMT
Server: Apache
Last-Modified: Tue, 29 Dec 2009 01:31:22 GMT
ETag: "143c1b33-a7-4b395bea"
Accept-Ranges: bytes
Content-Length: 167
Connection: close
Content-Type: text/plain

Desired output:

HTTP/1.1 200 OK
Date: Sun, 14 Mar 2010 23:52:41 GMT
Server: Apache
Last-Modified: Tue, 29 Dec 2009 01:31:22 GMT
ETag: "143c1b33-a7-4b395bea"
Accept-Ranges: bytes
Content-Length: 167
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

I also tried increasing the buffer size, but it still didn't work.
Why?

apache python web-browser


chriswelborn 63 ... · Answer 1 · 2014-08-26T22:40:58+00:00

This works OK on my end. Don't use the while True loop; use iter() instead. Here it is, slightly modified for Python 3:

#!/usr/bin/env python3
import socket

# This would be 'import urlparse' in Python 2.
import urllib.parse

url = 'http://www.py4inf.com/code/romeo.txt'
# Using urlsplit to extract the domain from the url.
# This way, the url and domain don't have to be
# hardcoded into the program (they can be if you want).
urlinfo = urllib.parse.urlsplit(url)

# urlinfo.netloc holds the domain that we want to connect to.
print('Connecting to: {}'.format(urlinfo.netloc))
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((urlinfo.netloc, 80))

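# Using str.format() to build the GET line.
# HTTP lines end in \r\n, and a blank line ends the request.
# ..also converting to bytes so it can be sent over the wire.
print('Retrieving: {}'.format(url))
getline = 'GET {} HTTP/1.0\r\n\r\n'.format(url).encode('utf-8')
# sendall() keeps sending until the whole request has gone out.
mysock.sendall(getline)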

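# Iterate over .recv(512) until data is exhausted.
# recv() returns b'' once the server closes the connection,
# and that is the sentinel that ends the loop.
for data in iter(lambda: mysock.recv(512), b''):
    # Decoding the bytes to text before printing/using them.
    # end='' avoids adding an extra newline after every chunk.
    print(data.decode('utf-8'), end='')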

mysock.close()

Using iter() with a sentinel value is my preferred way of doing this. Called with two arguments, iter() takes a callable and a sentinel: it calls the callable repeatedly and stops as soon as the callable returns the sentinel.

So I am saying: "iterate over the results of calling .recv(512) over and over, but stop when b'' is returned." (b'' is an empty byte string; the b prefix means byte string and is not needed in Python 2.)
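To see the sentinel form on its own, here is a minimal sketch with no network involved. io.BytesIO stands in for the socket (my assumption for illustration; its read(n) returns b'' at the end of the data, much as recv(n) does once the peer closes), and the 4-byte chunk size is arbitrary.

```python
import io

# BytesIO is a stand-in for the socket: read(4) returns up to 4 bytes,
# and b'' once everything has been consumed.
fake_socket = io.BytesIO(b'But soft what light')

chunks = []
for data in iter(lambda: fake_socket.read(4), b''):
    # The sentinel itself is never yielded, so data is never empty here.
    chunks.append(data)

print(chunks)
print(b''.join(chunks))
```

The loop ends by itself when read() returns b'', so no break or "if data:" check is needed inside it.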

I am using lambda to provide a callable that ensures recv is called with 512. You could also use functools.partial to achieve the same effect.
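For comparison, a sketch of the functools.partial variant. socket.socketpair() is used here only so the example runs without a real server (that setup is my assumption, not part of the script above); one end plays the server and closes after sending.

```python
import functools
import socket

# A connected pair of sockets: 'server' stands in for the remote host.
server, client = socket.socketpair()
server.sendall(b'Arise fair sun and kill the envious moon\n')
server.close()  # closing makes recv() on the other end return b''

# partial(client.recv, 16) is a callable that behaves like
# lambda: client.recv(16)
recv_chunk = functools.partial(client.recv, 16)
received = b''.join(iter(recv_chunk, b''))
client.close()

print(received.decode('utf-8'), end='')
```

Whether you use lambda or partial is a matter of taste; both just give iter() a zero-argument callable.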

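In your version the loop has no exit condition, so .recv() keeps being called after all the data has arrived. Once the server closes the connection, .recv() returns an empty string instead of data, and the loop never checks for that. While the connection is still open, .recv() is a 'blocking' call that waits for data that may never come. Either way the script never finishes by itself; it stops only when an exception is raised or you interrupt it.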
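If you would rather keep the while True loop, it needs an explicit exit. A minimal sketch, again assuming a socketpair() stand-in for the server so it runs anywhere; read_until_closed is a name I made up for this example.

```python
import socket

def read_until_closed(sock, bufsize=512):
    """Collect data from sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            # recv() returned b'': the other side has closed, so stop.
            break
        chunks.append(data)
    return b''.join(chunks)

# 'server' plays the remote host: it sends a reply and closes.
server, client = socket.socketpair()
server.sendall(b'Who is already sick and pale with grief\n')
server.close()

body = read_until_closed(client)
client.close()
print(body.decode('utf-8'), end='')
```

This does the same job as the iter() loop; the empty-result check is just written out by hand.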

The main difference between the Python 2 and Python 3 versions is that in Python 2 you don't have to call .encode('utf-8') on the data that's sent or .decode('utf-8') on the data that's received, because a Python 2 str is already a byte string. In Python 3 you have to convert the text to bytes and back by encoding/decoding. I actually prefer the explicit Python 3 approach. In Python 2, print data does no decoding at all; it writes the raw bytes to the terminal as they are.
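The encode/decode round trip can be sketched without a connection. The response chunks below are a shortened, made-up stand-in for what the server sends, the request uses the \r\n line endings that HTTP specifies, and decoding once after joining the chunks (rather than chunk by chunk) is my suggestion, since a multi-byte character could otherwise be split across two recv() calls.

```python
url = 'http://www.py4inf.com/code/romeo.txt'

# Text -> bytes before sending (Python 3 sockets only accept bytes).
request = 'GET {} HTTP/1.0\r\n\r\n'.format(url).encode('utf-8')
print(type(request).__name__)

# Pretend these chunks came back from successive recv() calls.
chunks = [
    b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n',
    b'But soft ',
    b'what light',
]

# Bytes -> text after receiving: join first, then decode once.
response = b''.join(chunks).decode('utf-8')

# A blank line separates the headers from the body.
headers, _, body = response.partition('\r\n\r\n')
print(headers)
print(body)
```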