I am trying to write some code that will fetch all hyperlinks from a webpage URL using Python.

The code works perfectly when I run it on my home network, which has no proxy. When I have to show it to my teacher at my university, it never works because of the university's proxy. The proxy that all students use to connect to the internet is proxy4.nehu.ac.in. How do I write the code so that it goes through the proxy port? Please help; I have been trying other things for many days with no luck.
My code is:

import sys
import urllib
import urlparse

from bs4 import BeautifulSoup


def process(url):
    # Download the page and parse it.
    page = urllib.urlopen(url)
    text = page.read()
    page.close()
    soup = BeautifulSoup(text)
    # Write every link, resolved against the page URL, to s.txt.
    with open('s.txt', 'w') as file:
        for tag in soup.findAll('a', href=True):
            tag['href'] = urlparse.urljoin(url, tag['href'])
            print tag['href']
            file.write('\n')
            file.write(tag['href'])


def main():
    if len(sys.argv) == 1:
        print 'No url !!'
        sys.exit(1)
    for url in sys.argv[1:]:
        process(url)


main()

This is the error:

Traceback (most recent call last):
  File "myurl.py", line 26, in <module>
    main()
  File "myurl.py", line 24, in main
    process(url)
  File "myurl.py", line 7, in process
    page = urllib.urlopen(url)
  File "/usr/lib/python2.7/urllib.py", line 84, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 342, in open_http
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 940, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 803, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 755, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 736, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 551, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known

All 3 Replies

A must read

Or a better read
You should really look into and use Requests, Shailang.
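
For anyone who cannot install extra packages on a university machine, here is a rough sketch of the same idea using only the standard library. It is written for Python 3, where urllib2 and urlparse became urllib.request and urllib.parse. The thread gives the proxy host but never the port, so PROXY_PORT below is only a placeholder, and the helper names (proxy_map, extract_links, fetch_links) are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import ProxyHandler, build_opener

# Assumption: the port is NOT given anywhere in the thread. 8080 is a
# placeholder; use the value from your browser's proxy settings.
PROXY_HOST = "proxy4.nehu.ac.in"
PROXY_PORT = 8080


def proxy_map(host, port):
    """Return a scheme -> proxy URL mapping for http and https traffic."""
    proxy_url = "http://%s:%d" % (host, port)
    return {"http": proxy_url, "https": proxy_url}


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> tag."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Parse an HTML string and return the links it contains."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_links(url, proxies):
    """Download url through the given proxies and return its links."""
    opener = build_opener(ProxyHandler(proxies))
    page = opener.open(url, timeout=30)
    try:
        html = page.read().decode("utf-8", "replace")
    finally:
        page.close()
    return extract_links(html, url)
```

Calling fetch_links("http://example.com/", proxy_map(PROXY_HOST, PROXY_PORT)) should then send the request to the proxy instead of resolving the site's name directly, which is where the "Name or service not known" error comes from. If you do go with Requests, the same kind of mapping can be passed as the proxies= keyword of requests.get; in Python 2, urllib.urlopen also accepts a proxies argument.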

commented: Nice. Something I didn't know, thanks. +5

Thanks guys, I will take a look. @snippit @rproffit
