I am trying to write some code that fetches all hyperlinks from a webpage URL using Python.
The code works perfectly when I run it on my home network, without a proxy. When I have to show it to my teacher at my university, it never works because of the proxy the university uses. The proxy that all students use to connect to the internet is proxy4.nehu.ac.in. How do I write the code so that it goes through the proxy port? Please help, I have been trying other things for many days with no luck.
My code is:
import sys
import urllib
import urlparse
from bs4 import BeautifulSoup

def process(url):
    page = urllib.urlopen(url)
    text = page.read()
    page.close()
    soup = BeautifulSoup(text)
    with open('s.txt','w') as file:
        for tag in soup.findAll('a', href=True):
            tag['href'] = urlparse.urljoin(url, tag['href'])
            print tag['href']
            file.write('\n')
            file.write(tag['href'])


def main():
    if len(sys.argv) == 1:
        print 'No url !!'
        sys.exit(1)
    for url in sys.argv[1:]:
        process(url)

main()
This is the error:
Traceback (most recent call last):
  File "myurl.py", line 26, in <module>
    main()
  File "myurl.py", line 24, in main
    process(url)
  File "myurl.py", line 7, in process
    page = urllib.urlopen(url)
  File "/usr/lib/python2.7/urllib.py", line 84, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 342, in open_http
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 940, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 803, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 755, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 736, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 551, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known
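
For reference, this is roughly what I understand "going through the proxy" to mean in code. It is only a sketch, not something confirmed to work on the university network. It is written for Python 3's urllib.request; Python 2.7, which the code above runs on, has the same ProxyHandler and build_opener functions in urllib2. The helper name make_proxy_opener is made up for illustration, and the port number in the usage comment is a placeholder, since the only detail known is the proxy host name.

```python
import urllib.request


def make_proxy_opener(host, port):
    """Build an opener that sends HTTP and HTTPS requests through host:port.

    Assumption: the proxy accepts plain HTTP proxy connections and needs
    no username or password.
    """
    proxy_url = "http://%s:%d" % (host, port)
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (the port below is a hypothetical placeholder, not the real one):
#   opener = make_proxy_opener("proxy4.nehu.ac.in", 8080)
#   page = opener.open("http://example.com/")
#   text = page.read()
#   page.close()
```

The idea would be for process() to call opener.open(url) instead of urllib.urlopen(url). As far as I know, urllib also picks up a proxy from the http_proxy environment variable if it is set before the script runs, which might avoid changing the code at all.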