How do I read a .txt/.csv file from an internet address? For example: http://www.internetaddress.com/file.txt I don't think file() would work for this.

Thanks

Basic example with for loop

# Python 2: urlopen() lives in the urllib2 module
from urllib2 import urlopen

ur = urlopen("http://www.daniweb.com/forums/thread161312.html")  # open the URL
contents = ur.readlines()  # read all lines from the remote file
fo = open("test.txt", "w")  # open test.txt for writing
for line in contents:
    print "writing %s to a file" % (line,)
    fo.write(line)  # write each line from the URL to the text file
fo.close()  # close the text file

Thanks for the help. That solved my problem.

How do I remove all the HTML tags?

urlopen() does not seem to work for me, as in I cannot import it. I am using Python 3.4.3 though.

In Python 3, urlopen() is in the urllib.request module. You can go to https://docs.python.org/3/index.html and type the name of a function in the quick search box to find it in the documentation.
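As a rough Python 3 sketch of the original question (reading a .csv from a URL), the helper below fetches the file with urllib.request.urlopen and parses it with the csv module. The name read_csv_from_url and the assumption that the file is UTF-8 encoded are illustrative and not from the thread.

```python
import csv
import io
from urllib.request import urlopen


def read_csv_from_url(url, encoding="utf-8"):
    """Download a CSV file and return its rows as lists of strings.

    Hypothetical helper: assumes the remote file fits in memory and is
    encoded as `encoding` (UTF-8 unless told otherwise).
    """
    with urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself
    return list(csv.reader(io.StringIO(text, newline="")))
```

Calling read_csv_from_url("http://www.internetaddress.com/file.txt") would return one list per line of the file, with every field as a str; for a plain .txt file, raw.decode(encoding).splitlines() is enough.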

Here are the different ways, including what I would call the preferred way these days: Requests.

Python 2:

from urllib2 import urlopen

page_source = urlopen("http://python.org").read()
print page_source

Python 3:

from urllib.request import urlopen

page_source = urlopen('http://python.org').read().decode('utf_8')
print(page_source)

In Python 3, read() returns bytes rather than str, so we decode the result as UTF-8 to get a str.
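Not every server sends UTF-8, so a hard-coded decode can fail or garble text. A minimal sketch, assuming the server states its encoding in the Content-Type header: read the charset from the response headers and fall back to UTF-8 only when none is given. The name fetch_text and its fallback_encoding parameter are made up for this example.

```python
from urllib.request import urlopen


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the body at `url` as a str.

    Hypothetical helper: uses the charset from the Content-Type header
    when one is present, otherwise `fallback_encoding`.
    """
    with urlopen(url) as response:
        raw = response.read()  # bytes
        # None when the header carries no charset parameter
        charset = response.headers.get_content_charset()
    return raw.decode(charset or fallback_encoding)
```

With this, print(fetch_text('http://python.org')) plays the same role as the read().decode('utf_8') line, without assuming the encoding up front.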

Here it is with Requests, which works on both Python 2 and 3:

import requests

page_source = requests.get('http://python.org')
print(page_source.text)

For basic web scraping, we read the page with Requests and parse it with BeautifulSoup.

import requests
from bs4 import BeautifulSoup    

page_source = requests.get('http://python.org')
soup = BeautifulSoup(page_source.text, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.find('title').text) #--> Welcome to Python.org
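On the earlier question about removing all the HTML tags: with BeautifulSoup, soup.get_text() returns the page text without markup. If installing a third-party package is not an option, the sketch below does roughly the same with the standard library's html.parser instead of BeautifulSoup. The names TextExtractor and strip_tags are made up for this example, and skipping script and style contents is my own assumption about what "remove the tags" should mean.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse the runs of whitespace left behind by removed tags
        return " ".join("".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Something like strip_tags(page_source.text) would then give the page text as one whitespace-normalized string.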