Hello. I have a homework assignment. I have been asked to create a web crawler that can visit a music website and, as a first step, collect the names of singers whose names start with the letter "A". I need a little help with this step. How should my crawler know which words on the page are singers' names? It should find the names in a particular tag, correct? But what kind of tag? The names could be in any tag, for example <h4></h4>, or in a single <p></p> tag, or in a …
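
One way to approach the question above is to look at the page source first, find which tag (and class, if any) wraps the singer names, and then collect only the text inside that tag. The sketch below uses only the standard library's `html.parser`. The tag name `h4`, the class `artist`, and the sample HTML are hypothetical stand-ins, since the real site's markup isn't known here.

```python
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collect the text inside one kind of tag, optionally limited to a CSS class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False   # True while we are inside a matching tag
        self.parts = []       # text fragments of the current match
        self.names = []       # all names found so far

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self.inside = True
            self.parts = []

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if self.inside and tag == self.tag:
            name = "".join(self.parts).strip()
            if name:
                self.names.append(name)
            self.inside = False


def singers_starting_with(html, letter, tag="h4", css_class="artist"):
    """Return the names found in the given tag that start with `letter`."""
    parser = NameCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return [n for n in parser.names if n.upper().startswith(letter.upper())]


# Hypothetical markup; a real page has to be inspected to find the right tag.
SAMPLE_HTML = """
<h4 class="artist">Adele</h4>
<h4 class="artist">Bob Dylan</h4>
<p>About us</p>
<h4 class="artist">Aretha Franklin</h4>
"""

print(singers_starting_with(SAMPLE_HTML, "A"))  # ['Adele', 'Aretha Franklin']
```

If the names turn out to sit in a different tag, only the `tag` and `css_class` arguments would need to change.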

+0 forum 12

Hello, me again :) With this code:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> url = urllib2.urlopen('http://www.python.org').read()
>>> soup = BeautifulSoup(url)
>>> links = soup('a')
>>> print links

a list of links was printed to the terminal. I want to write the list to a text file, so I tried this:

>>> with open('python-links.txt.', 'w') as f:
...     f.write(links)

But there was an error:

  File "<stdin>", line 2, in <module>
TypeError: expected a character buffer object

What is the problem? How can I fix it?

--------------------------------------------------------------------

And one more question: that list looks like this (I will copy only a small …
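
The TypeError in the post above is consistent with `links` being a list of tag objects, while `write()` expects a string. Below is a minimal sketch of one fix: convert each item to text and write one item per line. The helper name `write_items` and the sample list are hypothetical, and plain strings stand in for the BeautifulSoup tags so the example needs no third-party package. It is written for Python 3 rather than the Python 2 used in the post.

```python
def write_items(path, items):
    """Write each item as text on its own line; return the number of lines written."""
    lines = [str(item) for item in items]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
        if lines:
            f.write("\n")  # end the file with a newline
    return len(lines)


# Stand-ins for the tag objects held in `links`.
sample_links = ['<a href="/about/">About</a>', '<a href="/downloads/">Downloads</a>']
write_items("python-links.txt", sample_links)
```

Writing `str(links)` in a single call would also avoid the error, but it puts the whole list on one line, which is harder to read back later.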

+0 forum 3

Hi again. I want to create a robot, spider, or crawler with Python's urllib. I still couldn't find a good tutorial. Any suggestions?

+0 forum 3

Hello. I want to learn Python's urllib. I have installed it and am now looking for a good tutorial. Any suggestions?

+0 forum 4

Hi friends!

import urllib
url = 'http://www.python.org'
text = urllib.urlopen(url).read()

I typed the code above in the terminal, and on the next line `print text` printed an HTML file there. I want to save it to a text file. How can I do that?
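
A minimal sketch of saving the downloaded page to a file, written for Python 3, where `urllib.urlopen` has moved to `urllib.request.urlopen`. The function names `fetch_text` and `save_text`, the output filename, and the UTF-8 fallback are assumptions made for illustration.

```python
import urllib.request


def fetch_text(url):
    """Download a page and decode it to str (UTF-8 if no charset is declared)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_text(text, path):
    """Write text to a file, replacing any existing content."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Usage (needs network access):
#   save_text(fetch_text("http://www.python.org"), "python.html")
```

Keeping the download and the file write in separate functions means the saving step can be tried out on any string without touching the network.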

+0 forum 3

This is a script that was supposed to be very basic, just running a command with my script's arguments attached. I didn't realize that if user 'cj' opens Firefox, it doesn't work when user 'root' runs '**firefox -new-tab**'. Firefox will just open a new window, and that's exactly what I didn't want to happen. The whole point was to open a new tab. So finding the user who logged onto the desktop when the script was run by root became the challenge. You can find out who ran the script in a lot of ways, but that's not what I wanted. Sometimes …

+0 forum 3

I am running into an issue with scraping data. If I hardcode the value for the key "lbo race" in the code below, it scrapes the data, but if I set "lbo race" from a variable that is read in, it doesn't seem to scrape the data correctly. I tried adding a delay to slow it down, but that doesn't seem to be the issue. Would I use threading to solve this problem? Thanks!

import urllib.parse
import urllib.request
import csv
import time

def parseTable(html):
    #Each "row" of the HTML table will be a list, …

+0 forum 1

Hi, I'm trying to make a program that will go to 4chan and download the images in a thread (i.e. [url]http://4chan.org/b[/url]). The program works the first time, but when I run it again it tries to download the same URLs as it did the first time, and those have 404'ed. Please help. Also, 4chan is NSFW (most of you probably know about it, but just in case). Edit: I just tested the code on 4chan.org/s and it works perfectly, so I don't know why it's not working on 4chan.org/b. Here's my code: …

+0 forum 0

Hi all, I'm trying to use Python's urllib to get a Facebook profile page. I get the following error:

[CODE]IOError: [Errno socket error] [Errno 10035] A non-blocking socket operation could not be completed immediately[/CODE]

Here's my code:

[CODE]
import urllib
member_profile_text = urllib.urlopen('http://www.facebook.com/profile.php?id=1073109649').read()
[/CODE]

I need to get this working soon, so any help would be extremely appreciated. Thanks so much! Benjamin

+0 forum 8

Hi, I am continuing with one of my first projects in Python. This module of my larger project will download files (legal, freely published by sporting associations). I am not sure my approach is the best way to do this; I have read the urllib docs on python.org and the syntax itself seems okay. For example, the first site publishes the information in zip files, each containing 5 csv files. My aim is to save these files to the hard drive as read-only and, in a later module, take the csv files and format them for input to a database (new …

-1 forum 2

Hey guys, I am making a program called 'Weather Watch' which gets weather updates for any city you type in. For now, it only gets info for one particular city. I don't know how to search for the entered term on [url]www.weather.com[/url] and then get the updates. The code so far:

[code='python']
import urllib2 as url
import os
import time
os.system('cls')
print("[Content provided by The Weather Channel]")
time.sleep(3)
os.system('cls')
print("Please wait...this may take a few seconds.")
time.sleep(3)
os.system('cls')
condition=True
try:
    url_open=url.urlopen("http://www.weather.com")
    lines=url_open.readlines()[1192:1199]
except:
    condition=False
if(condition):
    print("Weather forecast for Bangalore, INDIA\n")
    for x in lines:
        onsplit=x.split(">")
        tags=onsplit[2][:-5]
        if(tags=="Pressure:"):
            data=onsplit[4][:5]
        elif(tags=="Dew …

+0 forum 5

The End.