screen scraping
Thread Solved
I am trying to run the following screen scraping script but it's not displaying any output. Can someone tell me what I'm doing wrong?
    from BeautifulSoup import BeautifulSoup
    import urllib

    url = 'http://toronto.en.craigslist.ca/search/cta?query=civic&minAsk=min&maxAsk=max'
    doc = urllib.urlopen(url).read()
    soup = BeautifulSoup(doc)
    tags = soup.findAll('p')
    for tag in tags:
        addate = tag.contents[0]
        path = tag.contents[1].attrs[0][1]
        desc = tag.next.next.string
        print addate, path, desc
I ran the code unmodified and got the output quoted below.
How are you running the code? If you're just double-clicking the .py file, perhaps the console window is closing before you're able to see the output...
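If the closing console window is the problem, one common workaround is to make the script wait for a keypress before it exits. Below is a minimal sketch of that idea, written for Python 3; `pause_before_exit` and `main` are hypothetical names, not part of the original script, and the printed line is just one row copied from the output in this thread. Under Python 2, which the script above targets, the equivalent built-in is `raw_input` rather than `input`.

```python
import sys


def pause_before_exit(prompt="Press Enter to close...", reader=input):
    """Wait for Enter so a double-clicked console window stays open.

    `reader` is injectable only to make the helper easy to test.
    """
    try:
        reader(prompt)
    except EOFError:
        # stdin is closed or redirected, so there is nothing to wait for.
        pass


def main():
    # Stand-in for the scraping loop: the real script prints one row per listing.
    print("Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -")


if __name__ == "__main__":
    main()
    # Only pause when attached to an interactive console.
    if sys.stdin is not None and sys.stdin.isatty():
        pause_before_exit()
```

Running the script from an already open command prompt (for example `python script.py`) avoids the issue without any code change, since that window does not close when the script finishes.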
Output:
Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -
Nov 12 - /tor/cto/915669421.html FS; 1993 HONDA CIVIC CX HATCHBACK (EG) - $850 -
Nov 12 - /tor/cto/915654012.html FS: 1997 HONDA CIVIC CX HATCHBACK - $1500 -
Nov 11 - /yrk/cto/915504337.html 95 civic ex coupe -
Nov 11 - /mss/cto/915500509.html 997 HONDA CIVIC -
Nov 11 - /tor/cto/915425141.html 2006 honda civic DX-g - $15000 -
Nov 11 - /yrk/cto/915372101.html 1999 HONDA CIVIC EX, 4 DOOR, AUTO, $4000 !! - $4000 -
... (Continues for 142 lines)
Last edited by jlm699; Nov 12th, 2008 at 7:01 am.
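For anyone who wants to try the extraction logic without hitting the live site, here is a rough sketch that pulls the same three fields (date, path, description) out of a small inline HTML sample. It uses Python 3's standard `html.parser` instead of BeautifulSoup, so it is a swapped-in technique and not the original poster's method. The markup in `SAMPLE_HTML` is an assumption inferred from the script and the output above (a `<p>` holding a date string followed by a link); the real page may be structured differently. `ListingParser` and `parse_listings` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed markup, inferred from the script and its output in this thread.
SAMPLE_HTML = """
<p> Nov 12 - <a href="/tor/cto/915709511.html">FS: 2004 Honda Civic Si Low Km - $12500 -</a></p>
<p> Nov 11 - <a href="/yrk/cto/915504337.html">95 civic ex coupe -</a></p>
"""


class ListingParser(HTMLParser):
    """Collect (date, path, description) for each <p> that contains a link."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_p = False
        self._in_a = False
        self._date, self._path, self._desc = [], None, []

    def flush_row(self):
        # Store the current paragraph if it had a link, then reset the buffers.
        if self._path is not None:
            self.rows.append(("".join(self._date).strip(),
                              self._path,
                              "".join(self._desc).strip()))
        self._date, self._path, self._desc = [], None, []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush_row()  # also covers a previous <p> that was never closed
            self._in_p = True
        elif tag == "a" and self._in_p and self._path is None:
            # Only the first link in the paragraph is used.
            self._in_a = True
            self._path = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False
        elif tag == "p":
            self.flush_row()
            self._in_p = False

    def handle_data(self, data):
        if self._in_a:
            self._desc.append(data)   # link text is the description
        elif self._in_p and self._path is None:
            self._date.append(data)   # text before the link is the date


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    parser.flush_row()  # pick up a trailing paragraph with no closing tag
    return parser.rows


for date, path, desc in parse_listings(SAMPLE_HTML):
    print(date, path, desc)
```

Paragraphs without a link are skipped rather than raising an error, which differs from the original loop, where `tag.contents[1]` would fail on a `<p>` that has fewer than two children.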