DaniWeb IT Discussion Community


ccandillo Nov 11th, 2008 10:43 pm
screen scraping
 
I am trying to run the following screen scraping script but it's not displaying any output. Can someone tell me what I'm doing wrong?

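from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)
import urllib

# Craigslist Toronto search results for "civic"
url = 'http://toronto.en.craigslist.ca/search/cta?query=civic&minAsk=min&maxAsk=max'

# Download the results page and parse it
doc = urllib.urlopen(url).read()
soup = BeautifulSoup(doc)

# Each listing is expected to sit in its own <p> tag
tags = soup.findAll('p')
for tag in tags:
    addate = tag.contents[0]            # first child: text node holding the ad date
    path = tag.contents[1].attrs[0][1]  # second child's first attribute value (the link path)
    desc = tag.next.next.string         # text of the link: the ad description
    print addate, path, desc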

jlm699 Nov 12th, 2008 7:00 am
Re: screen scraping
 
I ran the code unmodified and got this:
Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -
Nov 12 - /tor/cto/915669421.html FS; 1993 HONDA CIVIC CX HATCHBACK (EG) - $850 -
Nov 12 - /tor/cto/915654012.html FS: 1997 HONDA CIVIC CX HATCHBACK - $1500 -
Nov 11 - /yrk/cto/915504337.html 95 civic ex coupe -
Nov 11 - /mss/cto/915500509.html 997 HONDA CIVIC -
Nov 11 - /tor/cto/915425141.html 2006 honda civic DX-g - $15000 -
Nov 11 - /yrk/cto/915372101.html 1999 HONDA CIVIC EX, 4 DOOR, AUTO, $4000 !! - $4000 -
... (Continues for 142 lines)

How are you running the code? If you're just double-clicking the .py file, perhaps the console is closing before you're able to see the output...
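
If the console window really is closing too quickly, one workaround is to save the results to a file and, optionally, wait for a keypress before exiting. The following is only a minimal sketch in Python 3 syntax (the script in this thread is Python 2, where `input` would be `raw_input`). The names `save_output` and `pause_before_exit`, the file name `scrape_output.txt`, and the sample line are all made up for illustration.

```python
import os
import tempfile


def save_output(lines, path):
    """Write each line to a text file so the results outlive the console window."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    return path


def pause_before_exit(prompt="Press Enter to close..."):
    """Block until the user presses Enter; call this as the script's last step."""
    input(prompt)


# Hypothetical sample line standing in for the scraper's real output.
sample = ["Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -"]
log_path = save_output(sample, os.path.join(tempfile.gettempdir(), "scrape_output.txt"))
print("Output saved to", log_path)
```

Calling `pause_before_exit()` at the very end of a double-clicked script should keep the window open until Enter is pressed. Running the script from an already open console avoids the problem entirely.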

ccandillo Nov 12th, 2008 9:54 am
Re: screen scraping
 
My bad. I was running the code from IDLE and kept getting a 'RuntimeError: maximum recursion depth exceeded' error message. I am not quite sure why, but it works from the console. Thanks!
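
The thread does not establish why IDLE raises this error. One plausible explanation, offered here only as an assumption, is that IDLE passes printed objects to its shell process, and the parser's string objects are string subclasses that carry links back into the parse tree. If that is the cause, converting each value to a plain built-in string before printing should avoid it. The sketch below is Python 3 and uses a hypothetical stand-in class, `LinkedText`, rather than BeautifulSoup itself; `to_plain_text` is likewise an illustrative name.

```python
class LinkedText(str):
    """Hypothetical stand-in for a parser string that keeps a link to its parent node."""

    def __new__(cls, text, parent=None):
        obj = super().__new__(cls, text)
        obj.parent = parent
        return obj


def to_plain_text(value):
    """Return an exact built-in str, dropping subclass state such as parent links."""
    return str(value)


# Sample description copied from the output posted earlier in the thread.
desc = LinkedText("FS: 1997 HONDA CIVIC CX HATCHBACK - $1500 -", parent=object())
plain = to_plain_text(desc)
print(type(plain).__name__, plain)
```

In the Python 2 script from this thread, the equivalent step would probably be wrapping each printed value in `unicode(...)`, for example `unicode(desc)`, though that is untested here.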


All times are GMT -4.
