screen scraping

Thread Solved

ccandillo, post #1, Nov 11th, 2008
I am trying to run the following screen scraping script but it's not displaying any output. Can someone tell me what I'm doing wrong?

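# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib

url = 'http://toronto.en.craigslist.ca/search/cta?query=civic&minAsk=min&maxAsk=max'

# Download the search results page and parse it
doc = urllib.urlopen(url).read()
soup = BeautifulSoup(doc)

# Each listing sits in its own <p> element
tags = soup.findAll('p')
for tag in tags:
    addate = tag.contents[0]            # first child: the date text
    path = tag.contents[1].attrs[0][1]  # second child: value of the link's first attribute (its href)
    desc = tag.next.next.string         # two parse steps forward: the link, then its text
    print addate, path, desc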
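For readers who want to try the same extraction without BeautifulSoup installed, here is a minimal sketch using only Python 3's standard-library html.parser, which is a swapped-in technique and not what the script above uses. The markup in SAMPLE_HTML is hypothetical, since the thread never shows the page source; it assumes each listing is a <p> holding the date text followed by a link, which is what the indexing in the script implies. ListingParser is an illustrative name, not part of any library.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect (date, path, description) from each <p> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (date, path, desc) tuples
        self._in_p = False    # currently inside a <p> element?
        self._in_a = False    # currently inside the first <a> of that <p>?
        self._date = ""
        self._path = None
        self._desc = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start a fresh row
            self._in_p, self._date, self._path, self._desc = True, "", None, ""
        elif tag == "a" and self._in_p and self._path is None:
            self._in_a = True
            self._path = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            if self._path is not None:
                self.rows.append(
                    (self._date.strip(), self._path, self._desc.strip())
                )

    def handle_data(self, data):
        if self._in_a:
            self._desc += data   # link text is the description
        elif self._in_p and self._path is None:
            self._date += data   # text before the first link is the date


# Hypothetical sample markup, not copied from the real page
SAMPLE_HTML = """
<p> Nov 12 - <a href="/tor/cto/915709511.html">FS: 2004 Honda Civic Si Low Km - $12500 -</a></p>
<p> Nov 11 - <a href="/yrk/cto/915504337.html">95 civic ex coupe -</a></p>
"""

parser = ListingParser()
parser.feed(SAMPLE_HTML)
for date, path, desc in parser.rows:
    print(date, path, desc)
```

Feeding a saved copy of the page instead of SAMPLE_HTML would be the next step, but whether the real markup matches this assumed shape is something to check against the actual source.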
jlm699, post #2, Nov 12th, 2008
I ran the code unmodified and got this:
Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -
Nov 12 - /tor/cto/915669421.html FS; 1993 HONDA CIVIC CX HATCHBACK (EG) - $850 -
Nov 12 - /tor/cto/915654012.html FS: 1997 HONDA CIVIC CX HATCHBACK - $1500 -
Nov 11 - /yrk/cto/915504337.html 95 civic ex coupe -
Nov 11 - /mss/cto/915500509.html 997 HONDA CIVIC -
Nov 11 - /tor/cto/915425141.html 2006 honda civic DX-g - $15000 -
Nov 11 - /yrk/cto/915372101.html 1999 HONDA CIVIC EX, 4 DOOR, AUTO, $4000 !! - $4000 -
... (Continues for 142 lines)
How are you running the code? If you're just double-clicking the .py file, perhaps the console is closing before you're able to read the output...
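If the window really is closing too quickly, one common workaround is to pause before the script exits. Below is a minimal sketch in Python 3 (the thread's script is Python 2, where raw_input plays the role of input). The names run_then_pause and scrape are hypothetical and only illustrate the idea.

```python
import traceback


def run_then_pause(main, pause=input):
    """Run main(), then wait for Enter so a double-clicked console stays open."""
    try:
        main()
    except Exception:
        # Print the error now; otherwise it would only appear as the window closes
        traceback.print_exc()
    finally:
        pause("Press Enter to close this window...")


def scrape():
    # Stand-in for the real scraping code
    print("Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -")


# In a real script, the last line would be: run_then_pause(scrape)
```

Running the script from a terminal that is already open avoids the problem without any code changes, since the output stays on screen after the script ends.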
Last edited by jlm699; Nov 12th, 2008 at 7:01 am.
ccandillo, post #3, Nov 12th, 2008
My bad. I was running the code from IDLE and kept getting a 'RuntimeError: maximum recursion depth exceeded' error message. I'm not sure why, but it works from the console. Thanks!
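The thread never pins down the cause of that error. One plausible explanation, offered here as an assumption and not a diagnosis, is that IDLE passes printed objects to its shell window through an internal channel, and BeautifulSoup's string objects (subclasses of the built-in string type that also carry references into the parse tree) do not travel well that way. If that is what happened, converting each value to a plain built-in string before printing would sidestep it. The sketch below is Python 3 and uses a hypothetical TreeString class as a stand-in for the parser's string type, so it runs without BeautifulSoup.

```python
class TreeString(str):
    """Hypothetical stand-in for a parser's string type: text plus a parent link."""

    def __new__(cls, text, parent=None):
        obj = super().__new__(cls, text)
        obj.parent = parent   # reference back into the (imaginary) parse tree
        return obj


def to_plain(value):
    """Return a built-in str holding only the text, with no tree references."""
    return str(value)


# Values shaped like what the scraper would pull out of one listing
row = [TreeString("Nov 12 -"), TreeString("/tor/cto/915709511.html")]
plain = [to_plain(v) for v in row]
print(*plain)
```

In the thread's Python 2 script, the analogous change would be wrapping each value in unicode(...) before the print statement; whether that cures the IDLE error in this particular case is untested here.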

©2003 - 2009 DaniWeb® LLC