
Nov 11th, 2008

screen scraping

I am trying to run the following screen-scraping script, but it doesn't display any output. Can someone tell me what I'm doing wrong?

    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)
    import urllib

    url = 'http://toronto.en.craigslist.ca/search/cta?query=civic&minAsk=min&maxAsk=max'

    doc = urllib.urlopen(url).read()
    soup = BeautifulSoup(doc)
    tags = soup.findAll('p')                # one <p> per listing
    for tag in tags:
        addate = tag.contents[0]            # leading text node: the ad date
        path = tag.contents[1].attrs[0][1]  # value of the link's first attribute (its href)
        desc = tag.next.next.string         # text inside that link
        print addate, path, desc
Posted by ccandillo
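The script above targets Python 2: `urllib.urlopen`, the `print` statement and the BeautifulSoup 3 package are not available in Python 3. For readers on Python 3, here is a minimal standard-library-only sketch of the same extraction, using `html.parser.HTMLParser` in place of BeautifulSoup. It assumes each listing is a `<p>` whose leading text is the date and whose first `<a>` holds the path in `href` and the description as its text, which is what the original `contents[0]`, `contents[1]` and `tag.next.next` lookups imply; Craigslist's current markup is very likely different. The `ListingParser` and `parse_listings` names are hypothetical.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect (date, path, description) tuples from rows shaped like
    <p>Nov 12 - <a href="/tor/cto/123.html">description</a></p>.
    This row layout is an assumption inferred from the original script."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (date, path, description) tuples
        self._in_p = False   # currently inside a <p> element
        self._in_a = False   # currently inside the row's first <a>
        self._date = ""
        self._path = None
        self._desc = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Every <p> starts a fresh row.
            self._in_p = True
            self._date, self._path, self._desc = "", None, ""
        elif tag == "a" and self._in_p and self._path is None:
            # Only the first link in a row is treated as the listing link.
            self._in_a = True
            self._path = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self._in_a = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Skip <p> elements that contained no link at all.
            if self._path is not None:
                self.rows.append((self._date.strip(), self._path, self._desc.strip()))

    def handle_data(self, data):
        if self._in_a:
            self._desc += data
        elif self._in_p and self._path is None:
            # Text that appears before the link is the ad date.
            self._date += data


def parse_listings(html):
    """Parse an HTML string and return the list of listing tuples."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

To run it against a live page, the HTML string could come from `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`, the Python 3 counterpart of `urllib.urlopen(url).read()`, and each row could then be shown with `print(addate, path, desc)`.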
Nov 12th, 2008

Re: screen scraping

I ran the code unmodified and got this:
Output:
    Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -
    Nov 12 - /tor/cto/915669421.html FS; 1993 HONDA CIVIC CX HATCHBACK (EG) - $850 -
    Nov 12 - /tor/cto/915654012.html FS: 1997 HONDA CIVIC CX HATCHBACK - $1500 -
    Nov 11 - /yrk/cto/915504337.html 95 civic ex coupe -
    Nov 11 - /mss/cto/915500509.html 997 HONDA CIVIC -
    Nov 11 - /tor/cto/915425141.html 2006 honda civic DX-g - $15000 -
    Nov 11 - /yrk/cto/915372101.html 1999 HONDA CIVIC EX, 4 DOOR, AUTO, $4000 !! - $4000 -
    ... (continues for 142 lines)
How are you running the code? If you're just double-clicking the .py file, the console window may be closing before you get a chance to see the output.
Posted by jlm699 (last edited Nov 12th, 2008 at 7:01 am)
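Besides starting the script from an already-open console, the results can be kept by also writing them somewhere that outlives the window. A minimal Python 3 sketch, assuming rows of `(date, path, description)` like the ones the script prints; the `format_report` and `report` helpers are hypothetical:

```python
import io
import sys


def format_report(rows):
    """Return the rows as text, one 'date path description' line per row."""
    buf = io.StringIO()
    for addate, path, desc in rows:
        # print() joins the fields with single spaces, like the original script.
        print(addate, path, desc, file=buf)
    return buf.getvalue()


def report(rows, out=None):
    """Write the formatted rows to `out` (an open text file or stream);
    defaults to the console via sys.stdout."""
    out = sys.stdout if out is None else out
    out.write(format_report(rows))
```

For example, `with open("listings.txt", "w", encoding="utf-8") as f: report(rows, f)` saves a copy to a hypothetical `listings.txt`, and ending the script with `input("Press Enter to close...")` (`raw_input` in Python 2) keeps a double-clicked console open until Enter is pressed.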
Nov 12th, 2008

Re: screen scraping

My bad. I was running the code from IDLE and kept getting a 'RuntimeError: maximum recursion depth exceeded' error message. I'm not quite sure why, but it works from the console. Thanks!
Posted by ccandillo
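The thread never pins down why IDLE raised the RuntimeError. A likely explanation (an assumption, not confirmed here) is that IDLE runs scripts in a subprocess and sends printed objects over pickle-based RPC; a BeautifulSoup string node is a `str`/`unicode` subclass that still links to the rest of the parse tree, so pickling it can recurse through the whole document. The usual workaround is to print plain strings, e.g. `unicode(addate)` in the original Python 2 script. A minimal Python 3 sketch of that idea, with hypothetical helper names:

```python
import sys


def as_plain_text(*values):
    """Return each value as a built-in str.

    str() on a str subclass that doesn't override __str__ (BeautifulSoup's
    NavigableString is a str subclass) returns an ordinary str copy, so the
    result carries no references back into the parse tree.
    """
    return tuple(str(v) for v in values)


def print_plain(*values, file=None):
    """print() the values only after converting them to plain str."""
    print(*as_plain_text(*values), file=sys.stdout if file is None else file)
```

In the scraping loop this would replace the final print with `print_plain(addate, path, desc)`.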

This thread is solved






© 2011 DaniWeb® LLC