screen scraping
I am trying to run the following screen scraping script but it's not displaying any output. Can someone tell me what I'm doing wrong?
from BeautifulSoup import BeautifulSoup
import urllib

url = 'http://toronto.en.craigslist.ca/search/cta?query=civic&minAsk=min&maxAsk=max'
doc = urllib.urlopen(url).read()
soup = BeautifulSoup(doc)
tags = soup.findAll('p')
for tag in tags:
    addate = tag.contents[0]
    path = tag.contents[1].attrs[0][1]
    desc = tag.next.next.string
    print addate, path, desc
I ran the code unmodified and got the output shown below.
How are you running the code? If you're just double-clicking the .py file, perhaps the console is closing before you're able to see the output...
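If the closing console window does turn out to be the cause, one common workaround is to make the script wait for Enter before it exits. This is only a minimal sketch, assuming Python 3 (on Python 2, as used in the script above, `raw_input` would take the place of `input`). The name `run_then_pause` is hypothetical and not part of the original script.

```python
def run_then_pause(main, wait=input):
    """Run main(), then block until the user presses Enter.

    `wait` is a parameter so the pause can be swapped out in automated runs.
    """
    try:
        return main()
    finally:
        # Keeps a double-clicked console window open, even if main() raised.
        wait("Press Enter to exit...")
```

Wrapping the scraping loop in a function and calling `run_then_pause` on it at the bottom of the script should leave the window open long enough to read the output. Running the script from an already open command prompt avoids the problem without any code change.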
Output:
Nov 12 - /tor/cto/915709511.html FS: 2004 Honda Civic Si Low Km - $12500 -
Nov 12 - /tor/cto/915669421.html FS; 1993 HONDA CIVIC CX HATCHBACK (EG) - $850 -
Nov 12 - /tor/cto/915654012.html FS: 1997 HONDA CIVIC CX HATCHBACK - $1500 -
Nov 11 - /yrk/cto/915504337.html 95 civic ex coupe -
Nov 11 - /mss/cto/915500509.html 997 HONDA CIVIC -
Nov 11 - /tor/cto/915425141.html 2006 honda civic DX-g - $15000 -
Nov 11 - /yrk/cto/915372101.html 1999 HONDA CIVIC EX, 4 DOOR, AUTO, $4000 !! - $4000 -
... (Continues for 142 lines)
Last edited by jlm699; Nov 12th, 2008 at 7:01 am.
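For anyone trying this on Python 3, where the `BeautifulSoup` 3 import and `urllib.urlopen` used above are not available, here is a rough sketch of the same extraction using only the standard library's `html.parser`. It assumes each listing looks like `<p>DATE - <a href="PATH">DESCRIPTION</a></p>`, which is inferred from how the script indexes into each tag and not checked against the live page. `ListingParser`, `parse_listings` and `SAMPLE` are hypothetical names, and `SAMPLE` is hand-written from two of the output lines above, not fetched from the site.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect (date, path, description) from each <p> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_p = False
        self._in_a = False
        self._date, self._path, self._desc = "", None, ""

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start a fresh record for every paragraph.
            self._in_p = True
            self._date, self._path, self._desc = "", None, ""
        elif tag == "a" and self._in_p and self._path is None:
            # Only the first link in the paragraph is treated as the listing.
            self._in_a = True
            self._path = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_a = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            if self._path is not None:
                self.rows.append(
                    (self._date.strip(), self._path, self._desc.strip())
                )

    def handle_data(self, data):
        if self._in_a:
            self._desc += data      # text inside the link
        elif self._in_p and self._path is None:
            self._date += data      # text before the link


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Assumed markup, reconstructed by hand from the output above.
SAMPLE = (
    '<p>Nov 12 - <a href="/tor/cto/915709511.html">'
    'FS: 2004 Honda Civic Si Low Km - $12500 -</a></p>'
    '<p>Nov 11 - <a href="/yrk/cto/915504337.html">95 civic ex coupe -</a></p>'
)

for addate, path, desc in parse_listings(SAMPLE):
    print(addate, path, desc)
```

Paragraphs without a link are skipped here, whereas the original script would raise an error on them, so the behaviour is close to but not identical to the script above.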