Hey guys, I'm a little confused about a web scraping topic. I understand that you can use urllib2 to scrape web pages, but my project requires somewhat more advanced scraping. I want to use Python to input data into a search box and then get back the results of that search. As an example, I want to be able to enter a doctor's name into healthgrades.com, then retrieve the results, as well as the URL links to those results. Can anyone help me out? I've been reading a lot about something called mechanize, but I don't really know where to begin with all of this. Thank you.

An HTML form submits data using HTTP GET or HTTP POST. See which one your form uses and write code accordingly. Since you are using urllib, I assumed you know HTTP GET and HTTP POST; if not, search Google. This HOWTO tells you how to use urllib2 to submit data: http://docs.python.org/howto/urllib2.html
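To make that concrete, here is a rough sketch of both kinds of submission, plus one way to pull the result links out of the page that comes back. It is written against Python 3, where these pieces live in urllib.parse and urllib.request; on Python 2 the equivalents are urllib.urlencode, urllib2.Request and urllib2.urlopen. The address http://example.com/search and the field name "q" are placeholders I made up. The real ones have to be read from the form's action attribute and its input names in the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def build_get_url(action_url, fields):
    """Return the URL a GET form would request: action + '?' + encoded fields."""
    return action_url + "?" + urlencode(fields)


def build_post_request(action_url, fields):
    """Return a Request carrying the encoded fields as its body (sent as POST)."""
    body = urlencode(fields).encode("utf-8")
    return Request(action_url, data=body)


def fetch(target):
    """Send a URL or Request and return the response body as text (needs network)."""
    with urlopen(target) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


# Placeholder endpoint and field name (assumptions, not the real site's form).
search_url = build_get_url("http://example.com/search", {"q": "John Smith"})
post_request = build_post_request("http://example.com/search", {"q": "John Smith"})

# With a real endpoint you would then do something like:
#     html = fetch(search_url)        # or fetch(post_request)
#     collector = LinkCollector(search_url)
#     collector.feed(html)
#     print(collector.links)
```

Note that this collects every link on the page, so you would still need to filter for the ones that are actual search results. It also will not work if the site builds its results with JavaScript after the page loads.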

Mechanize is the standard answer to "how do I fill in a form" questions. I have not used it, so I cannot say what would be required to use it.

Hmm, I didn't know that urllib could do those kinds of GET and POST requests. I'll give that a go. Mechanize seems really confusing to me.

Also, can you run this type of urllib2 code on a development computer, or does the Python file have to be on a server? Thank you.

Any machine that can connect to the website you intend to scrape is suitable.

(donning my lawyer hat): Be sure you are following the fair use rules as described at healthgrades.com and at any other site you would like to scrape. Because people are so much slower than machines, many sites forbid scraping by a program or restrict the number of queries per hour. Of course, fair use is whatever they say it is on their site.
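If you do go ahead, one simple way to stay under a query limit is to put a fixed pause between requests. Below is a minimal sketch, not anything the site prescribes: Throttle and allowed_by_robots are helper names I invented, and the two-second interval is an arbitrary example. The robots.txt check uses the standard library's urllib.robotparser (Python 3). Keep in mind that robots.txt is only a machine-readable hint and does not replace reading the site's written terms.

```python
import time
from urllib.robotparser import RobotFileParser


class Throttle:
    """Enforce a minimum gap, in seconds, between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to honor min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example interval only; pick one that respects the site's stated limits.
throttle = Throttle(min_interval=2.0)
# In a scraping loop, call throttle.wait() before each request.
```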
