Hey guys, I'm a little confused about a web scraping topic. I understand that you can use urllib2 to scrape web pages, but my project requires somewhat more advanced scraping. I want to use Python to input data into a search box, then return the results of that search. As an example, I want to be able to input a doctor's name into healthgrades.com, then retrieve the results, as well as the URL links to those results. Can anyone help me out? I've been reading a lot about something called mechanize, but I don't really know where to begin with all of this. Thank you.

All 6 Replies

Try BeautifulSoup, or try the Python standard library parsers: http://docs.python.org/library/markup.html
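For the "url links to those results" part, here is a minimal sketch using only the standard library parser, written for Python 3. The names `LinkCollector` and `extract_links` are made up for illustration, and the sample HTML is invented; a real results page will need its own filtering to separate result links from navigation links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (e.g. <a name="...">).
            if name == "href" and value:
                # urljoin turns relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Invented sample markup standing in for a downloaded results page.
    sample = '<ul><li><a href="/doc/1">First</a></li><li><a href="/doc/2">Second</a></li></ul>'
    for link in extract_links(sample, "https://www.example.com/search"):
        print(link)
```

BeautifulSoup would make the filtering step shorter, but it is a separate install; the sketch above sticks to what ships with Python.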

These seem to be ways to scrape information only. But I want to be able to actually input data on a website. So if a website had a search bar, I want to be able to input a search term using Python, and retrieve the results in Python.

Member Avatar for GreenDay2001

An HTML form submits data using HTTP GET or HTTP POST. See which one your form uses and write code accordingly. Since you are using urllib, I assumed you know HTTP GET and HTTP POST. If not, search Google. This HOWTO explains how to use urllib2 to submit data: http://docs.python.org/howto/urllib2.html
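To make the GET versus POST difference concrete, here is a rough sketch for Python 3, where urllib2's functionality lives in `urllib.request` and `urllib.parse`. The URL `https://www.example.com/search` and the field name `q` are placeholders, not the real healthgrades.com form; look at the form's `action` attribute and its input names in the page source to find the actual values.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint (assumption): replace with the form's real action URL.
SEARCH_URL = "https://www.example.com/search"


def build_get_request(base_url, fields):
    """GET: the form fields travel in the query string of the URL."""
    return Request(base_url + "?" + urlencode(fields))


def build_post_request(base_url, fields):
    """POST: the form fields travel in the request body as bytes."""
    body = urlencode(fields).encode("utf-8")
    # Supplying data makes urllib send the request as a POST.
    return Request(base_url, data=body)


def fetch(request):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "q" is a made-up field name; this only prints the request, it sends nothing.
    req = build_get_request(SEARCH_URL, {"q": "John Smith"})
    print(req.get_method(), req.full_url)
```

Calling `fetch(req)` would perform the actual search and return the results page as a string, which can then be handed to an HTML parser.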

Mechanize is the standard answer to "how do I fill in a form" type questions. I have not used it, so I cannot say what would be required to use it.

Hmm, I didn't know that urllib can make those types of GET and POST requests. I'll give that a go. And Mechanize seems really confusing to me.

Also, can you run this type of urllib2 code on a development computer, or does the Python file have to be on a server? Thank you.

Any machine that can connect to the website you intend to scrape is suitable.

(donning my lawyer hat): Be sure you are following the fair use rules as described at healthgrades.com and any other place you would like to scrape information. Because people are so much slower than machines, many sites prohibit scraping with a program or restrict the number of queries per hour, and so on. Of course, fair use is whatever they say it is on their site.
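If a site does allow automated queries but caps their rate, one way to stay under the cap is to pause between requests. This is only a sketch: the `Throttle` class is a made-up helper, and the two-second interval is an arbitrary example, not a number taken from any site's rules.

```python
import time


class Throttle:
    """Ensures at least min_interval seconds pass between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the class can be tested without real delays.
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(2.0)  # arbitrary example interval
    for name in ["first query", "second query"]:
        throttle.wait()  # call this before each request
        print("would send:", name)
```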
