Python Question: HTML Extracting

Question

jdm3 0 Newbie Poster

13 Years Ago

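import urllib.request as u  # Python 3 home of urlopen
zipc = "47408"
url = "http://watchdog.net/us/?zip="
conn = u.urlopen(url+zipc)
content = conn.readlines()  # list of bytes objects, one per line
for line in content:
    line = line.decode("utf-8")  # bytes -> str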

I am working on a CGI/Python project that takes a zipcode as input and then finds the name of a certain politician. Right now I am trying to find a way to extract that name from the page source.

I am currently inputting the zipcode 47408 into the watchdog webpage. This brings up Baron Hill's district. From the page source I need to extract this: <a href="/p/NAME">, where NAME is the name of the politician for the given zipcode, in this case baron_hill.

My question is how do I extract the 'baron_hill' or NAME part from the page source?

I was told to use the "find" method, but I cannot get it to work quite how I'd like.
Above is the small piece of code I've been using to give you all a general idea of what I'm doing.

I am working with Python 3.

Thanks for any input!

html-css python

Edited 13 Years Ago by jdm3


All 2 Replies

TrustyTony 888 ex-Moderator

13 Years Ago

I do not quite follow what you want to do: line 7 only reassigns the loop variable and the result is never used, so the loop has no purpose.

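import urllib.request as u  # Python 3; urllib2 exists only in Python 2

zipc = "47408"
url = "http://watchdog.net/us/?zip="
conn = u.urlopen(url + zipc)
for raw_line in conn:
    line = raw_line.decode("utf-8")  # the response yields bytes in Python 3
    if 'Represented by' in line:
        print(line)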
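
Once a matching line is in hand, the "find" method mentioned in the question can slice out the name. The following is only a minimal sketch: `extract_name` is a hypothetical helper, the sample line is made up for illustration, and it assumes the link sits on the same line as the 'Represented by' text, which may not hold for the real page.

```python
def extract_name(line, marker="/p/"):
    """Return the text between marker and the next quote, or None if absent."""
    start = line.find(marker)          # index of "/p/", or -1 when missing
    if start == -1:
        return None
    start += len(marker)               # skip past the marker itself
    # The closing quote may be ' or "; take whichever appears first.
    ends = [i for i in (line.find('"', start), line.find("'", start)) if i != -1]
    if not ends:
        return None
    return line[start:min(ends)]

# Hypothetical sample line; the real page markup may differ.
sample = 'Represented by <a href="/p/baron_hill">Baron Hill</a>'
print(extract_name(sample))
```

Because the slice runs from the end of "/p/" to the next quote character, it does not matter which name appears in the link.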


jdm3 0 Newbie Poster · Answer 1 · 2012-04-14T21:18:02+00:00

Whoops, forgot to include that print statement.

However, after I print all of the content from that web page, I need to extract the part of it that says <a href='/p/NAME'>.

I'm making a program that can search based on user input. For this specific example I am defaulting to 47408 as the zipcode. This brings up a page from which I need to extract <a href='/p/baron_hill'>.

My problem is: how do I extract it if the name will always be different? How do I tell Python to extract the <a href='/p/baron_hill'> part of the page?
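
One possible way to handle a name that changes with every zipcode is a regular expression that captures whatever follows /p/ inside the href. This is a sketch under assumptions, not a tested solution for the live page: `NAME_LINK` and `find_politicians` are hypothetical names, the sample fragment is invented, and the pattern assumes names consist only of letters, digits, underscores, or hyphens.

```python
import re

# Matches href='/p/NAME' or href="/p/NAME" and captures NAME.
# Assumption: NAME contains only letters, digits, underscores, or hyphens.
NAME_LINK = re.compile(r"""href=["']/p/([\w-]+)["']""")

def find_politicians(html):
    """Return every NAME found in /p/NAME links, in page order."""
    return NAME_LINK.findall(html)

# Hypothetical page fragment; the real markup may differ.
sample_html = "<p>Represented by <a href='/p/baron_hill'>Baron Hill</a></p>"
print(find_politicians(sample_html))
```

With the real page, the decoded source (for example conn.read().decode("utf-8")) would be passed in place of sample_html. If the page contains several /p/ links, the list will hold all of them, so some extra filtering may be needed to pick the right one.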
