How to extract the personal address from an HTML document

Question

akie2741 0 Newbie Poster

15 Years Ago

My code is:

import urllib
import re

# URL of the page
urla = 'http://www.sc.iitb.ac.in/~bijnan/personal-details.htm'
# Connect to the website
request = urllib.urlopen(urla)
# Read the HTML file from the website
html = request.read()
# TODO: get the addresses from the HTML above

print html

How can I find all of his addresses in this HTML? I could use the re.compile() method to match the address pattern, but how?

html-css python

djidjadji 28 Light Poster · Answer 1 · 2009-09-27T05:38:21+00:00

That page was made with Microsoft Word,
the WORST HTML editor there is.

A good candidate to use is the HTMLParser module.

First clean up the file and get rid of all the style stuff and comments.
Then you have a better view of what the file is made up of.
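
As a rough sketch of that cleanup step, the following uses Python 3, where the HTMLParser module lives at html.parser. The TextExtractor class, the visible_text helper, and the SAMPLE markup are hypothetical names and data made up for illustration; the real page's structure is not shown in this thread, so the sample only imitates the kind of style block and conditional comment that Word tends to emit.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <style>/<script> bodies and comments."""

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []      # one entry per non-empty run of text

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)

    # handle_comment is deliberately not overridden: the base class
    # ignores comments, so they never reach self.chunks.


def visible_text(html):
    """Return the list of visible text chunks found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up sample imitating Word-generated markup (not the real page).
SAMPLE = """<html><head>
<style>p.MsoNormal {margin: 0cm;}</style>
<!--[if gte mso 9]><xml><o:Author>x</o:Author></xml><![endif]-->
</head><body>
<p class="MsoNormal">Office   Address:</p>
<p class="MsoNormal">Room 101, Example Building,&nbsp;Example City</p>
</body></html>"""

if __name__ == "__main__":
    for chunk in visible_text(SAMPLE):
        print(chunk)
```

Once the text is reduced to chunks like this, it is much easier to see which lines hold the addresses and to decide whether a pattern built with re.compile() is needed at all, or whether picking the chunks that follow a label such as "Address:" is enough.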