Help with REGEX

Question

debasishgang7 0 Junior Poster in Training

12 Years Ago

Hi all,

I am trying to extract some text from an HTML page using regex.

<html>
...some code....
<B><FONT color="green">TEXT to BE EXTRACTED 1</FONT></B><br>
<P>
<B><FONT color="green">TEXT to BE EXTRACTED 2</FONT></B><br>
<P>
<B><FONT color="red">TEXT to BE EXTRACTED 3</FONT></B>
....some code....
</html>

I want to make a script which will print

TEXT to BE EXTRACTED 1
TEXT to BE EXTRACTED 2
TEXT to BE EXTRACTED 3

from the entire HTML page.

Thanks

python

All 3 Replies

snippsat 661 Master Poster

12 Years Ago

As Tony posted, regex is not a good choice for HTML or XML.
This is why parsers exist to do this job.

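# Assumes Python 3 with the bs4 package installed (pip install beautifulsoup4)
from bs4 import BeautifulSoup

html = '''\
<html>
...some code....
<B><FONT color="green">TEXT to BE EXTRACTED 1</FONT></B><br>
<P>
<B><FONT color="green">TEXT to BE EXTRACTED 2</FONT></B><br>
<P>
<B><FONT color="red">TEXT to BE EXTRACTED 3</FONT></B>
....some code....
</html>'''

# Parse with the standard-library backend; tag names are normalised to lowercase
soup = BeautifulSoup(html, 'html.parser')
# Collect every <font> tag and print the text it contains
tags = soup.find_all('font')
for item in tags:
    print(item.text)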

'''Output-->
TEXT to BE EXTRACTED 1
TEXT to BE EXTRACTED 2
TEXT to BE EXTRACTED 3
'''
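
If installing BeautifulSoup is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the original reply: the class name FontTextExtractor and the function extract_font_text are hypothetical names made up for illustration, and it assumes Python 3 and that the wanted text always sits inside a FONT tag, as in the sample page.

```python
from html.parser import HTMLParser


class FontTextExtractor(HTMLParser):
    """Collect the text found inside <font> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <font> tags are currently open
        self.texts = []   # one entry per <font> start tag seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <FONT> arrives as "font"
        if tag == "font":
            self.depth += 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "font" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text while at least one <font> tag is open
        if self.depth:
            self.texts[-1] += data


def extract_font_text(html):
    """Return the stripped text of every <font> element in the given HTML."""
    parser = FontTextExtractor()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


if __name__ == "__main__":
    sample = '<B><FONT color="green">TEXT to BE EXTRACTED 1</FONT></B><br>'
    for line in extract_font_text(sample):
        print(line)
```

Like the BeautifulSoup version, this reads the tag structure rather than pattern-matching on raw text, so changes in attribute order or upper/lower case in the tags should not break it.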

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 1 · 2011-09-03T02:25:41+00:00

Before the re experts start giving their advice, I'll give the standard answer for HTML: don't. Use, for example, the BeautifulSoup module instead.

debasishgang7 0 Junior Poster in Training · Answer 2 · 2011-09-03T14:35:43+00:00

thanks all....!!
