Parsing bad HTML code in python?

Question

mahela007 6 Posting Whiz in Training

15 Years Ago

Is there some way I can parse badly written HTML code in Python? I want to get some info from a web page which uses HTML tables for its formatting, and I found numerous flaws in the code using the W3C validator. Can I parse this code in Python?

html-css python

All 3 Replies

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

For those of you who use Python 3:
BeautifulSoup works fine with Python 3 if you copy BeautifulSoup.py (version 3.0.7a or lower) and sgmllib.py (typically found in C:\Python25\Lib) to a separate directory and convert both files with 2to3.py.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 1 · 2010-01-11T01:09:55+00:00

The BeautifulSoup module can parse bad HTML. Also, if you have BeautifulSoup installed, you can use the lxml module to parse your bad HTML code.
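For readers who want to see what tolerant parsing of a broken table looks like, here is a minimal sketch. Note that it uses neither BeautifulSoup nor lxml: it swaps in the standard library's html.parser module (Python 3) so it runs without installing anything. The class name TableCellCollector, the helper parse_table, and the sample markup BAD_HTML are all made up for illustration, and a real page may well need more handling than this (nested tables, for instance, are not treated specially).

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the current cell, if one is open.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the current row (and any open cell), skipping empty rows.
        self._close_cell()
        if self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> implicitly ends a row that was never closed.
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:  # cell without an enclosing <tr>
                self._row = []
            # A new cell implicitly ends a cell that was never closed.
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush whatever was still open at end of input


def parse_table(html):
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample: missing </td>, </tr>, </b> and </table> tags.
BAD_HTML = """
<table>
<tr><td>Name<td>Score
<tr><td><b>Alice</td><td>10
<tr><td>Bob &amp; Co<td>7</tr>
"""

print(parse_table(BAD_HTML))
```

With the sample above this prints [['Name', 'Score'], ['Alice', '10'], ['Bob & Co', '7']]. The idea is simply that a new row or cell tag closes whatever was left open, which is roughly what browsers do with this kind of markup.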

mahela007 6 Posting Whiz in Training · Answer 2 · 2010-01-11T18:31:06+00:00

Thanks. Both posts are useful, because I use Python 3, and I'm going to look into Beautiful Soup. (For anyone else reading this thread: "bad HTML code" here means badly constructed code that still displays well enough in Firefox.)
