Is there some way I can parse badly written HTML code in Python? I want to get some info from a web page which uses HTML tables for its formatting, and I found numerous flaws in the code using the W3C validator. Can I parse this code in Python?

All 3 Replies

The BeautifulSoup module can parse bad HTML. Also, if you have BeautifulSoup installed, you can use the lxml module to parse your bad HTML code.
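As a rough illustration of the BeautifulSoup suggestion, here is a minimal sketch that pulls cell text out of a sloppy table. It assumes the newer Beautiful Soup 4 package (imported as bs4), not the BeautifulSoup 3 module this thread is about. The sample markup and the table_rows helper are made up for the example.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Hypothetical sample with typical flaws: an unquoted attribute,
# a <b> that is never closed, and a missing </table>.
BAD_HTML = """
<table border=1>
<tr><td>Name</td><td><b>Price</td></tr>
<tr><td>Widget</td><td>9.99</td></tr>
"""


def table_rows(markup):
    """Return each table row as a list of stripped cell strings."""
    # "html.parser" selects Python's built-in parser as the backend.
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in table_rows(BAD_HTML):
        print(row)
```

How badly nested markup gets repaired depends on which parser backend is used, so it is worth checking the output against the real page before relying on it.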

For those of you who use Python 3:
BeautifulSoup works fine with Python 3 if you copy
BeautifulSoup.py (version 3.0.7a or lower)
and
sgmllib.py (typically found in C:\Python25\Lib)
to a separate directory and convert both files with 2to3.py
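
If converting modules is not an option, Python 3's standard library html.parser is also fairly forgiving of malformed markup. The sketch below is not BeautifulSoup; it is a dependency-free stand-in for the same idea. It collects table cells even when the td and tr tags are never closed. The TableCellParser class and the sample string are hypothetical names for this example.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect table rows as lists of cell text, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def _end_cell(self):
        # Store the current cell, if any, in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        # Store the current row, if it has any cells.
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # A new <tr> or <td> implicitly closes the previous one.
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row left open at end of input


def parse_table(markup):
    parser = TableCellParser()
    parser.feed(markup)
    parser.close()
    return parser.rows


# Hypothetical sample: upper-case tags, no closing </TD> or </TR>.
SAMPLE = "<TABLE><TR><TD>Name<TD>Price<TR><TD>Widget<TD>9.99</TABLE>"

if __name__ == "__main__":
    print(parse_table(SAMPLE))
```

This only handles a single flat table; nested tables would need extra bookkeeping.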

Thanks. Both posts are useful, because I use Python 3 and I'm going to look into Beautiful Soup. (For anyone else reading this thread, bad HTML code refers to badly constructed code, but this code displays well enough in Firefox.)
