Beautiful Soup default parser

Question

rwe0 13 Newbie Poster

11 Years Ago

I am using Beautiful Soup 4 with Python 3.x on a project, just to learn it.

from bs4 import BeautifulSoup
soup = BeautifulSoup(s)  # use default parser
soup = BeautifulSoup(s, 'html5lib')  # specified parser

Actually, #2 (specifying 'html5lib') already solved my problem. But when using the first approach, I got different behavior on my Ubuntu system than on my Windows 7 system. Both systems are running Python 3.3 or 3.4. It appeared that when running the default parser on Linux, some of the HTML was lost in parsing, while the same HTML was present on Windows.

My question: Is the native parser an integral part of Python, or does it come from the OS?

Why would I get different behavior? My test input is just a page I got off the Yahoo site for experimentation, and it is saved to a file, so both tests are working off the same HTML data.

Although my immediate problem is solved by using the html5lib parser, I would appreciate it if someone smarter than me could explain this.

Thanks for any enlightenment.

rich

python


Gribouillis 1,391 Programming Explorer

11 Years Ago

Startpage led to this blog entry. Perhaps you have lxml on one OS and not on the other.
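
To check whether the two machines differ in which parser libraries are installed, something like the following may help. It is only a sketch using the standard library: the names PREFERENCE, installed_parsers and likely_default are made up for illustration, and the ordering (lxml, then html5lib, then html.parser) is the one the Beautiful Soup documentation describes for HTML when no parser is named. It does not ask Beautiful Soup itself what it picked.

```python
import importlib.util

# Order in which Beautiful Soup is documented to prefer HTML parsers
# when none is specified (assumption taken from its documentation).
PREFERENCE = ("lxml", "html5lib", "html.parser")


def installed_parsers():
    """Return the parser modules from PREFERENCE that are importable here."""
    found = []
    for name in PREFERENCE:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            found.append(name)
    return found


def likely_default(available):
    """Return the first available parser in preference order, or None."""
    for name in PREFERENCE:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    available = installed_parsers()
    print("Installed parsers:", available)
    print("Likely default:", likely_default(available))
```

Running this on both systems should show whether lxml (or html5lib) is present on one and missing on the other. html.parser ships with Python itself, so it should always appear in the list.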


rwe0 13 Newbie Poster · Answer 1 · 2013-10-15T16:16:56+00:00

Thank you.

Specifying 'html.parser' explicitly made it work. Yes, I had installed lxml on my Linux system and had no idea the default had been switched.
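
For reference, a minimal sketch of the fix described above, assuming Beautiful Soup 4 is installed. The helper name parse_portably and the sample markup are invented for illustration. Naming the parser means the result no longer depends on which optional libraries happen to be installed on a given machine.

```python
from bs4 import BeautifulSoup


def parse_portably(markup):
    """Parse markup with Python's built-in parser, regardless of what else is installed."""
    return BeautifulSoup(markup, "html.parser")


# Sample markup made up for this example.
soup = parse_portably("<p>Hello, <b>world</b></p>")

# soup.builder.NAME reports which tree builder Beautiful Soup actually used.
print(soup.builder.NAME)
print(soup.b.get_text())
```

The same check on soup.builder.NAME can be run on a soup created without a parser argument to see what the default resolved to on each system.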