I am using Beautiful Soup 4 with Python 3.x on a project, just to learn it.

  1. soup = BeautifulSoup(s)  # use default parser
  2. soup = BeautifulSoup(s, 'html5lib')  # specified parser

Actually, #2 already solved my problem. But with the first approach I got different behavior on my Ubuntu system than on my Windows 7 system. Both systems are running Python 3.3 or 3.4. It appeared that the native parser on Linux lost some of the HTML during parsing, HTML that was still present on Windows.

My question: is the native parser an integral part of Python, or does it come from the OS?

Why would I get different behavior? My test input is just a page I got from the Yahoo site for experimentation, and it is saved to a file, so both tests are working from the same HTML data.

Although my immediate problem is solved by using the html5lib parser, I would appreciate it if someone smarter than me could explain this.

Thanks for any enlightenment.


Thank you.

Specifying 'html.parser' explicitly made it work. Yes, I had installed lxml on my Linux system and had no idea the default had been switched.
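
For anyone who hits the same thing, here is a rough sketch of how to check which parser Beautiful Soup is likely to pick when none is named. As far as I understand the Beautiful Soup docs, it prefers lxml, then html5lib, then Python's built-in html.parser, so the default depends on what happens to be installed on each machine. The function names below are my own, not part of any library, and the code uses only the standard library to look for the modules.

```python
import importlib.util

# Preference order Beautiful Soup documents when no parser is named:
# lxml first, then html5lib, then Python's built-in html.parser.
PREFERENCE = ("lxml", "html5lib", "html.parser")


def installed_parsers():
    """Return the parsers from PREFERENCE that are importable here (hypothetical helper)."""
    return [name for name in PREFERENCE
            if importlib.util.find_spec(name) is not None]


def likely_default_parser():
    """Return the first installed parser in preference order (hypothetical helper)."""
    # html.parser ships with Python, so the list is never empty.
    return installed_parsers()[0]


print("installed:", installed_parsers())
print("likely default:", likely_default_parser())
```

If the two machines print different "likely default" values, that would explain the different parse results for the same file. Passing the parser name explicitly, as in BeautifulSoup(s, 'html.parser'), avoids depending on what is installed.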