Hi, I am using Beautiful Soup to get data from a webpage. With help I was able to get a list of cities with the correct accents.

Now I am trying to get a list of movie theaters in a selected city, but the names come back with weird characters in place of the accented letters.

Code:

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

page = urlopen("http://www.cinepolis.com/_CARTELERA/cartelera.aspx?ic=2")
html = page.read()
soup = BeautifulSoup(html)
complejos = soup.findAll('span',{'class':'TitulosBlanco'})
compList = []
for comp in complejos:
  name = comp.contents[0]
  compList.append(name)
  print "Complejo %s agregado áé" % name

This is the output I get:

Complejo Cinépolis VIP Galerías Diana Acapulco agregado áé
Complejo Cinépolis Galerías Diana Acapulco agregado áé
Complejo Cinépolis Acapulco agregado áé
Complejo Cinépolis Acapulco Renacimiento agregado áé
Complejo Cinépolis La Isla agregado áé
Complejo Cinépolis Pie de la Cuesta agregado áé
Complejo Cinépolis Sendero Acapulco agregado áé

All 2 Replies

Did you forget to pass the argument convertEntities=BeautifulSoup.HTML_ENTITIES to BeautifulSoup()?

Short version: I am already using that argument, along with a massage.
The code above is a small snippet I wrote so people in the forum could copy and run it. In my project code I have a method that returns the soup using the argument you mentioned, plus the massage to correct the page.
That works for getting the cities, but not in this case.
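
For reference, the garbled names in the output above (for example "CinÃ©polis" where "Cinépolis" is expected) have the shape you get when UTF-8 bytes are read as Latin-1. The literal "áé" in the print statement is garbled the same way. The thread does not establish where the mix-up happens (in the parser, or when the text is displayed), so the following is only a minimal sketch of the symptom, written for Python 3 rather than the Python 2 used in the question. The helper names garble and repair are hypothetical and exist only for this illustration.

```python
# Minimal sketch (Python 3) of the symptom seen in the output above.
# Assumption: the text was UTF-8 and was decoded as Latin-1 somewhere.
# garble() and repair() are illustrative names, not part of any library.

def garble(text):
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(text):
    """Undo garble(): recover the raw bytes, then decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")

original = "Cinépolis Galerías Diana Acapulco"
broken = garble(original)

print(broken)          # each accented letter becomes two characters
print(repair(broken))  # round-trips back to the original string
```

If this is what is happening here, one thing that may be worth trying is telling the parser the page's encoding explicitly (Beautiful Soup 3 accepts a fromEncoding argument for this), and checking which encoding the console uses when printing. Neither has been confirmed as the fix in this thread.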
