8 Topics

Member Avatar for Niloofar24

Hello. This is my code:

from bs4 import BeautifulSoup
import urllib2

url = urllib2.urlopen('http://www.website_address.com')
soup = BeautifulSoup(url)
images = soup.find_all('img')

Now how can I get the "src" of img tags?
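One way to read the "src" attribute, as a minimal sketch: each item returned by find_all('img') is a Tag, and its attributes can be read like dictionary entries. The example below assumes Python 3 and parses a small made-up HTML string instead of downloading a page, so the markup and the URLs in it are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page.
html = """
<html><body>
  <img src="/images/logo.png" alt="logo">
  <img alt="no source here">
  <img src="http://example.com/photo.jpg">
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag.get returns None when the attribute is missing, so skip those tags.
sources = [img.get("src") for img in soup.find_all("img") if img.get("src")]
print(sources)
```

Using img["src"] also works, but it raises KeyError for an img tag that has no src attribute, which is why the sketch uses get.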

Member Avatar for Gribouillis
0
17K
Member Avatar for dancks

I'm using BeautifulSoup to grab text from HTML files. But it's not perfect: for example, it seems to keep CSS and JavaScript code that was added haphazardly. My overall goal is to make a list of words and their frequencies to compare and contrast HTML files in order to categorize them. Dealing …
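A possible approach, sketched under assumptions: remove the script and style elements from the parsed tree before extracting text, then count words with collections.Counter. The sample HTML and the word pattern (lowercase letters and apostrophes only) are illustrative choices, not taken from the original post.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical page with CSS and JavaScript mixed into the markup.
html = """
<html><head>
  <style>body { color: red; }</style>
  <script>var spam = "spam spam";</script>
</head><body>
  <p>Spam and eggs.</p>
  <p>Eggs and more eggs.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Remove <script> and <style> elements, including their contents.
for tag in soup(["script", "style"]):
    tag.decompose()

# Join the remaining text nodes with spaces so words do not run together.
text = soup.get_text(separator=" ")

# Assumed tokenization: runs of lowercase letters and apostrophes.
words = re.findall(r"[a-z']+", text.lower())
frequency = Counter(words)
print(frequency.most_common(3))
```

The resulting Counter objects can then be compared between files; how to turn that comparison into categories is left open here.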

Member Avatar for Gribouillis
0
277
Member Avatar for Afroula

Hi everyone, I'm trying to extract text from between tags but only in certain conditions. <title> and <pos> are both children of <page>, but neither one is nested inside the other (i.e., they're siblings). Each <page> always has one <title> and zero to 5 <pos> sections. What I need to …
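The exact condition is cut off in the preview above, so the following only sketches how the described structure could be walked: for each page element, read its one title and collect the text of its pos children. The sample markup and the dictionary layout are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup following the described structure:
# each <page> has one <title> and zero or more sibling <pos> elements.
markup = """
<page><title>run</title><pos>noun</pos><pos>verb</pos></page>
<page><title>the</title></page>
"""

soup = BeautifulSoup(markup, "html.parser")

entries = {}
for page in soup.find_all("page"):
    title = page.find("title").get_text(strip=True)
    # recursive=False limits the search to direct children of <page>.
    pos_tags = page.find_all("pos", recursive=False)
    entries[title] = [pos.get_text(strip=True) for pos in pos_tags]

print(entries)
```

Any filtering ("only in certain conditions") would go inside the loop, for example by testing the contents of the pos list before storing the entry.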

Member Avatar for Afroula
0
2K
Member Avatar for John A.

Hey guys, I'm trying to get all links on a website using BeautifulSoup, Queue, Threading, and urllib2. I am specifically looking for links that lead to other pages of the same site. It runs for a few seconds, going through about 3 URLs before giving me the error: Traceback (most …
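The traceback is truncated above, so this does not address the error itself. It is only a small sketch of the link-filtering step: resolve each href against the page URL and keep the ones whose host matches the site. It assumes Python 3 (urllib.parse instead of urllib2), and the base URL and markup are made up.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Hypothetical page URL and content.
base_url = "http://example.com/index.html"
html = """
<a href="/about.html">About</a>
<a href="contact.html">Contact</a>
<a href="http://other.example.org/">Elsewhere</a>
<a href="mailto:someone@example.com">Mail</a>
<a>No href</a>
"""

soup = BeautifulSoup(html, "html.parser")
site = urlparse(base_url).netloc

internal = []
# href=True skips <a> tags that have no href attribute at all.
for a in soup.find_all("a", href=True):
    absolute = urljoin(base_url, a["href"])  # resolve relative links
    parts = urlparse(absolute)
    # Keep only http(s) links that point at the same host.
    if parts.scheme in ("http", "https") and parts.netloc == site:
        internal.append(absolute)

print(internal)
```

In a threaded crawler, each URL in internal would be put on the queue only if it has not been seen before, for instance by tracking visited URLs in a set.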

Member Avatar for Sky Diploma
0
729
Member Avatar for sarelnet

Hi, I have an HTML page in one variable. I need to build a method that will extract a tag's content (def extract_tag(self, tag_name)). For example, given webpage: <div id="mw-page-base" class="noprint"></div> <div id="mw-head-base" class="noprint"></div> <!-- content --> <div id="content" class="mw-body"> <a id="top"></a> <div id="mw-js-message" style="display:none;"></div> <!-- sitenotice --> <div id="siteNotice"><!-- centralNotice …
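A minimal sketch of such a method, assuming "tag content" means the inner HTML of every element with the given name. The Page class, its constructor, and the shortened sample markup are hypothetical; only the extract_tag(self, tag_name) signature comes from the post.

```python
from bs4 import BeautifulSoup


class Page:
    """Hypothetical wrapper around an HTML string held in a variable."""

    def __init__(self, html):
        self.soup = BeautifulSoup(html, "html.parser")

    def extract_tag(self, tag_name):
        """Return the inner HTML of every <tag_name> element, in document order."""
        return [tag.decode_contents() for tag in self.soup.find_all(tag_name)]


# Shortened, made-up markup in the style of the page quoted above.
page = Page(
    '<div id="content" class="mw-body"><a id="top"></a>Hello</div>'
    '<div id="siteNotice"></div>'
)
print(page.extract_tag("div"))
```

If only the visible text is wanted rather than the nested markup, tag.get_text() could be used in place of tag.decode_contents().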

Member Avatar for ryantroop
0
249
Member Avatar for hemant_rajput

Hi, I've used the BeautifulSoup module to parse the site and grab the img tag from it, but the problem is that BeautifulSoup does not return the whole content of the given URL when parsing. The truncated content contains the image location I want to download: [CODE] from urllib2 import urlopen …

Member Avatar for Gribouillis
0
447
Member Avatar for Huakalero

On line 72 of the code I do a findAll to retrieve all 'a' tags that have a 'horariosCarteleraUnderline' class and an href URL that contains `?ic=[code]&`, where code is a common code used to identify the movie start time. It should retrieve all movie times, but …
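The code from the post is not shown here, so this is only a sketch of the kind of query described: filter 'a' tags by class and by a regular expression on href. It uses find_all, the bs4 spelling of findAll, and the sample links and the exact pattern are assumptions.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical showtime links; only the first has both the class and the ic code.
html = """
<a class="horariosCarteleraUnderline" href="/compra?ic=12345&amp;sala=2">18:30</a>
<a class="horariosCarteleraUnderline" href="/compra?sala=2">not a showtime</a>
<a class="otra" href="/compra?ic=99999&amp;sala=1">20:00</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumed pattern: "?ic=", then one or more characters other than "&", then "&".
pattern = re.compile(r"\?ic=[^&]+&")

# class_ matches the CSS class; a compiled regex passed as href is searched
# against the attribute value.
links = soup.find_all("a", class_="horariosCarteleraUnderline", href=pattern)
times = [a.get_text(strip=True) for a in links]
print(times)
```

If some times are missing from a real page, it may help to print every href that has the class but fails the pattern, to see how those URLs differ.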

Member Avatar for Huakalero
0
219
Member Avatar for Huakalero

Hi, I am using Beautiful Soup to get data from a webpage. With help I was able to get a list of cities with correct accents. Now I am trying to get a list of movie theaters in a selected city, but these come without accents and with weird characters instead. …

Member Avatar for Huakalero
0
858

The End.