Search DaniWeb - beautifulsoup

Re: Anybody know how to speed up beautifulsoup?
14 Years Ago by vegaseat
BeautifulSoup is a third party module for Python2 that allows you to access even badly coded HTML code. What do you want to do with it?

Re: Python - writing to file
14 Years Ago by snippsat
… as unicode strings. In order to convert BeautifulSoup's unicode strings to human readable strings, you have to …

Re: Insatlling module when multiple version of pyhon is available
12 Years Ago by vegaseat
BeautifulSoup requires the sgmllib module, which has been removed in Python 3.

Re: Ignoring Comments When Parsing XML?
13 Years Ago by snippsat
BeautifulSoup is a famous python HTML/XML parser. [url]http://www.crummy.com/software/BeautifulSoup/[/url] BeautifulSoup is only one file BeautifulSoup.py. build parser like minidom,elementtree should work. If not 2 of the best is BeautifulSoup and lmxl. [url]http://codespeak.net/lxml/[/url]

Re: New python 3 modules
15 Years Ago by vegaseat
BeautifulSoup works fine with Python30 if you copy BeautifulSoup.py (version3.0.7a or lower) and sgmllib.py (find it typically in C:\Python25\Lib) to a separate directory and convert both programs with 2to3.py This rather obvious approach was overlooked by the BeautifulSoup folks.

Re: Working with html files
15 Years Ago by Gribouillis
BeautifulSoup has functions [icode]find[/icode], [icode]findAll[/icode], and related functions which should help you. Try to learn how to use them.

Re: Parsing HTML with Python
13 Years Ago by ultimatebuster
BeautifulSoup and urllib2

Re: Question and Answer APIs
13 Years Ago by Tech B
Beautifulsoup or even regex could lighten the load. I think there is even an html parser in the standard lib.

Re: New python 3 modules
11 Years Ago by Gribouillis
BeautifulSoup 4.1.3 is out since August 20, 2012. It is compatible with python 2.6+ and python 3 !

BeautifulSoup does not retrieves all 'a' tags.
12 Years Ago by Huakalero
…the full code: from urllib2 import urlopen from BeautifulSoup import BeautifulSoup import re class cuapi(): def __init__(self):…;_blank\"'), self.cureHTML())] self.soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES, markupMassage=myMassage) return self.soup def …

Re: BeautifulSoup to extract multiple TD tags within TR
12 Years Ago by sys73r
…suggestion plus some more data to test [CODE]from BeautifulSoup import BeautifulSoup html = '''\ <tr id="index_table_12345"… "build/bdist.macosx-10.7-intel/egg/BeautifulSoup.py", line 601, in __getitem__ KeyError: 0… some tests [CODE]>>> soup = BeautifulSoup(html) >>> tag = soup.findAll('td') …

BeautifulSoup and accented words
12 Years Ago by Huakalero
… weird characters. Code: [CODE]from urllib2 import urlopen from BeautifulSoup import BeautifulSoup page = urlopen("http://www.cinepolis.com/_CARTELERA/cartelera.aspx…?ic=2") html = page.read() soup = BeautifulSoup(html) complejos = soup.findAll('span',{'class':'TitulosBlanco'}) compList = [] for …

BeautifulSoup to extract multiple TD tags within TR
12 Years Ago by sys73r
… = urllib2.urlopen('http://www.NotAvalidURL.com').read() soup = BeautifulSoup(data) table = soup("tr", {'class' : 'index_table_in' }) print table[…

Re: BeautifulSoup to extract multiple TD tags within TR
12 Years Ago by snippsat
… 1,string 2.... just iterate over the content.
[CODE]from BeautifulSoup import BeautifulSoup html = '''\ <tr id="index_table_12345" class="…; </tr>''' soup = BeautifulSoup(html) tag = soup.findAll('td') #all "td" tag…

Re: BeautifulSoup to extract multiple TD tags within TR
12 Years Ago by snippsat
[CODE]from BeautifulSoup import BeautifulSoup html = '''\ <tr id="index_table_12345" class="index_table_in&…; </tr>''' soup = BeautifulSoup(html) tag = soup.findAll('a') #all "a" tag…

Re: BeautifulSoup to extract multiple TD tags within TR
12 Years Ago by sys73r
… got it working: [CODE]import urllib2 from BeautifulSoup import BeautifulSoup data = urllib2.urlopen('http://').read() soup = BeautifulSoup(data) tag = soup.findAll('a') #all…

Re: BeautifulSoup and accented words
12 Years Ago by Gribouillis
Didn't you forget the argument [icode]convertEntities=BeautifulSoup.HTML_ENTITIES[/icode] in BeautifulSoup() ?

Re: BeautifulSoup to extract multiple TD tags within TR
12 Years Ago by sys73r
…; </tr>''' soup = BeautifulSoup(html) tag = soup.findAll('a') #all "a" tag…

problem parsing webpage using BeautifulSoup
12 Years Ago by hemant_rajput
…from it, but the problem is , Beautifulsoup while parsing not returning the whole content of…: [CODE] from urllib2 import urlopen from BeautifulSoup import BeautifulSoup #reading the webpage source webpage = urlopen('… webpage content into variable named soup using beautifulsoup soup = BeautifulSoup(''.join(webpage)) print soup #finding all …

Re: problem parsing webpage using BeautifulSoup
12 Years Ago by snippsat
…drop to simulate javascript and use regex(because Beautifulsoup cant find stuff in javascript) [CODE]…from urllib2 import urlopen from BeautifulSoup import BeautifulSoup import re webpage = urlopen('http://www.…santabanta.com/photos/aalesha/10066001.htm') soup = BeautifulSoup(webpage) #print soup bac_img = re.search(r&…

Re: problem parsing webpage using BeautifulSoup
12 Years Ago by hemant_rajput
…drop to simulate javascript and use regex(because Beautifulsoup cant find stuff
in javascript) [CODE]…from urllib2 import urlopen from BeautifulSoup import BeautifulSoup import re webpage = urlopen('http://www.…santabanta.com/photos/aalesha/10066001.htm') soup = BeautifulSoup(webpage) #print soup bac_img = re.search(r&…

Help with Navigating BeautifulSoup Tree
13 Years Ago by kshw
…and return it? Thanks [CODE]import re import urllib2 from BeautifulSoup import BeautifulSoup, NavigableString html = ['<html><head>&…NavigableString): print str(current) Text += str(current) return Text soup = BeautifulSoup(''.join(html)) Page_Text = ParseContent(soup) print "Text after function…

Help with Python Threading Library with BeautifulSoup.
10 Years Ago by John A.
… Queue import threading import urllib2 import time from BeautifulSoup import BeautifulSoup hosts = ["http://waoanime.tv"] queue…from queue chunk = self.out_queue.get() soup = BeautifulSoup(chunk) #parse the chunk for line in soup.findAll…

HTML Scraper: Urllib2 / BeautifulSoup / Regex Help
15 Years Ago by katamole
… i have ironed out the following problems): [code]from BeautifulSoup import BeautifulSoup import urllib2 import re #get source code of page (function…/find?s=" + searchstring print url source = fetchsource(url) soup = BeautifulSoup(source) filmlink = soup.find('a', href=re.compile("title…

Re: HTML Scraper: Urllib2 / BeautifulSoup / Regex Help
15 Years Ago by Gribouillis
… is coming through ok.
[ICODE]rating_source = fetchsource(pagelink) soup = BeautifulSoup(rating_source) ratingregexp = re.compile(r"^[^/]*/10$") rating_element = soup…= fetchsource("http://www.imdb.com/title/tt0071853/") soup = BeautifulSoup(source) ratingregexp = re.compile(r"^[^/]*/10$") rating_element = …

newb: BeautifulSoup
16 Years Ago by jobs
I am trying to use BeautifulSoup: soup = BeautifulSoup(page) td_tags = soup.findAll('td') i=0 for td in …

Re: HTML Scraper: Urllib2 / BeautifulSoup / Regex Help
15 Years Ago by Gribouillis
…&q=" + searchstring print url source = fetchsource(url) soup = BeautifulSoup(source) filmlink = soup.find('a', href=re.compile(r"…

Re: HTML Scraper: Urllib2 / BeautifulSoup / Regex Help
15 Years Ago by katamole
… source is coming through ok. [ICODE]rating_source = fetchsource(pagelink) soup = BeautifulSoup(rating_source) ratingregexp = re.compile(r"^[^/]*/10$") rating_element = soup…

Re: newb: BeautifulSoup
16 Years Ago by jobs
[code="Python"] soup = BeautifulSoup(page) td_tags = soup.findAll('td') i=0 for td in …

Re: Anybody know how to speed up beautifulsoup?
14 Years Ago by gunbuster363
… is an example (Python2 code) ... [code]import urllib from BeautifulSoup import BeautifulSoup, SoupStrainer html = urllib.urlopen("http://python.org").read… = SoupStrainer('a') # create a list a_tags = [tag for tag in BeautifulSoup(html, parseOnlyThese=a_tag)] # show all the a_tag lines for line…
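
Many of the snippets listed here collect every 'a' tag from a page, and one reply mentions that the standard library also has an HTML parser. As a rough illustration only, the same idea can be sketched in Python 3 with html.parser.HTMLParser and no third-party module. The names LinkCollector, extract_links and the sample markup are made up for this sketch and do not come from any of the threads; it is also not a drop-in replacement for BeautifulSoup, which is more forgiving with badly coded HTML.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in an HTML string, in order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample markup, only for demonstration.
sample = '<p><a href="/one">One</a> <A HREF="/two">Two</a> <a name="x">no href</a></p>'
print(extract_links(sample))
```

Anchors without an href attribute are skipped, so the result holds only real link targets. To run it on a live page, the HTML string would have to be fetched first, for example with urllib.request in Python 3.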