
Hello,
I am trying to extract a table from an HTML page. I succeeded in getting the table values, but I have a problem getting the field names in the first column.

# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey

The above code prints all the text along with the <b></b> tags. I want to remove the tags and read only the text. I found a function named striphtml on the net, but it expects a string as its argument and does not accept this soup object. I have provided the full code below. Can somebody advise me on this?

import urllib2
import re
# Import only the class; "import BeautifulSoup" plus "from BeautifulSoup import *" was redundant
from BeautifulSoup import BeautifulSoup

def striphtml(data):
    # Remove anything that looks like an HTML tag (non-greedy match)
    p = re.compile(r'<.*?>')
    return p.sub('', data)

pageurl = "http://www.cholawealthdirect.com/Corporateinfo/CompSearch.aspx?id=KFR1&cocode=476"
page = urllib2.urlopen(pageurl)
soup = BeautifulSoup(page)

table = soup.find('td', { "id" : "_ctl0_InnerTable" })
rows = table.findAll('tr')

# enumerate supplies the row and column counters
for rowIndex, tr in enumerate(rows):
    cols = tr.findAll('td')
    print "----Row No----", rowIndex
    for colIndex, td in enumerate(cols):
        print "Column no", colIndex, td.string

# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey

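To get the text inside a tag rather than the tag itself, read its .text attribute. For a single tag, for example the first <b> in the table, that would be:

print table.find('b').text

However, findAll returns a list of tags, so paramKey itself has no .text. You have to iterate over paramKey instead, so change the last line of your script (print paramKey) to:

print([x.text for x in paramKey])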
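The striphtml function from the question can also be made to work, since all it needs is a string: calling str() on each tag should give it one. Here is a minimal sketch of that idea using only the standard library. The hard-coded markup strings stand in for [str(tag) for tag in paramKey], and the field names in them are invented for illustration, not taken from the real page.

```python
import re

def striphtml(data):
    # Remove anything that looks like an HTML tag (non-greedy match).
    return re.sub(r'<.*?>', '', data)

# Stand-ins for [str(tag) for tag in paramKey]; these field names are made up.
tag_strings = ['<b>Sales</b>', '<b>Net Profit</b>']

# Strip the tags from each string, one element at a time.
names = [striphtml(s) for s in tag_strings]
print(names)
```

The regex approach is fine for simple markup like this, but reading .text from the tag is usually the safer choice because it does not depend on how the tags are written.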

Hope you got the idea :)
