
Hello,
I am trying to get a table from an HTML page. I succeeded in getting the table values, but I have a problem getting the field names in the first column.

# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey

The code above prints the text along with the <b></b> tags. I want to remove the tags and read only the text. I found a function named striphtml on the net, but it expects a string as its argument and does not accept this soup object. I have provided the full code below. Can somebody advise me on this?

import re
import urllib2
from BeautifulSoup import BeautifulSoup

def striphtml(data):
    # Remove anything that looks like an HTML tag; data must be a string
    p = re.compile(r'<.*?>')
    return p.sub('', data)

pageurl = "http://www.cholawealthdirect.com/Corporateinfo/CompSearch.aspx?id=KFR1&cocode=476"
page = urllib2.urlopen(pageurl)
soup = BeautifulSoup(page)

table = soup.find('td', { "id" : "_ctl0_InnerTable" })
rows = table.findAll('tr')

# enumerate() supplies the row and column numbers
for rowIndex, tr in enumerate(rows):
    print "----Row No----", rowIndex
    for colIndex, td in enumerate(tr.findAll('td')):
        print "Column no", colIndex, td.string

# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey
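The striphtml function only works on strings, so one possible workaround is to convert each tag returned by findAll('b') to its markup string first (for example with str(tag)) and strip that. The sketch below assumes this conversion step; it is written for Python 3, and the tag strings are hypothetical stand-ins, not data from the actual page. Note that a regex like `<.*?>` is a rough tool and can misbehave on unusual markup.

```python
import re

def striphtml(data):
    """Remove anything that looks like an HTML tag from a string."""
    return re.sub(r'<.*?>', '', data)

# Hypothetical stand-ins for str(tag) applied to each <b> element
tags_as_text = ['<b>Company Name</b>', '<b>Face Value</b>']

# Strip the markup from each string, keeping only the text
names = [striphtml(t) for t in tags_as_text]
print(names)  # ['Company Name', 'Face Value']
```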


Reply by richieking:

To get the text inside the tags, change the last line of your code (print paramKey) to:

print paramKey.text

However, findAll returns a list, so paramKey is a list of tags. You therefore have to iterate over paramKey:

print([x.text for x in paramKey])
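For readers without BeautifulSoup installed, the same idea (collect only the text inside each <b> element, without the tags) can be sketched with Python 3's standard-library html.parser module. This swaps in a different parser than the one the reply uses; the class name BoldTextCollector and the sample HTML are hypothetical, chosen only to mimic a name/value table.

```python
from html.parser import HTMLParser

class BoldTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside every <b>...</b> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a <b> element
        self._current = []   # text fragments of the <b> being read
        self.names = []      # finished field names, tags removed

    def handle_starttag(self, tag, attrs):
        if tag == 'b':
            if self._depth == 0:
                self._current = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == 'b' and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.names.append(''.join(self._current).strip())

    def handle_data(self, data):
        # Only keep text that appears inside a <b> element
        if self._depth > 0:
            self._current.append(data)

html = ('<table><tr><td><b>Face Value</b></td><td>10</td></tr>'
        '<tr><td><b>Book Value</b></td><td>42.5</td></tr></table>')
parser = BoldTextCollector()
parser.feed(html)
parser.close()
print(parser.names)  # ['Face Value', 'Book Value']
```

The parser only sees tags as events, so the field names come out as plain text and no tag stripping is needed afterwards.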

Hope you got the idea :)
