
Parsing HTML: why do many lines get lost?

Here is my code:

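import urllib
import lxml.html

equitydown = "http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
# Download the raw page (Python 2: urllib.urlopen)
page = urllib.urlopen(equitydown).read()
root = lxml.html.document_fromstring(page)

# Delete every <tr> whose class is exactly "tr_normal" and that contains an <img>
rdata = root.xpath('//tr[@class="tr_normal" and (.//img)]')
for data in rdata:
    data.getparent().remove(data)

# Serialize the modified tree and save it (tostring returns a byte string)
root1 = lxml.html.tostring(root)
with open('c:\\hk1.html', 'wb') as my:
    my.write(root1)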
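
One way to check whether the missing rows are the ones this loop deletes is to print the text of every row the same XPath expression matches before removing anything. The minimal sketch below assumes Python 3 with lxml installed, and runs on a hypothetical sample page (SAMPLE and rows_removed_by_filter are illustrative names, not the real HKEX markup). Only rows whose class is exactly tr_normal and that contain an <img> somewhere inside are matched.

```python
import lxml.html

# Hypothetical sample markup standing in for the real page;
# the actual HKEX page structure is not reproduced here.
SAMPLE = """
<html><body><table>
  <tr class="tr_normal"><td>keep: no image</td></tr>
  <tr class="tr_normal"><td>drop: has image <img src="x.gif"></td></tr>
  <tr class="tr_other"><td>keep: other class <img src="y.gif"></td></tr>
</table></body></html>
"""


def rows_removed_by_filter(html_text):
    """Return the text of every <tr> that the original XPath would delete."""
    root = lxml.html.document_fromstring(html_text)
    matches = root.xpath('//tr[@class="tr_normal" and (.//img)]')
    return [row.text_content().strip() for row in matches]


if __name__ == "__main__":
    # Prints only the row that has class "tr_normal" AND an <img>
    for text in rows_removed_by_filter(SAMPLE):
        print(text)
```

Running the same function on the downloaded page shows whether rows such as 06830 appear in the list of deleted rows.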

When I open c:\hk1.html and compare it with the original page
http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm
I find that many of its lines, such as

06830 华众控股 2,000 #
06838 盈利时 2,000 #
06868 天福 1,000 #
06880 豪特保健 2,000 #
06883 新濠博亚娱乐 300 #

are missing from c:\hk1.html. Why?
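
The comparison above can also be automated instead of done by eye. A minimal sketch, assuming Python 3 with lxml and assuming the stock code sits in the first <td> of each row; first_cells, missing_rows and the two sample strings are hypothetical names used only for illustration.

```python
import lxml.html


def first_cells(html_text):
    """Collect the stripped text of the first <td> in each table row."""
    root = lxml.html.document_fromstring(html_text)
    cells = []
    for row in root.xpath('//tr'):
        tds = row.xpath('./td')
        if tds:
            cells.append(tds[0].text_content().strip())
    return cells


def missing_rows(original_html, saved_html):
    """First-column values present in the original page but absent from the saved copy."""
    saved = set(first_cells(saved_html))
    return [cell for cell in first_cells(original_html) if cell not in saved]


# Hypothetical stand-ins for the downloaded page and c:\hk1.html
ORIGINAL = ("<html><body><table><tr><td>A1</td><td>x</td></tr>"
            "<tr><td>B2</td><td>y</td></tr></table></body></html>")
SAVED = "<html><body><table><tr><td>A1</td><td>x</td></tr></table></body></html>"

if __name__ == "__main__":
    print(missing_rows(ORIGINAL, SAVED))
```

With the real files, the output lists every first-column value that disappeared, which can then be checked against the rows matched by the XPath filter.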