Here is my code:
import urllib
import lxml.html

equitydown = "http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
# Python 2: urllib.urlopen (urllib.request.urlopen in Python 3)
file = urllib.urlopen(equitydown).read()
root = lxml.html.document_fromstring(file)

# remove every <tr class="tr_normal"> row that contains an <img> anywhere inside it
rdata = root.xpath('//tr[@class="tr_normal" and (.//img)]')
for data in rdata:
    data.getparent().remove(data)

# serialize the cleaned tree and save it
root1 = lxml.html.tostring(root)
with open('c:\\hk1.html', 'w') as my:
    my.write(root1)
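To see what the XPath filter `//tr[@class="tr_normal" and (.//img)]` does in isolation, here is a minimal sketch of the same rule. It swaps in the standard library's `xml.etree.ElementTree` for lxml so it runs without extra installs, which means the input must be well-formed XML, and because ElementTree elements have no `getparent()`, it walks parents explicitly. The sample markup and the function name `remove_rows_with_images` are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_rows_with_images(markup):
    """Drop every <tr class="tr_normal"> that has an <img> anywhere inside it,
    mirroring //tr[@class="tr_normal" and (.//img)]."""
    root = ET.fromstring(markup)
    # Snapshot the element list first, since we mutate the tree while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if (child.tag == "tr"
                    and child.get("class") == "tr_normal"  # exact match, like @class="..."
                    and child.find(".//img") is not None):  # an <img> at any depth
                parent.remove(child)
    return root


# Hypothetical sample: the second row has an icon, the third has a different class.
sample = """<table>
  <tr class="tr_normal"><td>00001</td></tr>
  <tr class="tr_normal"><td><img src="new.gif"/>06830</td></tr>
  <tr class="tr_header"><td><img src="logo.gif"/>Code</td></tr>
</table>"""

cleaned = remove_rows_with_images(sample)
kept = ["".join(tr.itertext()) for tr in cleaned.iter("tr")]
print(kept)  # ['00001', 'Code']
```

The point of the sample is that the predicate removes a whole row as soon as any descendant is an `<img>`, even if that image is only an icon next to otherwise normal data.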
When I open c:\hk1.html and compare it with http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm, there is a problem: many rows on the original page, such as

06830 华众控股 2,000 #
06838 盈利时 2,000 #
06868 天福 1,000 #
06880 豪特保健 2,000 #
06883 新濠博亚娱乐 300 #

cannot be found in c:\hk1.html. Why?
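One way to test whether the XPath filter is what drops those rows is to audit the original page: for each `<tr class="tr_normal">`, record its text and whether it contains an `<img>`, then check which of the missing stock codes sit in rows with an image. This is a sketch, not a verified answer: it uses the standard library's `html.parser` instead of lxml, assumes `tr_normal` rows are not nested inside one another and that the page closes its `</tr>` tags, and the names `RowAuditor` and `codes_in_removed_rows` plus the sample markup are hypothetical. On the real page you would pass in the downloaded HTML decoded to `str` using the charset the page declares.

```python
from html.parser import HTMLParser


class RowAuditor(HTMLParser):
    """Collect (text, has_img) for each <tr class="tr_normal"> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # list of (row_text, row_contains_img)
        self._depth = 0        # > 0 while inside a tr_normal row (counts nested <tr>)
        self._text = []
        self._has_img = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            if self._depth:
                self._depth += 1          # nested <tr> inside the current row
            elif dict(attrs).get("class") == "tr_normal":
                self._depth = 1           # start of a new tr_normal row
                self._text = []
                self._has_img = False
        elif tag == "img" and self._depth:
            self._has_img = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.rows.append((" ".join(self._text), self._has_img))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._text.append(data.strip())


def codes_in_removed_rows(markup, codes):
    """Return the codes that appear in tr_normal rows containing an <img>,
    i.e. rows that //tr[@class="tr_normal" and (.//img)] would remove."""
    auditor = RowAuditor()
    auditor.feed(markup)
    auditor.close()
    return [c for c in codes
            if any(c in text for text, has_img in auditor.rows if has_img)]


# Hypothetical sample markup; only the second row carries an image.
sample = ('<table>'
          '<tr class="tr_normal"><td>00001</td></tr>'
          '<tr class="tr_normal"><td><img src="new.gif">06830</td></tr>'
          '</table>')
result = codes_in_removed_rows(sample, ["00001", "06830"])
print(result)  # ['06830']
```

If the codes you are missing show up in this list for the live page, the rows are being removed by your own XPath rather than lost during download or parsing.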