Parsing HTML with lxml loses many lines. Why?

Here is my code:

import urllib
import lxml.html

# Download the page and parse it into an lxml HTML tree.
equitydown="http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
file=urllib.urlopen(equitydown).read()
root=lxml.html.document_fromstring(file)

# Remove every tr_normal row that has an <img> anywhere below it.
rdata = root.xpath('//tr[@class="tr_normal" and (.//img)]')
for data in rdata:
    data.getparent().remove(data)

# Write the modified tree back out as HTML.
root1=lxml.html.tostring(root)
my=open('c:\\hk1.html','w')
my.write(root1)
my.close()

When I open c:\hk1.html and compare it with
http://sc.hkex.com.hk/gb/www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm
many rows from the original page are missing, such as:
06830 华众控股 2,000 #
06838 盈利时 2,000 #
06868 天福 1,000 #
06880 豪特保健 2,000 #
06883 新濠博亚娱乐 300 #

None of these rows appear in c:\hk1.html. Why?
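
One possible cause, offered as a guess since the page's markup is not shown in this thread: `.//img` matches an image anywhere below a row, not just in the row's own cells. If the parsed tree has a `tr_normal` row that wraps a nested table, one image deep inside makes the whole outer row match, and removing it takes every nested row with it. The sketch below reproduces that on a small made-up sample (the `SAMPLE` markup and the names `matched`, `nested_counts`, `direct` and `remaining` are illustrative, not taken from the HKEX page).

```python
import lxml.html

# Hypothetical markup, not copied from the HKEX page: a data row that holds
# a nested table, where only one inner row has an <img>.
SAMPLE = """
<html><body><table>
<tr class="tr_normal"><td>06830</td></tr>
<tr class="tr_normal"><td>
  <table>
    <tr class="tr_normal"><td>06838</td><td><img src="x.gif"></td></tr>
    <tr class="tr_normal"><td>06868</td></tr>
  </table>
</td></tr>
</table></body></html>
"""

root = lxml.html.document_fromstring(SAMPLE)

# Same XPath as in the post: .//img looks at every descendant, so the outer
# row matches too, because the image sits somewhere below it.
matched = root.xpath('//tr[@class="tr_normal" and (.//img)]')

# Diagnostic: how many other rows would disappear along with each match?
nested_counts = [len(row.xpath('.//tr')) for row in matched]

# A narrower test: the image must be a direct child of one of the row's own cells.
direct = root.xpath('//tr[@class="tr_normal" and td/img]')

# Same removal loop as in the post.
for row in matched:
    row.getparent().remove(row)

remaining = lxml.html.tostring(root, encoding="unicode")
print(len(matched), nested_counts, len(direct))
print("06868" in remaining)  # this row had no image of its own
```

Running the same `nested_counts` check on the real page would show whether any matched row contains other rows. If it does, a narrower expression such as `td/img` might be closer to what was intended, assuming the images sit directly inside the cells.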

luofeiyu
© 2013 DaniWeb® LLC