Working with HTML files

Question

luke77 0 Newbie Poster

16 Years Ago

Hi guys,

I'm new to programming and am learning with Python. I am trying to write a program that will get information from a website and display it in a program. Specifically, I'm writing a stock portfolio program that will look up information on a specific ticker and return that information. Reading through the Python documentation, I've figured out the following:

I can open the page of interest using this method:

import urllib2
ticker='GOOG'
tickerurl='http://finance.yahoo.com/q?s='+ticker
pagehtml=urllib2.urlopen(tickerurl)

This returns a file-like object, and reading it gives a big string with the HTML code for the page. I don't know what to do at this point, though, to get the info I want from the page. I could do something like:

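import urllib2

ticker = raw_input('Enter the ticker symbol: ')
# Keep the URL in its own variable so the ticker symbol is not overwritten
tickerurl = 'http://finance.yahoo.com/q?s=' + ticker
tickerhtml = urllib2.urlopen(tickerurl)

# Remember the last line of the page that mentions 'Last'
htmltext = ''
for line in tickerhtml.readlines():
    if 'Last' in line:
        htmltext = line

# Drop everything before 'Last'
quoteindex = htmltext.find('Last')
htmltext = htmltext[quoteindex:]
print htmltext[:100]

# The quote immediately follows ticker + '">' in the page source,
# so skip past that marker before slicing out the price
marker = ticker + '">'
quoteindex = htmltext.find(marker)
htmltext = htmltext[quoteindex + len(marker):]
price = htmltext[:4]
print htmltext[:100]

print ticker, " last traded at ", price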

I came up with this by looking at the source code that urlopen returns, essentially searching for the word 'Last' (as in 'Last Trade'), and then searching for the first instance after that of ticker+'">', because those are the characters that immediately precede the quote on the source page. I am sure that there must be a better, easier way to do this, though, particularly because I'm sure the source code changes from page to page and this method may not work for different stocks. And what if I want to pull up other information, like last trade time, volume, market cap, etc.?

Any advice appreciated.

Thanks,
Luke

finance html-css open-source python

All 3 Replies

Gribouillis 1,391 Programming Explorer

16 Years Ago

I suggest you try using a module like BeautifulSoup, which contains functions to extract information from HTML pages. See this link: http://www.crummy.com/software/BeautifulSoup/documentation.html

luke77 0 Newbie Poster · Answer 1 · 2008-10-21T07:26:40+00:00

Thanks. I downloaded BeautifulSoup and used it to parse a sample page, but now all I have is a slightly nicer looking version of the huge string of HTML code from the page in question. I still don't know how to extract specific items of interest (like price, etc.) from this huge string.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 2 · 2008-10-21T12:13:37+00:00

BeautifulSoup has the functions find, findAll, and related functions, which should help you. Try to learn how to use them.
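
To make the idea of looking up an element by its tag attributes concrete, here is a minimal sketch in Python 3. It deliberately swaps in the standard library's html.parser module instead of BeautifulSoup so that it runs without installing anything; BeautifulSoup's find is the more convenient way to do the same kind of lookup. The HTML fragment, the ids (quote_price, quote_time, quote_volume), the values, and the names FieldExtractor and extract_fields are all invented for illustration and do not reflect the markup of any real quote page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the ids and values are made up for this sketch.
SAMPLE_HTML = """
<table>
  <tr><th>Last Trade:</th><td><span id="quote_price">372.54</span></td></tr>
  <tr><th>Trade Time:</th><td><span id="quote_time">4:00PM ET</span></td></tr>
  <tr><th>Volume:</th><td><span id="quote_volume">9,511,387</span></td></tr>
</table>
"""


class FieldExtractor(HTMLParser):
    """Collect the text inside every element whose id is in wanted_ids."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.fields = {}       # id -> text found inside that element
        self._current = None   # id of the element currently being read

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.wanted_ids:
            self._current = element_id
            self.fields[element_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Assumption: the wanted elements contain plain text, no nested tags.
        self._current = None


def extract_fields(html, wanted_ids):
    """Return a dict mapping each wanted id found in html to its text."""
    parser = FieldExtractor(wanted_ids)
    parser.feed(html)
    return {key: value.strip() for key, value in parser.fields.items()}


fields = extract_fields(SAMPLE_HTML, ["quote_price", "quote_volume"])
print("GOOG last traded at", fields["quote_price"])
```

Keying on an attribute such as an id, rather than on character offsets in the raw string, means the lookup keeps working when the surrounding text changes, and pulling another field is just one more id in the list. The ids themselves still have to be read off the real page source.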
