Regex (Newbie trouble) re

Question

steven.rose.94 0 Newbie Poster

10 Years Ago

'''
In Python 3.4, I'm attempting to parse and print one line (actually a number) from the source code downloaded from Yahoo, using a regex to pull the number that is located at the (.*?). I've tried everything I can think of to get this to work - I expect the problem is my coding somehow - any help appreciated!! :)
'''

pbr = re.search(r'(Price\/Book (mrq):<\/td><td class="yfnc_tabledata1">)(.*?)<\/td>',str(respData))
print (pbr)

python regex

Gribouillis 1,391 Programming Explorer

10 Years Ago

We see the failing regex, but we don't know how it fails. Can you post a complete failing Python example with a short, concrete respData?

snippsat 661 Master Poster

10 Years Ago

I can't find what you are searching for in respData.
Please also post your imports.

import urllib.request, urllib.parse

This is the address you get data from.

>>> resp.geturl()
'https://ca.finance.yahoo.com/lookup?s=basics'

Do you find Price/Book or class="yfnc_tabledata1" in the URL or in the returned respData?

Some notes: this site is JavaScript heavy and is not an easy site to start with.
This means that you may have to use a method other than urllib to read the site.
I use Selenium to read sites like this.
Then I get the executed JavaScript too, and can parse with Beautiful Soup or lxml.

Using regex to parse HTML can be a bad choice.
It can work in some cases, but using a parser (BeautifulSoup) is the first choice.
I usually post this link on why not to use regex.

Gribouillis commented: good tips +14

Gribouillis 1,391 Programming Explorer

10 Years Ago

You need to add

value = float([y for x, y in valueTable if x == 'Price/Book (mrq):'][0])
print(value)

You could perhaps find the table first with a findAll('table', ...).
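
Note that the one-liner above raises an IndexError when the label is missing, and the `for x, y in ...` unpacking raises a ValueError if any row does not have exactly two items. A slightly more defensive sketch, assuming valueTable is a list of [label, value] string pairs (the helper name `lookup_value` and the second sample row are made up for illustration):

```python
def lookup_value(rows, label):
    """Return the value paired with `label` as a float, or None if unavailable."""
    for row in rows:
        # Skip rows that are not exactly a [label, value] pair.
        if len(row) == 2 and row[0] == label:
            try:
                return float(row[1])
            except ValueError:
                return None  # the text (e.g. 'N/A') is not a number
    return None  # label not present at all


sample_rows = [
    ['Price/Book (mrq):', '1.41'],  # value quoted in this thread
    ['Some Other Label:', 'N/A'],   # invented row, for illustration only
]

price_to_book = lookup_value(sample_rows, 'Price/Book (mrq):')
print(price_to_book)
```

Returning None instead of raising is only one possible design; raising a KeyError with the missing label would be just as reasonable if a missing value should stop the script.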

steven.rose.94 0 Newbie Poster · Answer 1 · 2015-01-05T15:02:54+00:00

# used to parse values into the url

url = 'https://ca.finance.yahoo.com/q/ks?s=CUS.TO'

values = {'s': 'basics',
         'submit': 'search'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

It seems to work up to this point. Below, I have tried numerous things, but I'm very new to programming, so it might be something simple.

pbr = re.findall(r'Price\/Book \(mrq\):<\/td><td class="yfnc_tabledata1">(.*?)</td>',(respData))
print (pbr)
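
For reference, two things can make attempts like these fail regardless of what the page contains. First, unescaped parentheses in `(mrq)` form a capture group that matches just `mrq`, so the literal text `(mrq)` is never matched. Second, `resp.read()` returns bytes: applying a str pattern to bytes raises a TypeError in Python 3, and `str(respData)` yields the `b'...'` repr rather than the decoded text. A minimal sketch, assuming the page is UTF-8 and contains the cell markup quoted later in this thread (`resp_data` is a hard-coded stand-in for `resp.read()`, not a live download):

```python
import re

# Stand-in for resp.read(); urlopen(...).read() returns bytes, not str.
resp_data = (b'<td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td>'
             b'<td class="yfnc_tabledata1">1.41</td>')

# Decode before matching so a str pattern can be used (UTF-8 is an assumption).
html = resp_data.decode('utf-8')

# \( and \) match literal parentheses; "/" needs no escaping in Python regexes.
pattern = r'Price/Book \(mrq\):</td><td class="yfnc_tabledata1">(.*?)</td>'

# With a single capture group, findall returns a list of the captured strings.
matches = re.findall(pattern, html)
print(matches)
```

This only works if the downloaded HTML really contains that exact markup, which is one reason a parser tends to be more robust than a hand-written pattern.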

steven.rose.94 0 Newbie Poster · Answer 2 · 2015-01-05T17:55:54+00:00

The "Price Book" or class="yfnc_tabledata1" is in the returned respData, which is the source code downloaded from yahoo.ca. My goal is to get the number between that and the </td> tag and return it as a floating-point variable. I've yet to try out BeautifulSoup - I'll have a look tonight when I'm home from work - Thank you! :)

snippsat 661 Master Poster · Answer 3 · 2015-01-05T18:58:54+00:00

> The "Price Book" or class="yfnc_tabledata1" is in the returned respData, which is the source code downloaded from yahoo.ca.

OK, I understand. It's just that I can't find it if I search through "respData" or the URL.

steven.rose.94 0 Newbie Poster · Answer 4 · 2015-01-07T08:00:06+00:00

OK, I downloaded Beautifulsoup4 and installed it after a few attempts .. it seems to be working well now :). I've still got some more of the docs to read, but if I am after only the "1.41" from the Price/Book cell in the following string of HTML, what would my soup.findAll('') look like?
I'm still playing around with the code now, but I'm still getting lots of misc characters. Any help appreciated! If I'm asking too many questions on this, let me know - Cheers!

#### this is the HTML line which returns as soup - I'm after the 1.41 only - which I hope to return as valueTable

<td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td><td class="yfnc_tabledata1">1.41</td>

Below is the full code I've been playing with. It returns close to what I want; I just need to be more specific.

import time
import urllib.request
import urllib.error
import urllib.parse
import bs4
import requests
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen

tsxowned = ('CUS.TO', 'CG.TO', 'S.TO', 'AQN.TO', 'GPS.TO', 'COS.TO', 'CSE.TO', 'CPX.TO', 'ERG.TO', 'CWW.TO', 'LEA.TO', 'WEF.TO')


############# Soup Calls for Yahoo!#########################

#Fetching the Yahoo Finance Page
optionsUrl = 'https://ca.finance.yahoo.com/q/ks?s=CUS.TO'
optionsPage = urlopen(optionsUrl)

#The following code will load the page into BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(optionsPage)

# Need 1.41 from <td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td><td class="yfnc_tabledata1">1.41</td>

valueTable = [
    [x.text for x in y.parent.contents]
    for y in soup.findAll('td', attrs={'class': 'yfnc_tabledata1', 'nowrap': ''})
]


# print (soup) # shows all recovered data
print (valueTable) # shows the variables you're after, e.g. price to book ...

steven.rose.94 0 Newbie Poster · Answer 5 · 2015-01-07T08:41:54+00:00

ok great thanks !:)

snippsat 661 Master Poster · Answer 6 · 2015-01-07T14:49:31+00:00

> this is the HTML line which returns as soup - I'm after the 1.41 only - which I hope to return as valueTable

Using .next_sibling can be better.

from bs4 import BeautifulSoup

html = '''\
<td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td><td class="yfnc_tabledata1">1.41</td>'''

soup = BeautifulSoup(html, 'html.parser')  # name the parser so results are reproducible
tag = soup.find('td', {'class': 'yfnc_tablehead1'})

Test with parent and nextSibling.

>>> tag
<td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td>
>>> tag.parent
<td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td><td class="yfnc_tabledata1">1.41</td>
>>> tag.parent.text
'Price/Book (mrq):1.41'    

>>> tag.nextSibling
<td class="yfnc_tabledata1">1.41</td>
>>> tag.nextSibling.text
'1.41'
>>> float(tag.nextSibling.text) + 1
2.41
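
If BeautifulSoup isn't available, the same "find the label cell, then take the next cell" idea can be sketched with the standard library's html.parser. This swaps in a different parser from the one used above, and the class name `NextCellFinder` is made up for illustration; it is a rough sketch, not a replacement for a full parser.

```python
from html.parser import HTMLParser


class NextCellFinder(HTMLParser):
    """Record the text of the first <td> that follows a <td> with a given label."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_td = False      # True while inside a <td> element
        self.cell_text = ''     # text collected for the current cell
        self.grab_next = False  # True once the label cell has been seen
        self.value = None       # text of the cell after the label, if found

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cell_text = ''

    def handle_data(self, data):
        if self.in_td:
            self.cell_text += data

    def handle_endtag(self, tag):
        if tag != 'td' or not self.in_td:
            return
        self.in_td = False
        text = self.cell_text.strip()
        if self.grab_next and self.value is None:
            self.value = text       # this is the cell right after the label
            self.grab_next = False
        elif text == self.label:
            self.grab_next = True   # take whatever <td> closes next


html = ('<td class="yfnc_tablehead1" width="74%">Price/Book (mrq):</td>'
        '<td class="yfnc_tabledata1">1.41</td>')

finder = NextCellFinder('Price/Book (mrq):')
finder.feed(html)
finder.close()
print(finder.value)
```

The result is still a string, so it needs the same `float(...)` conversion as `tag.nextSibling.text` above.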

steven.rose.94 0 Newbie Poster · Answer 7 · 2015-01-08T05:19:09+00:00

Woo Hoo !! working! thank you so much everyone :)
