I want to extract this data from a website You need to add the numbers present in this string. Your string is: 3754537dd2f082ad578e0ff1806d86a6

This is what I was doing

url = opener.open('http://www.website.com')
data = url.read()
extract = re.search('your string is: <strong>(.*)', data)
ans = extract.group(1)
ans = string.split(ans, '</strong>')[0]
print "Given string "+ans

But it showing me this error.

Traceback (most recent call last):
  File "time2.py", line 16, in <module>
    ans = extract.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Which means that extract has got nothing from website.
How can I solve this problem ? And is there any other/better way to scrape particular data ?

Recommended Answers

All 4 Replies

And is there any other/better way to scrape particular data ?

There is a better way,and HTML and regex is not the best way.
Read bobince funny and good answer her.

Python has two star paser Beautiful Soup and lxml
A couple of example parsing a <strong> tag.

from bs4 import BeautifulSoup

html = """\
<html>
<head>
   <title>html page</title>
</head>
<body>
  <strong>Hello world</strong>
</body>
</html>
"""
soup = BeautifulSoup(html)
tag = soup.find('strong')
print tag.text #--> Hello world

With lxml let say a website has <strong> tag contain text Hello world.

from lxml.html import parse

page = parse('http://www.website.com').getroot()
print page.xpath('//strong')[0].text  #--> Hello world

Actually I did code it wrong, the regular expression part. The given md5 hash isn't inside <strong> tags. Its simple plain text. I was thinking of how to match it using re.search()

Here are some example for web scripting in python:

!/usr/bin/env python

print "Content-Type: text/html"
print
print """\
<html>
<body>
<h2>Hello World!</h2>
</body>
</html>

. The given md5 hash isn't inside <strong> tags. Its simple plain text

Even if it's in plain text,it's inside HTML on a website.
Then using a parser is the right tool,of course there is possible get one value like this with regex.
If you need help you have to post where it are,or some lines where value is present.

Here are some example for web scripting in python:

This has nothing to do with web-scraping.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.