Here is a way to get the addresses of the images included in a web page:

# extract the addresses of the images included in a web page
# you need the modules lxml and beautifulsoup
# (for linux, packages python-lxml and python-beautifulsoup)
# tested with python 2.6 
from lxml.html import soupparser
from urllib2 import urlopen

def gen_elements(tag, root):
    """recursively yield every element with the given tag in the parsed tree"""
    if root.tag == tag:
        yield root
    for child in root:
        for elt in gen_elements(tag, child):
            yield elt

def gen_img_src(url):
    """yield the src attribute of each <img> element in the page at url"""
    content = urlopen(url).read()
    content = soupparser.fromstring(content)
    for elt in gen_elements("img", content):
        yield elt.attrib.get("src", None)

def main():
    url = "http://www.it.usyd.edu.au/about/people/staff/tomc.shtml"
    for src in gen_img_src(url):
        print(src)

if __name__ == "__main__":
    main()
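
If the src values turn out to be relative paths, a small follow-up generator can resolve them against the page address with urljoin from the standard urlparse module. This is only a sketch that builds on gen_img_src above, and the name gen_img_urls is just for illustration:

# resolve relative src values against the page url
# (sketch only; builds on gen_img_src defined above, Python 2)
from urlparse import urljoin

def gen_img_urls(url):
    for src in gen_img_src(url):
        if src is not None:
            # urljoin leaves absolute addresses unchanged
            # and completes relative ones against the page url
            yield urljoin(url, src)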

OK - that's an overly complex answer to a question that only needs a simple one.

Right-click on the image and save it... nothing more complex is required.

@Gribouillis - bear in mind that if the OP even has to ask how to source the original image, it makes me wonder whether he is actually the owner of said image!


It doesn't work. When I run the code, it gives:
Traceback (most recent call last):
File "C:/Users/ALEXIS/Desktop/extactphoto.py", line 5, in <module>
from lxml.html import soupparser
ImportError: No module named lxml.html

As I said, you need the lxml module.
@kaninelupus, there are different ways to understand the question. I don't think my solution is complex.

Go here and download "Beautiful Soup version 3.1.0.1". This is a compressed .tar.gz file. To uncompress it on Windows, you'll need 7zip from here. Right-click on BeautifulSoup.tar.gz and uncompress it once, which should give you a file with the suffix .tar; uncompress that a second time and you should get a BeautifulSoup folder. Then copy BeautifulSoup.py into the site-packages directory of your Python lib. It should work then.
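
Once both are in place, a quick check like this (just a sketch to run from a script or the interactive prompt) should tell you whether Python can see them:

# quick sanity check that both modules can be imported (Python 2)
try:
    import BeautifulSoup
    from lxml.html import soupparser
    print "BeautifulSoup and lxml are installed"
except ImportError, exc:
    print "missing module:", exc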

thx... how about BeautifulSoup, how do I install it?

http://www.crummy.com/software/BeautifulSoup/

Somewhat simple-minded, but you could try this and go from there ...

# retrieve the html code of a given website
# and check for potential image sources
# tested with Python 2.5.4

import urllib2

def extract(text, sub1, sub2):
    """
    extract the substring of text between the first
    occurrences of substrings sub1 and sub2;
    return '' if sub1 does not occur in text
    """
    if sub1 not in text:
        return ''
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]


url_str = 'http://www.it.usyd.edu.au/about/people/staff/tomc.shtml'
fin = urllib2.urlopen(url_str)
html = fin.read()
fin.close()
  
#print(html)  # test
html = html.lower()

while True:
    s = extract(html, '<img src=', '/>')
    # stop as soon as no further <img src= tag is found
    if not s:
        break
    # print the raw src value (quotes and any extra attributes included)
    print s
    # slice past this match and continue with the rest of the page
    html = html[html.find(s) + len(s):]
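
To see what extract() actually returns, here is a tiny illustration using a made-up piece of HTML:

# illustration only: the html string here is made up
sample = '<p>text</p><img src="/images/logo.png" alt="logo" />more text'
print extract(sample, '<img src=', '/>')
# prints everything between the two markers, so the quotes
# and any extra attributes come along with the path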

