Hi All,

Newbie here -

I am trying to write a program that lists all the URLs on a web page. I got that part working with the following script:

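import urllib, urllister
urladdr = raw_input("Enter URL here: ")
usock = urllib.urlopen(urladdr)
parser = urllister.URLLister()
parser.feed(usock.read())  # parse the page; without this parser.urls stays empty
usock.close()
parser.close()
for url in parser.urls: print url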

However, I would also like to remove duplicates from the output and keep only the domain and subdomain names. For example, if the output is:


I want to filter the output so that it displays only the unique domain and subdomain names. For example,


Can anybody guide me with this?

There are no doubt other, and probably better, ways to do this, but try:
print url.split("/")
You want the 3rd item of that list, i.e. url.split("/")[2], which for an absolute URL (one that starts with http://) is the host name. You can then add it to a list or dictionary if it is not already there.
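
To make the split idea concrete, here is a rough sketch of how it might look as a function. The name unique_hosts and the sample URLs are made up for illustration (they are not from the original post), and it assumes the links are plain strings like the ones in parser.urls. It is written to run on both Python 2 and Python 3.

```python
def unique_hosts(urls):
    """Return the distinct host names found in urls, in first-seen order."""
    seen = set()
    hosts = []
    for url in urls:
        parts = url.split("/")
        # An absolute URL such as "http://host/path" splits into
        # ["http:", "", "host", "path"], so the host is item 2.
        # Relative links ("/about", "page.html") have no host; skip them.
        if len(parts) < 3 or not parts[0].endswith(":") or parts[1] != "":
            continue
        host = parts[2].lower()  # host names are case-insensitive
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


# Hypothetical sample data, not taken from the original post.
sample = [
    "http://www.example.com/index.html",
    "http://www.example.com/about/",
    "http://mail.example.com/inbox",
    "/relative/link.html",
    "http://Example.org",
]
for host in unique_hosts(sample):
    print(host)
```

Note that this keeps a port or user name if the URL has one (e.g. "host:8080"). If that matters, the standard library's urlparse function (module urlparse in Python 2, urllib.parse in Python 3) returns an object whose hostname attribute strips those off, so it may be the more robust choice.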