Hey guys,

I'm working on a basic search engine and am really close to completion.

I currently have a function that takes a string and compares each word and its synonyms to a webpage.

My output at the moment is a list of pairs, [(x, y), (x, y), ..., (x, y)], where x is the "closeness" percentage of the terms to a webpage and y is the webpage contents.

I am almost there, but I now need to remove the items that have no match to a site (i.e., where x = 0).

I found out that operator.itemgetter() can isolate just the first value of each item, and then I filtered out the zeros from there with this code:

import operator

def Google_search(string):
    internet_length = len(Internet)
    percentage_list = []
    for x in range(0,internet_length):
        position = x
        closeness_percentage = closeness(string, Internet[x])
        percentage_list.append([closeness_percentage, Internet[position]])

    sorted_list = sorted(percentage_list, key=operator.itemgetter(1), reverse = True)
##    print sorted_list
    ## now to delete the ones with zero percentage

    get_percentages = operator.itemgetter(0)
    percentages = map(get_percentages, sorted_list)
    print percentages
    no_zeros = [x for x in percentages if x != 0]
    print no_zeros
    print sorted_list

So an example of the output would be:
[13, 0, 3, 2, 0, 0, 4, 0, 0, 6, 2, 3, 0, 0]
[13, 3, 2, 4, 6, 2, 3]

This is good. However, deleting the zeros from the percentage-only list doesn't delete them from the list with the webpages, obviously, since it's a new list!

I have been straining my brain for hours about how to get around this! I think I need to make a loop that compares the percentage in each sublist to the values in the zero-free list, returns True if it's a match, and then filters the results. But I don't know how to do something like:

for x in range(0, length):
    for y in range(0, no_zeros_length):
        if sorted_list[x][0] == no_zeros[y]:
            return True

Do you guys get what I mean? Or is there a much easier way to omit the zeros from the original list?

Thanks heaps in advance!

P.S. I've attached the file (rename it to .py if you want to use it) so it's easier to understand what's going on, as this is part 4 and each part is dependent on the ones before it (I thought it would be too much code for a post)!


I think you want a dictionary, if I understand correctly. The dictionary is the standard way of mapping one set of items to another.

So you have

mydict = {URL1: 13, URL2: 0, URL3: 3, URL4: 2, ...}
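
For what it's worth, a dictionary like that could probably be built straight from the [percentage, page] pairs your function already produces. This is just a minimal sketch: the sample data is made up, and it assumes your pages are strings (so they can be used as dictionary keys).

```python
# Hypothetical sample standing in for percentage_list from the question:
# each item is [closeness_percentage, page].
percentage_list = [[13, "page A"], [0, "page B"], [3, "page C"], [0, "page D"]]

# Map each page to its closeness percentage.
mydict = {}
for percentage, page in percentage_list:
    mydict[page] = percentage

print(mydict)
```

Note that if the same page appeared twice, the later percentage would overwrite the earlier one.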

And then you run this bit of code:

for URL in mydict.copy():
    if mydict[URL] == 0:
        del mydict[URL]

and then your list of hot URLs is simply mydict.keys().
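
If the order matters to you (say, best match first), another option might be to skip the separate percentages list and filter the pairs themselves, so each percentage stays attached to its page. A rough sketch, with made-up data in place of your real percentage_list:

```python
# Hypothetical sample: [closeness_percentage, page] pairs as in the question.
percentage_list = [
    [13, "page A"],
    [0, "page B"],
    [3, "page C"],
    [0, "page D"],
    [6, "page E"],
]

# Keep only the pairs whose percentage is non-zero; the page stays with its score.
matches = [pair for pair in percentage_list if pair[0] != 0]

# Sort by the percentage (index 0), highest first.
matches.sort(key=lambda pair: pair[0], reverse=True)

print(matches)
```

The comparison uses != rather than "is not", since "is" checks whether two names refer to the same object rather than whether the values are equal.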