Hi there,

I'm quite new to python and have a question regarding list matching. What would be the most efficient method of searching for matching items in a list?

Ideally, the code would look like:

list1 = ['spam', 'eggs', 'bacon']
list2 = ['spam', 'ham', 'king richard']

matchitems(list1, list2)

...

"There is/are X matching item(s)" (in this case 1)

Am I right in thinking the best way to approach this would be with regular expressions? Perhaps a raw string search which adds to a counter every time an item is found in both lists?

I will be dealing with lots of large lists (music tags) so would like to work on an efficient method for this.

Any advice much appreciated (and if I've missed anything obvious, then apologies, I have looked through the documentation but there isn't much regarding this problem apart from using regexes).

Recommended Answers

All 6 Replies

Just to add, is there any merit to turning the lists into tuples and hashing the items? Then the keys can be matched. I'm not too sure how this would affect efficiency though.

Oh i did exactly the same exercise myself when learning about lists, what you need to know here is about the "in" keyword. It is used to check if "a in b" so what we can do is check if item a is in list b. If it is then we have a match. Get it? I hope so. If not here is an example:

def getMatches(first, second):
    count = 0
    for item in first:
        if item in second:
            count += 1
    return count
            

list1 = ['spam', 'eggs', 'bacon',"ham"]
list2 = ['spam', 'ham', 'king richard']

print "there are ",getMatches(list1, list2)," matches"

Hope that helps you on your way!
:)

I don't know how efficient that would be, especially for a large list. The in operator in this context basically loops through the sequence until it finds a match, returning false if it doesn't.

The problem I have with its use in the above code is that it can be used up to len(l1) * len (l2) times for a single comparison! That's a lot of iterations! That code is bound to get exponentially slower for larger lists. Python has a built-in package called difflib with a class called SequenceMatcher. I imagine the algorithm that they use would be more efficient (but you can never know until you test it yourself).

Yeah, i know what you mean scru. I was using this because of the relatively simple lists. But thanks for the heads up. I love learning of new modules :)

Thanks both for your input. I will try these methods out :D

Python (with batteries included) usually has a tested tool that will already do most common things. In this case, you want sets.

list1 = ['spam', 'eggs', 'bacon']
list2 = ['spam', 'ham', 'king richard']

set_1 = set(list1)
set_2 = set(list2)

##matchitems(list1, list2)
new_set=set_1.intersection(set_2)
print new_set
## works either way
print set_2.intersection(set_1)
#
## FYI
new_set = set_1.difference(set_2)
print new_set
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.