Matching items in lists

Question

katamole 0 Newbie Poster

16 Years Ago

Hi there,

I'm quite new to python and have a question regarding list matching. What would be the most efficient method of searching for matching items in a list?

Ideally, the code would look like:

list1 = ['spam', 'eggs', 'bacon']
list2 = ['spam', 'ham', 'king richard']

matchitems(list1, list2)

...

"There is/are X matching item(s)" (in this case 1)

Am I right in thinking the best way to approach this would be with regular expressions? Perhaps a raw string search which adds to a counter every time an item is found in both lists?

I will be dealing with lots of large lists (music tags) so would like to work on an efficient method for this.

Any advice much appreciated (and if I've missed anything obvious, then apologies, I have looked through the documentation but there isn't much regarding this problem apart from using regexes).

python

4 Contributors
6 Replies
181 Views
9 Hours Discussion Span
Latest Post 16 Years Ago Latest Post by woooee

All 6 Replies

lllllIllIlllI 178 Veteran Poster

16 Years Ago

Oh i did exactly the same exercise myself when learning about lists, what you need to know here is about the "in" keyword. It is used to check if "a in b" so what we can do is check if item a is in list b. If it is then we have a match. Get it? I hope so. If not here is an example:

def getMatches(first, second):
    count = 0
    for item in first:
        if item in second:
            count += 1
    return count
            

list1 = ['spam', 'eggs', 'bacon',"ham"]
list2 = ['spam', 'ham', 'king richard']

print "there are ",getMatches(list1, list2)," matches"

Hope that helps you on your way!
:)

scru 909 Posting Virtuoso

16 Years Ago

I don't know how efficient that would be, especially for a large list. The in operator in this context basically loops through the sequence until it finds a match, returning false if it doesn't.

The problem I have with its use in the above code is that it can be used up to len(l1) * len (l2) times for a single comparison! That's a lot of iterations! That code is bound to get exponentially slower for larger lists. Python has a built-in package called difflib with a class called SequenceMatcher. I imagine the algorithm that they use would be more efficient (but you can never know until you test it yourself).

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

katamole 0 Newbie Poster · Answer 1 · 2009-04-15T18:13:20+00:00

Just to add, is there any merit to turning the lists into tuples and hashing the items? Then the keys can be matched. I'm not too sure how this would affect efficiency though.

lllllIllIlllI 178 Veteran Poster · Answer 2 · 2009-04-15T19:11:11+00:00

Yeah, i know what you mean scru. I was using this because of the relatively simple lists. But thanks for the heads up. I love learning of new modules :)

katamole 0 Newbie Poster · Answer 3 · 2009-04-15T20:41:14+00:00

Thanks both for your input. I will try these methods out :D

woooee 814 Nearly a Posting Maven · Answer 4 · 2009-04-15T21:22:20+00:00

Python (with batteries included) usually has a tested tool that will already do most common things. In this case, you want sets.

list1 = ['spam', 'eggs', 'bacon']
list2 = ['spam', 'ham', 'king richard']

set_1 = set(list1)
set_2 = set(list2)

##matchitems(list1, list2)
new_set=set_1.intersection(set_2)
print new_set
## works either way
print set_2.intersection(set_1)
#
## FYI
new_set = set_1.difference(set_2)
print new_set

Matching items in lists

Recommended Answers Collapse Answers

All 6 Replies

Recommended Answers