Collection Comparison Problem

Question

lrh9 95 Posting Whiz in Training

13 Years Ago

I'm working on a project where I would like to compare collections of words, but I have additional constraints I need to account for.

The collections are like a set in that order doesn't matter for the comparison.

However, in my problem duplicate elements are significant: how many times each word occurs matters.

E.g. the collection of "foo", "bar", "baz" would not be the same as the collection of "baz", "foo", "bar", "baz".

However, the collection of "baz", "foo", "bar", "baz" would be the same as the collection of "bar", "baz", "foo", "baz".

Additionally, the collection itself must be hashable.

Thank you for reading.

python

lrh9 95 Posting Whiz in Training · Answer 1 · 2011-08-04T04:49:10+00:00

papna, a user in the #python channel on freenode (irc.freenode.net), gave me a very clever solution.

papna told me to use a collections.Counter.

http://docs.python.org/py3k/library/collections.html?highlight=collections#collections.Counter

It will satisfy my requirements excepting hashability, but I know how to deal with that.
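A minimal sketch of one way to make a Counter-based collection hashable, assuming the words themselves are hashable (strings are). The helper name `multiset_key` is hypothetical, and the post does not say which approach the poster had in mind; freezing the (word, count) pairs into a frozenset is just one common option.

```python
from collections import Counter

def multiset_key(words):
    """Return a hashable, order-independent key that keeps duplicate counts.

    Hypothetical helper: Counter tallies each word, and the frozenset of
    (word, count) pairs is immutable, so it can be hashed.
    """
    return frozenset(Counter(words).items())

a = multiset_key(["baz", "foo", "bar", "baz"])
b = multiset_key(["bar", "baz", "foo", "baz"])
c = multiset_key(["foo", "bar", "baz"])

print(a == b)  # True: same words with the same counts
print(a == c)  # False: "baz" occurs twice in a, once in c

# Because the key is hashable, it can be used in a dict or a set.
seen = {a: "first"}
print(seen[b])  # first
```

Counter objects themselves compare equal when their counts match, so plain `Counter(x) == Counter(y)` is enough when only comparison is needed.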

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2011-08-04T13:28:16+00:00

>>> def are_same(a, b):
	return set(a) == set(b) and len(a) == len(b)

>>> are_same(("foo", "bar", "baz"), ("baz", "foo", "bar", "baz"))
False
>>> are_same(("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz"))
True
>>>

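For hashability, use a frozenset or a tuple of sorted values (note that a plain frozenset of the words drops duplicates, so the sorted tuple is the safer choice here). By the way, another formulation of this is: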

>>> def are_same(a,b):
	return sorted(a) == sorted(b)

>>> are_same(("foo", "bar", "baz"), ("baz", "foo", "bar", "baz"))
False
>>> are_same(("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz"))
True
>>>

This is wrong. Equal lengths do not mean that each element occurs the same number of times.
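The tuple-of-sorted-values idea from the reply above can be sketched as follows, with a frozenset of the raw words shown for contrast. The helper name `sorted_key` is hypothetical, and the sketch assumes the words can be ordered against each other (true for strings).

```python
def sorted_key(words):
    """Hypothetical helper: sort the words into a tuple, which is hashable
    and order-independent while still keeping every duplicate."""
    return tuple(sorted(words))

collections_to_group = (
    ["baz", "foo", "bar", "baz"],
    ["bar", "baz", "foo", "baz"],
    ["foo", "bar", "baz"],
)

# Group collections that are equal as multisets by using the key in a dict.
groups = {}
for coll in collections_to_group:
    groups.setdefault(sorted_key(coll), []).append(coll)

print(len(groups))  # 2: the first two collections share one key

# A frozenset of the raw words cannot tell these two apart,
# because it discards the second "baz".
print(frozenset(["foo", "bar", "baz"]) == frozenset(["baz", "foo", "bar", "baz"]))  # True
```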

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2011-08-05T17:39:44+00:00

Sorry, the first formulation is not correct: different elements could be duplicated while the lengths stay the same:

>>> def are_same(a, b):
	return set(a) == set(b) and len(a) == len(b)

>>> are_same(*(("foo", 'foo', "bar", "baz"), ("baz", "foo", "bar", "baz")))
True
>>>

The second, sorted version should be valid, though. You are basically checking for 'anagram sequences', so you could also use the function from my anagram code snippet:

def isanaseq(k, s):
    """ Go through the items of the second sequence (s) and return True
        if the first sequence (k) contains exactly the same elements,
        each the same number of times.
    """
    ## different lengths cannot be anagrams; checking first fails fast
    if len(k) != len(s):
        return False
    for c in s:
        if c not in k:  ## element of s not found in what remains of k
            return False
        pos = k.index(c)
        ## drop the matched element at position pos by slicing, so a tuple is OK
        k = k[:pos] + k[pos+1:]
    return not k

for t in ((("foo", "bar", "baz"), ("baz", "foo", "bar", "baz")),
          (("foo", 'foo', "bar", "baz"), ("baz", "foo", "bar", "baz")),
          (("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz"))):
    print('%s and %s: %s' % (t+(isanaseq(*t),)))