I'm working on a project where I would like to compare collections of words, but I have additional constraints I need to account for.

The collections are like a set in that order doesn't matter for the comparison.

However, in my problem equal elements are significant.

E.g. the collection of "foo", "bar", "baz" would not be the same as the collection of "baz", "foo", "bar", "baz".

However, the collection of "baz", "foo", "bar", "baz" would be the same as the collection of "bar", "baz", "foo", "baz".

Additionally, the collection itself must be hashable.

Thank you for reading.

Last Post by pyTony
>>> def are_same(a, b):
	return set(a) == set(b) and len(a) == len(b)

>>> are_same(("foo", "bar", "baz"), ("baz", "foo", "bar", "baz"))
False
>>> are_same(("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz"))
True

For hashability, use a tuple of the sorted values (a frozenset would throw away the duplicates that matter here). By the way, another formulation of the comparison is:

>>> def are_same(a,b):
	return sorted(a) == sorted(b)

>>> are_same(("foo", "bar", "baz"), ("baz", "foo", "bar", "baz"))
False
>>> are_same(("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz"))
True
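To make the hashability part concrete, here is a minimal sketch of the sorted-tuple idea. The name `multiset_key` is just made up for illustration, and it assumes the words can be ordered against each other (true for plain strings):

```python
def multiset_key(words):
    """Return a hashable, order-independent key that keeps duplicates."""
    # Sorting removes the effect of order; tuple() makes the result hashable.
    return tuple(sorted(words))

a = multiset_key(("baz", "foo", "bar", "baz"))
b = multiset_key(("bar", "baz", "foo", "baz"))
c = multiset_key(("foo", "bar", "baz"))

seen = {a: "first collection"}  # the key can be used in a dict or a set
print(a == b)     # True: same words, same counts
print(b in seen)  # True
print(c in seen)  # False: "baz" appears only once
```

Two collections then compare equal exactly when their keys are equal, so the key can stand in for the collection wherever a hash is needed.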
This is wrong. Equal lengths don't mean each element occurs the same number of times.

Sorry, the first formulation is not correct: different elements could be duplicated while the lengths stay the same, and it then wrongly reports a match:

>>> def are_same(a, b):
	return set(a) == set(b) and len(a) == len(b)

>>> are_same(*(("foo", 'foo', "bar", "baz"), ("baz", "foo", "bar", "baz")))
True

The second, sorted version should be valid, though. You are basically checking for 'anagram sequences', so you could also use the function from my anagram code snippet:

def isanaseq(k, s):
    """Go through the items of the second sequence (s) and return True
    if the first sequence (k) contains exactly the same elements the
    same number of times.
    """
    # Different lengths cannot be anagrams; checking first fails fast.
    if len(k) != len(s):
        return False
    for c in s:
        if c not in k:  # element of s is no longer left in k
            return False
        pos = k.index(c)
        # Drop the element at position pos by slicing, so a tuple is OK.
        k = k[:pos] + k[pos + 1:]
    return not k

for t in ((("foo", "bar", "baz"), ("baz", "foo", "bar", "baz")),
          (("foo", 'foo', "bar", "baz"), ("baz", "foo", "bar", "baz")),
          (("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz"))):
    print('%s and %s: %s' % (t+(isanaseq(*t),)))
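Another option, not used above, is `collections.Counter` from the standard library, which counts each element and so compares collections as multisets directly. The sketch below is only one possible way to do it; `same_multiset` and `hashable_multiset` are hypothetical names, and it assumes the words themselves are hashable (strings are):

```python
from collections import Counter

def same_multiset(a, b):
    """True if a and b hold the same elements the same number of times."""
    return Counter(a) == Counter(b)

def hashable_multiset(words):
    """Freeze the (word, count) pairs so the result can be hashed."""
    return frozenset(Counter(words).items())

pairs = (
    (("foo", "bar", "baz"), ("baz", "foo", "bar", "baz")),
    (("foo", "foo", "bar", "baz"), ("baz", "foo", "bar", "baz")),
    (("foo", "baz", "bar", "baz"), ("baz", "foo", "bar", "baz")),
)
for a, b in pairs:
    print(same_multiset(a, b))  # False, False, True

key = hashable_multiset(("baz", "foo", "bar", "baz"))
print(key == hashable_multiset(("bar", "baz", "foo", "baz")))  # True
```

Counting is a single pass over each collection, whereas the index-and-slice loop rescans the remaining items for every element, so this may be the better fit for long word lists.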
