Hey,

Problem:
I need to find an algorithm that, given two sequences of n numbers each, say S1 and S2, and one number, say x, tells whether x is in the sum of S1 and S2. The hashing function, say h, must be picked at random from a universal family of hashing functions, say H. The algorithm should not take more than O(n) time.

My attempt:
Assume H is a universal hashing family, from which h is picked at random. Use h on every pair of elements, say s1 in S1 and s2 in S2, to compute h(s1 + s2), and place the result in a table A[1, ..., n] of size n.
Check whether A[h(x)] is occupied. If it is, then x is in the sum of S1 and S2.
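Roughly, in Python, my attempt would look like this (only a sketch: a built-in set stands in for the table A and the randomly chosen h, and note that it touches every (s1, s2) pair):

# Sketch of the attempt above: hash every pairwise sum into a table,
# then answer membership queries for x.
def build_sum_table(S1, S2):
    table = set()           # stands in for A[1..n]; hashing done by set()
    for s1 in S1:
        for s2 in S2:
            table.add(s1 + s2)
    return table

def is_sum(x, table):
    return x in table       # expected O(1) lookup once the table is built

table = build_sum_table([1, 2, 3], [1, 3, 5])
print(is_sum(4, table))     # True, since 1 + 3 = 4
print(is_sum(100, table))   # False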

Hashing functions have absolutely nothing to do with the problem you're trying to solve. Your question is completely ridiculous or else you've paraphrased it incorrectly.

How nice of you to say that, as it's a question from a homework assignment I'm trying to solve, and I found this particular question to be quite ambiguous, as it doesn't really specify the order of the sequences.

However, I apologize if the way I stated the question was confusing. I will try to make it simpler:

Input:

  1. Sequence of n numbers A = {a_1, a_2, .... , a_n}
  2. Sequence of n numbers B = {b_1, b_2, .... , b_n}
  3. A number x

Output:

  1. Boolean - 1 if x \in {a_i + b_j where 1 <= i, j <= n}, 0 otherwise.

Requirements:

  1. Universal Hashing Families
  2. Running time = O(n)

I think the reason to use hashing is to do a fast lookup on the hash value of x and see if it exists in the hash table constructed from the sums over elements of A and B.

You see, that's ridiculous, because instead of putting the elements of {a+b : a \in A, b \in B} into a hash table, you could just compare each of them to x.

But then you would have to compare to x every time x changes. Moreover, what happens if there is more than one pair of a and b that sum to x? The algorithm must run in linear time.

But then you would have to compare to x every time x changes.

Now you're changing the problem from the original way it was stated.

Moreover, what happens if there is more than one pair of a and b that sum to x? The algorithm must run in linear time.

My response is simply a demonstration of how your proposal to construct a hash table doesn't make sense. And the running time has nothing to do with the possibility of having multiple pairs summing to x; it's always O(n^2) whether you like it or not.

As for getting it running in O(n): You can easily get an algorithm running in O(n log n) without any use of hash functions. I hope that is obvious. Once you start thinking about that solution, it is easy to think of an O(n) solution that uses a hash table.

(Assuming you have the magical hash function that can distinguish n inputs without needing to look at log_2(n) bits of each value. Which is what we are assuming, isn't it?)
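Reading that hint as: sort one sequence, then binary-search x - b for each b in the other, a rough sketch in Python (no hashing involved; the function name is mine):

# O(n log n): sort S1 once, then an O(log n) binary search per element of S2.
import bisect

def has_pair_sorted(S1, S2, x):
    A = sorted(S1)                        # O(n log n)
    for b in S2:                          # n iterations
        i = bisect.bisect_left(A, x - b)  # O(log n) search for x - b
        if i < len(A) and A[i] == x - b:
            return True
    return False

print(has_pair_sorted([1, 2, 3], [1, 3, 5], 4))    # True  (1 + 3)
print(has_pair_sorted([1, 2, 3], [1, 3, 5], 100))  # False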

I would hash the values of S1, then go over each s in S2, hashing x - s and checking whether it is saved in the hash table; if it is, then s and x - s form one pair for the sum.

In Python terms

s1_set = set(S1)                                     # hash the values of S1
pairs = [(x - s, s) for s in S2 if x - s in s1_set]  # pairs summing to x

So if you consider x fixed, then it could work out; if we have a changing x, we need to evaluate the list again for each x.

Excuse me if I misunderstood your idea, but wouldn't that take O(n^2) time?
If you hash all the values in S1, that takes O(n).
Then looping over every value s in S2 and hashing x - s takes another O(n).

If you need to be able to change x, it takes O(n) per x; you hash each number once for each x. You would compute two sets of hashes, but the first one is always the same, so it does not need to be calculated again when x changes (so it is best to choose the longer sequence as S1). Of course, if a sum is found you can immediately short-cut the operation, as one pair is enough.
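In code form, that reuse might look like this (just a sketch, with set() standing in for the hash table):

# Build the hash table for S1 once; only the O(n) pass over S2 is repeated
# when x changes. Short-cuts as soon as one pair is found.
def make_checker(S1, S2):
    s1_set = set(S1)                # built once
    def check(x):
        for s in S2:
            if x - s in s1_set:     # expected O(1) lookup
                return True         # one pair is enough
        return False
    return check

check = make_checker([1, 2, 3], [1, 3, 5])
print(check(4))    # True  (1 + 3)
print(check(100))  # False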

Actually I think that first sorting both S1 and S2 could be useful. Combined with binary search it could be pretty fast compared to hashing all sums (which would be n/2 * n/2 = n**2/4 = O(n**2) for equal S1 and S2).

I am a little rusty on these O() calculations, though.

Of course, if a sum is found you can immediately short-cut the operation, as one pair is enough.

That was my initial thinking: calculating the sums before starting to look up x, then hashing these sums and placing them in a hash table. Then you can check whether x exists by taking its hash value and looking it up in the hash table in only O(1) time (by the definition of a hash table). If x changes, then we can just take the hash value of the new x and look it up again in the hash table in O(n).

Actually I think that first sorting both S1 and S2 could be useful. Combined with binary search it could be pretty fast compared to hashing all sums (which would be n/2 * n/2 = n**2/4 = O(n**2) for equal S1 and S2).
I am a little rusty on these O() calculations, though.

I think since it mentioned that both S1 and S2 are sequences, I assume they are ordered in some way (e.g. ascending order) by the definition of a sequence, so I'm not sure there's a need to sort in this case.

However, I'm a bit confused about the result of summing two sequences. For example:

Assume S1 = { 1, 2, 3 } and S2 = { 1, 3, 5 }
What is S1 + S2 ? 
Is it { 1+1, 2+3, 3+5 } or { 1+1, 1+3, 1+5, 2+1, 2+3, 2+5, ... , 3+5} ?

If you have sorted sequences, you can first check that the value is in the possible range, between the sum of the minimums and the sum of the maximums. If the input x is a random integer, the probability of x hitting that finite range is minimal, so you should also know something about the range of the numbers.
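A minimal sketch of that pre-check, assuming both sequences are already sorted ascending:

# x can only be a pairwise sum if it lies between the sum of the minimums
# and the sum of the maximums.
def in_possible_range(S1, S2, x):
    return S1[0] + S2[0] <= x <= S1[-1] + S2[-1]

print(in_possible_range([1, 2, 3], [1, 3, 5], 4))    # True  (2 <= 4 <= 8)
print(in_possible_range([1, 2, 3], [1, 3, 5], 100))  # False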

The sum of sets is usually the union of sets. What I took you to mean was
sum a + b for a in S1 for b in S2,
i.e. the latter one, with (length of S1) * (length of S2) sums to calculate.

The idea is simple once you know it. You need to store the diff values in a hash table in order to get it in linear time*.

Here is the idea:

1) Combine S1 and S2, call it S
2) For each element in S, check if the difference is in the hash table
3) If it is, then output the match else keep processing.

Example:

A = {1,2,3}
X = 5;
HashTable = {}

//step 2

i = 1
key = 1, diffValue = 5 - 1 = 4
is diffValue a key in HashTable?
false so insert: HashTable = { (1,4) }
i = 2
key = 2, diffValue = 5 - 2 = 3
is diffValue a key in HashTable?
false so insert: HashTable = { (1,4), (2,3) }
i = 3
key = 3, diffValue = 5 - 3 = 2
is diffValue a key in HashTable?
true so output: {(2,3)}

no more iteration.


However, there is a catch to the above algorithm.
1) It does not handle the case where the target value is 2 times the diff value. You need to fix that.
2) Hashing is at worst linear time in theory. So that still makes it O(n^2) in theory.
But it's possible to make the hashing O(1) because it's a sequence of numbers and we can achieve perfect hashing. So you will have to assume that to get a linear-time solution.
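If I follow the idea, a minimal Python sketch would be something like this (a dict stands in for the hash table; it does not address catch 1 above, and nothing here checks that the matched pair uses one element from S1 and one from S2):

# One pass over the combined sequence, checking each element's diff against
# the previously inserted keys, as in the worked example above.
def find_pair(S1, S2, x):
    S = list(S1) + list(S2)      # step 1: combine S1 and S2
    table = {}                   # the HashTable from the example
    for key in S:                # step 2: one pass over S
        diff = x - key
        if diff in table:        # is diffValue already a key?
            return (diff, key)   # step 3: output the match
        table[key] = diff        # else insert (key, diff) and keep processing
    return None

print(find_pair([1, 2, 3], [], 5))   # (2, 3), as in the worked example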

2) Hashing is at worst linear time in theory? So that still makes it O(n^2) in theory

That's why there was that verbiage about selecting a hash function randomly from the universal set of hash functions. So we can say that for some C the probability of the algorithm taking more than C*n time approaches zero as n gets arbitrarily large.

My understanding of the stated problem seems to be different from firstPerson's; here is the simplest "hash the possible sums" version in Python:

# Python program for sums in set product

def check(a,s):
    return a in s

# two unordered sequences
S1 = [1, 1010100, 2, 3]
S2 = [-423, 34523, 2342]

# hash sums approach len(S1) * len(S2) = 4 * 3 = 12 hash operations

possible = {a+b for a in S1 for b in S2}

print(possible)
print('%i valid sums' % len(possible))
"""Output:
set([1012442, 2343, 2344, -420, 1009677, 1044623, 2345, -422, -421, 34524, 34525, 34526])
"""

#check is only hash lookup
# 3 not possible as 1 and 2 are both in S1
print(check(3, possible))
# -> False

# 2344 is possible as 2 in S1 and 2342 in S2 and 2 + 2342 = 2344
print(check(2344, possible))
# -> True

1) It does not handle the case where the target value is 2 times the diff value. You need to fix that.

That's why mod is used, right? As a hashing function, [TEX]h(x) = (((ax + b) \bmod p) \bmod N) + 1[/TEX], where p is a prime number larger than N and N >= 2n + 1, assuming we combine S1 and S2 into S, making its size 2n, and adding 1 to make N prime.

That's why there was that verbiage about selecting a hash function randomly from the universal set of hash functions. So we can say that for some C the probability of the algorithm taking more than C*n time approaches zero as n gets arbitrarily large.

Yeah, the whole idea of using a universal hashing family is to pick a hashing function at random by picking some constants, say a and b, at random from a range of numbers. Then I think the probability of two different keys mapping onto the same spot, that is, the expected number of collisions [TEX]c_h(x, S)[/TEX], is some [TEX]O(\frac{n}{N})[/TEX].

I'm hoping I'm on the right track here?
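A minimal sketch of picking such an h at random, using the ((ax + b) mod p) mod N construction just discussed (0-indexed, so without the +1; the particular p and N here are just placeholders I chose):

import random

# Pick a and b at random once; the same h is then used for the whole table.
def random_universal_hash(N, p=2147483647):   # p: a prime larger than the keys
    a = random.randrange(1, p)                # a in {1, ..., p-1}
    b = random.randrange(0, p)                # b in {0, ..., p-1}
    def h(z):
        return ((a * z + b) % p) % N
    return h

h = random_universal_hash(N=11)
print([h(z) for z in [1, 2, 3, 1, 3, 5]])     # equal keys land in equal slots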

Actually the solution is very simple. You need to think outside the box... What you need to use is the "key" of the hash, not its value. Use arithmetic in the computation and you get the result.

For example.
Let a = [1, 6, 7, 9, 11]  (length n)
Let b = [5, 9, 3, 14, 6, 5]  (can even be duplicated)  (length m)

Create a hash for each array. Let the value be "true"
A = {"1"=>true, "6"=>true, "7"=>true, "9"=>true, "11"=>true}
B = {"5"=>true, "9"=>true, "3"=>true, "14"=>true, "6"=>true}  (notice that 5 is now one)
  ** The time complexity of this is O(n+m), which is still O(n). **

Given x value is 10:
Search for the matching number by going through either A or B (same result).
If going through A:
  foreach key in A, subtract the key value from the x value
    use the result as a key to look up in B  (the correct hit comes when the key in A is 1)
    if that key exists in B, return the pair of keys from A & B as the combination (here 10 - 1 = 9)
    end if
  end for
  not found, return no result

If going through B:
  foreach key in B, subtract the key value from the x value
    use the result as a key to look up in A  (the correct hit comes when the key in B is 9)
    if that key exists in A, return the pair of keys from A & B as the combination (here 10 - 9 = 1)
    end if
  end for
  not found, return no result

  ** The time complexity of this is either O(n) or O(m), which is still O(n). **

Hope this helps you understand it.

My 'Python as pseudocode' version of the same thing was a little shorter (http://www.daniweb.com/software-development/computer-science/threads/388732/1675118#post1675118). Of course you must understand that set() does the hashing of the values.
Here is an actually run version, with your lists:

s1, s2  = [1, 6, 7, 9, 11], [5, 9, 3, 14, 6, 5]
x = 10

s1_set = set(s1)
pairs = [(x-s, s) for s in s2 if x-s in s1_set]
print(pairs)
"""Output:
[(1, 9), (7, 3)]
"""

I understand, arrays and hashes are convertible into each other. Though explaining it in words instead of using program code is sometimes more difficult.
