Hello, I have been asked to provide an algorithm for the following:

You are given an array of length N, which contains values in the range from 0 to (N^2) - 1. You are to sort it in O(N) time.

I have been unable to find any way to do it, due to the lack of a "hard" upper bound on the input range (which varies with the size of the input).

I'll be happy to hear ideas from anyone who can offer help. Thanks!


My first thought was a radix sort, but since that does depend on the value of the largest number, I don't think it's strictly O(N) in this case.


I have tried radix sort, but since the number of digits is not fixed and grows with the size of the input, the complexity here would be O(n log n): the number of decimal digits is log10 of the largest value, which is about 2*log10(n) for values up to n^2 - 1.
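To make that digit-count problem concrete, here is a minimal base-10 LSD radix sort sketch (Python; the function name and bucket-list style are just for illustration). The while loop runs once per decimal digit of the largest value, so with values up to n^2 - 1 it runs about 2*log10(n) times, giving O(n log n) overall:

```python
# Plain base-10 LSD radix sort, sketched to show why the digit count matters:
# one full pass over the data per decimal digit of the maximum value.
def radix_sort_base10(a):
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:                      # one pass per decimal digit
        buckets = [[] for _ in range(10)]
        for v in a:
            buckets[(v // exp) % 10].append(v)    # current digit picks the bucket
        a = [v for b in buckets for v in b]       # stable concatenation
        exp *= 10
    return a
```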


Yeah, that's what I was thinking too. Even counting sort depends on the range of values, so it wouldn't fit either. I'm out of ideas, sorry :(


Thanks for the reply, nonetheless.

If anyone else could chime in, I would be rather thankful.


While I thank you for the reply, I already read the Wikipedia entry for that, twice. I also spent a few hours on Google, and reading "Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein.

Any other ideas, folks?


>> You are given an array of length N, which contains values in the range from 0 to (N^2) - 1. You are to sort it in O(N) time.

Two questions to understand the problem:

1. Why does the range of values matter? (As far as I know, the size/range of the values has no impact on the order of the algorithm.)
2. When you say "in O(N) time", do you mean an algorithm of order N, or do you really mean some number of milli/micro/seconds?


1. It does affect algorithm runtime in several cases, especially when dealing with non-comparison-based sorts. (Since it's proven that comparison-based sorts cannot be guaranteed to run in better than O(n log n) time, they don't make the cut for this problem.) Look at some of the sorts already mentioned in this thread.

2. Linear time with respect to the length of the input.
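To illustrate point 1, a minimal counting sort sketch (Python; names are illustrative) makes the range dependence visible: the count array has one slot per possible value, so the cost is O(n + k) for a range of size k, and here k = n^2:

```python
# Counting sort over the value range [0, k): the count array itself costs
# O(k) to allocate, fill, and read back, regardless of how few elements
# there are. With k = n^2 that alone is already quadratic.
def counting_sort(a, k):
    counts = [0] * k            # O(k) just to set up
    for v in a:                 # O(n) to tally
        counts[v] += 1
    out = []
    for v in range(k):          # O(k) to read back out
        out.extend([v] * counts[v])
    return out
```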


Since it does have the O(n log n) case, it wouldn't fit the requirements. And I've never seen an implementation of it outside of that paper... ;)


But practically speaking, a sorting algorithm with O(n) as its worst case would be hard to come by. I guess the only choice the OP is left with is to pick a nice O(n log n) algorithm and make custom changes to it (depending on the problem statement) to bring it down to O(n), though achieving O(n) in the worst case still looks kind of impossible.

And the reason Smoothsort is rarely used is because of the difficulty faced in its implementation.


Well, in a normal case you'd be able to claim that radix sort, bucket sort, or counting sort (to name a few) would run in O(n) time. But since they all depend on the range to some degree, it seems the claim no longer holds.


Thanks for the replies guys.

I have tried modifying bucket sort (but there is no guarantee whatsoever that the values will be anywhere near a uniform spread over the range, and of course we already have too many buckets). I tried radix sort (but since the range depends on the input, the number of digits, log10 of the largest value, grows with n, pushing the running time to O(n log n)). Counting sort falls through on similar grounds.

I've had the following idea:

What if we create a secondary auxiliary array of size N of ints/booleans, which would indicate which elements of the N^2-sized array are used? This would lead to a rather awkward implementation of counting sort, which would not run over the N^2-sized array (it would allocate and access it, but never touch more than N of its elements), yet would still count and sort the values.

The only thing I am not sure about is what the allocation of a size-N^2 array counts as, complexity-wise.


What about something like this:

Make buckets for the ranges 0..n-1, n..2n-1, ..., n^2-n..n^2-1.

Then, first go through the array and fill the buckets with counts of how many elements fall into each (and maybe an array that you fill linearly alongside the counts). Then _smartly_ (that's the key) base your strategy on the distribution of the elements across the buckets. Of course, you'll need to figure out which strategy to use in linear time; but then again, the counts associated with the buckets can themselves be sorted in linear time using bucket sort, since each count lies in 0..n. By the way, you can allocate any amount of memory in constant time.

So maybe with the right smart selection of strategy, you can succeed in linear time.
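The counting pass described above is a single O(n) sweep; a minimal sketch (Python; assumes values lie in the stated 0..n^2-1 range, and the function name is mine) could look like:

```python
# One O(n) pass that tallies how many elements fall into each of the n
# ranges [0, n-1], [n, 2n-1], ..., [n^2-n, n^2-1]. Element v belongs to
# bucket v // n (integer division by n, i.e. its high base-n digit).
def bucket_counts(a):
    n = len(a)
    counts = [0] * n
    for v in a:
        counts[v // n] += 1
    return counts
```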


Here are some ideas about strategy. First, consider an extreme case, where sqrt(n) buckets have sqrt(n) elements fall into them. Well, we know (by induction) that any bucket with more than sqrt(n) elements and n possible values can be sorted in linear time using a recursive call to this same algorithm. The hard part is sorting those buckets in which the range of possible values has length n, but the number of values is much less than n (since this is closer to the general sorting problem).

Consider another extreme case. Suppose we get n/log(n) buckets of size log(n). Then each of these buckets can be sorted in log(n)*log(log(n)) time, which makes the total running time n*log(log(n)). That's obviously not the right algorithm to use, but it's much better than n*log(n).

Well, what _can_ we do? If we're to sort each of these buckets separately, we'll need to sort them in log(n) time (which is linear with respect to size).

And in general, if we consider the cases where we get n/f(n) buckets containing approximately f(n) elements each, then sorting these buckets naively gives n*log(f(n)) overall time. In "average" cases, we'll have a long tail of buckets with 2, 1, and 0 elements, which means the average case will take O(n) overall running time. But if f = log, then we get n*log(log(n)). And if f(n) = sqrt(n)/log(n), we get n*log(n) running time... But if f(n) = sqrt(n) or higher, then all of a sudden we get to call our algorithm recursively and get O(n) overall time. (And of course we're assuming we can find a way for our algorithm to be linear in the other cases.)

But then to solve the log(n) case, we'll need a way to sort (generally speaking) n elements in the range 1..e^n in O(n) time... And to solve the f(n) case, we'll need a way to sort n elements in the range 1..f^-1(n) in O(n) time. So ignore my previous post; this attack is going nowhere.


Are you sure you don't need to provide O(N) _average_ time?

Yes, I am positive. It needs to be O(n) in the worst case.


Hm... just an idea... what if you applied one of the otherwise-O(n) algorithms, but took the square root of the data as you sorted it? That would cut out the polynomial range, leaving you with 0..N, which those algorithms handle in linear time. You'd have to make an extra pass at the end to square everything back to its actual value.


Nope, that won't do, unless he is ready to sacrifice the real values and accept approximations. Think of it this way: would the data before squaring and after squaring be the same? I guess not. If you know how floating-point numbers are handled by the system, you know what I am trying to convey.


That's true, but a workaround could be possible. For example, if memory isn't an issue, one could use a wrapper struct to maintain the original value and the square root of the value, sort by square root, and then return the original values in sorted order. It sounds like a bit of extra work, but I think it could meet the time-complexity requirement.
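A minimal sketch of this wrapper idea (Python; `math.isqrt` gives an exact integer square root, avoiding the floating-point concern, and the function name is mine): for v in 0..n^2-1 the root lies in 0..n-1, so grouping by root takes O(n). One caveat not spelled out in the thread: distinct values can share a root, so elements within the same root-group keep their original relative order and are not fully sorted.

```python
import math

# Group values by their integer square root. For v <= n^2 - 1 we have
# isqrt(v) <= n - 1, so the root is usable as an array index and the
# whole pass is O(n). This only buckets the data: values sharing a root
# (e.g. 5 and 8, both with root 2) stay in input order within a group.
def group_by_root(a):
    n = len(a)
    buckets = [[] for _ in range(n)]       # one group per possible root 0..n-1
    for v in a:
        buckets[math.isqrt(v)].append(v)
    return [v for b in buckets for v in b]
```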


Hmm. This sounds interesting.

I will see how I can apply that, but the one question I have is:

Since we are dealing with ints, the square roots would not necessarily be integers, making sorting them a bit more annoying (if only because they aren't usable as array indices), but it should be doable.
As for keeping a copy: nobody said I can't have a big overhead or heavy space requirements, so keeping a copy of the data would be perfectly acceptable.

I'll give this a shot, thanks.


It need not necessarily be the square root. Try to come up with any function that cuts down the range and you should be good to go. Just make sure the function gives integer values for the elements; that way, handling and sorting the transformed values shouldn't be a problem.


I'll give it a shot, thanks. I'll post any ideas, or the answer, when I hopefully finish.


Considering that computing sqrt(N) takes log(N) time (at least), square rooting all the integers that you're sorting will doom you to N*log(N) complexity. And it will just arrange your elements into buckets, as they were before.

Heck, simply _looking_ at an integer takes log(N) time. My bad idea of putting elements into buckets and being smart about that requires log(N) time just to put the elements into buckets, actually, because computing k `mod` n, where k < n^2, still takes log(n) time.


Considering a generalized scenario in which complexity is counted only for looping constructs and all other operations are assumed to run in constant time, I don't see what's wrong with the idea...


Because it's wrong. You can make assumptions of O(1) comparison for comparison-sort based algorithms, mainly because the comparison operation is in fact O(1) average over the course of the sorting algorithm with respect to the amount of data you have. But here, you're performing an algorithm with your data which takes non-constant time, and you're proposing ignoring it. More particularly, the square root algorithm _uses_ looping constructs (how would you propose implementing it?); are you just going to ignore those looping constructs?

If ignoring the running time of functions written by other people is your modus operandi, why not implement selection sort where the selection part of the algorithm is abstracted away by a function that you just pretend to be constant-time. Then you've proven that selection sort can be done in linear time.

Heck, if you're going to pretend the log(n) of the square root algorithm doesn't matter, why not just use merge sort and declare its log(n) factor doesn't matter either?


How about the following pseudo-code? I think it should do the trick:

  B[][] = new Array[n][n]
  C[] = new Array[n]
  for i: 1 to n do
      C[i] = 0
  for j: 1 to n do
      d = (A[j] / n) + 1          // integer division: which bucket A[j] falls in
      C[d] = C[d] + 1
      B[d][C[d]] = A[j]
  for i: 1 to n do
      // Perform a version of counting sort on bucket B[i], using only its
      // first C[i] entries. The total of the actions performed here is n
      // no matter how the buckets are filled.
  counter = 1
  for i: 1 to n do
      for j: 1 to C[i] do
          A[counter] = B[i][j]
          counter = counter + 1

As you can see, this is a modified bucket sort that runs counting sort on each bucket. Hope that was helpful.
