I'm looking for an algorithm, pseudocode or any langauge, to generate random numbers with the following characteristics:

integer values in the range 0-n
lower values much more likely to happen than higher values,
eg prob of 0 is n/(n(n-1)), prob of n is 1/(n(n-1)) - linear relationship between value and probablility
but the exact relationship isn't important - could be exponential, square etc
speed/efficiency is important - don't want to use expensive functions like exp, log sqrt etc

I'd be grateful for any suggestions, pointers, or links.
Thanks
James

(In case anyone wants to know, short version: this is for picking requests from a FIFO queue - the items should be picked randomly, but the earlier requests should have a higher probablility of being picked)

Recommended Answers

All 10 Replies

How come 'randomly' if it's a FIFO queue?

OK, strictly speaking I should call it a FIMLTBFO queue - First In, Most Likely To Be First Out queue

Your probability doesn't seem to be correct. Let rework it a bit. We want to random from 0 to n. Let a be the value between 0 to n. The probability of P(a) = 2 * (a + 1) / (n + 1)(n + 2). This way, the sum of P(a) where a from 0 to n will be equal to 1. However, this definition gives higher chance for largeer value (but we can easily reverse the definition to fit your need).

Firstly, we random number between k = 0 to (n+1)(n+2)/2 - 1. Then, we try to find which value should k belong. For example, n = 4 if k is 11 then value is 4.

   k: 0 | 1 2 | 3 4 5 | 6 7 8 9 | 10 11 12 13 14
   v: 0   1     2       3         4

You can easily find the v = ceil( (sqrt(1+8*k)-3)/2 ).

Code in Python:

from random import randint
import math

def rand(n):
    limit = ((n + 2) * (n + 1)) / 2
    r = randint(0, limit-1)
    return math.ceil( ( (1 + 8 * r)**0.5 - 3 ) / 2 )

Here is the probabiliy for rand(5) after 100,000 tried.

0 = 0.04974   ( 4%) 
1 = 0.09431   ( 9%)
2 = 0.14168   (14%)
3 = 0.1887    (18%)
4 = 0.23739   (23%)
5 = 0.28818   (28%)

Excellent! Thank you very much!

Yes, I realise the probability formula I posted wasn't quite right, but I was just trying to illustrate a point, so I coudn't be bothered to do it rigorously ;)

So, moving on... efficiency is an issue here, so would rather avoid the sqrt and it's implied int-float conversions. Instead of v = ceil( (sqrt(1+8*k)-3)/2 )maybe it would be faster to loop v down from n-1 to 0 subtracting it from k until k <= 0? Typical queue size will be between dozens and hundreds of entries, but throughput will be huge. So something along the lines of this may work well by the time the compiler and CPU have finished optimising the loop?
for ( v = n-1; v>k; k-= --v);
(The actual requirement is for values 0..(n-1), so I may have initial values out by +/- 1 there.. no access to compiler where I am at the moment, so can't test)

J

I believe that computer can quickly calculate square root. Just using Taylor Series or Newton Raphson. These two method is cheap if you don't want too accurate result of the square root. I heard most of GPU has a built-in inverse square root (which can be done in single instruction).

One other thought, a more brute force way, since you are doing FIFO, is to create duplicates in the queue. So say you have a unique identifier to specifiy each record type. For a normal random engine you'd have 1 entry in the queue to select from for each record. However, to increase the chances of getting older records, you could create duplicates of these IDs, increasing the chances of them being picked since there are more occurances.

You'd probably have to setup ranges of some sort like "if it's 60 days old have 3 entries for each record".

Also, for your random engine, if you are looking for a good one, check out RandomOps for C#. The author of the library built it so it can actually use random.org as a source of entropy for seeding a random number generator (which happens to be a true random source of entrophy). The site limits you to 1 million bits a day I believe, but if you initialize it once, and then just keep it static that shouldn't be an issue (the author though did design it to fall back and allow to be seeded by another random engine). Of course that's for free users, you can actually purchase more a day if you desire

I have used that library myself many times, and even have a nice wrapper class (the initialization of the random number generator using random.org as an entropy source takes a second or two, but once that's done the code works very quick to generate random numbers).

Oh yeah the library also has multiple random engines to choose from. I like to use CMWC (I think I got that right)

There is another way which is even simpler than my last approach. Since random.random() generates number between 0 to 1. If we square it, it will getting smaller. Hence, the lower value will have more chance.

def rand2(n):
    r = random.random()
    r = r * r
    return round(r * n)

The result of rand(5) after 100,000 attempts

0 = 0.31604   (31%) 
1 = 0.2316    (23%)
2 = 0.15841   (15%)
3 = 0.12994   (12%)
4 = 0.11245   (11%)
5 = 0.05156   ( 5%)

If you want to increase the chance of the lower value even further you can do r = r * r * r. The result of rand(5) after 100,000 attempts

0 = 0.46415  (46%) 
1 = 0.20534  (20%)
2 = 0.12483  (12%)
3 = 0.093    ( 9%)
4 = 0.07767  ( 7%)
5 = 0.03501  ( 3%)

I like where this is going.
To explain a bit more of the background (without violating an NDA) this is a mutli-player game situation in which a large number many kinds of "requests" need to be processed in real time, in an overall FIFO kind of way. However it's important that the players cannot predict or rely on the exact sequence in which any two requests will be processed. We say its probable that the earlier request will be processed first, and the greater the time difference the greater the probablility, but it's only probable, never guaranteed, and there can always be exceptions.
Given that, I don't see how the duplicate entries would work, but the "square the random" looks ideal - easy to decide how much to bias the distribution and just a few multiplications. Brilliant, thank you.
As for the exotic random generators, I think the stadard Java Random will be just fine. All these queues are being updated in real time by multiple players, so there's a major source of randomness anyway. Even if our random generator is predicatble or repetitive that will be invisible behind the intrisnic randomness of the input.

Just for info, here's the amazingly trivial Java implementation that seems to do exactly what I wanted:

    private Random r = new Random();
    int biasedRandom(int n) {
        double x = r.nextDouble();
        return (int) (x*x*n);
    }

Thanks everyone
James

I will say this about those stock random number generators. I had one for a screensaver program, I won't go into details, but it invovled populating multiple queues with random information. Despite me priming the generator (pulling like 5000 random numbers ahead of time), I could still see a pattern in the program with each queue matching what another one did like 2 indexes ago. Even after a queue would drain and be repopulated, I'd still see patterns. This was with around a few hundred entries, but still, I am no math genious who can predict patterns easily, and could see it. That's why I took the time to really dive into the RandomOps library, and have been much more impressed.

(On a side note, I should point out that RandomOps was just one possible library. If you go to random.org they have a list of all these libraries for different languages you can use)

I might have to check out this square root solution, looks like a much better approach and maybe something I could utilize.

Let examine how good is square random number as random number. The easiest and simplest way to do so would be using chi-squared test. First, to calculate the chi-squared test, we need to know the probability distribution of our algorithm. Let rand(n) to be function that random between 0 to n - 1. p(a) = sqrt((a+1)/n) - sqrt(a/n). Let do examine 100,000 attempts of rand(5):

    Actual  Expected  SquareDiff
0 |  44675   44721     2116
1 |  18507   18524     289
2 |  14155   14214     3481
3 |  12148   11983     27225
4 |  10515   10557     1764

V = SUM(SquareDiff(a)/Expected(a)) where a from 0 to n-1
V = 2.746877902

First of all, we don't want value of V to be too high because if it is too high, it is not respect our probability distribution. We also don't want value of V to be too low because that mean it is too predictable.

Capture.PNG

Using the table provided by Art of Computer Programming Vol 2, our V value is between 25% and 50% which is not too high and not too low. Therefore, our algorithm pass chi-square random-ness test.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.