choose a random element from a sequence
variations of this are being posted repeatedly here. hope this will clarify things once and for all.
// to choose a random element from a sequence when
// a. you do not know how many elements are there before hand
// b. you want to make one single pass through the sequence
// c. you do not want to use auxiliary storage
// algorithm:
// step 1: select first element with probability 1
// step 2: replace the first element with the second with probability 1/2
// now both first and second are equally likely (prob 1/2 each)
// step 3: replace the selected with the third with probability 1/3
// now first, second and third are equally likely (prob 1/3 each)
// .....
// step n: replace the selected element with the nth with probability 1/n
// now any of the n are equally likely (prob 1/n each)
// continue till end of sequence
// note: this algorithm (with appropriate modifications) can be used in
// a. reading a random line from a file
// b. filling up an array with unique random values
// c. generating n random values in ascending order
// and so on.
//
// here is a C implementation
// to choose a random node from a singly linked list
#include <stdlib.h>
typedef struct node node ;
struct node { int value ; node* next ; };
node* choose_random_node( node* first )
{
int num_nodes = 0 ; // nodes seen so far
node* selected = NULL ; // selected node
node* pn = NULL ;
for( pn = first ; pn != NULL ; pn = pn->next )
if( ( rand() % ++num_nodes ) == 0 ) selected = pn ;
return selected ;
}
vijayan121
Posting Virtuoso
1,606 posts since Dec 2006
Reputation Points: 1,159
Solved Threads: 287
That is very nice; thank you for posting it. And thanks for putting it in the code snippets section already.
If I may criticize one aspect of your code, your line if( ( rand() % ++num_nodes ) == 0 ) selected = pn ; does a bit too much in one line: there's really no reason not to write it like the following:
#include <stdlib.h>
typedef struct node node ;
struct node { int value ; node* next ; };
node* choose_random_node( node* first )
{
int num_nodes = 0 ; // nodes seen so far
node* selected = NULL ; // selected node
node* pn = NULL ;
for( pn = first ; pn != NULL ; pn = pn->next ) {
num_nodes += 1 ;
if( ( rand() % num_nodes ) == 0 )
selected = pn ;
}
return selected ;
}
Furthermore, you're using the default random number generator a bit uncarefully. The low-significance bits of the numbers produced by many default pseudo-random implementations tend to be a bit less reliably random than the high-significance bits; see the random number tutorials floating around.
Rashakil Fol
Super Senior Demiposter
2,658 posts since Jun 2005
Reputation Points: 1,135
Solved Threads: 177
If I may criticize one aspect of your code, your line if( ( rand() % ++num_nodes ) == 0 ) selected = pn ; does a bit too much in one line
true; especially in a code snippet.Furthermore, you're using the default random number generator a bit uncarefully. The low-significance bits ...
the focus was on exposition of the algorithm, not on random number generation. if i was serious about generating pseudo random numbers, i would not use the c library rand() in any case. it often is a poor random number generator and remains a poor one even if some of its weaknesses are programmed around. a well configured mersenne_twister or lagged_fibonacci generator would produce much better results.
vijayan121
Posting Virtuoso
1,606 posts since Dec 2006
Reputation Points: 1,159
Solved Threads: 287