variations of this are being posted repeatedly here. hope this will clarify things once and for all.

// to choose a random element from a sequence when
//    a. you do not know how many elements are there before hand
//    b. you want to make one single pass through the sequence
//    c. you do not want to use auxiliary storage
//    algorithm:
//    step 1: select first element with probability 1
//    step 2: replace the first element with the second with  probability 1/2 
//       now both first and second are equally likely  (prob 1/2 each)
//    step 3: replace the selected with the third with  probability 1/3
//      now first, second and third are equally likely (prob 1/3 each)
//         .....
//    step n: replace the selected element with the nth with  probability 1/n
//       now any of the n are equally likely (prob 1/n each)
//    continue till end of sequence
// note: this algorithm (with appropriate modifications) can be used in 
//                   a. reading a random line from a file
//                   b. filling up an array with unique random values
//                   c. generating n random values in ascending order
//                   and so on.
//    
// here is a C implementation 
// to choose a random node from a singly linked list

#include <stdlib.h>

typedef struct node node ;
struct node {  int value ;  node* next ; };

node* choose_random_node( node* first )
{
  int num_nodes = 0 ; // nodes seen so far
  node* selected = NULL ; // selected node
  node* pn = NULL ;
  for( pn = first ; pn != NULL ; pn = pn->next )
    if(  ( rand() % ++num_nodes  ) == 0 ) selected = pn ;
  return selected ;
}

That is very nice; thank you for posting it. And thanks for putting it in the code snippets section already.

If I may criticize one aspect of your code, your line if( ( rand() % ++num_nodes ) == 0 ) selected = pn ; does a bit too much in one line: there's really no reason not to write it like the following:

#include <stdlib.h>
 
typedef struct node node ;
struct node {  int value ;  node* next ; };
 
node* choose_random_node( node* first )
{
  int num_nodes = 0 ; // nodes seen so far
  node* selected = NULL ; // selected node
  node* pn = NULL ;
  for( pn = first ; pn != NULL ; pn = pn->next ) {
    num_nodes += 1 ;
    if(  ( rand() % num_nodes  ) == 0 )
      selected = pn ;
  }
  return selected ;
}

Furthermore, you're using the default random number generator a bit uncarefully. The low-significance bits of the numbers produced by many default pseudo-random implementations tend to be a bit less reliably random than the high-significance bits; see the random number tutorials floating around.

If I may criticize one aspect of your code, your line if( ( rand() % ++num_nodes ) == 0 ) selected = pn ; does a bit too much in one line

true; especially in a code snippet.

Furthermore, you're using the default random number generator a bit uncarefully. The low-significance bits ...

the focus was on exposition of the algorithm, not on random number generation. if i was serious about generating pseudo random numbers, i would not use the c library rand() in any case. it often is a poor random number generator and remains a poor one even if some of its weaknesses are programmed around. a well configured mersenne_twister or lagged_fibonacci generator would produce much better results.

This article has been dead for over six months. Start a new discussion instead.