the basic problem here is that we need to select one item at random
when we do not know beforehand how many items there are. the most
common example of this in real code is when a random element has to
be picked from a list (size unknown). there is a well known algorithm to
do this without having to pass through the entire sequence:
string random_line( ifstream file )
{
string line ;
int nlines = 0 ;
while( getline( file, line ) )
if( ( rand() % ++nlines ) == 0 ) break ;
return line ;
}
the variable
nlines counts the number of lines as they are read.
for any line,
rand() % ++nlines == 0 is true with probability
1 / nlines. the first line is initially selected (with probability 1),
the second line replaces it with probability 1/2, the third line will
replace the survivor (either line one or line 2, each of which are
equally likely) with probability 1/3, and so on.therefore, for each
of the
nlines lines seen so far is selected with probability
1/nlines.
if i remember right, this was first seen widely in some code
written by brian kernighan.