Probability Q - Page 3

Question

mike_2000_17 2,669 21st Century Viking

13 Years Ago

All your prof wants of you is that you compute the function given here for M = 1000. It requires a loop and a few basic math operations.

So much ink has been spilled on this issue, it's crazy.

Forget all the random shuffling business. This is a just about translating a basic math equation (summation with some factorial in it) into a computer program.

Just to end the 4 weeks of suffering:

#include <iostream>
#include <cstdlib>
#include <cmath>
#include <set>

int main() {
  const int N = 1000;

  double num = N - 1;
  double denom = N;
  double avg = 2.0;
  for(int i = 2; i <= N; ++i) {
    avg += num / denom;
    num *= double(N) - i;
    denom *= double(N);
  };
  std::cout << "Theoretically, the average number of random songs played before a repeat event is " << avg << std::endl; 

  std::cout << "Approximately, the average number of random songs played before a repeat event is " 
        << (std::sqrt(M_PI * N / 2.0) - 1.0 / 3.0 + std::sqrt(M_PI / (2.0 * N)) / 12.0 - 4.0 / (135.0 * N)) << std::endl;

  srand((unsigned int)time(NULL));

  double exp_avg = 0.0;
  for(int i = 0; i < 1e6; ++i) {
    std::set<int> s;
    int j = 0;
    while(s.insert( rand() % N ).second) ++j;
    exp_avg = (i * exp_avg + double(j)) / (i + 1);
  };
  std::cout << "Experimentally, the average number of random songs played before a repeat event is " << exp_avg << std::endl; 

  return 0;
};

By running the above code, you will see why it is an interesting problem from the point of view of teaching computer programming! Try to understand why the first loop cannot work.

Edited 13 Years Ago by mike_2000_17 because: small fix to code

mike_2000_17 2,669 21st Century Viking

13 Years Ago

The M_PI is not standard but appears in many compilers. That is just the pi constant, so you can create your own constant const double pi = 3.14159; and substitute pi for the M_PI in the code.

To be standard, the time function appears in #include <ctime> and should be qualified as std::time.

Sorry about that, I wrote the code pretty quickly.

Let me guess, you tried to compile with Visual Studio, which is the only main compiler that I know of which would give you this kind of problem.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

pjh-10 0 Light Poster · Answer 1 · 2012-03-22T19:35:55+00:00

lerner- is the code sample you have provided above, the general concept of how to do this, or is it just your personnel opinion?

Lerner 582 Nearly a Posting Maven · Answer 2 · 2012-03-23T03:03:27+00:00

Only you (and your instructor) can decide if the pseudocode I wrote meets your needs or not. I think it does, but that's an opinion, not a fact.

Lerner 582 Nearly a Posting Maven · Answer 3 · 2012-03-23T12:57:05+00:00

This is a just about translating a basic math equation (summation with some factorial in it) into a computer program.

I politely disagree with mike_2000_17. Having knowledge of arcane math equation and statistical analysis protocols can allow you to create more elegant, sophisticated and efficient programs than use of brute force and simple logic. But you can create a program to answer this question without the need to know the math behind the process or to know about sets or other containers that beginners don't know about yet. To me this is a program assigned to a beginning programmer to gain experience with loops.

I agree, that at this point the OP knows what level of knowlege s/he has and what s/he feels comfortable using or wishes to explore. Clearly s/he has several options that s/he can use to proceed.

pjh-10 0 Light Poster · Answer 4 · 2012-03-23T19:03:45+00:00

when i run the code above it doesnt work,says M_PI andd time arent declared, even when i declare them, still has problems. yet it runs fully on a differenet compiler i use?

pjh-10 0 Light Poster · Answer 5 · 2012-03-23T19:03:48+00:00

when i run the code above it doesnt work,says M_PI andd time arent declared, even when i declare them, still has problems. yet it runs fully on a differenet compiler i use?

Lerner 582 Nearly a Posting Maven · Answer 6 · 2012-03-23T20:22:47+00:00

M_PI is probably declared in cmath which is present in one compiler and not the other.

I forget where the time struct is declared, but apparently the one compiler doesn't recognize that either. It's probably in cstdlib.

The prolematic compiler is probably older and needs math.h and stdlib.h as headers to get the desired declarations visible.

pjh-10 0 Light Poster · Answer 7 · 2012-03-24T12:28:23+00:00

yip, u guessed right with visual studio,but even with them added in bits, the output of theoretically is -1.#IND, so somethings clearly up here, the approximate is fine! but the final "experimental" output doesnt output at all in the compiler, and whats the difference between experimental,thoretically and experimentally, very interested to know.thanks

Lerner 582 Nearly a Posting Maven · Answer 8 · 2012-03-24T13:35:17+00:00

Well, to me, the difference between theory and experimenting is theory is sublime whereas experimenting is sublime. Theory looks at relationships whereas experimenting does something--often physically or with brute force. Theory describes the expected whereas exprimentation describes reality.

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 9 · 2012-03-24T16:49:55+00:00

As I said earlier, the interesting part of this problem, from a perspective of learning computer programming, is to understand why that theoretical calculation doesn't work (outputs NaN).

The loop that computes the theoretical value will try to calculate both 1000^1000 and 1000!, which are both extremely large values can simply cannot be stored in a double-precision floating point number.

This is an important lesson to learn. Even a perfectly sound mathematical formula might not be any useful when trying to compute it on a computer because of the inherent limitations in representation and precision.

To some extent, the experimental calculation might also suffer the same fate because computing the average of a large sequence of numbers might also give a similar problem. Overall, we call this numerical stability.

Theory describes the expected whereas exprimentation describes reality.

Sure, but in computer science, both theoretical and experimental calculations are only approximated. The key is to understand how approximate they are, and how to mitigate these problems in order to get something useful. Blindly doing an experimental calculation is just as bad as blindly doing a theoretical calculation, neither results can be trusted if you don't consider the numerical approximations inherent to those calculations.

pjh-10 0 Light Poster · Answer 10 · 2012-03-25T09:46:30+00:00

ah interesting takes on the experimental,etc. so regarding the numerical stability, how is this corrected?

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 11 · 2012-03-25T16:24:13+00:00

You rearrange to math such that you don't get numbers that are too large (possibly overflowing) or too small (possibly underflowing), or such that you avoid many other pitfalls of numerical calculations, like subtracting two numbers that are almost the same value, adding a large number with a very small one (which is a big problem when taking averages or other statistics), and so on.

In this case, you can fix the theoretical calculation simply like this:

double factor = (N - 1) / double(N);
double avg = 1.0;
for(int i = 2; i <= N; ++i) {
  avg += factor;
  factor *= (double(N) - i) / double(N);
};
std::cout << "Theoretically, the average number of random songs played before a repeat event is " << avg << std::endl;

The other way to fix such problems is to develop an alternative expression, often an approximate one, which produces less round-off errors. Like in the above case, the Approximate expression is just as good as the theoretical one, or even better because it gives you the result within acceptable precision in one simple math expression as opposed to a loop. The idea here is that if the exact expression might lead to a lot of round-off error, then it is acceptable to use an approximate solution with less round-off error problems. In other words, if the round-off error of the exact expression is larger than the combination of the approximation error and its round-off error, then you are better off with the approximate expression. And this is very often the case.

pjh-10 0 Light Poster · Answer 12 · 2012-03-25T18:45:03+00:00

it still doesnt do experimental still?
(std::sqrt(pi * N / 2.0) - 1.0 / 3.0 + std::sqrt(pi / (2.0 * N)) / 12.0 - 4.0 / (135.0 * N))<<<< regarding this, just wondering where i can see this equation, interested to know the theory towards it (would write it down, and get lookin at it, only i cant write atm). or is it the same as the one on the first page? thanks

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 13 · 2012-03-25T19:20:01+00:00

I gave the link already, here it is again:

http://en.wikipedia.org/wiki/Birthday_problem#Average_number_of_people

pjh-10 0 Light Poster · Answer 14 · 2012-03-26T18:57:26+00:00

just seems way too easy that way,as opposed to arrays and storing the value,etc.

pjh-10 0 Light Poster · Answer 15 · 2012-03-28T10:12:58+00:00

i just think that maybe the array method, might just look better and has abit more substance to it

pjh-10 0 Light Poster · Answer 16 · 2012-04-01T15:32:44+00:00

im gonna try the array method,think its better, do you not agree, mike_lerner?