First of all, forgive the corny title, I couldn't resist once I thought of it. ;)

I have a small assignment for a statistics class that had been giving me a little bit of trouble. How we did the assignment was left up to the students since it is a stats class and not a programming class. I was fairly sure I could easily do what I needed to with C++ so I gave it a whirl but started hitting weird snags that kept it from working properly.

I'll spare you the specifics of the assignment and get right down to where the weird stuff started happening. I have an array of size 20000 and I initially set all the values to 0. I then go through a user-specified number of "jumps", where the program goes through each of the 20000 values and moves it 0.1 positive or 0.1 negative, with a 50/50 chance of each on every jump. Any of you familiar with probability curves can probably recognize that this would end up looking something like a bell curve after enough jumps. At this point, I get it to spit out the 20000 values so I can see what some of them look like. Most of them seem fine, but every 10 or 20 values I don't see the ##.# format I would expect from only ever adding or subtracting 0.1. I see numbers like 1.258384e-017 and the like. That's anomaly number 1. It's not a real big issue; these values sort of just fall off the map and don't get counted.

Now on to the part that really just doesn't work. After this, I go through each of the 20000 values and compare them to a "min" and "max", each initially set at 0. For each value less than min, min becomes that value and the same for max. It seems like it finds the minimum and maximum values in the array well. The next part of the assignment is to split this range into 20 sections and count how many numbers fall into these sections so I can get points in a distance-to-points graph that will resemble the expected bell curve. Obviously, the division won't always yield a nice and neat ##.# format that I can easily use for comparing. Most of the time, it's values like 1.524333. I can't compare this to the values that are in the array, values like 1.4 and -2.6. I did some casting trickery and cut off the excess numbers to turn 1.524333 into 1.5. There may be another way to do this but this is the one that I figured out and it seems to work. Now I have to count the numbers that match in each of the sections and add 1 to the corresponding section in another array, sized 20. It does not seem to like comparing the two doubles together, at least not in the way that I formed the numbers. The results would be highly erratic and incoherent. Similar to how I cut down the first number, I did some more int casting here so that I would be comparing ints and not doubles. That seemed to do the trick and the output seems good enough now. Is there something else I could have done to fix this or is it just a property of doubles that you have to work with?

Sorry for the wall of text. If something is too confusing and you still want to try and help me out, just let me know and I'll try to explain it better. I will include the source in this post as well. The assignment is due tomorrow so this will really be just for me to know. Like I said at the end of the post, the program works well enough for me.

#include <iostream>
#include <iomanip>
#include <time.h>
#include <math.h>

using namespace std;

double trunc(double a);
int rect(double a);

int main()
{
double diff[20000],rectd,rectb;         
double max,min,bin,times=0,jump=0;
int bins[20];
double z=0,currbin=0;

for(int x=0;x<20000;x++)
  {
   diff[x]=0;                            //Initialize the array to 0
  }

srand(time(NULL));                       //Give the random timer a seed

cout<<"How many jumps?"<<endl<<":";
cin>>times;

for(int i=0;i<times;i++)
  {
   for(int x=0;x<20000;x++)
     {
      jump=rand()%10;                    //Generate a value between 0 and 9
      if(jump>4)
        {diff[x]+=0.1;}                  //50% chance to jump +0.1
      else
        {diff[x]-=0.1;}                  //50% chance to jump -0.1
     }
  }

for(int x=0;x<20000;x++)
  {
   if(diff[x]<min){min=diff[x];}         //Find the maximum
   if(diff[x]>max){max=diff[x];}         //and minimum
  }

cout<<"The minimum value is "<<min<<" and the maximum value is "<<max<<endl;

bin = (max + (min*-1.0))/19;             //Split the range into 20 sections

cout<<"The values will be separated into 20 bins, each "<<bin<<" units apart."<<endl;

for(int m=0;m<20;m++)
  {
   currbin=min+(bin*z);                  //Select the current section
   if((m>0)&(m<19))
     {
      currbin=trunc(currbin);            //Cut off excess decimals if needed
     }
   for(int x=0;x<20000;x++)
     {
      rectd=rect(diff[x]);               //Create accurate values to 
      rectb=rect(currbin);               //compare correctly
      if(rectd == rectb){bins[m]++;}     //If the value is in the section, add
     }                                   //one to that section in the bin array
   z++;
  }

cout<<"The final values in the array are:"<<endl;

z=0;
for(int m=0;m<20;m++)
  {
   currbin=min+(bin*z);
   if((m>0)&(m<19))
     {
      currbin=trunc(currbin);
     }                                    //Display the values accumulated in the 
   cout<<setw(5)<<left<<currbin           //bins array.
   <<": "<<bins[m]<<endl;
   z++;
  }

system("pause");
return 0;
}


int rect(double a)
{
int b;
a*=3;
b=int(a);    
return b;
}

double trunc(double a)
{
int s=0;
a/=0.1;
s=int(a);
a=double(s);
a*=0.1;
return a;
}

You have come across a common problem that is inherent in the way floats and doubles are represented in memory. There are many numbers that can't be represented exactly in binary, and 0.1 is one of them, so every jump of 0.1 carries a tiny rounding error.
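To make that concrete, here is a minimal sketch, not taken from the original program. The helper names accumulateTenths and nearlyEqual and the default tolerance of 1e-9 are made up for illustration. On typical IEEE 754 hardware, adding 0.1 repeatedly usually drifts slightly away from the "nice" decimal value, which is probably where stray values like 1.258384e-017 come from when a walker should have landed exactly on 0. Comparing with a small tolerance instead of == sidesteps the problem.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: add 0.1 to a running total `steps` times,
// the same way the simulation accumulates its jumps.
double accumulateTenths(int steps)
{
    assert(steps >= 0);
    double total = 0.0;
    for (int i = 0; i < steps; ++i)
        total += 0.1;            // each addition may round slightly
    return total;
}

// Hypothetical helper: treat two doubles as equal when they differ by
// less than `tolerance`. The default of 1e-9 is an assumption that suits
// values of this size; it is not a universal constant.
bool nearlyEqual(double a, double b, double tolerance = 1e-9)
{
    return std::fabs(a - b) < tolerance;
}
```

With these, nearlyEqual(accumulateTenths(10), 1.0) holds even when accumulateTenths(10) == 1.0 does not.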

One way around your problem is to use only long integers with implied decimal point. For example instead of 1.23 in double you could use 123 integer.
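A rough sketch of the implied-decimal-point idea, assuming each position is stored as a whole number of tenths. The names walkInTenths and toUnits are hypothetical, and the caller is assumed to have seeded rand() with srand() as in the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: run the random walk with positions kept as whole
// tenths, so a stored 15 means 1.5 and a jump is exactly +1 or -1.
std::vector<int> walkInTenths(int walkers, int jumps)
{
    assert(walkers >= 0 && jumps >= 0);
    std::vector<int> tenths(static_cast<std::size_t>(walkers), 0);
    for (int j = 0; j < jumps; ++j)
        for (std::size_t w = 0; w < tenths.size(); ++w)
            tenths[w] += (std::rand() % 2 == 0) ? 1 : -1;   // 50/50 up or down
    return tenths;
}

// Convert to a double only when a value has to be displayed.
double toUnits(int tenths)
{
    return tenths / 10.0;
}
```

Because the positions are ints, comparing them with == is exact, and a walker that returns to the start holds exactly 0.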

Another possible solution is to use the Boost math libraries. I have not used them myself, so I don't know if they will solve your problem or not.

Yet another solution is to use a different language, such as FORTRAN, which is designed for mathematical problems.

Two things you need to do to fix what you've got:

Initialize the bins array - you start your counting with random junk in the array:

int bins[20] = { 0 };

Initialize min and max to the first element of the diff array. You say in your text that you set them to 0, but you haven't. And in general, starting from 0 may not be sufficient: if every value in the data were positive, min would wrongly stay at 0. You should always set the min and max to a value from the data set to ensure a valid comparison.

min = diff[0];
max = diff[0];
for(int x=0;x<20000;x++)
{
    if(diff[x]<min){min=diff[x];}         //Find the minimum
    if(diff[x]>max){max=diff[x];}         //and maximum
}
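Once min and max are reliable, one possible way to do the counting without comparing doubles for equality is to compute each value's bin index directly from its distance to the minimum. This is only a sketch and not the original program's method: the function name histogram is made up, and it uses std::minmax_element (C++11) in place of the hand-written loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: count how many values fall into each of `binCount`
// equal-width bins spanning the smallest to the largest value.
std::vector<int> histogram(const std::vector<double>& values, int binCount)
{
    assert(!values.empty() && binCount > 0);

    // Both extremes come from the data itself, never from an assumed 0.
    auto range = std::minmax_element(values.begin(), values.end());
    const double lo = *range.first;
    const double hi = *range.second;
    const double width = (hi - lo) / binCount;

    std::vector<int> counts(static_cast<std::size_t>(binCount), 0);
    for (double v : values) {
        // If every value is identical, width is 0 and everything goes in bin 0.
        int index = (width > 0.0) ? static_cast<int>((v - lo) / width) : 0;
        if (index >= binCount)
            index = binCount - 1;        // the maximum belongs to the last bin
        ++counts[static_cast<std::size_t>(index)];
    }
    return counts;
}
```

A value sitting almost exactly on a bin edge can still land on either side because of rounding, but every value is counted exactly once, so nothing falls off the map.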

Argh, you are absolutely correct, vmanes. I must have omitted the code setting min and max to 0 but I can see how setting them to values in the array instead would be a better choice.

As far as not initializing the bins array, that was just stupid on my part. I don't know how the results I got were looking good without doing that first.

Oh and one last thing, I can initialize an array like that? int bins[20] = { 0 }; Does that set all 20 values in the array to 0? Thanks for your input on this.

Yes, yes. Arrays are initialized with an "initializer list", which is enclosed in curly braces, with individual values separated by commas. You can give the list exactly as many values as the array holds, or any lesser number of values. Any elements left out of the list will be set to 0. If you put too many elements in the list, you will get a compilation error.

So, (for me, anyway) the most common initialization of an array, to ensure it does not contain random memory patterns, is the list you see above.
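A small sketch of both cases described above. The function names countZeroBins and partialInitElement are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many of the 20 elements are 0 after "int bins[20] = { 0 };".
int countZeroBins()
{
    int bins[20] = { 0 };        // first element set explicitly, the other 19 zeroed
    int zeros = 0;
    for (int b : bins)
        if (b == 0)
            ++zeros;
    return zeros;
}

// Returns element `index` of an array given a shorter initializer list.
int partialInitElement(std::size_t index)
{
    assert(index < 5);
    int values[5] = { 7, 8 };    // elements 2, 3 and 4 are set to 0
    return values[index];
}
```

Writing int values[2] = { 7, 8, 9 }; instead would be rejected by the compiler, since the list has more values than the array can hold.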
