How to increase precision in floating-point arithmetic

Question

chiraag 0 Newbie Poster

14 Years Ago

Hi there all,

Could someone please tell me how I can increase the precision of my floating-point arithmetic?

My requirement is that I add a very small value, of the order of 10^-7, to a relatively big value, say 36.63, and then multiply the result by 10^7. The problem I'm facing with plain floats is that when I sum the numbers I get 36.63 itself: the small value does not get added to 36.63 at all.
This makes a significant difference in my values, because right after the summing I multiply by 10^7.

Could somebody tell me how I could go about this problem?

Thanks in advance

c++

4 Contributors
16 Replies
880 Views
6 Days Discussion Span
Latest Post 14 Years Ago Latest Post by vali82

vali82 1 Light Poster

14 Years Ago

use double :)

mrnutty 761 Senior Poster

14 Years Ago

Try cout.precision(25);

vali82 1 Light Poster

14 Years Ago

> I guess you did not understand the problem.
> I am adding two numbers, 1.37*10^-5 + 6.3. In C++, when we do floating-point arithmetic, the sum is automatically truncated, and this is where I have the problem: I don't want it to truncate at all. After I get the sum I multiply it by 10^7, so if it truncates there is a significant difference in the values. Could somebody tell me how to go about this problem?
> Thank you.

There's a problem with your compiler, friend!
I just tried the sum and it doesn't truncate anything. (VS 2008)

Float has a range of 3.4E +/- 38, so it's way more than you need!

Maybe if you paste the code you use...

Salem 5,138 Posting Sage

14 Years Ago

> My requirement is that I add a very small value of the order 10^-7 with a relatively big value, say 36.63
Floats have about 6 decimal digits of precision, doubles about 15.
10^-7 is more than 6 digits away from 36.63, so your really small number is effectively zero.

Using double will buy you some headroom, but won't solve the underlying problem.
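To make Salem's point concrete, here is a small check, offered as a sketch: the helper `addition_is_visible` is a hypothetical name, and the numbers are the 36.63 and 10^-7 from the original question. It also separates precision (`digits10`) from range: the roughly 3.4E38 range of float says nothing about how many significant digits it keeps.

```cpp
#include <limits>

// Hypothetical helper: returns true if adding `small` to `big` changes the
// stored value. Templated so the same check can run in float and in double.
template <typename T>
bool addition_is_visible(T big, T small) {
    T sum = big + small;  // the sum is rounded to type T on assignment
    return sum != big;
}

// Decimal digits each type is guaranteed to hold without loss.
constexpr int float_digits  = std::numeric_limits<float>::digits10;   // 6
constexpr int double_digits = std::numeric_limits<double>::digits10;  // 15
```

On a typical compiler with IEEE 754 float and double, `addition_is_visible(36.63f, 1e-7f)` is false and `addition_is_visible(36.63, 1e-7)` is true: one ulp of a float near 36.63 is 2^-18 (about 3.8*10^-6), so 10^-7 rounds away, while a double near 36.63 resolves steps of about 7*10^-15.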

Two choices:
- use a math library with arbitrary precision like GMP
- rearrange your expressions so that precision is preserved as long as possible.
E.g.

float small[10] = { };
float big = 36.63f;  // some large value
for (int i = 0; i < 10; i++) big += small[i];

would become

float small[10] = { };
float big = 36.63f;  // some large value
float smallish = 0;
for (int i = 0; i < 10; i++) smallish += small[i];
big += smallish;

Each small by itself is too small to affect big, but by combining them all together you end up with something which can affect it.
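Salem's two loops can be turned into a quick experiment. This sketch assumes ten hypothetical terms of 10^-6 each and a starting value of 36.63f; the function names are made up for illustration.

```cpp
#include <cstddef>

// Salem's first loop: add each small term to `big` one at a time.
float add_one_by_one(float big, const float* small, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        big += small[i];  // a term below half an ulp of big rounds away
    return big;
}

// Salem's second loop: sum the small terms first, then add the total once.
float add_grouped(float big, const float* small, std::size_t n) {
    float smallish = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        smallish += small[i];  // terms of similar size keep their digits
    return big + smallish;
}
```

With those inputs, `add_one_by_one` returns 36.63f unchanged, because each 10^-6 is less than half of a float ulp near 36.63 (about 1.9*10^-6) and rounds away, while `add_grouped` adds roughly 10^-5 in one step and does move the result.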


chiraag 0 Newbie Poster · Answer 1 · 2009-08-18T15:39:41+00:00

I guess you did not understand the problem.

I am adding two numbers, 1.37*10^-5 + 6.3. In C++, when we do floating-point arithmetic, the sum is automatically truncated, and this is where I have the problem: I don't want it to truncate at all. After I get the sum I multiply it by 10^7, so if it truncates there is a significant difference in the values. Could somebody tell me how to go about this problem?

Thank you.

chiraag 0 Newbie Poster · Answer 2 · 2009-08-19T08:45:56+00:00

Hi vali82 and Salem,
Here is my code, to show what's happening.

Nx=1280;
Ny=960;
double pinhole_ccd_D_rec=6.3;
double pinDrec=pow(pinhole_ccd_D_rec,2);
double k=2*pi/lamda; //[1.18105e+07]
for (m=0;m<Nx;m++)
{
    for(n=0;n<Ny;n++)
    {
        x[n][m]=(1+m-Nx/2); y[n][m]=(1+n-Ny/2);
        r[n][m]= pow((x[n][m]*dx),2)+ pow((y[n][m]*dy),2);
        PD=k*sqrt(r[n][m]+pinDrec);
    }
}

My answer for PD is more or less the same every time because of the truncation.
The first value of r[n][m] is 1.37*10^-5 and pinDrec = 39.69 (6.3 squared). I take the sum, take the square root and then multiply by the huge value k, so even small changes in r[n][m] should be reflected in the sum r[n][m]+pinDrec. I am using VC++ 6.0.

Could you tell me where I am going wrong?
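A quick way to test this assumption in double precision, as a sketch: it uses lambda = 5.32*10^-7 (from chiraag's later post), r = 1.37*10^-5, a full-precision pi, and hypothetical helper names.

```cpp
#include <cmath>

// Hypothetical helper: PD = k * sqrt(r + pinDrec), the expression in the loop.
double path_distance(double k, double r, double pinDrec) {
    return k * std::sqrt(r + pinDrec);
}

// Hypothetical helper: how much of r survives the addition r + pinDrec.
// The subtraction of two nearby doubles is exact, so any loss shown here
// comes only from rounding the sum.
double recovered_r(double r, double pinDrec) {
    return (r + pinDrec) - pinDrec;
}
```

With k = 2*pi/5.32e-7 and pinDrec = pow(6.3, 2), `recovered_r(1.37e-5, pinDrec)` gives back r to within about 10^-14, and `path_distance(k, 1.37e-5, pinDrec) - path_distance(k, 0.0, pinDrec)` comes out at roughly 12.8, so in double the small r is not truncated away.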

vali82 1 Light Poster · Answer 3 · 2009-08-19T12:19:07+00:00

Hi chiraag,

I've tested with a somewhat simplified version of your example:

#include <cmath>

int main()
{
    const int Nx = 1280;
    const int Ny = 960;

    double x;
    double y;
    double r;

    double pinhole_ccd_D_rec = 6.3;
    double pinDrec = std::pow(pinhole_ccd_D_rec, 2);
    double pi = 3.14, lamda = 1.18105e+07;

    double k = 2 * pi / lamda; //[1.18105e+07]
    double dx = 1, dy = 1;
    double PD = 0;

    for (int m = 0; m < Nx; m++)
    {
        for (int n = 0; n < Ny; n++)
        {
            x = (1 + m - Nx / 2);
            y = (1 + n - Ny / 2);

            r = std::pow((x * dx), 2) + std::pow((y * dy), 2);

            PD = k * std::sqrt(r + pinDrec);
        }
    }
}

The thing is that PD is quite different every time. Maybe my version of the code is missing something?!

My advice would be to get a new compiler (download Visual C++ 2008 Express; it's free on the Microsoft site) and try your code in that. VC++ 6.0 has a LOT of bugs, and even if the 2008 version still doesn't fix this for you, it's still an upgrade you MUST make!

One more thing: when you're testing the values, I hope you're in the debugger, looking at them directly! If you're printing them out on the console... all bets are off :)

chiraag 0 Newbie Poster · Answer 4 · 2009-08-20T08:41:15+00:00

Could you please tell me what values of PD you got? I tried it on both VC++ 6 and 9, and with both versions I have the same problem. My value is 7.44061*10^7, and it remains the same for the whole for loop. Were you getting different values throughout? I am really confused about this.
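One likely explanation for a value that "remains the same" is the console output rather than the arithmetic: by default std::cout prints 6 significant digits, which on a number near 7.44*10^7 can hide changes smaller than about 100. The helper `format_value` is a hypothetical name, and the two PD values used with it here are illustrative, about 1 apart.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double the way std::cout would. With
// digits <= 0 the stream keeps its default precision (6 significant digits);
// otherwise std::setprecision(digits) is applied first.
std::string format_value(double value, int digits = 0) {
    std::ostringstream out;
    if (digits > 0)
        out << std::setprecision(digits);
    out << value;
    return out.str();
}
```

`format_value(74406141.79)` and `format_value(74406142.73)` both give "7.44061e+07", while with 12 digits they print as "74406141.79" and "74406142.73". For full detail, set `std::cout.precision(n)` (as mrnutty suggested) or `std::cout << std::setprecision(n)`; `std::numeric_limits<double>::max_digits10` (17) digits are enough to distinguish any two doubles, so a larger value such as 25 only prints further digits of the stored binary value.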

mrnutty 761 Senior Poster · Answer 5 · 2009-08-20T10:26:37+00:00

I copied and pasted it, and the value I got for PD was 0.0004254.

chiraag 0 Newbie Poster · Answer 6 · 2009-08-21T12:58:54+00:00

> Could you please tell me what values of PD you got? I tried it on both VC++ 6 and 9, and with both versions I have the same problem. My value is 7.44061*10^7, and it remains the same for the whole for loop. Were you getting different values throughout? I am really confused about this.

Hi,

I went through the code that I posted, and I made a mistake in typing the lambda value: lambda = 5.32*10^-07, so k = 1.18105*10^7. You are getting a different value of PD every time because in your version the k term is no longer of the order 10^7. The r values in the code are of the order 10^-05; they are negligible when we add them to 39.69, but because we then multiply by a value of the order 10^07, there is a significant difference in the values if we do not take that truncation into account.

This is what I was asking about: how can I avoid the truncation that occurs in the step r[n][m]+pinDrec?

Could somebody please help me with this problem?
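Salem's second suggestion, rearranging the expression, applies directly to `sqrt(r + pinDrec)`. Because sqrt(a + r) - sqrt(a) = r / (sqrt(a + r) + sqrt(a)), the small change caused by r can be computed without subtracting two nearly equal numbers. A sketch with hypothetical function names, templated so it can be tried in both float and double:

```cpp
#include <cmath>

// Naive form: subtracts two nearly equal square roots, losing digits.
template <typename T>
T sqrt_increment_naive(T a, T r) {
    return std::sqrt(a + r) - std::sqrt(a);
}

// Rearranged form: sqrt(a + r) - sqrt(a) == r / (sqrt(a + r) + sqrt(a)).
// It never subtracts nearly equal values, so r keeps its significant digits.
template <typename T>
T sqrt_increment_stable(T a, T r) {
    return r / (std::sqrt(a + r) + std::sqrt(a));
}
```

In float, with a = 39.69 and r = 1.37*10^-5, the naive difference is off by more than 5% while the rearranged one agrees with the double result to better than 10^-5. In the loop, PD could then be written as `k * sqrt(pinDrec) + k * sqrt_increment_stable(pinDrec, r[n][m])`, keeping the large constant part and the small varying part apart. With doubles and these magnitudes both forms are accurate, so this matters mostly for float or for much smaller r.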

vali82 1 Light Poster · Answer 7 · 2009-08-21T13:52:41+00:00

Could you post a short, compilable code example where I could see the problem?

chiraag 0 Newbie Poster · Answer 8 · 2009-08-21T14:35:56+00:00

> Could you post a short, compilable code example where I could see the problem?

My code is given below. PD value is where I have my problem.

Nx=1280;
Ny=960;
double pinhole_ccd_D_rec=6.3;
double lambda=5.32*10^-07
double pinDrec=pow(pinhole_ccd_D_rec,2);
double k=2*pi/lamda;//[1.18105e+07]
for (m=0;m<Nx;m++)
{
    for(n=0;n<Ny;n++)
    {
        x[n][m]=(1+m-Nx/2); y[n][m]=(1+n-Ny/2);
        r[n][m]= pow((x[n][m]*dx),2)+ pow((y[n][m]*dy),2);
        PD=k*sqrt(r[n][m]+pinDrec);
    }
}

Thank you.

mrnutty 761 Senior Poster · Answer 9 · 2009-08-21T22:32:49+00:00

> double lambda=5.32*10^-07

You do know that '^' is the bitwise XOR operator in C++, not a "to the power of" operator?

10 ^ -07 != 0.0000001;
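mrnutty's point can be checked directly. In C++, `10 ^ -7` is a bitwise XOR of two ints (on two's-complement machines it evaluates to -13), and `5.32*10^-07` does not even compile, since `*` binds tighter than `^` and `^` requires integer operands. A small sketch of the alternatives; the variable names are hypothetical:

```cpp
#include <cmath>

// ^ is bitwise exclusive-or on integers, not exponentiation.
constexpr int xor_result = 10 ^ -7;  // bits of 10 XOR bits of -7

// Two ways to write 5.32 x 10^-7 as a double:
const double lambda_literal = 5.32e-7;                      // scientific literal
const double lambda_pow     = 5.32 * std::pow(10.0, -7);    // library call
```

Writing the value as the literal `5.32e-7` is the simplest choice; `5.32 * std::pow(10.0, -7)` gives the same number to within rounding.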

chiraag 0 Newbie Poster · Answer 10 · 2009-08-22T08:37:03+00:00

> double lambda=5.32*10^-07
> You do know that '^' is the bitwise XOR operator in C++, not a "to the power of" operator?
> 10 ^ -07 != 0.0000001;

Yes, I do know.

Nx=1280;
Ny=960;
double pinhole_ccd_D_rec=6.3;
double lambda=5.32*pow(10,-7);
double pinDrec=pow(pinhole_ccd_D_rec,2);
double k=2*pi/lambda; //[1.18105e+07]
for (m=0;m<Nx;m++)
{
    for(n=0;n<Ny;n++)
    {
        x[n][m]=(1+m-Nx/2); y[n][m]=(1+n-Ny/2);
        r[n][m]= pow((x[n][m]*dx),2)+ pow((y[n][m]*dy),2);
        PD=k*sqrt(r[n][m]+pinDrec);
    }
}
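Since mrnutty and vali82 ask for compilable code, here is one possible self-contained version of the corrected loop, offered as a sketch: `compute_pd` is a hypothetical name, the 2-D arrays x, y and r are replaced by local values and a flat std::vector for PD, pi is spelled out, and dx, dy are parameters because the thread never states them.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical function: computes PD for every pixel, stored row by row
// (index n * Nx + m), using the same formula as the loop above.
std::vector<double> compute_pd(int Nx, int Ny, double dx, double dy,
                               double pinhole_ccd_D_rec, double lambda) {
    const double pi = 3.14159265358979323846;
    const double pinDrec = pinhole_ccd_D_rec * pinhole_ccd_D_rec;
    const double k = 2 * pi / lambda;
    std::vector<double> PD(static_cast<std::size_t>(Nx) * Ny);
    for (int m = 0; m < Nx; ++m) {
        for (int n = 0; n < Ny; ++n) {
            const double x = 1 + m - Nx / 2;  // integer pixel offsets, as in the original
            const double y = 1 + n - Ny / 2;
            const double r = std::pow(x * dx, 2) + std::pow(y * dy, 2);
            PD[static_cast<std::size_t>(n) * Nx + m] = k * std::sqrt(r + pinDrec);
        }
    }
    return PD;
}
```

Called as `compute_pd(1280, 960, 4.635e-6, 4.635e-6, 6.3, 5.32e-7)`, where the pixel pitch 4.635*10^-6 is a hypothetical value chosen so that r near the corners is about 1.37*10^-5, the smallest and largest PD differ by roughly 13, so the values do vary across the grid in double precision.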

mrnutty 761 Senior Poster · Answer 11 · 2009-08-22T08:58:27+00:00

Is it possible to post your complete code, if it's not too big?

vali82 1 Light Poster · Answer 12 · 2009-08-24T11:42:58+00:00

Yes, please post a short test code that compiles.
