chase.lewis.3114

I'm working an SSE / AVX library and have almost all the basics working, but now I'm trying to get the accuracy / speed a bit better on certain functions.

Essentially I'm using a minimax polynomial that when using infinite precision should give me ~2E-9 error, plenty for floats for calculating sin after reducing the range of sin(x/(2*pi)) to [0,0.25] from (-inf,inf). This is the common technique I've been reading about so you can use a smaller minimax polynomial for greater accuracy. However, the range reduction in this section

``````x = x/(2*pi)
x = x - round(x);
``````

which gives me the range [-0.5,0.5] before being further range reduced is causing me to lose a lot of precision as x moves away from zero since 1E6f/(2*pi) has very few sigfigs after the zero subtracting 1E6f leaves me with 1 or 2 sigfigs when doing the rest of the calculation. The internal sinf implementations doesn't seem to have that issue and match up with double sin(x) for values into 7 sigfigs. I'm curious how does one wrap a value near 6-7 sigfigs from (-inf,inf) - limited by sigfigs of float - to a range [1,-1] without losing so much precision?

rubberman 1,355

Use doubles instead of floats, or long doubles and then cast the final output to a float. This is exactly why the 8087 processor family used 80 bit precision for all intermediate calculations for floats and doubles - no lost precision... :-)

mike_2000_17 2,669

First, let me introduce you to two extremely useful standard floating-point functions: frexp and ldexp. The first allows you to do a base-2 logarithm (integer result, rounded above). And the second allows you to multiply a floating-point number with 2 to some integer power (positive or negative), and it does so in exact arithmetic, meaning, absolutely no loss of precision whatsoever. That's where all the magic happens.

There are many algorithms that rely on scaling the input numbers such that they fall within some desirable range (for numerical precision) where the algorithm works better. In matrix numerical methods, this is crucial, and it is called "matrix balancing", and similarly, the so-called scale-and-square methods which apply to a number of exponential-based functions. In any case, the implication is always that you do the scaling using exact arithmetic functions (i.e., frexp and ldexp).

That said, your problem does not really fall under that category per se. Your situation is a classic problem of smearing. Once your input number reaches a high enough magnitude (compared to PI), you've lost the precision, and there's nothing you can do to retrieve it. Sorry.

What you need to do in this situation is ensuring that the number never reaches a high magnitude, which it shouldn't anyways, because angles are periodic, so they can just wrap around. So, this is a problem that you can only fix at the source (where the "x" comes from) and ensure it never reaches large magnitudes. This is a basic rule when using trigonometric functions in numerical calculations, you always keep your angles within the first period (e.g., `[0,2*pi]` or `[-pi,pi]`).

Another trick I often use is to store the sine and cosine of the angle instead of the angle itself. I always keep my "2D rotations" represented as a pair of sine/cosine. The point is that if you use this pair, it means that whenever you need to change the angle (add / substract, multiply, etc.) you have to apply a trigonometric identity to achieve that, but it also means that whenever you need the sine or cosine you don't have to compute it through the sin/cos functions. And, at the end of the day, it turns out you get better numerical precision that way (less generation of round-off errors) because most trigo-identities needed are pretty well conditioned while direct angle manipulations (like add / sub) can be surprisingly problematic from a numerical point of view (e.g., smearing and range issues).

vijayan121 1,152

... lose a lot of precision as x moves away from zero since 1E6f/(2*pi) has very few sigfigs after the zero subtracting 1E6f leaves me with 1 or 2 sigfigs when doing the rest of the calculation.

If you haven't done so already, you might want to have a look at the section on cancellation in this paper:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

I suppose you are already using a compensated summation algorithm.
http://en.wikipedia.org/wiki/Kahan_summation_algorithm