Hello,

I am working on an arbitrary-precision arithmetic library. I already created a version that just stored numbers as "before decimal point [unsigned array]" and "after decimal point [unsigned array]". The issue is that it is both inefficient and incomplete. I would like to incorporate complex numbers into the system (I am not, however, going to incorporate quaternions). I am also going to incorporate vectors and matrices.

My issue is that I cannot decide how to store complex numbers and (maybe?) vectors. I realize that the optimal approach would probably be to store them as either standard representation [a+bi] or polar representation [re^(i*theta)], but I do not yet have the skills to create an algorithm that chooses a format heuristically, so I am faced with the problem of choosing one of the representations. I realize that certain things are easier to represent in standard form and others in polar, but my question is: which do you think would be most efficient and easiest (efficiency trumping ease) once implemented?

Also, I was wondering whether it was worth using a linked list in my number representation rather than a standard dynamically allocated array. I think it would be best to use a dynamically allocated array, since I will usually know the required size to store my results (even if I have to over-estimate), but I am wondering if someone can think of a more efficient solution. Thank you.

I already created a version that just stored numbers as "before decimal point [unsigned array]" and "after decimal point [unsigned array]".

The decimal point? That's untenable. Are you saying that in order to store the number 1e30 or 1e-30 you need to store a long array of zeros followed by a 1? That's insane. You should look at the way floating-point numbers are actually stored, and emulate that with arbitrary precision (instead of fixed). Where the decimal point falls is completely arbitrary, and has no bearing on either the precision or the storage requirement.

Second, I must assume that you're doing base-2 arithmetic. Base-2 arithmetic is what the computer understands and works with well, so I assume you mean "decimal" only in a colloquial sense, not in the sense that you use base-10 arithmetic. Don't work against what is natural for the processor.

The issue is that it is both inefficient and incomplete.

Yeah, that's gonna be a problem. Doing this is really hard; see the GMP and MPFR libraries.

I would like to incorporate complex numbers into the system (I am not, however, going to incorporate quaternions). I am also going to incorporate vectors and matrices.

That shouldn't be a problem at all. Once you have a good arbitrary-precision class, with all the required operator overloads and standard function overloads (sin, tan, log, exp, ldexp, frexp, etc.), you should be able to use std::complex directly. As in, just use std::complex< arbitrary_precision_float >. As for vectors and matrices, just use any good linear-algebra library, or write your own; there should be no difference between operating on a built-in floating-point type and operating on your arbitrary-precision type.

I realize that the optimal approach would probably be to store it as either standard representation [a+bi] or polar representation [re^i(theta)] but I do not yet have the skills to create an algorithm that chooses a format heuristically. I am now faced with the problem of choosing one of the representations. I realize that certain things are easier to represent in standard representation and others in polar, but my question is which do you think would be most efficient and easiest (efficiency trumping ease) once implemented?

Use the cartesian representation (a + bi). There is no debate here: cartesian representation wins, hands down. Most operations that are somewhat simplified by polar representation are already complicated enough (complex versions of sin, cos, etc.) that you won't see the difference, while polar representation makes the easiest and most-used operations (addition and subtraction) much worse than they should be. The only grey area is multiplication and division, which are still quite cheap in cartesian form.

I guess you can always test both and see which works best, but it's a pretty safe bet that cartesian form will be better.

Also I was wondering if it was worth using a linked list in my number representation rather than a standard dynamically allocated array.

Use the dynamically allocated array (i.e., std::vector). It is the clear winner here, I'm 1e30% positive on that ;)

A linked list in this case would be atrocious, absolutely terrible. See a detailed discussion here.

I think it would be best to use a dynamically allocated array since I will usually know the required size to store my results (even if I have to over-estimate), but I am wondering if someone can think of a more efficient solution.

In my opinion (I guess the GMP/MPFR people wouldn't agree), a better solution would probably be to use a static array (e.g., std::array). This might sound insane for an arbitrary-precision library, but it makes perfect sense when you think about it.

You have to understand something fundamental about round-off errors: they never get better. Starting from an initial precision, things can only go downhill from there; all you can do is limit the damage. If I take the number 2 (which I know exactly, i.e., to infinite precision, i.e., 2.000000...) and multiply it by pi, which I know down to 20 digits, then the result is 2 times pi, down to 20 digits. And that is even stretching things, because there is nothing that I know to infinite precision. In other words, if you start your calculation with a certain maximum precision, you'll never need to dynamically increase that precision; if anything, you might eventually need to dynamically decrease it as you proceed through the calculations, to save space.

And if you use 128 bits (16 bytes) overall, you'll get a precision in the ballpark of 32 significant digits, and if you use 256 bits (32 bytes), you roughly double that to about 64 digits. I have never heard of any number being known to better precision than that, except artificial mathematical numbers like pi or e, or numbers that are directly tied to the definition of a measurement unit (e.g., the speed of light, which is tied to the definition of the standard meter). Real-world calculations start with real-world numbers (e.g., measurements), which are always limited to far less accuracy than 32 or 64 digits.

My point is, if you never need numbers any bigger than that, then the overhead of dynamically allocating the array (anywhere from 16 to 40 bytes on most systems) will not be worth it, and more importantly, neither will the overhead incurred by the indirection (read the link above about linked lists).

It is important to understand that round-off error accumulation and amplification are not a function of how much "precision" your floating-point numbers store. The precision your type offers only determines how little round-off error you start with; it does not affect its accumulation or amplification, and you eventually end up storing a lot of trailing digits that are effectively meaningless, regardless of the "precision" offered by the number. I can't stress that enough.

Comments
Deep knowledge.

Yeah, I already changed my storage technique to num+exponent. Also, I am working in base 256 (so that I use 1 byte per digit), which allows me to easily use the base-2 arithmetic of the computer. It is true that my last library used decimal... and I quickly realized my mistake. I am not sure how the std::complex system works (I may look into it later), but I want to overload operators so that my arbitrary-precision type can be used fully. As such, calling sqrt() should always return a valid result, which requires that I have access to imaginary numbers inherently in my data type. I was also considering storing my number as a fraction (numerator base, numerator exponent, denominator base, denominator exponent), but I am not sure how efficient that would be.

That comment about precision... that is genius! I never thought about it like that. I have functions to obtain irrational numbers up to a certain level of precision, but I never thought about the fact that my precision will only drop. I don't think I will use std::array, though, since manually storing a non-growing dynamically allocated array is trivial.

I guess you definitely answered my question about complex number representation. I was never really going to use linked lists; the overhead is silly, and it wouldn't be that great a speed boost since I need a decent amount of random data access. It was more for the sake of my friends who are helping me out (since we are first-year software engineering students at the University of Waterloo, we are still learning all the tedious linked-list implementations).

Thank you very much!

since we are first-year software engineering students at the University of Waterloo, we are still learning all the tedious linked-list implementations

Keep in mind that linked lists are theoretically nice and make for good programming exercises in computer science classes, but they are far from being all that useful in practice. Cases where a linked list is the better choice are few and far between. You can't use them if you want random access. You shouldn't use them if you do frequent traversals, and you shouldn't use them if you need to sort the data, keep it sorted, or do searches. An array of pointers is almost always preferable to a linked list if the objects are expensive to copy (and cannot be moved, which is very rare) and/or you need them to have a persistent address in memory, and the array of pointers comes with the added bonus of less overhead and random-access capability.

This leaves you with very few application areas, most of which end up requiring a hybrid structure (e.g., unrolled linked lists, linked arrays, cache-oblivious lists, etc., or an efficient small-object allocator), or have the data distributed over modules or compute nodes in a linked topology, or in a medium where data mobility is very difficult (e.g., a file system). The argument also extends to other kinds of linked structures (linked trees, adjacency-list graphs, etc.), although in many cases the alternatives are not as easy or convenient to implement and don't pay off as much. But for everyday simple tasks (for which you shouldn't over-design the data structure you use), if either std::vector or std::list would fit the bill just as well, then go with std::vector; it's almost always the better choice between the two. I don't want to start a flame war about linked lists, because I know many CS students have a love-hate relationship with them (they love to use them, but hate to implement them), but a bit of realism can't hurt.

Keep in mind that linked lists are theoretically nice and make for good programming exercises in computer science classes, but they are far from being all that useful in practice.

All these fancy data structures, and the go-to data structure is nearly always an array. That's one of those nasty realities that tend to smack new programmers in the face when they enter the professional world.
