Array resize - enhance the efficiency

Question

bejfake 0 Newbie Poster

11 Years Ago

Hi,

I need to implement list based on dynamic arrays (second task is to implement as double linked list). Both tasks are done, but I have problem with time limit on test platform when values are millionths. I suppose that resizing dynamic array is delaying so much. I have implemented common list methods as push_front(), pop_back() ... For example:

    void push_front(int x)
    {
        size++;
        int* tab_temp = new int[size];

        memcpy(tab_temp + 1, tab, (size - 1) * sizeof(int));
        tab_temp[0] = x;
        delete [] tab;

        tab = tab_temp;
    }

I thought about change resizing step from 1 to 10, but I have to ask whether is it allowed. Maybe some advices?

c++ linked-list

5 Contributors
5 Replies
1K Views
1 Week Discussion Span
Latest Post 11 Years Ago Latest Post by rubberman

All 5 Replies

Moschops 683 Practically a Master Poster

11 Years Ago

You're right; this is going to cost you some time. It's always worth using some profiling tool to double-check where the time is being taken in your code (it is very, very common for people to think they know, and to spend a lot of effort optimising their code how they think they should, only to discover afterwards that the time was being burned somewhere else).

So if you're sure this is where you can best spend effort to improve performance, allocating extra is a good idea. I seem to recall a popular implementationg of the C++ vector container that, whenever it had to do a reallocation, doubled its own capacity. That seems like a good idea; it keep the size of the memory roughly on the order of what you need.

So, simply keep track of the size of your array, and how much of it you've used so you know when to expand it; and when you need to expand it, double it. If you're going to be doing a lot of push_front, you can be smart and fill your array from the middle, leaving yourself room to add to the front and add to the end.

mike_2000_17 2,669 21st Century Viking

11 Years Ago

What is really important is to resize by an amount that is proportional to the current amout in the array. Typical factors vary from 1.25 to 2. The reason why it is important to use a growth factor like that is because it makes the cost of allocation (and copying) have an amortized constant cost.

Here is how it works. Let's say the factor is 2, and you currently have N (valid) elements just after a resizing to a total capacity of 2N. This means that you can add N elements more to the array before you have to resize again, at which point you will have to copy the 2N valid elements from the old memory to the newly allocated memory. That copy will have a O(2N) cost, but you did it once after N additions to the array, meaning you did it with 1/N frequency, and so, the average cost is O(2N * 1/N) = O(2), which is what we call an amortized constant cost. If the factor was 1.5, by the same calculation (with frequency 1/(0.5*N)), you get an amortized cost of O(1.5*N * 1/(0.5*N)) = O(3). So, for any factor F, the cost is O(F/(F-1)). That's where you have a trade-off, with a larger growth factor the amortized cost is lower (limit is 1), but the excess memory is greater (on average, the unused capacity is N*(F-1)/2). A typical choice is 1.5, or around that. This is the standard why to handle a dynamic-sized array that most standard libraries in most languages will implement (with different factors).

If you grow the array by a constant amount every time (1 or more, like 32 or 64), you will get a reduction of the cost that is proportional to the amount that you add to the capacity every time, but you will not get an amortized constant cost per added element, but instead, it will be a linear cost per element. In other words, adding N elements to the array by expanding by 1 every time will cost you O(N*N), while adding N elements to the array by expanding by 32 every time will cost you O(N*N/32) (still quadratic, but reduced in magnitude). As deceptikon said, there are some special cases where this could still be useful, like small arrays that you know will not grow by very much.

Also, to clarify about this, you might think that using a constant additional capacity at every resizing is stupid since you would have to store the same amount of information (pointer to first element, size of valid elements, and total capacity) but you don't get the amortized cost that you get with a geometric expansion (by a factor). However, you can, indeed, save a little bit of memory because you can avoid storing the capacity. Let's say the amount of pre-allocation is 32 elements, then you know that at any point when you have N valid elements, the total capacity must be the next factor of 32, which you can compute with (N/32 + 1)*32 (with integer arithmetic, of course). This is why it can be an advantage for small arrays, like, of example, a string (array of character), which are typically not very long (a couple of hundred characters, at most, around 20-80 characters in typical use-cases). This is also why powers of 2 are preferred, because the arithmetic can be done with bitwise operations (faster). You can also implement a similar scheme for geometrically expanding arrays, but it implies additional constraints and difficulties in the implementation (and it's only really conceivable for power-of-2 factors).

And yes, if your teacher asked you to implement a "dynamic array", that usually means implementing a geometrically expanding one, because that is the typical meaning of "dynamic array" in computer science terminology, because dynamic-sized arrays that are not geometrically expanding are the exception (special-purpose, or just naive), not the norm.

What if I have to implement resizing capacity of 1, not double?

You can easily accommodate any arbitrary resizing (of array or of capacity) with a dynamic array. Is there any specific reason why you think this is not possible?

I have implemented common list methods as push_front()

Adding elements to the front of or middle of a dynamic array is a bit of an issue, performance-wise. Dynamic allocation of memory is really not that costly compared to the copying of memory. If you add elements to the front of the array, then you will always have to copy (or, more precisely, "rotate") the memory to make place for the new element at the front. Inserting in the middle has the same issue. And whatever scheme you use for resizing the capacity, this cost will still be there, and it will be significant.

One encouraging fact is that memory rotation can sometimes be done faster than straight copying from one location to another. To be clear, a rotation is something like taking all elements and moving them some amount forward or back in memory. For this, you should use the standard std::rotate algorithm, which could, in principle, be faster (this is because for simple raw memory copying operations, this operation could be optimized (by the compiler) into some fancy machine instructions, but that's too much to chew right now).

Regardless, there are a number of tricks to use to deal with the problem of efficient front-and-back insertions (see double-ended queue) such as ring-buffers, bi-directional expansions, an array of arrays, unrolled linked-lists, etc. They all have their trade-offs. Dealing with insertions in the middle is a bit more tricky, but can be realized with similar tricks (especially array of arrays and unrolled linked-lists). But I don't think that your teacher is expecting you to do anything about these problems, in fact, I'm pretty sure he just wants you to notice those problems so that he can say "Lo and behold! I give you the linked-list to solve all those problems!", which is, of course, following the curriculum by-the-book (and by the way, the linked-list is not really a viable solution to those problems, but you'll need a bit more experience to understand why, stuff that you won't find written in those textbooks).

StuXYZ commented: great post +9

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

bejfake 0 Newbie Poster · Answer 1 · 2014-05-10T20:59:32+00:00

What if I have to implement resizing capacity of 1, not double?

deceptikon 1,790 Code Sniper Team Colleague Featured Poster · Answer 2 · 2014-05-10T21:27:03+00:00

I thought about change resizing step from 1 to 10, but I have to ask whether is it allowed.

Of course it's allowed, provided you don't have unusual restrictions from the assignment. If you're required to resize by a single element every time you grow, then there's not a whole lot you can do about the performance hit. Memory allocations are among the more expensive of CPU bound tasks.

There are typically three different methods of resizing in libraries:

Increase the capacity by a fixed amount such as 16 or 32. This amount depends on expected average usage for the dynamic array. Not the best approach for a general purpose library. However, for string classes I've preferred this one because strings tend to be shorter in average use cases.
Increase the capacity by half again the current capacity, so new_cap = old_cap * 1.5. This tends to be the most popular because it's a more conservative amount than doubling while taking into account that the larger the array grows, the larger it's likely to grow.
Double the capacity as Moschops suggested. The problem with this one is it can allocate an excessive amount of unused memory as the array grows.

rubberman 1,355 Nearly a Posting Virtuoso Featured Poster · Answer 3 · 2014-05-11T13:48:30+00:00

I have used powers-of-2 to reallocate array sizes very successfully for years. IE, when I need to add an element to an array, and the array is full, then I allocate twice the current size of the array, using the C-function realloc(), and reset the maxsize variable, then add the new member to the array, incrementing the currsize variable. That minimizes the number of allocations required, and using realloc() means that I don't need to copy the data as a separate step or manually delete the old array - usually the compiler+hardware can optimize this operation better than I can.

This only breaks down when you get to REALLY large arrays, and in 30+ years I haven't run into that problem except on small (16-bit or smaller) embedded systems.

FWIW, do start with a power-of-2 for the size of the initial array. IE, if you need to handle 20 elements, then allocate a 32 element array (2^5 elements). When you are about to add the 33'd element, you realloc() for 2^6 (64) elements.

Linked lists are another problem entirely... :-)

Array resize - enhance the efficiency

Recommended Answers Collapse Answers

All 5 Replies

Recommended Answers