I have been using vectors in C++ lately, but I needed to speed up my code a bit, so someone recommended I use regular arrays. My question has two parts:
1) I thought you couldn't do this!!?? For some reason it is working with no problem?

int a=2;
int test[a];

2) I used a timer (from the VUL library of VXL) to time the following two functions. The only difference is that one of the arrays is allocated using a global int, and the other is allocated using a #define. It actually turns out that the global int is twice as fast!!?? I thought they should be about the same (if the int even worked at all! (see question 1))

FunctionWithArray time: 17 ms
FunctionWithDynamicArray time: 8 ms
#include <iostream>

using namespace std;

int DynamicSize = 2000000;
#define ArraySize 2000000

void FunctionWithDynamicArray();
void FunctionWithArray();

int main()
{
  FunctionWithDynamicArray();
  FunctionWithArray();
  return 0;
}

void FunctionWithDynamicArray()
{
  int test[DynamicSize];
  for(int i = 0; i < DynamicSize; i++)
    {
      test[i] = i;
    }
}

void FunctionWithArray()
{
  int test[ArraySize];
  for(int i = 0; i < ArraySize; i++)
    {
      test[i] = i;
    }
}

Any thoughts on either of these?

Thanks,

Dave

> 1) I thought you couldn't do this!!?? For some reason it is working with no problem?
Depends on your compiler.
Some compiler extensions permit this, so you just get lucky.
Enable your compiler's "strict ANSI" flags to get a better idea of what is and isn't allowed.
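To make that concrete, here is a minimal sketch (the function name is mine): the runtime-sized array form compiles only as a compiler extension, while the two alternatives below are standard C++.

```cpp
#include <vector>

// Demonstrates standard-conforming alternatives to "int test[a]" where
// a is a runtime value (that form is a GCC/C99-style extension and is
// rejected with strict flags such as -pedantic-errors).
int demo()
{
    int n = 2;                  // runtime value
    // int vla[n];              // extension only; not standard C++

    const int m = 2;            // compile-time constant
    int fixed[m];               // standard: bound is a constant expression
    fixed[0] = 1;

    std::vector<int> heap(n);   // standard runtime-sized storage
    heap[0] = 1;

    return fixed[0] + heap[0];  // 2
}
```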

2 million ints = 8MB of stack space.
Since stack space is allocated on demand, the first function to be called pays the overhead of faulting in another page of stack every 4KB.

But the second one - now that's already got the stack allocated from the first one, so it just blitzes through.

Swap the calls over, you'll see the reverse.

Or they could have come out the same for a completely different reason: the array is never used, so the compiler could have optimised the code out entirely.

You're right - I switched the order, and in both cases the first one took longer!!

How would you make an array of a size determined at run time if your compiler doesn't allow this?

Thanks,

Dave

> I have been using vector in c++ lately, but I needed to speed up my code a bit so someone recommended I use regular arrays.

Using arrays instead of vectors only saves you the overhead of C++'s object model, especially if you need a dynamically sized array. Instead of playing memory games, try using something other than a linear storage structure for your data. Your code might be too slow because you should be using a set instead of a vector, for example.

Never use std::vector for number crunching (in scientific or engineering calculations). This STL container was never intended for fast calculations. If you don't want to take the trouble of new/delete, use the std::valarray class.

For example, times to calculate sum of 10000000 double type elements:

1. vector[]:  ~50 milliseconds
2. vector/iterator: ~90 milliseconds
3. array[]:  ~25 milliseconds
4. valarray[]: ~26 milliseconds
5. valarray.sum(): ~24 milliseconds

AMD 5000+,VC++ 9 (Plauger's STL implementation), release mode, optimization for speed.
The vector was previously resized and initialized - no "overhead of C++'s object model" in the test loop. Timing precision was ~1 microsecond.
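For anyone who hasn't used it, here is a minimal sketch of the std::valarray interface being suggested (the function name and sizes are mine, not from the benchmark above): fill it, operate element-wise, then reduce with the built-in sum().

```cpp
#include <valarray>
#include <cstddef>

// Fill a valarray, square it element-wise, and reduce it.
double sum_of_squares(std::size_t n)
{
    std::valarray<double> v(n);          // n doubles, zero-initialized
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<double>(i);

    v *= v;                              // element-wise square, no loop
    return v.sum();                      // built-in reduction
}
```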

I agree with Radical Edward - you may save some time switching from vectors to arrays (maybe half the time), but you may be able to structure your data so that you can perform your most often called operations in O(log n) time instead of linear time. In that case a linear solution that took 1 second might take 10ms - a much larger improvement. What exactly are you storing, and what kind of operations are you doing that are taking your program too long?

Alas, there are lots of problems (in computational physics, for example) where you never achieve O(log n) or the more realistic O(n log n) computational complexity. For example, you can never multiply two matrices in fewer than O(n**2) operations (all existing algorithms are only slightly better than O(n**3)). There are lots of gas and fluid mechanics problems where the computational complexity is O(n**4) or even higher...
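The O(n**3) baseline being referred to is the naive triple loop, sketched here for concreteness (the typedef and function name are mine): n*n*n multiply-adds that no choice of container can remove.

```cpp
#include <vector>
#include <cstddef>

typedef std::vector<std::vector<double> > Matrix;

// Naive matrix multiplication: three nested loops over n, i.e. the
// O(n**3) algorithm - the cost is in the arithmetic, not the storage.
Matrix multiply(const Matrix& a, const Matrix& b)
{
    std::size_t n = a.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}
```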

I am not really doing operations on the vector, I am just storing values and retrieving them later. The problem is there are just lots and lots of them. I'll look into valarrays, but for now, if I use a standard array

int *myArray = new int [ somesize ];

don't I have to then manually delete the variable when I'm done with it?

Also, how can you get the length of the array after the fact (ie. the equivalent of MyVector.size() )?

Thanks,
Dave

> I am not really doing operations on the vector, I am just storing values and retrieving them later. The problem is there are just lots and lots of them.

How are you retrieving them? Is it in linear order or some kind of search? A straight traversal from beginning to end or end to beginning is well suited for an array, but you probably won't see a big difference between an array and a vector in that case. The storage overhead for a vector will be overwhelmed right away by the values, and the runtime cache difference between an array and a vector will be nonexistent.

If you're searching for values, the time you spend paging cache lines in and out is probably going to be significant with lots and lots of values, so a structure designed for searching like a set or hash_set should be your first step in optimizing the performance.
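As a minimal sketch of that suggestion (names are mine): a std::set keeps its elements in a balanced tree, so a membership test costs O(log n) rather than the O(n) scan needed to search an unsorted array or vector.

```cpp
#include <set>

// O(log n) membership test via the set's balanced-tree lookup.
bool contains(const std::set<int>& values, int value)
{
    return values.find(value) != values.end();
}
```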

> don't I have to then manually delete the variable when I'm done with it?

Yes, if you manually allocate memory with new[], you have to manually release it with delete[] (note the brackets: plain delete on an array is not correct).
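A short sketch of the pairing (the function name is mine): every new[] must be matched by a delete[] once you are done with the storage.

```cpp
#include <cstddef>

// Allocate, use, and release a runtime-sized array. The key detail is
// that new[] pairs with delete[] - plain delete here would be wrong.
int fill_and_sum(std::size_t n)
{
    int* a = new int[n];                 // manual allocation
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<int>(i);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];

    delete[] a;                          // manual release, with brackets
    return total;
}
```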

> Also, how can you get the length of the array after the fact (ie. the equivalent of MyVector.size() )?

If it's an actual array you can divide the size of the array as a whole by the size of a single element to get the number of elements:

#include <iostream>

int main()
{
  int a[100];

  // Prints 100
  std::cout << sizeof a / sizeof a[0] << '\n';
}

If all you have is a pointer, you have to rely on passing around the size as a separate entity. Or you can wrap the whole thing in an object so that it has a size() method, just like the vector class. :)
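A bare-bones sketch of that wrapper idea (the class name is mine): a heap array bundled with its size. std::vector already does all of this, and does it better, which is rather the point.

```cpp
#include <cstddef>

// Minimal runtime-sized array that remembers its own size().
class IntArray
{
public:
    explicit IntArray(std::size_t n) : size_(n), data_(new int[n]) {}
    ~IntArray() { delete[] data_; }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }

private:
    IntArray(const IntArray&);            // copying not handled in this
    IntArray& operator=(const IntArray&); // sketch (rule of three omitted)

    std::size_t size_;
    int* data_;
};
```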

The reason I thought that arrays were so much better was from this experiment. I just wrote to a big vector and then read back from it and timed it - similar to this:

void FunctionWithVectorSubscript()
{
  vector<int> test;
  for(int i = 0; i < ArraySize; i++)
    {
      test.push_back(i);
    }

  int temp;

  for(int i = 0; i < ArraySize; i++)
    {
      temp = test[i];
    }
}

Here are the results. As I figured, much of the time was due to using calls like push_back() and at().

Array time: 17 ms
BestVector time(write using [], read using [] ): 39 ms
WorstVector time (write using push_back(), read using at() ): 89 ms (an additional 50!)
VectorPushBackSubscript time(write using push_back(), read using [] ): 62 ms (an additional 20)
VectorSubscriptAt time(write using [], read using at() ): 50 ms (an additional 10)

But it still seems to take twice as long even with the best way of using a vector vs an array.

Thoughts? (I can attach the entire "test" program if you want; it was just a lot of lines, so I just gave the one example function)

Thanks,
Dave

A good comparison will use a dynamic array using new and delete, and resize() or the size constructor for the vector to set the size initially just like the array. A good test will also use comparable operations. Comparing direct assignment for an array with push_back() for a vector will be no contest for the array.

Finally, be sure to run in some kind of optimized release mode where your compiler's debugging code is disabled. On Ed's compiler, vector::operator[] is identical to vector::at() in debug mode. Comparing the unchecked [] operator for an array with a checked [] operator for vector is an unfair test.
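A sketch of the like-for-like test being described (names are mine; timing calls are omitted and both loops should be compiled in release mode): both sides are heap-allocated, presized, and use unchecked operator[].

```cpp
#include <vector>
#include <cstddef>

// Dynamic array side: new[], presized, unchecked [].
long long fill_raw(std::size_t n)
{
    int* a = new int[n];
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<int>(i);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    delete[] a;
    return total;
}

// Vector side: size constructor instead of push_back(), [] not at().
long long fill_vec(std::size_t n)
{
    std::vector<int> v(n);               // presized, like the array
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```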

> But it still seems to take twice as long even with the best way of using a vector vs an array.

Performance is tricky to guess about, but Edward would guess without seeing your code that you used a real array and not a dynamic array in the test. That saves an extra indirection step that the vector still has. But since you need the array size set at runtime, the array test doesn't fit your goals.

Ed would also guess that you tested in debug mode without optimizations. Any good library class will have a strong debugging framework, but that skews the results in favor of the array because the array is completely unchecked.

It seems we have more and more aimless talking here. What an amusing statement:

> I am not really doing operations on the vector, I am just storing values and retrieving them later. The problem is there are just lots and lots of them.

Of course, you declare and maintain ALL data structures to store and retrieve values. The key point is: what for? It's an absolutely senseless debate about milli-microseconds while the author keeps silent about the operations on these dynamically or statically or whatever-else allocated arrays/vectors/valarrays etc.

Pretty harsh, don't you think? I clearly did not know the difference in a lot of these things, hence my post on a forum....

And also, the question was indeed about the time taken to store and retrieve these values, not to perform any operations on them. If the answer is "it doesn't matter how you store them" then that is fine with me.

Sorry, don't take offence at my previous post. The point is that nobody stores and retrieves data in core memory for no particular reason. You want to process these data values, I want to process these data values, he/she wants...

You may fill a vector of 10000000 doubles in 50 milliseconds then process them for 72 hours (to solve a 3D system of differential equations, for example) - or you may fill a dynamically allocated valarray in 25 milliseconds then process it for 24 hours. Maybe your program just wants to calculate a sum, print the one number, and die. Feel the difference...

> The problem is there are just lots and lots of them.

What's the problem? The problem is data processing. It's not a problem to allocate storage for 1000000 words. The problem is what you want to do with all of them. If you have a 10MB text file and want to build a dictionary of this text, you need to store the data in some kind of balanced tree, not in a std::vector or array. Maybe you have a 100 GB raw array for data mining - then prepare it for loading into a proper OLAP system: read sequentially block by block, convert, rewrite to a mass storage device. That's another story where "it doesn't matter how you store them", because you can't "store" 100 GB in core...

That's why I said about "aimless talking".
