Array question
I have been using vector in c++ lately, but I needed to speed up my code a bit so someone recommended I use regular arrays. My question is 2 part -
1) I thought you couldn't do this!!?? For some reason it is working with no problem?
int a=2;
int test[a];
2) I used a timer (from the VUL library of VXL) to time the following two functions. The only difference is one of the arrays is allocated using a global int, and the other is allocated using a #define. It actually turns out the the global int is twice as fast!!?? I thought they should be about the same (if the int even worked at all! (see question 1))
FunctionWithArray time: 17 ms
FunctionWithDynamicArray time: 8 ms
#include <iostream>
using namespace std;
int DynamicSize = 2000000;
#define ArraySize 2000000
void FunctionWithDynamicArray();
void FunctionWithArray();
int main()
{
FunctionWithDynamicArray();
FunctionWithArray();
return 0;
}
void FunctionWithDynamicArray()
{
int test[DynamicSize];
for(int i = 0; i < DynamicSize; i ++)
{
test[i] = i;
}
}
void FunctionWithArray()
{
int test[ArraySize];
for(int i = 0; i < ArraySize; i ++)
{
test[i] = i;
}
}
Any thoughts on either of these?
Thanks,
Dave
daviddoria
Posting Virtuoso
1,996 posts since Feb 2008
Reputation Points: 437
Solved Threads: 204
> 1) I thought you couldn't do this!!?? For some reason it is working with no problem?
Depends on your compiler.
Some compiler extensions permit this, so you just get lucky.
Enable whatever "strict ANSI" flags for a better idea of what is allowed and what isn't
2 million int's = 8MB of stack space.
Since stack space is allocated on demand, the first one to be called has all the overheads of allocating yet another page of stack every 4K.
But the second one - now that's already got the stack allocated from the first one, so it just blitzes through.
Swap the calls over, you'll see the reverse.
Or they could have been the same for a completely difference reason. Namely that the array is not used, and the compiler could have optimised the code out.
Salem
Posting Sage
11,531 posts since Dec 2005
Reputation Points: 5,862
Solved Threads: 953
you're right, I switched the order and in both cases the first one took longer!!
How would you make an array of size determined at run time if you're compiler doesn't allow this?
Thanks,
Dave
daviddoria
Posting Virtuoso
1,996 posts since Feb 2008
Reputation Points: 437
Solved Threads: 204
int *myArray = new int [ somesize ];
Salem
Posting Sage
11,531 posts since Dec 2005
Reputation Points: 5,862
Solved Threads: 953
Never use std::vector for numbers crunching (in scientific or engineering calculations). This STL container never intended for fast calculations. If you don't want to take much trouble over new/delete, use std::valarray class.
For example, times to calculate sum of 10000000 double type elements:
1. vector[]: ~50 milliseconds
2. vector/iterator: ~90 milliseconds
3. array[]: ~25 millisecond
4. valarray[]: ~26 milliseconds
5. valarray.sum(): ~24 milliseconds
AMD 5000+,VC++ 9 (Plauger's STL implementation), release mode, optimization for speed.
The vector was previously resized and initialized - no "overhead of C++'s object model" in the test loop. Timing precision was equal to ~1 microsecond.
ArkM
Postaholic
2,001 posts since Jul 2008
Reputation Points: 1,234
Solved Threads: 348
I agree with Radical Edward - you may save some time switching from vectors to arrays (maybe half the time), but you may be able to structure your data so that you can perform your most often called operations in log(n) time instead of linear time. In this case a linear solution that took 1 second might take 10ms - a much larger improvement. What exactly are you storing and what kind of operations are you doing that is taking your program too long?
mahlerfive
Junior Poster in Training
77 posts since Aug 2008
Reputation Points: 33
Solved Threads: 18
Alas, there are lots of problems (in computational physics, for exampe) where you never achieve O(log n) or more realistic O(n*log n) computational complexity. For example, you never multiply two matrices with better than O(n**2) operations (all existing algorithms have slightly better than O(n**3) ). There are lots of gas and fluids mechanics problems where computational complexity O(n**4) or even more...
ArkM
Postaholic
2,001 posts since Jul 2008
Reputation Points: 1,234
Solved Threads: 348
I am not really doing operations on the vector, I am just storing values and retrieving them later. The problem is there are just lots and lots of them. I'll look into valarray's, but for now, if I use a standard array
int *myArray = new int [ somesize ];
don't I have to then manually delete the variable when I'm done with it?
Also, how can you get the length of the array after the fact (ie. the equivalent of MyVector.size() )?
Thanks,
Dave
daviddoria
Posting Virtuoso
1,996 posts since Feb 2008
Reputation Points: 437
Solved Threads: 204
The reason I thought that arrays were so much better was from this experiment. I just wrote to a big vector and then read back from it and timed it - similar to this:
void FunctionWithVectorSubscript()
{
vector<int> test;
for(int i = 0; i < ArraySize; i ++)
{
test.push_back(i);
}
int temp;
for(int i = 0; i < ArraySize; i ++)
{
temp = test[i];
}
}
Here are the results. As I figured, much of the time was due to using calls like push_back() and at().
Array time: 17 ms
BestVector time(write using [], read using [] ): 39 ms
WorstVector time (write using push_back(), read using at() ): 89 ms (an addittional 50!)
VectorPushBackSubscript time(write using push_back(), read using [] ): 62 ms (an additional 20)
VectorSubscriptAt time(write using [], read using at() ): 50 ms (an additional 10)
But it still seems to take twice as long even with the best way of using a vector vs an array.
Thoughts? (I can attach the entire "test" program if you want, it was just alot of lines so I just gave the one example function)
Thanks,
Dave
daviddoria
Posting Virtuoso
1,996 posts since Feb 2008
Reputation Points: 437
Solved Threads: 204
daviddoria
Posting Virtuoso
1,996 posts since Feb 2008
Reputation Points: 437
Solved Threads: 204
It seems we have more and more aimless talking here. What's an amusing statement:
I am not really doing operations on the vector, I am just storing values and retrieving them later. The problem is there are just lots and lots of them.
Of course, you declare and maintain ALL data structures to store and retrieve values. The key point is: what for? It's an absolutely senseless debates about milli-microseconds while the author keeps silence about operations with these dynamically or statically or what else allocated data arrays/vectors/valarrays etc..
ArkM
Postaholic
2,001 posts since Jul 2008
Reputation Points: 1,234
Solved Threads: 348
Pretty harsh, don't you say? I clearly did not know the difference in a lot of these things, hence my post on a forum....
And also, the question was indeed about the time taken to store and retrieve these values, not to perform any operations on them. If the answer is "it doesn't matter how you store them" then that is fine with me.
daviddoria
Posting Virtuoso
1,996 posts since Feb 2008
Reputation Points: 437
Solved Threads: 204
Sorry, don't take offence at my previous post. The point is that nobody stores and retrives data in the core memory for no particular reason. You want to process these data values, I want to process these data values, he/she wants...
You may fill the vector of 10000000 doubles for 50 milliseconds then process them for 72 hours (to solve 3D system of differential equations, for example) - or you may fill dynamically allocated valarray for 25 milliseconds then process it for 24 hours. May be, your program wants to calculate sum then print the only number and die. Feel the difference...
The problem is there are just lots and lots of them.
What's a problem? The problem is data processing. It's not a problem to allocate storage for 1000000 words. The problem is what you want to do with these lots and lots of them. If you have 10Mb text file and want to collect this text dictionary, you need store data in some kind of balanced tree, not in std::vector or array. May be, you have 100 GB raw array for data mining - then prepare it for loading to a proper OLAP system: read sequentially block by block, convert, rewrite to a mass storage device - it's the other story, "it doesn't matter how you store them" because of you can't "store" 100 GB in the core...
That's why I said about "aimless talking".
ArkM
Postaholic
2,001 posts since Jul 2008
Reputation Points: 1,234
Solved Threads: 348