Hello,


I am not sure if this is the place to ask.

I develop an algorithm in C++ under Windows XP with Visual studio 2008,

the algorithm is proved to be linear ( O(n) ),

there is a usage of memory during the algorithm ( all of it is being allocated dynamically before the algorithm starts )

and I measure the time it took to complete the computation and I observed the following -


1. using 4 MB of memory take X time of running.

2. using 40 MB of memory, only 4 MB is used, but it is scattered through all of the 40 MB,

( e.g. I mean that I can use any part of the 40 MB allocated, but guaranteed to use only 4 MB )

and this takes 1.3X time of running.


Does anyone have an explanation to this ? or references to read ?

maybe the amount of page faults involved ?

NOTE: this question was first posted here -
http://www.cplusplus.com/forum/windows/35330/
with no reply, so I am sorry if someone double viewed it.

Thanks,

Omri

Recommended Answers

All 10 Replies

1.3 times as long doesn't seem very significant, how long does the 'fastest' algorithm take? How many times did you run both algorithms.

Can you show us the exact difference in code between both approaches, your explanation is a little unclear to me.

1.3 times as long doesn't seem very significant, how long does the 'fastest' algorithm take? How many times did you run both algorithms.

Can you show us the exact difference in code between both approaches, your explanation is a little unclear to me.

What do you mean 1.3 times is not so long ? it is 30% percent more than I expect.
Do you have any example to simple and known algorithms that behave in this way ?

The 'fastest' algorithm runs for 0.2 seconds and the 'slower' runs for 0.26 seconds.

The number of times I ran this checks ( if I understand you correct ) is estimated
to be some hundreds times.

It is a problem for me to show the differences between the executions using my code,
but I can explain a bit more.
The best way to describe my situation, is to take 10,000 points in an XY plane
and then to take 100,000 points in the same plane, these points gets their coordinate on the plane in a random fashion. The algorithm runs only on the first 10,000 points in the points array, and "sums" up the values in the 8 physically closest points. which means that this points can be anywhere in the points array, and most likely not neighbors in the points array.

I hope this helps and thanks for the quick interest.
Omri

I didn't say 1.3 is not so long, but a difference of 30% on 0.2 seconds is not that much.

However since you ran it many times it is meaningful - so you can ignore that comment ;).

What gonefission posted is likely to be part of the 'problem'.
A cache miss won't generate a page fault though, so you won't see an increase of page faults with increasing cache misses.


Also, this may be silly, but just checking... you allocate the memory before starting the algorithm - do you then hand over a pointer to this block of memory? Or is it copied somewhere.

I didn't say 1.3 is not so long, but a difference of 30% on 0.2 seconds is not that much.

However since you ran it many times it is meaningful - so you can ignore that comment ;).

What gonefission posted is likely to be part of the 'problem'.
A cache miss won't generate a page fault though, so you won't see an increase of page faults with increasing cache misses.


Also, this may be silly, but just checking... you allocate the memory before starting the algorithm - do you then hand over a pointer to this block of memory? Or is it copied somewhere.

The memory is allocated at the beginning and a pointer to the memory
is being used. There are some copies during the algorithm run, these copies are
needed for the algorithm and cannot be reduced.
I don't have currently any estimation of the amount of copied data in each run,
but it should not be too much comparing to the amount of allocated data.

Omri

Google gets me to this page:
http://oprofile.sourceforge.net/docs/intel-P4-events.php

I don't know how good it is.

I don't know how closed source your algorithm is, but I have VS2010 through uni and I wouldn't mind running the profile for you, if it's not too much of a hassle to get it to compile.

Google gets me to this page:
http://oprofile.sourceforge.net/docs/intel-P4-events.php

I don't know how good it is.

I don't know how closed source your algorithm is, but I have VS2010 through uni and I wouldn't mind running the profile for you, if it's not too much of a hassle to get it to compile.

OProfile is a nice tool :), but unfortunately they don't support Windows,
and my code isn't supported under Linux currently.. :(

My code is pretty closed :(, so I can't share it,
but thanks a lot for the offer.

Omri

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.