OK, for anyone who is in a programming efficiency class or still remembers their notes on the subject: I've got a question.

I have a program where I have to take a performance hit--there's no way around it.
My choices:
1. Take the hit when I create an object; the hit is that I do x assignment statements twice.
2. Take the hit every time I reinit the object; the hit is that I have an extra jump.

I don't know how many times I'll reinit the same object versus create a new instance. There will be at least one section where I'll probably reinit the same object several times.

Which one is the worse hit in terms of CPU cycles only?

Sorry, I don't think I can offer much help on this one. The only thing I know about time and space complexity is Big-Oh notation and Big-Omega notation!

There must be a way to time how long each option takes, which would give you the answer.

You can use this code I've written to measure the number of cycles each one takes. However, I'm already sure that the 2nd way is faster. Also, you should always inline any simple assignment functions so that there will be no function call. Creating and destroying objects will most certainly incur more cycles than just assigning.
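To make that advice concrete, here is a minimal sketch in C. The `widget` struct, its fields, and all function names are hypothetical and do not come from the original program; the sketch only illustrates the difference between re-running a few inlined assignments and allocating a new object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object; the field names are illustrative only. */
struct widget {
    int id;
    int count;
    double scale;
};

/* Simple assignments kept in a static inline function, so the compiler
 * can expand them at the call site instead of emitting a function call. */
static inline void widget_reinit(struct widget *w, int id)
{
    assert(w != NULL);
    w->id = id;
    w->count = 0;
    w->scale = 1.0;
}

/* Creating a fresh object: pay for the allocation, then do the same
 * assignments that a reinit would have done on its own. */
struct widget *widget_create(int id)
{
    struct widget *w = malloc(sizeof *w);
    if (w != NULL)
        widget_reinit(w, id);
    return w;
}

/* Destroying the object releases the allocation made by widget_create(). */
void widget_destroy(struct widget *w)
{
    free(w);
}
```

Reusing one `widget` through `widget_reinit()` skips the `malloc()`/`free()` pair entirely, which is the cost the reply is pointing at. Whether that outweighs an extra jump in your program is still something to measure.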

What this is:
-a small C/asm source file to help benchmark a function that iterates over a data set.

What it does:
-computes the number of cycles required for the function to run
-computes the CPE (cycles per element) of your function

How it Works:
-it uses Pentium-specific assembly instructions to read the processor's timestamp counter, which is a 64-bit value that represents the number of cycles that have passed since the processor was reset
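As a rough illustration of reading that counter, here is a sketch that uses the `__rdtsc()` compiler intrinsic from `<x86intrin.h>` (GCC and Clang) rather than the separate assembly file the posted code uses. The names `combine_tsc()` and `read_ticks()` are made up for this example, and the `clock()` fallback for non-x86 builds is only a coarse stand-in, not a cycle counter.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* GCC/Clang header that declares __rdtsc() */
#else
#include <time.h>        /* fallback only: clock() is not a cycle counter */
#endif

/* RDTSC returns the counter as two 32-bit halves (EDX:EAX). Joining them
 * into one 64-bit value means a carry out of the low half is not lost. */
static inline uint64_t combine_tsc(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Read the timestamp counter on x86; elsewhere return clock() ticks so
 * the sketch still compiles (assumption: coarse timing is acceptable). */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}
```

Taking `read_ticks()` before and after the code under test and subtracting gives the elapsed count. On many newer x86 processors the counter ticks at a constant rate rather than at the current core frequency, so treat the result as counter ticks unless you know the two match on your machine.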

What you do:
-you write a function that takes a void pointer as its argument and returns a void pointer. In the main function you pack your necessary args into a structure (or whatever), and then unpack it in your function. You call the function I wrote, test_it(), and pass it a pointer to your function and your packed-up argument structure. test_it() will then run your function, passing it your argument, and benchmark the performance.
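The calling pattern described above might look roughly like the sketch below. This is not the posted test_it(): `bench_cpe()`, `sum_args`, `sum_array()` and `empty_fn()` are hypothetical names, and `clock()` stands in for the timestamp counter so that the example stays portable. The overhead measurement is modelled on the "overhead is N cycles" line that the real program prints.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* The shape described above: takes and returns a void pointer. */
typedef void *(*bench_fn)(void *);

/* Hypothetical packed argument structure for the example workload. */
struct sum_args {
    const int *data;
    size_t len;
    long sum;            /* result, written back by the function */
};

/* Example workload: unpack the arguments and sum the array. */
void *sum_array(void *p)
{
    struct sum_args *a = p;
    a->sum = 0;
    for (size_t i = 0; i < a->len; i++)
        a->sum += a->data[i];
    return a;
}

/* Does nothing; timing it estimates the cost of the measurement itself. */
void *empty_fn(void *p)
{
    return p;
}

/* Stand-in tick source (assumption): swap in a cycle counter for real use. */
static unsigned long long read_ticks(void)
{
    return (unsigned long long)clock();
}

/* Time a single call of fn(arg) in ticks. */
static unsigned long long time_call(bench_fn fn, void *arg)
{
    unsigned long long start = read_ticks();
    fn(arg);
    return read_ticks() - start;
}

/* Run fn once, subtract the measured overhead, return ticks per element. */
double bench_cpe(bench_fn fn, void *arg, size_t elements)
{
    assert(fn != NULL && elements > 0);
    unsigned long long overhead = time_call(empty_fn, arg);
    unsigned long long total = time_call(fn, arg);
    unsigned long long net = total > overhead ? total - overhead : 0;
    return (double)net / (double)elements;
}
```

To use it, fill a `struct sum_args` with your array and length and call `bench_cpe(sum_array, &args, args.len)`. With `clock()` as the tick source the result for a small array will be close to zero, so the number only becomes meaningful once a real cycle counter is plugged into `read_ticks()`.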

Does it work:
-yes, as far as I can tell, but that doesn't mean it actually does ;)

Is it annoyingly complex?:
-no, I hope not. I provided a pretty clear example, imo. If you feel otherwise, tell me so.

Code is available in zip and tar:
http://www.1nfamus.netfirms.com/#benching
and here it is if you just want to look:

/*	12/25/03 - Merry Xmas!!!
  *	This code is meant to provide reasonably accurate benchmarking of functions
  *	that iterate over a set of data.  It is meant to be as modular as possible
  *	so as to make testing of different functions go as fast as possible.  It
  *	calculates the total number of cycles required for a function to run.  It
  *	also calculates the CPE of a function that iterates over a data set. What is
  *	the CPE, you ask?
  *	CPE - cycles per element: the number of cycles required to process an element
  *	of a data set.  This is a term I stole from this book (which I highly recommend):
  *	http://csapp.cs.cmu.edu/
  *	It's a good way, imo (and theirs), of benchmarking code because it lets you clearly see
  *	how the processor is performing. Example: the Intel Pentiums have an integer arithmetic
  *	unit that is capable of executing addition with a latency of 1 cycle. It is also
  *	capable of starting a new instruction every cycle.  Now if you just time your code using
  *	something like times, you really have no idea how close your code is coming to reaching
  *	the max capabilities of the processor. If you instead have a measure of how many cycles
  *	it takes for an element to be processed, there is a much clearer relationship between what
  *	is going on in the processor and where the delays are occurring.
  *	
  *	BUGS:
  *		I have compared all my tests to the benchmarks in the above book, and my results
  *		using their code are nearly the same as their results, so I'm fairly confident that
  *		this works correctly.  I emailed my code to the author and asked him to check it
  *		out, and will be updating anything as necessary and reposting in the thread where
  *		this was posted.
  *	TESTED ON:
  *		This code was written and compiled on Red Hat 8 and Debian 4.  I've come to learn
  *		(unfortunately) that some of the code I've written will compile fine on one version
  *		of gcc and then have several errors on others, so I'm only hoping that you'll be able
  *		to compile this. I'd like to make it work on as many platforms as possible, so if you
  *		fix it to work on a different one then please let me see it.
  *	BUILD:
  *		Due to the idiocy of the gcc inline assembler, the asm CANNOT be inlined or it breaks
  *		as soon as it is optimized, so I had to stick the asm routines in a separate assembly
  *		file. The way I've compiled it is like so:
  *		gcc -Wall this_source_file.c the_assembly_source.s
  *
  *		Feel free to do whatever you like with this, but if you make it better you
  *		gotta share it with me.
  *  UPDATED:
  *	  02/04/03 - fixed it so that it uses __u64 unsigned ints to store the stamp values. Before,
  *	  it wasn't checking for an overflow in the low 32 bits of the counter; now it does :).
  *			-sean larsson	*/

and some output:

[n00b@highjack3d] ./a.out
loop is not unrolled
overhead is 37 cycles
+-function took 4167 cycles That's a CPE of 4.069336
+-function took 3243 cycles That's a CPE of 3.166992
+-function took 2926 cycles That's a CPE of 2.857422
+-function took 2983 cycles That's a CPE of 2.913086
+-function took 3028 cycles That's a CPE of 2.957031
sum was 523776
unrolling the loop by 6
overhead is 37 cycles
+-function took 2990 cycles That's a CPE of 2.919922
+-function took 1972 cycles That's a CPE of 1.925781
+-function took 1998 cycles That's a CPE of 1.951172
+-function took 2025 cycles That's a CPE of 1.977539
+-function took 1978 cycles That's a CPE of 1.931641
sum was 523776

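From the output you can clearly see the difference between the code/data being in the cache or not: the first run in each set (4167 and 2990 cycles) is noticeably slower than the runs that follow it.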
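The CPE figures in that output are consistent with dividing each cycle count by 1024 elements. The element count is inferred from the numbers and is not stated in the post. A small sketch of the arithmetic, using a hypothetical helper `cycles_per_element()`:

```c
#include <assert.h>
#include <stdio.h>

/* CPE as it appears in the output above: total cycles divided by the
 * number of elements the function processed. */
double cycles_per_element(unsigned long long cycles, unsigned long elements)
{
    assert(elements > 0);
    return (double)cycles / (double)elements;
}

/* Recompute the CPE column for the non-unrolled runs, assuming a
 * 1024-element data set (inferred from the output, not from the source). */
void print_cpe_table(void)
{
    const unsigned long long cycles[] = { 4167, 3243, 2926, 2983, 3028 };
    const unsigned long elements = 1024;

    for (size_t i = 0; i < sizeof cycles / sizeof cycles[0]; i++)
        printf("%llu cycles / %lu elements = CPE %f\n",
               cycles[i], elements, cycles_per_element(cycles[i], elements));
}
```

For the first run this gives 4167 / 1024 = 4.069336, matching the reported value, and the unrolled runs settle near 1.93 to 1.98 by the same calculation.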

P.S. This information was gleaned from the following:
-the Intel manuals, primarily vol. 2, 3, and optimizing 1
-the above-mentioned book
-the link lying around in one of these threads about performance, posted by jc (peenie), regarding proper use of the rdtsc instruction
