Efficient Programming

Question

FUDragon 0 Newbie Poster

21 Years Ago

OK, this one is for everyone who is in a programming efficiency class or still remembers their notes on the subject: I've got a question.

I have a program where I have to take a performance hit; there's no way around it.
My choices:
1. Take the hit when I create an object: the hit is that I execute x assignment statements twice.
2. Take the hit every time I reinitialize the object: the hit is that I have an extra jump.

I don't know how many times I'll reinitialize the same object versus creating a new instance. There will be at least one section where I'll probably reinitialize the same object several times.

Which one is the worse hit in terms of CPU cycles only?


Dani 4,675 The Queen of DaniWeb

21 Years Ago

Sorry, I don't think I can offer much help on this one. The only thing I know about time and space complexity is Big-Oh notation and Big-Omega notation!


alk 0 Newbie Poster · Answer 1 · 2004-03-04T07:38:35+00:00

There must be a way to time how long each option takes, which would give you the answer.

infamous 26 Junior Poster in Training · Answer 2 · 2004-03-24T02:39:36+00:00

You can use this code I've written to measure the number of cycles each option takes. However, I'm already sure that the second way is faster: creating and destroying objects will most certainly cost more cycles than just assigning. Also, you should always inline simple assignment functions so that no function call is made.

What this is:
-A small C/asm source file that helps benchmark a function that iterates over a data set.

What it does:
-Computes the number of cycles required for the function to run.
-Computes the CPE (cycles per element) of your function.

How it Works:
-It uses the Pentium-specific rdtsc instruction to read the processor's timestamp counter. This is a 64-bit value that counts the number of cycles that have passed since the processor was reset.

What you do:
-You write a function that takes a void pointer as its argument and returns a void pointer. In the main function you pack your necessary args into a structure or whatever, and then unpack them in your function. You call the function I wrote, test_it(), and pass it a pointer to your function and your packed-up argument structure. test_it() will then run your function, passing it your argument, and benchmark the performance.

Does it work:
-Yes, actually, as far as I can tell. But that doesn't mean it actually does ;)

Is it annoyingly complex?:
-No, I hope not. I provided a pretty clear example, IMO. If you feel otherwise, tell me so.

Code is available in zip and tar:
http://www.1nfamus.netfirms.com/#benching
And here it is if you just want to look:

/*	12/25/03 - Merry Xmas!!!
  *	This code is meant to provide reasonably accurate benchmarking of functions
  *	that iterate over a set of data. It is meant to be as modular as possible
  *	so that testing different functions goes as fast as possible. It
  *	calculates the total number of cycles required for a function to run. It
  *	also calculates the CPE of a function that iterates over a data set. What is
  *	the CPE, you ask?
  *	CPE - cycles per element: the number of cycles required to process one element
  *	of a data set. This is a term I stole from this book (which I highly recommend):
  *	http://csapp.cs.cmu.edu/
  *	It's a good way, IMO (and theirs), of benchmarking code because it lets you clearly see
  *	how the processor is performing. Example: the Intel Pentiums have an integer arithmetic
  *	unit that is capable of executing addition with a latency of 1 cycle. It is also
  *	capable of starting a new instruction every cycle. Now if you just time your code using
  *	something like times(), you really have no idea how close your code is coming to reaching
  *	the max capabilities of the processor. If you instead have a measure of how many cycles
  *	it takes for an element to be processed, there is a much clearer relationship between what
  *	is going on in the processor and where the delays are occurring.
  *	
  *	BUGS:
  *		I have compared all my tests to the benchmarks in the above book, and my results
  *		using their code are nearly the same as their results, so I'm fairly confident that
  *		this works correctly. I emailed my code to the author and asked him to check it
  *		out, and I will update anything as necessary and repost in the thread where
  *		this was posted.
  *	TESTED ON:
  *		This code was written and compiled on Red Hat 8 and Debian 4. I've come to learn
  *		(unfortunately) that some of the code I've written will compile fine on one version
  *		of gcc and then have several errors on others, so I'm only hoping that you'll be able
  *		to compile this. I'd like to make it work on as many platforms as possible, so if you
  *		fix it to work on a different one, please let me see it.
  *	BUILD:
  *		Due to the idiocy of the gcc inline assembler, the asm CANNOT be inlined or it breaks
  *		as soon as it is optimized, so I had to put the asm routines in a separate assembly
  *		file. I've compiled it like so:
  *		gcc -Wall this_source_file.c the_assembly_source.s
  *		
  *		Feel free to do whatever you like with this, but if you make it better you
  *		have to share it with me.
  *  UPDATED:
  *	  02/04/03 - fixed it so that it uses __u64 unsigned ints to store the stamp values. Before,
  *	  it wasn't checking for an overflow in the low 32 bits of the counter; now it does :).
  *			-sean larsson	*/

and some output:

[n00b@highjack3d] ./a.out
loop is not unrolled
overhead is 37 cycles
+-function took 4167 cycles That's a CPE of 4.069336
+-function took 3243 cycles That's a CPE of 3.166992
+-function took 2926 cycles That's a CPE of 2.857422
+-function took 2983 cycles That's a CPE of 2.913086
+-function took 3028 cycles That's a CPE of 2.957031
sum was 523776
unrolling the loop by 6
overhead is 37 cycles
+-function took 2990 cycles That's a CPE of 2.919922
+-function took 1972 cycles That's a CPE of 1.925781
+-function took 1998 cycles That's a CPE of 1.951172
+-function took 2025 cycles That's a CPE of 1.977539
+-function took 1978 cycles That's a CPE of 1.931641
sum was 523776

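From the output you can clearly see the difference between code/data being in the cache or not. The first run of each version (4167 and 2990 cycles) is noticeably slower than the runs that follow it.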

P.S. This information was gleaned from the following:
-the Intel manuals, primarily vol. 2 and 3, and the optimization manual
-the above-mentioned book
-the link about the rdtsc instruction and its proper use, posted by jc (peenie) in one of these performance threads
