Suppose the caller has a local variable of a primitive type and doesn't need the callee to work on that variable's own memory (the caller doesn't need to track changes to it afterwards). I know that such a variable is normally passed to the callee by value, and that makes total sense for primitive types whose size is equal to or smaller than the size of a pointer. But I'm curious: from an assembly point of view (performance-wise or space-wise), couldn't there be a situation where passing a primitive that is bigger than a pointer by reference would be more efficient? On my platform, for example, long double has a size of 8 bytes and pointers have a size of 4 bytes. I'm imagining a situation where the caller can load the pointer directly into a register but cannot do that with the primitive itself, so the callee doesn't need to load the pointer from its stack frame into a register, and 8 bytes of stack memory stay free that pass-by-value would have used.
If passing by reference might ever be more efficient, how can we know whether to pass by reference or by value?

Edited 6 Years Ago by Garrett2011: n/a

I'm not really sure what you're asking, but I think you're asking about the efficiency of passing conventions.

In your example, if you were originally passing an 8-byte long double and you change it to a 4-byte pointer, it wouldn't make an 8-byte difference. It would only make a 4-byte difference, because you are still passing 4 bytes of information.

As far as the related assembly goes, I really can't say because I don't know assembly well enough. When dealing with primitives, I suspect it would be similar from an operations POV because you still need to read some sort of information and then push that information onto the stack. Where you pick up your efficiency, if any, is in the volume of data being moved. It doesn't take as long to move 4 bytes as it does 8 bytes.

Hope that helps...

Edited 6 Years Ago by Fbody: n/a

Would passing a primitive to a function by reference ever be more efficient?
Yes, because you avoid passing a copy, so you save memory and execution time.

I'm not really sure what you're asking, but I think you're asking about the efficiency of passing conventions.

You're right.

In your example, if you were originally passing an 8-byte long double and you change it to a 4-byte pointer, it wouldn't make an 8-byte difference. It would only make a 4-byte difference, because you are still passing 4 bytes of information.

The 4-byte pointer may be loaded into a register, in which case it uses no stack at all, as far as I know.

As far as the related assembly goes, I really can't say because I don't know assembly well enough. When dealing with primitives, I suspect it would be similar from an operations POV because you still need to read some sort of information and then push that information onto the stack. Where you pick up your efficiency, if any, is in the volume of data being moved. It doesn't take as long to move 4 bytes as it does 8 bytes.

Hope that helps...

not enough gain for me, thanks anyway.

Edited 6 Years Ago by Garrett2011: n/a

Consider this simple example:

#include <iostream>

// Takes its parameter by value and returns twice that value.
double twice(double d) {
  return 2.0 * d;
}

int main() {
  double result = twice(2.0);
  std::cout << "result is: " << result << std::endl;
  return 0;
}

With no optimization and no inlining, this would most probably compile to something like the following pseudo-assembly (I'm just a novice in real asm and can't write the real thing):

twice:
  //here it does some preamble like pushing the stack frame and instruction register
  mov fr(0), -8(%ebp)   //fetches the parameter from the stack
  mul fr(0), $2.0       //multiplies fr(0) with 2 and stores it in fr(0)
  mov -8(%esp), fr(0)   //puts the result on the stack for return value
  mov %edx, %esp        //copies the stack frame address on edx register (result)
  sub %edx, $8          //decrement result pointer to point to the actual pointer
  //unwinds the stack frame
main:
  //does some application starting stuff
  sub %esp, $16          //allocates 16 bytes for both the parameter and result
  mov -8(%esp), $2.0     //writes the value 2.0 in the smallest address stack position (in the parameter spot directly)
  call twice             //calls the function twice
  mov -16(%esp), (%edx)  //copies the value pointed to by edx (result) to stack position -16 from esp
  //calls the std::cout printing stuff using the result value in stack position -16 from esp.
  //unwinds the stack before exiting.

The lesson to take from the above is that if the parameter passed to the function is already stored in the caller's stack frame, the compiler will try to place it, from the start, in the spot where it needs to be when the function is called. In other words, in simple cases, the parameter value is not copied before the function call at all. If you use pass-by-reference, this is not as likely, because the compiler has to be able to work out, from the callee code only, that passing by value has the same effect and is faster. If it does do a pass-by-reference, which is essentially a pass-by-pointer (at the implementation level), it has to take the address of the parameter (wherever it may be) and place it on the stack (in the order the calling convention requires).
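To make the comparison concrete, here is a minimal sketch of the same function written three ways. The names twice_by_value, twice_by_ref and twice_by_ptr are made up for illustration, and what a given compiler actually emits for each one depends on the platform, the calling convention and the optimization level.

```cpp
// Hypothetical variants of twice(), for comparison only.

// The callee receives its own copy of the double.
double twice_by_value(double d) {
  return 2.0 * d;
}

// The callee receives a reference, which is typically implemented
// as an address that must be dereferenced to get the value.
double twice_by_ref(const double& d) {
  return 2.0 * d;
}

// The explicit pointer form of the reference version above.
double twice_by_ptr(const double* d) {
  return 2.0 * (*d);
}
```

All three return the same result for the same input; the only thing that differs is how the argument reaches the callee.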

So, obviously, this is a bit complicated. If the parameter is already in the caller's stack, then passing by value is better because it involves either no copy at all, or a copy from stack to stack or from register(s) to stack (very local, little overhead). If the parameter is fetched from elsewhere (like the heap), then taking its address and passing by reference is faster, from the caller's point of view, than copying it to the stack, but the callee will have to perform that fetch operation anyway, so you gain nothing. So, in conclusion, passing by value is faster whenever you are sure that the callee will need the entire parameter's value at once, which is the case for any primitive or other slightly-bigger-but-still-simple type.

This is because, by leaving the fetching operation on the caller's side, you allow for no fetching at all when it is not necessary, while if you put the fetching in the callee (with a pass-by-reference or pointer) you (almost) force a fetch operation that will be, in a bad case, a copy from heap to stack or, in the best case, just a copy from stack to stack. That leaves out the two better options: register-to-stack (if the parameter is resident in a register) or no copy at all (if the parameter is already in the right spot on the stack). This is why, for example, even a simple struct like a 3D vector of doubles is better passed by value to a function that computes the magnitude of that vector: you know that eventually all three (x,y,z) values will have to go onto the callee's stack, so you might as well put them there before the call and save some time if the vector in question is already in the caller's stack and doesn't need to be copied.
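As a rough illustration of the 3D vector case, here is a small sketch. Vec3, magnitude_by_value and magnitude_by_ref are hypothetical names, not taken from any library, and which version ends up faster depends on the compiler and on where the vector lives at the call site.

```cpp
#include <cmath>

// Hypothetical 3D vector type: three doubles, 24 bytes on most platforms.
struct Vec3 {
  double x, y, z;
};

// By value: all three components are handed to the callee directly.
double magnitude_by_value(Vec3 v) {
  return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// By const reference: the callee gets an address and has to read
// each component through it.
double magnitude_by_ref(const Vec3& v) {
  return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}
```

Both functions need every component of the vector, which is the situation where the by-value form has nothing to lose.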

The opposite case is when the parameter passed is some complex thing like an object with many data members where the called function is expected to address only a few of those values, or address all of them but at different times. Another case is passing an array which is most likely processed with a loop and thus, rarely will the compiler copy the entire array on the stack (if it can do it at all). In those cases, obviously, you pass by reference or pointer.
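A sketch of those two opposite cases, under the assumption of a made-up Particle type with several members of which the function only touches a few, and a plain array that is processed in a loop. All names here are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical object with many data members.
struct Particle {
  double position[3];
  double velocity[3];
  double mass;
  std::string label;
  std::vector<double> history;
};

// Only mass and velocity are read, so copying the whole object
// (including the string and the vector) would be wasted work.
double kinetic_energy(const Particle& p) {
  double v2 = p.velocity[0] * p.velocity[0] +
              p.velocity[1] * p.velocity[1] +
              p.velocity[2] * p.velocity[2];
  return 0.5 * p.mass * v2;
}

// The array is passed as a pointer plus a length and read one
// element at a time inside the loop.
double sum(const double* data, std::size_t count) {
  double total = 0.0;
  for (std::size_t i = 0; i < count; ++i) {
    total += data[i];
  }
  return total;
}
```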

So the rule of thumb that many people hold on this issue is: "if the type is as big as or smaller than a pointer to it, then pass by value". This, in my opinion, is overly simplistic, and passing by value can be better for bigger types too, depending on the nature of the code and the expected memory access pattern (it's not a black-and-white issue). My personal threshold is essentially a 3D vector of doubles; anything bigger is passed by reference. But, again, most of my types are polymorphic, so passing by value is often out of the question.

Disclaimer: I don't claim to be an expert on this, so don't take the above for granted, it is just the way I understand it and I would be happy to be proven wrong, thus increasing my knowledge.

Also, don't forget that the compiler's optimisation options are reduced if you pass a pointer to a function instead of the value; const <should> avoid this.
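One reason a pointer parameter can restrict the optimiser is aliasing: two pointers may refer to the same object. Here is a small sketch with made-up function names. Note that in this sketch the pointee is const-qualified, yet the function still has to allow for both pointers referring to the same int, so the two versions can give different results.

```cpp
// Hypothetical example: add an amount to *target twice.

// Pointer version: amount may point at the same int as target,
// so the value of *amount can change after the first addition.
void add_twice_ptr(int* target, const int* amount) {
  *target += *amount;
  *target += *amount;
}

// Value version: amount is a private copy that cannot change
// behind the function's back, so this is simply target + 2 * amount.
void add_twice_val(int* target, int amount) {
  *target += amount;
  *target += amount;
}
```

Calling add_twice_ptr(&x, &x) with x equal to 1 leaves x at 4, while add_twice_val(&x, x) with x equal to 1 leaves it at 3.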

In my experience, it's best to write code without much regard to this kind of efficiency/speed at first. Then, when it's working to a reasonable degree, use a profiler so that you concentrate only on modifying the functions that give a worthwhile improvement. It's best to check whether things like this actually make the difference you are hoping for. Before that, it's best to make sure any slow algorithms can't be improved, and then perhaps use SSE if the code is really hungry for number crunching.
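A real profiler is the better tool, but for a quick check something like the following crude timing helper can be enough. This is only a sketch: time_ms is a made-up name, and the numbers it gives are noisy and can be distorted by the optimiser removing work whose result is never used.

```cpp
#include <chrono>

// Hypothetical helper: runs work() the given number of times and
// returns the total elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work, int repetitions) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i) {
    work();
  }
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would call it once with a lambda that uses the by-value version of a function and once with the by-reference version, then compare the two times over a large number of repetitions.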
