What gives you the impression that such a macro would be more efficient than doing without it? Each compiler may handle it differently, but if you look at the disassembly for VC++ 2008 Express compiled for Debug, all the compiler has to do is push the address of the std::string object on the stack, push the address of the string literal on the stack, and call a std::string method to concatenate the two. It seems to me that inserting your _MyMacro would just complicate things and slow it down, not speed it up, or it would have absolutely no effect one way or the other.
Then allow me to elaborate:
As chip designers, not C++ programmers, we would normally construct these routines at ASM/uC level and thus have complete control over both implicit/explicit code generation. However, due to product restructuring, we must now produce native C/C++ "programmer-friendly" firmware which is readily accessible by designer and non-designer alike. As most of our systems are tailored specifically for the embedded market, these routines MUST often maintain a customer-defined T-State/Memory metric in order to complete post-production testing.
From your rather elementary diagnosis of the VC++ debug code, the parameter reference semantic is indeed correct - after all, what else would it be - but it's what happens subsequent to method invocation that constitutes the real crux of our problem. No matter which way you cut it, an ASCIIZ string carries no stored length, so it must be scanned for its terminator whenever the size is needed before the content can be processed. That iterative length parsing introduces additional back-end T-States that were not present prior to our adoption of C/C++.
In order to alleviate this problem, we have created a custom string class - std::string is neither desirable nor available - which successively stores the reference count, string length, buffer length and string content as part of its transient reference buffer. We have of course included methods which explicitly deal with standard ASCIIZ strings at run-time, but it would be both "programmer-friendly" and efficient if we could coerce the implicit (Construct -> "String Literal", Reference -> char *) compile-time semantic in order to construct our own custom literals directly within the executable's data section.
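To make the layout concrete, here is a minimal sketch of such a transient buffer and of the run-time ASCIIZ path. The names StrHeader, makeBuffer, content and release, the field widths, and the use of malloc/free are all assumptions made for illustration; the real class may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical header of the transient buffer (names and widths are assumed).
// The string content is stored immediately after these three fields.
struct StrHeader {
    std::size_t refCount;  // number of string objects sharing this buffer
    std::size_t length;    // characters in use, excluding the terminator
    std::size_t capacity;  // bytes reserved for content, including the terminator
};

// Run-time path: an ASCIIZ string has no stored length, so it must be scanned.
StrHeader* makeBuffer(const char* text) {
    assert(text != 0);
    const std::size_t len = std::strlen(text);  // O(n) scan on every construction
    void* raw = std::malloc(sizeof(StrHeader) + len + 1);
    if (raw == 0) return 0;
    StrHeader* h = static_cast<StrHeader*>(raw);
    h->refCount = 1;
    h->length = len;
    h->capacity = len + 1;
    std::memcpy(h + 1, text, len + 1);  // copy content plus terminator after the header
    return h;
}

// Content starts directly behind the header.
inline const char* content(const StrHeader* h) {
    return reinterpret_cast<const char*>(h + 1);
}

// Drop one reference and free the buffer when the last one goes away.
inline void release(StrHeader* h) {
    if (h != 0 && --h->refCount == 0) std::free(h);
}
```

The strlen call is the cost in question: it runs once per construction from a char *, whereas a buffer that already carries its length needs no scan.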
Thus, depending upon preference, the programmer could invoke:
String1("Hello"); <- Familiar syntax, but length processed during run-time construction.
OR
String1(_MyMacro("Hello")); <- Reasonably familiar syntax, but length processed via compile-time construction and immediately available at run-time.
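One way the second form might be approximated is with sizeof, which yields the size of a string literal (terminator included) as a compile-time constant. The sketch below is an assumption-laden illustration, not our actual class: LiteralRef, makeLiteralRef and this cut-down, non-owning String are hypothetical, and the macro is spelled MY_LITERAL because identifiers that begin with an underscore followed by an uppercase letter are reserved for the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical POD pair: pointer to the literal plus its compile-time length.
struct LiteralRef {
    const char* text;
    std::size_t length;
};

inline LiteralRef makeLiteralRef(const char* text, std::size_t length) {
    LiteralRef ref = { text, length };
    return ref;
}

// sizeof(s) counts the terminator, hence the -1.
// The "" s "" concatenation only compiles when s is a string literal,
// so a char* variable cannot be passed by mistake.
#define MY_LITERAL(s) makeLiteralRef("" s "", sizeof(s) - 1)

// Cut-down, non-owning stand-in for the custom string class (assumed interface).
class String {
public:
    // Familiar form: the length is found by scanning at run-time.
    explicit String(const char* text) : text_(text), length_(0) {
        assert(text != 0);
        length_ = std::strlen(text);
    }
    // Literal form: the length arrives as a constant, no scan needed.
    String(LiteralRef lit) : text_(lit.text), length_(lit.length) {}

    std::size_t length() const { return length_; }
    const char* c_str() const { return text_; }

private:
    const char* text_;
    std::size_t length_;
};
```

With this in place, String s1("Hello"); takes the scanning constructor, while String s2(MY_LITERAL("Hello")); passes the length as a constant. Note that this only removes the length scan; it does not by itself place the reference count and buffer length in the data section.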
Note that the transient buffer, which contains both the string parameters and the fixed-length content, is of POD type and is subsequently referenced from within one or more non-POD string objects. Of course, the literal transient buffers could be constructed manually via standard C/C++ constructs, but this would regress to our "unfriendly" ASM/uC practices.
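For comparison, the manual construction might look roughly like the sketch below. LiteralBuffer, DEFINE_LITERAL and helloBuffer are hypothetical names, and the convention that a literal buffer starts with a reference count of 1 (so that it is never released) is an assumption. Because the type is a POD aggregate with constant initialisers, a typical toolchain can emit it straight into the data section, although that placement is up to the compiler and linker.

```cpp
#include <cstddef>

// Hypothetical POD literal buffer: the three parameters followed by
// fixed-length content sized for the literal, terminator included.
template <std::size_t N>
struct LiteralBuffer {
    std::size_t refCount;  // assumed: starts at 1 so a literal is never freed
    std::size_t length;    // characters in use, excluding the terminator
    std::size_t capacity;  // bytes reserved for content
    char content[N];
};

// Manual construction with plain aggregate initialisation.
// Every value is a compile-time constant, so no run-time code is required.
static LiteralBuffer<sizeof("Hello")> helloBuffer = {
    1, sizeof("Hello") - 1, sizeof("Hello"), "Hello"
};

// A declaration-level macro hides the repetition, but it has to appear
// as a separate statement rather than inline in the constructor call.
#define DEFINE_LITERAL(name, s) \
    static LiteralBuffer<sizeof(s)> name = { 1, sizeof(s) - 1, sizeof(s), s }
```

This is the "unfriendly" part: each literal needs its own named declaration, for example DEFINE_LITERAL(worldBuffer, "World");, before a string object can reference it.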