Let's say I have a shared library in Linux, written in C, with a single function that simply returns the pointer to a string literal.

void *ReturnObject() {
  return (void *)"\xFA\x03\x44\x10\xE0";
}

When this library is loaded by the host application at runtime and this function is called, is the returned pointer pointing to some place in RAM or to the binary file on disk? Is RAM ever involved in this situation other than storing the pointer itself?

I ask because I plan on implementing this sort of thing in my project and the string will be several megabytes in size, and I'm worried that this strategy will in some way consume an equal amount of RAM.

If it does eat up RAM, would closing the library connection after the host program is done with this function release that memory?

That's heavily dependent on the compiler/linker, and how they manage the rodata segment. However, this is easily testable if you're working with the target system and not cross-compiling.

The pointer is invalid after the return. Exactly how it's implemented depends on the compiler. If you want to access it, you'll need to give it a scope that's permanent -- the most common way to do this is to make it a static or global variable.

In any case, yes, your string will always be in RAM. If you release the memory so other code can use it, then you can't guarantee that your pointer is pointing to anything useful. In the best of circumstances, using a pointer after release will cause your program to crash.

A more useful way to access a binary string like this without using up a large portion of available memory would be to access it from some other medium (such as a data file), using the file I/O functions. Access it a small chunk at a time, and you won't tie up 'several megabytes' holding the entire string.

The pointer is invalid after the return. Exactly how it's implemented depends on the compiler. If you want to access it, you'll need to give it a scope that's permanent -- the most common way to do this is to make it a static or global variable.

Isn't a string literal by definition a static value? Are you suggesting that every parallel instance of the call to ReturnObject() would load a unique copy of the string into RAM, thereby invalidating the pointer on return? If anything, I would think the compiler would be more efficient than that, since it's not a variable, and would offer only one copy to all instances of the function call.

I don't know what the behaviour would be, if it is even possible, to overwrite a literal string's data.

That's heavily dependent on the compiler/linker, and how they manage the rodata segment. However, this is easily testable if you're working with the target system and not cross-compiling.

I am using GCC. I could test it, of course, but given the circumstances at the moment it would be easier to just ask on the off chance I could get a quick, accurate answer.

Linux provides several IPC methods; shared memory and message queues come to mind.

Inter-process communication does in fact offer many useful systems for shared memory and such. Unfortunately, this question concerns the behaviour of one process and one library, not multiple processes and the mechanisms they use to share that library's resources.

Err ... no, I don't think it /can/ be static. After all, you are creating a new array each time the function is called and returning a pointer to it. The caller can change this array -- and should expect it to be the same on the next invocation.

GCC may well implement this in the way you want, but you can't rely on that to be the behavior used tomorrow, or with some other compiler.

Caveat: I haven't looked at the standard in some time. I *could* be wrong about the behavior of a string literal like that, but I know it's a great way to create bugs.

When this library is loaded by the host application at runtime and this function is called, is the returned pointer pointing to some place in RAM or to the binary file on disk?

Pointers do not point to places in RAM. They point into the address space of the process. Portions of the address space are mapped in and out of physical RAM as process execution goes on.

Is RAM ever involved in this situation other than storing the pointer itself?

Of course. The processor may only operate on data which are physically present in RAM.

If it does eat up RAM, would closing the library connection after the host program is done with this function release that memory?

Most likely, yes. As Narue said already, test it on a target system.

The keyword of the underlying subject is "address space" as nezachem mentioned.

Here is the bottom line of what I need to take away from these responses:

- The exact behavior is compiler specific, thus platform specific.
- Testing the target platform should be done to ensure that it behaves as I expect.
- I should not rely on that behavior for portability sake.

Ultimately, the question of what a string literal IS in C apparently is undefined by the C standard.

The confusion I had comes from my work writing firmware for Cypress CPUs. There, string literals are written to ROM and read into a CPU register one byte at a time. RAM is never involved. Then again, I write it in Assembly.

Multiple instances of the same string literal become even more interesting. Some compilers have an option (switch) that allows the compiler to combine all instances of the same string literal into just one instance. So if you have coded the string "Hello" multiple times, the compiler will generate just one instance of that string. If that switch is not set, the compiler will include multiple instances of the string in the program, thus increasing program size. You will have to read the compiler's docs to see whether such a switch is available, and if not, how the compiler handles it.

That sounds useful for programs which heavily use strings defined within preprocessor macros.

I tried this code just now to test a theory in GCC under linux:

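#include <stdio.h>
#include <string.h>

void func(int index) {
   char *buf = "Hello World";   /* buf points at the string literal itself */
   if (index >= 0 && (size_t)index < strlen(buf)) {
      buf[index] = 'A';          /* undefined behaviour: writes to a literal */
      printf("%s\n", buf);
   }
}
int main(int argc, char** argv) {
    func(1);
    func(2);
    func(3);
    return 0;
}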

I designed this experiment to test Dervish1's logic, that the compiler is quote "creating a new array each time the function is called". This implies the ability to edit such an array at runtime since there is no reason a compiler would disallow access to a generated instance of memory.

I expected the following result:

HAllo World
HeAlo World
HelAo World

Instead I got a SIGSEGV exception at func(1) on the line "buf[index] = 'A';". That to me means that there was some attempt to write to this memory but it was an invalid request, perhaps because it is not in fact creating a copy.

Not to say that another compiler wouldn't allow this, or perhaps works exactly as Dervish1 describes, but the outcome of this test on this compiler suggests a different mechanism.

Without knowing the detailed mechanics of how Linux executes a program, my best estimate is that when the binary is executed, the entire code is loaded into RAM so it can be run by the processor. Thus, when the program reads a byte of a string literal, it is reading it from RAM indirectly via the address space mechanism. Since the program is loaded into a protected memory space, the operating system rejects the program's request to write to it.

I think that's a pretty good guess. If anyone has any link to a reference explaining this mechanism better, I would love to read it.

char * str = "some string"; creates an immutable variable. char str[] = "some string"; creates a mutable one.

Thank you for your comment. You demonstrated a local variable initialization technique in which C creates and initializes a local "str" array with the value and length of a string literal. However, this topic concerns the mechanics behind the string literal "some string", not exactly how to make it local.

Consider that the "some string" string in your second example is still stored in the executable and goes through the exact same querying process as it would in the first example, except that the compiler is adding code to initialize that local variable with the value of the given string literal, same as...

char str[12]; 
memcpy(str,"some string",12);

I think that's a pretty good guess. If anyone has any link to a reference explaining this mechanism better, I would love to read it.

Here is a related thread

I was addressing your point to wit:

This implies the ability to edit such an array at runtime since there is no reason a compiler would disallow access to a generated instance of memory.

...unless you created an immutable string constant.

Thank you for your effort locating that information. It appears that explanation closely matches my guesstimate.

I was addressing your point to wit

Here's a paradox: Try to write an address inside a point. Now that's witty!
