I have come across some strange behavior.

#include <stdio.h>
#include <conio.h>



int strlen(const char *);

int main()
{
	char str[]="rahul";
	printf("%d",strlen(str));
	_getch();
	return 0;
}
int strlen(const char *str)
{
	const char *eostr=str;
	while(*eostr++);
		
	return eostr - str;
	
}

Output:
In VS9 Length:6 // correct
In gcc Length:5 // one iteration fewer... why is that??
In devcpp Length:6 // correct

If I edit the strlen function as

int strlen(const char *str)
{
	const char *eostr=str;
	while(*eostr)
		eostr++;
	return eostr - str;
	
}

Now output:

In VS9 Length:5 // correct
In gcc Length:5 // still 5, same number of iterations
In devcpp Length:5 // correct

Any idea please??

Cheers!!

When I create my own string length function I use

int mystrlen(char *s)
{
	int i = 0;

	while (*s++)
		++i;
	return i;
}

Try this function

Also by gcc do you mean Linux?

So the question you are asking is in regard to the output of the code on top when using gcc?

#include <stdio.h>

int strlenA(const char *str)
{
   const char *eostr = str;
   while ( *eostr++ );
   return eostr - str;
}

int strlenB(const char *str)
{
   const char *eostr = str;
   while ( *eostr )
   {
      eostr++;
   }
   return eostr - str;
}

int main()
{
   char str[] = "rahul";
   printf("strlenA(\"%s\") = %d\n", str, strlenA(str));
   printf("strlenB(\"%s\") = %d\n", str, strlenB(str));
   return 0;
}

/* my output
strlenA("rahul") = 6
strlenB("rahul") = 5
*/

gcc --version
gcc (GCC) 3.4.5 (mingw-vista special)
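
The 6 versus 5 in the output above comes from the post-increment in strlenA: the increment still happens on the iteration that reads the terminating '\0', so the pointer ends up one past the terminator. A minimal sketch of the two loop shapes side by side, using hypothetical names (count_postinc, count_checked) that do not appear in the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: same loop shape as strlenA above.
   The pointer is incremented even on the iteration that reads '\0',
   so it ends one past the terminator and the result includes it. */
int count_postinc(const char *str)
{
    const char *end = str;
    assert(str != NULL);
    while (*end++)
        ;
    return (int)(end - str);
}

/* Hypothetical name: same loop shape as strlenB above.
   The pointer stops on the terminator itself. */
int count_checked(const char *str)
{
    const char *end = str;
    assert(str != NULL);
    while (*end)
        end++;
    return (int)(end - str);
}
```

For "rahul" the first function gives 6 and the second gives 5, so subtracting 1 from the first form yields the usual string length.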

Your function above will work fine, but it is not as optimized as mine.
The reason is the extra overhead of instructions, i.e. for every iteration:
a. Fetch the value of i from memory and put it in a register.
b. Add one to it.
c. Put the result back in memory.
My function avoids these few extra cycles.

Yes, by gcc I mean Linux.

Cheers!!

Maybe. Maybe not. Optimizing compilers can, well, optimize. Sometimes trying to do it yourself you can unoptimize too. It depends. I would verify your expectations by at least examining the generated assembly.
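
One way to act on that advice, sketched under the assumption of a GCC toolchain: put both loop styles in one file and compare what the compiler emits. The file name len.c and the function names are hypothetical; len_counter mirrors the counter-based mystrlen and len_pointer mirrors the pointer-difference version.

```c
/* len.c (hypothetical file name).
   With GCC, the generated assembly can be inspected with, for example:
       gcc -O2 -S len.c        (writes len.s)
   and the loop bodies of the two functions compared by eye. */
#include <assert.h>
#include <stddef.h>

/* Counter-based loop, as in mystrlen. */
size_t len_counter(const char *s)
{
    size_t i = 0;
    while (*s++)
        ++i;
    return i;
}

/* Pointer-difference loop, as in the original poster's second version. */
size_t len_pointer(const char *s)
{
    const char *end = s;
    while (*end)
        end++;
    return (size_t)(end - s);
}

/* Sanity check that both styles agree before comparing their speed. */
void check_agree(const char *s)
{
    assert(len_counter(s) == len_pointer(s));
}
```

Whether one loop is actually cheaper than the other depends on the compiler version and optimization level, so the assembly is worth re-checking for each setup.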

I don't use a Windows machine, but doesn't Windows use a different newline combination? What I mean is: Linux uses '\n\0' for a newline at the end of a string; what does Windows use?

Hey Dave, I commented after examining the generated assembly code (disassembled, actually). I tested with both VS 9 and gcc.
A compiler can optimize instructions but cannot avoid them. I think this is where the difference lies between a good compiler and bad programming!
Please correct me if I am wrong, and please post your assembly analysis.

Windows uses the CR+LF combination for a new line.
Does Linux use the '\n\0' combination or just '\n' for a new line?
I don't think this has anything to do with the OS, because the program is passing a zero-terminated string. Or maybe it does have to do with the OS; I am not sure at this point.

Cheers!!

I don't know if you had previously mentioned examining the generated code, but now I'll keep in mind that you are in that rare 1% that actually does. I'm more familiar with the other 99% that make similar claims without looking.

I was misreading some prior code as

int qux(const char *s)
{
   int i = 0;
   while ( s[i] )
   {
      ++i;
   }
   return i;
}

How might that compare? For me, practically the same as your pointer version: no difference in the loop, but it avoids the subtraction.

So was your original question regarding the output of a particular version of gcc?

I don't know if you had previously mentioned examining the generated code, but now I'll keep in mind that you are in that rare 1% that actually does. I'm more familiar with the other 99% that make similar claims without looking.

Thanks! I will take that as a compliment.

There are three ways of doing this, so I thought I would analyze all three.

int strlen(const char *str)
{
	const char *eostr=str;
	while(*eostr++);
	
	return eostr - str -1;	
	
}
Only the while loop (masm):
X:
mov    eax,[eostr]
mov    edx,eax                 //temp register for comparison
lea    eax,[eostr]
inc    dword ptr[eax]
cmp    byte ptr[edx],0
je     xyz
jmp    X
xyz: subtraction code          //extra overhead

We can remove this subtraction by using your code, which is nearly identical apart from a few minor catches.

int qux(const char *s)
{
   int i = 0;
   while ( s[i] )
   {
      ++i;
   }
   return i;
}
Only while:
X:
mov eax,[s]
add eax,[i]
cmp byte ptr[eax],0
je xyz
lea eax,[i]
inc dword ptr[eax]
jmp X 

xyz: no subtraction overhead here; better than the code above, but ...

Can we have better code than both?
Let's try this one!

int strlen(const char *str)
{
	const char *eostr=str;
	while(*eostr)
		eostr++;
	
	return eostr - str;	
	
}


X:mov eax,[eostr]
   cmp byte ptr[eax],0       //no temp register like the first code 
   je xyz
   lea eax,[eostr]
   inc dword ptr[eax]
   jmp X

xyz: subtraction here

I think the last one is better than both because it uses fewer instructions within the loop; it does have a subtraction, but that is outside the loop.
Your code has an extra add instruction.

I assumed all arguments and local variables are within 1 byte of offset.
Compilers differ in how they handle these things, so I tried to write generic code.
Please post your opinion if this could be done in a better way.

I have found the solution to my problem and will post it afterwards.

Cheers!!

What I was looking at was (in part):

int bar(const char *str)
{
	const char *eostr=str;
	while(*eostr)
   {
		eostr++;
   }
	return eostr - str;

}

int qux(const char *s)
{
   int i = 0;
   while ( s[i] )
   {
      ++i;
   }
   return i;
}

gcc -O3 -S main.c

And rearranging the generated assembly side-by-side:

_bar:                     _qux:
   pushl   %ebp              pushl   %ebp
   movl    %esp, %ebp        xorl    %eax, %eax
   movl    8(%ebp), %edx     movl    %esp, %ebp
   cmpb    $0, (%edx)        movl    8(%ebp), %edx
   movl    %edx, %eax        cmpb    $0, (%edx)
   jmp     L15               jmp     L36
L16:                      L37:
   incl    %eax              incl    %eax
   cmpb    $0, (%eax)        cmpb    $0, (%eax,%edx)
L15:                      L36:
   jne     L16               jne     L37
   popl    %ebp              popl    %ebp
   subl    %edx, %eax        ret
   ret

#include <stdio.h>
int strlen(const char*s);
int main()
{
   strlen("duggy");
   return 1;
}

int strlen(const char *s)
{
   // My  function
}

This works perfectly well in VS, which calls my strlen.

But with gcc, my strlen function is not called; instead, the standard library function is called by default.
Why is that?
When I look at the symbol table, strlen is resolved statically.
To be more specific, I am pasting the symbol table generated by the gcc compiler:

main.o:     file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*	00000000 main.c
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
00000000 l    d  .rodata	00000000 .rodata
00000000 l    d  .note.GNU-stack	00000000 .note.GNU-stack
00000000 l    d  .comment	00000000 .comment
00000000 g     F .text	0000006e main
00000000         *UND*	00000000 printf
0000006e g     F .text	0000002f strlen

The strlen above (which is my strlen) is bound to a known offset, so it is bound statically; there is no sign of an imported strlen.
Then how is the standard library strlen being called?

Any ideas?

Cheers!!

I'll sidestep the why for a moment and mention that strlen is a reserved identifier and simply should not be used for your own function name.

C Reserved Identifiers

Context
The standard lists (7.1.3/4.1.2.1) the categories of reserved identifiers, where "reserved" means, essentially, prohibited for use by the individual C programmer. Recall that "if a program declares or defines an identifier with the same name as an identifier reserved in that context .., the behavior is undefined" (emphasis added). The "context" combines the concepts of scope (6.1.2.1/3.1.2.1), linkage (6.1.2.2/3.1.2.2), and name space (6.1.2.3/3.1.2.3).

identifier       context    header     ISO        ANSI
strlen             ext     string.h   7.11.6.3   4.11.6.3

ext: always reserved for use with external linkage (7.1.3/4.1.2.1). This means that, even if you don't include the indicated header file, you still must never create global variables or non-static functions with these names. Strictly speaking, you can create local variables, static functions, typedefs, or macros with those names (depending on which headers you include, and other circumstances, you may need to #undef them first), but it's a bad idea because even if the compiler gets it right you'll confuse any humans who look at your program.
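
In practice, the simplest way to stay clear of the reserved name is to pick a different one. A minimal sketch, where my_strlen is a hypothetical name chosen only for illustration; since that identifier is not reserved, the program's behavior is determined by this definition alone:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement name: not reserved by the C standard,
   so it cannot collide with the library's strlen. */
size_t my_strlen(const char *s)
{
    const char *end = s;
    assert(s != NULL);
    while (*end)
        end++;
    return (size_t)(end - s);
}
```

Calling my_strlen("duggy") then yields 5 regardless of how the toolchain treats the standard strlen.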

I notice your symbol table is for main.o, so has it been linked with the libraries yet? Or is that your point: strlen is resolved here, so why is it being linked to the external library at the linking stage?

Yes, exactly that!

@dave: Really nice information, thanks.
According to that, if I use strlen in my program, gcc is free to link it in its own way, ignoring my function definition.
The standard does not allow it. VS is better in that case: it either allows it cleanly or stops you at compile time, but gcc forces you to follow the standard.
I have not had enough time to dig into how the linker behaves after the violation. I have already given my object file's symbol table; below is the executable's symbol table.

maintest:     file format elf32-i386

SYMBOL TABLE:
08048154 l    d  .interp	00000000              .interp
08048168 l    d  .note.ABI-tag	00000000              .note.ABI-tag
08048188 l    d  .hash	00000000              .hash
080481b4 l    d  .gnu.hash	00000000              .gnu.hash
080481d8 l    d  .dynsym	00000000              .dynsym
08048238 l    d  .dynstr	00000000              .dynstr
0804828c l    d  .gnu.version	00000000              .gnu.version
08048298 l    d  .gnu.version_r	00000000              .gnu.version_r
080482b8 l    d  .rel.dyn	00000000              .rel.dyn
080482c0 l    d  .rel.plt	00000000              .rel.plt
080482d8 l    d  .init	00000000              .init
080482f0 l    d  .plt	00000000              .plt
08048330 l    d  .text	00000000              .text
08048524 l    d  .fini	00000000              .fini
08048540 l    d  .rodata	00000000              .rodata
08048560 l    d  .eh_frame	00000000              .eh_frame
08049f0c l    d  .ctors	00000000              .ctors
08049f14 l    d  .dtors	00000000              .dtors
08049f1c l    d  .jcr	00000000              .jcr
08049f20 l    d  .dynamic	00000000              .dynamic
08049ff0 l    d  .got	00000000              .got
08049ff4 l    d  .got.plt	00000000              .got.plt
0804a00c l    d  .data	00000000              .data
0804a018 l    d  .bss	00000000              .bss
00000000 l    d  .comment	00000000              .comment
00000000 l    df *ABS*	00000000              main.c
08049ff4 l     O .got.plt	00000000              .hidden _GLOBAL_OFFSET_TABLE_
08049f0c l       .ctors	00000000              .hidden __init_array_end
08049f0c l       .ctors	00000000              .hidden __init_array_start
08049f20 l     O .dynamic	00000000              .hidden _DYNAMIC
0804a00c  w      .data	00000000              data_start
08048470 g     F .text	00000005              __libc_csu_fini
08048330 g     F .text	00000000              _start
00000000  w      *UND*	00000000              __gmon_start__
00000000  w      *UND*	00000000              _Jv_RegisterClasses
08048540 g     O .rodata	00000004              _fp_hw
08048524 g     F .fini	00000000              _fini
00000000       F *UND*	000001b9              __libc_start_main@@GLIBC_2.0
08048442 g     F .text	0000002c              strlen
08048544 g     O .rodata	00000004              _IO_stdin_used
0804a00c g       .data	00000000              __data_start
0804a010 g     O .data	00000000              .hidden __dso_handle
08048480 g     F .text	00000067              __libc_csu_init
00000000       F *UND*	00000039              printf@@GLIBC_2.0
0804a018 g       *ABS*	00000000              __bss_start
0804a01c g       *ABS*	00000000              _end
0804a018 g       *ABS*	00000000              _edata
080484e7 g     F .text	00000000              .hidden __i686.get_pc_thunk.bx
080483d4 g     F .text	0000006e              main
080482d8 g     F .init	00000000              _init

It looks to me like strlen is linked statically, while the standard library functions are resolved dynamically.
Can anyone make something of this?
Or where is the right place to ask questions like this?

Cheers!!

You should try an object dump of the code

objdump -D filename>filedest

Also I tried this on my Gcc compiler which is version 4.3.2 and it behaves the same

Found something interesting

#include <stdio.h>

size_t strlen(const char *s);

int ans = 0;

int main(int argc, char**argv)
{
	return ans = strlen("gerard");
}

size_t strlen(const char *s)
{
	int i = 0;
	
	while (*s++)
		++i;
	return i + 10;
}

000000000040048c <main>:
  40048c:	55                   	push   %rbp
  40048d:	48 89 e5             	mov    %rsp,%rbp
  400490:	89 7d fc             	mov    %edi,-0x4(%rbp)
  400493:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
  400497:	c7 05 0f 04 20 00 06 	movl   $0x6,0x20040f(%rip)        # 6008b0 <ans>
  40049e:	00 00 00 
  4004a1:	8b 05 09 04 20 00    	mov    0x200409(%rip),%eax        # 6008b0 <ans>
  4004a7:	c9                   	leaveq 
  4004a8:	c3                   	retq

Found something interesting in the object dump: the compiler is calculating the strlen and placing the value directly into my global variable ans.

The line

400497: c7 05 0f 04 20 00 06 movl $0x6,0x20040f(%rip) # 6008b0 <ans>

is proof that there is no function call at all; a compiler-computed value is simply stored into ans. Can you verify this on your end?

Oh yes, I did, but I need some time to sort it out because gas syntax is alien to me; I am more familiar with masm.
Meanwhile, if you can make something of it, please do post.
Cheers!!

Did you look at the previous post?

This will call the function with a little trickery

#include <stdio.h>

typedef size_t(*pfunc)(const char*);

size_t strlen(const char *s);

int ans = 0;

int main(int argc, char**argv)
{
	pfunc tfunc = (pfunc)strlen;
	return ans = tfunc("gerard");
}

size_t strlen(const char *s)
{
	int i = 0;
	
	while (*s++)
		++i;
	return i + 10;
}

The exe now returns 16 to the operating system, bypassing the compiler-generated value.

Yes the trick works.

is proof that there is no function call at all just a compiler computed value is dumped into ans....Can you verify this on your end

Yes, it behaves the same on my side.
This is because strlen is getting linked statically; not only that, the code is also inlined by the linker because the function is so small.
The compiler is not doing the calculation; the linker is.
But I am still not sure. I think glibc is the standard C library on Linux, and all the C standard library functions are in it. So it is supposed to be linked dynamically, which is gcc's default behavior.
For example, it always links printf dynamically.
In that case, we can assume it is the linker's decision whether a function is linked statically and inlined, or linked dynamically.
If the linker is to inline something, that function has to be statically linked (here, strlen), but code cannot be patched (or inlined) at run time, i.e. by a loader. A loader can only do address patching.
At this point this is only my assumption.
I might be wrong.

Cheers!!

My understanding is that linkers handle address resolution, while compilers generate efficient code by optimizing where possible, e.g. reducing redundant or constant calculations to a fixed value.
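
As a rough illustration of that point: the length of a string literal is already known at compile time, which is why a compiler is able to replace a length computation on a literal with a plain constant. The sketch below uses hypothetical names (LIT_LEN, runtime_len) and puts the compile-time value next to a loop that has to run when the contents are only known at run time.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof on a string literal counts the terminating '\0',
   so subtracting 1 gives the length as a compile-time constant. */
enum { LIT_LEN = sizeof "gerard" - 1 };

/* Hypothetical helper: walks the string at run time, which is the only
   option when the contents are not known to the compiler. */
size_t runtime_len(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0')
        ++n;
    return n;
}
```

LIT_LEN here is 6, the same value that appears as the immediate operand in the movl instruction of the object dump.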

Here's some code that can't optimize away the function call

#include <stdio.h>

size_t strlen(const char *s);

int ans = 0;

int main(int argc, char**argv)
{
	char ch[20];
	fputs("enter a string->", stdout);
	fgets(ch, 19, stdin);
	return ans = strlen(ch);
}

size_t strlen(const char *s)
{
	int i = 0;
	
	while (*s++)
		++i;
	return i + 10;
}

All I have to say is: why does this one work as intended if it's the linker or some other mysterious thing at work? Shouldn't it return the normal strlen result instead of the one I defined here? It doesn't; the one I defined works fine in this situation because the compiler can't optimize the function call away.

Tell me, how can a compiler calculate the strlen if it does not know what the string is? You are inputting the string at run time, i.e. out of the compiler's reach. To optimize the code it needs a string constant, something known at compile time.

Cheers!!

This I don't know for certain, but I do know that the object dumps support the compiler calculating the value. Is this so hard to conceive? We use strlen all the time; maybe they (gcc) decided to allow optimization of this function since it's used frequently and its value is just the length of a string. So when we substitute our own strlen function (one that doesn't return a standard string length) we can get inconsistent values. I know this sounds pretty far out, but everything I've seen points to the compiler optimizing the function call away.

This I don't know for certain, but I do know that the object dumps support the compiler calculating the value. Is this so hard to conceive?

Yes it is, if something depends on run-time user input.
That is why there are things like dynamic variables, dynamic binding, and dynamic linking, where the compiler can only make assumptions.

We use strlen all the time; maybe they (gcc) decided to allow optimization of this function since it's used frequently and its value is just the length of a string. So when we substitute our own strlen function (one that doesn't return a standard string length) we can get inconsistent values. I know this sounds pretty far out, but everything I've seen points to the compiler optimizing the function call away.

Yes, exactly, I think so!
But maybe there are other criteria in the decision too, such as how complex the function is. strlen is a simple function, so the compiler can deal with it.
But we also use printf frequently. Try that with functions like printf or strncat, which are complex: no optimization in those cases.

Cheers!!
