0

I wrote strlen() for practice but it's really slow. Even if I copy the code from my compiler and run it, it's still slow. This is my test.

/*
  strlen_comp
  
  Comparing my and Visual C++'s strlen
    by Kimberly Hamrick
*/
#include <assert.h> // for result tests
#include <string.h> // for C++'s strlen()
#include <time.h>   // for timing stuff
#include <iostream> // for C++ I/O

using namespace std;

namespace hamrick {
  // partial namespace for prototypes
  size_t strlen( const char *str );
  size_t xstrlen( const char *str );
}

typedef size_t (*strlen_pfunc)( const char * );

void unit_test( const char *title, const char *data, strlen_pfunc func ) {
  // add the time for a bunch of runs
  clock_t start = clock();
  for ( int i = 0; i < 100000000; ++i )
    func( data );
  cout<< title <<" -- "<< ( double( clock() ) - start ) / CLOCKS_PER_SEC <<endl;
}

int main() {
  // test different cases
  assert( hamrick::strlen( "" ) == size_t( 0 ) );
  assert( hamrick::strlen( "A" ) == size_t( 1 ) );
  assert( hamrick::strlen( "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ) == size_t( 26 ) );
  assert( hamrick::strlen( 0 ) == size_t( 0 ) );

  // test performance
  unit_test( "strlen          ", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", strlen );
  unit_test( "hamrick::strlen ", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", hamrick::strlen );
  unit_test( "hamrick::xstrlen", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", hamrick::xstrlen );

  return 0;
}

namespace hamrick {
  // this is my version of strlen
  size_t strlen( const char *str ) {
    // treat a null pointer as an empty string
    size_t len = 0;

    if ( str != 0 ) {
      // find the end and count each character
      while ( *str != '\0' ) {
        ++len;
        ++str;
      }
    }

    return len;
  }

  // I copied this from the compiler source
  size_t xstrlen( const char *str ) {
    const char *eos = str;
    while( *eos++ ) ;
    return( eos - str - 1 );
  }
}

When I run it hamrick::strlen() and hamrick::xstrlen() are about 10 times slower than strlen()! Why?

strlen           -- 1.874
hamrick::strlen  -- 9.904
hamrick::xstrlen -- 9.405
Press any key to continue . . .
4
Contributors
10
Replies
11
Views
10 Years
Discussion Span
Last Post by Hamrick
0

That's cool, thanks! But that doesn't answer my question. This is what I stole for the xstrlen() test.

size_t __cdecl strlen (
        const char * str
        )
{
        const char *eos = str;

        while( *eos++ ) ;

        return( eos - str - 1 );
}

But when I call strlen(), it's 10 times faster than xstrlen() when the code for both is the same. Why is that?

0

Did you compile the debug code, or the release code?

Did you disable the stack checking?
- the diagnostic checks for stack overflow and/or stack corruption will dominate the time for small functions.

In anticipation of your next question (how do I disable...), which version of Visual C++ are you using?

Did you look at the actual assembler instructions generated by the compiler for xstrlen() ?

0

Did you compile the debug code, or the release code?

Debug.

In anticipation of your next question (how do I disable...), which version of Visual C++ are you using?

Visual C++ 2005 Express Edition.

Did you look at the actual assembler instructions generated by the compiler for xstrlen() ?

I tried that, but the assembly output doesn't show me the code generated for C++'s strlen(), just my strlen() and xstrlen(). I can't look at the assembler to see what C++'s strlen() is doing different.

0

> the assembly output doesn't show me the code generated for C++'s strlen(), just my strlen() and xstrlen().

the microsoft C++ compiler recognizes and directly inserts optimized inline code for several functions (including strlen). functions like strlen in the Standard C library are available in both intrinsic and non-intrinsic forms. we can control the use of intrinsics by inserting #pragma intrinsic statements into the source code or using the -Oi compiler switch.

0

the microsoft C++ compiler recognizes and directly inserts optimized inline code for several functions (including strlen). functions like strlen in the Standard C library are available in both intrinsic and non-intrinsic forms. we can control the use of intrinsics by inserting #pragma intrinsic statements into the source code or using the -Oi compiler switch.

I get the same result if intrinsic forms are enabled or disabled. :(

0

without intrinsics, the code generated for hamrick::strlen is:

;    COMDAT ?strlen@hamrick@@YAIPBD@Z
_TEXT    SEGMENT
_str$ = 8                        ; size = 4
?strlen@hamrick@@YAIPBD@Z PROC                ; hamrick::strlen, COMDAT

; 49   :     // treat a null pointer as an empty string
; 50   :     size_t len = 0;
; 51   : 
; 52   :     if ( str != 0 ) {

    mov    ecx, DWORD PTR _str$[esp-4]
    xor    eax, eax
    test    ecx, ecx
    je    SHORT $LN1@strlen

; 53   :       // find the end and count each character
; 54   :       while ( *str != '\0' ) {

    cmp    BYTE PTR [ecx], al
    je    SHORT $LN1@strlen
    npad    2
$LL2@strlen:

; 55   :         ++len;
; 56   :         ++str;

    add    ecx, 1
    add    eax, 1
    cmp    BYTE PTR [ecx], 0
    jne    SHORT $LL2@strlen
$LN1@strlen:

; 57   :       }
; 58   :     }
; 59   : 
; 60   :     return len;
; 61   :   }

    ret    0
?strlen@hamrick@@YAIPBD@Z ENDP                ; hamrick::strlen
_TEXT    ENDS

for hamrick::xstrlen, it is

PUBLIC    ?xstrlen@hamrick@@YAIPBD@Z            ; hamrick::xstrlen
; Function compile flags: /Ogtpy
;    COMDAT ?xstrlen@hamrick@@YAIPBD@Z
_TEXT    SEGMENT
_str$ = 8                        ; size = 4
?xstrlen@hamrick@@YAIPBD@Z PROC                ; hamrick::xstrlen, COMDAT

; 65   :     const char *eos = str;

    mov    ecx, DWORD PTR _str$[esp-4]
    mov    eax, ecx
$LL2@xstrlen:

; 66   :     while( *eos++ ) ;

    mov    dl, BYTE PTR [eax]
    add    eax, 1
    test    dl, dl
    jne    SHORT $LL2@xstrlen

; 67   :     return( eos - str - 1 );

    sub    eax, ecx
    sub    eax, 1

; 68   :   }

    ret    0
?xstrlen@hamrick@@YAIPBD@Z ENDP                ; hamrick::xstrlen
_TEXT    ENDS

for the crt strlen, this is the code picked up directly from the debugger:

78144550  mov         ecx,dword ptr [esp+4] 
78144554  test        ecx,3 
7814455A  je          78144580 
7814455C  mov         al,byte ptr [ecx] 
7814455E  add         ecx,1 
78144561  test        al,al 
78144563  je          781445B3 
78144565  test        ecx,3 
7814456B  jne         7814455C 
7814456D  add         eax,0 
78144572  lea         esp,[esp] 
78144579  lea         esp,[esp] 
;----------------------------------------------------------
78144580  mov         eax,dword ptr [ecx] 
78144582  mov         edx,7EFEFEFFh 
78144587  add         edx,eax 
78144589  xor         eax,0FFFFFFFFh 
7814458C  xor         eax,edx 
7814458E  add         ecx,4 
78144591  test        eax,81010100h 
78144596  je          78144580 
;-------------------------------------------------------------
78144598  mov         eax,dword ptr [ecx-4] 
7814459B  test        al,al 
7814459D  je          781445D1 
7814459F  test        ah,ah 
781445A1  je          781445C7 
781445A3  test        eax,0FF0000h 
781445A8  je          781445BD 
781445AA  test        eax,0FF000000h 
781445AF  je          781445B3 
781445B1  jmp         78144580 
781445B3  lea         eax,[ecx-1] 
781445B6  mov         ecx,dword ptr [esp+4] 
781445BA  sub         eax,ecx 
781445BC  ret

the loop that executes in this code is from instructions at address 78144580 to 78144596; this compares 4 bytes at a time. (hamrick::strlen,xstrlen compare one byte at a time.) the code before this loop is to get to a multiple of 4-bytes (for alignment) and the code after that is to take care of the trailing bytes. because the string we are using to test is a literal, it starts on a 4-byte multiple. 27 bytes need to be tested, but because of alignment in the SDA, a null byte of padding would be added at the end making it a clear multiple of 28 bytes.
because of this

char test_str[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ\0\0\0" ;
unit_test( "strlen          ", test_str+1, strlen );

would make the results closer
and

char test_str[] = "0ABCD\0\0\0" ;
unit_test( "strlen          ", test_str+1, strlen );

would make hamrick::strlen,xstrlen actually faster than CRT strlen (provided we disable intrinsics).

0

That makes sense. But even when I disable intrinsics, the result is the same. Is Visual C++ really using a different function than the one in the CRT source folder?

0

That makes sense. But even when I disable intrinsics, the result is the same. Is Visual C++ really using a different function than the one in the CRT source folder?

it appears so - the disassembly code i posted is with intrinsics disabled. perhaps there are different versions of sources for the debug version of the library and the release version. i do not have the CRT source for visual studio with me (i tested it on vc++ express); you could look into the CRT source tree for a directory containing assembly code.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.