strcmp implementation from scratch

tux4life 0 Tallied Votes 673 Views Share

This is how mine strcmp function would look like if I'd to implement it from scratch :)

/**
@author: Mathias Van Malderen (tux4life)
*/

int strcmp(const char *s1, const char *s2)
{
    while((*s1 && *s2) && (*s1++ == *s2++));
    return *(--s1) - *(--s2);
}
tux4life 2,072 Postaholic

Yet to mention: I didn't include a NULL-pointer check :D

agamiro 0 Newbie Poster

your solution doesn't deal with the case that one string is the prefix of the other.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

>>return *(--s1) - *(--s2);
That may give you the wrong result. It should be return *s1 - *s2; There are four possible return values

  1. If *s1 is '\0' and *s2 is not, then the return value will be some negative value.
  2. If *s1 is NOT '\0' but s2 is, then the return value will be some positive value
  3. If both *s1 and *s2 are '\0' then the return result will be 0.
  4. If neither *s1 nor *s2 is '\0' then the return value could be either positive or negative, depending on their values.

In either event, decrementing s1 and s2 before the subtraction may (and most probably will) produce the wrong result because s1 and s2 will not be pointer to the characters that cause the previous loop to terminate.

tux4life 2,072 Postaholic

@Ancient Dragon:
Thanks for pointing out this issue, I didn't notice it when I was writing the code. However, the solution you suggest (unfortunately) won't fix the problem, since it will create another one.
If I replace return *(--s1) - *(--s2); by return *s1 - *s2; ,
then strcmp("a", "b") would yield the result 0, and not -1, because both s1 and s2 are incremented by one in the while's expression, s1 and s2 will point to the
nul-terminators by the time the return statement is encountered, and thus return *s1 - *s2; will result in returning 0.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

correction

int mstrcmp(const char *s1, const char *s2)
{
    while((*s1 && *s2) && (*s1 == *s2))
        s1++,s2++;
    return *s1 - *s2;
}
tux4life commented: Yup, now I can fully agree on you :) +8
jephthah commented: i'm glad tux agrees with you. :P +6
Dave Sinkula 2,398 long time no c Team Colleague

Here's a better idea for the return value: return an explicit value instead of allowing the possibility of signed integer overflow?

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

@Dave: how can *s1 - *s2 cause an integer overflow? The maximum values would be -255 (0 - 255). That's hardly even close to an integer overflow value.

Dave Sinkula 2,398 long time no c Team Colleague

@AD: I'll leave that as an exercise for the programmer.

You don't really like portable programming or the design process involved, do you? If it happens to run on a particular compiler at a given moment in time, it appears "correct" to you?

Do you remember that qsort comparison function with the integer overflow issue?

No. I don't wantto continue. You are the lord of this roost. I am exhausted trying to get people to write portable code -- especially since you always come along and tell people to write implementation-specific stuff. Again, this is why I think of you as Daniweb's Herb Schildt. But he is a good author who has influenced many C and C++ programmers. It just sucks that it took him like 10 years to clean up his code. I think I need a break from all of this stupid bickering. Please take some time to wander into c.l.c and learn some new things.

tux4life 2,072 Postaholic

@Dave: I didn't quite get you when you said:

Here's a better idea for the return value: return an explicit value instead of allowing the possibility of signed integer overflow?

Also, you seem to do exactly the same as I and AD did in your code snippet:
http://www.daniweb.com/code/snippet216567.html .
So, could you please clarify what you meant with returning an explicit value to avoid the possibility of a signed integer overflow ?

[EDIT]
@Dave:
Do you perhaps mean: return (int) *s1 - (int) *s2; ?
[/EDIT]

Dave Sinkula 2,398 long time no c Team Colleague

Also, you seem to do exactly the same as I and AD did in your code snippet:
http://www.daniweb.com/code/snippet216567.html .

Ah. From back in the day when I could have possibly edited it. :( Dozens of times I've wished I could change that naive implementation; I've learned things in the past five years. But hey -- I no longer have any interest; you appear to. Enjoy solving the problem.

Your explicit casts do the usual arithmetic conversions? So that makes the implicit cast explicit. Do you see the problem then? What's (INT_MIN - 1) - 4, for example? I guess it's a nevermind; who cares; it works for me; code by happenstance(?).

[edit]Interesting:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp,
and strncmp is determined by the sign of the difference between the values of the first
pair of characters (both interpreted as unsigned char) that differ in the objects being
compared.

tux4life 2,072 Postaholic
Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

@Dave: what are you rambling on about??? What does INT_MAX have to do with subtracting two character values?

>>You are the lord of this roost. I
Not because I am any smarter than anyone else -- I just liked to post and answer questions a lot. I never said I knew everything -- indeed I've screwed up lots of times.

>>especially since you always come along and tell people to write implementation-specific stuff

Not knowingly. If *s1 - *s2 is implementation specific then please tell us why instead of pussyfooting around it. If there is a case where that does not work, then please tell me because I'm all ears :)

Dave Sinkula 2,398 long time no c Team Colleague

Haven't I done this already?

Signed integer overflow is undefined behavior.
C'mon AD -- "it works ok for me?" :cheesy:

Here's a better idea for the return value: return an explicit value instead of allowing the possibility of signed integer overflow?

What does INT_MAX have to do with subtracting two character values?

3.3.6 Additive operators

Semantics

If both operands have arithmetic type, the usual arithmetic conversions are performed on them.

3.2.1 Arithmetic operands
3.2.1.1 Characters and integers

A char, a short int, or an int bit-field, or their signed or unsigned varieties, or an object that has enumeration type, may be used in an expression wherever an int or unsigned int may be used. If an int can represent all values of the original type, the value is converted to an int; otherwise it is converted to an unsigned int. These are called the integral promotions.

The integral promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.

So let's see. We have a subtraction, which is done on two operands that are converted to int. The int may or may not be signed. If it is signed, then sign-extension could be performed. A byte is not necessarily 8 bits.

You know -- I have no idea why I am writing this. I've already mentioned the issue several times in previous posts. Why do I waste my time?

Aia commented: You are not wasting your time. Those that really want to know are listening. +8
tux4life commented: The more you post, the better I understand what you actually mean :) +8
Ancient Dragon commented: If you think you are wasting your time then why not just post a link to a paper that explains what you mean? You just seem to be pissed off at me for some unknown reason. -5
ibykow 0 Newbie Poster

What about:

int stringcmp(char *s, char *t) {
  for(;*s++ == *t++; *s++)
    if(!*--s) 
      return 0;

  return *--s - *--t;
} 
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.