Hi guys, I have to read a text file and produce statistics on how often each alphabetic character occurs.
My approach is to read it char by char until EOF, using an int array to hold the number of occurrences of each character read so far.
Here's my solution:

int notalpha=0;
int nchar[26];
for (...) nchar[i] = 0; //set each count to 0

while (c ...){
    if(toupper(c) == 'A') nchar[0]++;
    else if (toupper(c) == 'B') nchar[1]++;
    else if ....
    else if (toupper(c) == 'Z') nchar[25]++;
    else notalpha++;
}

printf("A = %d, B = %d, ...", nchar[0], nchar[1] ...);

I'm sure it can be improved a lot, since it's very bad style, but I don't see the light..
Is it possible to avoid the 26 rows if-else, and the horrible printf?

>Is it possible to avoid the 26 rows if-else, and the horrible printf?
Yes, if you're willing to generalize the solution and use up more storage to do it:

#include <ctype.h>
#include <limits.h>
#include <stdio.h>

int main ( void )
{
  int lookup[UCHAR_MAX + 1] = {0};
  int ch;
  int i;

  while ( ( ch = getchar() ) != EOF )
    ++lookup[ch];

  for ( i = 0; i <= UCHAR_MAX; i++ ) {
    if ( isgraph ( i ) )
      printf ( "%c = %d\n", (char)i, lookup[i] );
    else
      printf ( "%#x = %d\n", (unsigned)i, lookup[i] );
  }

  return 0;
}

Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on, subtracting 'A' from your letter will give you 0, 1, 2, and so on. Therefore:

int idx;
int notalpha=0;
int nchar[26];
for (...) nchar[i] = 0; //set each count to 0

while (c ...)
{
    if (isalpha(c))               // is it an alphabetic character?
    {
        idx = toupper(c) - 'A';   // Subtract A to get an offset from 0 
        nchar[idx]++;             // use that as an index into nchar
    }
    else
    {
        notalpha++;
    }
}

printf("A = %d, B = %d, ...", nchar[0], nchar[1] ...);
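
For what it's worth, the same offset trick can also get rid of the long printf by turning it into a loop. This is only a minimal sketch: the names tally and print_counts are made up for illustration, and it leans on the same assumption as the code above, namely that 'A' through 'Z' have consecutive values (true for ASCII, not guaranteed by the standard).

```c
#include <ctype.h>
#include <stdio.h>

#define NLETTERS 26

/* Count one character: letters go into nchar[0..25], everything else
   bumps *notalpha. Assumes 'A'..'Z' are contiguous, as in ASCII. */
void tally(int c, int nchar[NLETTERS], int *notalpha)
{
    if (isalpha(c)) {
        int idx = toupper(c) - 'A';       /* offset from 'A' */
        if (idx >= 0 && idx < NLETTERS)   /* guard against locale-specific letters */
            nchar[idx]++;
        else
            (*notalpha)++;
    } else {
        (*notalpha)++;
    }
}

/* Print "A = n" through "Z = n", one per line, instead of one huge printf. */
void print_counts(const int nchar[NLETTERS])
{
    int i;
    for (i = 0; i < NLETTERS; i++)
        printf("%c = %d\n", 'A' + i, nchar[i]);
}
```

Call tally(c, nchar, &notalpha) for each character read until EOF (with nchar zeroed first), then print_counts(nchar) at the end.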

>Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on,
>subtracting 'A' from your letter will give you 0, 1, 2, and so on.
This is a non-portable assumption.

Are you referring to Extended Binary Coded Decimal Interchange Code (EBCDIC)?

>Are you referring to Extended Binary Coded Decimal Interchange Code (EBCDIC)?
I'm referring to the standard leaving values of the character set implementation-defined. But EBCDIC does fit into the category of character sets where this kind of assumption would blow up on you.
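
One way to sidestep the assumption, sketched here rather than offered as the definitive fix, is to look the letter up in a string of the alphabet instead of subtracting 'A'. The function name letter_index is hypothetical. It only relies on strchr and toupper from the standard library, so it does not care whether the letters are contiguous in the execution character set.

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Return 0..25 for a letter of the basic Latin alphabet (either case),
   or -1 for anything else. Makes no assumption about character values. */
int letter_index(int c)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;
    int up;

    if (c < 0 || c > UCHAR_MAX)   /* EOF or not a valid unsigned char value */
        return -1;

    up = toupper(c);
    if (up == '\0')               /* strchr would match the terminator */
        return -1;

    p = strchr(alphabet, up);
    return p ? (int)(p - alphabet) : -1;
}
```

With this, the counting loop becomes: if letter_index(c) is not -1, increment nchar at that index, otherwise increment notalpha. The price is a linear search per character, which is the trade-off against the subtraction.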

>Are you referring to Extended Binary Coded Decimal Interchange Code (EBCDIC)?

Pretty much. But portability is portability.

I remember thinking it was really cool to do stuff like x - 32 instead of toupper , or somesuch. Until I found out it was both more cryptic and slower.
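
To illustrate the "more cryptic" part, here is a rough comparison. It assumes ASCII, and naive_upper is a made-up name for the x - 32 trick. The trick only works when x really is a lowercase letter, while toupper leaves everything else alone.

```c
#include <ctype.h>

/* The "clever" version: only correct for 'a'..'z' in ASCII. */
int naive_upper(int x)
{
    return x - 32;
}

/* The library version: non-lowercase input comes back unchanged. */
int safe_upper(int x)
{
    return toupper((unsigned char)x);
}
```

On an ASCII system naive_upper('a') gives 'A', but naive_upper('A') gives '!' (65 - 32 = 33), whereas safe_upper('A') is still 'A'.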

>Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on,
>subtracting 'A' from your letter will give you 0, 1, 2, and so on.
>This is a non-portable assumption.

True. I forgot that his school may be learning on an IBM mainframe, rather than the universally accessible school and home computers that run ASCII.

That's it: 20/20 hindsight, and 20/800 foresight.

An Ariane 5 can crash in the back yard, and we'll still program it the same.

>I forgot that his school may be learning on an IBM mainframe, rather than the universally
>accessible school and home computers that run ASCII.
That's the very attitude that caused a mad rush to convert two digit years into four digit years at the turn of the millennium.

> That's the very attitude that caused a mad rush to convert two digit years into four digit years
Whilst memory nowadays seems to be cheap enough to be given away in cereal packets, it was not always so. Inflation and Moore's law have seen to that.

Back in the days of yore, memory was $1 per byte (when $1 was worth a lot to Joe Average). At that level, you needed a damn good explanation for every byte used.

Later on of course, this wasn't an excuse, but then the need to maintain "bug compatibility" with older systems was the reason.
