Char occurrence issue

Question

BigFormat 0 Light Poster

17 Years Ago

Hi guys, I have to read a text file and produce statistics on how often each alphabetic character occurs.
I'm approaching this by reading char by char until EOF and using an int array to hold the number of occurrences of each character read so far.
So here's my solution:

int notalpha=0;
int nchar[26];
for (...) nchar[i] = 0; //set each counter to 0

while (c ...){
    if(toupper(c) == 'A') nchar[0]++;
    else if (toupper(c) == 'B') nchar[1]++;
    else if ....
    else if (toupper(c) == 'Z') nchar[25]++;
    else notalpha++;
}

printf("A = %d, B = %d, ...", nchar[0], nchar[1] ...);

I'm sure it can be improved a lot, since it's very bad style, but I don't see how.
Is it possible to avoid the 26 rows if-else, and the horrible printf?

Narue 5,707 Bad Cop

17 Years Ago

>Is it possible to avoid the 26 rows if-else, and the horrible printf?
Yes, if you're willing to generalize the solution and use up more storage to do it:

#include <ctype.h>
#include <limits.h>
#include <stdio.h>

int main ( void )
{
  int lookup[UCHAR_MAX + 1] = {0};
  int ch;
  int i;

  while ( ( ch = getchar() ) != EOF )
    ++lookup[ch];

  for ( i = 0; i <= UCHAR_MAX; i++ ) {
    if ( isgraph ( i ) )
      printf ( "%c = %d\n", (char)i, lookup[i] );
    else
      printf ( "%#x = %d\n", (unsigned)i, lookup[i] );
  }

  return 0;
}

WaltP 2,905 Posting Sage w/ dash of thyme

17 Years Ago

Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on, subtracting 'A' from your letter will give you 0, 1, 2, and so on. Therefore:

int idx;
int notalpha=0;
int nchar[26];
for (...) nchar[i] = 0; //set each counter to 0

while (c ...)
{
    if (isalpha(c))               // is it an alphabetic character?
    {
        idx = toupper(c) - 'A';   // Subtract A to get an offset from 0 
        nchar[idx]++;             // use that as an index into nchar
    }
    else
    {
        notalpha++;
    }
}

printf("A = %d, B = %d, ...", nchar[0], nchar[1] ...);
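For reference, here is one way the outline above might be filled in, with the 26-argument printf replaced by a loop. This is only a sketch: the function names (tally, count_stream, print_counts) and the NLETTERS macro are made up for illustration, and it keeps the assumption that 'A' through 'Z' have consecutive values. It checks the range explicitly instead of calling isalpha, because isalpha may also accept locale-specific letters that would index outside the array.

```c
#include <ctype.h>
#include <stdio.h>

#define NLETTERS 26

/* Count one character: letters go into nchar[0..25], the rest into *notalpha.
   ASSUMPTION: 'A'..'Z' are contiguous, as in ASCII. */
void tally(int c, long nchar[NLETTERS], long *notalpha)
{
    int up = toupper(c);

    if (up >= 'A' && up <= 'Z')
        nchar[up - 'A']++;
    else
        (*notalpha)++;
}

/* Read `in` to EOF, tallying every character. */
void count_stream(FILE *in, long nchar[NLETTERS], long *notalpha)
{
    int c;

    while ((c = getc(in)) != EOF)
        tally(c, nchar, notalpha);
}

/* One loop replaces the 26-argument printf. */
void print_counts(FILE *out, const long nchar[NLETTERS], long notalpha)
{
    int i;

    for (i = 0; i < NLETTERS; i++)
        fprintf(out, "%c = %ld\n", 'A' + i, nchar[i]);
    fprintf(out, "not alphabetic = %ld\n", notalpha);
}
```

A caller would zero the array first (for example with long nchar[NLETTERS] = {0};), set notalpha to 0, then call count_stream(stdin, nchar, &notalpha) followed by print_counts(stdout, nchar, notalpha).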

Narue 5,707 Bad Cop

17 Years Ago

>Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on,
>subtracting 'A' from your letter will give you 0, 1, 2, and so on.
This is a non-portable assumption.

Aia 1,977 Nearly a Posting Maven · Answer 1 · 2007-09-27T04:15:44+00:00

>>Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on,
>>subtracting 'A' from your letter will give you 0, 1, 2, and so on.
>This is a non-portable assumption.

Are you referring to Extended Binary Coded Decimal Interchange Code (EBCDIC)?

Narue 5,707 Bad Cop Team Colleague · Answer 2 · 2007-09-27T06:00:56+00:00

>Are you referring to Extended Binary Coded Decimal Interchange Code (EBCDIC)?
I'm referring to the standard leaving values of the character set implementation-defined. But EBCDIC does fit into the category of character sets where this kind of assumption would blow up on you.
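
One way to get a 0..25 index without relying on the letters having consecutive values is to look the character up in a string that spells out the alphabet. The sketch below is an illustration only; letter_index is a hypothetical helper, and it expects a value of the kind getchar() returns (an unsigned char value or EOF).

```c
#include <ctype.h>
#include <string.h>

/* Returns 0..25 for a letter of the basic Latin alphabet, or -1 otherwise.
   The index comes from the position in the string, not from the
   numeric value of the character, so the character set's ordering
   does not matter. */
int letter_index(int c)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    c = toupper(c);
    if (c == '\0')          /* strchr would match the terminator */
        return -1;

    p = strchr(alphabet, c);
    return p != NULL ? (int)(p - alphabet) : -1;
}
```

In a counting loop, a result of -1 would go to the "not alphabetic" counter and anything else would be used directly as the array index.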

Dave Sinkula 2,398 long time no c Team Colleague · Answer 3 · 2007-09-27T06:01:19+00:00

>Are you referring to Extended Binary Coded Decimal Interchange Code (EBCDIC)?

Pretty much. But portability is portability.

I remember thinking it was really cool to do stuff like x - 32 instead of toupper, or some such. Until I found out it was both more cryptic and slower.
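
A small sketch of why the subtraction trick is more fragile, assuming an ASCII system for the specific values mentioned in the comments. Both function names are invented for this illustration, and the sketch says nothing about speed.

```c
#include <ctype.h>

/* The "clever" version: only gives an uppercase letter when c is
   already a lowercase letter, and only on ASCII-like character sets. */
int upper_by_subtraction(int c)
{
    return c - 32;
}

/* The library version: anything that is not a lowercase letter
   comes back unchanged. */
int upper_by_library(int c)
{
    return toupper(c);
}
```

On ASCII, upper_by_subtraction('a') is 'A', but upper_by_subtraction('A') is '!' because 65 - 32 is 33, whereas upper_by_library('A') stays 'A'.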

WaltP 2,905 Posting Sage w/ dash of thyme Team Colleague · Answer 4 · 2007-09-27T10:38:35+00:00

>>Since 'A' is the internal value 65, 'B' is 66, 'C' is 67, and so on,
>>subtracting 'A' from your letter will give you 0, 1, 2, and so on.
>This is a non-portable assumption.

True. I forgot that his school may be learning on an IBM mainframe, rather than the universally accessible school and home computers that run ASCII. :icon_rolleyes:

Dave Sinkula 2,398 long time no c Team Colleague · Answer 5 · 2007-09-27T11:29:04+00:00

That's it: 20/20 hindsight, and 20/800 foresight. :icon_rolleyes:

An Ariane 5 can crash in the back yard, and we'll still program it the same.

Narue 5,707 Bad Cop Team Colleague · Answer 6 · 2007-09-27T19:07:22+00:00

>I forgot that his school may be learning on an IBM mainframe, rather than the universally
>accessible school and home computers that run ASCII.
That's the very attitude that caused a mad rush to convert two digit years into four digit years at the turn of the millennium.

Salem 5,199 Posting Sage · Answer 7 · 2007-09-27T20:00:35+00:00

> That's the very attitude that caused a mad rush to convert two digit years into four digit years
Whilst memory nowadays seems to be cheap enough to be given away in cereal packets, it was not always so. Inflation and Moore's law have seen to that.

Back in the days of yore, memory was $1 per byte (when $1 was worth a lot to Joe Average). At that level, you needed a damn good explanation for every byte used.

Later on of course, this wasn't an excuse, but then the need to maintain "bug compatibility" with older systems was the reason.