I wrote a function in C that receives a pointer to a character array as an argument and replaces every accented letter with its corresponding unaccented letter. The code is as follows:

#include <string.h>

void removeacc(char *texto){
        char a[]="\x83\x84\x85\x86\xa0\xc6",A[]="\x8e\x8f\xb5\xb6\xb7\xc7";
        char e[]="\x88\x89\x8a\x82",E[]="\x90\xd2\xd3\xd4";
        char i[]="\x8b\x8c\x8d\xa1",I[]="\xd6\xd7\xd8\xde";
        char o[]="\x93\x94\x95\xa2\xe4",O[]="\x99\xe0\xe2\xe3\xe5";        
        char u[]="\x81\x96\x97\xa3",U[]="\x9a\xe9\xea\xeb";
        char n[]="\xa4",N[]="\xa5";       
        char c[]="\x87",C[]="\x80";               
        char y[]="\x98\xec",Y[]="\xed";
        int x,z;
        for(x = 0; x < strlen(texto); x++){
              for(z = 0; z < 7; z++){
                    if(texto[x] == a[z]) texto[x] = 'a';
                    if(texto[x] == A[z]) texto[x] = 'A';
              }
              for(z = 0; z < 6; z++){
                    if(texto[x] == o[z]) texto[x] = 'o';
                    if(texto[x] == O[z]) texto[x] = 'O';
              }              
              for(z = 0; z < 5; z++){
                    if(texto[x] == e[z]) texto[x] = 'e';
                    if(texto[x] == E[z]) texto[x] = 'E';
                    if(texto[x] == i[z]) texto[x] = 'i';
                    if(texto[x] == I[z]) texto[x] = 'I';
                    if(texto[x] == u[z]) texto[x] = 'u';
                    if(texto[x] == U[z]) texto[x] = 'U';
              }
              if(texto[x] == n[0]) texto[x] = 'n';
              if(texto[x] == N[0]) texto[x] = 'N';              
              if(texto[x] == c[0]) texto[x] = 'c';
              if(texto[x] == C[0]) texto[x] = 'C';              
              if(texto[x] == y[0]) texto[x] = 'y';
              if(texto[x] == y[1]) texto[x] = 'y';
              if(texto[x] == Y[0]) texto[x] = 'Y';              
        }
}

As you can see, it searches for the escape codes corresponding to the accented letters and substitutes them. My brother looked at my code and said it was not polished enough and could be more efficient; he also said that each of the later ifs should be written as an exception to the previous ones (that is, chained with else).

  1. Would this really boost performance?
  2. And what changes do you think I could make to improve it?
  3. Kind of off-topic, but is there a way to read text from a file with the correct accented letters? fgets puts random letters instead of the accented letters :(
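Regarding question 3: fgets copies the bytes of the file unchanged (in text mode on Windows it only translates line endings), so wrong letters most likely mean the file was saved in a different encoding from the one the code and the console expect. The byte values in the arrays above appear to be those of DOS code page 850. A minimal way to check is to dump each line as hex. bytes_to_hex and dump_file_hex are hypothetical helper names, and the sketch assumes lines shorter than 256 bytes (longer lines are simply split across several fgets calls).

```c
#include <stdio.h>

/* Write the bytes of s into out as lowercase two-digit hex values separated
   by spaces, never writing more than outsz bytes (including the final '\0').
   Returns how many input bytes were converted. */
size_t bytes_to_hex(const char *s, char *out, size_t outsz)
{
    size_t used = 0, n;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (n = 0; s[n] != '\0'; n++) {
        size_t need = (n > 0 ? 1 : 0) + 2;   /* optional space + 2 digits */
        if (used + need + 1 > outsz)         /* +1 keeps room for '\0' */
            break;
        used += (size_t) sprintf(out + used, n > 0 ? " %02x" : "%02x",
                                 (unsigned) (unsigned char) s[n]);
    }
    return n;
}

/* Print every line of the file at path as hex so you can see how the
   accented letters are really stored. Returns -1 if it cannot be opened. */
int dump_file_hex(const char *path)
{
    char line[256];
    char hex[sizeof line * 3];               /* 3 output chars per byte is enough */
    FILE *f = fopen(path, "rb");             /* binary mode: no line-ending translation */

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        bytes_to_hex(line, hex, sizeof hex);
        printf("%s\n", hex);
    }
    fclose(f);
    return 0;
}
```

For example, an é stored as c3 a9 means the file is UTF-8, e9 suggests Latin-1 or Windows-1252, and 82 is the code page 850 value the question's tables expect. A single-byte lookup cannot handle UTF-8 directly, since each accented letter there takes two bytes.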

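I agree that the code is not polished. I disagree, though, with the proposed "encapsulation of ifs": chaining the tests with else only saves work once a match is found, and a non-accented letter never matches. As written, your approach makes 63 comparisons for every non-accented letter, just to leave it alone. Really wasteful, isn't it?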

Consider a redesign based on a lookup table: an array indexed by the original character, whose value is the substitute character. For example, translate[0x83] is 'a', translate[0xc6] is also 'a', and so on. This way the code collapses to a short, simple loop:

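/* stop at '\0' instead of calling strlen() on every pass; the cast keeps bytes above 0x7F from becoming negative indexes */
for(x = 0; texto[x] != '\0'; x++)
        texto[x] = translate[(unsigned char) texto[x]];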

Of course, you need to put in some extra effort to populate the translation table.
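A minimal sketch of populating such a table, assuming the same single-byte code-page values as the question's arrays. Rather than typing all 256 entries, the table starts as the identity and only the accented bytes are overwritten. init_translate and map_bytes are hypothetical names, not part of any library.

```c
#include <stddef.h>

/* translate[b] is the replacement for byte b. */
static unsigned char translate[256];

/* Hypothetical helper: make every byte listed in 'from' translate to 'to'. */
static void map_bytes(const char *from, char to)
{
    for (; *from != '\0'; from++)
        translate[(unsigned char) *from] = (unsigned char) to;
}

/* Fill the table once: identity first, then the accented bytes
   taken from the arrays in the question. */
void init_translate(void)
{
    int b;

    for (b = 0; b < 256; b++)
        translate[b] = (unsigned char) b;    /* default: keep the byte */
    map_bytes("\x83\x84\x85\x86\xa0\xc6", 'a');
    map_bytes("\x8e\x8f\xb5\xb6\xb7\xc7", 'A');
    map_bytes("\x88\x89\x8a\x82", 'e');
    map_bytes("\x90\xd2\xd3\xd4", 'E');
    map_bytes("\x8b\x8c\x8d\xa1", 'i');
    map_bytes("\xd6\xd7\xd8\xde", 'I');
    map_bytes("\x93\x94\x95\xa2\xe4", 'o');
    map_bytes("\x99\xe0\xe2\xe3\xe5", 'O');
    map_bytes("\x81\x96\x97\xa3", 'u');
    map_bytes("\x9a\xe9\xea\xeb", 'U');
    map_bytes("\xa4", 'n');
    map_bytes("\xa5", 'N');
    map_bytes("\x87", 'c');
    map_bytes("\x80", 'C');
    map_bytes("\x98\xec", 'y');
    map_bytes("\xed", 'Y');
}

/* One table lookup per character, accented or not. */
void removeacc(char *texto)
{
    size_t x;

    for (x = 0; texto[x] != '\0'; x++)
        texto[x] = (char) translate[(unsigned char) texto[x]];
}
```

Call init_translate() once at program start; after that every character costs a single array lookup.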

Thank you very much, sir; this indeed made my code much better.
This is what I did:

void removeacc(char *texto){
    char converter[256]={'\x0','\x1','\x2','\x3','\x4','\x5','\x6','\x7','\x8','\x9','\xa','\xb','\xc','\xd','\xe','\xf','\x10','\x11','\x12','\x13','\x14','\x15','\x16','\x17','\x18','\x19','\x1a','\x1b','\x1c','\x1d','\x1e','\x1f','\x20','\x21','\x22','\x23','\x24','\x25','\x26','\x27','\x28','\x29','\x2a','\x2b','\x2c','\x2d','\x2e','\x2f','\x30','\x31','\x32','\x33','\x34','\x35','\x36','\x37','\x38','\x39','\x3a','\x3b','\x3c','\x3d','\x3e','\x3f','\x40','\x41','\x42','\x43','\x44','\x45','\x46','\x47','\x48','\x49','\x4a','\x4b','\x4c','\x4d','\x4e','\x4f','\x50','\x51','\x52','\x53','\x54','\x55','\x56','\x57','\x58','\x59','\x5a','\x5b','\x5c','\x5d','\x5e','\x5f','\x60','\x61','\x62','\x63','\x64','\x65','\x66','\x67','\x68','\x69','\x6a','\x6b','\x6c','\x6d','\x6e','\x6f','\x70','\x71','\x72','\x73','\x74','\x75','\x76','\x77','\x78','\x79','\x7a','\x7b','\x7c','\x7d','\x7e','\x7f','C','u','e','a','a','a','a','c','e','e','e','i','i','i','A','A','E','\x91','\x92','o','o','o','u','u','y','O','U','\x9b','\x9c','\x9d','\x9e','\x9f','a','i','o','u','n','N','\xa6','\xa7','\xa8','\xa9','\xaa','\xab','\xac','\xad','\xae','\xaf','\xb0','\xb1','\xb2','\xb3','\xb4','A','A','A','\xb8','\xb9','\xba','\xbb','\xbc','\xbd','\xbe','\xbf','\xc0','\xc1','\xc2','\xc3','\xc4','\xc5','a','A','\xc8','\xc9','\xca','\xcb','\xcc','\xcd','\xce','\xcf','\xd0','\xd1','E','E','E','\xd5','I','I','I','\xd9','\xda','\xdb','\xdc','\xdd','I','\xdf','O','\xe1','O','O','o','O','\xe6','\xe7','\xe8','U','U','U','y','Y','\xee','\xef','\xf0','\xf1','\xf2','\xf3','\xf4','\xf5','\xf6','\xf7','\xf8','\xf9','\xfa','\xfb','\xfc','\xfd','\xfe','\xff'};
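    int x;
    /* Stop at the terminating '\0' instead of calling strlen() on every
       pass, and cast to unsigned char so bytes above 0x7F give indexes
       128..255 even where char is signed. */
    for(x = 0; texto[x] != '\0'; x++)
        texto[x] = converter[(unsigned char) texto[x]];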
}
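One possible refinement of this version (a sketch, not from the thread): because converter is a non-static local array with an initializer, it is in principle filled in again on every call. Declaring the table static const builds it once, and C99 designated initializers let you list only the accented entries, with 0 meaning "keep the byte". accent_map is a hypothetical name; the byte values are the ones from the question.

```c
#include <stddef.h>

/* Sparse table: only accented bytes are listed (C99 designated
   initializers); every other entry is 0, meaning "keep the byte". */
static const char accent_map[256] = {
    [0x83] = 'a', [0x84] = 'a', [0x85] = 'a', [0x86] = 'a', [0xa0] = 'a', [0xc6] = 'a',
    [0x8e] = 'A', [0x8f] = 'A', [0xb5] = 'A', [0xb6] = 'A', [0xb7] = 'A', [0xc7] = 'A',
    [0x88] = 'e', [0x89] = 'e', [0x8a] = 'e', [0x82] = 'e',
    [0x90] = 'E', [0xd2] = 'E', [0xd3] = 'E', [0xd4] = 'E',
    [0x8b] = 'i', [0x8c] = 'i', [0x8d] = 'i', [0xa1] = 'i',
    [0xd6] = 'I', [0xd7] = 'I', [0xd8] = 'I', [0xde] = 'I',
    [0x93] = 'o', [0x94] = 'o', [0x95] = 'o', [0xa2] = 'o', [0xe4] = 'o',
    [0x99] = 'O', [0xe0] = 'O', [0xe2] = 'O', [0xe3] = 'O', [0xe5] = 'O',
    [0x81] = 'u', [0x96] = 'u', [0x97] = 'u', [0xa3] = 'u',
    [0x9a] = 'U', [0xe9] = 'U', [0xea] = 'U', [0xeb] = 'U',
    [0xa4] = 'n', [0xa5] = 'N', [0x87] = 'c', [0x80] = 'C',
    [0x98] = 'y', [0xec] = 'y', [0xed] = 'Y'
};

void removeacc(char *texto)
{
    size_t x;

    for (x = 0; texto[x] != '\0'; x++) {
        char r = accent_map[(unsigned char) texto[x]];
        if (r != '\0')              /* 0 entry: not an accented letter */
            texto[x] = r;
    }
}
```

The trade-off is one extra test per character in exchange for a table that lists only the 53 accented bytes instead of all 256.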