I am using Delphi 2010 on Windows XP. C# and Java have a Normalizer function that can transform letters with diacritics into ASCII (that is, remove the accent marks). I do not know whether Delphi has an equivalent. My test code is below, but it fails and I do not know how to fix it.
I am calling the Windows NormalizeString function.

Test for stripping accents

aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ --> aaeeiiooouuu AAEEIIOOOUUU

Type
...

const
  NormalizationD=2;

var
  Form1: TForm1;

implementation

{$R *.dfm}

function NormalizeString(NormForm: Integer; lpSrcString: LPCWSTR; cwSrcLength: Integer;
 lpDstString: LPWSTR; cwDstLength: Integer): Integer; stdcall; external 'C:\WINDOWS\system32\normaliz.dll';


function NormalizeText(Str: string): string;
var
  nLength: integer;
  c: char;
  i: integer;
  temp: string;
  CatStr:string;

begin

  SetLength(temp, Length(Str));
  nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), PChar(temp), 0); 
  CatStr:='';
  for i := 1 to length(temp) do
     begin
     c:=temp[i];
     if (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) or
        (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark) then
         CatStr:=CatStr+c;
     end;
  result:=CatStr;
end;


procedure TForm1.Button3Click(Sender: TObject);
begin
memo1.lines.Text:= 'aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ';
memo2.Lines.Add(NormalizeText(memo1.Lines.Text));
end;

procedure TForm1.Button4Click(Sender: TObject);
var
w:string;
begin
  w:='aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ';
  memo2.Lines.Add(NormalizeText(w));
end;

Observation 1:

After starting the test program, clicking Button3 for the first time loads 'aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ' into memo1, but no string is added to memo2. Clicking Button3 a second time adds 'aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ' to memo2 (the same as memo1, so NormalizeString has no effect).

Observation 2:

After starting the test program, if I click Button4 first, memo2 receives unknown characters '܀¨꽐'. If I then click Button3, memo2 still receives those unknown characters. If I click Button3 again, memo2 receives 'aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ' (the same as memo1, so NormalizeString has no effect).

It seems NormalizeText does not work. Also, why do I have to click Button3 twice to get output in memo2?

How can I solve it? Thank you in advance.

Edited 1 Year Ago by ipage: character not shown

The declaration is correct, so the problem is elsewhere.
I don't have Delphi on this machine so can't test anything at the moment, but one simple problem is your code:

     if (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) or
        (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark) then

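This will ALWAYS be true: a character has exactly one category, so it cannot be both ucNonSpacingMark and ucCombiningMark, and at least one of the two <> tests always succeeds. I guess you meant to write: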

     if (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) and
        (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark) then
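
To make the difference concrete, here is a minimal console sketch that evaluates both conditions for a plain letter and for a combining acute accent (U+0301). The program name and the two helper functions are made up for illustration; only TCharacter.GetUnicodeCategory and the two category constants come from the code above.

```pascal
program OrVersusAnd;
{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

// Hypothetical helper: the condition as originally written (with "or").
function KeepWithOr(c: Char): Boolean;
begin
  Result :=
    (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) or
    (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark);
end;

// Hypothetical helper: the corrected condition (with "and").
function KeepWithAnd(c: Char): Boolean;
begin
  Result :=
    (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) and
    (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark);
end;

begin
  // A plain letter is neither kind of mark, so both versions keep it.
  Writeln('a       or: ', KeepWithOr('a'), '  and: ', KeepWithAnd('a'));
  // U+0301 is a non-spacing mark: "or" still keeps it, "and" drops it.
  Writeln('U+0301  or: ', KeepWithOr(#$0301), '  and: ', KeepWithAnd(#$0301));
end.
```

With "or", the accent is kept because it is still "not a ucCombiningMark"; with "and", it is filtered out as intended.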

I'll take another look at this later when I have access to Delphi.

Oh, and with your other problem (the extra characters), I suspect there is a problem with the buffer length. You are setting the output string to the same length as the source string, but that is probably too short, since the API call will split single (accented) characters into multiple characters. Your call also passes 0 as the final parameter, which as far as I can tell only asks for a size estimate and writes nothing, so temp would still hold whatever happened to be in memory. I suggest you call NormalizeString twice: once to find out the length to which your output string should be set, and once to do the normalization. The result of the first call should be the final parameter in the second.

Try this:

function NormalizeString(NormForm: Integer; lpSrcString: LPCWSTR; cwSrcLength: Integer;
 lpDstString: LPWSTR; cwDstLength: Integer): Integer; stdcall; external 'C:\WINDOWS\system32\normaliz.dll';
function NormalizeText(Str: string): string;
var
  nLength: integer;
  c: char;
  i: integer;
  temp: string;
  CatStr:string;
begin
  nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), nil, 0); // New line
  SetLength(temp, nLength);  // Modified line

  nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), PChar(temp), nLength); // Modified line
  SetLength(temp, nLength);  // New line

  CatStr:='';
  for i := 1 to length(temp) do
     begin
     c:=temp[i];
     if (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) and // Modified line
        (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark) then
         CatStr:=CatStr+c;
     end;
  result:=CatStr;
end;

@SalmiSoft, thank you so much.
Your answer almost solves it perfectly.
Below is what I tested:

function NormalizeText(Str: string): string;
var
  nLength: integer;
  c: char;
  i: integer;
  temp: string;
  CatStr:string;
begin

  nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), nil, 0); // New line
// nLength always returns Length(Str)*3, regardless of ASCII or other character sets

  SetLength(temp, nLength);
  // This is not very reliable, because if I input
  // 012345678901234567890123
  // aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ
  // No string output in memo2
  // If I input
  // 01234567890123456789012   (without last 3)
  // aáeéiíoóöuúü AÁEÉIÍOÓÖUÚÜ
  // It will output exactly what I want. It seems nLength is not enough for
  // decomposing Str

   Setlength(temp,nLength*2);
   nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), PChar(temp), nLength*2);

 // The above outputs everything correctly.
 // But nLength*2 seems wasteful, because nLength is already three times Length(Str)

  Setlength(temp,Length(Str)*SizeOf(Str));
  nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), PChar(temp), Length(Str)*SizeOf(Str)); 
  SetLength(temp, nLength);  

  // The above also output fine. Is this correct ?

Thank you again for your help. I am almost ready to mark this solved.
Is Length(Str)*SizeOf(Str) OK?

Oh that is interesting. It would be nice to see the code for NormalizeString to see what is really going on, but in the absence of that here is my understanding.

The documentation says:

Returns the length of the normalized string in the destination buffer. If cwDstLength is set to 0, the function returns the estimated buffer length required to do the actual conversion.

So the value returned by our first call to NormalizeString is only an estimate of the required buffer size, and it turns out that the estimate is too small for your test data. The second call therefore fails, and the (negative) value it returns is a new, hopefully more accurate, estimate of the buffer size required. We then have to resize the buffer and call yet again with that size, which should (at last) give us the result we want. In short, you have to check for errors when you call NormalizeString.

Putting that into code gives:

function NormalizeText(Str: string): string;
var
  nLength: integer;
  c: char;
  i: integer;
  temp: string;
  CatStr:string;
  TheProblem : Cardinal;
begin
  nLength := NormalizeString(NormalizationD, PChar(Str), Length(Str), nil, 0);
  SetLength(temp, nLength);

  nLength := 
    NormalizeString(NormalizationD, PChar(Str), Length(Str), PChar(temp), nLength);

  if nLength <= 0 then
  begin
    TheProblem := GetLastError;
    case TheProblem of

      ERROR_INSUFFICIENT_BUFFER:
        begin
          // Our previous call to NormalizeString gave an ESTIMATED size for the
          // buffer we need, but that estimate was too small. The negative return
          // value is the negated new estimate. Grow the buffer to that size, try
          // once more, and if it still fails, throw all our toys out of the pram.
          nLength := -nLength;
          SetLength(temp, nLength); // the buffer must really be this large
          nLength := // Try again to see if we can recover from the error
            NormalizeString( NormalizationD, PChar(Str), Length(Str),
                             PChar(temp), nLength);
          if nLength <= 0 then
            raise EConvertError.Create
              ('The buffer was too small or it was incorrectly set to NULL.');
        end;

      ERROR_INVALID_PARAMETER:
        raise EConvertError.Create   ('One of the parameter values was invalid.');

      ERROR_NO_UNICODE_TRANSLATION:
        // nLength is the negated index of the bad character. The index is
        // assumed to be zero-based, hence the +1 for Delphi's 1-based strings.
        raise EConvertError.CreateFmt
          ('Invalid Unicode (%s) in string at position %d',
           [Copy(Str, -nLength + 1, 1), -nLength + 1]);

      ERROR_SUCCESS: nLength := 0; // Normalized OK but no results so return empty string

      else raise EConvertError.CreateFmt('Unknown error code %d',[TheProblem]);

    end;
  end;

  SetLength(temp, nLength);

  CatStr:='';
  for i := 1 to length(temp) do
     begin
     c:=temp[i];
     if (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucNonSpacingMark) and
        (TCharacter.GetUnicodeCategory(c) <> TUnicodeCategory.ucCombiningMark) then
         CatStr:=CatStr+c;
     end;

  result:=CatStr;
end;
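
The estimate-then-retry idea described above can also be written as a small loop, so that more than one bad estimate is tolerated. This is only a sketch under a few assumptions: the helper name DecomposeText and the MaxAttempts limit are made up for illustration, the DLL is referenced by name rather than by full path, and a failed call is assumed to return the negated new estimate, as discussed above.

```pascal
program NormalizeRetryDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  NormalizationD = 2; // same constant as in the thread

// Assumption: normaliz.dll is found on the default DLL search path.
function NormalizeString(NormForm: Integer; lpSrcString: LPCWSTR;
  cwSrcLength: Integer; lpDstString: LPWSTR; cwDstLength: Integer): Integer;
  stdcall; external 'normaliz.dll';

// Hypothetical helper: returns Str in decomposed form, resizing the buffer
// and retrying whenever the API reports that the buffer was too small.
function DecomposeText(const Str: string): string;
const
  MaxAttempts = 10; // arbitrary safety limit, not taken from any documentation
var
  Estimate, Written, Attempt: Integer;
begin
  Result := '';
  if Str = '' then
    Exit;

  // First call: destination length 0 asks only for an estimated size.
  Estimate := NormalizeString(NormalizationD, PChar(Str), Length(Str), nil, 0);

  for Attempt := 1 to MaxAttempts do
  begin
    if Estimate <= 0 then
      Break; // no usable estimate, give up

    SetLength(Result, Estimate);
    Written := NormalizeString(NormalizationD, PChar(Str), Length(Str),
      PChar(Result), Estimate);

    if Written > 0 then
    begin
      SetLength(Result, Written); // trim to the length actually written
      Exit;
    end;

    if GetLastError <> ERROR_INSUFFICIENT_BUFFER then
      Break; // some other error, retrying will not help

    Estimate := -Written; // negated return value is the new estimate
  end;

  raise EConvertError.Create('NormalizeString failed');
end;

var
  S: string;
begin
  // 'a' followed by U+00E1 (a with acute accent), written without non-ASCII literals.
  S := DecomposeText('a' + #$00E1);
  Writeln('Length after decomposition: ', Length(S));
end.
```

The category filter from NormalizeText can then be applied to the returned string unchanged.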

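By the way, your Length(Str)*SizeOf(Str) code makes no real sense. A string variable is really a pointer to a structure containing the character data and metadata about the string, so SizeOf(SomeString) is the size of a pointer, whatever the string contains. It presumably worked only because it happened to make the buffer large enough.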
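
A quick way to see this for yourself is the following console sketch (the program name is made up for illustration). It prints SizeOf for a string variable next to SizeOf(Pointer), and shows how the size of the text itself would be computed instead.

```pascal
program SizeOfStringDemo;
{$APPTYPE CONSOLE}

var
  S: string;
begin
  S := 'abc';
  // SizeOf on a string variable measures the variable (a pointer),
  // not the characters it refers to.
  Writeln('SizeOf(S)       = ', SizeOf(S));
  Writeln('SizeOf(Pointer) = ', SizeOf(Pointer));

  // The text itself: number of characters, and bytes used by them.
  Writeln('Length(S)       = ', Length(S));
  Writeln('Bytes of text   = ', Length(S) * SizeOf(Char));

  S := 'a much longer string than before';
  // Same value as before, because S is still just a pointer.
  Writeln('SizeOf(S) again = ', SizeOf(S));
end.
```

The two SizeOf(S) lines should match SizeOf(Pointer) no matter how long the string is.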

@SalmiSoft, thank you for your answer.
That solves it.

