When casting isn't enough...

Question

N1GHTS 102 Posting Whiz in Training

14 Years Ago

I have a weird problem and I want to know if any of you have any ideas on how to fix this.

If I do this...

swprintf(Dest, 500, L"G%sY", "AMEDA");

(wchar_t*)Dest contains the following text...

GAMEDAY

But if I do this...

swprintf(Dest, 500, L"G%sY", L"AMEDA");

(wchar_t*)Dest now contains...

GAY

So it seems that the compiler is converting the text to a wide character array, but swprintf() still thinks that %s is pointing to a simple character array. I've tried this too...

swprintf(Dest, 500, L"G%sY", (const wchar_t *)L"AMEDA");

Same thing happens.

I know this is supposed to work because according to THIS site...

If no l modifier is present: The "const char *" argument is expected to be a pointer to an array of character type (pointer to a string) containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted to wide characters (each by a call to the mbrtowc() function with a conversion state starting in the initial state before the first byte). The resulting wide characters are written up to (but not including) the terminating null wide character. If a precision is specified, no more wide characters than the number specified are written. Note that the precision determines the number of wide characters written, not the number of bytes or screen positions. The array must contain a terminating null byte, unless a precision is given and it is so small that the number of converted wide characters reaches it before the end of the array is reached.

If an l modifier is present: The "const wchar_t *" argument is expected to be a pointer to an array of wide characters. Wide characters from the array are written up to (but not including) a terminating null wide character. If a precision is specified, no more than the number specified are written. The array must contain a terminating null wide character, unless a precision is given and it is smaller than or equal to the number of wide characters in the array.

I am using the latest GCC in the latest Ubuntu. Any ideas?

Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

You are seeing the difference between %s and %S (note the capitalization of s and S).
With swprintf():
%s assumes the parameter is a wchar_t*
%S assumes the parameter is a char*

The rule is just the opposite with sprintf()

This illustrates the difference

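#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wchar_t buf[255] = {0};

    /* %s with a wide string argument */
    swprintf(buf, sizeof(buf)/sizeof(buf[0]), L"G%sY", L"AMEDA");
    printf("%S\n", buf);
    /* %S with a narrow string argument */
    swprintf(buf, sizeof(buf)/sizeof(buf[0]), L"G%SY", "AMEDA");
    printf("%S\n", buf);

    return 0;
}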

N1GHTS commented: This answer saved me long hours of trial and error +1

sree_ec commented: I didn't know that %s and %S existed :D +1

sree_ec 10 Junior Poster

14 Years Ago

I don't know the answer to your question, but I am joining this thread to learn.
I have not seen this function used in any C program that I have come across in my short experience with C. Can you briefly explain what exactly the purpose of wide characters is? I looked it up on Google and now know what wide characters are, but if you can give some examples of where they are used in real life, I would be grateful.

Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

Wide characters (wchar_t) are used in Unicode programs, which may or may not be in English. Some characters, such as Chinese ones, will not fit in a one-byte char, so they use wchar_t. sizeof(wchar_t) is not consistent among operating systems: on MS-Windows sizeof(wchar_t) == 2, while on *nix it's 4. So you have to be very careful about transferring Unicode-written files from one OS to another.

sree_ec commented: Thanks... +1

N1GHTS 102 Posting Whiz in Training · Answer 1 · 2010-09-28T20:44:25+00:00

@sree_ec

printf() -- Prints formatted text to the screen.
sprintf() -- Prints formatted text to a string
swprintf() -- Prints formatted text to a wide character string.

Unlike printf() and sprintf(), swprintf() includes a "maxlen" to prevent buffer overflow.

So you use it any time you need to write formatted text to a wide character array.

In my application, I use it with __builtin_apply() and hack the argument list to modify the incoming parameters to auto convert some structures into compatible swprintf() elements.

N1GHTS 102 Posting Whiz in Training · Answer 2 · 2010-09-29T06:22:17+00:00

Ancient Dragon's advice was correct, only it was backwards (%S corresponded to wchar_t while %s corresponded to char).

Thank you very much!

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 3 · 2010-09-29T09:28:21+00:00

>>Ancient Dragon's advice was correct, only it was backwards (%S corresponded to wchar_t while %s corresponded to char).

It depends on whether you use sprintf() or swprintf(). The usage of %s and %S is reversed for each of those two functions. I tested the code I posted using VC++ 2010 Express, so I know it's correct for swprintf(). If you use sprintf() you will get the opposite results.

sree_ec 10 Junior Poster · Answer 4 · 2010-09-29T09:47:56+00:00

Thanks, but I didn't understand it correctly.
The second printf statement did not print anything on my Linux system. Why is that?
I have some doubts:
1. We put L in front of the format string to say that it is expecting a wide character array, don't we?
2. If 1 is yes, we should always use a wide character array, right? That means L"AMEDA" is correct.
The first printf is printing GAY. Why is only 'A' taken by %s? If you can share some links on I/O, it will be helpful.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 5 · 2010-09-29T10:35:16+00:00

On Ubuntu with Code::Blocks it works as shown below. Apparently MS-Windows and *nix work in opposite ways, because Ubuntu behaved as N1GHTS described.

For some unknown reason Code::Blocks could not find the function prototype for swprintf() in wchar.h, so I just manually copied it into the C code. I have not bothered to figure out why that happened; it must be some sort of #define problem.

#include <stdio.h>
#include <wchar.h>
#include <string.h>
extern int swprintf (wchar_t *__restrict __s, size_t __n,
		     __const wchar_t *__restrict __format, ...);

int main()
{
    wchar_t buf[255] = {0};

    swprintf(buf, sizeof(buf)/sizeof(buf[0]), L"G%SY", L"AMEDA");
    printf("%S\n", buf);
    swprintf(buf, sizeof(buf)/sizeof(buf[0]), L"G%sY", "AMEDA");
    printf("%S\n", buf);
    getchar();
    return 0;
}

N1GHTS 102 Posting Whiz in Training · Answer 6 · 2010-09-29T11:22:46+00:00

@sree_ec

1. We put L in front of the format string to say that it is expecting a wide character array, don't we?

"L" in front of a string tells the compiler to store the string constant as an array of wchar_t.

So "HELLO" written as L"HELLO" gets stored like this: "H E L L O ", with a final 4-byte null character. (On Windows it's half that length.)

Why is only 'A' taken by %s?

Those spaces in my example above are not spaces at all; they are null characters. That's because a wchar_t* describes an array in which each element is bigger than 8 bits, so on Linux...

L"H" -> [0x48, 0x00, 0x00, 0x00] [0x00, 0x00, 0x00, 0x00]

The first set of 4 bytes represents an "H" in Linux wide character format. The second set of 4 bytes is the terminating null character.

If you use a wchar_t array in any place expecting a char array, you will only get the first ASCII letter, because the byte right after it looks like a null terminating character. The function doesn't know that it's part of the first letter.

If you use any language other than English in your source code, or you want to give your program international support, always try to use wchar_t* or an int array instead of a char array. Not all functions support the wide character format, so in those cases you must use char*, but whenever you can, try to use the alternatives. It doesn't hurt anything.

@Ancient Dragon

I had the same problem with the swprintf() function header that you had and I posted that question recently on the C forum. My solution was to do the same thing you did. It's very strange.

sree_ec 10 Junior Poster · Answer 7 · 2010-09-29T12:14:31+00:00

@N1GHTS
Thanks.
So do you mean to say that a character in wide format takes 8 bytes [instead of the 1 byte of a normal char], including the null character, on Linux, and 4 bytes on Windows?

The compilation error is indeed caused by a macro, which I think is a problem with the options given to gcc. Anyway, it doesn't matter much.

sree_ec 10 Junior Poster · Answer 8 · 2010-09-29T12:22:32+00:00

@AncientDragon
Why would we want to use sprintf with wide characters in the first place [when swprintf is available]?

Also, how can %s and %S depend on the OS/compiler? :O

N1GHTS 102 Posting Whiz in Training · Answer 9 · 2010-09-29T12:47:48+00:00

So do you mean to say that a character in wide format takes 8 bytes [instead of the 1 byte of a normal char], including the null character, on Linux, and 4 bytes on Windows?

8 bits = 1 byte.

A normal char is 1 byte. A wide char is 4 bytes on Linux, 2 bytes on Windows.

sree_ec 10 Junior Poster · Answer 10 · 2010-09-29T13:56:09+00:00

>>
L"H" -> [0x48, 0x00, 0x00, 0x00] [0x00, 0x00, 0x00, 0x00]

This is 8 bytes, right? That's why I asked. Now H takes 8 bytes.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 11 · 2010-09-29T19:50:56+00:00

%ls also works (see this man page):
swprintf(buf, sizeof(buf)/sizeof(buf[0]), L"G%lsY", L"AMEDA");

N1GHTS 102 Posting Whiz in Training · Answer 12 · 2010-09-29T21:10:01+00:00

@sree_ec

This is 8 bytes, right? That's why I asked. Now H takes 8 bytes.

L"H" <-- This is 8 bytes because it stores 2 characters: 2 x 4 = 8. It stores the letter H and \0, the null terminating character. In C, every string literal gets one extra character added to the end by the compiler: the null terminator, which tells many standard C library functions where the string actually ends. Without it, the functions would not see the end of the string and would keep on going, which is very bad.

So in linux, "H" is 2 bytes, "HI" is 3 bytes, L"H" is 8 bytes, L"HI" is 12 bytes.

sree_ec 10 Junior Poster · Answer 13 · 2010-09-29T21:34:56+00:00

Thanks!
Understood. I just thought that each character is followed by a null character of 4 bytes in the wide case. Now the confusion is cleared up!