Been studying what I thought was a fairly authoritative article about BSTRs by Bruce McKinney...

http://www.ecs.syr.edu/faculty/fawcett/handouts/cse775/Presentations/BruceMcKinneyPapers/COMstrings.htm

where he states that the four bytes before the start of the BSTR contain the string's length. Here is the exact quote...

What Is a BSTR? The BSTR type is actually a typedef, which in typical Windows include file fashion, is made up
of more typedefs and defines. You can follow the twisted path yourself, but here's what it boils down to:

typedef wchar_t * BSTR;

Hmmm. A BSTR is actually a pointer to Unicode characters. Does that look familiar? In case you don't recognize
this, let me point out a couple of similar typedefs:

typedef wchar_t * LPWSTR;
typedef char * LPSTR;

So if a BSTR is just a pointer to characters, how is it different from the null-terminated strings that C++
programmers know so well? Internally, the difference is that there's something extra at the start and end of the
string. The string length is maintained in a long variable just before the start address being pointed to, and the
string always has an extra null character after the last character of the string. This null isn't part of the
string, and you may have additional nulls embedded in the string.

However, my tests prove this to be most untrue. Here is a program with the output directly afterwards where I'm using a simple string as follows for testing purposes...

OLECHAR szOleChar[]=L"Here Is Some Text";

That string contains 17 characters. An ANSI string would need a memory allocation of 18 bytes to hold it, and I'd expect a wide character string to need something like 36 bytes. I'd expect the system string functions in oleaut32.dll that create and manage BSTRs would need a few more if they are going to preface these things with some kind of length information, whether it be counts of characters or actual memory allocation counts of bytes. Having said that, can someone explain to me the results of the below program, where I'm coming up with 34 stored in the four byte slot right before the start of the BSTR???

#include <windows.h>
#include <stdio.h>

int main(void)
{
 OLECHAR szOleChar[]=L"Here Is Some Text";
 unsigned int iLen=0;
 int* pBStrLen=0;
 wchar_t* pChar;
 BSTR strText;

 iLen=wcslen(szOleChar);
 printf("iLen                  = %u\n",iLen);
 strText=SysAllocStringLen(szOleChar,iLen);
 printf("SysStringLen(strText) = %u\n", SysStringLen(strText));
 wprintf(L"strText               = %s\n", strText);
 printf("strText               = %u\n", (unsigned)strText);
 pBStrLen=(int*)strText;
 pBStrLen--;
 printf("pBStrLen              = %u\n", (unsigned)pBStrLen);
 printf("*pBStrLen             = %u\n\n",(unsigned int)*pBStrLen);
 pChar=strText;
 for(unsigned int i=0; i<iLen; i++)
 {
     // cast the pointer for %u, as on the lines above (assumes a 32-bit build)
     wprintf(L"%u\t%u\t\t%c\n", i, (unsigned)pChar, *pChar);
     pChar++;
 }
 SysFreeString(strText);
 getchar();

 return 0;
}

/*
iLen                  = 17
SysStringLen(strText) = 17
strText               = Here Is Some Text
strText               = 2312572
pBStrLen              = 2312568
*pBStrLen             = 34

0       2312572         H
1       2312574         e
2       2312576         r
3       2312578         e
4       2312580
5       2312582         I
6       2312584         s
7       2312586
8       2312588         S
9       2312590         o
10      2312592         m
11      2312594         e
12      2312596
13      2312598         T
14      2312600         e
15      2312602         x
16      2312604         t
*/

The four bytes before the beginning of the BSTR hold the number of bytes of data in the BSTR (not counting the terminator), which may or may not be the length of the string in characters. In fact the BSTR need not contain a string at all (either UNICODE or ANSI). It could contain binary data of any type. It's up to your program to interpret the contents of the BSTR correctly.

In the example code you posted, the value of 34 is correct because 17 * sizeof(wchar_t) is 17 * 2 = 34. Note that this count does NOT include a null terminator. SysAllocStringLen() allocates room for one extra character and appends the terminator itself, so you don't have to add 1 to the length of the string.
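To make the layout concrete, here is a minimal portable sketch that mimics it by hand. It is not the real oleaut32 allocator: the `sim_` names are hypothetical, and `uint16_t` stands in for the 2-byte OLECHAR so the arithmetic comes out the same on platforms where wchar_t is 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for OLECHAR on Windows: a fixed 2-byte code unit. */
typedef uint16_t sim_char;

/* Mimics the layout discussed above: a 4-byte prefix holding the payload
   size in BYTES, then nchars code units, then a terminator that is not
   counted in the prefix. Returns a pointer to the first character, not to
   the prefix. */
sim_char *sim_alloc_string_len(const sim_char *src, uint32_t nchars)
{
    /* Precondition of this sketch: the total block size fits in 32 bits. */
    assert(nchars <= (UINT32_MAX - sizeof(uint32_t) - sizeof(sim_char)) / sizeof(sim_char));

    uint32_t nbytes = nchars * (uint32_t)sizeof(sim_char);
    unsigned char *block = malloc(sizeof(uint32_t) + nbytes + sizeof(sim_char));
    if (block == NULL)
        return NULL;

    memcpy(block, &nbytes, sizeof nbytes);                 /* length prefix */
    if (src != NULL)
        memcpy(block + sizeof(uint32_t), src, nbytes);     /* payload */
    else
        memset(block + sizeof(uint32_t), 0, nbytes);       /* empty payload */
    memset(block + sizeof(uint32_t) + nbytes, 0, sizeof(sim_char)); /* terminator */

    return (sim_char *)(block + sizeof(uint32_t));
}

/* Reads the prefix stored just before the data; NULL counts as empty. */
uint32_t sim_string_byte_len(const sim_char *s)
{
    uint32_t nbytes;
    if (s == NULL)
        return 0;
    memcpy(&nbytes, (const unsigned char *)s - sizeof nbytes, sizeof nbytes);
    return nbytes;
}

/* The allocation starts at the prefix, so step back before freeing. */
void sim_free_string(sim_char *s)
{
    if (s != NULL)
        free((unsigned char *)s - sizeof(uint32_t));
}
```

Calling sim_alloc_string_len() with the 17 characters of "Here Is Some Text" and then sim_string_byte_len() on the result gives 34, and the element at index 17 is the terminator.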

Run the same program on *nix and you will probably find different results because sizeof(wchar_t) on MS-Windows (2) is different than it is on *nix (4). So if you transmit that BSTR from MS-Windows to *nix (or vice versa) then some sort of translation will need to be made.

Thanks a lot for the thoughts and clarification Ancient Dragon. It appears to me almost that a BSTR could be used very similarly to a VARIANT in Visual Basic where just about anything or maybe even anything could be passed through a BSTR parameter, and it would be up to the receiving function/object to determine what it is and what to do with it. It further appears to me that the various BSTR system string functions plus the length prefixed data would allow the receiving function/object to determine specifically whether an ANSI or wide character string is contained in the BSTR. What are your thoughts on this?

That actually is where I'm trying to go with this. I would like to build a GUI Exe COM server in the PowerBASIC programming language, and while that programming language supports the creation of COM dlls very handily, it doesn't natively support the creation of Exe servers. So I'm having to go the low level C route and construct it in a manner very much like in Jeff Glatt's article "COM In Plain C"....

http://www.codeproject.com/KB/COM/com_in_c1.aspx

String data types should be BSTRs so that non-C/C++ clients can easily connect, and it appears to me that within the server I'll need to examine incoming BSTRs to determine whether they are unicode or ansi. I have this server written in C++ and it works fine. What I'm trying to do is translate it to PowerBASIC, and that's where I'm running into these difficulties.

Below is a somewhat modified version of my example program above followed with the output where I changed the BSTR "Here Is Some Text" to an ansi string. When doing that the string length returned by strlen() is 17; the length returned by SysStringByteLen() is 17; the length returned by my pointer decremented from the start of the string to get the prefix is 17; and the SysStringLen() is 8!!!!

#include <windows.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
 char szChar[]="Here Is Some Text";
 unsigned int iLen=0;
 int* pBStrLen=0;
 char* pChar;
 BSTR strText;

 iLen=strlen(szChar);
 printf("iLen                      = %u\n",iLen);
 strText=SysAllocStringByteLen(szChar,iLen);
 printf("SysStringByteLen(strText) = %u\n", SysStringByteLen(strText));
 printf("SysStringLen(strText)     = %u\n", SysStringLen(strText));
 printf("strText                   = %s\n", (char*)strText);
 printf("strText                   = %u\n", (unsigned)strText);
 pBStrLen=(int*)strText;
 pBStrLen--;
 printf("pBStrLen                  = %u\n", (unsigned)pBStrLen);
 printf("*pBStrLen                 = %u\n\n",(unsigned int)*pBStrLen);
 pChar=(char*)strText;
 for(unsigned int i=0; i<iLen; i++)
 {
     printf("%u\t%u\t\t%c\n", i, (unsigned)pChar, *pChar);
     pChar++;
 }
 SysFreeString(strText);
 getchar();

 return 0;
}

/*
iLen                      = 17
SysStringByteLen(strText) = 17
SysStringLen(strText)     = 8
strText                   = Here Is Some Text
strText                   = 2315548
pBStrLen                  = 2315544
*pBStrLen                 = 17

0       2315548         H
1       2315549         e
2       2315550         r
3       2315551         e
4       2315552
5       2315553         I
6       2315554         s
7       2315555
8       2315556         S
9       2315557         o
10      2315558         m
11      2315559         e
12      2315560
13      2315561         T
14      2315562         e
15      2315563         x
16      2315564         t
*/

The part I haven't been able to think through yet is whether the server should require wide character strings be passed in. If that would be the case I wouldn't have to worry about the ansi situation.

>>It appears to me almost that a BSTR could be used very similarly to a VARIANT in Visual Basic

Not really. C also has a VARIANT structure which, among other things, can contain a BSTR. The VARIANT is a tagged union of a lot of different pointers and data types. In C its main purpose is for COM, but it can be useful to VB as well. When VB passes a string to C it is usually a BSTR.

>>and the SysStringLen() is 8!!!!

Why? It should be obvious if you look up that function in MSDN. That function expects the BSTR to contain a UNICODE string (wchar_t*), which in your case it does not (char *). It reports the length in characters, so 17 bytes / 2 bytes per character = 8. As I said previously, you, the programmer, are responsible for the contents of the BSTR, and it's up to you to use the appropriate win32 api functions to deal with it.
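The 17 versus 8 result can be reproduced without any Windows calls. The sketch below builds a length-prefixed buffer by hand from narrow text and reads it back both ways. The helper names are hypothetical, the two zero bytes after the data are an assumption of this sketch, and the divide-by-two is inferred from the output posted above rather than taken from the oleaut32 source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers; none of these names belong to the OLE Automation API. */

/* Read the 4-byte length prefix that sits just before the data pointer. */
uint32_t prefix_byte_len(const void *data)
{
    uint32_t nbytes;
    assert(data != NULL);
    memcpy(&nbytes, (const unsigned char *)data - sizeof nbytes, sizeof nbytes);
    return nbytes;
}

/* Length as seen by a reader that assumes 2-byte characters: integer
   division silently drops an odd trailing byte. */
uint32_t prefix_char_len(const void *data)
{
    return prefix_byte_len(data) / 2u;
}

/* Lay out narrow text the way the second program's BSTR appears in memory:
   prefix = byte count, then the bytes, then two zero bytes.
   buf must hold at least 4 + strlen(text) + 2 bytes.
   Returns a pointer to the data, not to the prefix. */
char *fill_prefixed(unsigned char *buf, const char *text)
{
    uint32_t nbytes = (uint32_t)strlen(text);
    memcpy(buf, &nbytes, sizeof nbytes);
    memcpy(buf + sizeof nbytes, text, nbytes);
    memset(buf + sizeof nbytes + nbytes, 0, 2);
    return (char *)(buf + sizeof nbytes);
}
```

With "Here Is Some Text" in a 23-byte buffer, prefix_byte_len() gives 17 and prefix_char_len() gives 8, matching SysStringByteLen() and SysStringLen() in the output above.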

Thanks again Ancient Dragon! I'll mark this post as solved because you helped me clarify my thoughts. And, you've been a member here for 4 - 1/2 years * 365 days / year = 1642 days and you've solved 1625 posts so that's about a post per day with only a few sick days per year!

>>with only a few sick days per year!
What sick days?? :) Don't be fooled by that Solved Threads count. The count gets incremented for all threads that I post in, whether I helped solve them or not. The absolute post count is a better gauge of someone's activity. This topic has been discussed several times in the DaniWeb Community Feedback forum.

Oooooh! I didn't know that!:)
