1. Is there a character, that can be written/read by system (C#/.NET), but can't be written by any standard keyboard (European, Nordic, Slavic, Cyrillic, Arabic, Mongolian, African, Greek and Mandarin)? What I need this for is separation of characters: the user inputs their own text and I need to be able to split it, like "Oh, here's a break and here's a break". I will encrypt it, but I need to be sure enough that the characters I split on are actually separators and not just content of the string.

2. Is x00 exploit possible with this? http://msdn.microsoft.com/en-us/library/bb311038.aspx Can somebody paste x00, or change some virtual address in RAM to 00, and make the program go insane?

Edited 2 Years Ago by pritaeas: Moved to software.

Is there a character, that can be written/read by system (C#/.NET), but can't be written by any standard keyboard

Strings in .NET are Unicode, so you have plenty of options. Something like '\xFFFD' ("Replacement Character") might be the closest to what you're looking for. But that seems overkill; you're probably safe using tabs.

Is x00 exploit possible with this?

No. .NET strings can have embedded zeros. You won't be using '\0' to detect the end of the string; the runtime stores the length and finds the end for you.
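To make that concrete, here is a minimal sketch (the class and method names are made up for illustration) showing that a NUL in the middle of a .NET string is just another char and does not cut the string short:

```csharp
using System;

public static class EmbeddedNullDemo
{
    // Hypothetical helper: returns a string with a NUL character in the middle.
    public static string Build()
    {
        return "abc\0def";
    }

    public static void Main()
    {
        string s = Build();
        Console.WriteLine(s.Length);        // 7: the NUL is counted like any other char
        Console.WriteLine(s.IndexOf('\0')); // 3: position of the embedded NUL
        Console.WriteLine(s.Substring(4));  // def: the text after the NUL is still there
    }
}
```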

Further reading: Strings (C# Programming Guide)

Comments
Thanks for the "character".

0x00 is a valid (null) character, and it indicates the end of a string in C, C++, and other languages. Some languages store strings with a length indicator (Java and others), so embedded 0x00 chars are just more data. 0x00 is only a problem if it is used as a pointer to memory, because it will then cause an exception (or segfault) when something tries to access it. Some poorly written languages may "go insane", but most will just give you an error and quit.

The bigger problem would be if a "valid" address (usually on the stack) is set to point to bogus/bad data, which could then result in "undesirable" results. This is how buffer overflows work for the most part.

Edited 2 Years Ago by rubberman

Comments
Yea, heard about buffer overflows.

Something like '\xFFFD' ("Replacement Character") might be the closest to what you're looking for. But that seems overkill; you're probably safe using tabs.

Before I actually waste my time, I need to know one thing: does StreamReader or any other IO in C# (in Visual Studio) treat \xFFFD as ONE character, or actually 6? This can heavily change whether I'm actually going to use the input > separation > encryption method at all. What if user says "I'm gonna troll you so badly let's say \xFFFD", it will be there in data? When \xFFFD is written into file, is it treated as string of 6 characters or actually one character that is unreadable to users? And won't string that I mentioned earlier mess this all up? Sure, I could write this file now and make this character print and read from it, but further functions of my program are separating, encrypting, reading and "moving forward" and I need to know how this character is stored in memory since my bells are ringing HEAVILY at this flaw (if any).

Edited 1 Year Ago by RikTelner

does StreamReader or any other IO in C# (in Visual Studio) treat \xFFFD as ONE character, or actually 6?

Only one character, but many possible encodings.

When in .NET-land, as opposed to C/C++-land et al., there's only one type of string, which is a sequential collection of UTF-16 encoded characters. That's all you get.

But of course data are typically stored and transmitted as sequences of bytes, so you have various implementations of the Encoding class to convert from bytes to text and back again.
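A small sketch of the "one or six" question, with made-up names: the escape \xFFFD in C# source is turned into a single char by the compiler, while a user who literally types backslash, x, F, F, F, D produces six ordinary chars that do not contain the delimiter at all.

```csharp
using System;

public static class OneCharDemo
{
    // The compiler turns this escape into one char, U+FFFD.
    public const string Escaped = "\xFFFD";

    // Verbatim string: the six characters \ x F F F D, as a user would type them.
    public const string Typed = @"\xFFFD";

    public static void Main()
    {
        Console.WriteLine(Escaped.Length);            // 1
        Console.WriteLine(Typed.Length);              // 6
        Console.WriteLine(Typed.Contains(Escaped));   // False
        Console.WriteLine((int)Escaped[0] == 0xFFFD); // True
    }
}
```

So typing the escape sequence as text is harmless; the collision only happens if the actual U+FFFD character reaches your input, for example through copy and paste.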

What if user says "I'm gonna troll you so badly let's say \xFFFD", it will be there in data?

If a user manages to get it entered, sure. It'll be there.

But if you're going to delimit text with text, you'll always have the "what if I get a delimiter embedded in content" problem. This has been a fact of life for a long time: C strings can't have embedded zeros because they use it to mark the end of the text.

When \xFFFD is written into file, is it treated as string of 6 characters or actually one character that is unreadable to users?

That depends on which encoding scheme you use. With the .NET I/O library, you always have the option to specify the encoding, although there are some defaults so you don't have to when it's not needed.

Examples:
* ASCIIEncoding will write one byte, but it won't be that character. It substitutes '?' (0x3F).
* UTF8Encoding will write the three-byte sequence EF BF BD
* UTF32Encoding always writes four bytes for each character.
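The list above can be checked with a short sketch. The Hex helper is hypothetical, and Encoding.Unicode (UTF-16, little-endian) is added only for comparison with the in-memory form:

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Hypothetical helper: hex dump of the bytes an encoding produces for the text.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string s = "\xFFFD";
        Console.WriteLine(Hex(Encoding.ASCII, s));   // 3F (a question mark)
        Console.WriteLine(Hex(Encoding.UTF8, s));    // EF-BF-BD
        Console.WriteLine(Hex(Encoding.Unicode, s)); // FD-FF
        Console.WriteLine(Hex(Encoding.UTF32, s));   // FD-FF-00-00
    }
}
```

GetBytes does not emit a byte order mark, so a file written through StreamWriter may have a few extra bytes at the start depending on the encoding instance used.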

And won't string that I mentioned earlier mess this all up?

If the delimiter is properly encoded, you'll get an extra delimiter. If your code that processes the string isn't written carefully, unexpected things may happen.

Otherwise, it depends on the decoder you use. For example, the ASCII decoder reads each byte as ASCII and turns anything outside the 7-bit range into '?', and the UTF-8 decoder can either throw an exception or substitute the replacement character for the bad bytes, at your discretion.
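Here is a rough sketch of how decoding badly encoded bytes behaves, using a made-up three-byte input where the middle byte (0xFF) can never appear in well-formed UTF-8. The class and method names are only for illustration:

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // 'A', an invalid byte, 'B'.
    static readonly byte[] Bad = { 0x41, 0xFF, 0x42 };

    // throwOnInvalidBytes = false: the bad byte is replaced with U+FFFD.
    public static string LenientUtf8()
    {
        return new UTF8Encoding(false, false).GetString(Bad);
    }

    // throwOnInvalidBytes = true: decoding fails with DecoderFallbackException.
    public static bool StrictUtf8Throws()
    {
        try
        {
            new UTF8Encoding(false, true).GetString(Bad);
            return false;
        }
        catch (DecoderFallbackException)
        {
            return true;
        }
    }

    // The ASCII decoder turns bytes above 0x7F into '?'.
    public static string Ascii()
    {
        return Encoding.ASCII.GetString(Bad);
    }
}
```

One thing worth noticing: the lenient UTF-8 decoder produces U+FFFD itself, so if \xFFFD is your delimiter, a corrupted file could introduce delimiters that nobody typed. That is an argument for the strict decoder, or for a different delimiter.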

Sure, I could write this file now and make this character print and read from it, but further functions of my program are separating, encrypting, reading and "moving forward"

Short answer: Pick an encoding and use it consistently.

I need to know how this character is stored in memory

UTF-16. But all anyone really needs to know is everything's a char. The worst that can happen is you get some gibberish input. What happens after that is up to you.

since my bells are ringing HEAVILY at this flaw (if any).

Short answer: No flaw. It's not that kind of string.

Edited 1 Year Ago by gusano79

Now I'm confused.

Is there a character, that can be written/read by system (C#/.NET), but can't be written by any standard keyboard (European, Nordic, Slavic, Cyrillic, Arabic, Mongolian, African, Greek and Mandarin). What I need this for is separation of characters.

Something like '\xFFFD' ("Replacement Character") might be the closest to what you're looking for.

So, I assumed that \xFFFD is a character that can't be written by anything but system and further notes says that \xFFFD displays as ?.

What if user says "I'm gonna troll you so badly let's say \xFFFD", it will be there in data?

If a user manages to get it entered, sure. It'll be there.

But you have said that user can't write \xFFFD and reading \xFFFD will result in reading a \x3F (question mark). And then again, you say:

If the delimiter is properly encoded, you'll get an extra delimiter.

Wouldn't that take me to point of start, when I actually try to find proper delimiter? And then this "extra delimiter", the "extra delimiter" now is \xFFFD, how many delimiters would I actually need? I have now 4: ",", ";", "|" and \xFFFD.

Edited 1 Year Ago by RikTelner

So, I assumed that \xFFFD is a character that can't be written by anything but system

No, it's just one that's highly unlikely to appear in anyone's data. As far as I know, nobody has a keyboard that has a key for it, but there are ways to type any Unicode character, so you can't assume it'll never happen.

and further notes says that \xFFFD displays as ?.

Read carefully. That only happens if you write it out using .NET's ASCII encoding scheme.

But you have said that user can't write \xFFFD and reading \xFFFD will result in reading a \x3F (question mark). And then again, you say:

I haven't, and again, the '?' thing is specific to ASCIIEncoding.

Wouldn't that take me to point of start, when I actually try to find proper delimiter?

There's no such thing as a "proper" delimiter. No matter which character you choose, it's possible for it to appear as part of the content you're reading. There's no "safe" character that is guaranteed to not be used.

And then this "extra delimiter", the "extra delimiter" now is \xFFFD

What I meant was that if someone embeds \xFFFD in their data and you try to read it, your code will interpret it as a field delimiter and not as content. This is known as delimiter collision.
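A minimal sketch of a collision, assuming a naive join/split with no escaping (Pack and Unpack are hypothetical names): three fields go in, and because one of them contains the delimiter as content, four come out.

```csharp
using System;

public static class CollisionDemo
{
    public const char Delimiter = '\xFFFD';

    // Naive join: no escaping, so it trusts that content never contains the delimiter.
    public static string Pack(params string[] fields)
    {
        return string.Join(Delimiter.ToString(), fields);
    }

    public static string[] Unpack(string record)
    {
        return record.Split(Delimiter);
    }

    public static void Main()
    {
        // The second field contains the delimiter as ordinary content.
        string record = Pack("alice", "x" + Delimiter + "y", "bob");
        string[] parts = Unpack(record);
        Console.WriteLine(parts.Length); // 4
    }
}
```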

how many delimiters would I actually need? I have now 4: ",", ";", "|" and \xFFFD.

I would only use one delimiter. Adding more won't make the collision issue go away.

Then I'm at the point of mashing my face into a wall. So, there's no way to do anything about user messing up the code using "false" delimiter? I thought someone invented solution for this, to prevent "C#-file-parsing-string-splitting-injection". As there's something like mysqli_real_escape_string(), which prevents the user from doing SQL injection with delimiters.

Edited 1 Year Ago by RikTelner

So, there's no way to do anything about user messing up the code using "false" delimiter? I thought someone invented solution for this

If you're talking about delimiter collision, this is a well-known problem that isn't language- or platform-specific. There are various ways you can reduce that risk, and they can be effective if you have some control over user input.
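One way to sidestep the problem entirely, sketched here as an assumption about what might suit your case rather than a recommendation from the links, is to not delimit at all and store each field with its length in front. BinaryWriter.Write(string) already does this: it writes a length prefix followed by the encoded bytes. The class and method names below are made up.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixSketch
{
    // Writes the field count, then each field as a length-prefixed UTF-8 string.
    public static byte[] Pack(params string[] fields)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.UTF8))
            {
                writer.Write(fields.Length);
                foreach (string field in fields)
                {
                    writer.Write(field);
                }
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    // Reads the fields back; no character in the content can be mistaken for a separator.
    public static string[] Unpack(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            int count = reader.ReadInt32();
            var fields = new string[count];
            for (int i = 0; i < count; i++)
            {
                fields[i] = reader.ReadString();
            }
            return fields;
        }
    }
}
```

Since you plan to encrypt the data anyway, you will be working with bytes at that point, so a binary layout like this may fit in without much extra effort.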

As there's something like mysqli_real_escape_string(), which prevents the user from doing SQL injection with delimiters.

There are some string-escaping functions in the .NET Framework library, but they're aimed at specific situations like URL encoding. There isn't anything for delimited text files, probably because there is no consistently-used standard for formatting them.
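That said, one of those URL-oriented functions can be borrowed for this job. The sketch below is an illustration under that assumption, not a standard format: Uri.EscapeDataString percent-encodes everything except letters, digits and a few punctuation marks, so a '|' inside a field becomes %7C and can no longer be confused with a '|' separator. Pack and Unpack are hypothetical names.

```csharp
using System;

public static class PercentEscapeSketch
{
    // Escape each field, then join with '|'; escaped fields cannot contain a raw '|'.
    public static string Pack(params string[] fields)
    {
        var escaped = new string[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            escaped[i] = Uri.EscapeDataString(fields[i]);
        }
        return string.Join("|", escaped);
    }

    // Split on '|' first, then undo the escaping of each field.
    public static string[] Unpack(string record)
    {
        string[] parts = record.Split('|');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = Uri.UnescapeDataString(parts[i]);
        }
        return parts;
    }

    public static void Main()
    {
        Console.WriteLine(Pack("a|b", "100%", "c")); // a%7Cb|100%25|c
    }
}
```

The '%' character is escaped too (as %25), which is what makes the round trip unambiguous. I believe some framework versions limit how long a string EscapeDataString accepts, so check that if your fields can be large.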

It should be straightforward enough to write your own using one of the solutions I linked above. A common approach for CSV is to quote fields that contain embedded commas, and use two quote characters for embedded quote characters. This can get a little interesting to parse.
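Here is a rough sketch of that CSV-style quoting, written for this thread rather than taken from any library, so treat the names and the exact rules as assumptions. Fields containing a comma, a quote or a line break get wrapped in quotes, and embedded quotes are doubled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvSketch
{
    // Quote a field only if it needs it; double any embedded quote characters.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string Join(IEnumerable<string> fields)
    {
        var escaped = new List<string>();
        foreach (string field in fields)
        {
            escaped.Add(Escape(field));
        }
        return string.Join(",", escaped);
    }

    // Parse one record back into fields.
    public static List<string> Parse(string record)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < record.Length; i++)
        {
            char c = record[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < record.Length && record[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote means one literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());
        return fields;
    }
}
```

The "interesting" part is the lookahead in Parse: inside a quoted field, a quote followed by another quote is content, while a lone quote ends the field. This sketch does not reject malformed input such as a missing closing quote.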
