Labdabeta 182 Posting Pro in Training Featured Poster

Hello,

I have posted on this forum twice before about a particular pair of non-printing characters being inserted into my code. You can find the threads here and here. Two great posters (mike_2000_17 and deceptikon) tried their best to solve the problem, to no avail.

Finally, using the hints that their responses provided, and a great deal of unhealthy obsession, I dug to the root of the problem. In doing so I uncovered a sort of 'generic response' to finding these kinds of errors. I figured I would share that methodology here.

Step 1: Isolate the characters
In my case, I got a stray /302 followed by a stray /206 in my code, so my characters were /302/206.
Another example is mike_2000_17's stray /302/240. Let's analyze that as well.

Step 2: Convert to Hex
Most tools work with the hexadecimal, not the octal, representation of a character, so convert your characters to hex for ease of verification. Note that the /### notation represents an octal number, so each digit represents 3 bits. Also note that the first digit is never more than 3, so each group of three digits represents exactly one byte. The two bytes together can then be written as four hexadecimal digits.
/302/206 becomes 0xC286
/302/240 becomes 0xC2A0
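
If you would rather not do the conversion by hand, here is a minimal Python sketch of it. The function name octal_escapes_to_hex is my own invention for illustration, and it assumes the escapes are always written as three octal digits after a slash or backslash.

```python
import re

def octal_escapes_to_hex(text):
    # Accept either slash style, since different tools print the
    # escapes differently. Each escape is three octal digits, and the
    # first digit is at most 3 so the value fits in one byte.
    groups = re.findall(r"[\\/]([0-3][0-7]{2})", text)
    # int(g, 8) parses the octal digits; 02X prints two hex digits.
    return "".join(f"{int(g, 8):02X}" for g in groups)

print(octal_escapes_to_hex("/302/206"))   # C286
print(octal_escapes_to_hex(r"\302\240"))  # C2A0
```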

Step 3: Check Unicode (UTF-16)
This step reads the two bytes as a single 16-bit number. There are many tables and automatic converters from hex to Unicode. You can always plug the values into Wolfram|Alpha as U+XXXX, where XXXX is your hexadecimal number.
In both of these cases, the result is a Korean (Hangul) syllable, which is not likely to be the problem.
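
The same lookup can be done offline with Python's standard unicodedata module. This is only a sketch: describe_as_utf16 is a made-up helper name, and it assumes the two bytes should be read big-endian (first byte is the high byte), which is how the hex values above are written.

```python
import unicodedata

def describe_as_utf16(hex_string):
    # Read the bytes as big-endian UTF-16, i.e. one 16-bit code unit
    # per pair of bytes.
    text = bytes.fromhex(hex_string).decode("utf-16-be")
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]

for stray in ("C286", "C2A0"):
    print(stray, describe_as_utf16(stray))
```

Both names come back starting with HANGUL SYLLABLE, which matches what the online lookup shows.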

Step 4: Check UTF-8
UTF-8 does not read the two bytes as one 16-bit number. A first byte in the range C2 to DF in hexadecimal, as is the case with our tests, is the lead byte of a two-byte sequence. The lead byte has the bit pattern 110xxxxx, the second byte has the pattern 10yyyyyy, and the code point is the 11-bit number xxxxxyyyyyy. When the lead byte is C2, this works out to exactly the value of the second byte, so the C2 looks as if it is simply discarded. (The results U+0080 to U+009F are the C1 control codes, see http://en.wikipedia.org/wiki/C0_and_C1_control_codes. If the lead byte is C3 instead, add 0x40 to the second byte.) Now we merely examine the last byte as a plain Unicode code point. Back to Wolfram|Alpha with U+00XX, and we find that mike_2000_17 is correct: his character combination represents a non-breaking space, commonly generated on many layouts (including Canadian French and US Dvorak) via the key combination SHIFT+SPACE.

My characters are a bit more interesting. The character code U+0086 corresponds to "start of selected area". As the above Wikipedia article explains, this code is an out-of-date holdover from the days of block-oriented terminals. My issue was caused by my gaming mouse in conjunction with clipboard enhancement software. Whenever I used the mouse to switch windows into an editable text region while text was selected in the source window, the "start of selected area" signal was processed and inserted.
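
To double check the UTF-8 reading without a website, here is a rough Python sketch. The helper names are hypothetical. A two-byte UTF-8 sequence keeps its code point in the low 5 bits of the first byte and the low 6 bits of the second, so the first function does that arithmetic by hand and the second lets Python's own decoder do it. Note that unicodedata has no name for control characters such as U+0086, so the category (Cc means control) is the useful clue there.

```python
import unicodedata

def two_byte_code_point(lead, trail):
    # Only valid for a two-byte UTF-8 sequence: 110xxxxx 10yyyyyy.
    if not (0xC2 <= lead <= 0xDF and 0x80 <= trail <= 0xBF):
        raise ValueError("not a two-byte UTF-8 sequence")
    return ((lead & 0x1F) << 6) | (trail & 0x3F)

def describe_as_utf8(hex_string):
    # Let Python decode the bytes, then report each character.
    text = bytes.fromhex(hex_string).decode("utf-8")
    return [
        (f"U+{ord(ch):04X}",
         unicodedata.category(ch),
         unicodedata.name(ch, "<no name>"))
        for ch in text
    ]

print(hex(two_byte_code_point(0xC2, 0x86)))  # 0x86
print(describe_as_utf8("C286"))  # [('U+0086', 'Cc', '<no name>')]
print(describe_as_utf8("C2A0"))  # [('U+00A0', 'Zs', 'NO-BREAK SPACE')]
```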

tl;dr: Convert each byte of your stray characters into hexadecimal, then check U+#### (both bytes read as one number) and U+00## (the UTF-8 reading, which applies when the first byte is C2) to see if one of the results makes sense. Searching for the resulting character name often turns up the cause (e.g. searching for "non-breaking space in code" shows that Shift+Space can cause it).
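
If you cannot see where the stray characters are hiding, something like the following Python sketch can point at them. It is not part of the method above, just a convenience. find_non_ascii is a name I made up, and it assumes the source file is saved as UTF-8 (bytes that are not valid UTF-8 will show up as U+FFFD).

```python
def find_non_ascii(source_bytes):
    hits = []
    # Split on the raw newline byte first; 0x0A never occurs inside a
    # multi-byte UTF-8 sequence, so line numbers stay accurate.
    for line_no, raw in enumerate(source_bytes.split(b"\n"), start=1):
        line = raw.decode("utf-8", errors="replace")
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

sample = "int x\u00a0= 1;\nint y = 2;\u0086\n".encode("utf-8")
print(find_non_ascii(sample))
# [(1, 6, 'U+00A0'), (2, 11, 'U+0086')]
```

For a real file you would pass in open(path, "rb").read() instead of the sample bytes.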
