
newbie: how do I test a byte string?

How do I test a byte string in Python? I want to convert a UTF-8 string into UTF-16 by hand (no codec libraries or built-in conversion functions).

My basic plan is to read some number of UTF-8 bytes from the stream, decode them into code points, and then encode those code points as UTF-16 bytes. I want to code this myself, but I don't understand how to examine the actual byte sequence.
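The last step of that plan, turning code points into UTF-16 bytes, might look something like the sketch below. It is written for Python 3 (the thread's own code is Python 2), and the function name `code_points_to_utf16` and its `big_endian` flag are hypothetical names chosen for illustration. It writes no byte order mark.

```python
def code_points_to_utf16(code_points, big_endian=True):
    """Encode an iterable of Unicode code points as UTF-16 bytes (no BOM).

    Hypothetical helper for illustration, not a library API.
    """
    out = bytearray()

    def put_unit(unit):
        # Append one 16-bit code unit in the chosen byte order.
        hi, lo = unit >> 8, unit & 0xFF
        out.extend((hi, lo) if big_endian else (lo, hi))

    for cp in code_points:
        # Reject values outside Unicode and lone surrogate code points.
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a valid Unicode scalar value: %r" % cp)
        if cp < 0x10000:
            # Basic Multilingual Plane: one code unit equal to the code point.
            put_unit(cp)
        else:
            # Supplementary planes: split the remaining 20 bits into a
            # high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
            v = cp - 0x10000
            put_unit(0xD800 | (v >> 10))
            put_unit(0xDC00 | (v & 0x3FF))
    return bytes(out)
```

For example, `code_points_to_utf16([0x1F600])` should give the surrogate pair `b'\xd8=\xde\x00'` (that is, D8 3D DE 00), which you can compare against Python's own `'\U0001F600'.encode('utf-16-be')` while testing.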

Let's say I use the following code (from Evan Jones' Scratch Pad: http://evanjones.ca/python-utf8.html) to make sure I have UTF-8 encoded bytes:

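s = "hello normal string"        # Python 2: a str is already a byte string
u = unicode(s, "utf-8")          # decode those bytes into a unicode object
backToBytes = u.encode("utf-8")  # encode it back into UTF-8 bytes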


Now I need to test the lead byte of each character's byte sequence in backToBytes, right? Is there a function that does this? Any help would be appreciated.
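Testing the lead byte comes down to checking its high bits with a mask. Below is a minimal Python 3 sketch (in Python 3, indexing a `bytes` object gives an int; in the Python 2 code above you would need `ord(backToBytes[i])`). The names `sequence_length` and `utf8_to_code_points` are hypothetical. This decoder checks lead and continuation bytes but, for brevity, does not reject overlong forms or encoded surrogates, which a strict decoder should.

```python
def sequence_length(lead):
    """Return the length of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:             # 0xxxxxxx: ASCII, one byte
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx: two-byte sequence
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx: three-byte sequence
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx never appears in UTF-8.
    raise ValueError("invalid lead byte 0x%02x" % lead)


def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list of code points by hand."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        # Keep the payload bits of the lead byte (7, 5, 4 or 3 bits).
        cp = lead & (0x7F, 0x1F, 0x0F, 0x07)[n - 1]
        for b in data[i + 1:i + n]:
            if (b & 0xC0) != 0x80:  # each continuation byte must be 10xxxxxx
                raise ValueError("bad continuation byte 0x%02x" % b)
            cp = (cp << 6) | (b & 0x3F)  # append its 6 payload bits
        code_points.append(cp)
        i += n
    return code_points
```

Each continuation byte contributes six bits, so a two-byte sequence carries 5 + 6 = 11 bits and a four-byte one 3 + 18 = 21 bits, enough for every code point up to U+10FFFF.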

ChrisP_Buffalo
Newbie Poster
20 posts since Mar 2008
Reputation Points: 10
Solved Threads: 0
 

I guess I get to solve my own thread (thanks again to the Natural Language Toolkit's online tutorial). The built-in function repr() appears to give me what I need, because it shows each non-ASCII byte as an escape:

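line = u'\u0144'                   # U+0144 LATIN SMALL LETTER N WITH ACUTE
line_utf = line.encode('utf8')     # its two UTF-8 bytes, C5 84

print 'line =', line_utf           # raw bytes; how they look depends on the terminal
print 'line repr', repr(line_utf)  # escaped form shows every byte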


Output:
line = ń
line repr '\xc5\x84'

It's the '\xc5\x84' part that I needed.
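To connect those two bytes back to the code point, it can help to print them in binary. The short Python 3 sketch below (where `repr` of a byte string gains a `b` prefix) is only a worked example: the lead byte 11000101 marks a two-byte sequence and keeps five payload bits, the continuation byte 10000100 keeps six, and together they rebuild 0x144.

```python
data = '\u0144'.encode('utf-8')         # U+0144 LATIN SMALL LETTER N WITH ACUTE
print(repr(data))                       # b'\xc5\x84'
print([hex(b) for b in data])           # ['0xc5', '0x84']
print([format(b, '08b') for b in data]) # ['11000101', '10000100']

# 110|00101 -> lead byte of a 2-byte sequence, payload 00101
# 10|000100 -> continuation byte, payload 000100
cp = ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)
print(hex(cp))                          # 0x144
```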

