newbie: how do I test a byte string?

ChrisP_Buffalo 0 Newbie Poster

16 Years Ago

How do I test a byte string in Python? I want to manually convert (no libraries or functions) a UTF-8 string into UTF-16.

My basic plan is to read some number of UTF-8 bytes from the stream, convert them into code points, and then convert those code points into UTF-16 bytes. I want to code this myself, but I don't understand how to inspect the actual byte sequence.
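A minimal sketch of the second half of that plan (code points to UTF-16 bytes), written for Python 3 and assuming big-endian output with no byte order mark. The function name `code_points_to_utf16be` is hypothetical. Code points up to U+FFFF become a single 16-bit code unit; anything above that is split into a surrogate pair.

```python
def code_points_to_utf16be(code_points):
    """Encode an iterable of Unicode code points as UTF-16 big-endian bytes.

    Hypothetical helper: no BOM is written, and code points above U+FFFF
    are split into a high/low surrogate pair.
    """
    out = bytearray()
    for cp in code_points:
        # Reject values that are out of range or are themselves surrogates.
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a valid Unicode scalar value: %X" % cp)
        if cp <= 0xFFFF:
            units = [cp]                      # BMP: one 16-bit code unit
        else:
            v = cp - 0x10000                  # leaves a 20-bit value
            units = [0xD800 | (v >> 10),      # high surrogate: top 10 bits
                     0xDC00 | (v & 0x3FF)]    # low surrogate: bottom 10 bits
        for unit in units:
            out.append(unit >> 8)             # big-endian: high byte first
            out.append(unit & 0xFF)
    return bytes(out)
```

To check a hand-written converter, `code_points_to_utf16be([ord(c) for c in 'hi \u0144'])` should give the same bytes as `'hi \u0144'.encode('utf-16-be')`. Python's plain `'utf-16'` codec differs in that it prepends a BOM.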

Let's say I use the following code to make sure I have a UTF-8 encoded byte string (from Evan Jones' Scratch Pad: http://evanjones.ca/python-utf8.html):

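s = "hello normal string"        # in Python 2, a plain str holds bytes
u = unicode(s, "utf-8")          # decode those bytes as UTF-8 into a unicode object
backToBytes = u.encode("utf-8")  # encode the unicode object back into UTF-8 bytes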

Now, I need to test the lead byte of the sequence for each character in "backToBytes", right? Is there a function that does this? Any help would be appreciated.
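Python doesn't need a special function for this: plain bit masks do the job. A lead byte's high bits say how long the sequence is (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx), and every continuation byte looks like 10xxxxxx. Below is a minimal Python 3 sketch with hypothetical function names. As an assumed simplification, it rejects malformed lead and continuation bytes but does not check for overlong encodings or encoded surrogates. In Python 3, indexing a `bytes` object gives an int; in Python 2, indexing a `str` gives a one-character string, so there you would need `ord(backToBytes[i])`.

```python
def utf8_sequence_length(lead):
    """Return the length of a UTF-8 sequence, judging only by its lead byte."""
    if (lead & 0x80) == 0x00:   # 0xxxxxxx: ASCII, 1 byte
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx: 2-byte sequence
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx: 3-byte sequence
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("0x%02X is not a valid lead byte" % lead)


def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list of code points (minimal checks only)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]                       # an int in Python 3
        n = utf8_sequence_length(lead)
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        # Keep the payload bits of the lead byte: 7, 5, 4 or 3 bits.
        cp = lead & (0x7F, 0x1F, 0x0F, 0x07)[n - 1]
        for b in data[i + 1:i + n]:
            if (b & 0xC0) != 0x80:           # must look like 10xxxxxx
                raise ValueError("bad continuation byte after offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)      # append 6 payload bits
        code_points.append(cp)
        i += n
    return code_points
```

For ASCII text like "hello normal string", every byte is its own one-byte sequence, so `utf8_to_code_points` returns one code point per byte. For `'\u0144'.encode('utf-8')` (bytes 0xC5 0x84), the lead byte 0xC5 reports a 2-byte sequence, and the result is `[0x144]`.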


ChrisP_Buffalo 0 Newbie Poster · Answer 1 · 2008-08-01T02:03:43+00:00

I guess I get to solve my own thread (thanks again to the Natural Language Toolkit's online tutorial). The built-in repr() function appears to give me what I need:

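line = u'\u0144'                    # unicode string containing U+0144 (ń)
line_utf = line.encode('utf8')      # UTF-8 byte string, two bytes long

print 'line = ', line_utf           # sends the raw bytes to the console
print 'line repr ', repr(line_utf)  # shows each byte as a \x escape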

Output:
line = Å„
line repr '\xc5\x84'

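It's the '\xc5\x84' part that I needed: those are the two UTF-8 bytes for ń (U+0144). The "Å„" on the first output line is the same two bytes as a console using the Windows-1252 code page would display them, rather than as UTF-8.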
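For anyone reading this with Python 3, the same inspection still works, but the types have changed: `str` is already Unicode, `encode()` returns a `bytes` object, and iterating over `bytes` yields integers, which is convenient for lead-byte tests. Formatting each byte in binary makes the 110xxxxx / 10xxxxxx pattern visible. This is a small sketch, not taken from the original posts.

```python
line = '\u0144'                  # in Python 3, str is already Unicode
line_utf = line.encode('utf-8')  # bytes object holding the UTF-8 encoding

print(repr(line_utf))                        # b'\xc5\x84'
print(list(line_utf))                        # [197, 132]
print([format(b, '08b') for b in line_utf])  # ['11000101', '10000100']
print(line_utf.hex())                        # c584
```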