Sep 11th, 2009

a[0] != a[0:1] ??

Why does this make sense in 3.1?

    >>> a = b'\x01\x02'
    >>> a[0]
    1
    >>> a[0:1]
    b'\x01'
    >>> a[0] == a[0:1]
    False
    >>> a = '\x01\x02'
    >>> a[0]
    '\x01'
    >>> a[0:1]
    '\x01'
    >>> a[0] == a[0:1]
    True

Shouldn't we get True for both comparisons?
Posted by foosion

Sep 11th, 2009

Re: a[0] != a[0:1] ??

No. b'\x01\x02' is a byte string in Python. The expression bs[x] means that you want the byte at position x in the byte string bs, while bs[a:b] means that you want the part of the byte string from a up to (but not including) b. Note the difference: indexing gives a single byte as a number (a long, which Python 3 simply calls int), and slicing gives a byte string. bs[0] != bs[0:1] because the two results have different types, and a number never compares equal to a byte string.
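
A minimal sketch of the distinction, assuming Python 3. The name bs follows the post; the other names are illustrative and not from the original thread:

```python
# Python 3: indexing bytes yields an int, slicing yields bytes.
bs = b'\x01\x02'

first_byte = bs[0]      # an int in the range 0..255
first_slice = bs[0:1]   # a bytes object of length 1

print(type(first_byte).__name__)    # int
print(type(first_slice).__name__)   # bytes

# An int and a bytes object never compare equal, even when they
# carry the same underlying value.
print(first_byte == first_slice)    # False

# To compare like with like, convert one side explicitly.
print(first_byte == first_slice[0])          # True (int vs int)
print(bytes([first_byte]) == first_slice)    # True (bytes vs bytes)
```

Either conversion works; which one reads better probably depends on whether the surrounding code thinks in numbers or in byte strings.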
Last edited by scru; Sep 11th, 2009 at 9:16 pm.
Posted by scru

Sep 11th, 2009

Re: a[0] != a[0:1] ??

Is this in the documentation somewhere?

    >>> a = b'\x01\x02'
    >>> type(a[0])
    <class 'int'>
    >>> type(a[0:1])
    <class 'bytes'>
    >>> a = '\x01\x02'
    >>> type(a[0])
    <class 'str'>
    >>> type(a[0:1])
    <class 'str'>

Note that in the byte string version, a[0] and a[0:1] return different types, while in the regular string version, both return the same type. Why does it make sense to treat the two cases differently?

I would have expected Python to be more consistent.

Here's a similar example:
    >>> s = "hello"
    >>> type(s)
    <class 'str'>
    >>> b = s.encode()
    >>> type(b)
    <class 'bytes'>
    >>> s[0]
    'h'
    >>> b[0]
    104
    >>> s[0:1]
    'h'
    >>> b[0:1]
    b'h'
Last edited by foosion; Sep 11th, 2009 at 11:02 pm.
Posted by foosion

Sep 12th, 2009

Re: a[0] != a[0:1] ??

I happen to think this behavior is consistent (with indexing and slicing rules).

Here's why it makes sense:

bytes and str are not the same. They aren't even conceptually the same.

A bytes (byte string) is a sequence of bytes (I assume you know what a byte is, and that it isn't a "character"). It is data, not text. A string on the other hand is a sequence of characters and is text.

As I mentioned earlier, indexing a byte string gives you a byte (it's a sequence of bytes, so this makes perfect sense). Since a byte is a numeric type (and not a character), what you get is a number (a long, which Python 3 simply calls int). Slicing a sequence gives you the portion of the sequence from a up to (but not including) b as a new sequence. Note that slicing a sequence always gives a sequence. Why? Because you are asking for a portion of the sequence and not just a single element. This is why the result is a byte string and not a number. You may notice that while the two results hold the same data (in essence, at least), their types (and representations) are completely different, and reasonably so. This is why b[0] != b[0:1] where b is a byte string.

str, on the other hand, is a sequence of characters (text). Indexing a str (which I'll call a string from now on) gives you a character (makes perfect sense again). However, Python has no separate character type: a character is just a str of length one, so indexing a string gives you another string. Slicing a string gives you a portion of the string as a new string. If you slice out just one element, you get a new string with just one element. This is why s[0] == s[0:1] where s is a string.
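
To illustrate the indexing and slicing rules across sequence types, here is a small sketch assuming Python 3. The sample values are made up for the comparison:

```python
# Indexing returns one element; slicing returns a new sequence
# of the same type as the original.
samples = [
    [10, 20, 30],        # list of ints
    (10, 20, 30),        # tuple of ints
    b'\x0a\x14\x1e',     # bytes: a sequence of ints 0..255
    'abc',               # str: a sequence of 1-character strs
]

for seq in samples:
    element = seq[0]     # a single element
    portion = seq[0:1]   # a one-element sequence
    same_type = type(element) is type(portion)
    print(type(seq).__name__, repr(element), repr(portion), same_type)
```

Only the str row should print True in the last column. Seen this way, bytes behaves like list and tuple, and str is the special case because its elements are themselves strings.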
Last edited by scru; Sep 12th, 2009 at 8:05 am.
Posted by scru

Sep 12th, 2009

Re: a[0] != a[0:1] ??

One byte is a number, two bytes is a string of bytes. One str element is a string, two string elements is a string. If you accept that, it all makes sense. However, it seems to have changed from 2.6 to 3.1.

The reason I got into this was trying to port a 2.6 app to 3.1. The code reads some bytes from a file, then examines the bytes.

    f = open(filename, 'rb')
    data = f.read(12)
    if data[0:2] == '\xFF\xD8':
    if data[2] == '\xFF' and data[6:10] == 'Exif':

Note that both single bytes and the sequence of bytes are treated the same, and that they don't require any 'casting' for the comparisons.

This works in 2.6, but fails in 3.1. 3.1 requires
    if data[0:2] == b'\xFF\xD8':
    if data[2] == ord('\xFF') and data[6:10] == b'Exif':

Note that the single byte is treated differently than the sequence, and that we need to add 'b' and 'ord'.
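
One possible way to sidestep the special case, sketched under assumptions: take a one-byte slice instead of an index, so that every comparison is byte string against byte string. The function name and the sample header are hypothetical, and combining the two checks into a single expression is a choice made for this example, not the structure of the original app:

```python
def looks_like_exif_jpeg(data):
    """Return True if the leading bytes resemble a JPEG with an Exif header.

    Hypothetical helper: it mirrors the checks in the post, but uses only
    slices so every comparison is bytes against bytes.
    """
    return (
        data[0:2] == b'\xFF\xD8'      # JPEG start-of-image marker
        and data[2:3] == b'\xFF'      # one-byte slice instead of data[2]
        and data[6:10] == b'Exif'
    )


# Made-up 12-byte header for demonstration only.
sample = b'\xFF\xD8\xFF\xE1\x00\x10Exif\x00\x00'
print(looks_like_exif_jpeg(sample))          # True
print(looks_like_exif_jpeg(b'not a jpeg'))   # False
```

Because Python 2.6 accepts b'...' literals as plain str, the same comparisons should also work there, which may help when one code base has to run on both versions.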

As an aside, haven't we eliminated longs in 3.1, so that all numbers are of type int?
Posted by foosion

Sep 12th, 2009

Re: a[0] != a[0:1] ??

bytes, as they exist in Python 3.1, are conceptually new to Python (starting with version 3). This isn't to say that 8-bit strings didn't exist before, in the form of regular text strings. Note the distinction: each element in an 8-bit Python 2.x string (str) is treated as a character, not a byte, yielding the same slicing and indexing behavior as str in Python 3.1. Note that the bytes type in Python 2.6 is a synonym for str, and there is no "true" bytes type in that version. I think this was done to ease the transition to Python 3.

Responding to your aside, long wasn't really removed. In a way, int was removed and long was renamed to int. More accurately, int in Python 3 now behaves like long in Python 2, and uses the underlying PyLong_Type.
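
A quick way to see the int/long point for yourself, as a sketch assuming Python 3. The variable names are illustrative:

```python
import builtins
import sys

small = 1
big = 2 ** 100   # far beyond the range of a 64-bit machine integer

# Both values share the single integer type.
print(type(small).__name__)   # int
print(type(big).__name__)     # int

# The value exceeds sys.maxsize, yet no separate long type is needed.
print(big > sys.maxsize)      # True

# The name long no longer exists as a builtin in Python 3.
print(hasattr(builtins, 'long'))   # False
```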
Last edited by scru; Sep 12th, 2009 at 10:14 am.
Posted by scru

Sep 12th, 2009

Re: a[0] != a[0:1] ??

Very helpful. Thanks.

Now, if you could just explain my email problems, life would be complete
http://www.daniweb.com/forums/thread210744.html. Also see http://www.daniweb.com/forums/thread213686.html
Posted by foosion