a[0] != a[0:1] ??

Thread Solved

#1 foosion, Sep 11th, 2009
Why does this make sense in 3.1?

    >>> a = b'\x01\x02'
    >>> a[0]
    1
    >>> a[0:1]
    b'\x01'
    >>> a[0] == a[0:1]
    False
    >>> a = '\x01\x02'
    >>> a[0]
    '\x01'
    >>> a[0:1]
    '\x01'
    >>> a[0] == a[0:1]
    True

Shouldn't we get True for both comparisons?

#2 scru, Sep 11th, 2009
No. b'\x01\x02' is a byte string in Python. The expression bs[x] means that you want the byte at position x in the byte string bs, while bs[a:b] means that you want the part of the byte string from a to (but not including) b. Note the difference: indexing gives a single byte (as an int), and slicing gives a byte string. The reason bs[0] != bs[0:1] is that they are different types.
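
Here is a minimal sketch of that difference, along with two ways to make the comparison come out True. The name bs follows the explanation above; the other variable names are only illustrative, and this assumes Python 3.

```python
bs = b'\x01\x02'

first_item = bs[0]      # indexing: a single byte, returned as an int
first_slice = bs[0:1]   # slicing: a bytes object of length 1

# The two values have different types, so they do not compare equal.
types_differ = type(first_item) is not type(first_slice)

# To compare like with like, wrap the int in bytes, or index into the slice.
same_as_bytes = bytes([first_item]) == first_slice
same_as_int = first_item == first_slice[0]

print(first_item, first_slice, types_differ, same_as_bytes, same_as_int)
```

Which form to prefer probably depends on what the surrounding code already holds: bytes([n]) when you have an int, a one-element slice when you have the byte string.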

#3 foosion, Sep 11th, 2009
Is this in the documentation somewhere?

    >>> a = b'\x01\x02'
    >>> type(a[0])
    <class 'int'>
    >>> type(a[0:1])
    <class 'bytes'>
    >>> a = '\x01\x02'
    >>> type(a[0])
    <class 'str'>
    >>> type(a[0:1])
    <class 'str'>

Note that in the byte string version, a[0] and a[0:1] return different types, while in the regular string version, both return the same type. Why does it make sense to treat the two cases differently?

I would have expected python to be more consistent.

Here's a similar example:
    >>> s = "hello"
    >>> type(s)
    <class 'str'>
    >>> b = s.encode()
    >>> type(b)
    <class 'bytes'>
    >>> s[0]
    'h'
    >>> b[0]
    104
    >>> s[0:1]
    'h'
    >>> b[0:1]
    b'h'

#4 scru, Sep 12th, 2009
I happen to think this behavior is consistent (with indexing and slicing rules).

Here's why it makes sense:

bytes and str are not the same. They aren't even conceptually the same.

A bytes (byte string) is a sequence of bytes (I assume you know what a byte is, and that it isn't a "character"). It is data, not text. A string on the other hand is a sequence of characters and is text.

As I mentioned earlier, indexing a byte string gives you a byte (it's a sequence of bytes, so this makes perfect sense). Since a byte is a number (and not a character), what you get is an int. Slicing a sequence gives you the portion of the sequence from a to (but not including) b as a new sequence. Note that slicing a sequence always gives a sequence. Why? Because you are asking for a portion of the sequence and not just a single element. This is why the result is a byte string and not an int. You may notice that while the two results hold the same data (in essence, at least), their types (and representations) are completely different, and reasonably so. This is why b[0] != b[0:1] where b is a byte string.

str, on the other hand, is a sequence of characters (text). Indexing a str (which I'll call a string from now on) gives you a character (makes perfect sense again). However, Python has no separate character type: a character is a str with just one element, so indexing a string gives you another string. Slicing a string gives you a portion of the string as a new string. If you slice for just one element, what you get is a new string with just one element. This is why s[0] == s[0:1] where s is a string.
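
One way to see the same rule at work is to look at the elements each type hands back. This is only a sketch, assuming Python 3; s and b follow the earlier example, and the other names are illustrative.

```python
s = "hello"
b = s.encode()  # UTF-8 is the default encoding for str.encode() in Python 3

str_items = list(s)     # each element of a str is a length-1 str
bytes_items = list(b)   # each element of bytes is an int in range(256)

# Slices keep the container type in both cases.
str_slice = s[0:1]
bytes_slice = b[0:1]

# chr() maps the int back to the matching character for this ASCII text.
round_trip = chr(bytes_items[0]) == str_items[0]

print(str_items, bytes_items, repr(str_slice), bytes_slice, round_trip)
```

In both cases the slice has the type of the container; only the element type differs.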

#5 foosion, Sep 12th, 2009
One byte is a number, two bytes is a string of bytes. One str element is a string, two string elements is a string. If you accept that, it all makes sense. However, it seems to have changed from 2.6 to 3.1.

The reason I got into this was trying to port a 2.6 app to 3.1. The code reads some bytes from a file, then examines the bytes.

    f = open(filename, 'rb')
    data = f.read(12)
    if data[0:2] == '\xFF\xD8':
    if data[2] == '\xFF' and data[6:10] == 'Exif':

Note that both single bytes and the sequence of bytes are treated the same, and that they don't require any 'casting' for the comparisons.

This works in 2.6 but fails in 3.1, where comparing bytes or an int with a str is always False. 3.1 requires:
    if data[0:2] == b'\xFF\xD8':
    if data[2] == ord('\xFF') and data[6:10] == b'Exif':

Note that the single byte is treated differently than the sequence, and that we need to add 'b' and 'ord'.
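
A possible way to avoid the special case is to slice even for a single byte, so that every comparison is sequence against sequence. The sketch below assumes Python 3; the function names and the sample header are made up for illustration and are not from the original app. Since 2.6 accepts the b'' prefix as an alias for a plain str literal, the same comparisons should also hold there.

```python
def starts_with_jpeg_marker(data):
    # A slice of bytes is bytes, so it compares directly with a b'' literal.
    return data[0:2] == b'\xFF\xD8'


def has_exif_header(data):
    # data[2:3] is a one-byte slice, so no ord() is needed.
    return data[2:3] == b'\xFF' and data[6:10] == b'Exif'


# Hypothetical 12-byte header, standing in for f.read(12).
sample = b'\xFF\xD8\xFF\xE1\x00\x10Exif\x00\x00'

print(starts_with_jpeg_marker(sample), has_exif_header(sample))
```

Slicing also never raises IndexError on short input: a file with fewer than three bytes simply yields an empty slice that fails the comparison.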

As an aside, haven't we eliminated longs in 3.1, so that all numbers are of type int?

#6 scru, Sep 12th, 2009
bytes, as it exists in Python 3.1, is conceptually new to Python (starting with version 3). This isn't to say that 8-bit strings didn't exist before, in the form of regular text strings. Note the distinction: each element of an 8-bit Python 2.x string (str) is treated as a character, not a byte, which yields the same slicing and indexing behavior as str in Python 3.1. Note also that the bytes type in Python 2.6 is just a synonym for str; there is no "true" bytes type in that version. I think this was done to ease the transition to Python 3.

Responding to your aside: long wasn't really removed. In a way, int was removed and long was renamed to int. More accurately, int in Python 3 behaves like long in Python 2 and is implemented by the underlying PyLong_Type.
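
A quick way to check this from the interpreter, as a sketch assuming Python 3 (the variable names are illustrative):

```python
small = 1
big = 2 ** 100  # far beyond a 64-bit machine word

# Both values share the single int type; there is no separate long.
same_type = type(small) is int and type(big) is int

print(same_type, big.bit_length())
```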

#7 foosion, Sep 12th, 2009
Very helpful. Thanks.

Now, if you could just explain my email problems, life would be complete: http://www.daniweb.com/forums/thread210744.html. Also see http://www.daniweb.com/forums/thread213686.html