import os
data = map(lambda c: ord(c), file(args[0]).read(os.path.getsize(args[0])))

For one file, os.path.getsize(args[0]) returns 10456 while len(data) returns 281.
After looking at different files, I realized that reading always stops at the 0x1A character.
The documentation says that on Windows, Python uses _wfopen, which (for compatibility reasons) interprets 0x1A (CTRL-Z in DOS) as end-of-file.

Does anyone know how to read an entire binary file that may contain '\x1A'?


I already tried 'rb'. Just tried 'rU' but it didn't work either.
Here is more of the code related to this issue:

 open(args[0], 'rb')
 import os
 filelength=os.path.getsize(args[0]) #gives correct file size
 data = map(lambda c: ord(c), file(args[0]).read())
 mdebug(5, "File is %(filelen)d, Data is %(len)d bytes" % {'filelen': filelength, 'len': len(data)})

Problem solved. Code now reads:

            f=open(args[0], mode='rb') 
            import os
            filelength=os.path.getsize(args[0])
            data = map(lambda c: ord(c), f.read())

and works fine (not sure if whole program works on Windows yet, but this particular problem is solved).
I saw the post referred to in Gribouillis's response, but I didn't implement the suggestion correctly.
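
For anyone who wants to check the fix on their own machine, here is a minimal sketch in Python 3 syntax. The file name sample.bin, the test payload, and the helper read_byte_values are made up for illustration; the only point is that a file opened with 'rb' yields as many bytes as os.path.getsize reports, even with 0x1A in the middle.

```python
import os
import tempfile


def read_byte_values(path):
    """Return the file's contents as a list of integers in the range 0..255."""
    # 'rb' turns off text-mode translation, so 0x1A is read as ordinary data.
    with open(path, "rb") as f:
        return list(f.read())


# Build a throwaway file with a 0x1A byte in the middle (hypothetical data).
payload = b"abc\x1adef\r\n\x00\xff"
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.bin")
with open(sample_path, "wb") as f:
    f.write(payload)

data = read_byte_values(sample_path)

# Both numbers should match when the file is read in binary mode.
print(len(data), os.path.getsize(sample_path))
```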

Edited by DustinS


Notice that

from struct import unpack
s = f.read()
data = list(unpack("%dB" % len(s), s))

creates the data much faster.
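
A self-contained version of the same idea, as a sketch in Python 3 syntax. The byte string s is a stand-in for f.read() on a file opened with 'rb'. How much faster this is will depend on the interpreter and the file size, so it is worth timing on your own data.

```python
from struct import unpack

# Stand-in for f.read() on a file opened in binary mode (hypothetical data).
s = b"\x00\x1a\xffhello"

# "%dB" % len(s) builds a format string such as "8B":
# len(s) unsigned bytes, each unpacked to an int in 0..255.
data = list(unpack("%dB" % len(s), s))

# In Python 3, iterating over a bytes object already yields the same integers.
same_as_bytes = (data == list(s))

print(data)
print(same_as_bytes)
```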


With the advent of Python 3, your life is easier:

# test binary file read 

fname = "ball.png"
with open(fname, mode='rb') as f:
    data = f.read()

print(type(data))

'''
result with Python2 >>>
<type 'str'>
result with Python3  >>>
<class 'bytes'>
'''
Comments
indeed!

That's a very good remark. In Python 3, the bytes type is already a sequence of integers:

>>> s = bytes("hello", encoding="utf8")
>>> s
b'hello'
>>> s[0]
104
>>> s[1]
101
>>> list(s)
[104, 101, 108, 108, 111]

In Python 2, there is also the array type:

>>> import array
>>> x = array.array("B", "hello")
>>> list(x)
[104, 101, 108, 108, 111]
>>> x
array('B', [104, 101, 108, 108, 111])
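
For comparison, a small sketch of the same thing in Python 3, where the array initializer has to be a bytes object rather than a str. The variable raw is a stand-in for the result of f.read() in binary mode. bytearray is shown as a built-in alternative when a mutable sequence of byte values is wanted.

```python
import array

# Stand-in for the result of f.read() on a file opened with 'rb'.
raw = b"hello"

# Typecode "B" means unsigned char: one array item per input byte.
x = array.array("B", raw)

# bytearray is a built-in mutable sequence of ints in 0..255.
buf = bytearray(raw)

print(list(x))
print(list(buf))
```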

Edited by Gribouillis
