Making Python ignore EOF (0x1A) so entire binary file is read

Question

DustinS 0 Newbie Poster

12 Years Ago

import os
data = map(lambda c: ord(c), file(args[0]).read(os.path.getsize(args[0])))

For one file, os.path.getsize(args[0]) returns 10456 while len(data) returns 281.
After looking at different files, I realized that it always stops reading at the 0x1A character.
The documentation says that in Windows, Python uses _wfopen which (for compatibility reasons) interprets 0x1A (CTRL-Z in DOS) as the end-of-file.

Does anyone know how to read an entire binary file which may contain '\x1A'?

file-system python

All 6 Replies

Gribouillis 1,391 Programming Explorer

12 Years Ago

This post says to open the file in mode 'rb' or 'rU'.

Edited 12 Years Ago by Gribouillis

DustinS 0 Newbie Poster · Answer 1 · 2012-10-19T22:10:34+00:00

I already tried 'rb'. Just tried 'rU' but it didn't work either.
Here is more of the code related to this issue:

 open(args[0], 'rb')
 import os
 filelength=os.path.getsize(args[0]) #gives correct file size
 data = map(lambda c: ord(c), file(args[0]).read())
 mdebug(5, "File is %(filelen)d, Data is %(len)d bytes" % {'filelen': filelength, 'len': len(data)})

DustinS 0 Newbie Poster · Answer 2 · 2012-10-19T22:21:53+00:00

Problem solved. Code now reads:

f = open(args[0], mode='rb')
import os
filelength = os.path.getsize(args[0])
data = map(lambda c: ord(c), f.read())

and works fine (not sure if whole program works on Windows yet, but this particular problem is solved).
I saw the post referred to in Gribouillis's response, but I didn't implement the suggestion correctly.
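
For readers who want to check the fix in isolation, here is a minimal sketch, assuming Python 3 and a made-up temporary file (the payload and variable names are illustrative, not taken from the original program). It writes a few bytes that include 0x1A, reads them back in 'rb' mode, and compares the result with os.path.getsize:

```python
import os
import tempfile

# Hypothetical sample payload with bytes on both sides of 0x1A (CTRL-Z).
payload = b"before\x1aafter\r\n\x00\xff"

# Write the payload to a temporary file (opened in binary mode by default).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # 'rb' turns off text-mode translation, so 0x1A is read like any other byte.
    with open(path, "rb") as f:
        raw = f.read()
    data = list(raw)  # in Python 3, iterating over bytes yields integers
    file_length = os.path.getsize(path)
finally:
    os.remove(path)

print(file_length, len(data))
```

The key point is that the read has to happen on the file object that was opened with 'rb'; opening the file a second time without a mode falls back to text mode.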

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 3 · 2012-10-20T06:56:20+00:00

Notice that

from struct import unpack
s = f.read()
data = list(unpack("%dB" % len(s), s))

is a much faster way to build the data list.
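
As a rough illustration of what the format string does, here is a small sketch, assuming Python 3 and an in-memory sample instead of a file (the sample bytes are made up). "%dB" % len(s) expands to a count followed by B, meaning that many unsigned bytes:

```python
from struct import unpack

# Sample data: every possible byte value once, so 0x1A is included.
s = bytes(range(256))

# "%dB" % len(s) builds the format "256B": 256 unsigned bytes.
fmt = "%dB" % len(s)
data = list(unpack(fmt, s))

# In Python 3 the same list can also be obtained directly from the bytes object.
same = list(s)
print(fmt, data == same)
```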

Lardmeister 461 Posting Virtuoso · Answer 4 · 2012-10-21T21:50:14+00:00

With the advent of Python 3 your life is easier:

# test binary file read 

fname = "ball.png"
with open(fname, mode='rb') as f:
    data = f.read()

print(type(data))

'''
result with Python2 >>>
<type 'str'>
result with Python3  >>>
<class 'bytes'>
'''

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 5 · 2012-10-23T04:55:35+00:00

That's a very good remark. In Python 3, the bytes type is already a sequence of integers:

>>> s = bytes("hello", encoding="utf8")
>>> s
b'hello'
>>> s[0]
104
>>> s[1]
101
>>> list(s)
[104, 101, 108, 108, 111]

In Python 2, there is also the array type from the array module:

>>> import array
>>> x = array.array("B", "hello")
>>> list(x)
[104, 101, 108, 108, 111]
>>> x
array('B', [104, 101, 108, 108, 111])
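
The array type still exists in Python 3, but as far as I know the initializer for typecode "B" has to be a bytes object there, not a str. A minimal sketch under that assumption (the sample value is made up and includes 0x1A on purpose):

```python
import array

# Hypothetical sample: bytes as they would come from f.read() in 'rb' mode.
raw = b"hello\x1a"

# Typecode "B" stores each item as an unsigned byte (0..255).
x = array.array("B", raw)

print(list(x))
print(x.tobytes() == raw)
```

Unlike a plain list, the array keeps the values packed one byte per item, and tobytes() converts it back to the original bytes.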
