Hi all,

I'm relatively new to Python (I've been writing Python code for about half a year now) and I'm trying to figure out how to use "seek" on larger files.

I'm doing some work with large hard disk image files, and the raw devices themselves. I need the ability to seek towards the tail end of a file that is extremely large (hundreds of gigs).

Below is an excerpt of the code I'm using:

        f.seek(self.offset*512L)
        f.seek(self.sectors_per_cluster*self.start_cluster_mft*512L, os.SEEK_CUR)
        f.seek(record_number*512L*self.mft_cluster_size, os.SEEK_CUR)
        mft = f.read(512L*self.mft_cluster_size)
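For reference, the three seeks in the excerpt add up to one absolute byte offset. The sketch below computes that offset as a single number, assuming the attributes are plain integers and a sector is 512 bytes. The function name and the sample values are hypothetical, not taken from the original code. Python integers do not overflow, so the arithmetic itself is safe; whether seek() accepts the result still depends on the Python build.

```python
SECTOR_SIZE = 512  # assumed sector size, matching the 512L in the excerpt


def mft_record_offset(offset, sectors_per_cluster, start_cluster_mft,
                      record_number, mft_cluster_size):
    """Return the absolute byte offset that the three seeks in the excerpt reach."""
    partition_start = offset * SECTOR_SIZE
    mft_start = sectors_per_cluster * start_cluster_mft * SECTOR_SIZE
    record_start = record_number * SECTOR_SIZE * mft_cluster_size
    return partition_start + mft_start + record_start


# Hypothetical sample values, chosen only to produce an offset beyond 2**31
target = mft_record_offset(63, 8, 786432, 10, 2)
print(target)
```

A single f.seek(target) with the default whence is then equivalent to the three calls.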

It looks like seek in Python currently only accepts a C long (a 32-bit integer) as the offset, as with fseek, or maybe it's just that I can't use long long (64-bit) integers with Python under my current operating system. I don't know how to get around this limitation.

The Windows API can definitely work with large files, and in Linux I believe (if I remember correctly) I could use lseek64 using C.

Is there a way around this limitation with Python? Is there a library I can use, or any other way to do this without having to write some kind of crazy hack?

BTW, I'm using a 32 bit build of Windows XP and Python 2.5.

Why don't you use the second parameter of the seek function?

From http://docs.python.org/library/stdtypes.html#file-objects :

file.seek(offset[, whence])

Set the file's current position, like stdio's fseek. The whence argument is optional and defaults to os.SEEK_SET or 0 (absolute file positioning); other values are os.SEEK_CUR or 1 (seek relative to the current position) and os.SEEK_END or 2 (seek relative to the file's end). There is no return value.

For example, f.seek(2, os.SEEK_CUR) advances the position by two and f.seek(-3, os.SEEK_END) sets the position to the third to last.

Note that if the file is opened for appending (mode 'a' or 'a+'), any seek() operations will be undone at the next write. If the file is only opened for writing in append mode (mode 'a'), this method is essentially a no-op, but it remains useful for files opened in append mode with reading enabled (mode 'a+'). If the file is opened in text mode (without 'b'), only offsets returned by tell() are legal. Use of other offsets causes undefined behavior.

Note that not all file objects are seekable.

Changed in version 2.6: Passing float values as offset has been deprecated.
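The two whence examples from the documentation can be tried on a small in-memory buffer. This is only an illustration of the semantics, assuming a binary stream; io.BytesIO stands in for a real file and the ten-byte payload is made up.

```python
import io
import os

f = io.BytesIO(b"0123456789")   # in-memory stand-in for a real binary file

f.seek(2, os.SEEK_CUR)          # relative to the current position (0), so now at 2
pos_after_cur = f.tell()

f.seek(-3, os.SEEK_END)         # three bytes before the end of the data
pos_after_end = f.tell()
tail = f.read()                 # reads the last three bytes

print(pos_after_cur, pos_after_end, tail)
```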

At one time you could pass a float to seek() and it would convert to a long long. I don't know if that still works or not. It would be doubtful if you are using 2.6 or 3.0.

I am using the second parameter, if you look at the example code I posted. The problem is that even though the Windows build does have large file support, sys.maxint is only a 32-bit value. So a 32-bit integer is the biggest number I can specify when using seek. That means the max file size I can address is 4GB.
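A quick way to see what the interpreter considers its native integer limit is sketched below. It assumes Python 2.7 or 3.x, where sys.maxsize and int.bit_length() exist; Python 2.5 only has sys.maxint, which on a 32-bit build is 2147483647. Note that this is the limit of a native C integer, not of Python integers in general, which grow as needed.

```python
import sys

# sys.maxsize is the largest value a native index can hold
# (Python 2.5 exposes the similar sys.maxint instead).
native_bits = sys.maxsize.bit_length() + 1   # +1 accounts for the sign bit

print("native integer width: %d bits" % native_bits)
print("largest native value: %d" % sys.maxsize)
```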

I don't see any way around this, and may have to switch to c# to complete this project....

At one time you could pass a float to seek() and it would convert to a long long. I don't know if that still works or not. It would be doubtful if you are using 2.6 or 3.0.

Well, I'm using python 2.5, do you know if that would still work? If so, how would I do that?

I actually tried using floats with seek in Python 2.5.2 and I got the following error:

OverflowError: long int too large to convert to int

And this may be because of 32-bit MS Windows limitations. Datetime will go down to microseconds on Linux, but only milliseconds on MS Windows. There was a bug report filed, and supposedly fixed, but the files tested were only a few GB each: http://bugs.python.org/issue1672853. You may have to either split the file into parts or use the whence option as stated above, with either a seek from the end, or multiple os.SEEK_CUR statements (which may or may not work).
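The "multiple os.SEEK_CUR statements" idea could look roughly like the sketch below: reach a large absolute offset through relative seeks that each stay within a signed 32-bit value. The helper name seek_in_steps and the step limit are assumptions for illustration, and whether this actually avoids the OverflowError on a 32-bit Python 2.5 build is untested here.

```python
import io
import os

MAX_STEP = 2 ** 31 - 1  # largest value of a signed 32-bit integer


def seek_in_steps(f, target, max_step=MAX_STEP):
    """Move f to absolute offset `target` using relative seeks of at most max_step bytes."""
    if target < 0:
        raise ValueError("target offset must be non-negative")
    f.seek(0, os.SEEK_SET)              # start from the beginning of the file
    remaining = target
    while remaining > 0:
        step = min(remaining, max_step)
        f.seek(step, os.SEEK_CUR)       # each individual offset stays small
        remaining -= step
    return f.tell()


# Small demonstration on an in-memory buffer with a tiny step size
demo = io.BytesIO(bytes(bytearray(range(100))))
position = seek_in_steps(demo, 37, max_step=10)
print(position, demo.read(1))
```

The demonstration uses a step of 10 bytes only so the loop runs several times; on a real disk image the default step would apply.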
