What is the procedure in Python for opening a very large binary file in read/write mode, and overwriting individual bytes or sequences of bytes?

I'm working with huge files that are many GB, and I would like to seek(arbitrary_long_integer), and overwrite what's there (instead of appending or inserting). I need to avoid loading the huge file into memory, since it is so big. I just want to overwrite it on the disk.

I've looked around at various Python file operations tutorials, and I haven't found what I need.

Thanks in advance!

You cannot do that in Python. What you can do is create a temporary file, read your original file, write its contents with the bytes replaced into the temporary file, and then move the temporary file to the original location. This is more or less what you need.

from tempfile import mkstemp
from shutil import move
from os import remove, close

def replace(file, pattern, subst):
    # Create temp file
    fh, abs_path = mkstemp()
    new_file = open(abs_path, 'w')
    old_file = open(file)
    # Copy line by line (text mode), replacing the pattern as we go
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    # Close temp file and original file
    new_file.close()
    close(fh)
    old_file.close()
    # Remove original file
    remove(file)
    # Move new file
    move(abs_path, file)

Thanks to Thomas Watnedal for the code

I do not get this. What is not possible?

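with open('/tmp/workfile', 'wb') as f:
    f.write(b'0123456789')    # create a small sample file
with open('/tmp/workfile', 'rb+') as f:
    f.seek(4)                 # offset 4 is the 5th byte (offsets start at 0)
    f.write(b'I')             # overwrites that byte, nothing is inserted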

And now I have overwritten the 5th position in the file with "I". In place.
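For the many-GB case from the question, the same idea can be wrapped in a small helper. This is only a sketch: the name overwrite_bytes and the bounds check are my own assumptions, not anything from the standard library. It opens the file in 'r+b' mode, seeks to the offset, and writes, so only the bytes being written ever pass through memory.

```python
import os
import tempfile

def overwrite_bytes(path, offset, data):
    """Overwrite len(data) bytes at `offset` in place (hypothetical helper)."""
    size = os.path.getsize(path)
    # Refuse writes that would extend the file instead of overwriting it.
    if offset < 0 or offset + len(data) > size:
        raise ValueError("write would fall outside the existing file")
    # 'r+b' opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

# Small demonstration on a throwaway file; a huge file is handled the same way.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"0123456789")

overwrite_bytes(demo_path, 4, b"I")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

The file keeps its original size, because write() at a position inside the file replaces bytes rather than inserting them.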

It is possible, but 99% of the time it is not recommended. You should back up the old file anyway. It only makes sense if the file has dynamic content that changes all the time. But then why are you not using a database?

Well, britanicus's option is good for large files or files of unknown size. Other than that...
You just do it the easy way. Just read in and replace what you need.
Simple. :)

But in the future, use some organized data storage system: at least anydbm, pickle, or sqlite (my pick!).

  • There was a statement in the thread that overwriting arbitrary bytes in a file is not possible. This is not true.
  • The original question does not specify whether the input file is line-oriented or has a record structure. So recommending a database is too specific.
  • Having a backup of the file is not significant here. It can be assumed, imho, that the questioner knows what overwriting means.
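To illustrate the first point another way, here is a sketch using the standard mmap module, which lets you treat the file like a mutable byte array while the operating system pages in only the parts you touch. The function name patch_with_mmap is made up for this example, and mapping a whole multi-GB file assumes a 64-bit Python.

```python
import mmap
import os
import tempfile

def patch_with_mmap(path, offset, data):
    """Overwrite bytes at `offset` through a memory map (hypothetical helper)."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; pages are loaded only when accessed.
        with mmap.mmap(f.fileno(), 0) as mm:
            if offset < 0 or offset + len(data) > len(mm):
                raise ValueError("write would fall outside the existing file")
            # Slice assignment must keep the length unchanged, so this can
            # only overwrite, never insert.
            mm[offset:offset + len(data)] = data
            mm.flush()

# Small demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"abcdefghij")

patch_with_mmap(demo_path, 2, b"XYZ")

with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
```

Whether this is preferable to plain seek() and write() probably depends on how many scattered edits you make; for a single overwrite the simpler approach is likely enough.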

So be it. Let's just hope that Murphy's law is not in force ('The backups which have been taken were not taken, or if taken, the media does not function', or however it goes).

Slate, every piece of data is a candidate for database insertion.
Data does not have to be in some sort of order. You do that job.

After all, that is what programming is all about. ;)