I am looking for a good example of using the tarfile module to write to and read from an archive, particularly in the highly compressed filename.tar.bz2 format.

Here is an example:

import tarfile

# uncompressed = "w"  use extension .tar
# gzip compressed = "w:gz"  ditto .tar.gz
# bzip2 super compressed = "w:bz2"  ditto .tar.bz2
tar = tarfile.open("sample.tar.bz2", "w:bz2")
# turn three regular files into a tar file archive
for name in ["test1.py", "test2.py", "test3.py"]:
    tar.add(name)
# closing the archive flushes the compressed data to disk
tar.close()

# read the tarfile
tar = tarfile.open("sample.tar.bz2", "r:bz2")
file_list = []
for member in tar:   # "member" avoids shadowing the built-in file type
    print member.name, member.size
    file_list.append(member.name)

# another way to get the file list
file_list2 = tar.getnames()
print file_list
print file_list2    

# pick one of the three files in the tarball/tar-archive
filename = file_list[1]

# decompress the particular file
data = tar.extractfile(filename).read()

print "Content of file %s in the tar-archive:" % filename
print data
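
For anyone on Python 3, the same write-then-read round trip can be sketched roughly as below. This is only an illustration under a few assumptions: roundtrip_demo is a made-up name, the three files are created in a temporary directory so the snippet runs on its own, and the with-blocks replace the explicit tar.close() calls.

```python
import os
import tarfile
import tempfile


def roundtrip_demo():
    """Write three small files to a .tar.bz2 archive, then read one back."""
    workdir = tempfile.mkdtemp()  # scratch directory (an assumption of this sketch)
    names = ["test1.py", "test2.py", "test3.py"]
    for name in names:
        with open(os.path.join(workdir, name), "wb") as handle:
            handle.write(("# contents of %s\n" % name).encode("ascii"))

    archive_path = os.path.join(workdir, "sample.tar.bz2")

    # "w:bz2" creates a bzip2-compressed archive; leaving the with-block
    # closes it, which flushes the compressed data to disk.
    with tarfile.open(archive_path, "w:bz2") as tar:
        for name in names:
            # arcname stores the bare file name instead of the temp path
            tar.add(os.path.join(workdir, name), arcname=name)

    # "r:bz2" reads it back; iterating the archive yields TarInfo objects.
    with tarfile.open(archive_path, "r:bz2") as tar:
        listing = [(member.name, member.size) for member in tar]
        data = tar.extractfile(names[1]).read()  # bytes, not str, in Python 3

    return listing, data


if __name__ == "__main__":
    listing, data = roundtrip_demo()
    for name, size in listing:
        print(name, size)
    print(data.decode("ascii"))
```

Note that extractfile() hands back bytes in Python 3, so the content has to be decoded before it is printed as text.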

Thanks Vega,
works like a charm, got to keep playing with it.

Thanks for the example, VE. I have a .tar.bz2 file I need to read within Python and was able to take your code and use it. I've put together a little .tar.bz2 extraction example in case others follow the same path and wind up here. This is for Python 2.5 on WinXP Pro; it should work under Linux with small changes, since WindowsError exists only on Windows.

import os
import tarfile
tar = tarfile.open("MyTarFile.tar.bz2","r:bz2") # Replace MyTarFile with the right name
file_list = tar.getnames()
for fn in file_list:        # Filenames
    xfile = tar.extractfile(fn)
    if xfile:  # True if data file, False if directory (apparently)
        data = xfile.read()
        fo = open(fn, "wb")
        if fo:
            print fn
            fo.write(data)      # Write the extracted bytes to disk
            fo.close()
        else:
            print "Error opening output file %s" % fn
    else:                       # ASSuME xfile None because filename is a directory
        try:
            os.mkdir(fn)        # Also ASSuME higher directories show up first
        except WindowsError, e:
            if e[0] == 183:     # This happens when you try to re-make an existing directory
                continue        # Ignore duplicate directory
            else:
                print repr(e)
                raise WindowsError, e
tar.close()
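
The loop above leans on two assumptions: that extractfile() returns None for directories, and that directories show up before the files inside them. A more portable variation, sketched here in Python 3 syntax, asks each member what it is via isdir() and isfile() and creates parent directories on demand. The name extract_archive and its dest_dir argument are hypothetical, and the path check is only a basic guard, not a complete defence against hostile archives.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.bz2 into dest_dir; return the regular files written."""
    written = []
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Skip entries that would land outside dest_dir (e.g. "../x").
            if target != dest_root and not target.startswith(dest_root + os.sep):
                continue
            if member.isdir():
                # exist_ok avoids the "directory already exists" error
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                # Create parent directories even if the archive lists
                # a file before its directory.
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with open(target, "wb") as out:
                    out.write(tar.extractfile(member).read())
                written.append(member.name)
            # links, devices and other special members are ignored here
    return written
```

Because nothing in it is Windows-specific, this version should behave the same on Linux. If no filtering is needed, tar.extractall(dest_dir) does the whole job in one call.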

OT mini-rant: I'd been googling for an example that uses the bz2 module to decompress a .tar.bz2 file, without any luck. The bz2 documentation is lacking (in my opinion), as it doesn't precisely describe where the input data for bz2.decompress() comes from. I tried all the obvious alternatives but none of them worked, hence the tarfile module instead. I only mention this for the benefit of the next person with the same problem.
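
On the bz2 question: bz2.decompress() expects the raw bytes of the compressed file, read in binary mode, and what comes back is still a tar archive rather than the individual files. A rough Python 3 sketch of that two-step route follows. The helper name list_members_via_bz2 is invented for illustration, and the whole archive is held in memory, so this only suits small files.

```python
import bz2
import io
import tarfile


def list_members_via_bz2(archive_path):
    """Decompress a .tar.bz2 with bz2, then parse the tar data with tarfile."""
    # bz2.decompress() wants the raw bytes of the compressed file itself.
    with open(archive_path, "rb") as handle:
        compressed = handle.read()
    tar_bytes = bz2.decompress(compressed)  # still a tar archive, not the files

    # The decompressed bytes are a plain tar stream, so open it with "r:"
    # (read, no compression) from an in-memory file object.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return tar.getnames()
```

Since tarfile is needed either way to split the tar stream into files, opening the archive directly with "r:bz2" remains the shorter route.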
