niek_e is correct: how well data compresses depends on both the source and the compression algorithm. Here is one example with a large block of repeating text, where bz2 compression works great:
# exploring Python file compression/decompression module bz2
# .bz2 files can be decompressed with the WinRAR utility
# uncompressed file 'longtext.txt' has a size of 48 kbytes
# level 9 compressed 'longtext.bz2' has a size of about 1 kbyte
# zip compressed file 'longtext.zip' is also much smaller than the original,
# as long as ZipFile.write() is given the file name, not the text (see below)
# tested with Python 3.1
import bz2
text = "use this to make one very long string right now\n"
# create 1000 lines of repeating text
text = text*1000
# create uncompressed text file
uc_fname = 'longtext.txt'
fout = open(uc_fname, 'w')
fout.write(text)
fout.close()
# create bz2 compressed file (can be used with WinRAR)
bz2_fname = 'longtext.bz2'
bzout = bz2.BZ2File(bz2_fname, 'w', compresslevel=9)
# in Python 3, BZ2File.write() needs bytes, so encode the str first
bzout.write(text.encode('utf8'))
bzout.close()
# read bz2 compressed file
# (compresslevel only matters when writing)
bzin = bz2.BZ2File(bz2_fname, 'r')
# read() returns bytes, so decode them back to a str
text2 = bzin.read().decode('utf8')
bzin.close()
# short test ...
print("Length of original text = %d" % len(text))
print("Length of recovered text = %d" % len(text2))
print("Recovered text matches original: %s" % (text == text2))
# the other way to compress ...
# create zipfile compressed file
# careful: the second argument of ZipFile.write() is the name the file gets
# inside the archive, not its contents; passing the 48 kbyte text there
# stores the whole string as the member name (twice), so the zip ends up
# larger than the original file
import zipfile as zf
z_fname = 'longtext.zip'
zout = zf.ZipFile(z_fname, 'w', compression=zf.ZIP_DEFLATED)
zout.write(uc_fname, uc_fname)
zout.close()
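To see how much the result depends on the source, here is a small sketch (Python 3, standard library only) that compresses the same repetitive text and an equal amount of random bytes in memory, once with bz2 and once with zlib, which provides the DEFLATE algorithm that zipfile uses for ZIP_DEFLATED. The helper name compressed_sizes() is made up for this example, and the exact byte counts will vary with the library version and, for the random data, from run to run.

```python
import bz2
import os
import zlib

def compressed_sizes(data):
    """Return (bz2 size, zlib size) of data compressed at level 9."""
    return len(bz2.compress(data, 9)), len(zlib.compress(data, 9))

# highly repetitive source: the same 48-byte line 1000 times
repetitive = ("use this to make one very long string right now\n" * 1000).encode("utf8")
# random bytes of the same length have no redundancy to exploit
random_data = os.urandom(len(repetitive))

for label, data in (("repetitive", repetitive), ("random", random_data)):
    bz2_size, zlib_size = compressed_sizes(data)
    print("%-10s original=%d bz2=%d zlib=%d" % (label, len(data), bz2_size, zlib_size))
```

Both algorithms shrink the repetitive text to a tiny fraction of its size, while the random bytes come out about as large as they went in (often a few bytes larger), whichever algorithm is used.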
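Passing the text as the second argument of ZipFile.write() makes it the member name inside the archive rather than the data, and a zip file stores member names uncompressed, once in the local file header and once in the central directory. This sketch builds both variants in memory so nothing touches the disk. It assumes Python 3.2 or later (for using ZipFile in a with statement) and uses ZipFile.writestr(), whose first argument is likewise the member name; zip_size() is a hypothetical helper for this example only.

```python
import io
import zipfile

text = "use this to make one very long string right now\n" * 1000

def zip_size(arcname, data):
    """Build a DEFLATE-compressed zip in memory and return its size in bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zout:
        # writestr(arcname, data): first argument is the member name, second the contents
        zout.writestr(arcname, data)
    return len(buf.getvalue())

# correct: short member name, the text as the contents
good = zip_size("longtext.txt", text)
# the mistake: the text itself used as the member name, with empty contents
bad = zip_size(text, "")
print("text=%d good zip=%d bad zip=%d" % (len(text), good, bad))
```

The "bad" archive is more than twice the size of the text even though it compresses nothing, because the name is written twice; the "good" one is only a few hundred bytes.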
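On Python 3.3 and later (newer than the 3.1 used above), bz2.open() can wrap the compressed file in a text layer, so the manual encode()/decode() steps go away. A sketch, assuming Python 3.3+; note that it overwrites 'longtext.bz2' in the current directory.

```python
import bz2

text = "use this to make one very long string right now\n" * 1000
bz2_fname = "longtext.bz2"

# "wt" opens the compressed file in text mode: the str is encoded to UTF-8 on write
with bz2.open(bz2_fname, "wt", encoding="utf8", compresslevel=9) as bzout:
    bzout.write(text)

# "rt" decompresses and decodes back to a str in one step
with bz2.open(bz2_fname, "rt", encoding="utf8") as bzin:
    text2 = bzin.read()

print("Length of recovered text = %d" % len(text2))
print("Recovered text matches original: %s" % (text == text2))
```

Because text mode uses universal newlines by default, any "\n" translated to the platform line ending on write is translated back on read, so the recovered string compares equal to the original.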
Last edited by bumsfeld; Aug 24th, 2009 at 12:57 pm.