hello,

I have developed a file compression tool using the zlib compression library. Since zlib uses the deflate and inflate processes, does the size of the file affect the compression ratio?
If it does, what is the reason?

Thank you.


Not really, no.
Of course you need enough data in the original file to be able to compress it at all.
A 1 byte file for example can't be compressed.

Apart from that, the compression algorithm might add more overhead in headers and markers to the compressed file than it removes by compressing the original data.
That, however, is independent of file size; it can happen with input of any size (though on average it is more readily apparent with small input files than with large ones).
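To make that overhead concrete, here is a minimal C sketch (not the OP's tool, just an assumed standalone program linked with -lz) that compresses a 1-byte buffer with zlib's compress2() and prints both sizes; the output is bigger than the input because of the zlib header, block framing and Adler-32 checksum:

#include <stdio.h>
#include <zlib.h>

int main(void)
{
    const unsigned char input[] = "A";   /* a 1-byte payload */
    uLong in_len = 1;
    unsigned char out[64];               /* comfortably above compressBound(1) */
    uLongf out_len = sizeof out;

    int rc = compress2(out, &out_len, input, in_len, Z_BEST_COMPRESSION);
    if (rc != Z_OK) {
        fprintf(stderr, "compress2 failed: %d\n", rc);
        return 1;
    }

    /* out_len now holds the actual compressed size; it exceeds the 1-byte
       input because of the fixed zlib framing around the deflate stream. */
    printf("input: %lu bytes, compressed: %lu bytes\n", in_len, (uLong)out_len);
    return 0;
}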

So, in your opinion, as the file size increases the compression ratio will also increase (a common understanding)?

I don't know zlib, but most compression algorithms have these properties:

  • There is a 'lookup table' overhead, so for very short files a 'compressed file' may be longer than the raw file. The lookup table can be considered constant size (for long enough files), so other things being equal, the ratio is better for longer files, but the improvement per added length shrinks as the file grows: (Const + Ratio*Length)/Length approaches Ratio from above as Length increases (see the sketch after this list).
  • The amount of compression possible depends on the uniformity of the contents: if your file has a lot of duplication, particularly if the duplicated runs are long, then compression is high. If the data is nearly random, you will get very little (possibly negative) compression.
  • Some algorithms adjust to changing data by inserting (partial) new lookup tables if the compression ratio drops, but some do not; so uniformity of the file 'end to end' may affect the ratio. Think about a multipart MIME document.
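To see that first point in numbers, here is a quick sketch (my own assumed setup, not from this thread) that feeds zlib's compress2() buffers of repeating text at a few lengths; the fixed overhead matters less and less as the input grows, so the ratio settles toward a limit:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    const char *pattern = "the quick brown fox ";
    size_t pat_len = strlen(pattern);
    uLong sizes[] = { 64, 1024, 16384, 262144 };

    for (int i = 0; i < 4; i++) {
        uLong in_len = sizes[i];
        unsigned char *in = malloc(in_len);
        uLongf out_len = compressBound(in_len);   /* worst-case output size */
        unsigned char *out = malloc(out_len);

        /* highly repetitive input: deflate's LZ77 stage finds long matches */
        for (uLong j = 0; j < in_len; j++)
            in[j] = (unsigned char)pattern[j % pat_len];

        if (compress2(out, &out_len, in, in_len, Z_DEFAULT_COMPRESSION) == Z_OK)
            printf("%7lu -> %6lu bytes (ratio %.3f)\n",
                   in_len, (uLong)out_len, (double)out_len / (double)in_len);

        free(in);
        free(out);
    }
    return 0;
}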

So, in your opinion, as the file size increases the compression ratio will also increase (a common understanding)?

No, you read me wrong.
And that "common understanding" (if it exists) is also wrong.
While most algorithms may become more efficient with more data to handle, that's only up to a point.
For the first few kilobytes of data for example they might become slightly more efficient, then reach their limit.
But that doesn't mean all larger files compress better than all smaller files given the same algorithm (or even in general).
A 1 KB file containing plain text that is all the same ASCII character, for example, will compress very well with most algorithms. A 1 KB JPEG image, on the other hand, will likely not compress at all (and may even grow in size) when compression is attempted, because its content is already compressed by the JPEG encoder that produced the file.
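To illustrate that contrast without needing an actual JPEG, here is a rough C sketch (an assumed setup; pseudo-random bytes stand in for already-compressed image data) that compresses 1 KB of identical characters and 1 KB of noise with zlib; the first shrinks dramatically, the second stays about the same size or even grows slightly:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

static void try_compress(const char *label, const unsigned char *in, uLong in_len)
{
    uLongf out_len = compressBound(in_len);
    unsigned char *out = malloc(out_len);

    if (compress2(out, &out_len, in, in_len, Z_BEST_COMPRESSION) == Z_OK)
        printf("%-9s %lu -> %lu bytes\n", label, in_len, (uLong)out_len);

    free(out);
}

int main(void)
{
    unsigned char text[1024], noise[1024];

    for (int i = 0; i < 1024; i++) {
        text[i]  = 'A';                   /* the same ASCII code throughout */
        noise[i] = (unsigned char)rand(); /* stands in for already-compressed data */
    }

    try_compress("repeated:", text, sizeof text);
    try_compress("random:", noise, sizeof noise);
    return 0;
}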
