
Effect of File Size on Compression Ratio in zlib

Hello,

I have developed a file compression tool using the zlib compression library. Since zlib uses the deflate and inflate processes, does the size of the file affect the compression ratio?
If it does, what is the reason?

Thank you.

blackmagic01021
Light Poster
36 posts since Jan 2010
Reputation Points: 8
Solved Threads: 0
 

Not really, no.
Of course you need enough data in the original file to be able to compress it at all.
A 1-byte file, for example, can't be compressed.

Apart from that, the compression algorithm might add more overhead (headers, markers, checksums) to the compressed file than it removes by compressing the original data.
That, however, is independent of file size and can happen with input of any size, although it tends to be more apparent with small input files than with large ones.
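
To see the overhead on tiny inputs, here is a minimal sketch using Python's standard `zlib` module with default settings. The helper name `compressed_size` and the sample inputs are made up for illustration; exact byte counts may differ between zlib versions and compression levels.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Return the length of the zlib stream produced for data."""
    return len(zlib.compress(data))

# Two tiny inputs and one larger, repetitive input.
for data in (b"a", b"hello", b"hello " * 1000):
    print(len(data), "bytes ->", compressed_size(data), "bytes")
```

The tiny inputs should come out larger than they went in, because the zlib stream carries a header and a checksum regardless of payload, while the repetitive 6,000-byte input should shrink to a small fraction of its size.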

jwenting
duckman
Team Colleague
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
 

So you mean that as the file size increases, the compression ratio also increases (a common understanding).

blackmagic01021
 

I don't know zlib, but most compression algorithms have these properties:
There is a 'lookup table' overhead, so for files that are quite short, a 'compressed file' may be longer than the raw file. The lookup table can be considered constant size (for long enough files), so other things being equal, the ratio is better for longer files, but the improvement per added length gets smaller the longer the file: (Const+Ratio*Length)/Length approaches Ratio from above as Length increases, where Ratio is the compressed size per input byte.
The amount of compression possible depends on the uniformity of the contents: if your file has a lot of duplication, particularly if the duplicates are long, then compression is high. If the data is nearly random, you will get very little (possibly negative) compression.
Some algorithms adjust to changing data by inserting (partial) new lookup tables if the compression ratio drops, but some do not, so the uniformity of the file 'end to end' may affect the ratio. Think about a multipart MIME document.
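
The first two properties can be checked with a rough sketch like the following, assuming Python's standard `zlib` module at its default level. The `ratio` helper, the `SENTENCE` sample, and the chosen sizes are illustrative only; the repeated sentence stands in for data with a lot of duplication and `os.urandom` for nearly random data.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)

# Hypothetical sample text; any input with long duplicates would do.
SENTENCE = b"The quick brown fox jumps over the lazy dog. "

for size in (64, 1024, 16 * 1024, 256 * 1024):
    repeats = size // len(SENTENCE) + 1
    text = (SENTENCE * repeats)[:size]   # highly repetitive input
    noise = os.urandom(size)             # nearly random input
    print(f"{size:>7} bytes  text {ratio(text):.3f}  random {ratio(noise):.3f}")
```

For the repetitive input the ratio should keep falling as the length grows, with smaller gains at each step, while the random input should stay at or slightly above 1.0 at every size.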

griswolf
Veteran Poster
1,165 posts since Apr 2010
Reputation Points: 344
Solved Threads: 256
 
"So you mean that as the file size increases, the compression ratio also increases (a common understanding)."

No, you read me wrong.
And that "common understanding" (if it exists) is also wrong.
While most algorithms may become more efficient with more data to handle, that's only up to a point.
For the first few kilobytes of data, for example, they might become slightly more efficient and then reach their limit.
But that doesn't mean all larger files compress better than all smaller files given the same algorithm (or even in general).
A 1 KB plain-text file consisting of a single repeated ASCII character, for example, will compress very well with most algorithms. A 1 KB JPEG image, on the other hand, will likely not compress at all (and may even grow) when compression is attempted, because its content was already compressed by the JPEG encoding that created it.
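
A small sketch of that contrast, assuming Python's standard `zlib` module. There is no real JPEG here: as a stand-in for already-compressed content, some generated text is deflated once and the result is fed through zlib a second time. The word list and variable names are made up for the example.

```python
import random
import zlib

# 1 KB consisting of a single repeated ASCII character.
uniform = b"A" * 1024

# Hypothetical stand-in for a JPEG: varied text that has already been
# through one round of deflate, so little redundancy is left in it.
rng = random.Random(0)
words = [b"zlib", b"deflate", b"inflate", b"ratio", b"file", b"size", b"data", b"block"]
varied = b" ".join(rng.choice(words) for _ in range(2000))
precompressed = zlib.compress(varied)

for label, data in (("uniform text", uniform), ("already compressed", precompressed)):
    out = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(out)} bytes")
```

The uniform kilobyte should collapse to a few dozen bytes at most, while the second pass over the already-compressed data should gain essentially nothing and may add a few bytes of framing.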

jwenting
 

This question has already been solved
