I understand that data on computers can be compressed (I see it all the time with zip files and jpegs) but I was wondering just how compression works. Thinking it through, if you data is represented by 1s and 0s then it can be corresponded 1 to 1 with a binary number (albeit a large one). The idea behind compression is that some smaller number can represent the same data, but how is this possible since that new number must represent something else? I suppose if you know in advance that this number is a compressed version of another number than you can interpret it differently? Are there any relatively simple compression algorithms?

Recommended Answers

All 9 Replies

For an introduction, look at Huffman Encoding. There is certainly much more than just that - but it should give you a start to get an understanding.

Thank you, that was a helpful link, but I don't really think it suits my needs well (I am storing many variables of different types.) I was wondering if there are any on-line (not as in on the internet but as in sequential input) algorithms that compress individual bytes into a file. I just want a very simple algorithm... Assuming one exists.

If the algorithm was "simple" it probably wouldn't compress very well. If it helps you understand better, many compression algorithms actually can end up making a chunk of data larger if it's not suited for the task (notice how sound files don't compress very well at all with ZIP compression).

>>I am storing many variables of different types.

Can you convert/encode all those variables to a string version?

Theoretically I could convert the variables to string.... but not very efficiently. I was wondering if there was any kind of online compression algorithm that takes one byte at a time and encrypts it to the file. Something with a function prototype that could look like this:

void writeCompressed(FILE *file, unsigned char byte)

I think it will be easier if you try to compress a series of bytes into a smaller chunk instead of trying to compress a byte into something smaller. I suggest to read this and see if it helps.

Run length encoding is simple, but it may not be suitable for your type of data. But it may work if you can deal with your data as a continuous string of ones and zeros and then compress the ones and zeros. After you uncompress the continuous string of ones and zeros you can chop the string up into eight bit bytes.


http://en.wikipedia.org/wiki/Run-length_encoding

Just another thought:

If your data contained 32 or 64 bit integers, and they held relatively small numbers, then there would be quite a few leading zeros in a row that could be run length encoded. For example, if you had ten 32 bit integers, then that would be 320 bits total. If, on average, each integer was equal to 100 or less, then that would mean there are on average 25 leading zeros for each integer. Now you could run length encode those 25 zeros using just 5 bits, thus compressing 20 bits. So 20 bits saved times ten integers = 200 bits saved, resulting in the 320 bits compressing down to only 120 bits, plus the additional compression of any runs of ones in the rest of the bits.

But it all depends on the type of data as to how well RLE would work in your application.

Okay... thanks for the help. I think that it may not be worth the trouble. My files are only around 50-100k (binary format) and the data is all over the place. I don't think that compression is necessary, though I may revisit the topic later. Thanks for the help.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.