hi all, I need some help w.r.t. arrays and it's representation. My assignment was to think of an idea(s) to redesign a 2-dimensional array that is a compressed form of binary patterns.

e.g. {3,9,4,2} is compressed form and it's decompressed form is 111000000000111100. Then same for next few rows.

I did some reading up and found out that it is called run-length encoding. The question requires me to redesign the array such that through a decompression method, it will give the same binary pattern.

I was thinking that every number is significant in it's representation and hence, the number of entries should remain the same.

Does anyone have any idea how a smaller array can be implemented to show the same data after decompression?

3
Contributors
5
Replies
6
Views
7 Years
Discussion Span
Last Post by griswolf

I was thinking that every number is significant in it's representation and hence, the number of entries should remain the same.

Does anyone have any idea how a smaller array can be implemented to show the same data after decompression?

If the number of entries are the same, then it can't be smaller.

Given that, do some entries end in zeros, that is, {9,3,2,0}? If so, you can look at jagged arrays (Yes, it's a C# page, but it has a good description of jagged arrays).

Do you need to deal with the possibility that the data can start with zero(s)? Generally, RLE has to deal with more than two tokens, so it uses a dictionary header, and outputs both the dictionary index and the repetition for each run of "same token". So, your example would look like "{[0,1]1:3,0:9,1:4,0:2}" or some such. In your case you could just have a 'first digit is' convention so it looks like "{1:3,9,4,2}" or with 5 leading zeros: "{0:5,3,9,4,2}"

You can make it even shorter (if lucky) by finding repeating sequences rather than repeating digits. For instance to encode '1001001001001100100100100100' you might output"{[100,1]0:4,1:1,0:5}". This is a gamble: If you start out using this idea, you may not find enough repetition to 'pay off' the size of the header. Note that for binary data, there are only 4 possible pairs of digits (A:00,B:01,C:10,D:11). You could encode your 11,10,00,00,00,00,11,11,00 as "{D1C1A4D2A1}". One common variation is that a single repetition requires no number, so reduce all the counts by one: "{DCA3D1A}". The same idea applies to triples, quads and (using upper and lower case) 5-bit groups.
Note: no header needed if your convention is ABCD representing 00,01,10,11 respectively.
Note: that by using letters as indices, we can lose the commas.
Note: If you are actually putting this data on the wire, you don't want to spend 7 or 8 bits for every index, every digit of the count and every separator (if any), so this is all more for fun than actual use.

Edited by griswolf: n/a

The entries do not have zeroes. Thanks for your help though. I've submitted my assignment but I'm still curious.
I wrote something along the lines of using differencing to obtain a smaller array. But explained that there might be data loss and decompression is not complete.

:)

Do you need to deal with the possibility that the data can start with zero(s)? Generally, RLE has to deal with more than two tokens, so it uses a dictionary header, and outputs both the dictionary index and the repetition for each run of "same token". So, your example would look like "{[0,1]1:3,0:9,1:4,0:2}" or some such. In your case you could just have a 'first digit is' convention so it looks like "{1:3,9,4,2}" or with 5 leading zeros: "{0:5,3,9,4,2}"

You can make it even shorter (if lucky) by finding repeating sequences rather than repeating digits. For instance to encode '1001001001001100100100100100' you might output"{[100,1]0:4,1:1,0:5}". This is a gamble: If you start out using this idea, you may not find enough repetition to 'pay off' the size of the header. Note that for binary data, there are only 4 possible pairs of digits (A:00,B:01,C:10,D:11). You could encode your 11,10,00,00,00,00,11,11,00 as "{D1C1A4D2A1}". One common variation is that a single repetition requires no number, so reduce all the counts by one: "{DCA3D1A}". The same idea applies to triples, quads and (using upper and lower case) 5-bit groups. Note that by using letters as indices, we can lose the commas.

Hello there. The array in my assignment had 5 values in every row, some rows having even lesser values. Just to make an observation, do correct me if I'm wrong;
The smaller the amount of data, the harder it is to find a repetitive pattern to compress.

Thanks for your reply, it was insightful.

So: "The smaller the amount of data, the harder it is to find a repetitive pattern to compress.?"
As a general rule, yes.
If every datum is one of a small set of choices, then the compression is possible by stepping up one level. For example, when sending colors, you might just say "'chartreuse' is 0, 'vermilion' is 1, and 'turquoise' is 2. Anything else is spelled out." Every message might have only one color in it, so the message would not seem to be subject to compression, but if you know it always has a color in it, you can compress the color part by using your knowledge of the system.

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.