943,513 Members | Top Members by Rank

Ad:
Jun 24th, 2004
0

Does random data compress?

Expand Post »
I've been wondering about this for awhile now: does random data compress?

Some people seem to think it does, while others maintain that it doesn't. I mean to mathematically prove that only 1/255 of random files can compress.

Conditions:

Original file size: 32 bytes
Maximum size of "compressed" file: 31 bytes
Number of possible different original files: 1.158 * 10^77
Number of possilbe "compressed" files: let's calculate that

Uhoh, this is gonna be hard to do without summation sybols.... well, ASCII art never hurt anyone.

31_
\
/__ 1.158 * 10^77 * (1/256)^k = 4.540 * 10^74
k=1

(1.158 * 10^77) / (4.540 * 10^74) ~ 1/255

Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
Reputation Points: 115
Solved Threads: 5
Junior Poster
Toba is offline Offline
192 posts
since Jun 2004
Jun 24th, 2004
0

Re: Does random data compress?

Quote originally posted by Toba ...
I've been wondering about this for awhile now: does random data compress?
...
Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
That sum is a lot more complicated than it has to be. It turns out to be equivalent to sum[k=1..31](256^k). Also, it looks like you've transposed the numerator and denominator in the last step, though your result is correct.

You have the ratio for a sequence of 32 bytes; the general solution for the ratio of shorter to full-length sequences is (1/255) - (1/((256^(n-1))*255)). For sequences of very short length, this ratio varies quite a bit, but the second term quickly approaches zero, so for any practical sequence of data (who would want to compress three bytes?), the ratio is 1/255 minus a very small constant.

What has actually been shown is that there are, in general, approximately only 1/255 as many possible shorter sequences, which is more of an upper bound--a given compression algorithm will be able to compress randomly-generated sequences of bytes at most 1/255 of the time, but realistically this will happen even less often.

If you derive the ratio for bits instead of bytes, it is 1-(1/(2^(n-1))), which is much more agreeable. However, on a machine with 8-bit bytes, any 32-byte sequence (for example) that compresses to 31 bytes plus a few bits would have to be padded to 32 bits anyway, effectively losing the compression--so the bit-based ratio doesn't contradict the above math.

I could post the derivation if anyone's interested.
--sg
Reputation Points: 182
Solved Threads: 71
Posting Pro in Training
gusano79 is offline Offline
475 posts
since May 2004
Jun 24th, 2004
0

Re: Does random data compress?

I would have said what you just said if it hadn't been 1:45 AM . Well, I'm gonna try making random data and seeing if winzip will compress it. It will be interesting to see if winzip can do the impossible...
Reputation Points: 115
Solved Threads: 5
Junior Poster
Toba is offline Offline
192 posts
since Jun 2004
Aug 15th, 2004
0

Re: Does random data compress?

Hrm.... truly random data is harder to find than I thought. <grumbles>
Reputation Points: 115
Solved Threads: 5
Junior Poster
Toba is offline Offline
192 posts
since Jun 2004
Aug 15th, 2004
0

Re: Does random data compress?

Have you tried an encryption routine? They're designed to produce pretty-darn-random numbers so that statistical analisys of the result won't yield clues to the input. There are some industrial strength random number generators, or you can start with something like an SHA-1 one-way hash.

Here's some code to grab:

http://www.codeproject.com/cpp/csha1.asp

If you want some big texts to use as a seed, you could grab some text from project gutenberg (sp?) like 'Romeo and Juliet', then feed that text into the SHA-1 code and poof you have a pretty darn random file!
Reputation Points: 36
Solved Threads: 11
Posting Pro in Training
Chainsaw is offline Offline
436 posts
since Jun 2004
Aug 16th, 2004
0

Re: Does random data compress?

Thanks for the idea Chainsaw. I encrypted a bitmap with PGP and deleted the header to leave a file of basically random data. Winzip couldn't compress it at all (It got bigger by a couple hundred bytes, actually). I guess random data doesn't compress... I was right.
Reputation Points: 115
Solved Threads: 5
Junior Poster
Toba is offline Offline
192 posts
since Jun 2004
Oct 6th, 2004
0

Re: Does random data compress?

Quote originally posted by Toba ...
Hrm.... truly random data is harder to find than I thought. <grumbles>
lol.
Winzip doing a pretty good job, huh?
Reputation Points: 10
Solved Threads: 1
Newbie Poster
Nuez_Jr is offline Offline
18 posts
since Oct 2004
Dec 20th, 2004
0

Re: Does random data compress?

lol Truly random data wont compress if every bit of it is different. The 'break even' case is if at least half of the data bits occurred twice or more. Then RLE or a Look Up method would JUST about reduce the size

Reputation Points: 16
Solved Threads: 6
Posting Pro in Training
1o0oBhP is offline Offline
445 posts
since Dec 2004
Jan 19th, 2005
0

Re: Does random data compress?

You could always write:

Summation (for k=1 to 31, 1.158 * 10^77 * (1/256)^k = 4.540 * 10^74)

I have used that notation for years. For a continued product, use:

ContProduct (for i=1 to m, i+1)

A limit, and a one-sided limit:

Limit(a goes to b, a^2)

Limit(a goes up to b, a^2)
Reputation Points: 11
Solved Threads: 8
Posting Whiz in Training
Real-tiner is offline Offline
207 posts
since Dec 2004
Jan 19th, 2005
0

Re: Does random data compress?

Algebraic random numbers are not very random. I usually write random number generators by finding the natural antilog e^x of some value and clipping off the leading 10 digits.
Reputation Points: 11
Solved Threads: 8
Posting Whiz in Training
Real-tiner is offline Offline
207 posts
since Dec 2004

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in Computer Science Forum Timeline: theory of programming languages
Next Thread in Computer Science Forum Timeline: Eliminate Left Recursive Grammar





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC