Does random data compress?

Please support our Computer Science advertiser: Learn about neural networks and artificial intelligence.
Reply

Join Date: Jun 2004
Posts: 190
Reputation: Toba is an unknown quantity at this point 
Solved Threads: 4
Toba's Avatar
Toba Toba is offline Offline
Junior Poster

Does random data compress?

 
0
  #1
Jun 24th, 2004
I've been wondering about this for awhile now: does random data compress?

Some people seem to think it does, while others maintain that it doesn't. I mean to mathematically prove that only 1/255 of random files can compress.

Conditions:

Original file size: 32 bytes
Maximum size of "compressed" file: 31 bytes
Number of possible different original files: 1.158 * 10^77
Number of possilbe "compressed" files: let's calculate that

Uhoh, this is gonna be hard to do without summation sybols.... well, ASCII art never hurt anyone.

31_
\
/__ 1.158 * 10^77 * (1/256)^k = 4.540 * 10^74
k=1

(1.158 * 10^77) / (4.540 * 10^74) ~ 1/255

Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
what? WHAT?
Reply With Quote Quick reply to this message  
Join Date: May 2004
Posts: 82
Reputation: gusano79 is on a distinguished road 
Solved Threads: 4
gusano79 gusano79 is offline Offline
Junior Poster in Training

Re: Does random data compress?

 
0
  #2
Jun 24th, 2004
Originally Posted by Toba
I've been wondering about this for awhile now: does random data compress?
...
Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
That sum is a lot more complicated than it has to be. It turns out to be equivalent to sum[k=1..31](256^k). Also, it looks like you've transposed the numerator and denominator in the last step, though your result is correct.

You have the ratio for a sequence of 32 bytes; the general solution for the ratio of shorter to full-length sequences is (1/255) - (1/((256^(n-1))*255)). For sequences of very short length, this ratio varies quite a bit, but the second term quickly approaches zero, so for any practical sequence of data (who would want to compress three bytes?), the ratio is 1/255 minus a very small constant.

What has actually been shown is that there are, in general, approximately only 1/255 as many possible shorter sequences, which is more of an upper bound--a given compression algorithm will be able to compress randomly-generated sequences of bytes at most 1/255 of the time, but realistically this will happen even less often.

If you derive the ratio for bits instead of bytes, it is 1-(1/(2^(n-1))), which is much more agreeable. However, on a machine with 8-bit bytes, any 32-byte sequence (for example) that compresses to 31 bytes plus a few bits would have to be padded to 32 bits anyway, effectively losing the compression--so the bit-based ratio doesn't contradict the above math.

I could post the derivation if anyone's interested.
--sg
Reply With Quote Quick reply to this message  
Join Date: Jun 2004
Posts: 190
Reputation: Toba is an unknown quantity at this point 
Solved Threads: 4
Toba's Avatar
Toba Toba is offline Offline
Junior Poster

Re: Does random data compress?

 
0
  #3
Jun 24th, 2004
I would have said what you just said if it hadn't been 1:45 AM . Well, I'm gonna try making random data and seeing if winzip will compress it. It will be interesting to see if winzip can do the impossible...
what? WHAT?
Reply With Quote Quick reply to this message  
Join Date: Jun 2004
Posts: 190
Reputation: Toba is an unknown quantity at this point 
Solved Threads: 4
Toba's Avatar
Toba Toba is offline Offline
Junior Poster

Re: Does random data compress?

 
0
  #4
Aug 15th, 2004
Hrm.... truly random data is harder to find than I thought. <grumbles>
what? WHAT?
Reply With Quote Quick reply to this message  
Join Date: Jun 2004
Posts: 436
Reputation: Chainsaw is an unknown quantity at this point 
Solved Threads: 11
Chainsaw's Avatar
Chainsaw Chainsaw is offline Offline
Unprevaricator

Re: Does random data compress?

 
0
  #5
Aug 15th, 2004
Have you tried an encryption routine? They're designed to produce pretty-darn-random numbers so that statistical analisys of the result won't yield clues to the input. There are some industrial strength random number generators, or you can start with something like an SHA-1 one-way hash.

Here's some code to grab:

http://www.codeproject.com/cpp/csha1.asp

If you want some big texts to use as a seed, you could grab some text from project gutenberg (sp?) like 'Romeo and Juliet', then feed that text into the SHA-1 code and poof you have a pretty darn random file!
Reply With Quote Quick reply to this message  
Join Date: Jun 2004
Posts: 190
Reputation: Toba is an unknown quantity at this point 
Solved Threads: 4
Toba's Avatar
Toba Toba is offline Offline
Junior Poster

Re: Does random data compress?

 
0
  #6
Aug 16th, 2004
Thanks for the idea Chainsaw. I encrypted a bitmap with PGP and deleted the header to leave a file of basically random data. Winzip couldn't compress it at all (It got bigger by a couple hundred bytes, actually). I guess random data doesn't compress... I was right.
what? WHAT?
Reply With Quote Quick reply to this message  
Join Date: Oct 2004
Posts: 18
Reputation: Nuez_Jr is an unknown quantity at this point 
Solved Threads: 1
Nuez_Jr's Avatar
Nuez_Jr Nuez_Jr is offline Offline
Newbie Poster

Re: Does random data compress?

 
0
  #7
Oct 6th, 2004
Originally Posted by Toba
Hrm.... truly random data is harder to find than I thought. <grumbles>
lol.
Winzip doing a pretty good job, huh?
Reply With Quote Quick reply to this message  
Join Date: Dec 2004
Posts: 445
Reputation: 1o0oBhP is an unknown quantity at this point 
Solved Threads: 6
1o0oBhP's Avatar
1o0oBhP 1o0oBhP is offline Offline
Posting Pro in Training

Re: Does random data compress?

 
0
  #8
Dec 20th, 2004
lol Truly random data wont compress if every bit of it is different. The 'break even' case is if at least half of the data bits occurred twice or more. Then RLE or a Look Up method would JUST about reduce the size

http://sales.carina-e.com

no www
no nonsense

coming soon to a pc near you! :cool:
Reply With Quote Quick reply to this message  
Join Date: Dec 2004
Posts: 207
Reputation: Real-tiner is an unknown quantity at this point 
Solved Threads: 8
Real-tiner Real-tiner is offline Offline
Posting Whiz in Training

Re: Does random data compress?

 
0
  #9
Jan 19th, 2005
You could always write:

Summation (for k=1 to 31, 1.158 * 10^77 * (1/256)^k = 4.540 * 10^74)

I have used that notation for years. For a continued product, use:

ContProduct (for i=1 to m, i+1)

A limit, and a one-sided limit:

Limit(a goes to b, a^2)

Limit(a goes up to b, a^2)
Reply With Quote Quick reply to this message  
Join Date: Dec 2004
Posts: 207
Reputation: Real-tiner is an unknown quantity at this point 
Solved Threads: 8
Real-tiner Real-tiner is offline Offline
Posting Whiz in Training

Re: Does random data compress?

 
0
  #10
Jan 19th, 2005
Algebraic random numbers are not very random. I usually write random number generators by finding the natural antilog e^x of some value and clipping off the leading 10 digits.
Reply With Quote Quick reply to this message  
Reply

This thread is more than three months old.
Perhaps start a new thread instead?
Message:



Other Threads in the Computer Science Forum
Thread Tools Search this Thread



About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC