| | |
Does random data compress?
Please support our Computer Science advertiser: Learn about neural networks and artificial intelligence.
![]() |
I've been wondering about this for awhile now: does random data compress?
Some people seem to think it does, while others maintain that it doesn't. I mean to mathematically prove that only 1/255 of random files can compress.
Conditions:
Original file size: 32 bytes
Maximum size of "compressed" file: 31 bytes
Number of possible different original files: 1.158 * 10^77
Number of possilbe "compressed" files: let's calculate that
Uhoh, this is gonna be hard to do without summation sybols.... well, ASCII art never hurt anyone.
31_
\
/__ 1.158 * 10^77 * (1/256)^k = 4.540 * 10^74
k=1
(1.158 * 10^77) / (4.540 * 10^74) ~ 1/255
Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
Some people seem to think it does, while others maintain that it doesn't. I mean to mathematically prove that only 1/255 of random files can compress.
Conditions:
Original file size: 32 bytes
Maximum size of "compressed" file: 31 bytes
Number of possible different original files: 1.158 * 10^77
Number of possilbe "compressed" files: let's calculate that
Uhoh, this is gonna be hard to do without summation sybols.... well, ASCII art never hurt anyone.
31_
\
/__ 1.158 * 10^77 * (1/256)^k = 4.540 * 10^74
k=1
(1.158 * 10^77) / (4.540 * 10^74) ~ 1/255
Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
what? WHAT?
•
•
Join Date: May 2004
Posts: 82
Reputation:
Solved Threads: 4
•
•
•
•
Originally Posted by Toba
I've been wondering about this for awhile now: does random data compress?
...
Does this or does this not prove that only 1/255 of random data files can be compressed, and is my math correct?
You have the ratio for a sequence of 32 bytes; the general solution for the ratio of shorter to full-length sequences is (1/255) - (1/((256^(n-1))*255)). For sequences of very short length, this ratio varies quite a bit, but the second term quickly approaches zero, so for any practical sequence of data (who would want to compress three bytes?), the ratio is 1/255 minus a very small constant.
What has actually been shown is that there are, in general, approximately only 1/255 as many possible shorter sequences, which is more of an upper bound--a given compression algorithm will be able to compress randomly-generated sequences of bytes at most 1/255 of the time, but realistically this will happen even less often.
If you derive the ratio for bits instead of bytes, it is 1-(1/(2^(n-1))), which is much more agreeable. However, on a machine with 8-bit bytes, any 32-byte sequence (for example) that compresses to 31 bytes plus a few bits would have to be padded to 32 bits anyway, effectively losing the compression--so the bit-based ratio doesn't contradict the above math.
I could post the derivation if anyone's interested.
--sg
Have you tried an encryption routine? They're designed to produce pretty-darn-random numbers so that statistical analisys of the result won't yield clues to the input. There are some industrial strength random number generators, or you can start with something like an SHA-1 one-way hash.
Here's some code to grab:
http://www.codeproject.com/cpp/csha1.asp
If you want some big texts to use as a seed, you could grab some text from project gutenberg (sp?) like 'Romeo and Juliet', then feed that text into the SHA-1 code and poof you have a pretty darn random file!
Here's some code to grab:
http://www.codeproject.com/cpp/csha1.asp
If you want some big texts to use as a seed, you could grab some text from project gutenberg (sp?) like 'Romeo and Juliet', then feed that text into the SHA-1 code and poof you have a pretty darn random file!
![]() |
Other Threads in the Computer Science Forum
- Previous Thread: Anyone know how to do PSEUDOCODE?
- Next Thread: Eliminate Left Recursive Grammar
| Thread Tools | Search this Thread |
ai algorithm algorithms amazon assignment assignmenthelp assignments automata battery bigbrother binary bittorrent bizarre bletchleypark blogging bomb business cern codebreaker compiler computer computers computerscience computertrackingsoftware connect conversion csc dataanalysis dataintepretation development dfa dissertation dissertations dissertationthesis dissertationtopic ebook employment energy extensions floatingpoint foreclosure fuel gadgets geeks givemetehcodez government hardware homeowners homeworkassignment homeworkhelp humor ibm idea ideas internet iphone itcontracts jobs kindle laser linkbait lsmeans mainframes marketing mining msaccess nano netbeans networking news os p2p piracy piratebay principles programming rasterizer research sam-being-cute sas science security sex simulation software spying sql stephenfry study supercomputer supercomputing sweden technology textfield turingtest two'scompliment uk virus warehouse ww2






. Well, I'm gonna try making random data and seeing if winzip will compress it. It will be interesting to see if winzip can do the impossible...