310 points · 98 comments · 11 years ago · morisy
github.com
zaroth
nadam
baddox
0 1 10 11 100 101 110 111 1000...
becomes 0.0110111001011101111000...
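Here is a minimal Python sketch of the construction above. The helper names are hypothetical, not from any library. The nth bit can be computed directly by skipping whole blocks of equal-length binary numerals. A short bit string can be located by scanning, because every pattern must appear no later than the numeral spelled "1" followed by that pattern.

```python
def champernowne2_bit(n: int) -> int:
    """Return bit n (1-based) after the point of 0.0110111001... (0, 1, 10, 11, ... concatenated)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if n == 1:
        return 0                      # the leading numeral "0"
    n -= 1                            # now count bits of "1", "10", "11", "100", ...
    length = 1
    while n > length * (1 << (length - 1)):
        n -= length * (1 << (length - 1))  # skip all numerals with this many bits
        length += 1
    value = (1 << (length - 1)) + (n - 1) // length   # the numeral containing bit n
    offset = (n - 1) % length                           # position inside it, from the left
    return (value >> (length - 1 - offset)) & 1


def find_bits(pattern: str) -> int:
    """Return the 1-based position where `pattern` (a non-empty string of 0s and 1s) first starts."""
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty bit string")
    # Every pattern appears no later than the numeral "1" + pattern, so scanning that far suffices.
    bound = int("1" + pattern, 2)
    bits = "".join(format(i, "b") for i in range(bound + 1))
    return bits.find(pattern) + 1
```

The scan bound grows exponentially with the pattern length, so `find_bits` is only practical for short patterns. `champernowne2_bit` stays cheap for any n.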
It's much easier to demonstrate that this number is normal than to do so for pi. It's also much easier to calculate the nth digit, and to find an occurrence of a given string of bits.
zachrose
"You can't copyright a fact (like a number), but you can copyright a creative work, like a song or a piece of software. But since one can be transformed into another, copyright law is logically INCOHERENT."
bonchibuji
peterkelly
ttflee
TOMDM
bunderbunder
e.g., the first instance of "256" in pi starts at the 1750th digit. So in that case you're getting a 'compression' rate of -33% if we go by the count of decimal digits used.
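This sketch shows one way to check a figure like this one. It assumes "the 1750th digit" counts from the first digit after the decimal point; the comment doesn't say. The code computes decimal digits of pi with Machin's formula in integer fixed-point arithmetic rather than with any particular library. It then compares the length of the position with the length of the pattern.

```python
def pi_decimal_digits(n: int) -> str:
    """First n decimal digits of pi after the point, using Machin's formula
    pi = 16*arctan(1/5) - 4*arctan(1/239) in fixed-point integer arithmetic."""
    guard = 10                          # extra digits that absorb truncation error
    scale = 10 ** (n + guard)

    def arctan_inv(x: int) -> int:
        # scale * arctan(1/x) = scale * (1/x - 1/(3x^3) + 1/(5x^5) - ...)
        term = scale // x
        total = term
        k, sign = 1, -1
        while term:
            term //= x * x
            k += 2
            total += sign * (term // k)
            sign = -sign
        return total

    pi_scaled = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return str(pi_scaled // 10 ** guard)[1:]    # drop the leading "3"


def first_position(pattern: str, digits: str):
    """1-based position after the decimal point where `pattern` first starts, or None."""
    i = digits.find(pattern)
    return None if i < 0 else i + 1


if __name__ == "__main__":
    digits = pi_decimal_digits(5000)
    pos = first_position("256", digits)
    if pos is None:
        print('"256" not found in the first 5000 digits')
    else:
        # Relative size of the stored position versus the data, in decimal digits.
        change = (len(str(pos)) - len("256")) / len("256")
        print(f'"256" first starts at digit {pos}; size change {change:+.0%}')
```

A positive size change means the "compressed" position takes more digits than the data it replaces.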
skhavari
TheAuditor
There is a good chance that a 5-digit combination is first found only beyond position 10000. For example, I once located my 6-digit phone number at position 685214, which was not actually helpful at all.
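As a rough check on these numbers, here is a sketch of the standard expected-waiting-time formula for a pattern in a sequence of independent, uniformly random digits. Applying it to pi assumes that pi's digits behave this way, which is unproven. For a k-digit pattern with no self-overlap, the expected end position is 10**k: about 100000 for 5 digits and 10**6 for a 6-digit phone number.

```python
def expected_end_position(pattern: str, alphabet_size: int = 10) -> int:
    """Expected number of symbols read until `pattern` first completes, when symbols
    are independent and uniform over `alphabet_size` values. Every k for which the
    length-k prefix equals the length-k suffix (always including k = len(pattern))
    contributes alphabet_size ** k."""
    m = len(pattern)
    return sum(alphabet_size ** k
               for k in range(1, m + 1)
               if pattern[:k] == pattern[m - k:])


if __name__ == "__main__":
    for p in ["12345", "11111", "685214"]:
        end = expected_end_position(p)
        # The pattern starts len(p) - 1 digits before it completes.
        print(p, "expected start around", end - len(p) + 1)
```

Self-overlapping patterns such as "11111" take longer on average to appear than non-overlapping ones of the same length.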
Further, we aren't sure whether Pi is normal, so a better idea would be to use a simple computable normal sequence.
Just yesterday I uploaded a paper that presents an idea for compressing random data: https://www.academia.edu/7620004/Advanced_Compression_Techni... It proposes packing multiple bytes, represented by positions in a computable number series, into a small representation and generating them on the fly when required (the idea needs a lot of improvement before it can actually be applied).
braydenjw
For example, the byte 0xFF, which is the number 255, first occurs at the 1168th position of Pi. This means that instead of storing 255, you're now storing 1168, or 0x490, which requires an extra half-byte. However, 0x328, or the number 808, first occurs at the 105th position of Pi, or 0x69, requiring one less half-byte.
How does this system provide better compression? The way I see it, the best-case scenario would be if no sequence from 000 to 255 ever repeated in Pi (or rather, not until every pattern in that range has been covered). In that case the compression ratio should be exactly 0%, with no net gain or loss.
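Here is a Python sketch of both halves of this comparison. It rests on a few assumptions. Hex digits of pi are computed with the Bailey-Borwein-Plouffe digit-extraction formula in double precision, which is adequate for a few thousand positions, although a digit right at a rounding boundary could come out wrong. Offsets are 0-based hex-digit positions after the point, and a byte may start at any hex digit. These are illustrative choices. They may not match how pifs or the figures above count positions.

```python
def pi_hex_digit(n: int) -> int:
    """Hex digit of pi at 0-based position n after the point (n = 0 -> 0x2),
    via Bailey-Borwein-Plouffe digit extraction in double precision."""
    def frac_sum(j: int) -> float:
        # Fractional part of sum_{k>=0} 16**(n-k) / (8k + j).
        s = 0.0
        for k in range(n + 1):          # head: modular exponentiation keeps terms small
            d = 8 * k + j
            s = (s + pow(16, n - k, d) / d) % 1.0
        k = n + 1
        while True:                     # tail: terms shrink by a factor of 16 each step
            t = 16.0 ** (n - k) / (8 * k + j)
            if t < 1e-17:
                break
            s += t
            k += 1
        return s % 1.0

    x = (4 * frac_sum(1) - 2 * frac_sum(4) - frac_sum(5) - frac_sum(6)) % 1.0
    return int(x * 16)


def first_byte_offset(byte: int, limit: int = 4096):
    """0-based hex-digit offset where `byte` (two hex digits) first appears, or None.
    Matches may start at any hex digit, not only at even offsets (an assumption)."""
    prev = pi_hex_digit(0)
    for n in range(1, limit):
        cur = pi_hex_digit(n)
        if (prev << 4) | cur == byte:
            return n - 1
        prev = cur
    return None


def nibbles(value: int) -> int:
    """Hex digits (half-bytes) needed to write a non-negative integer."""
    return max(1, (value.bit_length() + 3) // 4)
```

Comparing `nibbles(first_byte_offset(0xFF))` with `nibbles(0xFF)` reproduces the kind of half-byte accounting described above. Each digit is recomputed from scratch, so the search is quadratic in the offset and takes noticeably longer for offsets in the thousands.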
fluff3141592653
EdwardCoffin
[1] http://en.wikipedia.org/wiki/Frederik_Pohl
[2] http://www.encyclopediaofmath.org/index.php/Gödelization
[3] http://kasmana.people.cofc.edu/MATHFICT/mfview.php?callnumbe...
cettox
Because your index would be at least as large as your original data.
The only way to get a smaller index is to look for recurrences: include only the most frequently recurring sequences, up to some number (there would be a tradeoff, and an optimal number for the compression ratio), and leave the other sequences uncompressed. Only this way can you achieve compression ratios below 100%.
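The counting (pigeonhole) argument behind this fits in a few lines. A lossless compressor must map distinct inputs to distinct outputs, and there are fewer short bit strings than long ones. So no scheme, pi-based or otherwise, can shorten every input. The function names below are just illustrative.

```python
def count_strings(length: int) -> int:
    """Distinct bit strings of exactly `length` bits."""
    return 2 ** length


def count_shorter(length: int) -> int:
    """Distinct bit strings strictly shorter than `length` bits, including the empty one."""
    return sum(2 ** k for k in range(length))        # equals 2**length - 1


def max_fraction_shrunk_by(length: int, saving: int) -> float:
    """Upper bound on the fraction of `length`-bit inputs that any lossless (injective)
    code can shorten by at least `saving` bits: such inputs need distinct outputs of
    at most length - saving bits, and there are only so many of those."""
    outputs = sum(2 ** k for k in range(length - saving + 1))
    return min(1.0, outputs / count_strings(length))


if __name__ == "__main__":
    n = 16
    # One fewer short string than inputs: some 16-bit input cannot get shorter.
    print(count_strings(n), "inputs vs", count_shorter(n), "shorter outputs")
    for saving in (1, 2, 8):
        print(f"shrink by >= {saving} bits: at most {max_fraction_shrunk_by(n, saving):.4%}")
```

For example, fewer than half of all n-bit inputs can be shortened by 2 or more bits. Only one 8-bit input in 256 can be shrunk to nothing.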
pbhjpbhj
andybak
Here's a link in case you have trouble locating it within Pi: http://hyperdiscordia.crywalt.com/library_of_babel.html
Jack5500
bmh100
tluyben2
tragomaskhalos
andrewfong
andrey-p
aaron695
somid3
alixaxel
Am I right in assuming that the decompression step is several orders of magnitude faster than the compression phase?
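A rough cost model is one way to reason about this. It assumes, hypothetically, that both directions use plain BBP digit extraction with no caching, so that hex digit n costs about 4 x (n + 1) modular-exponentiation terms. Under that model, finding a match at offset p means computing every digit up to p, while decompressing needs only the few digits at p. The ratio therefore grows roughly linearly with p. The model ignores caching and other search strategies, and the function names are illustrative.

```python
def head_terms(n: int) -> int:
    """Modular-exponentiation terms one BBP extraction evaluates for hex digit n:
    four series, each summing k = 0 .. n (the short tail is ignored)."""
    return 4 * (n + 1)


def compression_cost(offset: int, length: int = 2) -> int:
    """Terms evaluated when scanning digits 0 .. offset + length - 1 to find a match."""
    return sum(head_terms(n) for n in range(offset + length))


def decompression_cost(offset: int, length: int = 2) -> int:
    """Terms evaluated when extracting only digits offset .. offset + length - 1."""
    return sum(head_terms(n) for n in range(offset, offset + length))


if __name__ == "__main__":
    p = 1168   # the offset quoted for byte 0xFF earlier in the thread
    print(compression_cost(p), decompression_cost(p),
          round(compression_cost(p) / decompression_cost(p)))
```

For the offset of 1168 quoted earlier, this model gives a ratio of roughly 290x: two to three orders of magnitude, growing with the offset.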
mavdi
haddr
tiku
soheil
Igglyboo
dvanduzer
serge2k
They said 100% compression was impossible? You're looking at it!
What if the offset within pi is so large that any representation of it is larger than my data?