SmackerNews

File system that stores location of file in Pi

310 points · 98 comments · 11 years ago · morisy

  Now, we all know that it can take a while to find a long sequence of digits in π,
  so for practical reasons, we should break the files up into smaller chunks that
  can be more readily found.

  In this implementation, to maximise performance, we consider each individual byte
  of the file separately, and look it up in π.

Definitely worth a chuckle. Very cute idea and implementation.

nadam11 years ago
When I was young I had this idea that any hard drive can be compressed into 100 bytes. The compressed data is a 4-dimensional vector; each component of the vector is a 25-byte floating point number, and together they represent the space-time coordinates of the hard drive. (For example, my hard drive on March 3, 1994 at 23:00:45.456, at a specific place in Budapest.) The extractor algorithm just has to simulate the universe from the big bang up until the given time, read the state of the atoms at the specified location, recognize the hard drive, and read the data from it. (Provided that the universe is deterministic, and what seems to be random in quantum mechanics can be simulated with a pseudorandom number generator.)
baddox11 years ago
If you're going to use a normal number for this purpose, why not choose a much nicer one? Let's use a number such that its binary representation is the concatenation of consecutive ascending binary numbers.
```
    0 1 10 11 100 101 110 111 1000...
```
becomes
```
    0.0110111001011101111000...
```
It's much easier to demonstrate that this number is normal than to do so for pi. It's also much easier to calculate the nth digit, and to find an occurrence of a given string of bits.
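A minimal sketch of the number described above, assuming Python and stdlib only. The helper names (`champernowne_bits`, `prefix`, `offset_of_integer`, `guaranteed_offset`) are made up for illustration. It builds the expansion, and shows one way to locate an arbitrary bit string: prefix it with a 1, read the result as an integer k, and jump to where k is written out. This finds an occurrence, not necessarily the first one.

```python
from itertools import count, islice

def champernowne_bits():
    """Yield the bits after the point: 0, 1, 10, 11, 100, ... concatenated."""
    for n in count(0):
        yield from format(n, "b")

def prefix(n_bits):
    """First n_bits bits of the expansion, as a string."""
    return "".join(islice(champernowne_bits(), n_bits))

def offset_of_integer(k):
    """0-based offset at which the binary form of k starts."""
    return sum(len(format(i, "b")) for i in range(k))

def guaranteed_offset(bits):
    """An offset where `bits` is certain to occur (not necessarily the first).

    Prefixing a 1 turns any bit string, leading zeros included, into the
    binary form of some integer k, and k is written out somewhere.
    """
    k = int("1" + bits, 2)
    return offset_of_integer(k) + 1  # skip the 1 we prefixed

print(prefix(22))                    # compare with the expansion quoted above
off = guaranteed_offset("0011")
print(off, prefix(off + 4)[off:])    # offset, then the bits found there
```

Note that the offset found this way grows roughly like k times the bit length of k, so writing the offset down takes more bits than the string it points to.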
zachrose11 years ago
Obligatory Dinosaur Comics: http://www.qwantz.com/index.php?comic=353
"You can't copyright a fact (like a number), but you can copyright a creative work, like a song or a piece of software. But since one can be transformed into another, copyright law is logically INCOHERENT."
bonchibuji11 years ago
Isn't this one of the April 1st jokes from 2012? Most of the commits were made on March 31, 2012[1]. And there's even a reference to the pi joke[2].
[1] https://github.com/philipl/pifs/commits/master
[2] http://www.netfunny.com/rhf/jokes/01/Jun/pi.html
peterkelly11 years ago
Great, now we're going to see a DMCA takedown for π as it contains copyrighted content.
ttflee11 years ago
Like other lossless compression algorithms, there always exist some blobs of data for which the length of the location plus metadata exceeds that of the original blob, due to the pigeonhole principle. The trouble in the case of pi-fad is that we probably will not know whether the location is longer or not before it is actually computed.
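The pigeonhole argument can be checked by plain counting. A small sketch, assuming Python; the function names are hypothetical. There are 2^n inputs of n bits but only 2^n - 1 bit strings that are strictly shorter, so a lossless (one-to-one) code has to leave at least one n-bit input unshortened.

```python
def strings_shorter_than(n_bits):
    """Count of distinct bit strings of length 0 .. n_bits - 1."""
    return sum(2 ** length for length in range(n_bits))

def inputs_that_cannot_shrink(n_bits):
    """Lower bound on n-bit inputs that no lossless code can shorten."""
    return 2 ** n_bits - strings_shorter_than(n_bits)

for n in (8, 16, 32):
    # columns: n, number of inputs, shorter outputs available, shortfall
    print(n, 2 ** n, strings_shorter_than(n), inputs_that_cannot_shrink(n))
```

This is only the weakest form of the bound: it says nothing about which inputs lose, which is the point made above about not knowing until the location is computed.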
TOMDM11 years ago
I love pieces of code like this, it appeals to me in the same way sleepsort does. A superficial understanding of it might make you think it would be worth it, but really, while it may work, it's better left as a joke.
bunderbunder11 years ago
I'm skeptical that this could really save any space. Just speculating here, really, but it seems like on average the amount of space needed to store the starting index of an arbitrary string of digits in pi should be greater than (or at least comparable to) the size of the string itself.
e.g., the first instance of "256" in pi starts at the 1750th digit. So in that case you're getting a 'compression' rate of -33% if we go by the count of decimal digits used.
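For anyone who wants to try other strings, here is a rough sketch, assuming Python with stdlib only. It computes decimal digits of pi with Machin's formula (a standard technique, not anything taken from the project) and reports the 1-based position of the first match among the decimals after the "3". The names `pi_decimals` and `first_position` are invented for this example, and the position convention may differ from the one used in the comment above.

```python
def pi_decimals(n):
    """First n decimal digits of pi (after the '3'), via Machin's formula."""
    guard = 10                      # extra digits to absorb truncation error
    scale = 10 ** (n + guard)

    def arctan_inv(x):
        """arctan(1/x) * scale, using integer arithmetic only."""
        term = scale // x
        total = term
        x2 = x * x
        k, sign = 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    # pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi_scaled // 10 ** guard)[1:]   # drop the leading '3'

def first_position(needle, decimals):
    """1-based position of the first occurrence of needle, or None."""
    i = decimals.find(needle)
    return None if i < 0 else i + 1

decimals = pi_decimals(5000)
for needle in ("256", "999999"):
    pos = first_position(needle, decimals)
    # columns: string, position, digits in string, digits in position
    print(needle, pos, len(needle), None if pos is None else len(str(pos)))
```

Comparing the last two columns gives the same kind of digit-count comparison as the -33% figure above.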
skhavari11 years ago
Hooli is gonna be pissed when they learn that Pi-ed Piper nailed a compression algorithm.
TheAuditor11 years ago
I had played with this idea some time back and gave up after some very specific flaws became clear.
A 5-digit combination will most likely be found in Pi at a location above 10000; for example, I once located my 6-digit phone number at position 685214, which was not actually helpful at all.
Further, we are not sure if Pi is normal, hence the better idea would be to use a simple computable normal series.
It was just yesterday I uploaded a paper that presents an idea for compressing random data -> https://www.academia.edu/7620004/Advanced_Compression_Techni... which proposes pushing multiple bytes, represented by positions in a computable number series, into a small representation and generating them on the go when required. (Needs a lot of improvement to actually apply.)
braydenjw11 years ago
I'm not sure I understand how this would compress files. I mean, the only way it could is if the decimal place in Pi at which the byte occurs is significantly less than the value of the byte itself. Statistically, this would happen less than 50% of the time, the other 50+% of the time occurring at a higher decimal place. I don't see this providing any real compression benefit.
For example, the byte 0xFF, which is the number 255, first occurs at the 1168th value of Pi. This means instead of storing 255, you're now storing 1168, or 0x490, requiring an extra half-byte. However, 0x328, or number 808, first occurs at the 105th value of pi, or 0x69, requiring one less half-byte.
How does this system provide better compression? The way I see it, the best case scenario would be if no sequence from 000 to 255 was ever repeated in Pi (or rather, not until every pattern in that sequence has been covered), in this case the compression ratio should be exactly 0%, no net gain or loss.
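The half-byte arithmetic in the example above is easy to check. A tiny sketch, assuming Python; `half_bytes` is a made-up helper that counts the hex digits needed for a number. The positions 1168 and 105 are taken from the comment as given and are not verified here. Note also that 0x328 (808) is larger than a single byte can hold, so it is treated simply as a number.

```python
def half_bytes(n):
    """Hex digits (half-bytes) needed to write the non-negative integer n."""
    return max(1, (n.bit_length() + 3) // 4)

# (value, claimed first position in pi) pairs from the comment above
for value, position in ((255, 1168), (808, 105)):
    print(hex(value), half_bytes(value), "->", hex(position), half_bytes(position))
```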
fluff314159265311 years ago
I've been looking for the lyrics to the song that, when sung, will bring about peace on this planet. Now to hear that the file containing these lyrics is already contained in pi is revelatory. Could someone please give me the index and length of the file? I've got some singing to do.
EdwardCoffin11 years ago
This reminds me of Frederik Pohl's [1] book The Gold at Starbow's End, in which Gödelization [2] is used to compress a huge message into a very short one. There's a brief description of that part of the book at MathFiction [3]
[1] http://en.wikipedia.org/wiki/Frederik_Pohl
[2] http://www.encyclopediaofmath.org/index.php/Gödelization
[3] http://kasmana.people.cofc.edu/MATHFICT/mfview.php?callnumbe...
cettox11 years ago
As many pointed out using the pigeonhole principle, it is not practical to create a compression index (a lookup index where you map actual data to some kind of addresses, preferably smaller than the sequences) using every possible n-byte sequence of your data!
Because your index size would be at least equal to or larger than your original data.
The only way to get a smaller compression index is to look for recurrences and include only the most recurring sequences up to some number (there would be a tradeoff and an optimal number for compression ratio), leaving other sequences uncompressed. Only this way can you achieve compression ratios smaller than 100%.
pbhjpbhj11 years ago
I scanned the responses and saw only one that mentioned that pi is not proved (or possibly even provably) normal. That comment was downvoted.
andybak11 years ago
If anyone here hasn't read The Library of Babel yet, then now is a good time.
Here's a link in case you have trouble locating it within Pi: http://hyperdiscordia.crywalt.com/library_of_babel.html
Jack550011 years ago
This project was posted before and hasn't been updated since. I doubt that it is still in development.
bmh10011 years ago
This is extremely clever and something I have wanted to do for a while. If you are interested in contributing to a fun, small Clojure project, stop by: https://github.com/bmhimes/clojure-pifs
tluyben211 years ago
Reminds me of Jan Sloot: http://en.wikipedia.org/wiki/Jan_Sloot. It was like an April Fools' joke, but a lot of big people fell for it at the time.
tragomaskhalos11 years ago
If you use base 11 you get the added bonus of proving the existence of god! (http://en.wikipedia.org/wiki/Contact_(novel))
andrewfong11 years ago
Reminds me of this SMBC: https://medium.com/the-nib/jesus-is-destroying-civilization-...
andrey-p11 years ago
I prefer the infinite monkey database [1] myself.
[1]: https://github.com/brycebaril/infinite-monkey-db
aaron69511 years ago
I'm not sure pi is proven to contain all sequences of digits. Anyone care to link a proof? The joke may be on them, and they might not really understand pi at all.
somid311 years ago
Fascinating implementation I must admit. How does i/o performance change as the byte-length or chunks vary in size from 3 to 200 bytes?
alixaxel11 years ago
This is genius!
Am I right in assuming that the decompression step is several orders of magnitude faster than the compression phase?
mavdi11 years ago
Smartest thing I've seen in months, not so much the speed, but the compression value of it is great
haddr11 years ago
really cool idea, but not sure if actually true: http://math.stackexchange.com/questions/216343/does-pi-conta...
tiku11 years ago
if you have this large base file with pi-numbers, you could use it to compress data right? and with the current internet speeds, pi-storing in the cloud could be an option. or hell, even distributed pi files :)
soheil11 years ago
The location of where the data is is no less complex than the actual data.
Igglyboo11 years ago
His Weissman score must be off the charts.
dvanduzer11 years ago
has anybody run bonnie++ benchmarks with this fs?
serge2k11 years ago
They said 100% compression was impossible? You're looking at it!
If the offset within pi is so large that any representation of it is larger than my data?

news.ycombinator.com/item?id=8018818