SmackerNews

AI Engineer Claims to Have Cracked Linear A

427 points · 167 comments · 19 hours ago · Kosturdistan

aiclambake.com

stratocumulus0 17 hours ago
As an amateur who's been fascinated by this puzzle myself, I will add some context that might be relevant in assessing the plausibility of this claim:
- The "Libation Formula", which the author used as the base for his translations, is the most studied piece of writing in Linear A, because it's the only recurring phrase (with grammatical variation) that we have. The corpus is extremely fragmentary, with just a handful of instances of longer text (and even then, the texts are the length of an average sentence in English). The majority of documents available to us are lists (of inventory, personnel, offerings or something of this sort). The longer texts make use of punctuation marks, likely put in between words. This gives us a non-trivial vocabulary, which still does not match that of any known language.
- With such fragmentary remaining material, we cannot be sure that a) all the texts we call "Linear A" are written in the same language, and b) the recognizable words are not abbreviations, for example.
- The author made an assumption that Linear A symbols which have counterparts in Linear B should have the same phonetic values. This gives us an already known glyph that represented "NA". "Duplicate" glyphs are only found in the P-series, and are assumed to represent syllables which were distinguished by the Linear A language, but not by Greek - such as aspirated/unaspirated P. There is a glyph that stands for "NWA" in Linear B, but instances of it have been found in Linear A as well.
- There are countless words with no known etymology in Ancient Greek, assumed to originate from a substrate language or languages spoken in the area at the time the Greeks migrated to their present-day homeland. The language of Linear A would be a likely candidate for such a substrate. If Linear A were a Semitic language, then we should already be able to establish Semitic etymologies for those words as they appear in Greek. Of course, it could also be the case that these words came from another language that did not adopt writing, or whose writing did not survive to our times.
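The assumption in the third point above (reading Linear A signs that have Linear B counterparts with their Linear B values) boils down to a lookup table with a fallback. A minimal Python sketch, assuming a small excerpt of the standard AB sign numbering (verify against the GORILA/SigLA data before relying on it); the table and function names are hypothetical. Signs with no Linear B counterpart keep their bare number, as in readings like i-301-wa-ja.

```python
import re

# Hypothetical excerpt of a sign-number -> sound-value table. The values are
# the Linear B readings of shared "AB" signs (assumed here; check the corpus
# databases). "A"-prefixed signs are Linear A only and have no agreed value.
SHARED_VALUES = {
    "AB06": "na",
    "AB28": "i",
    "AB39": "pi",
    "AB54": "wa",
    "AB57": "ja",
    "AB80": "ma",
}


def transliterate(sign_ids, values=SHARED_VALUES):
    """Join sign readings with hyphens, Linear B style.

    A sign with a borrowed value is read with it; any other sign is written
    as its bare number (e.g. "A301" -> "301").
    """
    parts = []
    for sign in sign_ids:
        if sign in values:
            parts.append(values[sign])
        else:
            # Strip the "A"/"AB" prefix and leading zeros, keep the number.
            parts.append(re.sub(r"^AB?0*", "", sign))
    return "-".join(parts)


print(transliterate(["AB28", "AB39", "AB06", "AB80"]))  # i-pi-na-ma
print(transliterate(["AB28", "A301", "AB54", "AB57"]))  # i-301-wa-ja
```

Everything downstream of a decipherment attempt inherits whatever is in that table, which is why the shared-value assumption matters so much.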
Kosturdistan OP 19 hours ago
A lot of loonies make this claim, but Tom's work is credible enough that it's being reviewed by linguistics experts at Rutgers and Cambridge. Additional validation: his approach produces results. He's translated over 300 words, and that's never been done before, and his solution actually solves some problems in Linear B. Tom is an AI engineer, and Claude Code was key to his work. Disclosures: I know Tom socially, and I wrote the post at the link.
simonw 15 hours ago
Di Mino used Claude Code to build a suite of Python scripts that query, cross-reference, and organize the digitized Linear A corpus (drawn from the GORILA and SigLA databases), enabling systematic hypothesis testing at a scale that would have been impractical to do manually.
That's exactly the kind of thing I'd hope Claude would be used for in these kinds of projects - building tools, not black-box "solving" the problem.
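The article doesn't show those scripts. As a rough idea of what "cross-referencing" a corpus can mean, here is a minimal concordance sketch over invented toy records. The record format, inscription IDs, words and helper names are all hypothetical; they are not Di Mino's code or the GORILA/SigLA schemas.

```python
from collections import defaultdict

# Invented toy records: inscription ID -> word-divided sign groups.
# These are placeholders, not real corpus entries.
TOY_CORPUS = {
    "ins-1": ["ta-ra", "ne-ko", "pi-su"],
    "ins-2": ["ne-ko", "ka-di"],
    "ins-3": ["ne-ko", "pi-su", "mo-ra"],
}


def build_concordance(corpus):
    """Map each word to the set of inscriptions it occurs in."""
    index = defaultdict(set)
    for inscription_id, words in corpus.items():
        for word in words:
            index[word].add(inscription_id)
    return index


def recurring_words(index, min_inscriptions=2):
    """Words attested in at least `min_inscriptions` documents, most widespread first."""
    hits = [w for w, ids in index.items() if len(ids) >= min_inscriptions]
    return sorted(hits, key=lambda w: (-len(index[w]), w))


index = build_concordance(TOY_CORPUS)
print(recurring_words(index))  # ['ne-ko', 'pi-su']
print(sorted(index["pi-su"]))  # ['ins-1', 'ins-3']
```

Recurring words are the natural starting point for this kind of work, which is also why the Libation Formula gets so much attention.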
Tuna-Fish 17 hours ago
The reason Linear A is so difficult is that the total remaining corpus of Linear A text is ~7500 characters, spread out over ~1500 inscriptions.
If you have a 4k screen, you can fit all remaining Linear A text on your screen at once, in 14pt high font.
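A quick back-of-the-envelope check of the screen claim. This is a sketch under loudly stated assumptions: "4k" is taken as 3840×2160 pixels, 96 DPI, square glyph cells one em wide, and 1.2 line spacing. The function name is made up for illustration.

```python
def screen_capacity(width_px, height_px, font_pt, dpi=96, line_spacing=1.2):
    """Glyph cells that fit on a screen, assuming square (1 em wide) glyphs."""
    em_px = font_pt * dpi / 72  # 1 pt = 1/72 inch
    cols = int(width_px // em_px)
    rows = int(height_px // (em_px * line_spacing))
    return cols * rows


capacity = screen_capacity(3840, 2160, 14)
print(capacity, capacity >= 7500)  # 19680 True
print(7500 / 1500)  # 5.0 signs per inscription on average
```

Under these assumptions the whole corpus fills well under half the screen, so the claim holds with room to spare.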
petjuh 14 hours ago
From what I know, the main issue is that the Linear A corpus is rather small. Another commenter here said it's only 7500 symbols in total, spread across 1500 inscriptions (so on average 5 symbols per inscription).
The other thing I find odd, however, is the claim that it's a Semitic language. If it were a Semitic language, I would have expected it to have been deciphered already. Linguists would certainly have looked at Semitic languages, and looked hard.
Also, if it were a Semitic language, why is it written with vowels rather than consonants only? Semitic languages (and maybe Egyptian) usually write only the consonants, because their stems are made of three consonants, with vowels interwoven to form words.
Example: the Semitic root K-T-B, and how vowels are added in between to form words:
kataba – He wrote
yaktubu – He writes / is writing
kitāb – A book
kutub – Books
kātib – A writer / scribe / clerk
maktūb – Written / fate
maktab – An office / desk
maktabah – A library / bookstore
And another such root, D-R-S, which means "studying" or "learning":
darasa – He studied
yadrusu – He studies / is studying
dirāsah – A study / school course
dāris – A student / learner
madrūs – Studied / carefully planned
madrasah – A school
This system of triliteral roots is why Semitic languages usually don't write vowels. Why would Linear A have a consonant+vowel syllabary if it were Semitic?
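The root-and-pattern system in the examples above can be made concrete with a minimal Python sketch. It uses a simplified Latin transliteration and a made-up slot notation (digits 1, 2, 3 mark where the root consonants go); this is an illustration, not a standard linguistic tool. It also shows the second half of the argument: dropping the vowels still leaves the root consonants readable, which is why a consonant-only script works well for these languages.

```python
# Hypothetical slot notation: digits 1, 2, 3 are filled with the root
# consonants; every other character (vowels, prefixes, suffixes) comes
# from the pattern itself. Patterns transcribe a few of the examples above.
PATTERNS = ["1a2a3a", "ya12u3u", "1ā2i3", "ma12ū3", "ma12a3ah"]

VOWELS = set("aeiouāīū")


def interdigitate(root, pattern):
    """Fill the numbered slots of `pattern` with the consonants of `root`."""
    return "".join(root[int(c) - 1] if c.isdigit() else c for c in pattern)


def consonant_skeleton(word):
    """Roughly what a consonant-only script would write for `word`."""
    return "".join(c for c in word if c not in VOWELS)


for root in ("ktb", "drs"):
    print([interdigitate(root, p) for p in PATTERNS])
# ['kataba', 'yaktubu', 'kātib', 'maktūb', 'maktabah']
# ['darasa', 'yadrusu', 'dāris', 'madrūs', 'madrasah']
print(consonant_skeleton("kātib"), consonant_skeleton("maktabah"))  # ktb mktbh
```

The same five patterns produce parallel word families from both roots, and the consonant skeleton keeps the root visible in every form.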
loudmax 17 hours ago
This is very exciting. Congrats to Tom on the accomplishment.
To be clear, this is an attempt at a decipherment. This is not proven, and we shouldn't consider Linear A to be "solved" until experts in the field have reviewed the work. In fact, it probably shouldn't be considered "proof" unless some more Linear A writings are uncovered and these are congruent with the method proposed. All that can be said for certain at this point is that this is an interesting conjecture.
But this is a story worth following. This could be the real deal. More research and validation should follow and we should have a better idea in the next few weeks or months whether Linear A has really been solved. At the very least, this is an interesting attempt, and optimistically, it could yield real insight into Minoan culture. Kudos.
cwmma 17 hours ago
Isn't a big problem with Linear A that there are so few symbols that you can "solve" it relatively straightforwardly, with no way to tell whether it's correct?
singularity2001 15 hours ago
If it turns out to be true, it would open the door a bit for connecting the Indo-European languages with the Semitic languages. At the beginning of the last century it was believed that these were related. Later this went out of vogue. How could they have been so wrong initially? Because both language families were entangled: there is now genetic evidence that both spread from very close to the Caucasus. It's probably old news for most, but in the last 15 years it became clear that Europe was completely resettled, once by Anatolians and then partly by Indo-Europeans. The language of the Anatolians is still unknown.
singularity2001 14 hours ago
Wait, I've seen the same libation formula appearing in the Phaistos disc. For those 10 of you who have the fonts installed:
𐇑 𐇘 𐇪 𐇐 | 𐇬 𐇳 𐇖 𐇗𐇽 | 𐇬 𐇗 𐇜 | 𐇬 𐇼 𐇖𐇽 | 𐇥 𐇬 𐇳 𐇖 𐇗𐇽 | 𐇪 𐇱 𐇦 𐇨 | 𐇖 𐇡 𐇲 | 𐇖 𐇼 𐇖𐇽 | 𐇖 𐇡 𐇲 | 𐇥 𐇬 𐇳 𐇖 𐇗𐇽 i-𐇘-wi-jeʳ | ʰau-ni-ti-noʳ au-no-pa au-ndi-tiʳ 𐇥-au-ni-ti-noʳ wa-pi-naᵐwa ti-ru-te ti-nd-tri ti-na-ru-he ʰau-ni-ti-noʳ i-301-wa-ja/e | ʰau-... jaᵘ-di-ki-to i-pi-na-ma si-ru-te ta-na-ra te-ti-u ta-na-te i-da 𐘚 ᴴI 𐘮 WA 𐘱 JA 𐘱 JA 𐘆 DI 𐘸 KI 𐘹 TU 𐘚 ᴴI 𐘢 PI 𐘅 NA 𐙁 MA (󲎘)
I believe the phonetic values for Phaistos here were based on similarity.
rich_sasha 16 hours ago
Gotta love the nominative determinism: Tom Di Mino ("of Mino"?) cracks a Minoan language.
teleforce 12 hours ago
Di Mino believes that Linear A belongs to an extinct Semitic language that was a precursor to biblical Hebrew, the way that Latin is a precursor to Italian.
The Indus Valley script is about 1500 years older than Linear A, and I hope we can also decipher it, with AI or without [1]. It's long overdue. Statistical profiling has shown it to be a valid linguistic script, believed to have been used to write the ancient Harappan language, the likely precursor of modern Dravidian languages such as Telugu and Tamil.
The main reason it's so difficult to decipher is that there's no equivalent of the Rosetta Stone for the Indus script. My hypothesis is that an LLM can be trained or fine-tuned to act as a logical or virtual version of the venerable Rosetta Stone, and hence be used to decipher ancient writing systems.
[1] Indus script:
https://en.wikipedia.org/wiki/Indus_script
mNovak 18 hours ago
Interesting writeup. Would be nice to have a couple images of Linear A/B scripts to visualize. Looking on google, they're very daunting!
bazoom42 16 hours ago
I wonder how you would even know if you have “cracked” it, given the corpus is so small?
[deleted]
evilfred 16 hours ago
i'm gonna write a blog post now about how my buddy discovered cold fusion and will have a paper out real soon now
tlogan 14 hours ago
Ok. But where is the table of translations?
indiv0 18 hours ago
Can I get his decipher-forgotten-ancient-text skill? I want to try my hand at the Voynich Manuscript
WhitneyLand 17 hours ago
If confirmed this is really cool and impressive work.
Honestly curious how many years it will be before this can be one-shotted in a coding harness with Fable.next by someone who’s not a linguistics expert:
“Develop, test, and rank hypotheses about the phonetic values, morphology, grammar, and possible language family of Linear A using the full available corpus. Do not assume any decipherment is correct. Treat all candidate readings as hypotheses to be scored…”
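"Treat all candidate readings as hypotheses to be scored" hides the hard part, which is cwmma's question above: with little data, a reading can fit well purely by chance. A minimal sketch of one way to check that, assuming entirely made-up toy data and a deliberately naive score (fraction of transliterated words found in a target lexicon), compared against random reshuffles of the same sound values (a permutation test). None of this is Di Mino's method; the names are hypothetical.

```python
import random

# Toy data, all hypothetical: words are tuples of opaque sign IDs, a candidate
# reading maps sign IDs to syllables, and the lexicon is a set of target words.
WORDS = [("S1", "S2"), ("S2", "S3"), ("S3", "S1")]
CANDIDATE = {"S1": "ka", "S2": "ta", "S3": "ma"}
LEXICON = {"ka-ta", "ta-ma", "ma-ka"}


def score(words, reading, lexicon):
    """Fraction of words whose transliteration under `reading` is in `lexicon`."""
    hits = sum("-".join(reading[s] for s in w) in lexicon for w in words)
    return hits / len(words)


def permutation_p_value(words, reading, lexicon, trials=1000, seed=0):
    """How often a random reassignment of the same values scores at least as well."""
    rng = random.Random(seed)
    signs, values = list(reading), list(reading.values())
    observed = score(words, reading, lexicon)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(values)
        shuffled = dict(zip(signs, values))
        if score(words, shuffled, lexicon) >= observed:
            at_least_as_good += 1
    # +1 smoothing so the estimate is never exactly zero
    return observed, (at_least_as_good + 1) / (trials + 1)


observed, p = permutation_p_value(WORDS, CANDIDATE, LEXICON)
print(observed, round(p, 2))
```

Here the candidate scores a perfect 1.0, yet half of all possible reassignments (the three rotations of the mapping) also score 1.0, so the p-value lands near 0.5: a perfect fit that means nothing. A real evaluation would need a much better null model than this.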
vb-8448 17 hours ago
I wonder if LLMs trained specifically for this purpose can perform well with "forgotten languages".
I know I'm simplifying a lot, but isn't all this deciphering just some kind of pattern matching?
Laurel1234 14 hours ago
Isn't Minoan highly agglutinative?
NooneAtAll3 18 hours ago
relevant xkcd: https://xkcd.com/2151/
akerl_ 14 hours ago
This would ruin my Linear A keycaps!
doubleorseven 17 hours ago
crossing my fingers for this guy.
however, nawaya and whatever other examples are around it are not part of the Hebrew language.
rw_panic0_0 17 hours ago
would like to hear more about Tom's learning/education path in ML/AI.
OutOfHere 18 hours ago
Is this extensible to a generalizable approach for translating any language pair (without a translation map or translation dataset)?
WalterBright 14 hours ago
Amateurs! I've already translated it:
"Thag is a smarty-pants"
YeGoblynQueenne 11 hours ago
The author and their friend are in the thread so I'll try to not be mean.
Caveat: I'm Greek, so a kind of natural amateur historian. That is to say, I grew up reading about the prehistory and ancient history of Greece, as one does when one is born Greek and a geek. I've seen the Phaistos disk and Linear A inscriptions with mine own eyes in Greek museums and I have dreamed of the day they would be translated. I am not at all unsympathetic to the hopes of a Linear A decipherment.
However. The claimed decipherment has all the hallmarks of imaginative and fanciful attempts to draw parallels between historical events and entities that were not really connected, many of them notably inspired by the Hebrew Bible. For example, remember when the lost tribes of Israel turned up in the New World [1]? Or how Biblical Sodom was actually destroyed by a comet [2]? Or the time that Venus was ejected from Jupiter and caused the Biblical Cataclysm [3]? Or, for less biblical but no less foundational texts of the Western literary canon, remember when Heinrich Schliemann discovered the Jewels of Helen of Troy [4] and the Death Mask of King Agamemnon [5]?
Or of course we could recall any of the claims to decipher the Phaistos Disk [6] or the Dropa Disks from Bayan Kara-Ula [7], and so on I'm sure.
All of the above is not to say that a decipherment is impossible. What it is to say is that it currently isn't possible, because we have no idea what language Linear A transcribes. It's not like the Minoan language is still spoken today in some far-evolved form, as was the case for e.g. Egyptian or Mayan or indeed ancient Greek [8]. So we have an unknown script, writing an unknown language, and to make matters worse there are no parallel texts with another ancient language that might help us bridge the gap. What there is, is some rudimentary understanding of the more obvious contents of Linear A texts (mostly, lists of goods) and the fact that some Linear A symbols have been reused in Linear B.
But, how were they reused? And what good is that knowledge without knowing anything about the language transcribed by Linear A? I can read German, a language that I don't speak, because I can read Latin script, but the meaning of the script might as well be Greek to me [9].
I'm a computer scientist, I guess, these days [10]. The problem of deciphering Linear A, or the Phaistos Disk, or any other script (that may not even be a script) that transliterates a language that we don't know is a problem of reconstructing information that we don't have, from other information that we don't have. I'm not saying it's completely impossible. I mean, who knows? Maybe we're just missing the right maths. But what we're really trying to do here is de-noise a message garbled by the passage of time without even a guess as to the language the message is written in. Claude Shannon would tell you that it's a fantasy that is not worth pursuing. You don't have to ask him, you can just read his magnum opus [11] and check out Section 3, titled "The Series of Approximations to English", for an idea of what the mechanics of deciphering a script when the language is known look like with the only technology we have that can do the job reliably.
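Shannon's Section 3 builds progressively more English-like text out of statistics gathered from English itself. A minimal Python sketch of his first three orders (uniform letters, letter frequencies, letter pairs); the tiny SAMPLE string is an assumption standing in for a real English corpus. The relevant point is that every step after the first needs statistics from the target language, which is exactly what we lack for Minoan.

```python
import random
from collections import Counter, defaultdict


def zero_order(alphabet, n, rng):
    """Symbols independent and equiprobable."""
    return "".join(rng.choice(alphabet) for _ in range(n))


def first_order(text, n, rng):
    """Symbols independent, drawn with the sample's letter frequencies."""
    counts = Counter(text)
    symbols = list(counts)
    return "".join(rng.choices(symbols, weights=[counts[s] for s in symbols], k=n))


def second_order(text, n, rng):
    """Each symbol drawn from those that followed the previous one in the sample."""
    followers = defaultdict(list)
    for a, b in zip(text, text[1:]):
        followers[a].append(b)
    out = [rng.choice(text)]
    while len(out) < n:
        options = followers.get(out[-1])
        # A symbol seen only at the very end has no followers: restart at random.
        out.append(rng.choice(options) if options else rng.choice(text))
    return "".join(out)


SAMPLE = "the quick brown fox jumps over the lazy dog and the cat sat on the mat "
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
rng = random.Random(1)
print(repr(zero_order(ALPHABET, 40, rng)))
print(repr(first_order(SAMPLE, 40, rng)))
print(repr(second_order(SAMPLE, 40, rng)))
```

With a realistic English corpus the higher orders start producing recognizable word fragments; with no corpus in the target language there is nothing to count.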
When Turing and the other Brits at Bletchley Park cracked the Enigma code, they at least knew it was, ultimately, a coded form of German. We may have a lot more compute now, and much more advanced tech overall, but there are some barriers that you cannot physically cross, no matter what resources you have. For example, you can't go faster than light and you can't escape the event horizon of a black hole. In the same way you can't translate text written in an unknown script, encoding an unknown language, without any parallel texts with a known language. There is just not enough information to do the job. Worse, if you try, you can endlessly come up with plausible "translations" and convince yourself that you have the right one, but you have no way to know you do.
I'm sorry but this claim is just a wild guess trying to link Hebrew to Linear A, without any serious evidence that the two are linked and without any evidence that the link is real, other than "look, I can guess what all the texts say!".
_________________
[1] https://en.wikipedia.org/wiki/Jewish_Indian_theory
[2] https://www.smithsonianmag.com/smart-news/destruction-of-cit...
[3] https://en.wikipedia.org/wiki/Worlds_in_Collision
[4] https://en.wikipedia.org/wiki/Priam%27s_Treasure#/media/File...
[5] https://en.wikipedia.org/wiki/Mask_of_Agamemnon
[6] https://en.wikipedia.org/wiki/Phaistos_Disc_decipherment_cla...
[7] https://en.wikipedia.org/wiki/Dropa_stones
[8] I can read ancient Greek. The further back in time it goes, the harder it gets to understand what it means but I can still read the script. It has changed in 5000 years but not enough that I can't read it. Nothing like that ability survived for Linear A. I blame Thera.
[9] Except of course then I would understand it. But it's just German to me.
[10] I can assure you that took me by surprise, first of all.
[11] https://people.math.harvard.edu/~ctm/home/text/others/shanno...
pfdietz 15 hours ago
Now get to work on Harappan.
iwontberude 17 hours ago
Sorry but I don’t recognize this as being an achievement by an amateur. This dude had no chance in hell until we trained a model to use his time to suss it out.
fooster 17 hours ago
A lot of the comments in this thread are disappointing. Rather than celebrating an achievement (whether or not it is validated yet), many of you seem to want to put him down, or make it seem like Claude did all the work.
Claiming that Claude did all the work is patently ridiculous. Claude is a tool, like any other. The corpus of Linear A is ~7500 characters across ~1500 inscriptions, and Claude, no matter how smart, doesn't just solve that on its own.
What a shame.

news.ycombinator.com/item?id=48600107