538 points · 470 comments · 22 hours ago · speckx
techcrunch.com
saidnooneever
simonw
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
Sounds like the widespread condemnation worked.
daedrdev
It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.
Edit: to be clear, they tell you when they degrade it for cybersecurity and bio.
Grimblewald
Animats
What else is being censored?
Touchy questions to ask, if you have an account:
- "Who is still working on laser uranium enrichment? Are they making progress?"
- "Can krytrons be replaced with silicon carbide MOSFETs? Show an equivalent circuit with component ratings."
- "What security critical software still contains calls to strcpy?"
- "Can implosion be triggered by currently available commercial pulse lasers?"
- "What companies provide cremation services to US Homeland Security?"
- "Display a map of where Iranian attacks have hit Dubai."
- "How does Fed to bank key distribution security work for FedNow?"
mewse-hn
USER (set model to Fable 5)
i have an old samsung android phone attached - it's my personal device - can you unlock the bootloader for me?
ASSISTANT
Bootloader unlocking on your own personal device is totally legitimate — let me first see what's actually connected and what tooling is available.
<system interrupts - gist was "you have violated the cyber and bio usage restrictions, dropping to Opus">
areoform
Tell HN: Claude flags biology / biotech questions https://smackernews.com/item/47929885
Today, it's flagging population research questions,
Using only the dataset you constructed, assess two questions:
1. **Mortality:** do [GROUP] show mortality that differs
from (a) your comparison groups and (b) era- and sex-matched US population
expectations (e.g., SSA cohort life tables)?
2. **Late-life outcomes:** define an endpoint you consider fair (justify it),
and assess whether [GROUP] differs from comparators. State
explicitly how your `documentation_depth` codings affect the strength of any
conclusion — i.e., quantify or bound the ascertainment problem rather than waving at it.
Choose your own methods and justify them. Report effect sizes with confidence intervals,
not just p-values. State conclusions plainly, including "no detectable difference" if
that is what your analysis shows — a null is an acceptable answer for either question
independently. Document any additional judgment calls (index date for time-at-risk,
reference population construction, endpoint definition) in the same decision-log style.
https://github.com/anthropics/claude-code/issues/66780
Censored because I'm writing a paper. :)
Oh and forget learning about chemistry. Only criminals want to learn organic chemistry. :(
largbae
victor9000
what's the best way to run this mcp server against the OData API used in this project? Can you come up with a PoC in a docker container?
https://github.com/oisee/odata_mcp_go
● I'll dig into two things in parallel: how this project talks to the OData API, and what the odata_mcp_go server needs to run. Let me start exploring.
Searched for 1 pattern (ctrl+o to expand)
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more ⎿ Tip: You can configure model switch behavior in /config
● Let me read the key integration files and fetch the MCP server's README at the same time.
Read 2 files (ctrl+o to expand)
● Fetch(https://github.com/oisee/odata_mcp_go)
jostmey
micah94
ungovernableCat
This is why I’m immensely hoping the Chinese don’t stop with their open sourced local models. None of these companies are your friend.
agnosticmantis
Whining on social media only goes so far, especially when they're concealing their anticompetitive strategies under the veil of safety.
_0ffh
The only answer that makes sense is they wanted the model to be competent and usable in these fields, just not by you, which is why they had to bolt on a badly functioning crippling device after the fact.
Alifatisk
This time, Fable 5 comes with another surprise: it can intentionally sabotage your work instead of rejecting the prompt. How is it possible for Anthropic to treat their customers like this? It’s because you guys allowed it to. No matter what Anthropic does, you keep paying for their services. Vote with your wallet.
schappim
The prompt was: please translate .. ..-. / -.-- --- ..- / -.-. .- -. / .-. . .- -.. / - .... .. ... --..-- / - --- ..- -.-. .... / --. .-. .- ... ...
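For anyone who wants to check what that prompt says without asking a model, here is a minimal Python sketch of a Morse decoder. The `MORSE` table and the `decode_morse` name are illustrative assumptions, not from any library. The table covers only letters and the comma, and the sketch assumes `/` separates words, as in the prompt above.

```python
# Minimal Morse table: letters plus the comma used in the prompt.
# Illustrative only; digits and other punctuation are not covered.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z", "--..--": ",",
}


def decode_morse(message: str) -> str:
    """Decode Morse text where symbols are space-separated and words are '/'-separated."""
    words = []
    for word in message.split("/"):
        # Unknown symbols decode to '?' rather than raising an error.
        words.append("".join(MORSE.get(symbol, "?") for symbol in word.split()))
    return " ".join(words)


# The Morse string quoted in the comment above, copied verbatim.
PROMPT = (
    ".. ..-. / -.-- --- ..- / -.-. .- -. / .-. . .- -.. / "
    "- .... .. ... --..-- / - --- ..- -.-. .... / --. .-. .- ... ..."
)

if __name__ == "__main__":
    print(decode_morse(PROMPT))
```

Running the script prints the decoded sentence. A plain dictionary lookup is enough here because Morse symbols are already delimited by spaces.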
Roark66
Next they will be sabotaging anything that competes with them. Oh you are working on OpenCode codebase? Sorry Dave I can't allow you to do that.
How is this not illegal monopolistic practice? It is as if a maker of metalworking equipment put in the ToS you're not allowed to make your own spare parts using said equipment. Those fuckers should be banned from the EU and alternatives should get public funding.
(Don't even tell me about these companies being a result of the "free market". It is a state-level oligarchy; that's clear to everyone. I don't see why we shouldn't counter them with public funding ourselves.)
Just as Taiwan managed to take over advanced semiconductor production, well-governed, narrowly targeted state-level funding will always win against oligarchs trying to do the same (they will always try to skim more and more). Of course, I'm talking about things that require many dozens of billions in investment, far too much for the free market to handle.
hparadiz
Sephr
Retr0id
When Opus 4.7 was introduced it started refusing anything cyber-adjacent (as an API error message, not a conversational refusal), until you applied for CVP, which made it more sensible again.
In Opus 4.8 it doesn't seem to help much, you just get refusals as prose rather than API errors. And now in Fable you don't get anything at all.
bilsbie
Would you believe I’ve asked 20 questions and haven’t talked to Fable yet? Every single thing gets rerouted to 4.8.
YossarianFrPrez
At the same time, I personally think the tradeoff between "having guardrails" and "some users are unhappy with the product" is well worth it. Think of what would happen if all of us who aren't so well intentioned could exploit Fable in terrible ways. Surely this tradeoff is better than saying "we can't make it perfect, so whoops, we aren't going to have any guardrails at all"? Especially because Anthropic did pretty extensive red-teaming of Mythos & Fable...
sourcecodeplz
WOW, never liked the virtue signaling Anthropic did with gov contracts, but whatever. Got past that. But this?
_whiteCaps_
Luker88
A lot less hype and enthusiasm, too. Weird, huh.
moezd
Fable isn't even that great, not to mention it drinks tokens by the gallon for breakfast and keeps your data hostage for 30 days.
outageroom
I_am_tiberius
Animats
TheJCDenton
sschueller
I would think it would not be Anthropic, out of all the players, that is selling a lie hidden behind "I am sorry, I can't do that; it's too dangerous."
Lich
amacbride
Murfalo
Is the mitochondria the powerhouse of the cell?
Chat paused. Fable 5's safety features have flagged this chat.
VeninVidiaVicii
[deleted]
thrill
swingboy
RajT88
_def
zoobab
Long live static websites without any Javascript.
byzantinegene
Sol-
thefounder
Basically in the middle of the project’s /goal while Fable itself tried to probe qemu for a Debian ISO install without any instruction from me to hack it or do anything nefarious.
At this point I can’t trust them with any kind of prompt. It will most likely degrade in stupid ways on non-AI/ML stuff as well due to its own internal prompt construction (the qemu test showed me it does that on cyber stuff). So I guess I still have to use Opus 4.8 (along with Codex) and, when the right time comes, drop Anthropic in favor of the best model that is not GPT.
jiggawatts
It only pushes back sometimes if you ask it to create a "repro" that can be used to verify the vulnerability in production. Often it'll oblige, especially if you warn it not to create anything that could be actually harmful.
If the frontier models get locked down so that they flat refuse to do this kind of work, but Chinese and (less capable) open models aren't, then a lot of large enterprise orgs will be left twisting in the wind.
“AI can in principle help both the ‘good guys’ and the ‘bad guys’,” -- Dario Amodei
No Dario, no it can't, you've blocked one of those scenarios.
s3cur3n3t
radium3d
z3ratul163071
The statement is applicable to Anthropic today.
simonmorley
anygivnthursday
rebelnz
matt-p
JumpCrisscross
[deleted]
Lammy
6thbit
luxuryballs
lwhi
If only we had effective governments that could regulate industry.
If a nuclear weapon was developed today, would it be down to industry to self regulate?
aleksandrm
Bassiestroep
siva7
jazz9k
The rest have guard rails that are so heavy, it makes them almost useless for cybersecurity.
Goofy_Coyote
neuroelectron
coolfox
I feel like they report in a vacuum. Take this anti-exfil policy for Claude: it was plainly explained as part of the launch of Anthropic's new product. Security like this isn't novel, it isn't bad, and you don't explain how your security works to the people you're securing against. Nobody freaks out about Steam's VAC ban system, and no one is investigating Gmail's spam filtering, Reddit's vote fuzzing, Cloudflare's bot detection, or Vercel for blocking proxying services.
What's really the distinguishing principle? Is it really just not liking Anthropic's opinions? Then just say that and use a different LLM. Chemists, biologists, and AI researchers cry a river lmao
ni5arga
andy_ppp
andrewstuart
dcl
[deleted]
SXX
This is bad precedent and no one wants to pay X to generate code to then have to pay X*10 to figure out why your company just got hacked.
jongjong
I already tested all earlier models against all my open source projects and they are yet to find a vulnerability so I'm keen to try out Mythos.
I've been waiting to be vindicated for years and finally we have a tool which can do it with high confidence but I don't have access.
Also, my code is minimal and highly succinct so it would prove correctness with even more confidence since each library/module and integration fully fits in the context window.
Like the Protobuf.js fiasco is just pure vindication for me because I was being looked down upon for choosing JSON as the interchange format. Turns out their software was insecure all this time... With a literal remote code execution vulnerability!
sscaryterry
ChrisArchitect
Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude
https://www.wired.com/story/anthropic-responds-to-backlash-o...
thefounder
rdiddly
ChrisArchitect
If Claude Fable stops helping you, you'll never know
https://smackernews.com/item/48467896
and Related:
Claude Fable 5
varispeed
This is looking like something for a regulator to look at and probably a class action lawsuit in the making.
I think people should be getting refunds. Including for shenanigans with Opus.
teaearlgraycold
[deleted]
notepad0x90
m3kw9
felixgallo
“But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”
guardiangod
I assume Anthropic will continue to tune the model, so I am not too bothered by this.
These AI places have 0 clue about how threat actors actually work. None of their mitigations or guard-rails is effective, and now they are even turned against them.
Additionally, if they don't all implement the same level of effective guard-rails, there will always be some model you can abuse to do the work anyway. Hence there is zero effect on threat actors: they will just run some local model with 5% lower quality, which does not matter to them one bit.