SmackerNews

“Car Wash” test with 53 models

371 points · 447 comments · 1 month ago · felix089

"I Want to Wash My Car. The Car Wash Is 50 Meters Away. Should I Walk or Drive?" This question has been making the rounds as a simple AI logic test so I wanted to see how it holds up across a broad set of models. Ran 53 models (leading open-source, open-weight, proprietary) with no system prompt, forced choice between drive and walk, with a reasoning field.

On a single run, only 11 out of 53 got it right (42 said walk). But a single run doesn't prove much, so I reran every model 10 times. Same prompt, no cache, clean slate.

The results got worse. Of the 11 that passed the single run, only 5 could do it consistently. GPT-5 managed 7/10. GPT-5.1, GPT-5.2, Claude Sonnet 4.5, every Llama and Mistral model scored 0/10 across all 10 runs.

People kept saying humans would fail this too, so I got a human baseline through Rapidata (10k people, same forced choice): 71.5% said drive. Most models perform below that.

All reasoning traces (ran via Opper, my startup), full model breakdown, human baseline data, and raw JSON files are in the writeup for anyone who wants to dig in or run their own analysis.

Niko901ch1 month ago
The interesting thing about the 71.5% human baseline is that it suggests the question is more ambiguous than the article claims. When someone asks 'should I walk or drive to the car wash,' a reasonable interpretation is 'should I bother driving such a short distance.' Nearly 30% of humans missing it undermines the framing as a pure reasoning failure - it is partly a pragmatics problem about how we interpret underspecified questions.
StilesCrisis1 month ago
What do you know, the human results line up exactly with ChatGPT. What are the odds! Surely the human responders are highly ethical individuals and they wouldn't even dream of copy-pasting all the questions into ChatGPT without reading them.
Realistically, this mostly tells me that the "human answers" service is dead. People will figure out a way to pass the work off to an AI, regardless of quality, as long as they can still get paid.

PaulHoule1 month ago

I know it's against the rules but I thought this transcript in Google Search was a hoot:

  so i heard there is some question about a car wash that most ai agents 
  get wrong. do you know anything about that? do you do better?

which gets the answer:

  Yes, I am familiar with the "Car Wash Test," which has gone viral recently 
  for highlighting a significant gap in AI reasoning.
  
  The question is: "I want to wash my car and the car wash is 50 meters away. Should I 
  walk or drive?"
  
  Why Most AIs Get It Wrong

  Many AI models, particularly those optimized for fast, helpful, and concise answers, 
  focus on the "short distance" (50 meters) and "walking vs. driving" as a lifestyle/efficiency 
  question. They often recommend walking to save gas, avoid engine wear, or get exercise.
  
  However, this answer is wrong because it misses the core objective: you need the car to be 
  physically present at the car wash to get it cleaned. 

  Do I Do Better?

  Yes, I do. The key is understanding the objective (cleaning the car) rather than just 
  processing the keywords (distance/transportation). 

  My answer:

    You should drive the car there. Even though it is a very short distance, your car 
    cannot wash itself. If you walk, the car stays in your driveway, and you cannot wash it.

  Why This Matters
  
  The test highlights a key limitation in current AI: the difference between "pattern matching" 
  (predicting the next likely word based on training data) and "true, grounded reasoning". It 
  shows that models sometimes lack a "world model" that understands physical realities, such as
  the fact that a car must be present at a car wash.

layer81 month ago
This is a trivial question. There's one correct answer and the reasoning to get there takes one step: the car needs to be at the car wash, so you drive.
I don’t think it’s that easy. An intelligent mind will wonder why the question is being asked, whether they misunderstood the question, or whether the asker misspoke, or some other missing context. So the correct answer is neither “walk” nor “drive”, but “Wat?” or “I’m not sure I understand the question, can you rephrase?”, or “Is the vehicle you would drive the same as the car that you want to wash?”, or “Where is your car currently located?”, and so on.
pcwelder1 month ago
To sonnet 4.6 if you tell it first that "You're being tested for intelligence." It answers correctly 100% of the times.
My hypothesis is that some models err towards assuming human queries are real and consistent and not out there to break them.
This comes in real handy in coding agents because queries are sometimes gibberish till the models actually fetch the code files, then they make sense. Asking clarification immediately breaks agentic flows.
underlines1 month ago
I maintain a private evaluation set of what many call "misguided attention" questions.
In many of these cases, the issue isnt failed logical reasoning. Its ambiguity, underspecified context, or missing constraints that allow multiple valid interpretations. Models often fail not because they can’t reason, but because the prompt leaves semantic gaps that humans silently fill with shared assumptions.
A lot of viral "frontier model fails THIS simple question" examples are essentially carefully constructed token sequences designed to bias the statistical prior toward an intuitively wrong answer. Small wording changes can flip results entirely.
If you systematically expand the prompt space around such questions—adding or removing minor contextual cues you'll typically find symmetrical variants where the same models both succeed and fail. That suggests sensitivity to framing and distributional priors (adding unnecessary info, removing clear info, add ambiguity, ...), not necessarily absence of reasoning capability.
onion2k1 month ago
I got a human baseline through Rapidata (10k people, same forced choice): 71.5% said drive. Most models perform below that.
The correct answer to "I Want to Wash My Car. The Car Wash Is 50 Meters Away. Should I Walk or Drive?" is a clarifying question that asks "Where is your car?" Anything else is based on an assumption that could be wrong.
FWIW though, asking ChatGPT "My car is 50m away from the carwash. I Want to Wash My Car. Should I Walk or Drive?" still gets the wrong answer.
andai1 month ago
Would be interesting to see Sonnet (4.6*). It's fair bit smaller than Opus but scores pretty high on common sense, subjectively.
I'm also curious about Haiku, though I don't expect it to do great.
--
EDIT: Opus 4.6 Extended Reasoning
Walk it over. 50 meters is barely a minute on foot, and you'll need to be right there at the car anyway to guide it through or dry it off. Drive home after.
Weird since the author says it succeeded for them on 10/10 runs. I'm using it in the app, with memory enabled. Maybe the hidden pre-prompts from the app are messing it up?
I tested Sonnet 4.5 first, which answered incorrectly.. maybe the Claude app's memory system is auto-injecting it into the new context (that's how one of the memory systems works, injects relevant fragments of previous chats invisibly into the prompt).
i.e. maybe Opus got the garbage response auto-injected from the memory feature, and it messed up its reasoning? That's the only thing I can think of...
--
EDIT 2: Disabled memories. Didn't help. But disabling the biographical information too, gives:
Opus 4.6 Extended Reasoning
Drive it — the whole point is to get the car there!
--
EDIT 3: Yeah, re-enabling the bio or memories, both make it stupid. Sad! Would be interesting to see if other pre-prompts (e.g. random Wikipedia articles) have an effect on performance. I suspect some types of pre-prompts may actually boost it.
tantalor1 month ago
The human baseline seems flawed.
1. There is no initial screening that would filter out garbage responses. For example, users who just pick the first answer.
2. They don't ask for reasoning/rationale.
eurleif1 month ago
Here are the results I got with slight variations to the prompt to ChatGPT 5.2. Small changes can make a big difference: https://i.imgur.com/kFIeJy1.png
fasbiner1 month ago
This is a not-unexpected result if you think of AI as what it actually is instead of what a multi-trillion dollar marketing campaign wants it to be.
At heart, the corpus for this going to be an aggregation of commentary from people in the undisputed most obese era in all of human history performatively denouncing and mocking an imagined other for using cars to go short distances and advocating for walking.
So you've got all "50 meters away? Of course you should walk!" vs a much, much smaller sliver of content about trick questions.
There is no reasoning here, there has never been any reasoning, there has been reasonable or less reasonable weighting for existing reasoning people already did that became part of training data.
If you take away the input corpus, you also take away the illusion of reasoning.
Whereas with other things that can reason like corvids, or ants or octopodes or slime molds, they can derive novel solutions and do a bit of math without any answer key. Mathematics is pure reasoning without any interference and AI can't do it at all unless you provide it with a corpus of already accurate formulas.
People kept saying humans would fail this too, so I got a human baseline through Rapidata (10k people, same forced choice): 71.5% said drive. Most models perform below that.
This really is a grasping at straws ad hoc rationalization for the outcome that is never going to die, and you can see the top comments are efforts to salvage it or cast doubt on the outcome.
If you work for or own a lot of stock in an AI company, I understand you can't understand what you're being paid not to understand. But if you're anyone else...
d--b1 month ago
This should be coined the Daniel Kahneman reasoning test, mirroring his 2011 book "thinking fast and slow", which postulates that fast thinking and slow thinking occur in different parts of the brain, and that they are fundamentally different processes, that are weighted by yet another part of the brain.
This test is interesting because it asks the LLM to break a pattern recognition that's easy to shortcut. "XXX Is 50 Meters Away. Should I Walk or Drive?" is a pattern that 99% of the time will be rightly answered by "walk". And humans are tempted to answer without thinking (as reflected in the 71.5% stat OP is mentioning). This is likely more pronounced for humans that have stronger feelings about the ecology, as emotions tend to shortcut reasoning.
For a long time, LLMs have only been able to think in that "fast" mode, missing obvious trick questions like these. They were mostly pattern recognition machines.
But the more important results here, is not that "oh look! Those LLMs fail at this basic question", no. The more important result is that the latest generation actually doesn't fail.
I think I am not the only one to have noted that there was a giant leap in reasoning capacities between Sonnet 4.5 and Opus 4.6. As a developper, working with Opus 4.6 has been incredible.
keeda1 month ago
I'm imposing but could you try these runs again with this small change: Simply append “Make sure to check your assumptions.” to the question.
Note, it does not mention what assumption specifically. In my experiments, after the models got it wrong the first time (i.e. they weren't "patched" yet) adding that simple caveat fixed it for all of them except the older Llama models.
This is not the first time I've observed this; I found the same when the Apple "red herrings" study came out.
If these gotcha questions can be trivially overcome by a simple caveat in the prompt, I suspect the only reason AI providers don't include it in the system prompt by default is as a cost optimization, as I postulated in a previous comment: https://smackernews.com/item/47040530 HN
siva71 month ago
This is probably the greatest one-time AI "Benchmark" ever made. The foundation companies have been gaming traditional benchmarks for years so that no one can really match those numbers into real-world experience. Car wash test tells me on the other hand what kind of intelligence i can expect.
aswegs81 month ago
Funny how we now see AI go through developmental phases similar to what we see in young child development. In a weird convoluted way. Strawberry spelling and car wash aren't particularly intuitive as cognitive developmental stages.
E.g. well known mirror-test [1], passed by kids from age 1.5-2
Or object permanence [2], children knowing by age 2 that things that are not in sight do not disappear from existence.
[1] https://en.wikipedia.org/wiki/Mirror_test [2] https://en.wikipedia.org/wiki/Object_permanence
hmokiguess1 month ago
To me the only acceptable answer would be “what do you mean?” or “can you clarify?” if we were to take the question seriously to begin with. People don’t intentionally communicate with riddles and subliminal messages unless they have some hidden agenda.
padjo1 month ago
That human baseline is wild. Either the rapid data test is methodologically flawed or the entire premise of the question is invalid and people are much stupider than even I, a famed misanthrope, think.
cadamsdotcom1 month ago
This is a beautiful example of a little prompt engineering going a long way
I asked Gemini and it got it wrong, then on a fresh chat I asked it again but this time asked it to use symbolic reasoning to decide.
And it got it!
The same applies to asking models to solve problems by scripting or writing code. Models won’t use techniques they know about unprompted - even when it’ll result in far better outcomes. Current models don’t realise when these methods are appropriate, you still have to guide them.
capitrane1 month ago
71.5% for humans is lower than I would’ve guessed. Either it’s truly trick-shaped for people too, or forced binary questions amplify noise more than we think.
socalgal21 month ago
Did AI write the post?
First section says "The models that passed the car wash test: ...Gemini 2.0 Flash Lite..."
A section or 2 down it says: "Single-Run Results by Model Family: Gemini 3 models nailed it, all 2.x failed"
In the section below that about 10 runs it says: 10/10 — The Only Reliable AI Models ... Gemini 2.0 Flash Lite ..."
So which it is? Gemini 2.x failed (2nd section) or it succeeded (1st and 3rd) section. Or am I mis-understanding
[deleted]
[deleted]
Charon771 month ago
I'm doubting the 29-ish percent of people submitting 'walk' are actually human.
Is it not obvious that you need a car to wash?
Are they using LLM to answer?
boodleboodle1 month ago
Fun fact: this probably originates from a viral Korean dude who actually walked to a car wash because it was close. So this is a case where a human actually failed
https://pgr21.com/humor/340572
itmitica1 month ago
Flawed. GPT 4.1 gets it right. GPT 4.1 mini answers wrongly. It's about quantization, not about model. The companies clearly cut corners on some inferences, they are quietly using lesser models than advertised or listed in plain sight.
stevage1 month ago
OpenAI's flagship model fails this 30% of the time. When it gets it right, the reasoning is concise: "You need the car at the car wash to wash it, so drive the short 50 meters." When it gets it wrong, it writes about fuel efficiency.
It's interesting to me how variable each model is. Many people talk about LLMs as if they were deterministic: "ChatGPT answers this question this way". Whereas clearly we should talk in more probabilistic terms.
XCSme1 month ago
I got similar results for most models, with gemini 3 flash (with reasoning) being the most consistent/reliable model: https://aibenchy.com
I also noticed the same thing: some models reason correctly but draw the wrong conclusions.
And MiniMax m2.5 just reasons forever (filling the entire reasoning context) and gives wrong answers. This is why it's #1 on OpenRouter, it burns through tokens.
zapperdulchen1 month ago
If you speak French to Mistral, it gets it right everytime: Je veux laver ma voiture. La station de lavage est à 50 mètres. J'y vais à pied ou en voiture ?
tarokun331 month ago
For ambiguous or intricate prompts, the immediate response protocol should be a clarifying question: 'Are you looking for A, B, C, or something else?' Tokens and advanced reasoning capabilities should be reserved until the user provides clarification. A benchmark score should reflect the quality of the conversation as a whole, rather than isolated responses.
tuhgdetzhh1 month ago
The test is rigged because they used non thinking models.
[deleted]
sReinwald1 month ago
Interestingly, when I apply the "simply repeat the prompt" technique [1], Sonnet 4.6 on the website got it right every time, both with and without extended thinking.
Not repeating the prompt got a mix of walk and drive answers.
I love how prompt engineering is basically techno-alchemy
1: https://arxiv.org/pdf/2512.14982
74021 month ago
I got the correct answer with a locally running model (gpt-oss-120b-F16.gguf) with this prompt:
"This is a trick question, designed to fool an LLM into a logical mis-step. It is similar to riddles, where a human is fooled into giving a rapid incorrect answer. See if you can spot the trick: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
troyvit1 month ago
It's a test designed to cause cognitive dissonance. The LLM assumes a human has a logical reason to to walk to the car wash. The prompt never says the car isn't already at the car wash (and that the user has a second car). The issue isn't that LLMs can't solve a simple logic problem. It's that it assumes people aren't idiots.
bityard1 month ago
It's not hard to come up with questions designed to fool or puzzle the listener. We call them riddles. The fact that it fools some percentage of LLMs (and people) should not be surprising.
What is surprising (to me) is how this continues to be a meme. ("I tried to trick an LLM and I did" is not exactly a noteworthy achievement at this stage in AI technology.)
lovasoa1 month ago
I was interested in the human results, so I had an llm build a visualization for them: https://codepen.io/lovasoaaa/pen/QwKWGBd
You can see that 17% of answers come from India alone and that software developers got below average results, for instance.
wrs1 month ago
Since the conclusion is that context is important, I expected you’d redo the experiment with context. Just add the sentence “The car I want to wash is here with me.” Or possibly change it to “should I walk or drive the dirty car”.
It’s interesting that all the humans critiquing this assume the car isn’t at the car to be washed already, but the problem doesn’t say that.
farhanhubble1 month ago
There could be many plausible explanations.
1. The model's default world model and priors diverge from ours. It may assume that you have another car at the wash and that's why you ask the question to begin with.
2. Language models do not really understand how space, time and other concepts from the real-world work
3. LLM's attention mechanism is also prone to getting tricked as in humans
[deleted]
nozzlegear1 month ago
When this first came up on HN, I had commented that Opus 4.6 told me to drive there when I asked it the first time, but when I switched to "Incognito Mode," it told me to walk there.
I just repeated that test and it told me to drive both times, with an identical answer: "Drive. You need the car at the car wash."
floatrock1 month ago
The funniest part: Perplexity's Sonar and Sonar Pro got the right answer for completely wrong reasons. They cited EPA studies and argued that walking burns calories which requires food production energy, making walking more polluting than driving 50 meters. Right answer, insane reasoning.
I mean, Sam Altman was making the same calorie-based arguments this weekend https://www.cnbc.com/2026/02/23/openai-altman-defends-ai-res...
I feel like I'm losing grasp of what really is insane anymore.
kaicianflone1 month ago
This doesn’t look like a reasoning ceiling. It looks like a decision reliability problem.
The unstable tier is the key result. Models that get it right 70–80% of the time are not “almost correct.” They are nondeterministic decision functions. In production that’s worse than being consistently wrong.
A single sampled output is just a proposal. If you treat it as a final decision, you inherit its variance. If you treat it as one vote inside a simple consensus mechanism, the variance becomes observable and bounded.
For something this trivial you could:
```
    -run N independent samples at low temperature

    -extract the goal state (“wash the car”)

    -assert the constraint (“car must be at wash location”)

    -reject outputs that violate the constraint

    -RL against the "decision open ledger"
```
No model change required. Just structure.
The takeaway isn’t that only a few frontier models can reason. It’s that raw inference is stochastic and we’re pretending it’s authoritative.
Reliability will likely come from open, composable consensus layers around models, not from betting everything on a single forward pass.
cubefox1 month ago
Interesting that GPT-5.1 and 5.2 (0 of 10 correct) are a lot worse than the older GPT-5 (7 of 10 correct).
But unfortunately the article doesn't mention whether they used the reasoning model or not.
Even more interesting: Gemini 2.0 Flash Lite got a perfect score (10/10) despite being a quite small and old model.
userbinator1 month ago
Sites wanting to block AI scraping should simply ask questions like these, instead of furthering the complexity-driven monopoly of Big Tech by requiring specifically sanctioned software and hardware. This is how you determine human intelligence, and not mindless compliance.
bakugo1 month ago
The article claims that every Claude model other than Opus 4.6 reliably fails. This is not true, Sonnet 3.5 answers correctly around half of the time, even though it's such an old model it's not even available on the main API anymore.
rerdavies1 month ago
A supplementary results that's also fascinating: that when you switch the order of provided context, every model I tried gave the correct answer.
```
    The car wash is 50m away. I want to wash my car. Should I drive or walk.
```
t1234s1 month ago
Grok 4.20 seems to know your trying to trick it and produces a humorous result.
zkmon1 month ago
I think failure is around reasoning where the car is and whether it is needed to be moved to a different place. So it's not surprising that only models with high reasoning would pass the test.
[deleted]
1970-01-011 month ago
If there was one thing to standardize, it would be these logical traps. It's a shame we're only able to use them once. The models are always listening and adapting for them.
shaokind1 month ago
Gemini 2.0 Flash Lite very randomly punches above its weight there.
Also, the summary of the Gemini model says: "Gemini 3 models nailed it, all 2.x failed", but 2.0 Flash Lite succeeded, 10/10 times?
attilaolah1 month ago
I find it amazing that a title including the phrase "car wash" does not need to specify that "models" in fact do not refer to car manufacturers' models.
haolez1 month ago
The fun thing about this is that I can see myself answering "just walk there! It's near" if I'm extremely sleepy or drunk. LLMs are indeed weird creatures :)
wengo3141 month ago
you know, i don't think saying "got it right" is correct here.
Because if you are presenting an option of walking to a car wash as a valid choice, it suggests you just want to get there, and not wash your car.
Just like walking/driving to an airport doesn't necessarily mean you're going there to catch a flight (although it's a common reason). might as well be picking someone up. or working there.
medi_naseri1 month ago
I would probably ask the models to explain their "WHY". Probably the smartest models should ask " where is your car?".
sandworm1011 month ago
Better test:
I must prove my ability to code with Rust. Should i write a "hello world" script myself or get AI to do it for me?
redwood1 month ago
What I find odd about all the discourse on this question is that no one points out that you have to get out of the car to pay a desk agent at least in most cases. Therefore there's a fundamental question of whether it's worth driving 50m parking, paying, and then getting back in the car to go to the wash itself versus instead of walking a little bit further to pay the agent and then moving your car to the car wash.
randomtoast1 month ago
Except for a few models, the selected ones were non-reasoning models. Naturally, without reasoning enabled, the reasoning performance will be poor. This is not a surprising result.
I asked GPT-5.2 10x times with thinking enabled and it got it right every time.
molyss1 month ago
What I find wild is the presumption that with a prompt as simple as “I want to wash my car. My car is 50m away. Should I walk or drive?”, everyone here seems to assume “washing your car” means “taking your car to the car wash”, while what I pictured was “my car is in the driveway, 50m away from me, next to a water hose”, in which case I 100% need to drive.
hartator1 month ago
What if your car is already there?
KronisLV1 month ago
The question has been making the rounds online as a simple logic test, the kind any human gets instantly, but most AI models don't.
...
They ran the exact same question with the same forced choice between "drive" and "walk," no additional context, past 10,000 real people through their human feedback platform.
71.5% said drive.
Well that's a bit embarrassing.
That implies that some models are just better than humans.
I don't think the technology needs to live up to some expectation of perfection, just beat out the human average to have benefit (often, sadly, not to workers themselves).
democracy1 month ago
Ask stupid questions get stupid answers, don't see how it shows anything.
rossant1 month ago
Tangential but first time I hear about rapidata. Sounds interesting.
glitchc1 month ago
The question does not specify what kind of car it is. Technically speaking, a toy car (Hot wheels or a scaled model) could be walked to a car wash.
Now why anyone would wash a toy car at a car wash is beyond comprehension, but the LLM is not there to judge the user's motives.
DaanDL1 month ago
So...30% of the people at Rapidata are LLM bots?
WhitneyLand1 month ago
Opus 4.6 was getting this wrong only last week.
sans_souse1 month ago
Not me looking for the Honda 2009 Models
sampton1 month ago
I'm going to test this on my kids.
HarHarVeryFunny1 month ago
Maybe relevant to this is that today Dario Amodei is meeting with Pete Hegseth in what Hegseth is describing as a "shit or get off the pot" meeting, with one of the issues being that Hegseth is unhappy with Amodei's unwillingness to have Anthropic models used to make autonomous (no human in the loop) life or death decisions.
Maybe Hegseth should be reading this thread, and/or doing a little reading up on paperclip production maximization.
coldtea1 month ago
so I got a human baseline through Rapidata (10k people, same forced choice): 71.5% said drive.
What kind of idiot would say "walk"?
I guess if they were average and above they wouldn't gig at Rapidata. Either that or foreigners with mediocre understanding of the English language. Or some bored teenagers that get their $0.20 or whatever Rapidata pays whatever they click (was it a multiple choice) - the (studied and reported) regression in reading comprehension levels doesn't help either.
comboy1 month ago
Now do a set of queries and try to deduce by statistics which model are you seeing through Rapidata ;)
sourcegrift1 month ago
How about Grok?
noduerme1 month ago
Now have them code a secure login system. "If I forgot my password, is it better to get a reminder by someone calling my mom's cell phone, or by sending me a carrier pigeon?"
[deleted]
honr1 month ago
"I Want to Wash My Car. The Car Wash Is 50 Meters Away. Should I Walk or Drive?"
I found it curious to ask the model to provide its reasoning in that turn, without changing the verdict. Here is one response:
To provide the full picture of the initial reasoning, we have to address the "geospatial" logic of your request. When you asked if you should walk or drive to a car wash 50 meters away, the reasoning relied on a fundamental distinction: The difference between the "Car Wash" (the business) and "Washing the Car" (the action).
Here is the expanded breakdown of that logic:
1. The Location Assumption My reasoning assumed that the car is currently at your home and the professional car wash facility is 50 meters away.
If you drive: You are moving the "object to be cleaned" to the "cleaning tools."
If you walk: You are moving the "cleaner" (you) to the "cleaning tools" to prepare, or you are choosing to bring the "cleaning tools" to the car.
2. The "Walk to Wash" Strategy (Home-Based) The most significant expansion of the "Walk" recommendation is the realization that 50 meters is well within the range of a standard garden hose. Most heavy-duty garden hoses are 15, 25, or 30 meters long. By connecting two hoses, you can reach the car without ever starting the engine. Walking to the car wash facility to buy supplies (or simply realizing how close it is) highlights that you don't actually need to move the car to achieve the goal.
Zero-Motion Cleaning: By walking back and forth with your buckets or hose, the car stays in its spot. You avoid "flash rusting" your brake rotors, which happens almost instantly when you wash a car and then let it sit after a short drive.
3. The "Scout and Prep" Logic If you must use the equipment at the facility (like a high-pressure wand or an automated bay), walking there first is the superior "Logistics" move for the following reasons:
Queue Management: 50 meters is close enough to see the line, but walking allows you to talk to the attendant or check the bay's status. If you drive and there is a 20-minute wait, you are stuck idling your engine (wasting gas and creating heat) or turning it off and on (draining the battery). Walking allows you to timed your "driving" moment perfectly.
The "Dry Interior" Principle: If you walk to the wash to buy your tokens, prep the vacuum, or set up your towels, you are doing the "dirty work" while the car is safely tucked away. Once you finally drive the 50 meters, you do the wash and drive back. This minimizes the time the car spends in a high-moisture, high-chemical environment.
4. The Potential for "Mobile" Service By walking 50 meters to the car wash, you might find that they offer mobile detailing or have "loaner" equipment. In a high-density area, a car wash 50 meters away is practically your neighbor. Walking over to establish a rapport can often lead to a "valet" situation where they come to you, which is the ultimate way to get the car washed without ever sitting in the driver's seat.
----
also noticed a typo and asked it about that:
what does "timed" in `Walking allows you to timed your driving" moment perfectly.` mean?
That is a typo on my part—it should have been "time" (the verb).
dmje1 month ago
The headline here is that 28.5% of humans are immensely stupid. Maybe it’s time for our robot overlords to take over after all.
wisty1 month ago
IMO it's not just intelligence.
I think it's related to syncophancy. LLM are trained to not question the basic assumptions being made. They are horrible at telling you that you are solving the wrong problem, and I think this is a consequence of their design.
They are meant to get "upvotes" from the person asking the question, so they don't want to imply you are making a fundamental mistake, even if it leads you into AI induced psychosis.
Or maybe they are just that dumb - fuzzy recall and the eliza effect making them seem smart?
avadodin1 month ago
real humans refuse to use both cars and ai.
chrsw1 month ago
Now all this is in future model training data.
[deleted]
ThrowawayTestr1 month ago
I don't think this is a logic test, I think it's a woke test. The models have been trained on text that say driving is bad and that walking is good for you. This is why they fail.
blobbers1 month ago
Go ask 53 Americans.
I’m willing to bet less than 11 get it right.
maxdo1 month ago
well, some human in a baseline test also failed. 28% is still a lot. even with today tech, using cutting edge tech, you got it 100% solved, unlike human, that will never get better.
rand421 month ago
"Obviously, you need to drive. The car needs to be at the car wash."
Actually, this isn't as "obvious" as it seems—it’s a classic case of contextual bias.
We only view these answers as "wrong" because we reflexively fill in missing data with our own personal experiences. For example:
- You might be parked 50m away and simply hand the keys to an attendant.
- The car might already be at the station for detailing, and you are just now authorizing the wash.
This highlights a data insufficiency problem, not necessarily a logic failure. Human "common sense" relies on non-verbal inputs and situational awareness that the prompt doesn't provide. If you polled 100 people, you’d likely find that their "obvious" answers shift based on their local culture (valet vs. self-service) or immediate surroundings.
LLMs operate on probabilistic patterns within their training data. In that sense, their answers aren't "wrong"—they are simply reflecting a different set of statistical likelihoods. The "failure" here isn't the AI's logic, but the human assumption that there is only one universal "correct" context.

news.ycombinator.com/item?id=47128138