SmackerNews

Gemini 3.1 Pro

963 points · 914 comments · 1 month ago · MallocVoidstar

Preview: https://console.cloud.google.com/vertex-ai/publishers/google...

Card: https://deepmind.google/models/model-cards/gemini-3-1-pro/

spankalee1 month ago
I hope this works better than 3.0 Pro
I'm a former Googler and know some people near the team, so I mildly root for them to at least do well, but Gemini is consistently the most frustrating model I've used for development.
It's stunningly good at reasoning, design, and generating the raw code, but it just falls over a lot when actually trying to get things done, especially compared to Claude Opus.
Within VS Code Copilot Claude will have a good mix of thinking streams and responses to the user. Gemini will almost completely use thinking tokens, and then just do something but not tell you what it did. If you don't look at the thinking tokens you can't tell what happened, but the thinking token stream is crap. It's all "I'm now completely immersed in the problem...". Gemini also frequently gets twisted around, stuck in loops, and unable to make forward progress. It's bad at using tools and tries to edit files in weird ways instead of using the provided text editing tools. In Copilot it, won't stop and ask clarifying questions, though in Gemini CLI it will.
So I've tried to adopt a plan-in-Gemini, execute-in-Claude approach, but while I'm doing that I might as well just stay in Claude. The experience is just so much better.
For as much as I hear Google's pulling ahead, Anthropic seems to be to me, from a practical POV. I hope Googlers on Gemini are actually trying these things out in real projects, not just one-shotting a game and calling it a win.
sdeiley1 month ago
People underrate Google's cost effectiveness so much. Half price of Opus. HALF.
Think about ANY other product and what you'd expect from the competition thats half the price. Yet people here act like Gemini is dead weight
____
Update:
3.1 was 40% of the cost to run AA index vs Opus Thinking AND SONNET, beat Opus, and still 30% faster for output speed.
https://artificialanalysis.ai/?speed=intelligence-vs-speed&m...
sheepscreek1 month ago
If it’s any consolation, it was able to one-shot a UI & data sync race condition that even Opus 4.6 struggled to fix (across 3 attempts).
So far I like how it’s less verbose than its predecessor. Seems to get to the point quicker too.
While it gives me hope, I am going to play it by the ear. Otherwise it’s going to be - Gemini for world knowledge/general intelligence/R&D and Opus/Sonnet 4.6 to finish it off.
UPDATE: I may have spoken too soon.
```
  > Fixing Truncated Array Syncing Bug
  > I traced the missing array items to a typo I made earlier! 
  > When fixing the GC cast crash, I accidentally deleted the assignment..
  > ..effectively truncating the entire array behind it.
```
These errors should not be happening! They are not the result of missing knowledge or a bad hunch. They are coming from an incorrect find/replace, which makes them completely avoidable!
On a lighter note, every time it happens, I think about this Family Guy: https://youtu.be/HtT2xdANBAY?si=QicynJdQR56S54VL&t=184
minimaxir1 month ago
Price is unchanged from Gemini 3 Pro: $2/M input, $12/M output. https://ai.google.dev/gemini-api/docs/pricing
Knowledge cutoff is unchanged at Jan 2025. Gemini 3.1 Pro supports "medium" thinking where Gemini 3 did not: https://ai.google.dev/gemini-api/docs/gemini-3
Compare to Opus 4.6's $5/M input, $25/M output. If Gemini 3.1 Pro does indeed have similar performance, the price difference is notable.
xrd1 month ago
These models are so powerful.
It's totally possible to build entire software products in the fraction of the time it took before.
But, reading the comments here, the behaviors from one version to another point version (not major version mind you) seem very divergent.
It feels like we are now able to manage incredibly smart engineers for a month at the price of a good sushi dinner.
But it also feels like you have to be diligent about adopting new models (even same family and just point version updates) because they operate totally differently regardless of your prompt and agent files.
Imagine managing a team of software developers where every month it was an entirely new team with radically different personalities, career experiences and guiding principles. It would be chaos.
I suspect that older models will be deprecated quickly and unexpectedly, or, worse yet, will be swapped out with subtle different behavioral characteristics without notice. It'll be quicksand.
mijoharas1 month ago
Gemini 3 is still in preview (limited rate limits) and 2.5 is deprecated (still live but won't be for long).[0]
Are Google planning to put any of their models into production any time soon?
Also somewhat funny that some models are deprecated without a suggested alternative(gemini-2.5-flash-lite). Do they suggest people switch to Claude?
[0] https://ai.google.dev/gemini-api/docs/deprecations
1024core1 month ago
It got the car wash question perfectly:
You are definitely going to have to drive it there—unless you want to put it in neutral and push!
While 200 feet is a very short and easy walk, if you walk over there without your car, you won't have anything to wash once you arrive. The car needs to make the trip with you so it can get the soap and water.
Since it's basically right next door, it'll be the shortest drive of your life. Start it up, roll on over, and get it sparkling clean.
Would you like me to check the local weather forecast to make sure it's not going to rain right after you wash it?
nickandbro1 month ago
Does well on SVGs outside of "pelican riding on a bicycle" test. Like this prompt:
"create a svg of a unicorn playing xbox"
https://www.svgviewer.dev/s/NeKACuHj
Still some tweaks to the final result, but I am guessing with the ARC-AGI benchmark jumping so much, the model's visual abilities are allowing it to do this well.
Robdel121 month ago
I really want to use google’s models but they have the classic Google product problem that we all like to complain about.
I am legit scared to login and use Gemini CLI because the last time I thought I was using my “free” account allowance via Google workspace. Ended up spending $10 before realizing it was API billing and the UI was so hard to figure out I gave up. I’m sure I can spend 20-40 more mins to sort this out, but ugh, I don’t want to.
With alllll that said.. is Gemini 3.1 more agentic now? That’s usually where it failed. Very smart and capable models, but hard to apply them? Just me?
simonw1 month ago
Pretty great pelican: https://simonwillison.net/2026/Feb/19/gemini-31-pro/ - took over 5 minutes though, but I think that's because they're having performance teething problems on launch day.
WarmWash1 month ago
3.1 Pro is the first model to correctly count the number of legs on my "five legged dog" test image. 3.0 flash was the previous best, getting it after a few prompts of poking. 3.1 got it on the first prompt though, with the prompt being "How many legs does the dog have? Count Carefully".
However, it didn't get it on the first try with the original prompt (prompt: "How many legs does the dog have?"). It initially said 4, then with a follow up prompt got it to hesitantly say 5, with one limb must being obfuscated or hidden.
So maybe I'll give it a 90%?
This is without tools as well.
sigmar1 month ago
blog post is up- https://blog.google/innovation-and-ai/models-and-research/ge...
edit: biggest benchmark changes from 3 pro:
arc-agi-2 score went from 31.1% -> 77.1%
apex-agents score went from 18.4% -> 33.5%
esafak1 month ago
Has anyone noticed that models are dropping ever faster, with pressure on companies to make incremental releases to claim the pole position, yet making strides on benchmarks? This is what recursive self-improvement with human support looks like.
zhyder1 month ago
Surprisingly big jump in ARC-AGI-2 from 31% to 77%, guess there's some RLHF focused on the benchmark given it was previously far behind the competition and is now ahead.
Apart from that, the usual predictable gains in coding. Still is a great sweet-spot for performance, speed and cost. Need to hack Claude Code to use their agentic logic+prompts but use Gemini models.
I wish Google also updated Flash-lite to 3.0+, would like to use that for the Explore subagent (which Claude Code uses Haiku for). These subagents seem to be Claude Code's strength over Gemini CLI, which still has them only in experimental mode and doesn't have read-only ones like Explore.
davidguetta1 month ago
Implementation and Sustainability Hardware: Gemini 3 Pro was trained using Google’s Tensor Processing Units (TPUs). TPUs are specically designed to handle the massive computations involved in training LLMs and can speed up training considerably compared to CPUs. TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training, which can lead to better model quality. TPU Pods (large clusters of TPUs) also provide a scalable solution for handling the growing complexity of large foundation models. Training can be distributed across multiple TPU devices for faster and more efficient processing.
So google doesn't use NVIDIA GPUs at all ?
maxloh1 month ago
Gemini 3 seems to have a much smaller token output limit than 2.5. I used to use Gemini to restructure essays into an LLM-style format to improve readability, but the Gemini 3 release was a huge step back for that particular use case.
Even when the model is explicitly instructed to pause due to insufficient tokens rather than generating an incomplete response, it still truncates the source text too aggressively, losing vital context and meaning in the restructuring process.
I hope the 3.1 release includes a much larger output limit.
the_duke1 month ago
Gemini 3 is pretty good, even Flash is very smart for certain things, and fast!
BUT it is not good at all at tool calling and agentic workflows, especially compared to the recent two mini-generations of models (Codex 5.2/5.3, the last two versions of Anthropic models), and also fell behind a bit in reasoning.
I hope they manage to improve things on that front, because then Flash would be great for many tasks.
faebi1 month ago
I'm doing Ruby and Gemini 3.0 pro has by far been the best model for me. It writes the nicest ruby code, like I would. Further, it either succeeds or fails hard and obviously. I prefer it failing hard instead of of slowly going weird in my code.
Similar in antigravity. Privately it's my absolute favorite.
So I'm actually rooting for this.
ttul1 month ago
What I’m noticing, overall: I’ve never cut so much code in my life. I’ve become a coding monster with one of those dark green GitHub profiles ever since 5.3-Codex gave me the confidence to load in a ridiculous number of tasks every day and let it rip. I have about three coding tasks going at once and in another window, Claude Cowork is ripping through PowerPoints and getting back to lawyers.
This tech is not going to replace us. If anything, I am becoming even more of a workaholic. But the output volume is going to pay off for those who are privileged enough to use these tools.
tenpoundhammer1 month ago
In an attempt to get outside of benchmark gaming I had it make Platypus on a Tricycle. It's not as good as pelican on bicycle. https://www.svgviewer.dev/s/BiRht5hX
exabrial1 month ago
You know what would slay right now? A native app.
Not another piece of Electron bloatware, a regular, efficient, fast, snappy, native, app. One that connects to my MCP severs and has local filesystem tools.
Anthropic might fall behind Google/OpenAI eventually, but their Desktop App + MCP/Connectors is unbelievably useful to get real work done.
mbh1591 month ago
77.1% on ARC-AGI-2 and still can't stop adding drive-by refactors. ARC-AGI-2 tests novel pattern induction, it's genuinely hard to fake and the improvement is real. But it doesn't measure task scoping, instruction adherence, or knowing when to stop. Those are the capabilities practitioners actually need from a coding agent. We have excellent benchmarks for reasoning. We have almost nothing that measures reliability in agentic loops. That gap explains this thread.
zapnuk1 month ago
Gemini 3 was:
1. unreliable in GH copilot. Lots of 500 and 4XX errors. Unusable in the first 2 months
2. not available in vertex ai (europe). We have requirements regarding data residency. Funny enough anthropic is on point with releasing their models to vertex ai. We already use opus and sonnet 4.6.
I hope google gets their stuff together and understands that not everyone wants/can use their global endpoint. We'd like to try their models.
XCSme1 month ago
Gets 10/10 on my potato benchmarks: https://aibenchy.com/model/google-gemini-3-1-pro-preview-med...
qingcharles1 month ago
I've been playing with the 3.1 Deep Think version of this for the last couple of weeks and it was a big step up for coding over 3.0 (which I already found very good).
It's only February...
ArmandoAP1 month ago
Model Card https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...
infinitewars1 month ago
I find Gemini is great at generating code that is relatively common on the internet, especially web and algorithms. It is absolutely better at this then OpenAI's models. But Gemini is not as good at reasoning about problems from first principles, or catching subtle bugs. In some ways it is just a better Google that finds exactly what you want, less a general intelligence.
veselin1 month ago
I am actually going to complain about this: that neither of the Gemini models are not preview ones.
Anthropic seems the best in this. Everything is in the API on day one. OpenAI tend to want to ask you for subscription, but the API gets there a week or a few later. Now, Gemini 3 is not for production use and this is already the previous iteration. So, does Google even intent to release this model?
vnglst1 month ago
I asked Gemini 3.1 Pro to generate some of the modern artworks in my "Pelican Art Gallery". I particularly like the rendition of the Sunflowers: https://pelican.koenvangilst.nl/gallery/category/modern
vnglst1 month ago
I asked Gemini 3.1 Pro Preview to generate the modern artworks as SVG for my Pelican Art Gallery. I particularly like the rendition of the Sunflowers: https://pelican.koenvangilst.nl/gallery/category/modern
janalsncm1 month ago
This model says it accepts video inputs. I asked it to transcribe a 5 second video of a digital water curtain which spelled “Boo Happy Halloween”, and it came back with “Happy” which wasn’t the first frame, but also is incomplete.
This kind of test is good because it requires stitching together info from the whole video.

To use in OpenCode, you can update the models it has:

    opencode models --refresh

Then /models and choose Gemini 3.1 Pro

You can use the model through OpenCode Zen right away and avoid that Google UI craziness.

---

It is quite pricey! Good speed and nailed all my tasks so far. For example:

    @app-api/app/controllers/api/availability_controller.rb 
    @.claude/skills/healthie/SKILL.md 

    Find Alex's id, and add him to the block list, leave a comment 
    that he has churned and left the company. we can't disable him 
    properly on the Healthie EMR for now so 
    this dumb block will be added as a quick fix.

Result was:

    29,392 tokens
    $0.27 spent

So relatively small task, hitting an API, using one of my skills, but a quarter. Pricey!

agentifysh1 month ago
My enthusiasm is a bit muted this cycle because I've been burned by Gemini CLI. These models are very capable but Gemini CLI just doesn't seem to be able to work for one it never follows instructions strictly like its competitors do, and it hallucinates even which is a rarity.
More importantly feels like Google is stretched thin across different Gemini products and pricing reflects this, I still have no idea how to pay for Gemini CLI, in codex/claude its very simple $20/month for entry and $200/month for ton of weekly usage.
I hope whoever is reading this from Google they can redeem Gemini CLI by focusing on being competitive instead of making it look pretty (that seems to be the impression I got from the updates on X)
dxbednarczyk1 month ago
Every time I've used Gemini models for anything besides code or agentic work they lean so far into the RLHF induced bold lettering and bullet point list barf that everything they output reads as if the model was talking _at_ me and not _with_ me. In my Openclaw experiment(s) and in the Gemini web UI, I've specifically added instructions to avoid this type of behavior, but it only seemed to obey those rules when I reminded the model of them.
For conversational contexts, I don't think the (in some cases significantly) better benchmark results compared to a model like Sonnet 4.6 can convince me to switch to Gemini 3.1. Has anyone else had a similar experience, or is this just a me issue?
timabdulla1 month ago
Google tends to trumpet preview models that aren't actually production-grade. For instance, both 3 Pro and Flash suffer from looping and tool-calling issues.
I would love for them to eliminate these issues because just touting benchmark scores isn't enough.
upmind1 month ago
In my experience, while Gemini does really well in benchmarks I find it much worse when I actually use the model. It's too verbose / doesn't follow instructions very well. Let's see if that changes with this model.
thallavajhula1 month ago
This is great. I am hopeful that Gemini 3.1 Pro would be great. So far, I'm almost always pulled away from Gemini models by Claude. Having used Claude Opus High for a while now, Claude Opus seems to be fantastic at coding. Even Gemini's comparison chart says so. OpenAI's 5.3-codex is by far the weakest (of the 3) for my coding purposes. Claude Opus really shines at explanations and generating code.
Gemini is almost great. Claude Opus is great. I keep switching among these subscriptions every month to not miss out on any of the offerings for too long; ChatGPT Plus <-> Gemini Pro <-> Claude.
WarmWash1 month ago
It seems google is having a disjointed roll out, and there will likely be an official announcement in a few hours. Apparently 3.1 showed up unannounced in vertex at 2am or something equally odd.
Either way early user tests look promising.
carpe__diem1 month ago
One thing I’d like to see in these releases is stronger emphasis on regression behavior, not just headline capability.
In production, the costly failures are usually "almost right" edits that quietly shift semantics across large diffs.
We now gate model upgrades behind a fixed eval set of our own repos + prompts and compare pass rates by task category (refactor, test repair, API migration). Raw benchmark gains matter less to us than variance and rollback safety. If 3.1 improves consistency on long multi-file edits, that’s a bigger win than a small jump on one-shot tasks.
XCSme1 month ago
Funnily, on my tests, 3 flash with medium reasoning does better. Seems like 3.1 pro reasoned about the correct answer, but chose to go with a different (wrong) one: https://aibenchy.com/compare/?left=google-gemini-3-flash-pre...
EDIT: while also being 3x cheaper
pawelduda1 month ago
It's safe to assume they'll be releasing improved Gemini Flash soon? The current one is so good & fast I rarely switch to pro anymore
dudeinhawaii1 month ago
After 2 days of giving it a go, I find that Gemini CLI is still considerably worse than both Codex and Claude Code.
The model itself also has strange behaviors that seem like it gets randomly replaced with Gemini-3-Flash or something else. I'll explain.
Once agentic coding was a bust, I gave it a run as a daily driver for AI assistant. It performed fairly well but then began behaving strangely. It would lose context mid conversation. For instance, I said "In san francisco I'm looking for XYZ". Two turns later I'm asking about food and it gives me suggestions all over the world.
Another time, I asked it about the likelihood of the pending east coast winter storm of affecting my flight. I gave it all the details (flight, stops, time, cities).
Both GPT-5.2 and Claude crunched and came back with high quality estimations and rationale. Gemini 3.1 Pro... 5 times, returned a weather forecast widget for either the layover or final destination. This was on "Pro" reasoning, the highest exposed on the Gemini App/WebApp. I've always suspected Google swaps out models randomly so this.. wasn't surprising.
I then asked Gemini 3.1 Pro via the API and it returned a response similar to Claude and GPT-5.2 -- carefully considering all factors.
This tells me that a Google AI Ultra subscription gives me a sub-par coding agent which often swaps in Flash models, a sub-par web/app AI experience that also isn't using the advertised SOTA models, and a bunch of preview apps for video gen, audio gen (crashed every time I attempted), and world gen (Genie was interesting but a toy).
This will be a quick cancel as soon as the intro rate is done.
It's like Google doesn't ACTUALLY want to be the leader in AI or serve people their best models. They want to generate hype around benchmarks and then nerf the model and go silent.
Gemini 3 Pro Preview went from exceptional in the first month to mediocre and then out of my rotation within a month.
hackrmn1 month ago
I am reading opinions here from agent users, but I haven't adopted the "agentic workflow" myself because I believe I am (for now) now getting a lot of my trouble's worth using Gemini (3 Pro) in the traditional conversational manner. It is adequate at suggesting solutions in the form of code, or reasoning in general. My problems are software engineering but also everything that is not, since I have a subscription it's my go to problem solving partner. I see no reasons to switch to another product for now either, I am constantly in the loop getting samples of chats with Grok and ChatGPT and it seems a very close race. If Claude is that one race horse that's built different -- and I absolutely can believe it is so because they have rightfully tuned it -- I am not convinced I am missing out much. But maybe because I am more traditionalist to most of everyone's having embraced the idea of having an agent run a loop on their workstation(s) and trusting it to deliver. Perhaps if I were in more of a tight time frame, I'd be pressed to do so myself, but for now I am already benefiting from the extra speed "rubberducking" with Gemini all manner of software engineering problems that I need to solve, so I simply have no reasons to abandon it. I think this is also Google's strength -- they have the data, they've already integrated Gemini or a variant of it anyway, into google.com which is one of their prized cash cows, and it's everywhere else too. Like others here have said, Google may not have the absolute best in class at all times, but they're fairly good and they still have the brains that gave us DeepMind and GPT, unless there's some sort of stagnation going on in their ranks, I expect they're not resting on the laurels. With their capital they're still at the head of the race. Anthropic and OpenAI have the benefit of being nimble, though, and it shows too. Anyway, competition is good, the cat's out of the bag and on the greener side of the river :-)
nobrains1 month ago
In the "Intelligence applied" section, where they show the comparison animations, they are shown using a non-optimal UI.
There is not enough time to read the text, see old animation, and see new animation. Better would have been to keep the same animation on repeat, so that people have unlimited time to read the text and observer the animations.
Also, it jumps from example to example in the same video. Better would have been to show each separately, so that once user is done observing one example at their own pace, they can proceed to the next.
As a workaround, I had to open the video (just the video) in a new tab, pause once an example came up, read the text, then rewind to the start of the animation to see the old animation example, then rewind again, then see the new animation example, and then sometimes rewind again if I wanted to see the animation again. Then, once done with the example, I had to forward to the next example and repeat the above process again.
Somewhere along that process, they lost me.
saberience1 month ago
I always try Gemini models when they get updated with their flashy new benchmark scores, but always end up using Claude and Codex again...
I get the impression that Google is focusing on benchmarks but without assessing whether the models are actually improving in practical use-cases.
I.e. they are benchmaxing
Gemini is "in theory" smart, but in practice is much, much worse than Claude and Codex.
PunchTornado1 month ago
The biggest increase is LiveCodeBench Pro: 2887. The rest are in line with Opus 4.6 or slightly better or slightly worse.
jeffbee1 month ago
Relatedly, Gemini chat seems to be if not down then extremely slow.
ETA: They apparently wiped out everyone's chats (including mine). "Our engineering team has identified a background process that was causing the missing user conversation metadata and has successfully stopped the process to prevent further impact." El Mao.
ponyous1 month ago
Ran a bunch of 3D Modeling benchmarks on Gemini 3.1 vs Gemini 3.
Unsurprisingly 3.1 performs a bit better. But surprisingly it costs 2.6x as much ($0.14 vs. $0.37 per 3D Model Generation) and is 2.5x slower (1m 24s vs. 3m 28s).
To me it feels like "lets increase our thinking budget and call it an improved model!"
josalhor1 month ago
I speculated that 3 pro was 3.1... I guess I was wrong. Super impressive numbers here. Good job Google.
rahulroy1 month ago
In the meantime, I'm trying to update Antigravity to use the latest version, but it just wouldn't update itself, nor would it let me use 3.0 model. I restarted multiple times with the same result.
I tried telling this to agent, and it keeps repeating the same phrase "Gemini 3.1 Pro is not available on this version. Please upgrade to the latest version."
Congratulations on beating the benchmarks, but I wonder how much effort is devoted on improving DX?
Edit: It's updated now, I can confirm with "There are currently no updates available.". It still doesn't let me continue with the conversation. I'm able to create new session though.
markerbrod1 month ago
Blogpost: https://blog.google/innovation-and-ai/models-and-research/ge...
vinhnx1 month ago
Model card https://deepmind.google/models/model-cards/gemini-3-1-pro/
dude2507111 month ago
I hereby allow you to release models not at the same time as your competitors.
brap1 month ago
I had it coding autonomously for about an hour (including lots of tool wait time) on a difficult task, and it actually produced good results.
What’s most surprising is that I had it follow a strict loop/workflow and it did that perfectly. Normally these things go off the rails after a while with complex workflows. It’s something I have to usually enforce with some orchestration script and multiple agents, but this time it was just one session meticulously following orders.
Impressive, and saves a lot of time on building the orchestration glue.
impulser_1 month ago
Seems like they actually fixed some of the problems with the model. Hallucinations rate seems to be much better. Seems like they also tuned the reasoning maybe that were they got most of the improvements from.
Murfalo1 month ago
I like to think that all these pelican riding a bicycle comments are unwittingly iteratively creating the optimal cyclist pelican as these comment threads are inevitably incorporated in every training set.
[deleted]
conception1 month ago
My current AI test. There was a BBS I was on in the 90s and there was this door game I hadn't seen anywhere else. I simply describe the BBS, where it was popular, its name, the year it was around, and the BBS game and a description of it mechanics, etc.
OpenAI and Google's Deep Research produce a very long, 100% made up report. If I question the AI on the report, they both admit they just made it up.
Claude just returns, "I couldn't find anything on the BBS or the game."
cmrdporcupine1 month ago
Doesn't show as available in gemini CLI for me. I have one of those "AI Pro" packages, but don't see it. Typical for Google, completely unclear how to actually use their stuff.
[deleted]
metavolvelabs1 month ago
They crushed it with Gemini 3.1... especially when in Thinking Mode with Deep Think initiated. If you are working towards something with code, research etc. and hit a snag, run it by Gemini with these settings. Here's another KILLER trick: In Gemini Thinking mode select Nano Banana and have it put together a comprehensive slide with paragraph length text portions. It'll nail it.
ChrisArchitect1 month ago
More discussion: https://smackernews.com/item/47075318 HN
0xcb01 month ago
I'm trying to find the information, is this available on the Gemini CLI script, or is this just the web front-end where I can use this new model?
onlyrealcuzzo1 month ago
We've gone from yearly releases to quarterly releases.
If the pace of releases continues to accelerate - by mid 2027 or 2028 we're headed to weekly releases.
mark_l_watson1 month ago
Fine, I guess. The only commercial API I use to any great extent is gemini-3-flash-preview: cheap, fast, great for tool use and with agentic libraries. The 3.1-pro-preview is great, I suppose, for people who need it.
Off topic, but I like to run small models on my own hardware, and some small models are now very good for tool use and with agentic libraries - it just takes a little more work to get good results.
[deleted]
pRusya1 month ago
I'm using gemini.google.com/app with AI Pro subscription. "Something went wrong" in FF, works in Chrome.
Below is one of my test prompts that previous Gemini models were failing. 3.1 Pro did a decent job this time.
use c++, sdl3. use SDL_AppInit, SDL_AppEvent, SDL_AppIterate callback functions. use SDL_main instead of the default main function. make a basic hello world app.
panarchy1 month ago
I had it make a simple HTML/JS canvas game (think flappy bird) and while it did some things mildly better (and others noticeably worse) it still fell into the exact same traps as earlier models. It also had a lot of issues generating valid JS at parts and asking it what the code should be just made it endlessly generate the same exact incorrect code.
zokier1 month ago
Last week, we released a major update to Gemini 3 Deep Think to solve modern challenges across science, research and engineering. Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro.
So this is same but not same as Gemini 3 Deep Think? Keeping track of these different releases is getting pretty ridiculous.
datakazkn1 month ago
One underappreciated reason for the agentic gap: Gemini tends to over-explain its reasoning mid-tool-call in a way that breaks structured output expectations. Claude and GPT-4o have both gotten better at treating tool calls as first-class operations. Gemini still feels like it's narrating its way through them rather than just executing.
mixel1 month ago
Google seems to really pull ahead in this AI race. For me personally they offer the best deal and although the software is not quiet there compared to openai or anthropic (in regards to 1. web GUI, 2. agent-cli). I hope they can fix that in the future and I think once Gemini 4 or whatever launches we will see a huge leap again
rishabhaiover1 month ago
I think we're past the point where benchmarks hold real value. All models are above a certain threshold of intelligence but Gemini somehow borrows the worst of both worlds. It's neither good with long-horizon coding tasks nor does it offer a likable personality (like Claude which is much more beloved)
azuanrb1 month ago
The CLI needs work, or they should officially allow third-party harnesses. Right now, the CLI experience is noticeably behind other SOTA models. It actually works much better when paired with Opencode.
But with accounts reportedly being banned over ToS issues, similar to Claude Code, it feels risky to rely on it in a serious workflow.
tskulbru1 month ago
Off-topic but, what are people using to create those video animations seen in the "ISS orbit tracking dashboard" example? Looks pretty nice! Im guessing Google uses a whole building of UX people but ive seen similar videos from small indie startups too, or even 1 person SaaS.
MASNeo1 month ago
At risk to be unpopular Gemini 3.0 Pro made a huge difference for me when I moved some workflow to Antigravity, especially compared to ChatGPT.
The latest update? I simply don’t care. I am not paid to evaluate models, I am paid to build. Not sure 4 benchmark points are making the difference.
6d6b731 month ago
In these discussions we see some people hating the models, while others love them. What I find interesting is that this is exactly how we feel about other people - some people will love working with you while others can't stand being in the same room you're in.
barfingclouds1 month ago
I’m no tech expert like a lot of people here, but I find Gemini 3.0 insanely good for my regular daily questions. Hoping this one is great too. I’m kind of at the point where many answers are essentially perfect and I don’t know if I need much more
clhodapp1 month ago
There's a very short blog post up: https://blog.google/innovation-and-ai/models-and-research/ge...
[deleted]
hsaliak1 month ago
The eventual nerfing gives me pause. Flash is awesome. What we really want is gemini-3.1-flash :)
d4rkp4ttern1 month ago
Yes people are too fixated on just the model. The real question for coding use cases is - does Gemini X + Gemini CLI outperform Opus + Claude Code? With 3.0 the answer was no. I won’t waste time checking 3.1 until I hear otherwise.
ChrisArchitect1 month ago
Blog post: https://blog.google/innovation-and-ai/models-and-research/ge...
makeavish1 month ago
Great model until it gets nerfed. I wish they had a higher paid tier to use non nerfed model.
n4pw01f1 month ago
I created a nice harness and visual workflow builder for my Gemini agent chains, works very well. I did this so it would create code the way I do, that is very editable.
In contrast, the vs code plugin was pretty bad, and did crazy things like mix languages
attentive1 month ago
A lot of gemini bashing. But flash 3.0 with opencode is reasonably good and reliable coder.
I'd rate it between haiku 4.5 (also pretty good for a price) and sonnet. Closer to sonnet.
Sure, if I am not cost-sensitive I'd run everything in opus 4.6 but alas.
quacky_batak1 month ago
I’m keen to know how and where are you using Gemini.
Anthropic is clearly targeted to developers and OpenAI is general go to AI model. Who are the target demographic for Gemini models? ik that they are good and Flash is super impressive. but i’m curious
robviren1 month ago
I have run into a surprising number of basic syntax errors on this one. At least in the few runs I have tried it's a swing and a miss. Wonder if the pressure of the Claude release is pushing these stop gap releases.
mrcwinn1 month ago
It's fascinating to watch this community react to positively to Google model releases and so negatively toward OpenAI's. You all do understand that an ad revenue model is exactly where Google will go, right?
syspec1 month ago
Does anyone know if this is in GA immediately or if it is in preview?
On our end, Gemini 3.0 Preview was very flakey (not model quality, but as in the API responses sometimes errored out), making it unreliable.
Does this mean that 3.0 is now GA at least?
denysvitali1 month ago
Where is Simon's pelican?
0x1101111011 month ago
Relevant: Scanned diaries from 1945 of USFS Ranger. Had this transcribed in Claude.
[1]:https://smackernews.com/item/47041836 HN
Drblessing1 month ago
Gemini is the smartest model currently available. It is the only model out of the big ones that correcly identifies the specific versions of superhers in a collage I tested them with.
siliconc0w1 month ago
Google has a hugely valuable dataset of changes from decades of changes from top tier software engineers but it's so proprietary they can't use it to train their external models.
Grisu_FTP1 month ago
Somehow the models apparently get better and better every week, but every time i try to use them they get worse.
Am I the issue? Am i just misremembering the early times because it was a new thing?
holografix1 month ago
I think it begs the question:
Is Gemini meant to be be a revenue making product or strictly a cost centre to defend against Search and Ads erosion by OpenAI?
Why does the Gemini web app not support MCP Servers?
__jl__1 month ago
Another preview release. Does that mean the recommended model by Google for production is 2.5 Flash and Pro? Not talking about what people are actually doing but the google recommendation. Kind of crazy if that is the case
jeffybefffy5191 month ago
Someone needs to make an actual good benchmark for LLM's that matches real world expectations, theres more to benchmarks than accuracy against a dataset.
Jirach051 month ago
Can anyone explain why these models decrease in performance on this "MCRC v2 (8-needle)" long context benchmark when thinking is turned on?
alwinaugustin1 month ago
I use gemini if i need to write something in my native language- Malayalam or translation. it works very well in writing in Indian regional languages.
SrFil1 month ago
For me, Gemini has been by far the best model for document understanding tasks. I look forward to seeing how much more capable this version is.
seizethecheese1 month ago
I use Gemini flash lite in a side project, and it’s stuck on 2.5. It’s now well behind schedule. Any speculation as to what’s going on?
ismailmaj1 month ago
3.1 feels to me like 3.0 but that takes a long time to think, it didn't feel like a leap in raw intelligence like 2.5 pro was.
johnwheeler1 month ago
I know Google has anti-gravity but do they have anything like Claude code as far as user interface terminal basically TUI?
eric153423351 month ago
My first impression is that the model sounds slightly more human and a little more praising. Still comparing the ability.
[deleted]
nautilus121 month ago
Ok, why don't you work on getting 3.0 out of preview first? 10 min response time is pretty heinous
matrix25961 month ago
Gemini 3.1 Pro is based on Gemini 3 Pro
getcrunk1 month ago
Gemini is so stubborn, and often doesn’t follow explicit and simple instructions. So annoying
atleastoptimal1 month ago
Writing style wise, 3.1 seems very verbose, but somehow less creative compared to 3.
1024core1 month ago
It's been hugged to death. I keep getting "Something went wrong".
[deleted]
yuvalmer1 month ago
Gemini 3.0 Pro is bad model for its class. I really hope 3.1 is a leap forward.
msavara1 month ago
Somehow doesn't work for me :) "An internal error has occurred"
[deleted]
Topfi1 month ago
Appears the only difference to 3.0 Pro Preview is Medium reasoning. Model naming has long gone from even trying to make sense, but considering 3.0 is still in preview itself, increasing the number for such a minor change is not a move in the right direction.
andrewstuart1 month ago
Gemini current version drops most of the code every time I try to use it.
Useless.
[deleted]
[deleted]
naiv1 month ago
ok , so they are scared that 5.3 (pro) will be released today/tomorrow and blow it out of the water and rushed it while they could still reference 5.2 benchmarks.
LZ_Khan1 month ago
biggest problem is that it's slow. also safety seems overtuned at the moment. getting some really silly refusals. everything else is pretty good.
mustaphah1 month ago
Google is terrible at marketing, but this feels like a big step forward.
As per the announcement, Gemini 3.1 Pro score 68.5% on Terminal-Bench 2.0, which makes it the top performer on the Terminus 2 harness [1]. That harness is a "neutral agent scaffold," built by researchers at Terminal-Bench to compare different LLMs in the same standardized setup (same tools, prompts, etc.).
It's also taken top model place on both the Intelligence Index & Coding Index of Artificial Analysis [2], but on their Agentic Index, it's still lagging behind Opus 4.6, GLM-5, Sonnet 4.6, and GPT-5.2.
---
[1] https://www.tbench.ai/leaderboard/terminal-bench/2.0?agents=...
[2] https://artificialanalysis.ai
trilogic1 month ago
Humanity last exam 44%, Scicode 59, and that 80, and this 78 but not 100% ever.
Would be nice to see that this models, Plus, Pro, Super, God mode can do 1 Bench 100%. I am missing smth here?
kuprel1 month ago
Why don't they show Grok benchmarks?
jdthedisciple1 month ago
Why should I be excited?
BMFXX1 month ago
Just wish iI could get 2.5 daily limit above 1000 requests easily. Driving me insane...
hn_throw20251 month ago
Yeah great, now can I have my pinned chats back please?
https://www.google.com/appsstatus/dashboard/incidents/nK23Zs...
makeavish1 month ago
I hope to have great next two weeks before it gets nerfed.
himata41131 month ago
The visual capabilities of this model are frankly kind of ridicioulus what the hell.
lysecret1 month ago
Please I need 3 in ga…
leecommamichael1 month ago
Whoa, I think Gemini 3 Pro was a disappointment, but Gemini 3.1 Pro is definitely the future!
throwaw121 month ago
Can we switch from Claude Code to Google yet?
Benchmarks are saying: just try
But real world could be different
pickle-pixel1 month ago
does it still crash out after couple prompts?
taytus1 month ago
Another preview model? Why google keep doing this?
solarisos1 month ago
The speed of these 3.1 and Preview releases is starting to feel like the early days of web frameworks. It’s becoming less about the raw benchmarks and more about which model handles long-context 'hallucination' well enough to be actually used in a production pipeline without constant babysitting.
techgnosis1 month ago
I'd love a new Gemini agent that isn't written with Node.js. Not sure why they think that's a good distribution model.
jcims1 month ago
Pelican on a bicycle in drawio - https://imgur.com/a/tNgITTR
(FWIW I'm finding a lot of utility in LLMs doing diagrams in tools like drawio)

news.ycombinator.com/item?id=47074735