Gemini 3.5 Flash pairs smarts with speed

June 17, 2026

Google’s faster mid-tier model arrives with meaningful performance gains—but at a noticeably higher price, reinforcing a broader industry trend of rising per-token costs across leading AI systems.

What’s new: Google has released Gemini 3.5 Flash, an upgraded version of its multimodal mid-range model. The update delivers improvements in agent-like reasoning, visual comprehension, and latency, but comes at roughly three times the price of its predecessor, Gemini 3 Flash.

Input/output: The model accepts text, images, audio, and video inputs (up to 1 million tokens) and generates up to 64,000 tokens of text output at approximately 204 tokens per second.

Architecture: Mixture-of-experts transformer

Key features: Adjustable reasoning modes (minimal, low, medium, high), “thought preservation” that carries reasoning tokens across multi-turn interactions (similar in concept to Kimi K2.6’s persistent thinking), and tool-use capabilities, though full computer-use functionality is not yet available.

Performance: Gemini 3.5 Flash leads Artificial Analysis’s APEX-Agents-AA benchmark and performs strongly on the MMMU-Pro multimodal reasoning test in Flash configuration. However, it still trails top-tier frontier models in overall intelligence, coding ability, and knowledge depth.

Availability and pricing: The model is available through the Gemini app, Google AI Studio, Google Antigravity, and Google Search AI Mode under usage limits that reset every five hours (with weekly caps). For enterprise and API users, pricing is set at $1.50 / $0.15 / $9.00 per million tokens (input / cached input / output).

Undisclosed: Google has not released parameter counts, training dataset details, or full architectural specifics.

How it works: Gemini 3.5 Flash is described as being “based on” Gemini 3 Flash, which itself derives from Gemini 3 Pro, according to Google’s model documentation. It uses a mixture-of-experts transformer architecture trained on a blend of text, code, images, audio, and video sourced from public web data, licensed datasets, Google user interactions, and synthetic training material.

The model was further refined using reinforcement learning techniques focused on multi-step reasoning, structured problem-solving, and formal tasks such as theorem proving.

Performance snapshot: Independent evaluations show a mixed but competitive profile. On Artificial Analysis’s Intelligence Index, Gemini 3.5 Flash places mid-pack—around fifth to seventh depending on reasoning settings—behind higher-end models such as Qwen 3.7 Max (reasoning-enabled) and other frontier systems.

On MMMU-Pro, the model reaches 84% accuracy in high-reasoning mode, the highest recorded score, narrowly surpassing Gemini 3.1 Pro Preview (82%).

On APEX-Agents-AA, a benchmark simulating long-horizon professional tasks, Gemini 3.5 Flash achieves 47.1% accuracy, outperforming GPT-5.5 (37.7%) on first attempt performance.

On GDPval-AA, it records an Elo score of 1,656 in high reasoning mode, trailing GPT-5.5 (1,769) but surpassing earlier Gemini iterations.

On ARC-AGI-2, the model scores 72.1%, below Gemini 3.1 Pro Preview (77.1%) and GPT-5.5 (85.0%).

On AA-Omniscience, which penalizes hallucinated responses, Gemini 3.5 Flash lags behind both Gemini 3.1 Pro Preview and Claude Opus 4.7.

On Arena.ai leaderboards (May 24, 2026), it ranks ninth in Text Arena (1,480 Elo) and tenth in WebDev coding (1,506 Elo), with stronger performance in math (first place in category at 1,521 Elo) but weaker results in coding (31st in category).

Context from Google I/O 2026: The model debuted alongside several major announcements, including a redesigned Antigravity coding environment focused on agent orchestration rather than IDE-style workflows, and the introduction of the Omni model family, including Omni Flash, which supports text-to-video generation across multiple input modalities.

Gemini 3.5 Flash also plays a role in the evolution of Google Search, enabling more conversational query handling, AI-driven research agents, and expanded AI-generated summaries that reduce reliance on traditional link-based results.

Why it matters: Gemini 3.5 Flash effectively redefines the “Flash” tier. Originally positioned as a lightweight, cost-efficient alternative to Pro and Ultra models, it now occupies a more complex space—closer in capability to mid-tier competitors like Anthropic’s Sonnet-class systems than earlier “fast and cheap” models.

The trade-off is clear: higher capability and stronger agent performance come with increased cost and reduced pricing clarity. While Google positions the model as efficient for developers building multi-step agents and low-latency applications, independent analysis suggests it may not always be cheaper in practice than previous-generation Pro models.

The result is a shifting landscape where speed no longer guarantees affordability, and where mid-tier models increasingly compete on capability rather than cost alone.

Gemini 3.5 Flash pairs smarts with speed

Don't Miss

AI Ethics in the Age of Generative Models

Will Matter finally do what it was always supposed to do?

The Role of Quantum Mechanics in Modern Tech

This startup’s “super metals” may soon find their way into military drones, luxury watches,...

The Evolution of AI Companions and Chatbots