Google Research announced Gemini-SQL2, a text-to-SQL capability powered by Gemini 3.1 Pro that achieves 80.04% execution accuracy on the BIRD benchmark's Single Model track, surpassing Google's own previous record. It generates executable SQL from natural language questions and is expected to integrate with BigQuery and other Google Cloud services.
What is Gemini-SQL2? How it achieved 80.04% on BIRD
Gemini-SQL2 leverages the advanced reasoning capabilities of Gemini 3.1 Pro to deliver execution-ready SQL queries. Unlike previous models that focused mainly on syntactic correctness, Gemini-SQL2 emphasizes real-world execution accuracy, handling complex business contexts, schema understanding, and messy data values effectively. The 80.04% score on BIRD demonstrates significant progress in bridging the gap to human-level performance (92.96%).
I use Google Workspace Enterprise Standard, which provides access equivalent to Gemini AI Pro, and have tested similar natural language query features there. Draft quality has noticeably improved, cutting my manual review time by approximately 30%. However, with an estimated error rate of about 20%, human review is still required before any production use.
BIRD Benchmark Details and Gemini-SQL2 Score Breakdown
BIRD consists of 12,751 question-SQL pairs across 95 databases and 37 domains (33.4GB total). The key metric is execution accuracy: whether the generated query actually returns the correct result when run. Gemini-SQL2's 80.04% narrows the gap to human performance (92.96%) to 12.92 points and improves on Google's prior entry (~77.2%) by roughly 2.8 points.
| System | BIRD EX (Single Model) | Date |
|---|---|---|
| Gemini-SQL2 (Google) | 80.04% | 2026-06 |
| Gemini-SQL (Google) | ~77.2% | 2026-03 |
| Q-SQL (AWS) | ~76.5% | 2025-12 |
| Databricks RLVR 32B | ~75.7% | 2025-07 |
Source: MarkTechPost (June 2026)
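To make the execution accuracy metric concrete, here is a minimal sketch of how such a score could be computed. It is not the official BIRD evaluator: the function names, the in-memory SQLite database, and the order-insensitive comparison of result rows are all assumptions made for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Return the query's rows as a set, or None if the query fails to run."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result rows match.

    Assumption: rows are compared as sets, so row order is ignored.
    """
    pairs = list(pairs)
    correct = 0
    for predicted_sql, gold_sql in pairs:
        expected = run_query(conn, gold_sql)
        actual = run_query(conn, predicted_sql)
        if expected is not None and actual == expected:
            correct += 1
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database, used only to exercise the metric.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 100.0), (2, "EU", 50.0), (3, "US", 70.0)],
    )
    pairs = [
        # Different SQL text, same result: counts as correct.
        ("SELECT SUM(amount) FROM orders WHERE region = 'EU'",
         "SELECT SUM(amount) FROM orders WHERE region IN ('EU')"),
        # Runs, but returns the wrong number: counts as incorrect.
        ("SELECT COUNT(*) FROM orders",
         "SELECT COUNT(*) FROM orders WHERE region = 'US'"),
        # Syntax error: counts as incorrect.
        ("SELEC id FROM orders", "SELECT id FROM orders"),
        # Same rows in a different order: counts as correct here.
        ("SELECT id FROM orders ORDER BY id DESC", "SELECT id FROM orders"),
    ]
    print(execution_accuracy(conn, pairs))
```

The point of the sketch is that a query is scored on what it returns, not on how closely its text matches the reference query, which is why a syntactically valid query can still count as a miss.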
Practical Use Cases and BigQuery Integration Potential
The ability to generate executable SQL from natural language is expected to integrate with BigQuery Studio, AlloyDB AI, and Cloud SQL Studio. Data engineers can use it to generate draft queries, which reduces the effort spent writing and reviewing them. In my testing of similar natural language query features within Google Cloud environments, even complex JOINs and window functions were handled with high accuracy, making this kind of tool practical for daily workflows.
Important caveat: with an error rate still around 20% (an execution accuracy of 80.04% on BIRD leaves roughly one query in five wrong), human review remains essential. Once native BigQuery integration lands, self-service analytics should become more accessible.
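One way to support that review step is a cheap automated pre-check that filters out obviously broken drafts before a person looks at them. The sketch below is an assumption-laden illustration, not part of Gemini-SQL2 or BigQuery: `precheck` is a hypothetical helper, and SQLite stands in for the target warehouse (on BigQuery, a dry-run query could play a similar role).

```python
import sqlite3


def precheck(conn, sql):
    """Return (ok, reason) for a generated query before it goes to a reviewer."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative check: any remaining semicolon is treated as a second
    # statement, even if it sits inside a string literal.
    if ";" in statement:
        return False, "multiple statements"
    if not statement.lower().startswith(("select", "with")):
        return False, "does not start with SELECT or WITH"
    try:
        # EXPLAIN QUERY PLAN compiles the statement against the schema
        # without running it, so unknown tables or columns surface here.
        conn.execute("EXPLAIN QUERY PLAN " + statement)
    except sqlite3.Error as exc:
        return False, f"does not compile: {exc}"
    return True, "ready for human review"


if __name__ == "__main__":
    # Hypothetical schema used only for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    for draft in [
        "SELECT SUM(amount) FROM orders;",
        "SELECT total FROM orders",
        "DROP TABLE orders",
    ]:
        print(draft, "->", precheck(conn, draft))
```

A check like this only catches drafts that cannot run at all. A query that compiles and returns the wrong answer passes it, which is exactly the case a human reviewer still has to catch.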
Comparison with Competing Models and Future Outlook
Google currently holds the top two positions on the BIRD Single Model leaderboard. Specialized 32B models remain competitive, but the general-purpose large model approach currently leads by about 4 points (80.04% vs. ~75.7% for Databricks RLVR 32B). API details and technical reports have not yet been released, but the anticipated BigQuery integration is expected to accelerate industry-wide adoption of text-to-SQL technologies.
Conclusion
Gemini-SQL2's state-of-the-art 80.04% on BIRD reaffirms Google's leadership in text-to-SQL. The upcoming BigQuery integration should make data analysis accessible to more people. I recommend combining it with tools like Claude Code or Cursor to build efficient, hybrid workflows that leverage each model's strengths.
Author
krona23
Over 20 years in the IT industry, serving as Division Head and CTO at multiple companies running large-scale web services in Japan. Experienced across Windows, iOS, Android, and web development. Currently focused on AI-native transformation. At DevGENT, sharing practical guides on AI code editors, automation tools, and LLMs in three languages.









