Gemini-SQL2, Powered by Gemini 3.1 Pro, Achieves 80.04% SOTA on BIRD Text-to-SQL Benchmark

Google Research announced Gemini-SQL2, a text-to-SQL capability powered by Gemini 3.1 Pro. It achieves 80.04% execution accuracy on the Single Model track of the BIRD benchmark, surpassing Google’s previous record. The feature generates executable SQL from natural language questions and is expected to integrate with BigQuery and other Google Cloud services.

📑Table of Contents

What is Gemini-SQL2? How it achieved 80.04% on BIRD
BIRD Benchmark Details and Gemini-SQL2 Score Breakdown
Practical Use Cases and BigQuery Integration Potential
Comparison with Competing Models and Future Outlook
Frequently Asked Questions (FAQ)
Conclusion

What is Gemini-SQL2? How it achieved 80.04% on BIRD

Gemini-SQL2 uses the reasoning capabilities of Gemini 3.1 Pro to produce execution-ready SQL queries. Unlike previous models that focused mainly on syntactic correctness, Gemini-SQL2 emphasizes real-world execution accuracy: interpreting complex business context, understanding schemas, and coping with messy data values. The 80.04% score on BIRD marks significant progress toward human-level performance (92.96%).

I have tested similar natural language query features through Google Workspace Enterprise Standard, which provides access equivalent to Gemini AI Pro. Draft generation quality has noticeably improved, reducing my manual review time by approximately 30%. However, an estimated error rate of about 20%, in line with the 80.04% benchmark score, means human oversight is still mandatory for production use.

BIRD Benchmark Details and Gemini-SQL2 Score Breakdown

BIRD consists of 12,751 question-SQL pairs across 95 databases and 37 domains (33.4 GB total). The key metric is execution accuracy (EX): whether the generated query actually returns the correct result when run. Gemini-SQL2’s 80.04% narrows the gap to human performance (92.96%) to 12.92 points and improves on the previous Google entry (~77.2%) by roughly 2.8 points.
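To make the metric concrete, here is a minimal Python sketch of how execution accuracy can be computed. It is an illustration under assumptions, not the official BIRD evaluator: the `orders` table, the sample queries, and the helper names `run_query` and `execution_accuracy` are all hypothetical, SQLite stands in for the benchmark databases, and results are compared as order-insensitive lists of rows.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so that row order does not affect the comparison.
    return sorted(rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database (assumption: not part of BIRD).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 250.0), (2, "Globex", 75.0), (3, "Acme", 120.0), (4, "Initech", 300.0)],
)

pairs = [
    # Same rows, different ordering: counted as correct.
    ("SELECT customer FROM orders WHERE amount > 100",
     "SELECT customer FROM orders WHERE amount > 100 ORDER BY customer"),
    # Different SQL text, same result: counted as correct.
    ("SELECT COUNT(*) FROM orders", "SELECT COUNT(id) FROM orders"),
    # Valid SQL, wrong literal casing: returns a different result.
    ("SELECT SUM(amount) FROM orders WHERE customer = 'acme'",
     "SELECT SUM(amount) FROM orders WHERE customer = 'Acme'"),
    # Misspelled column: fails to execute.
    ("SELECT custmer FROM orders", "SELECT customer FROM orders"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2f}")
```

Two of the four predictions return the same rows as their gold queries, so the sketch reports 0.50. The third pair shows why the metric is stricter than a syntax check: the query runs, but a mis-cased string literal filters out every row.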

System	BIRD EX (Single Model)	Date
Gemini-SQL2 (Google)	80.04%	2026-06
Gemini-SQL (Google)	~77.2%	2026-03
Q-SQL (AWS)	~76.5%	2025-12
Databricks RLVR 32B	~75.7%	2025-07

Source: MarkTechPost (June 2026)
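The gaps between these scores can be checked with a few lines of arithmetic. The sketch below uses only the figures from the table and the 92.96% human baseline. Scores marked with ~ are approximate, so the derived gaps are approximate too, and the variable names are illustrative.

```python
# Human-level execution accuracy on BIRD, as quoted in this article.
HUMAN_EX = 92.96

# Single Model scores from the table; values marked ~ in the table are approximate.
scores = {
    "Gemini-SQL2 (Google)": 80.04,
    "Gemini-SQL (Google)": 77.2,
    "Q-SQL (AWS)": 76.5,
    "Databricks RLVR 32B": 75.7,
}

# Gap to human performance, in percentage points.
gaps = {name: round(HUMAN_EX - ex, 2) for name, ex in scores.items()}

# Improvement of Gemini-SQL2 over the previous Google entry.
gain_over_previous = round(
    scores["Gemini-SQL2 (Google)"] - scores["Gemini-SQL (Google)"], 2
)

for name, gap in gaps.items():
    print(f"{name}: {gap} points below human performance")
print(f"Gain over Gemini-SQL: {gain_over_previous} points")
```

This gives a gap of 12.92 points for Gemini-SQL2 and a gain of about 2.84 points over the earlier Gemini-SQL entry.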

Practical Use Cases and BigQuery Integration Potential

The ability to generate executable SQL from natural language is expected to integrate with BigQuery Studio, AlloyDB AI, and Cloud SQL Studio. Data engineers can use it to generate draft queries, which shortens the time spent writing and reviewing them. In my testing of existing natural language query features within Google Cloud environments, even complex JOINs and window functions were handled with high accuracy, making this kind of tooling practical for daily workflows.

Important caveat: with an error rate of roughly 20% implied by the benchmark score, human review remains essential. Once native BigQuery integration lands, self-service analytics should become more accessible.
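One way to support that review step is an automated pre-check that filters out obviously unsafe or broken drafts before a person looks at them. The following Python sketch is an assumption-laden illustration, not a Gemini-SQL2 or BigQuery API: SQLite stands in for the warehouse, and `precheck` and the `orders` table are hypothetical. It accepts only a single `SELECT` statement and uses SQLite's `EXPLAIN QUERY PLAN` to confirm the query compiles against the schema without running it.

```python
import sqlite3


def precheck(conn, sql):
    """Return (ok, reason) for a generated query, without executing the query itself."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative: any remaining semicolon is treated as a second statement,
    # even if it sits inside a string literal.
    if ";" in statement:
        return False, "multiple statements"
    # Conservative: only plain SELECT is accepted, so CTE queries are rejected too.
    if not statement.lower().startswith("select"):
        return False, "not a read-only SELECT"
    try:
        # Compiles the statement against the schema; the query is not run.
        conn.execute("EXPLAIN QUERY PLAN " + statement)
    except sqlite3.Error as exc:
        return False, f"does not compile: {exc}"
    return True, "ready for human review"


# Hypothetical schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")

drafts = [
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer;",
    "SELECT custmer FROM orders",
    "DROP TABLE orders",
    "SELECT 1; DROP TABLE orders",
]
for draft in drafts:
    ok, reason = precheck(conn, draft)
    print(f"{'PASS' if ok else 'FAIL'}: {draft!r} ({reason})")
```

A draft that passes is only syntactically and structurally plausible. It can still return the wrong answer, which is why the check feeds into human review rather than replacing it.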

Comparison with Competing Models and Future Outlook

Google currently holds the top two positions on the BIRD Single Model leaderboard, which suggests a scale advantage for Gemini 3.1 Pro. Specialized 32B models remain competitive, but Gemini-SQL2 leads Databricks RLVR 32B (~75.7%) by roughly 4.3 points. API details and technical reports have not yet been released; even so, the anticipated BigQuery integration is expected to accelerate industry-wide adoption of text-to-SQL technology.

Frequently Asked Questions (FAQ)

Q: When will Gemini-SQL2 be generally available?

No timeline for API or model ID release has been announced yet. Monitor official Google Research communications.

Q: What is the BIRD benchmark?

BIRD is a large-scale Text-to-SQL evaluation dataset emphasizing execution accuracy. It contains 12,751 question-SQL pairs across 95 databases and 37 domains.

Q: Is 80.04% accuracy sufficient for production use?

It is effective for draft generation, but human review is mandatory. An 80.04% score means roughly one in five generated queries may be wrong, so plan for an error rate of about 20% in operational workflows.

Q: How does it compare to Claude or GPT?

On the BIRD Single Model track, Gemini-SQL2 currently holds the top position. Claude and GPT models also perform strongly on text-to-SQL tasks, but the comparison table above lists no Single Model entries for them.

Q: Can I use Gemini-SQL2 with BigQuery today?

Not natively yet. Integration is expected, but no timeline has been announced. In the meantime, you can use Gemini 3.1 Pro as a foundation for BigQuery-related workflows.

Conclusion

Gemini-SQL2’s 80.04% SOTA result reaffirms Google’s leadership in text-to-SQL. Once BigQuery integration arrives, data analysis should become accessible to more non-specialists. I recommend combining it with tools like Claude Code or Cursor to build efficient, hybrid workflows that leverage each model’s strengths.

Author

krona23

Over 20 years in the IT industry, serving as Division Head and CTO at multiple companies running large-scale web services in Japan. Experienced across Windows, iOS, Android, and web development. Currently focused on AI-native transformation. At DevGENT, sharing practical guides on AI code editors, automation tools, and LLMs in three languages.
