Revolutionizing AI Automation: Google AI’s Deep-Thinking Ratio Boosts LLM Accuracy and Cuts Inference Costs
By Amr Abdeldaym, Founder of Thiqa Flow
In recent years, the AI community has operated on a prevailing assumption: to tackle harder problems with Large Language Models (LLMs), simply extend the Chain-of-Thought (CoT) length. New research from the University of Virginia and Google challenges this long-held belief. Their findings show that "thinking long" (generating longer outputs) is not the same as "thinking hard."
The Pitfalls of Token Maxing: Why More Isn’t Always Better
Traditionally, token count has been used as a proxy for the effort or reasoning depth an AI exerts on a task. But the new study reveals a striking contradiction:
- Negative correlation between token length and accuracy: across the models studied, output length correlates negatively with accuracy (average r = -0.59), meaning longer generations tend to be less accurate.
- Overthinking phenomenon: AI models producing lengthy outputs risk getting trapped in loops, repeating unnecessary steps, or amplifying errors.
- Compute wastage: This naïve token maximization inflates inference costs by expending resources on uninformative or redundant tokens.
This finding urges developers to rethink approaches that treat output length as a proxy for quality, paving the way for more intelligent metrics.
Introducing Deep-Thinking Tokens: The Core of True AI Reasoning
The research pivots from counting tokens to evaluating how extensively an LLM processes information internally. It categorizes tokens into:
- Shallow Tokens: Easy-to-predict words whose probability distribution stabilizes early in the model’s transformer layers.
- Deep-Thinking Tokens: Challenging tokens—especially in logic or math—that see significant internal prediction shifts in deeper layers.
This conceptual breakthrough introduces the Deep-Thinking Ratio (DTR): the percentage of deep-thinking tokens within the entire sequence, representing true cognitive effort inside the LLM’s architecture.
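The DTR itself is just a ratio: the fraction of tokens in a generated sequence that are classified as deep-thinking. A minimal sketch, assuming we already have a per-token deep/shallow label (the labels below are illustrative placeholders; in the paper they come from probing the model's intermediate layers, as described in the next section):

```python
def deep_thinking_ratio(is_deep_token: list[bool]) -> float:
    """Fraction of tokens in a sequence classified as deep-thinking (the DTR)."""
    if not is_deep_token:
        return 0.0
    return sum(is_deep_token) / len(is_deep_token)

# Example: 3 of 8 tokens required late-layer prediction shifts.
flags = [False, True, False, False, True, False, True, False]
print(deep_thinking_ratio(flags))  # 0.375
```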
Measuring Depth: From Internal Layers to the Output
To identify deep-thinking tokens, the researchers apply a novel internal probing technique:
- Intermediate hidden states h_{t,l} (token position t, layer l) at each transformer layer are projected into vocabulary probabilities via the unembedding matrix.
- They compute the Jensen-Shannon Divergence (JSD) between the token’s probability distribution at the current layer and at the final layer, capturing how predictions evolve.
- A token is classified as deep-thinking if its prediction stabilizes late—specifically, after 85% of layers have been processed.
This approach yields a remarkably strong positive correlation with accuracy metrics, averaging r = 0.683 across various advanced models including DeepSeek-R1-70B, Qwen3-30B-Thinking, and GPT-OSS-120B.
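The probing procedure above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the convergence threshold `tau` is an invented value, and `layer_probs` stands in for the per-layer vocabulary distributions obtained by projecting hidden states through the unembedding matrix (a logit-lens-style readout).

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def is_deep_thinking(layer_probs, tau=0.1, depth_frac=0.85):
    """Classify one token as deep-thinking.

    layer_probs: per-layer vocabulary distributions for this token.
    The token is deep-thinking if, at the layer 85% of the way through
    the stack, its distribution still diverges from the final layer's
    (JSD above tau), i.e. its prediction stabilizes late.
    """
    final = layer_probs[-1]
    probe_layer = int(depth_frac * (len(layer_probs) - 1))
    return jsd(layer_probs[probe_layer], final) > tau
```

A token whose distribution is already final-layer-like at the 85% mark counts as shallow; one that still shifts in the last layers counts as deep-thinking.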
Think@n: The Next-Gen AI Inference Paradigm
Building on DTR, the researchers propose Think@n, an innovative inference scaling method enhancing both accuracy and efficiency:
- Generate multiple candidate responses simultaneously.
- After processing only 50 prefix tokens, calculate the DTR for each candidate.
- Use early halting to discard candidates with low DTR scores, thus avoiding unnecessary computation.
- Fully generate only high-potential “deep thinkers.”
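The selection step above reduces to ranking candidates by their prefix DTR and keeping only the top scorers. A minimal sketch (the 50-token prefix and DTR scores here are illustrative inputs; producing them requires the probing machinery from the previous section):

```python
def think_at_n(candidates_prefix_dtr, keep=1):
    """Given (candidate_id, prefix_DTR) pairs scored on an early prefix,
    keep the highest-DTR candidates and discard the rest, so only
    likely deep thinkers are generated to completion."""
    ranked = sorted(candidates_prefix_dtr, key=lambda x: x[1], reverse=True)
    return [cid for cid, _ in ranked[:keep]]

# Example: four candidates scored on their first 50 tokens.
scores = [("a", 0.12), ("b", 0.31), ("c", 0.08), ("d", 0.27)]
print(think_at_n(scores, keep=2))  # ['b', 'd']
```

The saving comes from early halting: low-DTR candidates cost only 50 prefix tokens each instead of a full generation.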
| Method | Accuracy (%) | Avg. Cost (k tokens) |
|---|---|---|
| Cons@n (Majority Vote) | 92.7 | 307.6 |
| Think@n (DTR-based Selection) | 94.7 | 155.4 |
On the AIME 2025 math benchmark, Think@n boosted accuracy by 2 percentage points (92.7% to 94.7%) while roughly halving average inference cost, a significant advance in AI automation efficiency.
Implications for AI Automation and Business Efficiency
This advancement holds transformative potential for enterprises investing in AI-driven automation:
- Reduced operational costs: By trimming unnecessary token generation, Think@n slashes compute expenses, making AI solutions more cost-effective.
- Enhanced model reliability: Prioritizing deep-thinking tokens over sheer length means AI outputs are more precise and dependable.
- Scalable AI deployment: Businesses can confidently scale AI automation workflows without linear increases in inference cost or latency.
This insight underscores the importance of incorporating internal model metrics like DTR into future AI system design, ultimately driving better business performance with smarter automation.
Conclusion: Embrace Deep Thinking for Smarter AI Automation
The research from Google and the University of Virginia illustrates a paradigm shift in how we measure and optimize AI reasoning. Moving beyond token count to the Deep-Thinking Ratio (DTR) aligns AI evaluation with how models truly process complex tasks internally. The Think@n method exemplifies how leveraging DTR can deliver not only superior accuracy but also dramatically improved business efficiency by cutting inference costs in half.
As AI continues to reshape automation across industries, adopting metrics like DTR will be essential for building powerful, cost-efficient language models that meet the growing demands of intelligent applications.
Looking for custom AI automation for your business? Connect with me at https://amr-abdeldaym.netlify.app/.