uderia · LLM Intelligence Guide · 2026

Find the right model
for every uderia workload

40 models across 4 tiers — scored on uderia's real tasks. Adjust weights, flip sort direction, compare prices across providers.

RAG Doc Q&A 30% · TOOL Function Calling 25% · CTX Long Context 20% · JSON Structured Output 15% · CHAT Multi-Turn 10%

Open / Low

Open-weight · self-hostable

Medium / Entry Frontier

Lite · Mini · Flash-Lite variants

High / Mid Frontier

Flash · Sonnet · Standard class

Extreme / Flagship

Pro · Opus · GPT-5.5 tier

#	Model	Tier	Composite ▼	Quality	Speed	Price Eff.	Best Price	t/s	TTFT	Context	Arena ELO

Price vs Performance

Bubble size = speed score · X axis = blended $/1M (log scale) · Y axis = uderia quality score

Open (L) · Entry Frontier (M)

Mid Frontier (H) · Flagship (E)

Methodology

How models are scored for uderia workloads

Quality Score

Weighted composite of five uderia workloads: RAG/Document Q&A (30%), Tool Use & Function Calling (25%), Long-Context Reasoning (20%), Structured Output / JSON (15%), Multi-Turn Conversation (10%). Per-workload scores are derived from published benchmarks (GPQA, BFCL, SWE-bench, IFEval) and, where available, Chatbot Arena ELO.
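
As a rough illustration of the weighting above, the sketch below computes a quality score from five per-workload scores. The key names (`rag`, `tool`, `ctx`, `json`, `chat`), the 0–100 input scale, and the renormalization by the weight total (so that adjusted weights still produce a 0–100 result) are assumptions made for the example, not the guide's published implementation.

```python
# Default workload weights as stated in the methodology (they sum to 1.0).
# The short key names are hypothetical labels chosen for this sketch.
DEFAULT_WEIGHTS = {
    "rag": 0.30,   # RAG / Document Q&A
    "tool": 0.25,  # Tool Use & Function Calling
    "ctx": 0.20,   # Long-Context Reasoning
    "json": 0.15,  # Structured Output / JSON
    "chat": 0.10,  # Multi-Turn Conversation
}


def quality_score(workload_scores, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-workload scores (assumed to be on a 0-100 scale).

    Dividing by the weight total is an assumption: it keeps the result on the
    same scale if a reader adjusts the weights so they no longer sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * workload_scores[k] for k in weights) / total


# Illustrative (made-up) per-workload scores for one model.
example = {"rag": 80, "tool": 70, "ctx": 60, "json": 90, "chat": 50}
print(round(quality_score(example), 1))
```

With the default weights, a model is rewarded most for document Q&A and tool use; raising the `chat` weight would shift the ranking toward conversational models.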

Speed Score

Combines throughput (tokens/second, higher = better) and TTFT (time to first token, lower = better). Speed score = 60% throughput + 40% TTFT component, each normalized 0–100 across all 40 models. Matters most for uderia's interactive IDEATE profile and real-time streaming.
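
The 60/40 combination can be sketched as follows. The methodology says only that each component is "normalized 0–100 across all 40 models"; min-max scaling is an assumption here, as are the function names and the handling of the case where every model has the same value.

```python
def min_max_0_100(values, higher_is_better=True):
    """Scale values to 0-100 using min-max normalization (an assumed method).

    When lower is better (as with TTFT), the scale is inverted so the
    smallest raw value receives 100.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical values all receive the top score.
        return [100.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) * 100.0 for v in values]
    return scaled if higher_is_better else [100.0 - s for s in scaled]


def speed_scores(tokens_per_second, ttft_seconds):
    """Speed score = 60% throughput component + 40% TTFT component."""
    throughput = min_max_0_100(tokens_per_second, higher_is_better=True)
    latency = min_max_0_100(ttft_seconds, higher_is_better=False)
    return [0.6 * t + 0.4 * l for t, l in zip(throughput, latency)]


# Three hypothetical models: the fastest generator also has the slowest start.
print(speed_scores([50, 100, 150], [0.2, 0.6, 1.0]))
```

Because the normalization is relative to the current model set, adding or removing a model can shift every other model's speed score.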

Price Efficiency

Blended price = input × 0.70 + output × 0.30 (uderia's typical token traffic ratio, reflecting 16K max output). The best available price for the selected provider type is used. Self-hosted models score 100. The score uses a log scale, so the gap between $0.01 and $0.10 matters as much as the gap between $1 and $10.
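
A minimal sketch of the blended price and a log-scale efficiency score follows. The 70/30 blend and the fixed score of 100 for self-hosted models come from the text above; the `floor` and `ceiling` bounds ($0.01 and $100 per 1M tokens) and the exact mapping to 0–100 are hypothetical, chosen only to show how a log scale treats each tenfold price change equally.

```python
import math


def blended_price(input_per_1m, output_per_1m):
    """Blended $/1M tokens: 70% input price + 30% output price."""
    return 0.70 * input_per_1m + 0.30 * output_per_1m


def price_efficiency(price_per_1m, self_hosted=False, floor=0.01, ceiling=100.0):
    """Map a blended price to 0-100 on a log scale (cheaper = higher score).

    `floor` and `ceiling` are assumed bounds for this sketch; prices outside
    them are clamped. Self-hosted models score 100, as the methodology states.
    """
    if self_hosted:
        return 100.0
    clamped = min(max(price_per_1m, floor), ceiling)
    span = math.log10(ceiling) - math.log10(floor)
    return 100.0 * (math.log10(ceiling) - math.log10(clamped)) / span


# Hypothetical model priced at $1.00 input and $5.00 output per 1M tokens.
price = blended_price(1.00, 5.00)
print(round(price, 2), round(price_efficiency(price), 1))
```

Under these assumed bounds each tenfold price increase costs the same number of points, so moving from $0.01 to $0.10 lowers the score by as much as moving from $1 to $10.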

Tier Classification

L: Open-weight / self-hostable models (Gemma, Llama, Qwen, DeepSeek, Mistral open). M: Lite/Mini/Nano/Flash-Lite closed-source variants. H: Standard/Flash/Sonnet-class models. E: Pro/Opus/flagship — best available from each provider. Classification is capability-based, not purely price-based.

Update Cadence

Refreshed monthly, or immediately following major model releases or pricing changes. At each update, provider pricing pages, Artificial Analysis benchmarks, and Chatbot Arena rankings are re-verified against primary sources to ensure accuracy.

Changelog

Version history and data updates
