Rank by: Quality 40% · Speed 20% · Price Eff. 40%
Workload: RAG 30% · TOOL 25% · CTX 20% · JSON 15% · CHAT 10%
uderia · LLM Intelligence Guide · 2026

Find the right model
for every uderia workload

40 models across 4 tiers — scored on uderia's real tasks. Adjust weights, flip sort direction, compare prices across providers.
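As a rough illustration of the adjustable weights and sort direction, the sketch below ranks models by a composite of the three sub-scores, using the default "Rank by" weights shown on the page (Quality 40%, Speed 20%, Price Eff. 40%). Treating the composite as a plain weighted average is an assumption, and the model names and scores are made up for the example.

```python
# Default "Rank by" weights from the page: quality, speed, price efficiency.
DEFAULT_WEIGHTS = (0.40, 0.20, 0.40)

def composite(quality, speed, price_eff, weights=DEFAULT_WEIGHTS):
    """Weighted average of three 0-100 sub-scores (assumed formula)."""
    wq, ws, wp = weights
    total = wq + ws + wp  # renormalize so user-adjusted weights need not sum to 1
    return (wq * quality + ws * speed + wp * price_eff) / total

def rank(models, weights=DEFAULT_WEIGHTS, descending=True):
    """Sort models by composite score; descending=False flips the direction."""
    return sorted(
        models,
        key=lambda m: composite(m["quality"], m["speed"], m["price_eff"], weights),
        reverse=descending,
    )

# Hypothetical models and scores, for illustration only.
models = [
    {"name": "model-a", "quality": 90, "speed": 50, "price_eff": 30},
    {"name": "model-b", "quality": 70, "speed": 80, "price_eff": 90},
    {"name": "model-c", "quality": 60, "speed": 60, "price_eff": 100},
]

for m in rank(models):
    score = composite(m["quality"], m["speed"], m["price_eff"])
    print(m["name"], round(score, 1))
```

Passing a different `weights` tuple, for example `(1.0, 0.0, 0.0)`, ranks purely by quality, which mirrors dragging the other two sliders to zero.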

RAG Doc Q&A 30% TOOL Function Calling 25% CTX Long Context 20% JSON Structured Output 15% CHAT Multi-Turn 10%
L: Open / Low (Open-weight · self-hostable)
M: Medium / Entry Frontier (Lite · Mini · Flash-Lite variants)
H: High / Mid Frontier (Flash · Sonnet · Standard class)
E: Extreme / Flagship (Pro · Opus · GPT-5.5 tier)
# Model Tier Composite ▼ Quality Speed Price Eff. Best Price t/s TTFT Context Arena ELO
Price vs Performance
Bubble size = speed score · X axis = blended $/1M (log scale) · Y axis = uderia quality score · Click a bubble to highlight that model
Open (L) · Entry Frontier (M) · Mid Frontier (H) · Flagship (E)
Methodology
How models are scored for uderia workloads
Quality Score
Weighted composite of five uderia workloads: RAG/Document Q&A (30%), Tool Use & Function Calling (25%), Long-Context Reasoning (20%), Structured Output / JSON (15%), Multi-Turn Conversation (10%). Scores are derived from published benchmarks (GPQA, BFCL, SWE-bench, IFEval) and, where available, Chatbot Arena ELO.
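The weighting above can be sketched as a simple weighted average. This is a minimal illustration, assuming each workload already has a 0-100 score; how the published benchmarks map onto those per-workload scores is not specified here, and the example numbers are hypothetical.

```python
# Workload weights as stated in the methodology (they sum to 1.0).
WORKLOAD_WEIGHTS = {
    "rag": 0.30,   # RAG / Document Q&A
    "tool": 0.25,  # Tool Use & Function Calling
    "ctx": 0.20,   # Long-Context Reasoning
    "json": 0.15,  # Structured Output / JSON
    "chat": 0.10,  # Multi-Turn Conversation
}

def quality_score(workload_scores, weights=WORKLOAD_WEIGHTS):
    """Weighted average of per-workload scores (each assumed to be 0-100)."""
    missing = set(weights) - set(workload_scores)
    if missing:
        raise ValueError(f"missing workload scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[k] * workload_scores[k] for k in weights) / total_weight

# Hypothetical per-workload scores for one model.
example = {"rag": 80, "tool": 70, "ctx": 60, "json": 90, "chat": 50}
print(round(quality_score(example), 2))
```

Dividing by the weight total keeps the result on the 0-100 scale even if the workload sliders are adjusted so that the weights no longer sum to exactly 1.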
Speed Score
Combines throughput (tokens/second, higher = better) and TTFT (time to first token, lower = better). Speed score = 60% throughput + 40% TTFT component, each normalized 0–100 across all 40 models. Matters most for uderia's interactive IDEATE profile and real-time streaming.
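A minimal sketch of the 60/40 combination follows. The text only says each component is "normalized 0–100"; min-max scaling across the model set is an assumption made here, as is the choice to give every model 100 when all values are identical. The sample throughput and TTFT figures are hypothetical.

```python
def min_max(values, higher_is_better=True):
    """Scale values to 0-100 across the set (assumed min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All models tie; treat them as equally good (an arbitrary choice).
        return [100.0 for _ in values]
    scaled = [100.0 * (v - lo) / (hi - lo) for v in values]
    # For "lower is better" metrics such as TTFT, invert the scale.
    return scaled if higher_is_better else [100.0 - s for s in scaled]

def speed_scores(tokens_per_sec, ttft_seconds):
    """Speed score = 60% throughput component + 40% TTFT component."""
    throughput = min_max(tokens_per_sec, higher_is_better=True)
    ttft = min_max(ttft_seconds, higher_is_better=False)
    return [0.6 * t + 0.4 * f for t, f in zip(throughput, ttft)]

# Hypothetical measurements for three models.
tps = [50.0, 100.0, 150.0]
ttft = [1.0, 0.5, 0.25]
print([round(s, 1) for s in speed_scores(tps, ttft)])
```

Because both components are scaled relative to the current model set, adding or removing a model can shift every other model's speed score.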
Price Efficiency
Blended price = input × 0.70 + output × 0.30, in $ per 1M tokens (uderia's typical token traffic ratio, reflecting 16K max output). The best available price for the selected provider type is used. Self-hosted models score 100. The score uses a log scale, so the gap between $0.01 and $0.10 counts as much as the gap between $1 and $10.
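The blended price and its log-scale score can be illustrated as follows. The 70/30 blend and the self-hosted score of 100 come from the text; the exact scoring formula does not, so the linear-in-log10 mapping and the `floor` and `ceiling` price bounds below are assumptions chosen only to show the "each tenfold step counts equally" property.

```python
import math

def blended_price(input_per_m, output_per_m):
    """Blend input and output prices ($ per 1M tokens) at a 70/30 ratio."""
    return 0.70 * input_per_m + 0.30 * output_per_m

def price_efficiency(price, self_hosted=False, floor=0.01, ceiling=100.0):
    """Map a blended price to 0-100 on a log scale (cheaper = higher).

    floor and ceiling are hypothetical bounds, not values from the guide.
    """
    if self_hosted:
        return 100.0
    clamped = min(max(price, floor), ceiling)
    span = math.log10(ceiling) - math.log10(floor)
    return 100.0 * (math.log10(ceiling) - math.log10(clamped)) / span

# Hypothetical list prices: $3 input, $15 output per 1M tokens.
price = blended_price(3.0, 15.0)
print(round(price, 2), round(price_efficiency(price), 1))

# Each tenfold price step costs the same number of points.
print(round(price_efficiency(0.01) - price_efficiency(0.10), 1))
print(round(price_efficiency(1.0) - price_efficiency(10.0), 1))
```

With these assumed bounds the scale spans four decades, so every tenfold increase in blended price lowers the score by the same amount regardless of where it occurs.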
Tier Classification
L: Open-weight / self-hostable models (Gemma, Llama, Qwen, DeepSeek, Mistral open). M: Lite/Mini/Nano/Flash-Lite closed-source variants. H: Standard/Flash/Sonnet-class models. E: Pro/Opus/flagship — best available from each provider. Classification is capability-based, not purely price-based.
Update Cadence
Refreshed monthly, or immediately following major model releases or pricing changes. At each update, provider pricing pages, Artificial Analysis benchmarks, and Chatbot Arena rankings are re-verified against primary sources to ensure accuracy.
Changelog
Version history and data updates