Quality Score
Weighted composite of five uderia workloads: RAG/Document Q&A (30%), Tool Use & Function Calling (25%), Long-Context Reasoning (20%), Structured Output / JSON (15%), Multi-Turn Conversation (10%). Workload scores are derived from published benchmarks (GPQA, BFCL, SWE-bench, IFEval) and, where available, Chatbot Arena Elo ratings.
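The weighting above can be sketched as a plain weighted sum. This is a minimal illustration, not the actual scoring code: the workload key names and the assumption that each workload score is already on a 0–100 scale are hypothetical, since the text does not say how benchmark results are mapped to per-workload scores.

```python
# Workload weights as listed above; key names are hypothetical labels.
WEIGHTS = {
    "rag_document_qa": 0.30,
    "tool_use": 0.25,
    "long_context": 0.20,
    "structured_output": 0.15,
    "multi_turn": 0.10,
}


def quality_score(workload_scores):
    """Return the weighted composite of per-workload scores.

    Assumes every workload score is already normalized to 0-100.
    """
    missing = set(WEIGHTS) - set(workload_scores)
    if missing:
        raise ValueError(f"missing workload scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * workload_scores[name] for name in WEIGHTS)


example = {
    "rag_document_qa": 90,
    "tool_use": 80,
    "long_context": 70,
    "structured_output": 60,
    "multi_turn": 50,
}
print(round(quality_score(example), 2))  # 27 + 20 + 14 + 9 + 5 = 75.0
```

Because the weights sum to 1.0, the composite stays on the same 0–100 scale as its inputs.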
Speed Score
Combines throughput (tokens/second, higher = better) and TTFT (time to first token, lower = better). Speed score = 60% throughput component + 40% TTFT component, each normalized 0–100 across all 40 models, with TTFT inverted so that lower latency scores higher. Matters most for uderia's interactive IDEATE profile and real-time streaming.
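One way to compute such a score is sketched below. The 60/40 split comes from the text; the linear min-max normalization is an assumption, because the text only says each component is normalized 0–100 and does not name the method. The function names and sample figures are hypothetical.

```python
def min_max(value, lo, hi, higher_is_better=True):
    """Scale value to 0-100 within [lo, hi] (assumed linear min-max)."""
    if hi == lo:
        return 100.0  # all models tie on this metric
    fraction = (value - lo) / (hi - lo)
    if not higher_is_better:
        fraction = 1.0 - fraction  # invert so lower raw values score higher
    return 100.0 * fraction


def speed_scores(models):
    """models maps name -> (tokens_per_second, ttft_seconds)."""
    throughputs = [tps for tps, _ in models.values()]
    ttfts = [ttft for _, ttft in models.values()]
    scores = {}
    for name, (tps, ttft) in models.items():
        throughput_part = min_max(tps, min(throughputs), max(throughputs))
        ttft_part = min_max(ttft, min(ttfts), max(ttfts), higher_is_better=False)
        scores[name] = 0.60 * throughput_part + 0.40 * ttft_part
    return scores


# Hypothetical measurements, for illustration only.
sample = {
    "fast": (100.0, 0.50),
    "middle": (75.0, 0.75),
    "slow": (50.0, 1.00),
}
print(speed_scores(sample))
```

With min-max scaling, a single outlier compresses everyone else's scores, which is worth keeping in mind when one model is far faster than the rest.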
Price Efficiency
Blended price = input × 0.70 + output × 0.30 (uderia's typical token traffic ratio, reflecting 16K max output). Free tiers are excluded; only commercial, production-grade pricing is used. Open-weight models use the cheapest verified commercial hosting provider (e.g. Together AI, DeepInfra, Friendli). If no commercial host is available, the model is marked No Hoster Available and excluded from price ranking. The score is log-normalized between the cheapest and most expensive model in the current dataset: the cheapest scores 100, the most expensive scores 0, and all others are placed proportionally on a log scale.
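The blended price and log normalization described above could look roughly like the following. This is a sketch under stated assumptions: input and output prices are assumed to be quoted in the same unit, a model with no commercial host is represented as None, and the function names and sample prices are hypothetical.

```python
import math

INPUT_WEIGHT = 0.70   # share of input tokens in the blend
OUTPUT_WEIGHT = 0.30  # share of output tokens in the blend


def blended_price(input_price, output_price):
    """Blend input and output prices with the 70/30 traffic ratio."""
    return INPUT_WEIGHT * input_price + OUTPUT_WEIGHT * output_price


def price_efficiency(blended):
    """Map name -> blended price (or None) to name -> 0-100 score.

    Models priced as None (no commercial host) are left out of the ranking.
    Prices must be positive, which holds once free tiers are excluded.
    """
    priced = {name: p for name, p in blended.items() if p is not None}
    if not priced:
        return {}
    log_min = math.log(min(priced.values()))
    log_max = math.log(max(priced.values()))
    span = log_max - log_min
    scores = {}
    for name, price in priced.items():
        if span == 0:
            scores[name] = 100.0  # every priced model costs the same
        else:
            scores[name] = 100.0 * (log_max - math.log(price)) / span
    return scores


# Hypothetical blended prices, for illustration only.
prices = {"cheap": 0.10, "mid": 1.00, "pricey": 10.00, "unhosted": None}
print(price_efficiency(prices))
```

On a log scale, a model priced at the geometric mean of the cheapest and most expensive lands at the midpoint, so tenfold price gaps count equally at either end of the range.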
Tier Classification
L: Open-weight / self-hostable models (Gemma, Llama, Qwen, DeepSeek, Mistral open). M: Lite/Mini/Nano/Flash-Lite closed-source variants. H: Standard/Flash/Sonnet-class models. E: Pro/Opus/flagship — best available from each provider. Classification is capability-based, not purely price-based.
Update Cadence
Refreshed monthly, or immediately following major model releases or pricing changes. At each update, provider pricing pages, Artificial Analysis benchmarks, and Chatbot Arena rankings are re-verified against primary sources to ensure accuracy.