AI Model Benchmark · real‑time leaderboard (latest LLMs)

📊 Industry benchmark (Q2 2025)

Standardized evaluation: MMLU (pro), HumanEval, GPQA, latency, and pricing.

Model (latest SOTA)	MMLU (5-shot)	HumanEval (pass@1)	Latency (ms/token)	Input price ($/1M tokens)

* Latency measured on NVIDIA H100 SXM (p99; 1024 input tokens / 256 output tokens)
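
As a rough illustration of how a p99 ms/token figure like the one in the footnote could be derived, here is a minimal sketch. It assumes each request yields one wall-clock generation time in milliseconds for a fixed 256-token output, and it uses the nearest-rank percentile definition. The function names and the sample layout are hypothetical, not the leaderboard's actual harness.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_ms_per_token(request_durations_ms, output_tokens=256):
    """p99 of per-token latency, given one generation time per request.

    Assumption: every request produced exactly `output_tokens` tokens.
    """
    per_token = [d / output_tokens for d in request_durations_ms]
    return percentile_nearest_rank(per_token, 99)


def single_stream_tokens_per_second(ms_per_token):
    """Reciprocal conversion; valid for one unbatched stream only.
    Batched serving throughput is generally higher than this figure."""
    return 1000.0 / ms_per_token
```

Because p99 is a tail statistic, it can sit well above the mean latency, so a throughput number computed from it this way is a conservative single-stream estimate.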

🧠 Composite intelligence ranking

Normalized scores: reasoning, coding, multilingual, instruction following (0-100)
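
The page does not say how the four axes are normalized or combined. One common approach, shown here purely as an assumption, is min-max scaling of each axis to 0-100 across models followed by a weighted mean. The helper names, the equal default weights, and the placeholder model names in the usage note are all hypothetical.

```python
def min_max_normalize(raw_scores):
    """Rescale {model: raw_score} to the 0-100 range across models."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        # All models tie on this axis; give every model the same score.
        return {model: 100.0 for model in raw_scores}
    return {m: 100.0 * (v - lo) / (hi - lo) for m, v in raw_scores.items()}


def composite_scores(raw_by_axis, weights=None):
    """Weighted mean of per-axis normalized scores (equal weights by default).

    raw_by_axis maps an axis name (e.g. "reasoning") to {model: raw_score};
    every axis is assumed to cover the same set of models.
    """
    weights = weights or {axis: 1.0 for axis in raw_by_axis}
    total = sum(weights[axis] for axis in raw_by_axis)
    normalized = {a: min_max_normalize(s) for a, s in raw_by_axis.items()}
    models = next(iter(raw_by_axis.values())).keys()
    return {
        m: sum(weights[a] * normalized[a][m] for a in raw_by_axis) / total
        for m in models
    }


def ranking(scores):
    """Model names ordered from highest to lowest composite score."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with made-up inputs `{"reasoning": {"model_a": 50, "model_b": 75, "model_c": 100}, "coding": {"model_a": 30, "model_b": 10, "model_c": 20}}`, the equal-weight composite ranks `model_c` first, then `model_a`, then `model_b`. Min-max scaling makes scores relative to the current pool, so they shift whenever a model is added or removed.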

🏆 Top performers (overall)

⚡ Throughput (tok/sec) – efficiency champion

📌 Benchmark pool refreshed daily; data drawn from the open leaderboard (at most 2 quarters old)
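
A freshness rule of this kind could be applied as below. This is a sketch that reads "2 quarters old" as calendar quarters counted back from the current one, which is an assumption; the `measured_on` field and the function names are hypothetical.

```python
from datetime import date

MAX_AGE_QUARTERS = 2  # assumed reading of "2 quarters old"


def quarter_index(d):
    """Map a date to a running quarter count, so consecutive quarters differ by 1."""
    return d.year * 4 + (d.month - 1) // 3


def is_fresh(measured_on, today, max_age=MAX_AGE_QUARTERS):
    """True if the result was measured within max_age calendar quarters of today."""
    age = quarter_index(today) - quarter_index(measured_on)
    return 0 <= age <= max_age


def filter_fresh(entries, today):
    """Keep only entries whose hypothetical 'measured_on' date is recent enough."""
    return [e for e in entries if is_fresh(e["measured_on"], today)]
```

Under this reading, a pool evaluated in Q2 2025 keeps results from Q4 2024 onward and drops anything from Q3 2024 or earlier.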

📐 Benchmarks: MMLU, HumanEval, MATH, MT-Bench, GPQA (diamond set)

⚙️ Hardware: NVIDIA H100 SXM / vLLM 0.6.1

⨀ AI BENCHMARK HUB