13 Coding & General Questions (Human Judged)
Models with the best performance on KingBench
How model cost relates to performance on this benchmark
Complete performance breakdown for all models tested on KingBench
Claude 4.5 Sonnet | Anthropic | 62.0 | 13.1 | 53.5 | 70 | 7.1 | 200,000 |
GLM-4.6 | Zhipu AI | 52.0 | 8.9 | 49.7 | 70 | 1.9 | 200,000 |
Qwen 3 Coder 480B A35B | Alibaba | 30.0 | 12.6 | 52.0 | 150 | 0.64 | 262,000 |
Qwen 3 Max | Alibaba | 39.5 | 6.7 | 54.5 | 54 | 0.38 | 256,000 |
Qwen3 Next 80B A3B | Alibaba | 18.6 | 1.9 | - | 180 | 0.02 | 262,000 |
Claude 4 Sonnet | Anthropic | 34.1 | 13.0 | - | 60 | 4.96 | 256,000 |
Claude 4 Sonnet (Max) | Anthropic | 50.0 | 11.2 | - | 60 | 11.69 | 256,000 |
Claude 4.1 Opus (Max) | Anthropic | 52.7 | 17.1 | - | 60 | 65.28 | 256,000 |
Deepseek V3.1 | DeepSeek | 45.5 | 14.1 | - | 50 | 0.03 | 128,000 |
Deepseek V3.1 Reasoning | DeepSeek | 43.2 | 10.3 | - | 50 | 1.07 | 128,000 |
Gemini 2.5 Pro | Google | 42.3 | 7.1 | - | 70 | 7.29 | 1,000,000 |
GPT 5 Mini | OpenAI | 43.2 | 2.3 | 75.2 | 41 | 3.46 | 400,000 |
GPT 5 Mini High | OpenAI | 42.7 | 3.4 | 65.8 | 47 | 4.35 | 400,000 |
Grok Code Fast 1 | xAI | 37.7 | 15.8 | 45.8 | 84 | 0.93 | 256,000 |
LongCat Flash | Meituan | 44.5 | 5.3 | - | 50 | 0.04 | 131,000 |
Kimi K2 (0905) | Moonshot | 39.5 | 7.3 | - | 30 | 3.02 | 131,000 |
GLM 4.5 | Zhipu AI | 42.7 | 9.2 | - | 40 | 3.06 | 131,000 |
GLM 4.5 Air | Zhipu AI | 31.8 | 9.8 | - | 70 | 1.15 | 131,000 |