60 Questions of Svelte (LLM Judged)
Models with the best performance on Svelte Bench
How model cost relates to performance on this benchmark
Complete performance breakdown for all models tested on Svelte Bench
Claude 4.5 Sonnet | Anthropic | 53.5 | 13.1 | 62.0 | 70 | 7.1 | 200,000 |
GLM-4.6 | Zhipu AI | 49.7 | 8.9 | 52.0 | 70 | 1.9 | 200,000 |
Qwen 3 Coder 480B A35B | Alibaba | 52.0 | 12.6 | 30.0 | 150 | 0.64 | 262,000 |
Qwen 3 Max | Alibaba | 54.5 | 6.7 | 39.5 | 54 | 0.38 | 256,000 |
GPT 5 Mini | OpenAI | 75.2 | 2.3 | 43.2 | 41 | 3.46 | 400,000 |
GPT 5 Mini High | OpenAI | 65.8 | 3.4 | 42.7 | 47 | 4.35 | 400,000 |
Grok Code Fast 1 | xAI | 45.8 | 15.8 | 37.7 | 84 | 0.93 | 256,000 |
Metric: accuracy percentage