ground truth
Nice spinny stuff
model outputs
Gemini 3 Flash Preview: A 0.92, T 0.19
Qwen3-VL-8B-Instruct: A 0.87, T 0.12
GPT-5.4: A 0.96, T 0.28
Claude Sonnet 4.6: A 0.96, T 0.35
LLaMA 4 Scout: A 0.48, T 0.13
<div class="container">
  <div class="manyouterpoop lol"></div>
</div>