ground truth
Nice spinny stuff
model outputs
Gemini 3 Flash Preview: A 0.90, T 0.11
Qwen3-VL-8B-Instruct: A 0.77, T 0.11
GPT-5.4: A 0.85, T 0.32
Claude Sonnet 4.6: A 0.93, T 0.31
LLaMA 4 Scout (no output): A —, T —
<div class="container">
  <div class="poop lol"></div>
</div>