ground truth
Only CSS: Screw 2
model outputs
Gemini 3 Flash Preview: A 0.65, T 0.23
Qwen3-VL-8B-Instruct: A 0.63, T 0.28
GPT-5.4: A 0.70, T 0.26
Claude Sonnet 4.6: A 0.57, T 0.19
LLaMA 4 Scout: A 0.42, T 0.00
<div class="donuts"><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div><div class="donuts_mask">SCREW</div></div>