ground truth
Loaders (WIP)
model outputs
Gemini 3 Flash Preview: A 0.94, T 0.41
Qwen3-VL-8B-Instruct: A 0.60, T 0.26
GPT-5.4: A 0.77, T 0.29
Claude Sonnet 4.6: A 0.88, T 0.31
LLaMA 4 Scout: A 0.47, T 0.00
<div class="container">
  <div class="box">
    <div class="hourglass"></div>
  </div>
</div>