ground truth
css
model outputs
Gemini 3 Flash Preview: A 0.88, T 0.12
Qwen3-VL-8B-Instruct: A 0.89, T 0.26
GPT-5.4: A 0.93, T 0.21
Claude Sonnet 4.6: A 0.90, T 0.09
LLaMA 4 Scout: A 0.86, T 0.03
<div></div>
ground truth
model outputs
<div></div>