ground truth
Apple Loading Bar
css
model outputs
Gemini 3 Flash Preview: A 0.62, T 0.23
Qwen3-VL-8B-Instruct: A 0.60, T 0.27
GPT-5.4: A 0.63, T 0.25
Claude Sonnet 4.6: A 0.70, T 0.52
LLaMA 4 Scout: A 0.61, T 0.22
<div class="progress"></div>