ground truth
Only CSS: Japanese "人力車"
model outputs
Gemini 3 Flash Preview: A 0.62, T 0.36
Qwen3-VL-8B-Instruct: A 0.67, T 0.42
GPT-5.4: A 0.69, T 0.39
Claude Sonnet 4.6: A 0.67, T 0.43
LLaMA 4 Scout: A 0.44, T 0.00
<div class="camera -x">
  <div class="camera -y">
    <div class="camera -z">
      <div class="rikisya">
        <div class="wheel -left"></div>
        <div class="wheel -right"></div>
        <div class="body">
          <div class="bottom"></div>
          <div class="side -left"></div>
          <div class="side -right"></div>
          <div class="back"></div>
        </div>
        <div class="handle"></div>
      </div>
    </div>
  </div>
</div>