ground truth
Camera following: Step8
model outputs
Gemini 3 Flash Preview: A 0.78, T 0.36
Qwen3-VL-8B-Instruct: A 0.53, T 0.29
GPT-5.4: A 0.84, T 0.31
Claude Sonnet 4.6: A 0.84, T 0.50
LLaMA 4 Scout: A 0.48, T 0.00
<div class="bank">
  <div class="rotation">
    <div class="move">
      <div class="rotation-follow">
        <div class="bank-follow">
          <div class="ball"></div>
        </div>
      </div>
    </div>
  </div>
  <div class="rotation">
    <div class="move">
      <div class="rotation-follow">
        <div class="bank-follow">
          <div class="ball"></div>
        </div>
      </div>
    </div>
  </div>
  <div class="rotation">
    <div class="move">
      <div class="rotation-follow">
        <div class="bank-follow">
          <div class="ball"></div>
        </div>
      </div>
    </div>
  </div>
</div>