animation2code benchmark

Zero-shot video (or image-frame) → code results on the test set, across commercial and open-source models.

Each output is tagged with two scores: A (appearance similarity) and T (temporal similarity). Higher is better for both.

129-136 of 214

ground truth

SVG Loading icons

model outputs

GPT-5.4: A 0.82, T 0.28

Claude Sonnet 4.6: A 0.65, T 0.26

LLaMA 4 Scout: A 0.61, T 0.24

ground truth

SVG Loading icons

model outputs

GPT-5.4: A 0.88, T 0.25

Claude Sonnet 4.6: A 0.85, T 0.22

LLaMA 4 Scout: A 0.67, T 0.19

ground truth

SVG Loading icons

model outputs

GPT-5.4: A 0.90, T 0.30

Claude Sonnet 4.6: A 0.89, T 0.28

LLaMA 4 Scout: A 0.84, T 0.24

ground truth

SVG Loading icons

model outputs

GPT-5.4: A 0.88, T 0.25

Claude Sonnet 4.6: A 0.82, T 0.22

LLaMA 4 Scout: A 0.38, T 0.00

ground truth

SVG Loading icons

model outputs

GPT-5.4: A 0.84, T 0.29

Claude Sonnet 4.6: A 0.85, T 0.27

LLaMA 4 Scout: A 0.80, T 0.26

ground truth

Exploring Bourbon

model outputs

GPT-5.4: A 0.83, T 0.26

Claude Sonnet 4.6: A 0.75, T 0.25

LLaMA 4 Scout: A 0.41, T 0.00

ground truth

Exploring Bourbon

model outputs

GPT-5.4: A 0.89, T 0.22

Claude Sonnet 4.6: A 0.82, T 0.28

LLaMA 4 Scout: A 0.46, T 0.20

ground truth

Exploring Bourbon

model outputs

GPT-5.4: A 0.77, T 0.32

Claude Sonnet 4.6: A 0.75, T 0.31

LLaMA 4 Scout: A 0.66, T 0.26
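
To compare models across the eight examples listed on this page, the per-example A and T scores can be averaged per model. The following is a minimal sketch, not part of the benchmark's own tooling: the names `SCORES` and `mean_scores` are hypothetical, the values are copied by hand from the entries above, and an unweighted mean over these eight examples is an assumption that says nothing about the full 214-item test set.

```python
from statistics import mean

# (A, T) pairs copied from the eight examples on this page, in listing order.
# Hypothetical container; the dashboard does not publish this structure.
SCORES = {
    "GPT-5.4": [
        (0.82, 0.28), (0.88, 0.25), (0.90, 0.30), (0.88, 0.25),
        (0.84, 0.29), (0.83, 0.26), (0.89, 0.22), (0.77, 0.32),
    ],
    "Claude Sonnet 4.6": [
        (0.65, 0.26), (0.85, 0.22), (0.89, 0.28), (0.82, 0.22),
        (0.85, 0.27), (0.75, 0.25), (0.82, 0.28), (0.75, 0.31),
    ],
    "LLaMA 4 Scout": [
        (0.61, 0.24), (0.67, 0.19), (0.84, 0.24), (0.38, 0.00),
        (0.80, 0.26), (0.41, 0.00), (0.46, 0.20), (0.66, 0.26),
    ],
}


def mean_scores(scores):
    """Return {model: (mean A, mean T)} for a {model: [(A, T), ...]} mapping."""
    result = {}
    for model, pairs in scores.items():
        appearance = [a for a, _ in pairs]
        temporal = [t for _, t in pairs]
        result[model] = (mean(appearance), mean(temporal))
    return result


MEANS = mean_scores(SCORES)

if __name__ == "__main__":
    # Higher is better for both columns.
    for model, (a, t) in MEANS.items():
        print(f"{model:<18} mean A = {a:.3f}  mean T = {t:.3f}")
```

Because these are only eight of 214 items, the averages describe this page's sample rather than overall benchmark standing.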