animation2code benchmark

Zero-shot video (or image-frame) → code results on the test set, across commercial and open-source models.

Each output is tagged with an appearance-similarity score (A) and a temporal-similarity score (T); higher is better for both.

57–64 of 214

ground truth

Only CSS: Love

model outputs

Gemini 3 Flash Preview

no output

GPT-5.4

A 0.82, T 0.36

Claude Sonnet 4.6

A 0.31, T 0.00

LLaMA 4 Scout

A 0.54, T 0.09

model outputs

GPT-5.4

A 0.71, T 0.36

Claude Sonnet 4.6

A 0.67, T 0.31

LLaMA 4 Scout

A 0.46, T 0.11

ground truth

SVG Tools

model outputs

Qwen3-VL-8B-Instruct

no output

GPT-5.4

A 0.86, T 0.24

Claude Sonnet 4.6

A 0.77, T 0.29

LLaMA 4 Scout

A 0.66, T 0.18

model outputs

Qwen3-VL-8B-Instruct

no output

GPT-5.4

A 0.80, T 0.19

Claude Sonnet 4.6

A 0.78, T 0.24

LLaMA 4 Scout

A 0.61, T 0.09

model outputs

GPT-5.4

A 0.81, T 0.24

Claude Sonnet 4.6

A 0.90, T 0.17

LLaMA 4 Scout

A 0.67, T 0.33

model outputs

GPT-5.4

A 0.89, T 0.35

Claude Sonnet 4.6

A 0.87, T 0.23

LLaMA 4 Scout

A 0.60, T 0.25

model outputs

Qwen3-VL-8B-Instruct

no output

GPT-5.4

A 0.84, T 0.27

Claude Sonnet 4.6

A 0.85, T 0.27

LLaMA 4 Scout

A 0.57, T 0.29

model outputs

Qwen3-VL-8B-Instruct

no output

GPT-5.4

A 0.82, T 0.23

Claude Sonnet 4.6

A 0.75, T 0.38

LLaMA 4 Scout

A 0.59, T 0.15