Zero-shot video (or image-frame) → code results on the test set, across commercial and open-source models.
Each output is tagged with A = appearance similarity and T = temporal similarity; higher is better for both.
Examples 33–40 of 214
ground truth (8 examples):
Only CSS: 3D Scan
Only CSS: Responsive City Drone View Black
Only CSS: Fall In Love
Only CSS: Truck a GO, GO! GOOOO!!
Only CSS: Codevember #6 Money Storm
Only CSS: Sunset Beach
Only CSS: Star Warp Display
Only CSS: Caterpillar
model outputs (the same five models for every example):
Gemini 3 Flash Preview
Qwen3-VL-8B-Instruct
GPT-5.4
Claude Sonnet 4.6
LLaMA 4 Scout