NVIDIA has introduced SANA-WM, a 2.6-billion-parameter open-source world model designed to generate high-fidelity 720p video up to one minute long with precise six-degree-of-freedom (6-DoF) camera control. Given a single image and a camera trajectory, the system synthesizes a realistic environment while running efficiently on a single GPU such as an RTX 5090.
NVIDIA says SANA-WM uses hybrid linear attention, dual-branch camera control, and long-video refinement techniques to improve temporal consistency and action-following accuracy. The model reportedly achieves throughput up to 36 times higher than previous open-source baselines while maintaining competitive visual quality.
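For readers unfamiliar with linear attention, the sketch below shows the general idea in NumPy. It is a generic, non-causal kernelized formulation and not SANA-WM's actual implementation, which the announcement does not detail. The function names, the `elu(x) + 1` feature map, and the tensor shapes are all assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Generic linear attention. q, k: (n, d); v: (n, d_v)."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v               # (d, d_v) summary of all keys and values
    z = phi_k.sum(axis=0)          # (d,) normalizer
    # Cost grows linearly with sequence length n; no (n, n) matrix is formed.
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_reference(q, k, v):
    """Same result computed the expensive way, via an explicit (n, n) matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T       # (n, n)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 4))
k = rng.normal(size=(16, 4))
v = rng.normal(size=(16, 3))
out = linear_attention(q, k, v)
```

Reordering the matrix products is what removes the quadratic term: the keys and values are first compressed into a small `(d, d_v)` summary, which every query then reads. For a minute of 720p video the token count is very large, so avoiding an `(n, n)` attention matrix is a plausible contributor to the reported throughput, though the article does not break down where the speedup comes from.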
Researchers believe SANA-WM could accelerate progress in robotics, simulation, gaming, embodied AI, and interactive world modeling.




