🧙‍♂️ AnimateAnything: Consistent and Controllable Animation for Video Generation


Abstract

We present AnimateAnything, a unified controllable video generation approach that enables precise and consistent video manipulation under various conditions, including camera trajectories, text prompts, and user motion annotations. Specifically, we carefully design a multi-scale control feature fusion network to construct a common motion representation for the different conditions: it explicitly converts all control information into frame-by-frame optical flows, which we then use as motion priors to guide video generation. In addition, to reduce the flickering caused by large-scale motion, we propose a frequency-based stabilization module that enhances temporal coherence by enforcing consistency in the video's frequency domain. Experiments demonstrate that our method outperforms state-of-the-art approaches.
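To make the frequency-based stabilization idea concrete, the snippet below is a minimal sketch of one way to enforce frequency-domain consistency: low spatial frequencies are blended toward a shared temporal reference so global content stays stable across frames, while high-frequency details are left untouched. The function name, the low-pass cutoff, and the blending weight are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch only; names and thresholds are assumptions, not AnimateAnything's code.
import torch

def stabilize_frequency(frames: torch.Tensor, low_freq_weight: float = 0.7) -> torch.Tensor:
    """Encourage temporal coherence by sharing low-frequency content across frames.

    frames: (T, C, H, W) video tensor with values in [0, 1].
    """
    spec = torch.fft.fft2(frames)                 # per-frame 2D spectrum, complex (T, C, H, W)
    mean_spec = spec.mean(dim=0, keepdim=True)    # temporal mean spectrum as a shared reference

    # Low-pass mask over spatial frequencies (cutoff chosen arbitrarily for illustration).
    T, C, H, W = frames.shape
    yy = torch.fft.fftfreq(H).abs().view(H, 1)
    xx = torch.fft.fftfreq(W).abs().view(1, W)
    low_pass = ((yy < 0.1) & (xx < 0.1)).to(frames.dtype)

    # Pull each frame's low frequencies toward the shared reference; keep high frequencies as-is.
    weight = low_pass * low_freq_weight
    blended = spec * (1 - weight) + mean_spec * weight
    return torch.fft.ifft2(blended).real.clamp(0, 1)

# Example usage with a dummy 16-frame clip:
# stable = stabilize_frequency(torch.rand(16, 3, 64, 64))
```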

Overview

The overview of AnimateAnything. Our framework mainly consists of two parts: Unified Flow Generation and Video Generation.
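As a rough illustration of this two-part layout, the sketch below separates the pipeline into a flow-generation stage and a flow-guided video-generation stage. All class names, tensor shapes, and the `Controls` container are hypothetical placeholders; only the overall structure (heterogeneous controls → unified optical flow → flow-guided video) follows the description above.

```python
# Structural sketch only; these classes are placeholders, not the actual AnimateAnything API.
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class Controls:
    camera_trajectory: Optional[torch.Tensor] = None  # e.g. (T, 4, 4) camera poses
    text_prompt: Optional[str] = None
    drag_annotations: Optional[torch.Tensor] = None   # e.g. (N, T, 2) dragged point tracks

class UnifiedFlowGenerator(torch.nn.Module):
    """Stage 1: fuse heterogeneous controls into frame-by-frame optical flow."""
    def forward(self, image: torch.Tensor, controls: Controls, num_frames: int) -> torch.Tensor:
        # Placeholder: a real model would fuse multi-scale control features here.
        _, h, w = image.shape
        return torch.zeros(num_frames, 2, h, w)       # (T, 2, H, W) flow maps

class FlowGuidedVideoGenerator(torch.nn.Module):
    """Stage 2: generate video frames using the optical flow as a motion prior."""
    def forward(self, image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real model would condition a video generator on `flow`.
        return image.unsqueeze(0).repeat(flow.shape[0], 1, 1, 1)

def animate(image: torch.Tensor, controls: Controls, num_frames: int = 16) -> torch.Tensor:
    flow = UnifiedFlowGenerator()(image, controls, num_frames)  # unified flow generation
    return FlowGuidedVideoGenerator()(image, flow)              # flow-guided video generation

# Example usage with a dummy 256x256 RGB image:
# video = animate(torch.rand(3, 256, 256), Controls(text_prompt="a cat turns its head"))
```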

Visualization Results

Large Camera Pose Control

Drag Anything You Like

Human Face Animation

Comparisons with CameraCtrl and MotionCtrl

(Left to right: Camera Motion, CameraCtrl, MotionCtrl, Ours)

Comparisons with Motion-I2V, DragAnything, and MOFA-Video

(Left to right: Drag Pattern, Motion-I2V, DragAnything, MOFA-Video, Ours)

Human Face Animation Comparisons

(Left to right: Reference Video, MotionClone, Motion-I2V, MOFA-Video, Ours)

Animal Animation Comparisons

(Left to right: Reference Video, MotionClone, Motion-I2V, MOFA-Video, Ours)

Image-to-Video Generation Comparison

(Left to right: Open-Sora, Pyramid-Flow, CogVideoX, DynamiCrafter, Ours)

Camera motions: Zoom out; Zoom in; Pan left; Pan right; Clockwise; Counter-clockwise

All Kinds of Drags