We present AnimateAnything, a unified controllable video generation approach that enables precise and consistent video manipulation under various conditions, including camera trajectories, text prompts, and user motion annotations. Specifically, we design a multi-scale control feature fusion network that constructs a common motion representation for the different conditions by explicitly converting all control information into frame-by-frame optical flows. These optical flows then serve as motion priors that guide video generation. In addition, to reduce the flickering caused by large-scale motion, we propose a frequency-based stabilization module, which enhances temporal coherence by enforcing consistency in the video's frequency domain. Experiments demonstrate that our method outperforms state-of-the-art approaches.
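The idea behind the frequency-based stabilization can be illustrated with a minimal sketch: transform the video along the time axis, suppress high temporal frequencies (which manifest as frame-to-frame flicker), and transform back. This is only an assumed, simplified stand-in for the paper's learned module; the function name, `keep_ratio` parameter, and low-pass design are illustrative choices, not the authors' implementation.

```python
import numpy as np

def stabilize_temporal_frequency(frames, keep_ratio=0.5):
    """Reduce flicker by low-pass filtering along the time axis.

    frames: float array of shape (T, H, W, C).
    keep_ratio: fraction of the temporal frequency band to retain.

    Illustrative sketch only; the paper's stabilization module is learned,
    not a fixed low-pass filter.
    """
    T = frames.shape[0]
    spec = np.fft.fft(frames, axis=0)            # FFT along the time axis
    freqs = np.fft.fftfreq(T)                    # normalized temporal frequencies in [-0.5, 0.5)
    mask = np.abs(freqs) <= keep_ratio * 0.5     # keep only low temporal frequencies
    spec *= mask[:, None, None, None]
    return np.fft.ifft(spec, axis=0).real        # back to the time domain
```

Applied to a clip with high-frequency per-frame flicker, the filter leaves the slowly varying content (including the temporal mean) intact while removing the oscillation.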
Overview
Overview of AnimateAnything. Our framework consists of two main parts: Unified Flow Generation and Video Generation.
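To make the flow-as-motion-prior interface concrete, here is a minimal sketch of backward warping a frame with a dense per-pixel flow field, the kind of frame-by-frame flow the Unified Flow Generation stage produces. Note this is an assumed illustration: the paper's Video Generation stage conditions a generator on such flows rather than warping pixels directly, and `backward_warp` is a hypothetical helper, not part of the authors' code.

```python
import numpy as np

def backward_warp(frame, flow):
    """Bilinearly warp `frame` (H, W, C) with `flow` (H, W, 2).

    flow[y, x] = (dx, dy): output pixel (y, x) samples frame at (y + dy, x + dx).
    Illustrative only; sampling positions are clamped to the image border.
    """
    H, W, _ = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = np.clip(xs + flow[..., 0], 0, W - 1)    # source x coordinates
    sy = np.clip(ys + flow[..., 1], 0, H - 1)    # source y coordinates
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

For example, a uniform flow of (1, 0) shifts image content one pixel to the left in the warped output, which is how a camera-pan control would be expressed once converted to dense flow.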
Visualization Results
Large Camera Pose Control
Drag Anything you like
Human Face Animation
Comparisons with CameraCtrl and MotionCtrl
Camera Motion
CameraCtrl
MotionCtrl
Ours
Comparisons with Motion-I2V, DragAnything, and MOFA-Video