We present AnimateAnything, a unified controllable video generation approach that enables precise and consistent video manipulation under various conditions, including camera trajectories, text prompts, and user motion annotations. Specifically, we design a multi-scale control feature fusion network that constructs a common motion representation for the different conditions by explicitly converting all control information into frame-by-frame optical flows. These optical flows then serve as motion priors that guide video generation. In addition, to reduce the flickering caused by large-scale motion, we propose a frequency-based stabilization module, which enhances temporal coherence by enforcing consistency in the video's frequency domain. Experiments demonstrate that our method outperforms state-of-the-art approaches.
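The idea behind the frequency-based stabilization can be illustrated with a minimal sketch: transform the video along the time axis, suppress high temporal frequencies (which manifest as frame-to-frame flicker), and transform back. This is only an assumed, simplified stand-in for the paper's learned module; the function name, `keep_ratio` parameter, and low-pass design are illustrative choices, not the authors' implementation.

```python
import numpy as np

def stabilize_temporal_frequency(frames, keep_ratio=0.5):
    """Reduce flicker by low-pass filtering along the time axis.

    frames: float array of shape (T, H, W, C).
    keep_ratio: fraction of the temporal frequency band to retain.

    Illustrative sketch only; the paper's stabilization module is learned,
    not a fixed low-pass filter.
    """
    T = frames.shape[0]
    spec = np.fft.fft(frames, axis=0)            # FFT along the time axis
    freqs = np.fft.fftfreq(T)                    # normalized temporal frequencies in [-0.5, 0.5)
    mask = np.abs(freqs) <= keep_ratio * 0.5     # keep only low temporal frequencies
    spec *= mask[:, None, None, None]
    return np.fft.ifft(spec, axis=0).real        # back to the time domain
```

Applied to a clip with high-frequency per-frame flicker, the filter leaves the slowly varying content (including the temporal mean) intact while removing the oscillation.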
Overview
Overview of AnimateAnything. Our framework consists of two main parts: Unified Flow Generation and Video Generation.
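To make the flow-as-motion-prior interface concrete, here is a minimal sketch of backward warping a frame with a dense per-pixel flow field, the kind of frame-by-frame flow the Unified Flow Generation stage produces. Note this is an assumed illustration: the paper's Video Generation stage conditions a generator on such flows rather than warping pixels directly, and `backward_warp` is a hypothetical helper, not part of the authors' code.

```python
import numpy as np

def backward_warp(frame, flow):
    """Bilinearly warp `frame` (H, W, C) with `flow` (H, W, 2).

    flow[y, x] = (dx, dy): output pixel (y, x) samples frame at (y + dy, x + dx).
    Illustrative only; sampling positions are clamped to the image border.
    """
    H, W, _ = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = np.clip(xs + flow[..., 0], 0, W - 1)    # source x coordinates
    sy = np.clip(ys + flow[..., 1], 0, H - 1)    # source y coordinates
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

For example, a uniform flow of (1, 0) shifts image content one pixel to the left in the warped output, which is how a camera-pan control would be expressed once converted to dense flow.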
Visualization Results
Large Camera Pose Control
Drag Anything you like
Human Face Animation
Comparisons with CameraCtrl and MotionCtrl
Camera Motion
CameraCtrl
MotionCtrl
Ours
Comparisons with Motion-I2V, DragAnything, and MOFA-Video