Jiayi Zheng and Xiaodong Cun from GVC Lab @ Great Bay University
arXiv | PDF | Project Page
TL;DR: We create story videos from a single child-drawn character image.
We propose FairyGen, a novel framework for generating animated story videos from a single hand-drawn character, while faithfully preserving its artistic style. It features story planning via MLLM, propagated stylization, 3D-based motion generation, and a two-stage propagated motion adapter.
FairyGen is a dual-pipeline framework designed for high-fidelity character stylization using SDXL and motion-consistent animation using Wan2.2.
conda create -n fairygen python=3.12 -y
conda activate fairygen
pip install -r requirements.txt
# Install modified diffusers for BrushNet support
cd stylization/BrushNet
pip install -e .

[NOTE] Diffusers Library Compatibility: BrushNet requires a modified version of the diffusers library that includes BrushNet-specific code changes. Style LoRA/DoRA training and the two-stage animation finetuning work with the latest official diffusers release, but the local editable installation (pip install -e .) is mandatory for BrushNet functionality.
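As a quick sanity check after the editable install, you can confirm that Python is picking up the local fork rather than an official release. The BrushNetModel attribute check below is an assumption based on the classes the BrushNet examples import:

# Sanity check for the local BrushNet diffusers install (illustrative only).
import diffusers

# Should point inside stylization/BrushNet rather than site-packages.
print(diffusers.__file__)
# Assumed: the BrushNet fork exposes BrushNetModel at the top level.
print(hasattr(diffusers, "BrushNetModel"))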
For Stylization (SDXL & BrushNet):
hf download stabilityai/stable-diffusion-xl-base-1.0
hf download madebyollin/sdxl-vae-fp16-fix

BrushNet Checkpoint: available at this Google Drive link.
For Animation (Wan2.2-TI2V-5B):
hf download Wan-AI/Wan2.2-TI2V-5B

Local Model Loading: To load weights from local directories, change --model_id_with_origin_paths in stage1_id.sh to --model_paths and list the models in JSON format as shown below:
--model_paths '[
[
".models/Wan2.2-TI2V-5B/diffusion_pytorch_model-00001-of-00003-bf16.safetensors",
".models/Wan2.2-TI2V-5B/diffusion_pytorch_model-00002-of-00003-bf16.safetensors",
"./models/Wan2.2-TI2V-5B/diffusion_pytorch_model-00003-of-00003-bf16.safetensors"
],
"./models/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth",
"./models/Wan2.2-TI2V-5B/Wan2.2_VAE.pth"
]'

Step 1: Style DoRA Training
The training data requires a single character image paired with its binary mask. A script to generate binary masks is provided (create_mask.py). Example datasets can be found here.
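create_mask.py is the provided tool for this step. Purely as an illustration, a minimal mask extractor for a character drawn on a near-white background could look like the sketch below; the file names and the threshold are placeholders:

# Illustrative sketch only; create_mask.py is the reference script.
import numpy as np
from PIL import Image

img = np.array(Image.open("character.png").convert("RGB"))
# Treat pixels that are clearly not white paper as foreground (the character).
foreground = (img < 240).any(axis=-1)
Image.fromarray((foreground * 255).astype(np.uint8)).save("character_mask.png")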
cd stylization/dora_training
bash train.sh

Step 2: Background Generation
Example data is available here. When crafting prompts, it is recommended to include a description of the character’s appearance for better results.
A key parameter when using BrushNet is brushnet_conditioning_scale, which controls the trade-off between style consistency and background richness. Higher values (e.g., 1.0) emphasize style consistency, while lower values allow for more text alignment and richer background content. A value of 0.7 is commonly used.
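The provided examples/brushnet/test_brushnet_sdxl.py handles this step; the sketch below only illustrates where brushnet_conditioning_scale enters the pipeline call. The class names and argument order follow the BrushNet examples and should be treated as assumptions, and the paths and prompt are placeholders:

# Hedged sketch of BrushNet SDXL inpainting for background generation.
import torch
from diffusers import BrushNetModel, StableDiffusionXLBrushNetPipeline
from diffusers.utils import load_image

brushnet = BrushNetModel.from_pretrained("path/to/brushnet_sdxl_ckpt", torch_dtype=torch.float16)
pipe = StableDiffusionXLBrushNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", brushnet=brushnet, torch_dtype=torch.float16
).to("cuda")
# Load the trained style DoRA here as well so the background matches the character's style.

init_image = load_image("character_on_canvas.png")  # stylized character placed on the frame
mask_image = load_image("background_mask.png")      # region BrushNet should fill

image = pipe(
    "a hand-drawn boy with a red scarf walking through a crayon-style forest",
    init_image,
    mask_image,
    num_inference_steps=50,
    brushnet_conditioning_scale=0.7,  # raise toward 1.0 for style consistency, lower for richer backgrounds
).images[0]
image.save("background_result.png")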
cd stylization/BrushNet
python examples/brushnet/test_brushnet_sdxl.py

A two-stage training approach is applied to learn anthropomorphic motion. An example dataset can be found here.
Stage 1: Learn character identity (appearance)
cd animation
bash stage1_id.sh

Stage 2: Learn motion information
Before starting stage 2 training, update lora_checkpoint to point to the checkpoint produced in stage 1. More complex motions may require additional training steps in stage 2.
bash stage2_motion.sh

After training, merge the two-stage LoRA weights.
python merge_weights.py
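merge_weights.py performs the actual merge. Purely for orientation, combining two LoRA state dicts could look like the following sketch; the file names and the assumption that both stages export safetensors checkpoints are hypothetical:

# Illustrative merge sketch; merge_weights.py is the reference implementation.
from safetensors.torch import load_file, save_file

identity_lora = load_file("stage1_identity_lora.safetensors")
motion_lora = load_file("stage2_motion_lora.safetensors")

# Union of both adapters; keys present in both stages are taken from stage 2 (motion).
merged = {**identity_lora, **motion_lora}
save_file(merged, "merged_motion_lora.safetensors")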
Generate Animation (Inference)

Use the first frame with a background and load the merged motion LoRA.
# single shot
python inference.py
# multi-shot
python batch_inference.py

@article{zheng2025fairygen,
title={FairyGen: Storied Cartoon Video from a Single Child-Drawn Character},
author={Jiayi Zheng and Xiaodong Cun},
year={2025},
eprint={2506.21272},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2506.21272},
}

