Binary file added EmoTalk/EmoTalk使用文档.docx
91 changes: 91 additions & 0 deletions EmoTalk/README.md
@@ -0,0 +1,91 @@
# EmoTalk Course Project

Original repository: [psyai-net/EmoTalk_release: This is the official source for our ICCV 2023 paper "EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation" (github.com)](https://github.com/psyai-net/EmoTalk_release)

Packaged runnable project: https://pan.baidu.com/s/1ZO7TercF4GeucMflwcy9zw?pwd=jdkv

### This repository records all work for the EmoTalk group project, including:

- **Paper study**: Studying the relevant methods and documenting our notes.

- **Environment deployment**: Modifying the original repository's code to run in different environments.

- **Metric testing**: Quantitative evaluation on the instructor-provided metrics and qualitative evaluation on the metrics reported in the paper.

- **Improvements**: Extending the original inference-only code to support training, successfully reproducing the paper's training procedure.



## Project Structure
```plaintext
实验报告.docx         # 2. Experiment report, following the BIT journal format
code/                 # Summary of our code work
├── EmoTalk_win/      # Modified project, runnable on Windows
├── test/             # LSE-C/LSE-D evaluation on the instructor's test set, plus qualitative evaluation of the paper's metrics
├── train/            # Original inference model converted into a training model
EmoTalk使用文档.docx   # 3.3 Configuration guide for the packaged runnable project
```


# File Descriptions

## 1. 实验报告.docx
Covers the model overview, difficulties encountered and their solutions, qualitative and quantitative evaluation results, possible improvements, and intra-group evaluation.

## 2. Code
Summary of our main code work, comprising the following submodules:

### 2.1 EmoTalk
The modified runnable project, for use on Windows.

**Notes**:

- The following models must be downloaded manually and placed in the specified directories:
  - **`wav2vec2-large-xlsr-53-english`** and **`wav2vec-english-speech-emotion-recognition`**: place them in the `models/` folder.
  - **`EmoTalk.pth`**: place it in the `pretrain_model/` folder.
- For further details, see the [environment deployment guide](code/EmoTalk_win#readme).
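As a quick sanity check before running the demo, a small script along the following lines can verify that the downloaded models are in place (the directory layout is taken from the notes above; the script itself is an illustrative addition, not part of the project):

```python
import os

# Expected locations per the notes above
REQUIRED = [
    "models/wav2vec2-large-xlsr-53-english",
    "models/wav2vec-english-speech-emotion-recognition",
    "pretrain_model/EmoTalk.pth",
]

def check_models(root="."):
    """Return the list of required model paths missing under root."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = check_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running this from the `EmoTalk_win` project root before `demo.py` avoids a mid-run failure when a model is absent.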

### 2.2 test
Evaluates the instructor-provided test set, computing the LSE-C and LSE-D metrics. Also qualitatively evaluates several metrics from the paper.
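For reference, LSE-D and LSE-C come from SyncNet-style lip-sync evaluation (as popularized by Wav2Lip): audio and video embeddings are compared at a range of candidate offsets. A rough sketch of how the two numbers are typically derived, assuming the per-offset mean embedding distances have already been computed:

```python
import numpy as np

def lse_metrics(offset_distances):
    """Derive LSE-D/LSE-C from mean audio-visual embedding
    distances measured at a range of candidate AV offsets."""
    d = np.asarray(offset_distances, dtype=float)
    lse_d = float(d.min())                 # distance at the best-sync offset (lower is better)
    lse_c = float(np.median(d) - d.min())  # confidence: margin over the median (higher is better)
    return lse_d, lse_c

# Toy example: distances dip at the true sync offset
dists = [13.2, 12.8, 9.1, 6.5, 9.0, 12.5, 13.0]
print(lse_metrics(dists))  # (6.5, 6.0)
```

This is only a sketch of the metric definitions; the actual `test/` code relies on the pretrained SyncNet model to produce the embeddings.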

### 2.3 train
Converts the original repository's inference model into a training model, reproducing the paper's training functionality.

## 3. EmoTalk使用文档.docx

Explains in detail how to run the project to generate videos and evaluation metrics.

1. **Download the Docker image**

   > Download the Docker image from the code repository or from Baidu Netdisk: https://pan.baidu.com/s/1ZO7TercF4GeucMflwcy9zw?pwd=jdkv

2. **Load the Docker image**

   > After downloading, load the image in a terminal:
   >
   > ```
   > docker load < /path/to/EmoTalk.tar
   > ```
   >
   > Once loading completes, check that the image was imported with `docker images`.

3. **Run EmoTalk.py**

   > ```
   > docker run --gpus all -v <input_path>:/app/videos -v <output_path>:/app/result -it emotalk bash
   > ```
   >
   > This mounts the input-video and output-result paths into the container; be sure to include `--gpus all`.
   >
   > Inside the container, run
   >
   > ```
   > python EmoTalk.py <path to video>
   > ```
   >
   > On success, it prints the LSE-D and LSE-C metrics.
   >
   > Output files are saved to the output path.
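The steps above can also be scripted non-interactively. A minimal sketch (image name and mount points taken from the commands above; the helper itself and all example paths are illustrative):

```python
import subprocess

def emotalk_cmd(video, input_dir, output_dir, image="emotalk"):
    """Build the docker run invocation from the steps above."""
    return [
        "docker", "run", "--gpus", "all",   # GPU access is required
        "-v", f"{input_dir}:/app/videos",   # mount input videos
        "-v", f"{output_dir}:/app/result",  # mount output directory
        image, "python", "EmoTalk.py", video,
    ]

def run_emotalk(video, input_dir, output_dir):
    """Run EmoTalk in the container; raises CalledProcessError on failure."""
    subprocess.run(emotalk_cmd(video, input_dir, output_dir), check=True)
```

For example, `run_emotalk("/app/videos/clip.mp4", "/data/in", "/data/out")` would process `clip.mp4` and leave the results under `/data/out` on the host.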



54 changes: 54 additions & 0 deletions EmoTalk/code/EmoTalk_win/Dockerfile
@@ -0,0 +1,54 @@
FROM nvidia/cudagl:11.3.1-devel-ubuntu20.04
LABEL maintainer="Jungwoo Choi"

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Seoul

ADD requirements.txt /tmp/requirements.txt
RUN \
# Fix CUDA apt error
rm -f /etc/apt/sources.list.d/cuda.list && \
rm -f /etc/apt/sources.list.d/nvidia-ml.list && \
apt-get update && apt-get install -y gnupg2 software-properties-common && \
apt-key del 7fa2af80 && \
apt-get update && apt-get install -y --no-install-recommends wget && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb && \
dpkg -i cuda-keyring_1.0-1_all.deb && \
# Install Start
apt update && \
add-apt-repository -y ppa:savoury1/ffmpeg4 && \
apt -y install python3.8 python3.8-distutils libgl1-mesa-glx libglib2.0-0 git wget zsh vim openssh-server curl ffmpeg && \
# Python Library
update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1 && \
wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py && \
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113 && \
pip install -r /tmp/requirements.txt && \
# zsh option
chsh -s /bin/zsh && \
sh -c "$(wget -O- https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" && \
# add zsh-autosuggestions, zsh-syntax-highlighting plugin
git clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/custom/plugins/zsh-autosuggestions && \
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ~/.oh-my-zsh/custom/plugins/zsh-syntax-highlighting && \
# Modify .zshrc with Perl
perl -pi -w -e 's/ZSH_THEME=.*/ZSH_THEME="af-magic"/g;' ~/.zshrc && \
perl -pi -w -e 's/plugins=.*/plugins=(git ssh-agent zsh-autosuggestions zsh-syntax-highlighting)/g;' ~/.zshrc && \
# Set ssh id and password, default is id = root, password = root.
# I recommend changing this for more security
# PermitRootLogin : yes - for ssh connection
echo 'root:root' |chpasswd && \
sed -ri 's/^#?PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config && \
sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config && \
mkdir /root/.ssh && \
mkdir /var/run/sshd && \
# install language pack to fix locale issues.
apt-get install -y language-pack-en && update-locale && \
# Clean up
apt-get clean && \
apt-get autoclean && \
apt-get autoremove -y && \
rm -rf /var/lib/cache/* && \
rm -rf /var/lib/log/*

WORKDIR /workspace
# Exec-form CMD requires double-quoted JSON; the stray 'zsh' element was invalid
CMD ["echo", "nvidia/cudagl:11.3.1-devel-ubuntu20.04 is ready!"]
13 changes: 13 additions & 0 deletions EmoTalk/code/EmoTalk_win/LICENSE
@@ -0,0 +1,13 @@
Copyright (c) 2023 Psyche AI Inc.

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, and distribute the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

1. Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

2. NonCommercial — You may not use the material for commercial purposes.

3. No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Binary file added EmoTalk/code/EmoTalk_win/audio/angry1.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/angry2.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/disgust.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/fearful.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/happy.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/malaya.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/sad.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/ted1.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/ted2.wav
4 changes: 4 additions & 0 deletions EmoTalk/code/EmoTalk_win/blender.sh
@@ -0,0 +1,4 @@
wget https://ftp.nluug.nl/pub/graphics/blender/release/Blender3.4/blender-3.4.1-linux-x64.tar.xz
tar -xf blender-3.4.1-linux-x64.tar.xz
mv blender-3.4.1-linux-x64 blender && rm blender-3.4.1-linux-x64.tar.xz

3 changes: 3 additions & 0 deletions EmoTalk/code/EmoTalk_win/blender/README.md
@@ -0,0 +1,3 @@
File shared via Baidu Netdisk: blender.zip
Link: https://pan.baidu.com/s/1O2ZoDuh2IGq43ElTfEnESA?pwd=7ssp
Extraction code: 7ssp
138 changes: 138 additions & 0 deletions EmoTalk/code/EmoTalk_win/demo.py
@@ -0,0 +1,138 @@
import librosa
import numpy as np
import argparse
from scipy.signal import savgol_filter
import torch
from model import EmoTalk
import random
import os, subprocess
import shlex


@torch.no_grad()
def test(args):
    result_path = args.result_path
    os.makedirs(result_path, exist_ok=True)
    # Blink templates: 7-frame blendshape curves for the two eye-blink channels
    eye1 = np.array([0.36537236, 0.950235724, 0.95593375, 0.916715622, 0.367256105, 0.119113259, 0.025357503])
    eye2 = np.array([0.234776169, 0.909951985, 0.944758058, 0.777862132, 0.191071674, 0.235437036, 0.089163929])
    eye3 = np.array([0.870040774, 0.949833691, 0.949418545, 0.695911646, 0.191071674, 0.072576277, 0.007108896])
    eye4 = np.array([0.000307991, 0.556701422, 0.952656746, 0.942345619, 0.425857186, 0.148335218, 0.017659493])
    model = EmoTalk(args)
    model.load_state_dict(torch.load(args.model_path, map_location=torch.device(args.device)), strict=False)
    model = model.to(args.device)
    model.eval()
    wav_path = args.wav_path
    file_name = os.path.splitext(os.path.basename(wav_path))[0]  # robust to Windows path separators
    speech_array, sampling_rate = librosa.load(os.path.join(wav_path), sr=16000)
    audio = torch.FloatTensor(speech_array).unsqueeze(0).to(args.device)
    level = torch.tensor([1]).to(args.device)
    person = torch.tensor([0]).to(args.device)
    prediction = model.predict(audio, level, person)
    prediction = prediction.squeeze().detach().cpu().numpy()
    if args.post_processing:
        output = np.zeros((prediction.shape[0], prediction.shape[1]))
        for i in range(prediction.shape[1]):
            output[:, i] = savgol_filter(prediction[:, i], 5, 2)
        output[:, 8] = 0
        output[:, 9] = 0
        # Insert random blinks at intervals of 60-180 frames
        i = random.randint(0, 60)
        while i < output.shape[0] - 7:
            eye_num = random.randint(1, 4)
            if eye_num == 1:
                output[i:i + 7, 8] = eye1
                output[i:i + 7, 9] = eye1
            elif eye_num == 2:
                output[i:i + 7, 8] = eye2
                output[i:i + 7, 9] = eye2
            elif eye_num == 3:
                output[i:i + 7, 8] = eye3
                output[i:i + 7, 9] = eye3
            else:
                output[i:i + 7, 8] = eye4
                output[i:i + 7, 9] = eye4
            time1 = random.randint(60, 180)
            i = i + time1
        np.save(os.path.join(result_path, "{}.npy".format(file_name)), output)  # with post-processing (smoothing and blinking)
    else:
        np.save(os.path.join(result_path, "{}.npy".format(file_name)), prediction)  # without post-processing


def render_video(args):
    wav_name = os.path.splitext(os.path.basename(args.wav_path))[0]
    image_path = os.path.join(args.result_path, wav_name)
    os.makedirs(image_path, exist_ok=True)
    image_temp = image_path + "/%d.png"
    output_path = os.path.join(args.result_path, wav_name + ".mp4")
    blender_path = args.blender_path
    python_path = "./render.py"
    blend_path = "./render.blend"

    # Resolve the result directory to an absolute path before handing it to Blender
    # (string concatenation of the script dir and "./result/" produced a broken path)
    result_path = os.path.abspath(args.result_path)

    cmd = '{} -t 64 -b {} -P {} -- "{}" "{}" '.format(blender_path, blend_path, python_path, result_path, wav_name)
    cmd = shlex.split(cmd)
    p = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while p.poll() is None:
        line = p.stdout.readline()
        line = line.strip()
        if line:
            print('[{}]'.format(line))
    if p.returncode == 0:
        print('Subprogram success')
    else:
        print('Subprogram failed')

    # If the operating system is Windows
    try:
        ffmpeg_cmd = [
            'ffmpeg', '-r', '30', '-i', image_temp, '-i', args.wav_path,
            '-pix_fmt', 'yuv420p', '-s', '512x768', output_path, '-y'
        ]
        ffmpeg_cmd_str = ' '.join(ffmpeg_cmd)
        subprocess.run(ffmpeg_cmd_str, check=True, shell=True)

        rm_cmd = ['rmdir', '/S', '/Q', f'"{image_path}"']
        rm_cmd_str = ' '.join(rm_cmd)
        subprocess.run(rm_cmd_str, check=True, shell=True)

    except subprocess.CalledProcessError as e:
        print(f"Command failed with return code {e.returncode}: {e.cmd}")
        if e.output:
            print(f"Output: {e.output}")
        if e.stderr:
            print(f"Error: {e.stderr}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

    # Linux equivalents:
    # cmd = 'ffmpeg -r 30 -i "{}" -i "{}" -pix_fmt yuv420p -s 512x768 "{}" -y'.format(image_temp, args.wav_path, output_path)
    # subprocess.call(cmd, shell=True)
    #
    # cmd = 'rm -rf "{}"'.format(image_path)
    # subprocess.call(cmd, shell=True)


def main():
    parser = argparse.ArgumentParser(
        description='EmoTalk: Speech-driven Emotional Disentanglement for 3D Face Animation')
    parser.add_argument("--wav_path", type=str, default="./audio/angry1.wav", help='path of the test data')
    parser.add_argument("--bs_dim", type=int, default=52, help='number of blendshapes: 52')
    parser.add_argument("--feature_dim", type=int, default=832, help='number of feature dims')
    parser.add_argument("--period", type=int, default=30, help='period length')
    parser.add_argument("--device", type=str, default="cuda", help='device')
    parser.add_argument("--model_path", type=str, default="./pretrain_model/EmoTalk.pth",
                        help='path of the trained models')
    parser.add_argument("--result_path", type=str, default="./result/", help='path of the result')
    parser.add_argument("--max_seq_len", type=int, default=5000, help='max sequence length')
    parser.add_argument("--num_workers", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=1)
    # argparse's type=bool treats any non-empty string as True, so parse explicitly
    parser.add_argument("--post_processing", type=lambda s: str(s).lower() not in ("0", "false", "no"),
                        default=True, help='whether to use post processing')
    parser.add_argument("--blender_path", type=str, default="./blender/blender", help='path of blender')
    args = parser.parse_args()
    test(args)
    render_video(args)


if __name__ == "__main__":
    main()
Binary file added EmoTalk/code/EmoTalk_win/images/media/image1.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image13.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image14.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image16.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image3.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image4.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image5.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image6.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image8.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image9.png