Binary file added EmoTalk/EmoTalk使用文档.docx
91 changes: 91 additions & 0 deletions EmoTalk/README.md
@@ -0,0 +1,91 @@
# EmoTalk Course Project

Original repository: [psyai-net/EmoTalk_release: This is the official source for our ICCV 2023 paper "EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation" (github.com)](https://github.com/psyai-net/EmoTalk_release)

Packaged runnable project: https://pan.baidu.com/s/1ZO7TercF4GeucMflwcy9zw?pwd=jdkv

### This repository records all work for the EmoTalk group project, including:

- **Paper study**: Studying the relevant methods and documenting our notes.

- **Environment deployment**: Modifying the original repository's code to run in different environments.

- **Metric testing**: Quantitative evaluation on the instructor-provided metrics and qualitative evaluation on the metrics reported in the paper.

- **Improvements**: Extending the original inference-only code to support training, successfully reproducing the paper's training procedure.



## Project Structure
```plaintext
实验报告.docx         # 2. Experiment report, following the BIT journal format
code/                 # Summary of our code work
├── EmoTalk_win/      # Modified project, runnable on Windows
├── test/             # LSE-C/LSE-D evaluation on the instructor's test set, plus qualitative evaluation of the paper's metrics
├── train/            # Original inference model converted into a training model
EmoTalk使用文档.docx   # 3.3 Configuration guide for the packaged runnable project
```


# File Descriptions

## 1. 实验报告.docx
Covers the model overview, difficulties encountered and their solutions, qualitative and quantitative evaluation results, possible improvements, and intra-group evaluation.

## 2. Code
Summary of our main code work, comprising the following submodules:

### 2.1 EmoTalk
The modified runnable project, for use on Windows.

**Notes**:

- The following models must be downloaded manually and placed in the specified directories:
  - **`wav2vec2-large-xlsr-53-english`** and **`wav2vec-english-speech-emotion-recognition`**: place them in the `models/` folder.
  - **`EmoTalk.pth`**: place it in the `pretrain_model/` folder.
- For further details, see the [environment deployment guide](code/EmoTalk_win#readme).
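As a quick sanity check before running the demo, a small script along the following lines can verify that the downloaded models are in place (the directory layout is taken from the notes above; the script itself is an illustrative addition, not part of the project):

```python
import os

# Expected locations per the notes above
REQUIRED = [
    "models/wav2vec2-large-xlsr-53-english",
    "models/wav2vec-english-speech-emotion-recognition",
    "pretrain_model/EmoTalk.pth",
]

def check_models(root="."):
    """Return the list of required model paths missing under root."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = check_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running this from the `EmoTalk_win` project root before `demo.py` avoids a mid-run failure when a model is absent.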

### 2.2 test
Evaluates the instructor-provided test set, computing the LSE-C and LSE-D metrics. Also qualitatively evaluates several metrics from the paper.
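For reference, LSE-D and LSE-C come from SyncNet-style lip-sync evaluation (as popularized by Wav2Lip): audio and video embeddings are compared at a range of candidate offsets. A rough sketch of how the two numbers are typically derived, assuming the per-offset mean embedding distances have already been computed:

```python
import numpy as np

def lse_metrics(offset_distances):
    """Derive LSE-D/LSE-C from mean audio-visual embedding
    distances measured at a range of candidate AV offsets."""
    d = np.asarray(offset_distances, dtype=float)
    lse_d = float(d.min())                 # distance at the best-sync offset (lower is better)
    lse_c = float(np.median(d) - d.min())  # confidence: margin over the median (higher is better)
    return lse_d, lse_c

# Toy example: distances dip at the true sync offset
dists = [13.2, 12.8, 9.1, 6.5, 9.0, 12.5, 13.0]
print(lse_metrics(dists))  # (6.5, 6.0)
```

This is only a sketch of the metric definitions; the actual `test/` code relies on the pretrained SyncNet model to produce the embeddings.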

### 2.3 train
Converts the original repository's inference model into a training model, reproducing the paper's training functionality.

## 3. EmoTalk使用文档.docx

Explains in detail how to run the project to generate videos and evaluation metrics.

1. **Download the Docker image**

   > Download the Docker image from the code repository or from Baidu Netdisk: https://pan.baidu.com/s/1ZO7TercF4GeucMflwcy9zw?pwd=jdkv

2. **Load the Docker image**

   > After downloading, load the image in a terminal:
   >
   > ```
   > docker load < /path/to/EmoTalk.tar
   > ```
   >
   > Once loading completes, check that the image was imported with `docker images`.

3. **Run EmoTalk.py**

   > ```
   > docker run --gpus all -v <input_path>:/app/videos -v <output_path>:/app/result -it emotalk bash
   > ```
   >
   > This mounts the input-video and output-result paths into the container; be sure to include `--gpus all`.
   >
   > Inside the container, run
   >
   > ```
   > python EmoTalk.py <path to video>
   > ```
   >
   > On success, it prints the LSE-D and LSE-C metrics.
   >
   > Output files are saved to the output path.
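The steps above can also be scripted non-interactively. A minimal sketch (image name and mount points taken from the commands above; the helper itself and all example paths are illustrative):

```python
import subprocess

def emotalk_cmd(video, input_dir, output_dir, image="emotalk"):
    """Build the docker run invocation from the steps above."""
    return [
        "docker", "run", "--gpus", "all",   # GPU access is required
        "-v", f"{input_dir}:/app/videos",   # mount input videos
        "-v", f"{output_dir}:/app/result",  # mount output directory
        image, "python", "EmoTalk.py", video,
    ]

def run_emotalk(video, input_dir, output_dir):
    """Run EmoTalk in the container; raises CalledProcessError on failure."""
    subprocess.run(emotalk_cmd(video, input_dir, output_dir), check=True)
```

For example, `run_emotalk("/app/videos/clip.mp4", "/data/in", "/data/out")` would process `clip.mp4` and leave the results under `/data/out` on the host.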



54 changes: 54 additions & 0 deletions EmoTalk/code/EmoTalk_win/Dockerfile
@@ -0,0 +1,54 @@
FROM nvidia/cudagl:11.3.1-devel-ubuntu20.04
LABEL maintainer="Jungwoo Choi"

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Seoul

ADD requirements.txt /tmp/requirements.txt
RUN \
# Fix CUDA apt error
rm -f /etc/apt/sources.list.d/cuda.list && \
rm -f /etc/apt/sources.list.d/nvidia-ml.list && \
apt-get update && apt-get install -y gnupg2 software-properties-common && \
apt-key del 7fa2af80 && \
apt-get update && apt-get install -y --no-install-recommends wget && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb && \
dpkg -i cuda-keyring_1.0-1_all.deb && \
# Install Start
apt update && \
add-apt-repository -y ppa:savoury1/ffmpeg4 && \
apt -y install python3.8 python3.8-distutils libgl1-mesa-glx libglib2.0-0 git wget zsh vim openssh-server curl ffmpeg && \
# Python Library
update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1 && \
wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py && \
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113 && \
pip install -r /tmp/requirements.txt && \
# zsh option
chsh -s /bin/zsh && \
sh -c "$(wget -O- https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" && \
# add zsh-autosuggestions, zsh-syntax-highlighting plugin
git clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/custom/plugins/zsh-autosuggestions && \
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ~/.oh-my-zsh/custom/plugins/zsh-syntax-highlighting && \
# Modify .zshrc with Perl
perl -pi -w -e 's/ZSH_THEME=.*/ZSH_THEME="af-magic"/g;' ~/.zshrc && \
perl -pi -w -e 's/plugins=.*/plugins=(git ssh-agent zsh-autosuggestions zsh-syntax-highlighting)/g;' ~/.zshrc && \
# Set ssh id and password, default is id = root, password = root.
# I recommend changing this for more security
# PermitRootLogin : yes - for ssh connection
echo 'root:root' |chpasswd && \
sed -ri 's/^#?PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config && \
sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config && \
mkdir /root/.ssh && \
mkdir /var/run/sshd && \
# install language pack to fix locale issues.
apt-get install -y language-pack-en && update-locale && \
# Clean up
apt-get clean && \
apt-get autoclean && \
apt-get autoremove -y && \
rm -rf /var/lib/cache/* && \
rm -rf /var/lib/log/*

WORKDIR /workspace
# Exec-form CMD requires double-quoted JSON; the stray 'zsh' element was invalid
CMD ["echo", "nvidia/cudagl:11.3.1-devel-ubuntu20.04 is ready!"]
13 changes: 13 additions & 0 deletions EmoTalk/code/EmoTalk_win/LICENSE
@@ -0,0 +1,13 @@
Copyright (c) 2023 Psyche AI Inc.

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, and distribute the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

1. Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

2. NonCommercial — You may not use the material for commercial purposes.

3. No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Binary file added EmoTalk/code/EmoTalk_win/audio/angry1.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/angry2.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/disgust.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/fearful.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/happy.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/malaya.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/sad.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/ted1.wav
Binary file added EmoTalk/code/EmoTalk_win/audio/ted2.wav
4 changes: 4 additions & 0 deletions EmoTalk/code/EmoTalk_win/blender.sh
@@ -0,0 +1,4 @@
wget https://ftp.nluug.nl/pub/graphics/blender/release/Blender3.4/blender-3.4.1-linux-x64.tar.xz
tar -xf blender-3.4.1-linux-x64.tar.xz
mv blender-3.4.1-linux-x64 blender && rm blender-3.4.1-linux-x64.tar.xz

3 changes: 3 additions & 0 deletions EmoTalk/code/EmoTalk_win/blender/README.md
@@ -0,0 +1,3 @@
File shared via Baidu Netdisk: blender.zip
Link: https://pan.baidu.com/s/1O2ZoDuh2IGq43ElTfEnESA?pwd=7ssp
Extraction code: 7ssp
138 changes: 138 additions & 0 deletions EmoTalk/code/EmoTalk_win/demo.py
@@ -0,0 +1,138 @@
import librosa
import numpy as np
import argparse
from scipy.signal import savgol_filter
import torch
from model import EmoTalk
import random
import os, subprocess
import shlex


@torch.no_grad()
def test(args):
    result_path = args.result_path
    os.makedirs(result_path, exist_ok=True)
    # Blink templates: 7-frame blendshape curves for the two eye-blink channels
    eye1 = np.array([0.36537236, 0.950235724, 0.95593375, 0.916715622, 0.367256105, 0.119113259, 0.025357503])
    eye2 = np.array([0.234776169, 0.909951985, 0.944758058, 0.777862132, 0.191071674, 0.235437036, 0.089163929])
    eye3 = np.array([0.870040774, 0.949833691, 0.949418545, 0.695911646, 0.191071674, 0.072576277, 0.007108896])
    eye4 = np.array([0.000307991, 0.556701422, 0.952656746, 0.942345619, 0.425857186, 0.148335218, 0.017659493])
    model = EmoTalk(args)
    model.load_state_dict(torch.load(args.model_path, map_location=torch.device(args.device)), strict=False)
    model = model.to(args.device)
    model.eval()
    wav_path = args.wav_path
    file_name = os.path.splitext(os.path.basename(wav_path))[0]  # robust to Windows path separators
    speech_array, sampling_rate = librosa.load(os.path.join(wav_path), sr=16000)
    audio = torch.FloatTensor(speech_array).unsqueeze(0).to(args.device)
    level = torch.tensor([1]).to(args.device)
    person = torch.tensor([0]).to(args.device)
    prediction = model.predict(audio, level, person)
    prediction = prediction.squeeze().detach().cpu().numpy()
    if args.post_processing:
        output = np.zeros((prediction.shape[0], prediction.shape[1]))
        for i in range(prediction.shape[1]):
            output[:, i] = savgol_filter(prediction[:, i], 5, 2)
        output[:, 8] = 0
        output[:, 9] = 0
        # Insert random blinks at intervals of 60-180 frames
        i = random.randint(0, 60)
        while i < output.shape[0] - 7:
            eye_num = random.randint(1, 4)
            if eye_num == 1:
                output[i:i + 7, 8] = eye1
                output[i:i + 7, 9] = eye1
            elif eye_num == 2:
                output[i:i + 7, 8] = eye2
                output[i:i + 7, 9] = eye2
            elif eye_num == 3:
                output[i:i + 7, 8] = eye3
                output[i:i + 7, 9] = eye3
            else:
                output[i:i + 7, 8] = eye4
                output[i:i + 7, 9] = eye4
            time1 = random.randint(60, 180)
            i = i + time1
        np.save(os.path.join(result_path, "{}.npy".format(file_name)), output)  # with post-processing (smoothing and blinking)
    else:
        np.save(os.path.join(result_path, "{}.npy".format(file_name)), prediction)  # without post-processing


def render_video(args):
    wav_name = os.path.splitext(os.path.basename(args.wav_path))[0]
    image_path = os.path.join(args.result_path, wav_name)
    os.makedirs(image_path, exist_ok=True)
    image_temp = image_path + "/%d.png"
    output_path = os.path.join(args.result_path, wav_name + ".mp4")
    blender_path = args.blender_path
    python_path = "./render.py"
    blend_path = "./render.blend"

    # Resolve the result directory to an absolute path before handing it to Blender
    # (string concatenation of the script dir and "./result/" produced a broken path)
    result_path = os.path.abspath(args.result_path)

    cmd = '{} -t 64 -b {} -P {} -- "{}" "{}" '.format(blender_path, blend_path, python_path, result_path, wav_name)
    cmd = shlex.split(cmd)
    p = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while p.poll() is None:
        line = p.stdout.readline()
        line = line.strip()
        if line:
            print('[{}]'.format(line))
    if p.returncode == 0:
        print('Subprogram success')
    else:
        print('Subprogram failed')

    # If the operating system is Windows
    try:
        ffmpeg_cmd = [
            'ffmpeg', '-r', '30', '-i', image_temp, '-i', args.wav_path,
            '-pix_fmt', 'yuv420p', '-s', '512x768', output_path, '-y'
        ]
        ffmpeg_cmd_str = ' '.join(ffmpeg_cmd)
        subprocess.run(ffmpeg_cmd_str, check=True, shell=True)

        rm_cmd = ['rmdir', '/S', '/Q', f'"{image_path}"']
        rm_cmd_str = ' '.join(rm_cmd)
        subprocess.run(rm_cmd_str, check=True, shell=True)

    except subprocess.CalledProcessError as e:
        print(f"Command failed with return code {e.returncode}: {e.cmd}")
        if e.output:
            print(f"Output: {e.output}")
        if e.stderr:
            print(f"Error: {e.stderr}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

    # Linux equivalents:
    # cmd = 'ffmpeg -r 30 -i "{}" -i "{}" -pix_fmt yuv420p -s 512x768 "{}" -y'.format(image_temp, args.wav_path, output_path)
    # subprocess.call(cmd, shell=True)
    #
    # cmd = 'rm -rf "{}"'.format(image_path)
    # subprocess.call(cmd, shell=True)


def main():
    parser = argparse.ArgumentParser(
        description='EmoTalk: Speech-driven Emotional Disentanglement for 3D Face Animation')
    parser.add_argument("--wav_path", type=str, default="./audio/angry1.wav", help='path of the test data')
    parser.add_argument("--bs_dim", type=int, default=52, help='number of blendshapes: 52')
    parser.add_argument("--feature_dim", type=int, default=832, help='number of feature dims')
    parser.add_argument("--period", type=int, default=30, help='period length')
    parser.add_argument("--device", type=str, default="cuda", help='device')
    parser.add_argument("--model_path", type=str, default="./pretrain_model/EmoTalk.pth",
                        help='path of the trained models')
    parser.add_argument("--result_path", type=str, default="./result/", help='path of the result')
    parser.add_argument("--max_seq_len", type=int, default=5000, help='max sequence length')
    parser.add_argument("--num_workers", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=1)
    # argparse's type=bool treats any non-empty string as True, so parse explicitly
    parser.add_argument("--post_processing", type=lambda s: str(s).lower() not in ("0", "false", "no"),
                        default=True, help='whether to use post processing')
    parser.add_argument("--blender_path", type=str, default="./blender/blender", help='path of blender')
    args = parser.parse_args()
    test(args)
    render_video(args)


if __name__ == "__main__":
    main()
Binary file added EmoTalk/code/EmoTalk_win/images/media/image1.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image13.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image14.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image16.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image3.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image4.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image5.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image6.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image8.png
Binary file added EmoTalk/code/EmoTalk_win/images/media/image9.png