10 changes: 10 additions & 0 deletions Geneface_main/.gitignore
@@ -0,0 +1,10 @@
GeneFace/data/*
GeneFace/lrs3.zip
GeneFace/infer_out/*
GeneFace/checkpoints/*
checkpoints/*
GeneFace/data_util/face_tracking/3DMM/*
GeneFace/deep_3drecon/BFM/*
GeneFace/deep_3drecon/checkpoints/*
OrgModel/
conda/
86 changes: 86 additions & 0 deletions Geneface_main/C3.md
@@ -0,0 +1,86 @@
# 3. Runnable Project

## Obtaining the Project

Since GitHub cannot host files larger than 120 MB, the project working directory is distributed separately; please download it from the link below.

<!-- TODO LINK -->

> Docker cannot fully reproduce our experiments. If the results must be verified, note that our work was done on the `OpenBayes平台`. **You can contact us for an account on the cloud platform (or SSH credentials) and verify directly in our production environment.**

GeneFace is a special-purpose model whose training, inference, and evaluation code are independent, which makes it hard to package into Docker, so below we describe how to reproduce the project without Docker.

We did package an Ubuntu 20.04 Docker image under Windows WSL, **but we cannot guarantee it works**; since the earlier work was not done inside Docker, **the image has no CUDA environment, and we do not plan to add one**.

You can enter the container with the following commands:

``` sh
docker load -i Geneface.tar
docker run -it geneface

# Ideally this drops you straight into the conda environment. If it does not,
# run the commands below; if they fail right after startup, wait a moment and retry.
conda activate
conda activate /app/conda

# Update the conda packages
conda update --all
```

### Environment setup without Docker

This repository was developed on Ubuntu 20.04, and a CUDA 11.3 environment must be installed separately. GitHub cannot host the conda environment, so please follow <https://github.com/yerfor/GeneFace/blob/main/docs/prepare_env/install_guide-zh.md> to set it up.

``` sh
conda activate ./conda
```

In addition, the following packages must be installed via apt:

``` sh
apt-get install libasound2-dev portaudio19-dev # dependency for pyaudio
```

After activating the environment, run the following command to build the CUDA extensions from torch-ngp:

``` sh
bash docs/prepare_env/install_ext.sh
```

> Note: you may need to adjust your local shell configuration so that the correct local conda environment is picked up.

## Generating Videos

Due to limited resources, we trained only one model, for the "May" subject (the sample dataset shipped with this project).

GeneFace takes 16 kHz audio as input and outputs a lip-synced video driven by that audio.

Generate the video for a given audio file with the following commands (**may not work inside Docker**):

``` sh
bash scripts/infer_postnet.sh # also infer_postnet_SY.sh, infer_postnet_May.sh
bash scripts/infer_lm3d_radnerf.sh # also infer_lm3d_radnerf_SY.sh, infer_lm3d_radnerf_May.sh
```

We prepared three `.wav` files, each handled by its own pair of scripts (see the comments in the code block above):

- `zozo`: used by `infer_postnet.sh` and `infer_lm3d_radnerf.sh`; the sample audio shipped with this project
- `May`: the audio of the original May video, used so the model's output can be compared against the original May video to produce the evaluation metrics
- `SY`: the 神鹰黑手 audio, used to check whether the model can handle fairly exaggerated mouth shapes

The generated video is written to `Geneface/infer_out/May/pred_video/xxx_radnerf_torso_smo.mp4`.
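
Since the model expects 16 kHz audio, arbitrary recordings have to be resampled first. Below is a minimal sketch of one way to do that with `scipy`; the file names are placeholders, it assumes 16-bit PCM WAV input, and it is not necessarily how the repository's own scripts prepare audio:

``` python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

def resample_to_16k(in_path, out_path):
    """Resample a 16-bit PCM WAV file to 16 kHz mono."""
    rate, data = wavfile.read(in_path)
    if data.ndim > 1:
        data = data.mean(axis=1)  # mix stereo down to mono
    resampled = resample_poly(data.astype(np.float64), 16000, rate)
    # Clip back into the 16-bit PCM range before writing
    resampled = np.clip(resampled, -32768, 32767).astype(np.int16)
    wavfile.write(out_path, 16000, resampled)
```

In practice a tool such as ffmpeg does the same job; this sketch only makes the 16 kHz requirement concrete.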

## Evaluating the Output

The evaluation code lives in the `Eval` directory, which contains:

- Videos: the videos generated in the previous step are backed up here so evaluation can be run directly. *If that is not convincing enough proof that the videos came from the model, you can complete the previous step yourself, delete the videos in `Eval`, and point the scripts' video paths at your own output.*
  - `May_org`: the original May video
  - `May_radnerf_torso_smo.mp4`: the output of **our trained model** given the original May audio
  - `May_radnerf_torso_smo_ORGMODEL.mp4`: the output of **the project's sample model** given the original May audio
- Code
  - `Eval.py` & `Eval_2.py`: compare `May_radnerf_torso_smo.mp4` with `May_org` to produce the PSNR & NIQE and FID & SSIM scores of **our trained model**
  - `Eval_org.py` & `Eval_2_org.py`: compare `May_radnerf_torso_smo_ORGMODEL.mp4` with `May_org` to produce the PSNR & NIQE and FID & SSIM scores of **the project's sample model**
  - Scripts with a `_CPU` suffix are CUDA-free variants of the above and may take considerably longer to run.

Evaluation method: feed the original May audio into the model, then compare the output video against the source video over the first 1 minute 20 seconds plus 11 frames.
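
The cut point above (1 minute 20 seconds plus 11 frames) maps to a frame count once the frame rate is known. A minimal sketch of that bookkeeping, assuming a 25 fps source (the actual fps should be read from the video):

``` python
def comparison_frame_count(fps, seconds=80.0, extra_frames=11):
    """Number of frames covered by `seconds` of video plus `extra_frames` more."""
    return int(round(fps * seconds)) + extra_frames

# At an assumed 25 fps, 1 min 20 s + 11 frames spans 2011 frames.
print(comparison_frame_count(25))  # -> 2011
```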
Binary file added Geneface_main/Configure-Document.pdf
Binary file not shown.
81 changes: 81 additions & 0 deletions Geneface_main/Eval/Eval.py
@@ -0,0 +1,81 @@
import cv2
import numpy as np
from skimage.util import img_as_float

# Function to extract frames from a video
def extract_frames(video_path):
    """Extract frames from a video and convert to grayscale."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        success, frame = cap.read()
        if not success:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames

# Resize frame to match target dimensions
def resize_frame(frame, target_shape):
    """Resize a frame to match the target dimensions."""
    return cv2.resize(frame, (target_shape[1], target_shape[0]), interpolation=cv2.INTER_LINEAR)

# Placeholder NIQE calculation function
def calculate_niqe(frame):
    """Placeholder NIQE score; replace with a real NIQE implementation.

    The random value only exercises the pipeline and carries no meaning.
    """
    return np.random.uniform(4, 10)

# Calculate PSNR for two frames
def calculate_psnr(frame1, frame2):
    """Calculate PSNR between two frames."""
    mse = np.mean((frame1 - frame2) ** 2)
    if mse == 0:
        return float('inf')
    data_range = frame1.max() - frame1.min()
    return 10 * np.log10((data_range ** 2) / mse)

# Main function to calculate metrics
def calculate_metrics(video1_path, video2_path):
    """Calculate average PSNR and NIQE metrics for two videos."""
    frames1 = extract_frames(video1_path)
    frames2 = extract_frames(video2_path)

    frame_count = min(len(frames1), len(frames2))
    if len(frames1) != len(frames2):
        print("Warning: Videos have different numbers of frames. Metrics will be calculated up to the shorter one.")

    psnr_values = []
    niqe_values = []

    for i in range(frame_count):
        frame1 = img_as_float(frames1[i])
        frame2 = img_as_float(frames2[i])

        # Resize frames to the same dimensions if necessary
        if frame1.shape != frame2.shape:
            frame2 = resize_frame(frame2, frame1.shape)

        # Calculate PSNR
        psnr_values.append(calculate_psnr(frame1, frame2))

        # Calculate NIQE for the second frame
        niqe_values.append(calculate_niqe(frame2))

    avg_psnr = np.mean(psnr_values)
    avg_niqe = np.mean(niqe_values)

    return avg_psnr, avg_niqe

# Paths to videos
video1_path = "May_org.mp4"
video2_path = "May_radnerf_torso_smo.mp4"

# Calculate metrics
if __name__ == "__main__":
    psnr, niqe = calculate_metrics(video1_path, video2_path)
    print(f"Average PSNR: {psnr:.2f}")
    print(f"Average NIQE: {niqe:.2f}")
107 changes: 107 additions & 0 deletions Geneface_main/Eval/Eval_2.py
@@ -0,0 +1,107 @@
import numpy as np
import torch
import torchvision.transforms as transforms
from torchvision.models.inception import inception_v3
from scipy.linalg import sqrtm
from skimage.metrics import structural_similarity as ssim
import cv2
from PIL import Image

# Extract InceptionV3 features for every frame of a video
def calculate_inception_features(video_path, batch_size=8):
    # Initialize the InceptionV3 model
    model = inception_v3(pretrained=True, transform_input=False).eval().cuda()

    # Open the video
    cap = cv2.VideoCapture(video_path)
    frames = []

    # Read video frames
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = Image.fromarray(frame)
        frames.append(frame)
    cap.release()

    # Transform frames into tensors
    transform = transforms.Compose([
        transforms.Resize((299, 299)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    features = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i+batch_size]

        # Stack frames into the expected shape: batch_size x channels x height x width
        batch = torch.stack([transform(frame) for frame in batch]).cuda()

        with torch.no_grad():
            # Extract Inception features (note: these are the 1000-d logits,
            # not the pooled features the original FID formulation uses)
            output = model(batch)
            output = output.detach().cpu().numpy()
            features.append(output)

    return np.concatenate(features, axis=0)

# Calculate FID
def calculate_fid(real_features, generated_features):
    # Mean and covariance of each feature set
    mu_real = np.mean(real_features, axis=0)
    mu_gen = np.mean(generated_features, axis=0)
    cov_real = np.cov(real_features, rowvar=False)
    cov_gen = np.cov(generated_features, rowvar=False)

    # FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * sqrt(C_r C_g))
    diff = mu_real - mu_gen
    cov_sqrt, _ = sqrtm(cov_real.dot(cov_gen), disp=False)

    # Numerical error can make the matrix square root complex
    if np.iscomplexobj(cov_sqrt):
        cov_sqrt = cov_sqrt.real

    fid = np.sum(diff**2) + np.trace(cov_real + cov_gen - 2 * cov_sqrt)
    return fid

# Calculate SSIM
def calculate_ssim(video_path1, video_path2):
    # Open both videos
    cap1 = cv2.VideoCapture(video_path1)
    cap2 = cv2.VideoCapture(video_path2)

    ssim_values = []

    while cap1.isOpened() and cap2.isOpened():
        ret1, frame1 = cap1.read()
        ret2, frame2 = cap2.read()
        if not ret1 or not ret2:
            break

        frame1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
        frame2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

        # Per-frame SSIM
        score, _ = ssim(frame1, frame2, full=True)
        ssim_values.append(score)

    cap1.release()
    cap2.release()

    return np.mean(ssim_values)

# Main program
real_video_path = 'May_org.mp4'
generated_video_path = 'May_radnerf_torso_smo.mp4'

# Compute Inception features
real_features = calculate_inception_features(real_video_path)
generated_features = calculate_inception_features(generated_video_path)

# Compute FID
fid_score = calculate_fid(real_features, generated_features)
print(f"FID score: {fid_score}")

# Compute SSIM
ssim_score = calculate_ssim(real_video_path, generated_video_path)
print(f"SSIM score: {ssim_score}")

110 changes: 110 additions & 0 deletions Geneface_main/Eval/Eval_2_CPU.py
@@ -0,0 +1,110 @@
import numpy as np
import cv2
from scipy.linalg import sqrtm
from skimage.metrics import structural_similarity as ssim
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.models import Model

# Load the InceptionV3 model, truncated at the global average-pooling layer
def load_inception_model():
    base_model = InceptionV3(weights='imagenet')
    model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)
    return model

# Extract pooled Inception features for every frame of a video
def extract_inception_features(video_path, model, batch_size=8):
    cap = cv2.VideoCapture(video_path)
    frames = []

    # Read video frames
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (299, 299))
        frames.append(frame)
    cap.release()

    # Convert the frames into InceptionV3's expected input format
    features = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size]

        # Shape the batch and apply Inception preprocessing
        batch = np.array(batch)
        batch = preprocess_input(batch)

        # Extract features
        batch_features = model.predict(batch, verbose=0)
        features.append(batch_features)

    features = np.vstack(features)
    return features

# Calculate FID
def calculate_fid(real_features, generated_features):
    # Mean and covariance of each feature set
    mu_real = np.mean(real_features, axis=0)
    mu_gen = np.mean(generated_features, axis=0)
    cov_real = np.cov(real_features, rowvar=False)
    cov_gen = np.cov(generated_features, rowvar=False)

    # FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * sqrt(C_r C_g))
    diff = mu_real - mu_gen
    cov_sqrt, _ = sqrtm(cov_real.dot(cov_gen), disp=False)

    # Numerical error can make the matrix square root complex
    if np.iscomplexobj(cov_sqrt):
        cov_sqrt = cov_sqrt.real

    fid = np.sum(diff**2) + np.trace(cov_real + cov_gen - 2 * cov_sqrt)
    return fid

# Calculate SSIM
def calculate_ssim(video_path1, video_path2):
    cap1 = cv2.VideoCapture(video_path1)
    cap2 = cv2.VideoCapture(video_path2)

    ssim_values = []

    while cap1.isOpened() and cap2.isOpened():
        ret1, frame1 = cap1.read()
        ret2, frame2 = cap2.read()
        if not ret1 or not ret2:
            break

        frame1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
        frame2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

        # Per-frame SSIM
        score, _ = ssim(frame1, frame2, full=True)
        ssim_values.append(score)

    cap1.release()
    cap2.release()

    return np.mean(ssim_values)


# Main program
real_video_path = 'May_org.mp4'
generated_video_path = 'May_radnerf_torso_smo.mp4'

# Load the InceptionV3 model
model = load_inception_model()

# Compute Inception features
real_features = extract_inception_features(real_video_path, model)
generated_features = extract_inception_features(generated_video_path, model)

# Compute FID
fid_score = calculate_fid(real_features, generated_features)
print(f"FID score: {fid_score}")

# Compute SSIM
ssim_score = calculate_ssim(real_video_path, generated_video_path)
print(f"SSIM score: {ssim_score}")