diff --git a/hallo_root/Innovation/LMD_evaluate/0.gif b/hallo_root/Innovation/LMD_evaluate/0.gif
new file mode 100644
index 00000000..63fd2b87
Binary files /dev/null and b/hallo_root/Innovation/LMD_evaluate/0.gif differ
diff --git a/hallo_root/Innovation/LMD_evaluate/1.gif b/hallo_root/Innovation/LMD_evaluate/1.gif
new file mode 100644
index 00000000..0fef7b33
Binary files /dev/null and b/hallo_root/Innovation/LMD_evaluate/1.gif differ
diff --git a/hallo_root/Innovation/LMD_evaluate/Readme.md b/hallo_root/Innovation/LMD_evaluate/Readme.md
new file mode 100644
index 00000000..cd43772a
--- /dev/null
+++ b/hallo_root/Innovation/LMD_evaluate/Readme.md
@@ -0,0 +1,28 @@
+# Setup
+
+We recommend using Anaconda to build a virtual environment. If the dlib build fails, check whether cmake is installed: dlib is built on top of cmake.
+
+# Algorithm
+
+## How it works
+The core idea of LMD (Landmark Distance) is to quantify the gap between a generated image (or video frame) and the corresponding ground-truth image by comparing their landmarks. The landmarks are usually facial keypoints, such as the positions of the eyes, nose, and mouth. The algorithm consists of the following steps:
+1. Landmark extraction
+Use a pretrained facial landmark detector (dlib with the shape_predictor_68_face_landmarks.dat weights) to extract a set of facial landmark coordinates from both the generated image and the ground-truth image. Each landmark is a point $(x, y)$ in the 2D plane.
+2. Distance computation
+   The Euclidean distance between each pair of corresponding landmarks measures how closely the generated result matches the ground truth.
+ $$
+ d_i = \sqrt{(x_i^{\text{gen}} - x_i^{\text{real}})^2 + (y_i^{\text{gen}} - y_i^{\text{real}})^2}
+ $$
+3. Averaging
+   Average the distances over all landmarks to obtain the LMD value:
+ $$
+ \text{LMD} = \frac{1}{N} \sum_{i=1}^{N} d_i
+ $$
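
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the distance and averaging steps only; it assumes two aligned `(68, 2)` landmark arrays have already been extracted (e.g. with dlib), and the helper name `landmark_distance` and the toy data are our own:

```python
import numpy as np

def landmark_distance(gen_pts: np.ndarray, real_pts: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding (N, 2) landmark sets."""
    assert gen_pts.shape == real_pts.shape
    d = np.sqrt(((gen_pts - real_pts) ** 2).sum(axis=1))  # d_i for each landmark
    return float(d.mean())                                # LMD = mean of all d_i

# Toy check: shift 68 landmarks by (3, 4), so every d_i = 5
real = np.arange(136, dtype=float).reshape(68, 2)
gen = real + np.array([3.0, 4.0])
print(landmark_distance(gen, real))  # 5.0
```

For a video, this per-frame value is averaged again over all frames to get the final score.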
+## Test results
+We evaluated jae-in.mp4 from the test set and obtained the following result:
+
+Ideally, a lower LMD is better: it means the generated result is closer to the ground truth. The measured value of 36.08 is on the high side, indicating a sizeable gap between the generated facial motion and expressions and the real video.
+## Analysis
+In the literature, the LMD usually needs to fall below some threshold (roughly 10-15) for the generated result to be considered realistic, yet the LMD values we measured on videos generated by the hallo model are consistently large. The reason is that LMD compares the relative positions of the facial features in every frame, while the hallo model generates the entire video from a single image, so the generated face is nearly static. Whenever the subject's head moves substantially in the source video, the LMD is therefore inflated.
+
+
\ No newline at end of file
diff --git a/hallo_root/Innovation/LMD_evaluate/image.png b/hallo_root/Innovation/LMD_evaluate/image.png
new file mode 100644
index 00000000..6b339eb8
Binary files /dev/null and b/hallo_root/Innovation/LMD_evaluate/image.png differ
diff --git a/hallo_root/README.md b/hallo_root/README.md
new file mode 100644
index 00000000..be4d47cb
--- /dev/null
+++ b/hallo_root/README.md
@@ -0,0 +1,16 @@
+# evaluate_root
+main.py in the Evaluate directory is the unified entry point for computing the evaluation metrics; run this Python file to calculate each of them.
+
+Under the MP4 directory, Hallo holds the videos generated by the Hallo project and Source holds the original videos.
+
+Dockerfile is the build file for the evaluation image.
+
+README.md describes how to obtain and use the evaluation image.
+
+# image
+Dockerfile is the build file for the image of the runnable Hallo project.
+
+README.md describes how to obtain and use the runnable Hallo project.
+
+# Innovation
+The LMD_evaluate directory contains the documentation for LMD, the new evaluation metric our team adopted.
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Dockerfile b/hallo_root/evaluate_root/Dockerfile
new file mode 100644
index 00000000..2e12d6bf
--- /dev/null
+++ b/hallo_root/evaluate_root/Dockerfile
@@ -0,0 +1,35 @@
+# Use an NVIDIA CUDA image as the base image
+# FROM nvidia/cuda:12.7-base-ubuntu22.04
+# FROM nvidia/cuda:12.7-devel-ubuntu22.04
+FROM nvidia/cuda:11.8.0-base-ubuntu22.04
+
+# Set environment variables to avoid interactive prompts during install
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install the Python environment and other dependencies
+RUN apt-get update && apt-get install -y \
+ python3.10 \
+ python3-pip \
+ build-essential \
+ cmake \
+ g++ \
+ libopenblas-dev \
+ libopencv-dev \
+ liblapack-dev \
+ libboost-all-dev \
+ git \
+ ffmpeg \
+ && apt-get clean
+
+# Set the working directory
+WORKDIR /app
+
+# Copy the project files into the container
+COPY . /app/
+
+# Install Python packages
+RUN pip install --upgrade pip setuptools wheel
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Install InsightFace
+RUN pip install insightface
diff --git a/hallo_root/evaluate_root/Evaluate.xlsx b/hallo_root/evaluate_root/Evaluate.xlsx
new file mode 100644
index 00000000..23f873a2
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate.xlsx differ
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/.gitignore b/hallo_root/evaluate_root/Evaluate/.idea/.gitignore
new file mode 100644
index 00000000..359bb530
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/.gitignore
@@ -0,0 +1,3 @@
+# Default ignored files
+/shelf/
+/workspace.xml
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/.name b/hallo_root/evaluate_root/Evaluate/.idea/.name
new file mode 100644
index 00000000..e6709194
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/.name
@@ -0,0 +1 @@
+niqe.py
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/TestPython.iml b/hallo_root/evaluate_root/Evaluate/.idea/TestPython.iml
new file mode 100644
index 00000000..74d515a0
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/TestPython.iml
@@ -0,0 +1,10 @@
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/image_metrics.iml b/hallo_root/evaluate_root/Evaluate/.idea/image_metrics.iml
new file mode 100644
index 00000000..8b8c3954
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/image_metrics.iml
@@ -0,0 +1,12 @@
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/inspectionProfiles/profiles_settings.xml b/hallo_root/evaluate_root/Evaluate/.idea/inspectionProfiles/profiles_settings.xml
new file mode 100644
index 00000000..105ce2da
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/inspectionProfiles/profiles_settings.xml
@@ -0,0 +1,6 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/misc.xml b/hallo_root/evaluate_root/Evaluate/.idea/misc.xml
new file mode 100644
index 00000000..7a29dd83
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/misc.xml
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/modules.xml b/hallo_root/evaluate_root/Evaluate/.idea/modules.xml
new file mode 100644
index 00000000..26e7734e
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/modules.xml
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/.idea/vcs.xml b/hallo_root/evaluate_root/Evaluate/.idea/vcs.xml
new file mode 100644
index 00000000..8fe5bdbd
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/.idea/vcs.xml
@@ -0,0 +1,7 @@
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/README.md b/hallo_root/evaluate_root/Evaluate/README.md
new file mode 100644
index 00000000..7e1ed622
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/README.md
@@ -0,0 +1,2 @@
+# image_metrics
+FID, LPIPS, NIQE, PSNR, SSIM
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/fid.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/fid.cpython-310.pyc
new file mode 100644
index 00000000..7f17c9f3
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/fid.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/imd.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/imd.cpython-310.pyc
new file mode 100644
index 00000000..668bf36e
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/imd.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/inception.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/inception.cpython-310.pyc
new file mode 100644
index 00000000..426e3646
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/inception.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/lmd.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/lmd.cpython-310.pyc
new file mode 100644
index 00000000..5c0ab3ec
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/lmd.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/niqe.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/niqe.cpython-310.pyc
new file mode 100644
index 00000000..65ee788b
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/niqe.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/niqe.cpython-312.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/niqe.cpython-312.pyc
new file mode 100644
index 00000000..0aae36e0
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/niqe.cpython-312.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/psnr_ssim.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/psnr_ssim.cpython-310.pyc
new file mode 100644
index 00000000..236d513f
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/psnr_ssim.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/utils.cpython-310.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/utils.cpython-310.pyc
new file mode 100644
index 00000000..0acf984b
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/utils.cpython-310.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/__pycache__/utils.cpython-311.pyc b/hallo_root/evaluate_root/Evaluate/__pycache__/utils.cpython-311.pyc
new file mode 100644
index 00000000..381de21a
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/__pycache__/utils.cpython-311.pyc differ
diff --git a/hallo_root/evaluate_root/Evaluate/cal_lpips.py b/hallo_root/evaluate_root/Evaluate/cal_lpips.py
new file mode 100644
index 00000000..c52ebb40
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/cal_lpips.py
@@ -0,0 +1,33 @@
+import lpips
+from utils import tensor2img, img2tensor
+
+'''
+https://github.com/richzhang/PerceptualSimilarity
+
+@inproceedings{zhang2018perceptual,
+ title={The Unreasonable Effectiveness of Deep Features as a Perceptual Metric},
+ author={Zhang, Richard and Isola, Phillip and Efros, Alexei A and Shechtman, Eli and Wang, Oliver},
+ booktitle={CVPR},
+ year={2018}
+}
+'''
+
+
+loss_fn_alex = lpips.LPIPS(net='alex') # best forward scores
+loss_fn_vgg = lpips.LPIPS(net='vgg') # closer to "traditional" perceptual loss, when used for optimization
+
+if __name__ =='__main__':
+ import torch
+
+ img0 = torch.randn(1, 3, 64, 64) # image should be RGB, IMPORTANT: normalized to [-1,1]
+ img1 = torch.randn(1, 3, 64, 64)
+ d = loss_fn_alex(img0, img1)
+
+ print(d)
+
+ from skimage import io
+
+ clean = io.imread('clean/2762.png')
+ noisy = io.imread('noisy/2762.png')
+ print(loss_fn_alex(img2tensor(clean), img2tensor(noisy)))
+ print(loss_fn_vgg(img2tensor(clean), img2tensor(noisy)))
diff --git a/hallo_root/evaluate_root/Evaluate/fid.py b/hallo_root/evaluate_root/Evaluate/fid.py
new file mode 100644
index 00000000..803822b9
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/fid.py
@@ -0,0 +1,304 @@
+#!/usr/bin/env python3
+"""Calculates the Frechet Inception Distance (FID) to evaluate GANs
+The FID metric calculates the distance between two distributions of images.
+Typically, we have summary statistics (mean & covariance matrix) of one
+of these distributions, while the 2nd distribution is given by a GAN.
+When run as a stand-alone program, it compares the distribution of
+images that are stored as PNG/JPEG at a specified location with a
+distribution given by summary statistics (in pickle format).
+The FID is calculated by assuming that X_1 and X_2 are the activations of
+the pool_3 layer of the inception net for generated samples and real world
+samples respectively.
+See --help to see further details.
+Code adapted from https://github.com/bioinf-jku/TTUR to use PyTorch instead
+of Tensorflow
+Copyright 2018 Institute of Bioinformatics, JKU Linz
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+import os
+import pathlib
+from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
+
+import torch
+import numpy as np
+# from scipy.misc import imread
+from skimage import io
+from scipy import linalg
+from torch.autograd import Variable
+from torch.nn.functional import adaptive_avg_pool2d
+
+from inception import InceptionV3
+
+import cv2
+
+
+def get_activations(images, model, batch_size=64, dims=2048,
+ cuda=False, verbose=False):
+ """Calculates the activations of the pool_3 layer for all images.
+ Params:
+ -- images : Numpy array of dimension (n_images, 3, hi, wi). The values
+ must lie between 0 and 1.
+ -- model : Instance of inception model
+ -- batch_size : the images numpy array is split into batches with
+ batch size batch_size. A reasonable batch size depends
+ on the hardware.
+ -- dims : Dimensionality of features returned by Inception
+ -- cuda : If set to True, use GPU
+ -- verbose : If set to True and parameter out_step is given, the number
+ of calculated batches is reported.
+ Returns:
+ -- A numpy array of dimension (num images, dims) that contains the
+ activations of the given tensor when feeding inception with the
+ query tensor.
+ """
+ model.eval()
+
+ d0 = images.shape[0]
+ if batch_size > d0:
+ print(('Warning: batch size is bigger than the data size. '
+ 'Setting batch size to data size'))
+ batch_size = d0
+
+ n_batches = d0 // batch_size
+ n_used_imgs = n_batches * batch_size
+
+ pred_arr = np.empty((n_used_imgs, dims))
+ for i in range(n_batches):
+ if verbose:
+ print('\rPropagating batch %d/%d' % (i + 1, n_batches),
+ end='', flush=True)
+ start = i * batch_size
+ end = start + batch_size
+
+        batch = torch.from_numpy(images[start:end]).type(torch.FloatTensor)
+        if cuda:
+            batch = batch.cuda()
+
+        with torch.no_grad():  # Variable(..., volatile=True) is deprecated
+            pred = model(batch)[0]
+
+ # If model output is not scalar, apply global spatial average pooling.
+ # This happens if you choose a dimensionality not equal 2048.
+ if pred.shape[2] != 1 or pred.shape[3] != 1:
+ pred = adaptive_avg_pool2d(pred, output_size=(1, 1))
+
+ pred_arr[start:end] = pred.cpu().data.numpy().reshape(batch_size, -1)
+
+ if verbose:
+ print(' done')
+
+ return pred_arr
+
+
+def calculate_frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
+ """Numpy implementation of the Frechet Distance.
+ The Frechet distance between two multivariate Gaussians X_1 ~ N(mu_1, C_1)
+ and X_2 ~ N(mu_2, C_2) is
+ d^2 = ||mu_1 - mu_2||^2 + Tr(C_1 + C_2 - 2*sqrt(C_1*C_2)).
+ Stable version by Dougal J. Sutherland.
+ Params:
+ -- mu1 : Numpy array containing the activations of a layer of the
+ inception net (like returned by the function 'get_predictions')
+ for generated samples.
+    -- mu2   : The sample mean over activations, precalculated on a
+               representative data set.
+    -- sigma1: The covariance matrix over activations for generated samples.
+    -- sigma2: The covariance matrix over activations, precalculated on a
+               representative data set.
+ Returns:
+ -- : The Frechet Distance.
+ """
+
+ mu1 = np.atleast_1d(mu1)
+ mu2 = np.atleast_1d(mu2)
+
+ sigma1 = np.atleast_2d(sigma1)
+ sigma2 = np.atleast_2d(sigma2)
+
+ assert mu1.shape == mu2.shape, \
+ 'Training and test mean vectors have different lengths'
+ assert sigma1.shape == sigma2.shape, \
+ 'Training and test covariances have different dimensions'
+
+ diff = mu1 - mu2
+
+ # Product might be almost singular
+ covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
+ if not np.isfinite(covmean).all():
+ msg = ('fid calculation produces singular product; '
+ 'adding %s to diagonal of cov estimates') % eps
+ print(msg)
+ offset = np.eye(sigma1.shape[0]) * eps
+ covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))
+
+ # Numerical error might give slight imaginary component
+ if np.iscomplexobj(covmean):
+ if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
+ m = np.max(np.abs(covmean.imag))
+ raise ValueError('Imaginary component {}'.format(m))
+ covmean = covmean.real
+
+ tr_covmean = np.trace(covmean)
+
+ return (diff.dot(diff) + np.trace(sigma1) +
+ np.trace(sigma2) - 2 * tr_covmean)
+
+
+def calculate_activation_statistics(images, model, batch_size=64,
+ dims=2048, cuda=False, verbose=False):
+ """Calculation of the statistics used by the FID.
+ Params:
+ -- images : Numpy array of dimension (n_images, 3, hi, wi). The values
+ must lie between 0 and 1.
+ -- model : Instance of inception model
+ -- batch_size : The images numpy array is split into batches with
+ batch size batch_size. A reasonable batch size
+ depends on the hardware.
+ -- dims : Dimensionality of features returned by Inception
+ -- cuda : If set to True, use GPU
+ -- verbose : If set to True and parameter out_step is given, the
+ number of calculated batches is reported.
+ Returns:
+ -- mu : The mean over samples of the activations of the pool_3 layer of
+ the inception model.
+ -- sigma : The covariance matrix of the activations of the pool_3 layer of
+ the inception model.
+ """
+ act = get_activations(images, model, batch_size, dims, cuda, verbose)
+ mu = np.mean(act, axis=0)
+ sigma = np.cov(act, rowvar=False)
+ return mu, sigma
+
+
+def _compute_statistics_of_path(path, model, batch_size, dims, cuda):
+ if path.endswith('.npz'):
+ f = np.load(path)
+ m, s = f['mu'][:], f['sigma'][:]
+ f.close()
+ else:
+ path = pathlib.Path(path)
+ files = list(path.glob('*.jpg')) + list(path.glob('*.png'))
+
+ imgs = np.array([io.imread(str(fn)).astype(np.float32) for fn in files])
+
+ # for gray images, expand image dims
+        if imgs.ndim == 3 and 1 not in imgs.shape:
+ imgs = imgs[:,:,:,np.newaxis]
+ imgs = np.repeat(imgs, repeats=3, axis=3)
+
+ # Bring images to shape (B, 3, H, W)
+ imgs = imgs.transpose((0, 3, 1, 2))
+
+ # Rescale images to be between 0 and 1
+ imgs /= 255
+
+ m, s = calculate_activation_statistics(imgs, model, batch_size,
+ dims, cuda)
+
+ return m, s
+
+
+def calculate_fid_given_paths(path1, path2, batch_size, cuda, dims):
+ """Calculates the FID of two paths"""
+ if not os.path.exists(path1):
+ raise RuntimeError('Invalid path: %s' % path1)
+ if not os.path.exists(path2):
+ raise RuntimeError('Invalid path: %s' % path2)
+
+ block_idx = InceptionV3.BLOCK_INDEX_BY_DIM[dims]
+
+ model = InceptionV3([block_idx])
+ if cuda:
+ model.cuda()
+
+ m1, s1 = _compute_statistics_of_path(path1, model, batch_size,
+ dims, cuda)
+ m2, s2 = _compute_statistics_of_path(path2, model, batch_size,
+ dims, cuda)
+ fid_value = calculate_frechet_distance(m1, s1, m2, s2)
+
+ return fid_value
+
+def img_scissors(img, origin_size, dest_size):  # center-crop img to origin_size x origin_size, then resize to dest_size x dest_size
+ from skimage import io, transform
+ from skimage.util import img_as_ubyte
+
+ height, width = img.shape[:2]
+ center_y, center_x = height // 2, width // 2
+ crop_size = origin_size
+ half_crop = crop_size // 2
+ start_x = max(center_x - half_crop, 0)
+ start_y = max(center_y - half_crop, 0)
+ end_x = min(center_x + half_crop, width)
+ end_y = min(center_y + half_crop, height)
+ cropped_img = img[start_y:end_y, start_x:end_x]
+ resized_img = transform.resize(cropped_img, (dest_size, dest_size), anti_aliasing=True)
+ resized_img_ubyte = img_as_ubyte(resized_img)
+ return resized_img_ubyte
+
+if __name__ == '__main__':
+
+ from skimage import io
+
+ clean = 'ImgsForFIDCalcu/Jae-in/origin'
+ noisy = 'ImgsForFIDCalcu/Jae-in/result'
+
+ # parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
+ parser = ArgumentParser()
+ parser.add_argument('--path1', type=str, default=clean,
+ help='Path to the generated images or to .npz statistic files')
+ parser.add_argument('--path2', type=str, default=noisy,
+ help='Path to the generated images or to .npz statistic files')
+
+ parser.add_argument('--batch-size', type=int, default=64,
+ help='Batch size to use')
+ parser.add_argument('--dims', type=int, default=2048,
+ choices=list(InceptionV3.BLOCK_INDEX_BY_DIM),
+ help=('Dimensionality of Inception features to use. '
+ 'By default, uses pool3 features'))
+ parser.add_argument('-c', '--gpu', default='0', type=str,
+ help='GPU to use (leave blank for CPU only)')
+
+ args = parser.parse_args()
+ os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
+
+ fid_value = calculate_fid_given_paths(args.path1,
+ args.path2,
+ args.batch_size,
+ args.gpu != '',
+ args.dims)
+ print('FID: ', fid_value)
+
+    # video_origin = cv2.VideoCapture("MP4/Jae-in.mp4")  # change this path to the source video whose frames you want to extract; video_origin is the original video
+    # video_result = cv2.VideoCapture("MP4/Jae-inRes.mp4")  # change this path to the video generated by hallo; video_result is the generated video
+    # save_path_origin = "ImgsForFIDCalcu/Jae-in/origin"  # where the frames used for qualitative evaluation are saved
+    # save_path_result = "ImgsForFIDCalcu/Jae-in/result"
+ # index = 0
+ # if video_origin.isOpened() and video_result.isOpened():
+    #     rval_origin, frame_origin = video_origin.read()  # read a video frame
+ # rval_result, frame_result = video_result.read()
+ # else:
+ # rval_origin = False
+ # rval_result = False
+
+ # while rval_origin and rval_result:
+ # print(index)
+ # rval_origin, frame_origin = video_origin.read()
+ # img_origin = img_scissors(frame_origin, 720, 512)
+ # rval_result, frame_result = video_result.read()
+ # img_result = frame_result
+ # if img_origin is None or img_result is None:
+ # break
+ # else:
+ # cv2.imwrite(save_path_origin + "/" + str(index) + ".jpg", img_origin)
+ # cv2.imwrite(save_path_result + "/" + str(index) + "Res.jpg", img_result)
+ # index += 1
+ # break
diff --git a/hallo_root/evaluate_root/Evaluate/inception.py b/hallo_root/evaluate_root/Evaluate/inception.py
new file mode 100644
index 00000000..9ba16a55
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/inception.py
@@ -0,0 +1,307 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import models
+
+try:
+ from torchvision.models.utils import load_state_dict_from_url
+except ImportError:
+ from torch.utils.model_zoo import load_url as load_state_dict_from_url
+
+# Inception weights ported to Pytorch from
+# http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
+FID_WEIGHTS_URL = 'https://github.com/mseitzer/pytorch-fid/releases/download/fid_weights/pt_inception-2015-12-05-6726825d.pth'
+
+
+class InceptionV3(nn.Module):
+ """Pretrained InceptionV3 network returning feature maps"""
+
+ # Index of default block of inception to return,
+ # corresponds to output of final average pooling
+ DEFAULT_BLOCK_INDEX = 3
+
+ # Maps feature dimensionality to their output blocks indices
+ BLOCK_INDEX_BY_DIM = {
+ 64: 0, # First max pooling features
+        192: 1,     # Second max pooling features
+ 768: 2, # Pre-aux classifier features
+ 2048: 3 # Final average pooling features
+ }
+
+ def __init__(self,
+ output_blocks=[DEFAULT_BLOCK_INDEX],
+ resize_input=True,
+ normalize_input=True,
+ requires_grad=False,
+ use_fid_inception=True):
+ """Build pretrained InceptionV3
+ Parameters
+ ----------
+ output_blocks : list of int
+ Indices of blocks to return features of. Possible values are:
+ - 0: corresponds to output of first max pooling
+ - 1: corresponds to output of second max pooling
+ - 2: corresponds to output which is fed to aux classifier
+ - 3: corresponds to output of final average pooling
+ resize_input : bool
+ If true, bilinearly resizes input to width and height 299 before
+ feeding input to model. As the network without fully connected
+ layers is fully convolutional, it should be able to handle inputs
+ of arbitrary size, so resizing might not be strictly needed
+ normalize_input : bool
+ If true, scales the input from range (0, 1) to the range the
+ pretrained Inception network expects, namely (-1, 1)
+ requires_grad : bool
+ If true, parameters of the model require gradients. Possibly useful
+ for finetuning the network
+ use_fid_inception : bool
+ If true, uses the pretrained Inception model used in Tensorflow's
+ FID implementation. If false, uses the pretrained Inception model
+ available in torchvision. The FID Inception model has different
+ weights and a slightly different structure from torchvision's
+ Inception model. If you want to compute FID scores, you are
+ strongly advised to set this parameter to true to get comparable
+ results.
+ """
+ super(InceptionV3, self).__init__()
+
+ self.resize_input = resize_input
+ self.normalize_input = normalize_input
+ self.output_blocks = sorted(output_blocks)
+ self.last_needed_block = max(output_blocks)
+
+ assert self.last_needed_block <= 3, \
+ 'Last possible output block index is 3'
+
+ self.blocks = nn.ModuleList()
+
+ if use_fid_inception:
+ inception = fid_inception_v3()
+ else:
+ inception = models.inception_v3(pretrained=True)
+
+ # Block 0: input to maxpool1
+ block0 = [
+ inception.Conv2d_1a_3x3,
+ inception.Conv2d_2a_3x3,
+ inception.Conv2d_2b_3x3,
+ nn.MaxPool2d(kernel_size=3, stride=2)
+ ]
+ self.blocks.append(nn.Sequential(*block0))
+
+ # Block 1: maxpool1 to maxpool2
+ if self.last_needed_block >= 1:
+ block1 = [
+ inception.Conv2d_3b_1x1,
+ inception.Conv2d_4a_3x3,
+ nn.MaxPool2d(kernel_size=3, stride=2)
+ ]
+ self.blocks.append(nn.Sequential(*block1))
+
+ # Block 2: maxpool2 to aux classifier
+ if self.last_needed_block >= 2:
+ block2 = [
+ inception.Mixed_5b,
+ inception.Mixed_5c,
+ inception.Mixed_5d,
+ inception.Mixed_6a,
+ inception.Mixed_6b,
+ inception.Mixed_6c,
+ inception.Mixed_6d,
+ inception.Mixed_6e,
+ ]
+ self.blocks.append(nn.Sequential(*block2))
+
+ # Block 3: aux classifier to final avgpool
+ if self.last_needed_block >= 3:
+ block3 = [
+ inception.Mixed_7a,
+ inception.Mixed_7b,
+ inception.Mixed_7c,
+ nn.AdaptiveAvgPool2d(output_size=(1, 1))
+ ]
+ self.blocks.append(nn.Sequential(*block3))
+
+ for param in self.parameters():
+ param.requires_grad = requires_grad
+
+ def forward(self, inp):
+ """Get Inception feature maps
+ Parameters
+ ----------
+ inp : torch.autograd.Variable
+ Input tensor of shape Bx3xHxW. Values are expected to be in
+ range (0, 1)
+ Returns
+ -------
+ List of torch.autograd.Variable, corresponding to the selected output
+ block, sorted ascending by index
+ """
+ outp = []
+ x = inp
+
+ if self.resize_input:
+ x = F.interpolate(x,
+ size=(299, 299),
+ mode='bilinear',
+ align_corners=False)
+
+ if self.normalize_input:
+ x = 2 * x - 1 # Scale from range (0, 1) to range (-1, 1)
+
+ for idx, block in enumerate(self.blocks):
+ x = block(x)
+ if idx in self.output_blocks:
+ outp.append(x)
+
+ if idx == self.last_needed_block:
+ break
+
+ return outp
+
+
+def fid_inception_v3():
+ """Build pretrained Inception model for FID computation
+ The Inception model for FID computation uses a different set of weights
+ and has a slightly different structure than torchvision's Inception.
+ This method first constructs torchvision's Inception and then patches the
+ necessary parts that are different in the FID Inception model.
+ """
+ inception = models.inception_v3(num_classes=1008,
+ aux_logits=False,
+ pretrained=False)
+ inception.Mixed_5b = FIDInceptionA(192, pool_features=32)
+ inception.Mixed_5c = FIDInceptionA(256, pool_features=64)
+ inception.Mixed_5d = FIDInceptionA(288, pool_features=64)
+ inception.Mixed_6b = FIDInceptionC(768, channels_7x7=128)
+ inception.Mixed_6c = FIDInceptionC(768, channels_7x7=160)
+ inception.Mixed_6d = FIDInceptionC(768, channels_7x7=160)
+ inception.Mixed_6e = FIDInceptionC(768, channels_7x7=192)
+ inception.Mixed_7b = FIDInceptionE_1(1280)
+ inception.Mixed_7c = FIDInceptionE_2(2048)
+
+ # state_dict = load_state_dict_from_url(FID_WEIGHTS_URL, progress=True)
+ checkpoint = torch.load('pre-train-models/fid/pt_inception-2015-12-05-6726825d.pth')
+ state_dict = checkpoint
+ inception.load_state_dict(state_dict)
+ return inception
+
+
+class FIDInceptionA(models.inception.InceptionA):
+ """InceptionA block patched for FID computation"""
+ def __init__(self, in_channels, pool_features):
+ super(FIDInceptionA, self).__init__(in_channels, pool_features)
+
+ def forward(self, x):
+ branch1x1 = self.branch1x1(x)
+
+ branch5x5 = self.branch5x5_1(x)
+ branch5x5 = self.branch5x5_2(branch5x5)
+
+ branch3x3dbl = self.branch3x3dbl_1(x)
+ branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
+ branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)
+
+ # Patch: Tensorflow's average pool does not use the padded zero's in
+ # its average calculation
+ branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1,
+ count_include_pad=False)
+ branch_pool = self.branch_pool(branch_pool)
+
+ outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
+ return torch.cat(outputs, 1)
+
+
+class FIDInceptionC(models.inception.InceptionC):
+ """InceptionC block patched for FID computation"""
+ def __init__(self, in_channels, channels_7x7):
+ super(FIDInceptionC, self).__init__(in_channels, channels_7x7)
+
+ def forward(self, x):
+ branch1x1 = self.branch1x1(x)
+
+ branch7x7 = self.branch7x7_1(x)
+ branch7x7 = self.branch7x7_2(branch7x7)
+ branch7x7 = self.branch7x7_3(branch7x7)
+
+ branch7x7dbl = self.branch7x7dbl_1(x)
+ branch7x7dbl = self.branch7x7dbl_2(branch7x7dbl)
+ branch7x7dbl = self.branch7x7dbl_3(branch7x7dbl)
+ branch7x7dbl = self.branch7x7dbl_4(branch7x7dbl)
+ branch7x7dbl = self.branch7x7dbl_5(branch7x7dbl)
+
+ # Patch: Tensorflow's average pool does not use the padded zero's in
+ # its average calculation
+ branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1,
+ count_include_pad=False)
+ branch_pool = self.branch_pool(branch_pool)
+
+ outputs = [branch1x1, branch7x7, branch7x7dbl, branch_pool]
+ return torch.cat(outputs, 1)
+
+
+class FIDInceptionE_1(models.inception.InceptionE):
+ """First InceptionE block patched for FID computation"""
+ def __init__(self, in_channels):
+ super(FIDInceptionE_1, self).__init__(in_channels)
+
+ def forward(self, x):
+ branch1x1 = self.branch1x1(x)
+
+ branch3x3 = self.branch3x3_1(x)
+ branch3x3 = [
+ self.branch3x3_2a(branch3x3),
+ self.branch3x3_2b(branch3x3),
+ ]
+ branch3x3 = torch.cat(branch3x3, 1)
+
+ branch3x3dbl = self.branch3x3dbl_1(x)
+ branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
+ branch3x3dbl = [
+ self.branch3x3dbl_3a(branch3x3dbl),
+ self.branch3x3dbl_3b(branch3x3dbl),
+ ]
+ branch3x3dbl = torch.cat(branch3x3dbl, 1)
+
+ # Patch: Tensorflow's average pool does not use the padded zero's in
+ # its average calculation
+ branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1,
+ count_include_pad=False)
+ branch_pool = self.branch_pool(branch_pool)
+
+ outputs = [branch1x1, branch3x3, branch3x3dbl, branch_pool]
+ return torch.cat(outputs, 1)
+
+
+class FIDInceptionE_2(models.inception.InceptionE):
+ """Second InceptionE block patched for FID computation"""
+ def __init__(self, in_channels):
+ super(FIDInceptionE_2, self).__init__(in_channels)
+
+ def forward(self, x):
+ branch1x1 = self.branch1x1(x)
+
+ branch3x3 = self.branch3x3_1(x)
+ branch3x3 = [
+ self.branch3x3_2a(branch3x3),
+ self.branch3x3_2b(branch3x3),
+ ]
+ branch3x3 = torch.cat(branch3x3, 1)
+
+ branch3x3dbl = self.branch3x3dbl_1(x)
+ branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
+ branch3x3dbl = [
+ self.branch3x3dbl_3a(branch3x3dbl),
+ self.branch3x3dbl_3b(branch3x3dbl),
+ ]
+ branch3x3dbl = torch.cat(branch3x3dbl, 1)
+
+ # Patch: The FID Inception model uses max pooling instead of average
+ # pooling. This is likely an error in this specific Inception
+ # implementation, as other Inception models use average pooling here
+ # (which matches the description in the paper).
+ branch_pool = F.max_pool2d(x, kernel_size=3, stride=1, padding=1)
+ branch_pool = self.branch_pool(branch_pool)
+
+ outputs = [branch1x1, branch3x3, branch3x3dbl, branch_pool]
+ return torch.cat(outputs, 1)
diff --git a/hallo_root/evaluate_root/Evaluate/inception_score_torch.py b/hallo_root/evaluate_root/Evaluate/inception_score_torch.py
new file mode 100644
index 00000000..863993bd
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/inception_score_torch.py
@@ -0,0 +1,163 @@
+import torch
+from torch import nn
+from torch.autograd import Variable
+from torch.nn import functional as F
+import torch.utils.data
+import torchvision.transforms as transforms
+import os
+import os.path
+
+from torchvision.models.inception import inception_v3
+import numpy as np
+import math
+from PIL import Image
+import argparse
+
+
+
+
+IMG_EXTENSIONS = [
+ '.jpg', '.JPG', '.jpeg', '.JPEG',
+ '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP',
+]
+
+
+def is_image_file(filename):
+ return any(filename.endswith(extension) for extension in IMG_EXTENSIONS)
+
+
+def make_dataset_dir(dir):
+ """
+    :param dir: directory that stores the images
+    :return: image paths and the number of images
+ """
+ img_paths = []
+
+ assert os.path.isdir(dir), '%s is not a valid directory' % dir
+
+ for root, _, fnames in os.walk(dir):
+ for fname in sorted(fnames):
+ if is_image_file(fname):
+ path = os.path.join(root, fname)
+ img_paths.append(path)
+
+ return img_paths, len(img_paths)
+
+def make_dataset_txt(files):
+ """
+    :param files: path of the txt file that stores the image paths
+    :return: image paths and the number of images
+ """
+ img_paths = []
+
+ with open(files) as f:
+ paths = f.readlines()
+
+ for path in paths:
+ path = path.strip()
+ img_paths.append(path)
+
+ return img_paths, len(img_paths)
+
+
+def make_dataset(path_files):
+ if path_files.find('.txt') != -1:
+ paths, size = make_dataset_txt(path_files)
+ else:
+ paths, size = make_dataset_dir(path_files)
+
+ return paths, size
+
+
+def get_inception_score(imgs, batch_size=32, resize=True, splits=10):
+    """Computes the inception score of the generated images imgs
+    imgs -- numpy array of (N, 3, H, W) images normalized to the range [-1, 1]
+    batch_size -- batch size for feeding into Inception v3
+    resize -- whether to upsample inputs to 299x299 before Inception v3
+    splits -- number of splits
+    """
+ N = imgs.shape[0]
+
+ assert batch_size > 0
+ assert N > batch_size
+
+ # Load inception model
+ inception_model = inception_v3(pretrained=True, transform_input=True)
+ up = nn.Upsample(size=(299, 299), mode='bilinear', align_corners=True)
+ if torch.cuda.is_available():
+ inception_model.cuda(0)
+ up.cuda(0)
+ inception_model.eval()
+
+ def get_pred(x):
+ if resize:
+ x = up(x)
+ x = inception_model(x)
+ return F.softmax(x, dim=-1).data.cpu().numpy()
+
+ # Get predictions
+ preds = np.zeros((N, 1000))
+ n_batches = int(math.ceil(float(N) / float(batch_size)))
+
+ for i in range(n_batches):
+ batch = torch.from_numpy(imgs[i * batch_size:min((i + 1) * batch_size, N)])
+ batchv = Variable(batch)
+ if torch.cuda.is_available():
+ batchv = batchv.cuda(0)
+
+ preds[i * batch_size:min((i + 1) * batch_size, N)] = get_pred(batchv)
+
+ # Now compute the mean kl-div
+ scores = []
+
+ for i in range(splits):
+ part = preds[(i * preds.shape[0] // splits):((i + 1) * preds.shape[0] // splits), :]
+ kl = part * (np.log(part) - np.log(np.expand_dims(np.mean(part, 0), 0)))
+ kl = np.mean(np.sum(kl, 1))
+ scores.append(np.exp(kl))
+
+ return np.mean(scores), np.std(scores)
+
+
+if __name__=='__main__':
+    parser = argparse.ArgumentParser(description='Evaluation on the dataset')
+ parser.add_argument('--save_path', type=str,
+ default='noisy/',
+ help='path to save the test dataset')
+ parser.add_argument('--num_test', type=int, default=400,
+ help='how many images to load for each test')
+ args = parser.parse_args()
+
+
+ img_paths, img_size = make_dataset(args.save_path)
+ print(len(img_paths))
+
+    iters = img_size // args.num_test  # number of evaluation rounds over the dataset
+
+ u = np.zeros(iters, np.float32)
+ sigma = np.zeros(iters, np.float32)
+
+ transform = transforms.Compose([
+ transforms.ToTensor(),
+ transforms.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))
+ ])
+
+ for i in range(iters):
+ all_samples = []
+ num = i*args.num_test
+
+ for j in range(args.num_test):
+ index = num+j
+ image = Image.open(img_paths[index]).resize([256,256]).convert('RGB')
+ all_samples.append(transform(image).reshape(-1,3,256,256))
+ all_samples = np.concatenate(all_samples, axis=0)
+ u_iter,sigma_iter = get_inception_score(all_samples, batch_size=8)
+
+ u[i] = u_iter
+ sigma[i] = sigma_iter
+
+ print(i)
+ print('{:10.4f},{:10.4f}'.format(u_iter, sigma_iter))
+
+ print('{:>10},{:>10}'.format('u', 'sigma'))
+ print('{:10.4f},{:10.4f}'.format(u.mean(), sigma.mean()))
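The split / KL / exp structure of `get_inception_score` above can be checked with a minimal numpy-only sketch; the uniform `preds` matrix below is made up purely for illustration:

```python
import numpy as np

def inception_score_from_preds(preds, splits=10):
    """Inception Score from an (N, C) matrix of softmax outputs,
    mirroring the split / KL / exp structure of get_inception_score."""
    n = preds.shape[0]
    scores = []
    for i in range(splits):
        part = preds[i * n // splits:(i + 1) * n // splits, :]
        # KL(p(y|x) || p(y)), averaged over the split, then exponentiated
        kl = part * (np.log(part) - np.log(part.mean(axis=0, keepdims=True)))
        scores.append(np.exp(np.sum(kl, axis=1).mean()))
    return float(np.mean(scores)), float(np.std(scores))

# A completely uninformative classifier (uniform softmax) yields IS = 1.0
uniform = np.full((100, 10), 0.1)
mean_is, std_is = inception_score_from_preds(uniform)
```

When every conditional distribution p(y|x) equals the marginal p(y), the KL term vanishes, which is why the uniform case gives exactly 1.0.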
diff --git a/hallo_root/evaluate_root/Evaluate/lmd.py b/hallo_root/evaluate_root/Evaluate/lmd.py
new file mode 100644
index 00000000..fcabd0d7
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/lmd.py
@@ -0,0 +1,65 @@
+import cv2
+import numpy as np
+from scipy.spatial import distance
+
+# Landmarks are extracted with dlib
+import dlib
+
+# Initialize dlib's face detector and landmark predictor
+detector = dlib.get_frontal_face_detector()
+predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # pretrained weights, must be downloaded separately
+
+def extract_landmarks(image):
+    """Extract the facial landmarks of a single frame."""
+ gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+ faces = detector(gray)
+ if len(faces) == 0:
+ return None
+ shape = predictor(gray, faces[0])
+ landmarks = np.array([[p.x, p.y] for p in shape.parts()])
+ return landmarks
+
+def compute_lmd(reference_video_path, generated_video_path):
+    """Compute the LMD (Landmark Distance) metric between two videos."""
+    # Open the reference video and the generated video
+ ref_cap = cv2.VideoCapture(reference_video_path)
+ gen_cap = cv2.VideoCapture(generated_video_path)
+
+ ref_landmarks_list = []
+ gen_landmarks_list = []
+ frame_count = 0
+
+ while True:
+ ref_ret, ref_frame = ref_cap.read()
+ gen_ret, gen_frame = gen_cap.read()
+
+ if not ref_ret or not gen_ret:
+ break
+
+ ref_landmarks = extract_landmarks(ref_frame)
+ gen_landmarks = extract_landmarks(gen_frame)
+
+ if ref_landmarks is not None and gen_landmarks is not None:
+ ref_landmarks_list.append(ref_landmarks)
+ gen_landmarks_list.append(gen_landmarks)
+ frame_count += 1
+
+    # Sanity check: the landmark lists must be aligned frame by frame
+ if len(ref_landmarks_list) != len(gen_landmarks_list):
+ raise ValueError("Mismatch in landmark detection between reference and generated videos.")
+
+    # Compute LMD; guard against videos where no face was ever detected
+    if frame_count == 0:
+        raise ValueError("No frame with a detectable face in both videos.")
+    lmd = 0
+    for ref, gen in zip(ref_landmarks_list, gen_landmarks_list):
+        distances = [distance.euclidean(r, g) for r, g in zip(ref, gen)]
+        lmd += np.mean(distances)
+
+    lmd /= frame_count
+ # print(f"LMD: {lmd}")
+ return lmd
+
+# Example: run the LMD computation
+# reference_video = "../MP4/Source/Jae-in.mp4"
+# generated_video = "../MP4/Hallo/Jae-in.mp4"
+
+# compute_lmd(reference_video, generated_video)
\ No newline at end of file
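The per-frame averaging in `compute_lmd` reduces to a mean of pointwise Euclidean distances. A minimal sketch on synthetic landmarks (the (3, 4) offset and the frame count are made up for illustration):

```python
import numpy as np

def lmd_from_landmarks(ref_frames, gen_frames):
    """Mean Euclidean landmark distance over frames.
    ref_frames / gen_frames: lists of (68, 2) landmark arrays."""
    per_frame = [np.mean(np.linalg.norm(ref - gen, axis=1))
                 for ref, gen in zip(ref_frames, gen_frames)]
    return float(np.mean(per_frame))

# Shift every generated landmark by (3, 4): each point is exactly 5 pixels away
ref = [np.zeros((68, 2)) for _ in range(4)]
gen = [np.tile(np.array([3.0, 4.0]), (68, 1)) for _ in range(4)]
lmd_value = lmd_from_landmarks(ref, gen)
```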
diff --git a/hallo_root/evaluate_root/Evaluate/main.py b/hallo_root/evaluate_root/Evaluate/main.py
new file mode 100644
index 00000000..47e272e3
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/main.py
@@ -0,0 +1,141 @@
+import niqe, psnr_ssim, fid, lmd
+import cv2
+from skimage import io
+from argparse import ArgumentParser
+from inception import InceptionV3
+import os
+import subprocess
+
+def calculate_FID(path1, path2):
+    # Build an argument parser to carry the parameters needed for the FID computation
+ parser = ArgumentParser()
+ parser.add_argument('--path1', type=str, default=path1)
+ parser.add_argument('--path2', type=str, default=path2)
+ parser.add_argument('--batch-size', type=int, default=64)
+ parser.add_argument('--dims', type=int, default=2048,
+ choices=list(InceptionV3.BLOCK_INDEX_BY_DIM))
+ parser.add_argument('-c', '--gpu', default='0', type=str)
+
+    args = parser.parse_args([])  # parse defaults only, so main.py's own CLI args are not consumed
+ os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
+
+ FID = fid.calculate_fid_given_paths(args.path1,
+ args.path2,
+ args.batch_size,
+ args.gpu != '',
+ args.dims)
+ return FID
+
+def run_pipeline(videofile, reference, data_dir):
+ # python run_pipeline.py --videofile /path/to/your/video --reference wav2lip --data_dir tmp_dir
+ target_dir = "syncnet_python"
+ command = [
+ "python3", "run_pipeline.py",
+ "--videofile", videofile,
+ "--reference", reference,
+ "--data_dir", data_dir
+ ]
+ try:
+ result = subprocess.run(command, cwd=target_dir, check=True, capture_output=True, text=True)
+ except subprocess.CalledProcessError as e:
+ print(e.stderr)
+
+
+def calculate_LSE(videofile, reference, data_dir):
+ # python calculate_scores_real_videos.py --videofile /path/to/you/video --reference wav2lip --data_dir tmp_dir >> all_scores.txt
+ target_dir = "syncnet_python"
+ command = [
+ "python3", "calculate_scores_real_videos.py",
+ "--videofile", videofile,
+ "--reference", reference,
+ "--data_dir", data_dir,
+ ]
+ # result = subprocess.run(command, cwd=target_dir, check=True, capture_output=True, text=True)
+ try:
+ result = subprocess.run(command, cwd=target_dir, check=True, capture_output=True, text=True)
+ scores = list(map(float, result.stdout.strip().split()))
+ return scores
+ except subprocess.CalledProcessError as e:
+ print(e.stderr)
+ return None
+
+
+if __name__ == "__main__":
+ example_video_name = 'Shaheen.mp4'
+ print("The video for demonstration is " + example_video_name)
+
+ example_source_video_path = '../MP4/Source'
+ example_hallo_video_path = '../MP4/Hallo'
+
+ example_source_video = cv2.VideoCapture(example_source_video_path + "/" + example_video_name)
+ example_hallo_video = cv2.VideoCapture(example_hallo_video_path + "/" + example_video_name)
+
+ example_FID_source_img_path = '../ImgsForFIDCalcu/source'
+ example_FID_hallo_img_path = '../ImgsForFIDCalcu/hallo'
+
+    params_path = 'pre-train-models/'  # parameters required by the NIQE computation
+    index = 0  # frame index
+
+ niqe_source = 0.0
+ niqe_hallo = 0.0
+ PSNR = 0.0
+ SSIM = 0.0
+
+ if example_source_video.isOpened() and example_hallo_video.isOpened():
+        rval_source, frame_source = example_source_video.read()  # read the first video frame
+ rval_hallo, frame_hallo = example_hallo_video.read()
+ else:
+ rval_source = False
+ rval_hallo = False
+
+ while rval_source and rval_hallo:
+        # Process each frame of the two videos
+        rval_source, frame_source = example_source_video.read()
+        img_source = niqe.img_scissors(frame_source, 720, 512)  # crop/resize the source frame to a uniform size
+ rval_hallo, frame_hallo = example_hallo_video.read()
+ img_hallo = frame_hallo
+ if img_source is None or img_hallo is None:
+ print("Loop End.")
+ break
+ else:
+ cv2.imwrite(example_FID_source_img_path + "/" + str(index) + ".jpg", img_source)
+ cv2.imwrite(example_FID_hallo_img_path + "/" + str(index) + ".jpg", img_hallo)
+
+            # NIQE for the source and hallo frames
+            niqe_source += niqe.calculate_niqe(img_source, crop_border=0, params_path=params_path)
+            niqe_hallo += niqe.calculate_niqe(img_hallo, crop_border=0, params_path=params_path)
+            # PSNR
+            PSNR += psnr_ssim.calculate_psnr(img_source, img_hallo, crop_border=0)
+            # SSIM
+            SSIM += psnr_ssim.calculate_ssim(img_source, img_hallo, crop_border=0)
+
+ print(index)
+ index += 1
+
+ niqe_source /= index
+ niqe_hallo /= index
+
+ PSNR /= index
+ SSIM /= index
+
+ FID = calculate_FID(example_FID_source_img_path, example_FID_hallo_img_path)
+
+ run_pipeline("../" + example_source_video_path + "/" + example_video_name, "wav2lip", "tmp_dir")
+ scores_source = calculate_LSE("../" + example_source_video_path + "/" + example_video_name, "wav2lip", "tmp_dir")
+ LSE_D_source = scores_source[0]
+ LSE_C_source = scores_source[1]
+
+ run_pipeline("../" + example_hallo_video_path + "/" + example_video_name, "wav2lip", "tmp_dir")
+ scores_hallo = calculate_LSE("../" + example_hallo_video_path + "/" + example_video_name, "wav2lip", "tmp_dir")
+ LSE_D_hallo = scores_hallo[0]
+ LSE_C_hallo = scores_hallo[1]
+
+ LMD = lmd.compute_lmd(example_source_video_path + "/" + example_video_name, example_hallo_video_path + "/" + example_video_name)
+
+ print("source NIQE: " + str(niqe_source) + ", hallo NIQE: " + str(niqe_hallo))
+ print("PSNR: " + str(PSNR))
+ print("SSIM: " + str(SSIM))
+ print("FID: " + str(FID))
+ print("source LSE_C: " + str(LSE_C_source) + ", hallo LSE_C: " + str(LSE_C_hallo))
+ print("source LSE_D: " + str(LSE_D_source) + ", hallo LSE_D: " + str(LSE_D_hallo))
+ print("LMD: " + str(LMD))
diff --git a/hallo_root/evaluate_root/Evaluate/niqe.py b/hallo_root/evaluate_root/Evaluate/niqe.py
new file mode 100644
index 00000000..7ad33afc
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/niqe.py
@@ -0,0 +1,263 @@
+import cv2
+import math
+import numpy as np
+import os
+from scipy.ndimage import convolve
+from scipy.special import gamma
+
+from utils import reorder_image, to_y_channel,imresize
+
+def estimate_aggd_param(block):
+ """Estimate AGGD (Asymmetric Generalized Gaussian Distribution) parameters.
+ Args:
+ block (ndarray): 2D Image block.
+ Returns:
+        tuple: alpha (float), beta_l (float) and beta_r (float) for the AGGD
+            distribution (estimating the parameters in Equation 7 of the paper).
+ """
+ block = block.flatten()
+ gam = np.arange(0.2, 10.001, 0.001) # len = 9801
+ gam_reciprocal = np.reciprocal(gam)
+ r_gam = np.square(gamma(gam_reciprocal * 2)) / (gamma(gam_reciprocal) * gamma(gam_reciprocal * 3))
+
+ left_std = np.sqrt(np.mean(block[block < 0]**2))
+ right_std = np.sqrt(np.mean(block[block > 0]**2))
+ gammahat = left_std / right_std
+ rhat = (np.mean(np.abs(block)))**2 / np.mean(block**2)
+ rhatnorm = (rhat * (gammahat**3 + 1) * (gammahat + 1)) / ((gammahat**2 + 1)**2)
+ array_position = np.argmin((r_gam - rhatnorm)**2)
+
+ alpha = gam[array_position]
+ beta_l = left_std * np.sqrt(gamma(1 / alpha) / gamma(3 / alpha))
+ beta_r = right_std * np.sqrt(gamma(1 / alpha) / gamma(3 / alpha))
+ return (alpha, beta_l, beta_r)
+
+
+def compute_feature(block):
+ """Compute features.
+ Args:
+ block (ndarray): 2D Image block.
+ Returns:
+ list: Features with length of 18.
+ """
+ feat = []
+ alpha, beta_l, beta_r = estimate_aggd_param(block)
+ feat.extend([alpha, (beta_l + beta_r) / 2])
+
+ # distortions disturb the fairly regular structure of natural images.
+ # This deviation can be captured by analyzing the sample distribution of
+ # the products of pairs of adjacent coefficients computed along
+ # horizontal, vertical and diagonal orientations.
+ shifts = [[0, 1], [1, 0], [1, 1], [1, -1]]
+ for i in range(len(shifts)):
+ shifted_block = np.roll(block, shifts[i], axis=(0, 1))
+ alpha, beta_l, beta_r = estimate_aggd_param(block * shifted_block)
+ # Eq. 8
+ mean = (beta_r - beta_l) * (gamma(2 / alpha) / gamma(1 / alpha))
+ feat.extend([alpha, mean, beta_l, beta_r])
+ return feat
+
+
+def niqe(img, mu_pris_param, cov_pris_param, gaussian_window, block_size_h=96, block_size_w=96):
+ """Calculate NIQE (Natural Image Quality Evaluator) metric.
+ ``Paper: Making a "Completely Blind" Image Quality Analyzer``
+ This implementation could produce almost the same results as the official
+ MATLAB codes: http://live.ece.utexas.edu/research/quality/niqe_release.zip
+ Note that we do not include block overlap height and width, since they are
+ always 0 in the official implementation.
+    For good performance, the official implementation advises dividing the
+    distorted image into patches of the same size as those used to construct
+    the multivariate Gaussian model.
+ Args:
+ img (ndarray): Input image whose quality needs to be computed. The
+ image must be a gray or Y (of YCbCr) image with shape (h, w).
+ Range [0, 255] with float type.
+ mu_pris_param (ndarray): Mean of a pre-defined multivariate Gaussian
+ model calculated on the pristine dataset.
+ cov_pris_param (ndarray): Covariance of a pre-defined multivariate
+ Gaussian model calculated on the pristine dataset.
+ gaussian_window (ndarray): A 7x7 Gaussian window used for smoothing the
+ image.
+        block_size_h (int): Height of the blocks into which the image is divided.
+            Default: 96 (the official recommended value).
+        block_size_w (int): Width of the blocks into which the image is divided.
+            Default: 96 (the official recommended value).
+ """
+ assert img.ndim == 2, ('Input image must be a gray or Y (of YCbCr) image with shape (h, w).')
+ # crop image
+ h, w = img.shape
+ num_block_h = math.floor(h / block_size_h)
+ num_block_w = math.floor(w / block_size_w)
+ img = img[0:num_block_h * block_size_h, 0:num_block_w * block_size_w]
+
+ distparam = [] # dist param is actually the multiscale features
+ for scale in (1, 2): # perform on two scales (1, 2)
+ mu = convolve(img, gaussian_window, mode='nearest')
+ sigma = np.sqrt(np.abs(convolve(np.square(img), gaussian_window, mode='nearest') - np.square(mu)))
+        # normalize, as in Eq. 1 of the paper
+        img_normalized = (img - mu) / (sigma + 1)
+
+        feat = []
+        for idx_w in range(num_block_w):
+            for idx_h in range(num_block_h):
+                # process each block
+                block = img_normalized[idx_h * block_size_h // scale:(idx_h + 1) * block_size_h // scale,
+                                       idx_w * block_size_w // scale:(idx_w + 1) * block_size_w // scale]
+ feat.append(compute_feature(block))
+
+ distparam.append(np.array(feat))
+
+ if scale == 1:
+ img = imresize(img / 255., scale=0.5, antialiasing=True)
+ img = img * 255.
+
+ distparam = np.concatenate(distparam, axis=1)
+
+ # fit a MVG (multivariate Gaussian) model to distorted patch features
+ mu_distparam = np.nanmean(distparam, axis=0)
+ # use nancov. ref: https://ww2.mathworks.cn/help/stats/nancov.html
+ distparam_no_nan = distparam[~np.isnan(distparam).any(axis=1)]
+ cov_distparam = np.cov(distparam_no_nan, rowvar=False)
+
+ # compute niqe quality, Eq. 10 in the paper
+ invcov_param = np.linalg.pinv((cov_pris_param + cov_distparam) / 2)
+ quality = np.matmul(
+ np.matmul((mu_pris_param - mu_distparam), invcov_param), np.transpose((mu_pris_param - mu_distparam)))
+
+ quality = np.sqrt(quality)
+ quality = float(np.squeeze(quality))
+ return quality
+
+
+def calculate_niqe(img, crop_border, params_path, input_order='HWC', convert_to='y', **kwargs):
+ """Calculate NIQE (Natural Image Quality Evaluator) metric.
+ ``Paper: Making a "Completely Blind" Image Quality Analyzer``
+ This implementation could produce almost the same results as the official
+ MATLAB codes: http://live.ece.utexas.edu/research/quality/niqe_release.zip
+ > MATLAB R2021a result for tests/data/baboon.png: 5.72957338 (5.7296)
+ > Our re-implementation result for tests/data/baboon.png: 5.7295763 (5.7296)
+ We use the official params estimated from the pristine dataset.
+ We use the recommended block size (96, 96) without overlaps.
+ Args:
+ img (ndarray): Input image whose quality needs to be computed.
+ The input image must be in range [0, 255] with float/int type.
+ The input_order of image can be 'HW' or 'HWC' or 'CHW'. (BGR order)
+ If the input order is 'HWC' or 'CHW', it will be converted to gray
+ or Y (of YCbCr) image according to the ``convert_to`` argument.
+ crop_border (int): Cropped pixels in each edge of an image. These
+ pixels are not involved in the metric calculation.
+ input_order (str): Whether the input order is 'HW', 'HWC' or 'CHW'.
+ Default: 'HWC'.
+ convert_to (str): Whether converted to 'y' (of MATLAB YCbCr) or 'gray'.
+ Default: 'y'.
+ Returns:
+ float: NIQE result.
+ """
+ # ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
+ # we use the official params estimated from the pristine dataset.
+ niqe_pris_params = np.load(os.path.join(params_path, 'niqe_pris_params.npz'))
+ mu_pris_param = niqe_pris_params['mu_pris_param']
+ cov_pris_param = niqe_pris_params['cov_pris_param']
+ gaussian_window = niqe_pris_params['gaussian_window']
+
+ img = img.astype(np.float32)
+ if input_order != 'HW':
+ img = reorder_image(img, input_order=input_order)
+ if convert_to == 'y':
+ img = to_y_channel(img)
+ elif convert_to == 'gray':
+ img = cv2.cvtColor(img / 255., cv2.COLOR_BGR2GRAY) * 255.
+ img = np.squeeze(img)
+
+ if crop_border != 0:
+ img = img[crop_border:-crop_border, crop_border:-crop_border]
+
+ # round is necessary for being consistent with MATLAB's result
+ img = img.round()
+
+ niqe_result = niqe(img, mu_pris_param, cov_pris_param, gaussian_window)
+
+ return niqe_result
+
+
+def img_scissors(img, origin_size, dest_size):  # center-crop img to origin_size x origin_size, then resize to dest_size x dest_size
+ from skimage import io, transform
+ from skimage.util import img_as_ubyte
+
+ height, width = img.shape[:2]
+ center_y, center_x = height // 2, width // 2
+ crop_size = origin_size
+ half_crop = crop_size // 2
+ start_x = max(center_x - half_crop, 0)
+ start_y = max(center_y - half_crop, 0)
+ end_x = min(center_x + half_crop, width)
+ end_y = min(center_y + half_crop, height)
+ cropped_img = img[start_y:end_y, start_x:end_x]
+ resized_img = transform.resize(cropped_img, (dest_size, dest_size), anti_aliasing=True)
+ resized_img_ubyte = img_as_ubyte(resized_img)
+ return resized_img_ubyte
+
+def NIQE(video_origin, video_result):
+ params_path = 'pre-train-models/'
+
+ index = 0
+ niqe_origin = 0.0
+ niqe_result = 0.0
+
+ if video_origin.isOpened() and video_result.isOpened():
+        rval_origin, frame_origin = video_origin.read()  # read the first video frame
+ rval_result, frame_result = video_result.read()
+ else:
+ rval_origin = False
+ rval_result = False
+
+ while rval_origin and rval_result:
+
+ rval_origin, frame_origin = video_origin.read()
+ img_origin = img_scissors(frame_origin, 720, 512)
+ rval_result, frame_result = video_result.read()
+ img_result = frame_result
+ if img_origin is None or img_result is None:
+ break
+ else:
+ niqe_origin += calculate_niqe(img_origin, crop_border=0, params_path=params_path)
+ niqe_result += calculate_niqe(img_result, crop_border=0, params_path=params_path)
+ index += 1
+ niqe_origin /= index
+ niqe_result /= index
+    return ("The source video NIQE: " + str(niqe_origin) + "\nThe hallo generated video NIQE: " + str(niqe_result))
+
+
+if __name__ == '__main__':
+ params_path = 'pre-train-models/'
+
+ index = 0
+ example_source_video_path = '../MP4/Source'
+ example_hallo_video_path = '../MP4/Hallo'
+ example_FID_source_img_path = '../JpgForQualitative/Macron'
+ example_source_video = cv2.VideoCapture(example_source_video_path + "/Macron.mp4")
+ example_hallo_video = cv2.VideoCapture(example_hallo_video_path + "/Macron.mp4")
+
+ if example_source_video.isOpened() and example_hallo_video.isOpened():
+        rval_source, frame_source = example_source_video.read()  # read the first video frame
+ rval_hallo, frame_hallo = example_hallo_video.read()
+ else:
+ rval_source = False
+ rval_hallo = False
+
+ while rval_source and rval_hallo:
+        # Process each frame of the two videos
+        rval_source, frame_source = example_source_video.read()
+        img_source = img_scissors(frame_source, 720, 512)  # crop/resize the source frame to a uniform size
+ rval_hallo, frame_hallo = example_hallo_video.read()
+ img_hallo = frame_hallo
+ if img_source is None or img_hallo is None:
+ print("Loop End.")
+ break
+ else:
+ if index % 100 == 0:
+ cv2.imwrite(example_FID_source_img_path + "/" + str(index) + ".jpg", img_source)
+ cv2.imwrite(example_FID_source_img_path + "/" + str(index) + "Res.jpg", img_hallo)
+ print(index)
+ index += 1
+
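The local normalization at the heart of `niqe` (the `(img - mu) / (sigma + 1)` step, Eq. 1) can be sketched with `scipy.ndimage.gaussian_filter`; the `sigma=7/6` value here is an assumption standing in for the precomputed 7x7 `gaussian_window` loaded from `niqe_pris_params.npz`:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(img, sigma=7 / 6):
    """Mean-subtracted contrast-normalized (MSCN) coefficients, Eq. 1 of NIQE."""
    img = img.astype(np.float64)
    mu = gaussian_filter(img, sigma, mode='nearest')                  # local mean
    var = gaussian_filter(img * img, sigma, mode='nearest') - mu * mu  # local variance
    sd = np.sqrt(np.abs(var))
    return (img - mu) / (sd + 1)

# A flat image has no local structure, so every MSCN coefficient is ~0
coeffs = mscn(np.full((96, 96), 128.0))
```

Distortions disturb the statistics of these coefficients, which is what the AGGD fit in `estimate_aggd_param` then measures.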
diff --git a/hallo_root/evaluate_root/Evaluate/pre-train-models/README.md b/hallo_root/evaluate_root/Evaluate/pre-train-models/README.md
new file mode 100644
index 00000000..3a1fdd85
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/pre-train-models/README.md
@@ -0,0 +1,4 @@
+
+FID model: pt_inception-2015-12-05-6726825d.pth
+
+https://drive.google.com/file/d/1FntpfZRNchRCF0LIKg2m0xOrb15oTm-k/view?usp=share_link
diff --git a/hallo_root/evaluate_root/Evaluate/pre-train-models/fid/pt_inception-2015-12-05-6726825d.pth b/hallo_root/evaluate_root/Evaluate/pre-train-models/fid/pt_inception-2015-12-05-6726825d.pth
new file mode 100644
index 00000000..346b0945
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/pre-train-models/fid/pt_inception-2015-12-05-6726825d.pth differ
diff --git a/hallo_root/evaluate_root/Evaluate/pre-train-models/niqe_pris_params.npz b/hallo_root/evaluate_root/Evaluate/pre-train-models/niqe_pris_params.npz
new file mode 100644
index 00000000..204ddcee
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/pre-train-models/niqe_pris_params.npz differ
diff --git a/hallo_root/evaluate_root/Evaluate/psnr_ssim.py b/hallo_root/evaluate_root/Evaluate/psnr_ssim.py
new file mode 100644
index 00000000..1b4da259
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/psnr_ssim.py
@@ -0,0 +1,260 @@
+import numpy as np
+import torch
+import cv2
+import torch.nn.functional as F
+
+from utils import reorder_image, to_y_channel, rgb2ycbcr_pt, img2tensor
+
+def calculate_psnr(img, img2, crop_border=0, input_order='HWC', test_y_channel=False, **kwargs):
+ """Calculate PSNR (Peak Signal-to-Noise Ratio).
+ Reference: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
+ Args:
+ img (ndarray): Images with range [0, 255].
+ img2 (ndarray): Images with range [0, 255].
+ crop_border (int): Cropped pixels in each edge of an image. These pixels are not involved in the calculation.
+ input_order (str): Whether the input order is 'HWC' or 'CHW'. Default: 'HWC'.
+ test_y_channel (bool): Test on Y channel of YCbCr. Default: False.
+ Returns:
+ float: PSNR result.
+ """
+
+ assert img.shape == img2.shape, (f'Image shapes are different: {img.shape}, {img2.shape}.')
+ if input_order not in ['HWC', 'CHW']:
+ raise ValueError(f'Wrong input_order {input_order}. Supported input_orders are "HWC" and "CHW"')
+ img = reorder_image(img, input_order=input_order)
+ img2 = reorder_image(img2, input_order=input_order)
+
+ if crop_border != 0:
+ img = img[crop_border:-crop_border, crop_border:-crop_border, ...]
+ img2 = img2[crop_border:-crop_border, crop_border:-crop_border, ...]
+
+ if test_y_channel:
+ img = to_y_channel(img)
+ img2 = to_y_channel(img2)
+
+ img = img.astype(np.float64)
+ img2 = img2.astype(np.float64)
+
+ mse = np.mean((img - img2)**2)
+ if mse == 0:
+ return float('inf')
+ return 10. * np.log10(255. * 255. / mse)
+
+def calculate_psnr_pt(img, img2, crop_border=0, test_y_channel=False, **kwargs):
+ """Calculate PSNR (Peak Signal-to-Noise Ratio) (PyTorch version).
+ Reference: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
+ Args:
+ img (Tensor): Images with range [0, 1], shape (n, 3/1, h, w).
+ img2 (Tensor): Images with range [0, 1], shape (n, 3/1, h, w).
+ crop_border (int): Cropped pixels in each edge of an image. These pixels are not involved in the calculation.
+ test_y_channel (bool): Test on Y channel of YCbCr. Default: False.
+ Returns:
+ float: PSNR result.
+ """
+
+ assert img.shape == img2.shape, (f'Image shapes are different: {img.shape}, {img2.shape}.')
+
+ if crop_border != 0:
+ img = img[:, :, crop_border:-crop_border, crop_border:-crop_border]
+ img2 = img2[:, :, crop_border:-crop_border, crop_border:-crop_border]
+
+ if test_y_channel:
+ img = rgb2ycbcr_pt(img, y_only=True)
+ img2 = rgb2ycbcr_pt(img2, y_only=True)
+
+ img = img.to(torch.float64)
+ img2 = img2.to(torch.float64)
+
+ mse = torch.mean((img - img2)**2, dim=[1, 2, 3])
+ return 10. * torch.log10(1. / (mse + 1e-8))
+
+
+def _ssim(img, img2):
+ """Calculate SSIM (structural similarity) for one channel images.
+ It is called by func:`calculate_ssim`.
+ Args:
+ img (ndarray): Images with range [0, 255] with order 'HWC'.
+ img2 (ndarray): Images with range [0, 255] with order 'HWC'.
+ Returns:
+ float: SSIM result.
+ """
+
+ c1 = (0.01 * 255)**2
+ c2 = (0.03 * 255)**2
+ kernel = cv2.getGaussianKernel(11, 1.5)
+ window = np.outer(kernel, kernel.transpose())
+
+ mu1 = cv2.filter2D(img, -1, window)[5:-5, 5:-5] # valid mode for window size 11
+ mu2 = cv2.filter2D(img2, -1, window)[5:-5, 5:-5]
+ mu1_sq = mu1**2
+ mu2_sq = mu2**2
+ mu1_mu2 = mu1 * mu2
+ sigma1_sq = cv2.filter2D(img**2, -1, window)[5:-5, 5:-5] - mu1_sq
+ sigma2_sq = cv2.filter2D(img2**2, -1, window)[5:-5, 5:-5] - mu2_sq
+ sigma12 = cv2.filter2D(img * img2, -1, window)[5:-5, 5:-5] - mu1_mu2
+
+ ssim_map = ((2 * mu1_mu2 + c1) * (2 * sigma12 + c2)) / ((mu1_sq + mu2_sq + c1) * (sigma1_sq + sigma2_sq + c2))
+ return ssim_map.mean()
+
+def _ssim_pth(img, img2):
+ """Calculate SSIM (structural similarity) (PyTorch version).
+ It is called by func:`calculate_ssim_pt`.
+ Args:
+ img (Tensor): Images with range [0, 1], shape (n, 3/1, h, w).
+ img2 (Tensor): Images with range [0, 1], shape (n, 3/1, h, w).
+ Returns:
+ float: SSIM result.
+ """
+ c1 = (0.01 * 255)**2
+ c2 = (0.03 * 255)**2
+
+ kernel = cv2.getGaussianKernel(11, 1.5)
+ window = np.outer(kernel, kernel.transpose())
+ window = torch.from_numpy(window).view(1, 1, 11, 11).expand(img.size(1), 1, 11, 11).to(img.dtype).to(img.device)
+
+ mu1 = F.conv2d(img, window, stride=1, padding=0, groups=img.shape[1]) # valid mode
+ mu2 = F.conv2d(img2, window, stride=1, padding=0, groups=img2.shape[1]) # valid mode
+ mu1_sq = mu1.pow(2)
+ mu2_sq = mu2.pow(2)
+ mu1_mu2 = mu1 * mu2
+ sigma1_sq = F.conv2d(img * img, window, stride=1, padding=0, groups=img.shape[1]) - mu1_sq
+ sigma2_sq = F.conv2d(img2 * img2, window, stride=1, padding=0, groups=img.shape[1]) - mu2_sq
+ sigma12 = F.conv2d(img * img2, window, stride=1, padding=0, groups=img.shape[1]) - mu1_mu2
+
+ cs_map = (2 * sigma12 + c2) / (sigma1_sq + sigma2_sq + c2)
+ ssim_map = ((2 * mu1_mu2 + c1) / (mu1_sq + mu2_sq + c1)) * cs_map
+ return ssim_map.mean([1, 2, 3])
+
+
+def calculate_ssim(img, img2, crop_border=0, input_order='HWC', test_y_channel=False, **kwargs):
+ """Calculate SSIM (structural similarity).
+ ``Paper: Image quality assessment: From error visibility to structural similarity``
+ The results are the same as that of the official released MATLAB code in
+ https://ece.uwaterloo.ca/~z70wang/research/ssim/.
+ For three-channel images, SSIM is calculated for each channel and then
+ averaged.
+ Args:
+ img (ndarray): Images with range [0, 255].
+ img2 (ndarray): Images with range [0, 255].
+ crop_border (int): Cropped pixels in each edge of an image. These pixels are not involved in the calculation.
+ input_order (str): Whether the input order is 'HWC' or 'CHW'.
+ Default: 'HWC'.
+ test_y_channel (bool): Test on Y channel of YCbCr. Default: False.
+ Returns:
+ float: SSIM result.
+ """
+
+ assert img.shape == img2.shape, (f'Image shapes are different: {img.shape}, {img2.shape}.')
+ if input_order not in ['HWC', 'CHW']:
+ raise ValueError(f'Wrong input_order {input_order}. Supported input_orders are "HWC" and "CHW"')
+ img = reorder_image(img, input_order=input_order)
+ img2 = reorder_image(img2, input_order=input_order)
+
+ if crop_border != 0:
+ img = img[crop_border:-crop_border, crop_border:-crop_border, ...]
+ img2 = img2[crop_border:-crop_border, crop_border:-crop_border, ...]
+
+ if test_y_channel:
+ img = to_y_channel(img)
+ img2 = to_y_channel(img2)
+
+ img = img.astype(np.float64)
+ img2 = img2.astype(np.float64)
+
+ ssims = []
+ for i in range(img.shape[2]):
+ ssims.append(_ssim(img[..., i], img2[..., i]))
+ return np.array(ssims).mean()
+
+
+def calculate_ssim_pt(img, img2, crop_border=0, test_y_channel=False, **kwargs):
+ """Calculate SSIM (structural similarity) (PyTorch version).
+ ``Paper: Image quality assessment: From error visibility to structural similarity``
+ The results are the same as that of the official released MATLAB code in
+ https://ece.uwaterloo.ca/~z70wang/research/ssim/.
+ For three-channel images, SSIM is calculated for each channel and then
+ averaged.
+ Args:
+ img (Tensor): Images with range [0, 1], shape (n, 3/1, h, w).
+ img2 (Tensor): Images with range [0, 1], shape (n, 3/1, h, w).
+ crop_border (int): Cropped pixels in each edge of an image. These pixels are not involved in the calculation.
+ test_y_channel (bool): Test on Y channel of YCbCr. Default: False.
+ Returns:
+ float: SSIM result.
+ """
+
+ assert img.shape == img2.shape, (f'Image shapes are different: {img.shape}, {img2.shape}.')
+
+ if crop_border != 0:
+ img = img[:, :, crop_border:-crop_border, crop_border:-crop_border]
+ img2 = img2[:, :, crop_border:-crop_border, crop_border:-crop_border]
+
+ if test_y_channel:
+ img = rgb2ycbcr_pt(img, y_only=True)
+ img2 = rgb2ycbcr_pt(img2, y_only=True)
+
+ img = img.to(torch.float64)
+ img2 = img2.to(torch.float64)
+
+ ssim = _ssim_pth(img * 255., img2 * 255.)
+ return ssim
+
+
+def img_scissors(img, origin_size, dest_size):  # center-crop img to origin_size x origin_size, then resize to dest_size x dest_size
+ from skimage import io, transform
+ from skimage.util import img_as_ubyte
+
+ height, width = img.shape[:2]
+ center_y, center_x = height // 2, width // 2
+ crop_size = origin_size
+ half_crop = crop_size // 2
+ start_x = max(center_x - half_crop, 0)
+ start_y = max(center_y - half_crop, 0)
+ end_x = min(center_x + half_crop, width)
+ end_y = min(center_y + half_crop, height)
+ cropped_img = img[start_y:end_y, start_x:end_x]
+ resized_img = transform.resize(cropped_img, (dest_size, dest_size), anti_aliasing=True)
+ resized_img_ubyte = img_as_ubyte(resized_img)
+ return resized_img_ubyte
+
+
+if __name__ == '__main__':
+ params_path = 'pre-train-models/'
+    '''
+    PSNR measures the pixel-level difference between two images, based on the mean
+    squared error (MSE); the higher the PSNR, the better the quality. Above 30 dB is
+    generally considered high quality, 20-30 dB is acceptable, and below 20 dB is poor.
+
+    SSIM measures the similarity of two images in luminance, contrast and structure.
+    SSIM lies in [0, 1], where 1 means identical; above 0.8 indicates good structural
+    similarity.
+    '''
+    video_origin = cv2.VideoCapture("MP4/Shaheen.mp4")     # change to the path of the source video
+    video_result = cv2.VideoCapture("MP4/ShaheenRes.mp4")  # change to the path of the hallo-generated video
+ index = 0
+ PSNR = 0.0
+ SSIM = 0.0
+ if video_origin.isOpened() and video_result.isOpened():
+        rval_origin, frame_origin = video_origin.read()  # read the first video frame
+ rval_result, frame_result = video_result.read()
+ else:
+ rval_origin = False
+ rval_result = False
+
+    while rval_origin and rval_result:
+        print(index)
+        # Process the current frame pair before reading the next one, so the first
+        # frame is not skipped and a failed read cannot reach img_scissors as None.
+        img_origin = img_scissors(frame_origin, 720, 512)
+        img_result = frame_result
+        PSNR_temp = calculate_psnr(img_origin, img_result, crop_border=0)
+        PSNR += PSNR_temp
+        SSIM_temp = calculate_ssim(img_origin, img_result, crop_border=0)
+        SSIM += SSIM_temp
+        print(PSNR_temp, SSIM_temp)
+        index += 1
+        rval_origin, frame_origin = video_origin.read()
+        rval_result, frame_result = video_result.read()
+    if index > 0:  # guard against division by zero when no frames were read
+        PSNR /= index
+        SSIM /= index
+    print(PSNR, SSIM)
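The PSNR definition the script above relies on can be checked in isolation. A minimal NumPy sketch (the name `psnr_from_mse` is illustrative, not part of this repo):

```python
import numpy as np

def psnr_from_mse(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """PSNR in dB for 8-bit images: 10 * log10(MAX^2 / MSE), with MAX = 255."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: zero error
    return 10.0 * np.log10(255.0 ** 2 / mse)

# A uniform per-pixel error of 1 gives MSE = 1, i.e. 10 * log10(255^2) ≈ 48.13 dB.
a = np.full((4, 4), 100, dtype=np.uint8)
b = np.full((4, 4), 101, dtype=np.uint8)
print(round(psnr_from_mse(a, b), 2))
```

This also makes the rule of thumb in the docstring concrete: an average per-pixel error of about 8 gray levels (MSE ≈ 64) already drops PSNR to roughly 30 dB.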
diff --git a/hallo_root/evaluate_root/Evaluate/shape_predictor_68_face_landmarks.dat b/hallo_root/evaluate_root/Evaluate/shape_predictor_68_face_landmarks.dat
new file mode 100644
index 00000000..e0ec20d6
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/shape_predictor_68_face_landmarks.dat differ
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/.gitignore b/hallo_root/evaluate_root/Evaluate/syncnet_python/.gitignore
new file mode 100644
index 00000000..350ada00
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/.gitignore
@@ -0,0 +1,45 @@
+# Compiled source #
+###################
+*.com
+*.class
+*.dll
+*.exe
+*.o
+*.so
+*.pyc
+
+# Packages #
+############
+# it's better to unpack these files and commit the raw source
+# git has its own built in compression methods
+*.7z
+*.dmg
+*.gz
+*.iso
+*.jar
+*.rar
+*.tar
+*.zip
+
+# Logs and databases #
+######################
+*.log
+*.sql
+*.sqlite
+
+# OS generated files #
+######################
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Specific to this demo #
+#########################
+data/
+protos/
+utils/
+*.pth
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/LICENSE.md b/hallo_root/evaluate_root/Evaluate/syncnet_python/LICENSE.md
new file mode 100644
index 00000000..de4a5458
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/LICENSE.md
@@ -0,0 +1,19 @@
+Copyright (c) 2016-present Joon Son Chung.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/README.md b/hallo_root/evaluate_root/Evaluate/syncnet_python/README.md
new file mode 100644
index 00000000..7da53541
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/README.md
@@ -0,0 +1,59 @@
+# SyncNet
+
+This repository contains the demo for the audio-to-video synchronisation network (SyncNet). This network can be used for audio-visual synchronisation tasks including:
+1. Removing temporal lags between the audio and visual streams in a video;
+2. Determining who is speaking amongst multiple faces in a video.
+
+Please cite the paper below if you make use of the software.
+
+## Dependencies
+```
+pip install -r requirements.txt
+```
+
+In addition, `ffmpeg` is required.
+
+
+## Demo
+
+SyncNet demo:
+```
+python demo_syncnet.py --videofile data/example.avi --tmp_dir /path/to/temp/directory
+```
+
+Check that this script returns:
+```
+AV offset: 3
+Min dist: 5.353
+Confidence: 10.021
+```
+
+Full pipeline:
+```
+sh download_model.sh
+python run_pipeline.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
+python run_syncnet.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
+python run_visualise.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
+```
+
+Outputs:
+```
+$DATA_DIR/pycrop/$REFERENCE/*.avi - cropped face tracks
+$DATA_DIR/pywork/$REFERENCE/offsets.txt - audio-video offset values
+$DATA_DIR/pyavi/$REFERENCE/video_out.avi - output video (as shown below)
+```
+
+
+
+
+
+## Publications
+
+```
+@InProceedings{Chung16a,
+ author = "Chung, J.~S. and Zisserman, A.",
+ title = "Out of time: automated lip sync in the wild",
+ booktitle = "Workshop on Multi-view Lip-reading, ACCV",
+ year = "2016",
+}
+```
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetInstance.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetInstance.py
new file mode 100644
index 00000000..497d44fc
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetInstance.py
@@ -0,0 +1,208 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+# Video 25 FPS, Audio 16000HZ
+
+import torch
+import numpy
+import time, pdb, argparse, subprocess, os, math, glob
+import cv2
+import python_speech_features
+
+from scipy import signal
+from scipy.io import wavfile
+from SyncNetModel import *
+from shutil import rmtree
+
+
+# ==================== Get OFFSET ====================
+
+def calc_pdist(feat1, feat2, vshift=10):
+
+ win_size = vshift*2+1
+
+ feat2p = torch.nn.functional.pad(feat2,(0,0,vshift,vshift))
+
+ dists = []
+
+ for i in range(0,len(feat1)):
+
+ dists.append(torch.nn.functional.pairwise_distance(feat1[[i],:].repeat(win_size, 1), feat2p[i:i+win_size,:]))
+
+ return dists
+
+# ==================== MAIN DEF ====================
+
+class SyncNetInstance(torch.nn.Module):
+
+ def __init__(self, dropout = 0, num_layers_in_fc_layers = 1024):
+ super(SyncNetInstance, self).__init__();
+
+ self.__S__ = S(num_layers_in_fc_layers = num_layers_in_fc_layers).cuda();
+
+ def evaluate(self, opt, videofile):
+
+ self.__S__.eval();
+
+ # ========== ==========
+ # Convert files
+ # ========== ==========
+
+ if os.path.exists(os.path.join(opt.tmp_dir,opt.reference)):
+ rmtree(os.path.join(opt.tmp_dir,opt.reference))
+
+ os.makedirs(os.path.join(opt.tmp_dir,opt.reference))
+
+ command = ("ffmpeg -y -i %s -threads 1 -f image2 %s" % (videofile,os.path.join(opt.tmp_dir,opt.reference,'%06d.jpg')))
+ output = subprocess.call(command, shell=True, stdout=None)
+
+ command = ("ffmpeg -y -i %s -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 %s" % (videofile,os.path.join(opt.tmp_dir,opt.reference,'audio.wav')))
+ output = subprocess.call(command, shell=True, stdout=None)
+
+ # ========== ==========
+ # Load video
+ # ========== ==========
+
+ images = []
+
+ flist = glob.glob(os.path.join(opt.tmp_dir,opt.reference,'*.jpg'))
+ flist.sort()
+
+ for fname in flist:
+ images.append(cv2.imread(fname))
+
+ im = numpy.stack(images,axis=3)
+ im = numpy.expand_dims(im,axis=0)
+ im = numpy.transpose(im,(0,3,4,1,2))
+
+ imtv = torch.autograd.Variable(torch.from_numpy(im.astype(float)).float())
+
+ # ========== ==========
+ # Load audio
+ # ========== ==========
+
+ sample_rate, audio = wavfile.read(os.path.join(opt.tmp_dir,opt.reference,'audio.wav'))
+ mfcc = zip(*python_speech_features.mfcc(audio,sample_rate))
+ mfcc = numpy.stack([numpy.array(i) for i in mfcc])
+
+ cc = numpy.expand_dims(numpy.expand_dims(mfcc,axis=0),axis=0)
+ cct = torch.autograd.Variable(torch.from_numpy(cc.astype(float)).float())
+
+ # ========== ==========
+ # Check audio and video input length
+ # ========== ==========
+
+ if (float(len(audio))/16000) != (float(len(images))/25) :
+ print("WARNING: Audio (%.4fs) and video (%.4fs) lengths are different."%(float(len(audio))/16000,float(len(images))/25))
+
+ min_length = min(len(images),math.floor(len(audio)/640))
+
+ # ========== ==========
+ # Generate video and audio feats
+ # ========== ==========
+
+ lastframe = min_length-5
+ im_feat = []
+ cc_feat = []
+
+ tS = time.time()
+ for i in range(0,lastframe,opt.batch_size):
+
+ im_batch = [ imtv[:,:,vframe:vframe+5,:,:] for vframe in range(i,min(lastframe,i+opt.batch_size)) ]
+ im_in = torch.cat(im_batch,0)
+ im_out = self.__S__.forward_lip(im_in.cuda());
+ im_feat.append(im_out.data.cpu())
+
+ cc_batch = [ cct[:,:,:,vframe*4:vframe*4+20] for vframe in range(i,min(lastframe,i+opt.batch_size)) ]
+ cc_in = torch.cat(cc_batch,0)
+ cc_out = self.__S__.forward_aud(cc_in.cuda())
+ cc_feat.append(cc_out.data.cpu())
+
+ im_feat = torch.cat(im_feat,0)
+ cc_feat = torch.cat(cc_feat,0)
+
+ # ========== ==========
+ # Compute offset
+ # ========== ==========
+
+ print('Compute time %.3f sec.' % (time.time()-tS))
+
+ dists = calc_pdist(im_feat,cc_feat,vshift=opt.vshift)
+ mdist = torch.mean(torch.stack(dists,1),1)
+
+ minval, minidx = torch.min(mdist,0)
+
+ offset = opt.vshift-minidx
+ conf = torch.median(mdist) - minval
+
+ fdist = numpy.stack([dist[minidx].numpy() for dist in dists])
+ # fdist = numpy.pad(fdist, (3,3), 'constant', constant_values=15)
+ fconf = torch.median(mdist).numpy() - fdist
+ fconfm = signal.medfilt(fconf,kernel_size=9)
+
+ numpy.set_printoptions(formatter={'float': '{: 0.3f}'.format})
+ print('Framewise conf: ')
+ print(fconfm)
+ print('AV offset: \t%d \nMin dist: \t%.3f\nConfidence: \t%.3f' % (offset,minval,conf))
+
+ dists_npy = numpy.array([ dist.numpy() for dist in dists ])
+ return offset.numpy(), conf.numpy(), dists_npy
+
+ def extract_feature(self, opt, videofile):
+
+ self.__S__.eval();
+
+ # ========== ==========
+ # Load video
+ # ========== ==========
+ cap = cv2.VideoCapture(videofile)
+
+        images = []
+        while True:
+            ret, image = cap.read()
+            if not ret:
+                break
+
+            images.append(image)
+
+ im = numpy.stack(images,axis=3)
+ im = numpy.expand_dims(im,axis=0)
+ im = numpy.transpose(im,(0,3,4,1,2))
+
+ imtv = torch.autograd.Variable(torch.from_numpy(im.astype(float)).float())
+
+ # ========== ==========
+ # Generate video feats
+ # ========== ==========
+
+ lastframe = len(images)-4
+ im_feat = []
+
+ tS = time.time()
+ for i in range(0,lastframe,opt.batch_size):
+
+ im_batch = [ imtv[:,:,vframe:vframe+5,:,:] for vframe in range(i,min(lastframe,i+opt.batch_size)) ]
+ im_in = torch.cat(im_batch,0)
+ im_out = self.__S__.forward_lipfeat(im_in.cuda());
+ im_feat.append(im_out.data.cpu())
+
+ im_feat = torch.cat(im_feat,0)
+
+ # ========== ==========
+ # Compute offset
+ # ========== ==========
+
+ print('Compute time %.3f sec.' % (time.time()-tS))
+
+ return im_feat
+
+
+ def loadParameters(self, path):
+ loaded_state = torch.load(path, map_location=lambda storage, loc: storage);
+
+ self_state = self.__S__.state_dict();
+
+ for name, param in loaded_state.items():
+
+ self_state[name].copy_(param);
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetInstance_calc_scores.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetInstance_calc_scores.py
new file mode 100644
index 00000000..64906e25
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetInstance_calc_scores.py
@@ -0,0 +1,210 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+# Video 25 FPS, Audio 16000HZ
+
+import torch
+import numpy
+import time, pdb, argparse, subprocess, os, math, glob
+import cv2
+import python_speech_features
+
+from scipy import signal
+from scipy.io import wavfile
+from SyncNetModel import *
+from shutil import rmtree
+
+
+# ==================== Get OFFSET ====================
+
+def calc_pdist(feat1, feat2, vshift=10):
+
+ win_size = vshift*2+1
+
+ feat2p = torch.nn.functional.pad(feat2,(0,0,vshift,vshift))
+
+ dists = []
+
+ for i in range(0,len(feat1)):
+
+ dists.append(torch.nn.functional.pairwise_distance(feat1[[i],:].repeat(win_size, 1), feat2p[i:i+win_size,:]))
+
+ return dists
+
+# ==================== MAIN DEF ====================
+
+class SyncNetInstance(torch.nn.Module):
+
+ def __init__(self, dropout = 0, num_layers_in_fc_layers = 1024):
+ super(SyncNetInstance, self).__init__();
+
+ self.__S__ = S(num_layers_in_fc_layers = num_layers_in_fc_layers).cuda();
+
+ def evaluate(self, opt, videofile):
+
+ self.__S__.eval();
+
+ # ========== ==========
+ # Convert files
+ # ========== ==========
+
+ if os.path.exists(os.path.join(opt.tmp_dir,opt.reference)):
+ rmtree(os.path.join(opt.tmp_dir,opt.reference))
+
+ os.makedirs(os.path.join(opt.tmp_dir,opt.reference))
+
+ command = ("ffmpeg -loglevel error -y -i %s -threads 1 -f image2 %s" % (videofile,os.path.join(opt.tmp_dir,opt.reference,'%06d.jpg')))
+ output = subprocess.call(command, shell=True, stdout=None)
+
+ command = ("ffmpeg -loglevel error -y -i %s -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 %s" % (videofile,os.path.join(opt.tmp_dir,opt.reference,'audio.wav')))
+ output = subprocess.call(command, shell=True, stdout=None)
+
+ # ========== ==========
+ # Load video
+ # ========== ==========
+
+ images = []
+
+ flist = glob.glob(os.path.join(opt.tmp_dir,opt.reference,'*.jpg'))
+ flist.sort()
+
+ for fname in flist:
+ img_input = cv2.imread(fname)
+ img_input = cv2.resize(img_input, (224,224)) #HARD CODED, CHANGE BEFORE RELEASE
+ images.append(img_input)
+
+ im = numpy.stack(images,axis=3)
+ im = numpy.expand_dims(im,axis=0)
+ im = numpy.transpose(im,(0,3,4,1,2))
+
+ imtv = torch.autograd.Variable(torch.from_numpy(im.astype(float)).float())
+
+ # ========== ==========
+ # Load audio
+ # ========== ==========
+
+ sample_rate, audio = wavfile.read(os.path.join(opt.tmp_dir,opt.reference,'audio.wav'))
+ mfcc = zip(*python_speech_features.mfcc(audio,sample_rate))
+ mfcc = numpy.stack([numpy.array(i) for i in mfcc])
+
+ cc = numpy.expand_dims(numpy.expand_dims(mfcc,axis=0),axis=0)
+ cct = torch.autograd.Variable(torch.from_numpy(cc.astype(float)).float())
+
+ # ========== ==========
+ # Check audio and video input length
+ # ========== ==========
+
+ #if (float(len(audio))/16000) != (float(len(images))/25) :
+ # print("WARNING: Audio (%.4fs) and video (%.4fs) lengths are different."%(float(len(audio))/16000,float(len(images))/25))
+
+ min_length = min(len(images),math.floor(len(audio)/640))
+
+ # ========== ==========
+ # Generate video and audio feats
+ # ========== ==========
+
+ lastframe = min_length-5
+ im_feat = []
+ cc_feat = []
+
+ tS = time.time()
+ for i in range(0,lastframe,opt.batch_size):
+
+ im_batch = [ imtv[:,:,vframe:vframe+5,:,:] for vframe in range(i,min(lastframe,i+opt.batch_size)) ]
+ im_in = torch.cat(im_batch,0)
+ im_out = self.__S__.forward_lip(im_in.cuda());
+ im_feat.append(im_out.data.cpu())
+
+ cc_batch = [ cct[:,:,:,vframe*4:vframe*4+20] for vframe in range(i,min(lastframe,i+opt.batch_size)) ]
+ cc_in = torch.cat(cc_batch,0)
+ cc_out = self.__S__.forward_aud(cc_in.cuda())
+ cc_feat.append(cc_out.data.cpu())
+
+ im_feat = torch.cat(im_feat,0)
+ cc_feat = torch.cat(cc_feat,0)
+
+ # ========== ==========
+ # Compute offset
+ # ========== ==========
+
+ #print('Compute time %.3f sec.' % (time.time()-tS))
+
+ dists = calc_pdist(im_feat,cc_feat,vshift=opt.vshift)
+ mdist = torch.mean(torch.stack(dists,1),1)
+
+ minval, minidx = torch.min(mdist,0)
+
+ offset = opt.vshift-minidx
+ conf = torch.median(mdist) - minval
+
+ fdist = numpy.stack([dist[minidx].numpy() for dist in dists])
+ # fdist = numpy.pad(fdist, (3,3), 'constant', constant_values=15)
+ fconf = torch.median(mdist).numpy() - fdist
+ fconfm = signal.medfilt(fconf,kernel_size=9)
+
+ numpy.set_printoptions(formatter={'float': '{: 0.3f}'.format})
+ #print('Framewise conf: ')
+ #print(fconfm)
+ #print('AV offset: \t%d \nMin dist: \t%.3f\nConfidence: \t%.3f' % (offset,minval,conf))
+
+ dists_npy = numpy.array([ dist.numpy() for dist in dists ])
+ return offset.numpy(), conf.numpy(), minval.numpy()
+
+ def extract_feature(self, opt, videofile):
+
+ self.__S__.eval();
+
+ # ========== ==========
+ # Load video
+ # ========== ==========
+ cap = cv2.VideoCapture(videofile)
+
+        images = []
+        while True:
+            ret, image = cap.read()
+            if not ret:
+                break
+
+            images.append(image)
+
+ im = numpy.stack(images,axis=3)
+ im = numpy.expand_dims(im,axis=0)
+ im = numpy.transpose(im,(0,3,4,1,2))
+
+ imtv = torch.autograd.Variable(torch.from_numpy(im.astype(float)).float())
+
+ # ========== ==========
+ # Generate video feats
+ # ========== ==========
+
+ lastframe = len(images)-4
+ im_feat = []
+
+ tS = time.time()
+ for i in range(0,lastframe,opt.batch_size):
+
+ im_batch = [ imtv[:,:,vframe:vframe+5,:,:] for vframe in range(i,min(lastframe,i+opt.batch_size)) ]
+ im_in = torch.cat(im_batch,0)
+ im_out = self.__S__.forward_lipfeat(im_in.cuda());
+ im_feat.append(im_out.data.cpu())
+
+ im_feat = torch.cat(im_feat,0)
+
+ # ========== ==========
+ # Compute offset
+ # ========== ==========
+
+ print('Compute time %.3f sec.' % (time.time()-tS))
+
+ return im_feat
+
+
+ def loadParameters(self, path):
+ loaded_state = torch.load(path, map_location=lambda storage, loc: storage);
+
+ self_state = self.__S__.state_dict();
+
+ for name, param in loaded_state.items():
+
+ self_state[name].copy_(param);
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetModel.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetModel.py
new file mode 100644
index 00000000..c21ce25c
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/SyncNetModel.py
@@ -0,0 +1,117 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import torch
+import torch.nn as nn
+
+def save(model, filename):
+ with open(filename, "wb") as f:
+ torch.save(model, f);
+ print("%s saved."%filename);
+
+def load(filename):
+ net = torch.load(filename)
+ return net;
+
+class S(nn.Module):
+ def __init__(self, num_layers_in_fc_layers = 1024):
+ super(S, self).__init__();
+
+ self.__nFeatures__ = 24;
+ self.__nChs__ = 32;
+ self.__midChs__ = 32;
+
+ self.netcnnaud = nn.Sequential(
+ nn.Conv2d(1, 64, kernel_size=(3,3), stride=(1,1), padding=(1,1)),
+ nn.BatchNorm2d(64),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(kernel_size=(1,1), stride=(1,1)),
+
+ nn.Conv2d(64, 192, kernel_size=(3,3), stride=(1,1), padding=(1,1)),
+ nn.BatchNorm2d(192),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(kernel_size=(3,3), stride=(1,2)),
+
+ nn.Conv2d(192, 384, kernel_size=(3,3), padding=(1,1)),
+ nn.BatchNorm2d(384),
+ nn.ReLU(inplace=True),
+
+ nn.Conv2d(384, 256, kernel_size=(3,3), padding=(1,1)),
+ nn.BatchNorm2d(256),
+ nn.ReLU(inplace=True),
+
+ nn.Conv2d(256, 256, kernel_size=(3,3), padding=(1,1)),
+ nn.BatchNorm2d(256),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(kernel_size=(3,3), stride=(2,2)),
+
+ nn.Conv2d(256, 512, kernel_size=(5,4), padding=(0,0)),
+ nn.BatchNorm2d(512),
+ nn.ReLU(),
+ );
+
+ self.netfcaud = nn.Sequential(
+ nn.Linear(512, 512),
+ nn.BatchNorm1d(512),
+ nn.ReLU(),
+ nn.Linear(512, num_layers_in_fc_layers),
+ );
+
+ self.netfclip = nn.Sequential(
+ nn.Linear(512, 512),
+ nn.BatchNorm1d(512),
+ nn.ReLU(),
+ nn.Linear(512, num_layers_in_fc_layers),
+ );
+
+ self.netcnnlip = nn.Sequential(
+ nn.Conv3d(3, 96, kernel_size=(5,7,7), stride=(1,2,2), padding=0),
+ nn.BatchNorm3d(96),
+ nn.ReLU(inplace=True),
+ nn.MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2)),
+
+ nn.Conv3d(96, 256, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,1,1)),
+ nn.BatchNorm3d(256),
+ nn.ReLU(inplace=True),
+ nn.MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1)),
+
+ nn.Conv3d(256, 256, kernel_size=(1,3,3), padding=(0,1,1)),
+ nn.BatchNorm3d(256),
+ nn.ReLU(inplace=True),
+
+ nn.Conv3d(256, 256, kernel_size=(1,3,3), padding=(0,1,1)),
+ nn.BatchNorm3d(256),
+ nn.ReLU(inplace=True),
+
+ nn.Conv3d(256, 256, kernel_size=(1,3,3), padding=(0,1,1)),
+ nn.BatchNorm3d(256),
+ nn.ReLU(inplace=True),
+ nn.MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2)),
+
+ nn.Conv3d(256, 512, kernel_size=(1,6,6), padding=0),
+ nn.BatchNorm3d(512),
+ nn.ReLU(inplace=True),
+ );
+
+ def forward_aud(self, x):
+
+ mid = self.netcnnaud(x); # N x ch x 24 x M
+ mid = mid.view((mid.size()[0], -1)); # N x (ch x 24)
+ out = self.netfcaud(mid);
+
+ return out;
+
+ def forward_lip(self, x):
+
+ mid = self.netcnnlip(x);
+ mid = mid.view((mid.size()[0], -1)); # N x (ch x 24)
+ out = self.netfclip(mid);
+
+ return out;
+
+ def forward_lipfeat(self, x):
+
+ mid = self.netcnnlip(x);
+ out = mid.view((mid.size()[0], -1)); # N x (ch x 24)
+
+ return out;
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_LRS.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_LRS.py
new file mode 100644
index 00000000..eda02b8f
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_LRS.py
@@ -0,0 +1,53 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import time, pdb, argparse, subprocess
+import glob
+import os
+from tqdm import tqdm
+
+from SyncNetInstance_calc_scores import *
+
+# ==================== LOAD PARAMS ====================
+
+
+parser = argparse.ArgumentParser(description = "SyncNet");
+
+parser.add_argument('--initial_model', type=str, default="data/syncnet_v2.model", help='');
+parser.add_argument('--batch_size', type=int, default='20', help='');
+parser.add_argument('--vshift', type=int, default='15', help='');
+parser.add_argument('--data_root', type=str, required=True, help='');
+parser.add_argument('--tmp_dir', type=str, default="data/work/pytmp", help='');
+parser.add_argument('--reference', type=str, default="demo", help='');
+
+opt = parser.parse_args();
+
+
+# ==================== RUN EVALUATION ====================
+
+s = SyncNetInstance();
+
+s.loadParameters(opt.initial_model);
+#print("Model %s loaded."%opt.initial_model);
+path = os.path.join(opt.data_root, "*.mp4")
+
+all_videos = glob.glob(path)
+
+prog_bar = tqdm(range(len(all_videos)))
+avg_confidence = 0.
+avg_min_distance = 0.
+
+
+for videofile_idx in prog_bar:
+ videofile = all_videos[videofile_idx]
+ offset, confidence, min_distance = s.evaluate(opt, videofile=videofile)
+ avg_confidence += confidence
+ avg_min_distance += min_distance
+ prog_bar.set_description('Avg Confidence: {}, Avg Minimum Dist: {}'.format(round(avg_confidence / (videofile_idx + 1), 3), round(avg_min_distance / (videofile_idx + 1), 3)))
+ prog_bar.refresh()
+
+print ('Average Confidence: {}'.format(avg_confidence/len(all_videos)))
+print ('Average Minimum Distance: {}'.format(avg_min_distance/len(all_videos)))
+
+
+
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_real_videos.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_real_videos.py
new file mode 100644
index 00000000..4839a6ae
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_real_videos.py
@@ -0,0 +1,46 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import time, pdb, argparse, subprocess, pickle, os, gzip, glob
+
+from SyncNetInstance_calc_scores import *
+
+# ==================== PARSE ARGUMENT ====================
+
+parser = argparse.ArgumentParser(description = "SyncNet");
+parser.add_argument('--initial_model', type=str, default="data/syncnet_v2.model", help='');
+parser.add_argument('--batch_size', type=int, default='20', help='');
+parser.add_argument('--vshift', type=int, default='15', help='');
+parser.add_argument('--data_dir', type=str, default='data/work', help='');
+parser.add_argument('--videofile', type=str, default='', help='');
+parser.add_argument('--reference', type=str, default='', help='');
+opt = parser.parse_args();
+
+setattr(opt,'avi_dir',os.path.join(opt.data_dir,'pyavi'))
+setattr(opt,'tmp_dir',os.path.join(opt.data_dir,'pytmp'))
+setattr(opt,'work_dir',os.path.join(opt.data_dir,'pywork'))
+setattr(opt,'crop_dir',os.path.join(opt.data_dir,'pycrop'))
+
+
+# ==================== LOAD MODEL AND FILE LIST ====================
+
+s = SyncNetInstance();
+
+s.loadParameters(opt.initial_model);
+#print("Model %s loaded."%opt.initial_model);
+
+flist = glob.glob(os.path.join(opt.crop_dir,opt.reference,'0*.avi'))
+flist.sort()
+
+# ==================== GET OFFSETS ====================
+
+dists = []
+for idx, fname in enumerate(flist):
+ offset, conf, dist = s.evaluate(opt,videofile=fname)
+ print()
+ print (str(dist)+" "+str(conf))
+
+# ==================== PRINT RESULTS TO FILE ====================
+
+#with open(os.path.join(opt.work_dir,opt.reference,'activesd.pckl'), 'wb') as fil:
+# pickle.dump(dists, fil)
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_real_videos.sh b/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_real_videos.sh
new file mode 100644
index 00000000..4a45cd56
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/calculate_scores_real_videos.sh
@@ -0,0 +1,8 @@
+rm all_scores.txt
+yourfilenames=`ls $1`
+
+for eachfile in $yourfilenames
+do
+ python run_pipeline.py --videofile $1/$eachfile --reference wav2lip --data_dir tmp_dir
+ python calculate_scores_real_videos.py --videofile $1/$eachfile --reference wav2lip --data_dir tmp_dir >> all_scores.txt
+done
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/demo_feature.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/demo_feature.py
new file mode 100644
index 00000000..e3bd290e
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/demo_feature.py
@@ -0,0 +1,32 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import time, pdb, argparse, subprocess
+
+from SyncNetInstance import *
+
+# ==================== LOAD PARAMS ====================
+
+
+parser = argparse.ArgumentParser(description = "SyncNet");
+
+parser.add_argument('--initial_model', type=str, default="data/syncnet_v2.model", help='');
+parser.add_argument('--batch_size', type=int, default='20', help='');
+parser.add_argument('--vshift', type=int, default='15', help='');
+parser.add_argument('--videofile', type=str, default="data/example.avi", help='');
+parser.add_argument('--tmp_dir', type=str, default="data", help='');
+parser.add_argument('--save_as', type=str, default="data/features.pt", help='');
+
+opt = parser.parse_args();
+
+
+# ==================== RUN EVALUATION ====================
+
+s = SyncNetInstance();
+
+s.loadParameters(opt.initial_model);
+print("Model %s loaded."%opt.initial_model);
+
+feats = s.extract_feature(opt, videofile=opt.videofile)
+
+torch.save(feats, opt.save_as)
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/demo_syncnet.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/demo_syncnet.py
new file mode 100644
index 00000000..01c25a6f
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/demo_syncnet.py
@@ -0,0 +1,30 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import time, pdb, argparse, subprocess
+
+from SyncNetInstance import *
+
+# ==================== LOAD PARAMS ====================
+
+
+parser = argparse.ArgumentParser(description = "SyncNet");
+
+parser.add_argument('--initial_model', type=str, default="data/syncnet_v2.model", help='');
+parser.add_argument('--batch_size', type=int, default='20', help='');
+parser.add_argument('--vshift', type=int, default='15', help='');
+parser.add_argument('--videofile', type=str, default="data/example.avi", help='');
+parser.add_argument('--tmp_dir', type=str, default="data/work/pytmp", help='');
+parser.add_argument('--reference', type=str, default="demo", help='');
+
+opt = parser.parse_args();
+
+
+# ==================== RUN EVALUATION ====================
+
+s = SyncNetInstance();
+
+s.loadParameters(opt.initial_model);
+print("Model %s loaded."%opt.initial_model);
+
+s.evaluate(opt, videofile=opt.videofile)
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/README.md b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/README.md
new file mode 100644
index 00000000..f5a8d4fe
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/README.md
@@ -0,0 +1,3 @@
+# Face detector
+
+This face detector is adapted from `https://github.com/cs-giung/face-detection-pytorch`.
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/__init__.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/__init__.py
new file mode 100644
index 00000000..059d49bf
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/__init__.py
@@ -0,0 +1 @@
+from .s3fd import S3FD
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/__init__.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/__init__.py
new file mode 100644
index 00000000..d7f35e05
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/__init__.py
@@ -0,0 +1,61 @@
+import time
+import numpy as np
+import cv2
+import torch
+from torchvision import transforms
+from .nets import S3FDNet
+from .box_utils import nms_
+
+PATH_WEIGHT = './detectors/s3fd/weights/sfd_face.pth'
+img_mean = np.array([104., 117., 123.])[:, np.newaxis, np.newaxis].astype('float32')
+
+
+class S3FD():
+
+ def __init__(self, device='cuda'):
+
+ tstamp = time.time()
+ self.device = device
+
+ print('[S3FD] loading with', self.device)
+ self.net = S3FDNet(device=self.device).to(self.device)
+ state_dict = torch.load(PATH_WEIGHT, map_location=self.device)
+ self.net.load_state_dict(state_dict)
+ self.net.eval()
+ print('[S3FD] finished loading (%.4f sec)' % (time.time() - tstamp))
+
+ def detect_faces(self, image, conf_th=0.8, scales=[1]):
+
+ w, h = image.shape[1], image.shape[0]
+
+ bboxes = np.empty(shape=(0, 5))
+
+ with torch.no_grad():
+ for s in scales:
+ scaled_img = cv2.resize(image, dsize=(0, 0), fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
+
+ scaled_img = np.swapaxes(scaled_img, 1, 2)
+ scaled_img = np.swapaxes(scaled_img, 1, 0)
+ scaled_img = scaled_img[[2, 1, 0], :, :]
+ scaled_img = scaled_img.astype('float32')
+ scaled_img -= img_mean
+ scaled_img = scaled_img[[2, 1, 0], :, :]
+ x = torch.from_numpy(scaled_img).unsqueeze(0).to(self.device)
+ y = self.net(x)
+
+ detections = y.data
+ scale = torch.Tensor([w, h, w, h])
+
+ for i in range(detections.size(1)):
+ j = 0
+ while detections[0, i, j, 0] > conf_th:
+ score = detections[0, i, j, 0]
+ pt = (detections[0, i, j, 1:] * scale).cpu().numpy()
+ bbox = (pt[0], pt[1], pt[2], pt[3], score)
+ bboxes = np.vstack((bboxes, bbox))
+ j += 1
+
+ keep = nms_(bboxes, 0.1)
+ bboxes = bboxes[keep]
+
+ return bboxes
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/box_utils.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/box_utils.py
new file mode 100644
index 00000000..1bf4be2c
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/box_utils.py
@@ -0,0 +1,217 @@
+import numpy as np
+from itertools import product as product
+import torch
+from torch.autograd import Function
+
+
+def nms_(dets, thresh):
+ """
+ Courtesy of Ross Girshick
+ [https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/py_cpu_nms.py]
+ """
+ x1 = dets[:, 0]
+ y1 = dets[:, 1]
+ x2 = dets[:, 2]
+ y2 = dets[:, 3]
+ scores = dets[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(int(i))
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+
+ return np.array(keep).astype(int)
+
+
+def decode(loc, priors, variances):
+ """Decode locations from predictions using priors to undo
+ the encoding we did for offset regression at train time.
+ Args:
+ loc (tensor): location predictions for loc layers,
+ Shape: [num_priors,4]
+ priors (tensor): Prior boxes in center-offset form.
+ Shape: [num_priors,4].
+ variances: (list[float]) Variances of priorboxes
+ Return:
+ decoded bounding box predictions
+ """
+
+ boxes = torch.cat((
+ priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
+ priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)
+ boxes[:, :2] -= boxes[:, 2:] / 2
+ boxes[:, 2:] += boxes[:, :2]
+ return boxes
+
+
+def nms(boxes, scores, overlap=0.5, top_k=200):
+ """Apply non-maximum suppression at test time to avoid detecting too many
+ overlapping bounding boxes for a given object.
+ Args:
+ boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
+ scores: (tensor) The class predscores for the img, Shape:[num_priors].
+ overlap: (float) The overlap thresh for suppressing unnecessary boxes.
+ top_k: (int) The Maximum number of box preds to consider.
+ Return:
+ The indices of the kept boxes with respect to num_priors.
+ """
+
+ keep = scores.new(scores.size(0)).zero_().long()
+ if boxes.numel() == 0:
+ return keep, 0
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ area = torch.mul(x2 - x1, y2 - y1)
+ v, idx = scores.sort(0) # sort in ascending order
+ # I = I[v >= 0.01]
+ idx = idx[-top_k:] # indices of the top-k largest vals
+ xx1 = boxes.new()
+ yy1 = boxes.new()
+ xx2 = boxes.new()
+ yy2 = boxes.new()
+ w = boxes.new()
+ h = boxes.new()
+
+ # keep = torch.Tensor()
+ count = 0
+ while idx.numel() > 0:
+ i = idx[-1] # index of current largest val
+ # keep.append(i)
+ keep[count] = i
+ count += 1
+ if idx.size(0) == 1:
+ break
+ idx = idx[:-1] # remove kept element from view
+ # load bboxes of next highest vals
+ torch.index_select(x1, 0, idx, out=xx1)
+ torch.index_select(y1, 0, idx, out=yy1)
+ torch.index_select(x2, 0, idx, out=xx2)
+ torch.index_select(y2, 0, idx, out=yy2)
+ # store element-wise max with next highest score
+ xx1 = torch.clamp(xx1, min=x1[i])
+ yy1 = torch.clamp(yy1, min=y1[i])
+ xx2 = torch.clamp(xx2, max=x2[i])
+ yy2 = torch.clamp(yy2, max=y2[i])
+ w.resize_as_(xx2)
+ h.resize_as_(yy2)
+ w = xx2 - xx1
+ h = yy2 - yy1
+ # check sizes of xx1 and xx2.. after each iteration
+ w = torch.clamp(w, min=0.0)
+ h = torch.clamp(h, min=0.0)
+ inter = w * h
+ # IoU = i / (area(a) + area(b) - i)
+ rem_areas = torch.index_select(area, 0, idx) # load remaining areas)
+ union = (rem_areas - inter) + area[i]
+ IoU = inter / union # store result in iou
+ # keep only elements with an IoU <= overlap
+ idx = idx[IoU.le(overlap)]
+ return keep, count
+
+
+class Detect(object):
+
+ def __init__(self, num_classes=2,
+ top_k=750, nms_thresh=0.3, conf_thresh=0.05,
+ variance=[0.1, 0.2], nms_top_k=5000):
+
+ self.num_classes = num_classes
+ self.top_k = top_k
+ self.nms_thresh = nms_thresh
+ self.conf_thresh = conf_thresh
+ self.variance = variance
+ self.nms_top_k = nms_top_k
+
+ def forward(self, loc_data, conf_data, prior_data):
+
+ num = loc_data.size(0)
+ num_priors = prior_data.size(0)
+
+ conf_preds = conf_data.view(num, num_priors, self.num_classes).transpose(2, 1)
+ batch_priors = prior_data.view(-1, num_priors, 4).expand(num, num_priors, 4)
+ batch_priors = batch_priors.contiguous().view(-1, 4)
+
+ decoded_boxes = decode(loc_data.view(-1, 4), batch_priors, self.variance)
+ decoded_boxes = decoded_boxes.view(num, num_priors, 4)
+
+ output = torch.zeros(num, self.num_classes, self.top_k, 5)
+
+ for i in range(num):
+ boxes = decoded_boxes[i].clone()
+ conf_scores = conf_preds[i].clone()
+
+ for cl in range(1, self.num_classes):
+ c_mask = conf_scores[cl].gt(self.conf_thresh)
+ scores = conf_scores[cl][c_mask]
+
+ if scores.dim() == 0:
+ continue
+ l_mask = c_mask.unsqueeze(1).expand_as(boxes)
+ boxes_ = boxes[l_mask].view(-1, 4)
+ ids, count = nms(boxes_, scores, self.nms_thresh, self.nms_top_k)
+ count = count if count < self.top_k else self.top_k
+
+ output[i, cl, :count] = torch.cat((scores[ids[:count]].unsqueeze(1), boxes_[ids[:count]]), 1)
+
+ return output
+
+
+class PriorBox(object):
+
+ def __init__(self, input_size, feature_maps,
+ variance=[0.1, 0.2],
+ min_sizes=[16, 32, 64, 128, 256, 512],
+ steps=[4, 8, 16, 32, 64, 128],
+ clip=False):
+
+ super(PriorBox, self).__init__()
+
+ self.imh = input_size[0]
+ self.imw = input_size[1]
+ self.feature_maps = feature_maps
+
+ self.variance = variance
+ self.min_sizes = min_sizes
+ self.steps = steps
+ self.clip = clip
+
+ def forward(self):
+ mean = []
+ for k, fmap in enumerate(self.feature_maps):
+ feath = fmap[0]
+ featw = fmap[1]
+ for i, j in product(range(feath), range(featw)):
+ f_kw = self.imw / self.steps[k]
+ f_kh = self.imh / self.steps[k]
+
+ cx = (j + 0.5) / f_kw
+ cy = (i + 0.5) / f_kh
+
+ s_kw = self.min_sizes[k] / self.imw
+ s_kh = self.min_sizes[k] / self.imh
+
+ mean += [cx, cy, s_kw, s_kh]
+
+ output = torch.FloatTensor(mean).view(-1, 4)
+
+ if self.clip:
+ output.clamp_(max=1, min=0)
+
+ return output
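The `decode` step above converts center-offset regressions back into corner-form boxes. As a sanity check, zero offsets should decode a prior to its own corner box. A small NumPy sketch of the same arithmetic, on a hypothetical prior:

```python
import numpy as np

def decode_np(loc, priors, variances):
    # mirrors box_utils.decode: centre shifted by variance[0],
    # width/height scaled by exp(loc * variance[1])
    boxes = np.concatenate((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), axis=1)
    boxes[:, :2] -= boxes[:, 2:] / 2   # (cx, cy) -> (x1, y1)
    boxes[:, 2:] += boxes[:, :2]       # (w, h)  -> (x2, y2)
    return boxes

priors = np.array([[0.5, 0.5, 0.2, 0.2]])  # centre-offset form
loc = np.zeros((1, 4))                      # zero regression offsets
print(decode_np(loc, priors, [0.1, 0.2]))  # -> [[0.4 0.4 0.6 0.6]]
```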
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/nets.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/nets.py
new file mode 100644
index 00000000..85b5c82c
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/detectors/s3fd/nets.py
@@ -0,0 +1,174 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torch.nn.init as init
+from .box_utils import Detect, PriorBox
+
+
+class L2Norm(nn.Module):
+
+ def __init__(self, n_channels, scale):
+ super(L2Norm, self).__init__()
+ self.n_channels = n_channels
+ self.gamma = scale or None
+ self.eps = 1e-10
+ self.weight = nn.Parameter(torch.Tensor(self.n_channels))
+ self.reset_parameters()
+
+ def reset_parameters(self):
+ init.constant_(self.weight, self.gamma)
+
+ def forward(self, x):
+ norm = x.pow(2).sum(dim=1, keepdim=True).sqrt() + self.eps
+ x = torch.div(x, norm)
+ out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x
+ return out
+
+
+class S3FDNet(nn.Module):
+
+ def __init__(self, device='cuda'):
+ super(S3FDNet, self).__init__()
+ self.device = device
+
+ self.vgg = nn.ModuleList([
+ nn.Conv2d(3, 64, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(64, 64, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(2, 2),
+
+ nn.Conv2d(64, 128, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(128, 128, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(2, 2),
+
+ nn.Conv2d(128, 256, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(256, 256, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(256, 256, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(2, 2, ceil_mode=True),
+
+ nn.Conv2d(256, 512, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(512, 512, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(512, 512, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(2, 2),
+
+ nn.Conv2d(512, 512, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(512, 512, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(512, 512, 3, 1, padding=1),
+ nn.ReLU(inplace=True),
+ nn.MaxPool2d(2, 2),
+
+ nn.Conv2d(512, 1024, 3, 1, padding=6, dilation=6),
+ nn.ReLU(inplace=True),
+ nn.Conv2d(1024, 1024, 1, 1),
+ nn.ReLU(inplace=True),
+ ])
+
+ self.L2Norm3_3 = L2Norm(256, 10)
+ self.L2Norm4_3 = L2Norm(512, 8)
+ self.L2Norm5_3 = L2Norm(512, 5)
+
+ self.extras = nn.ModuleList([
+ nn.Conv2d(1024, 256, 1, 1),
+ nn.Conv2d(256, 512, 3, 2, padding=1),
+ nn.Conv2d(512, 128, 1, 1),
+ nn.Conv2d(128, 256, 3, 2, padding=1),
+ ])
+
+ self.loc = nn.ModuleList([
+ nn.Conv2d(256, 4, 3, 1, padding=1),
+ nn.Conv2d(512, 4, 3, 1, padding=1),
+ nn.Conv2d(512, 4, 3, 1, padding=1),
+ nn.Conv2d(1024, 4, 3, 1, padding=1),
+ nn.Conv2d(512, 4, 3, 1, padding=1),
+ nn.Conv2d(256, 4, 3, 1, padding=1),
+ ])
+
+ self.conf = nn.ModuleList([
+ nn.Conv2d(256, 4, 3, 1, padding=1),
+ nn.Conv2d(512, 2, 3, 1, padding=1),
+ nn.Conv2d(512, 2, 3, 1, padding=1),
+ nn.Conv2d(1024, 2, 3, 1, padding=1),
+ nn.Conv2d(512, 2, 3, 1, padding=1),
+ nn.Conv2d(256, 2, 3, 1, padding=1),
+ ])
+
+ self.softmax = nn.Softmax(dim=-1)
+ self.detect = Detect()
+
+ def forward(self, x):
+ size = x.size()[2:]
+ sources = list()
+ loc = list()
+ conf = list()
+
+ for k in range(16):
+ x = self.vgg[k](x)
+ s = self.L2Norm3_3(x)
+ sources.append(s)
+
+ for k in range(16, 23):
+ x = self.vgg[k](x)
+ s = self.L2Norm4_3(x)
+ sources.append(s)
+
+ for k in range(23, 30):
+ x = self.vgg[k](x)
+ s = self.L2Norm5_3(x)
+ sources.append(s)
+
+ for k in range(30, len(self.vgg)):
+ x = self.vgg[k](x)
+ sources.append(x)
+
+ # apply extra layers and cache source layer outputs
+ for k, v in enumerate(self.extras):
+ x = F.relu(v(x), inplace=True)
+ if k % 2 == 1:
+ sources.append(x)
+
+ # apply multibox head to source layers
+ loc_x = self.loc[0](sources[0])
+ conf_x = self.conf[0](sources[0])
+
+ max_conf, _ = torch.max(conf_x[:, 0:3, :, :], dim=1, keepdim=True)
+ conf_x = torch.cat((max_conf, conf_x[:, 3:, :, :]), dim=1)
+
+ loc.append(loc_x.permute(0, 2, 3, 1).contiguous())
+ conf.append(conf_x.permute(0, 2, 3, 1).contiguous())
+
+ for i in range(1, len(sources)):
+ x = sources[i]
+ conf.append(self.conf[i](x).permute(0, 2, 3, 1).contiguous())
+ loc.append(self.loc[i](x).permute(0, 2, 3, 1).contiguous())
+
+ features_maps = []
+ for i in range(len(loc)):
+ feat = []
+ feat += [loc[i].size(1), loc[i].size(2)]
+ features_maps += [feat]
+
+ loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
+ conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)
+
+ with torch.no_grad():
+ self.priorbox = PriorBox(size, features_maps)
+ self.priors = self.priorbox.forward()
+
+ output = self.detect.forward(
+ loc.view(loc.size(0), -1, 4),
+ self.softmax(conf.view(conf.size(0), -1, 2)),
+ self.priors.type(type(x.data)).to(self.device)
+ )
+
+ return output
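The `Detect` head returns a `(batch, num_classes, top_k, 5)` tensor whose rows are `[score, x1, y1, x2, y2]` in normalised coordinates, sorted by score and zero-padded; `detect_faces` walks the rows of each class until the score drops below `conf_th`. A sketch of that read-out on a hypothetical output array (class index 1 and the 0.8 threshold mirror the detector above):

```python
import numpy as np

# Hypothetical detector output: (batch, num_classes, top_k, 5),
# rows [score, x1, y1, x2, y2] in normalised coords, score-sorted,
# zero-padded, as produced by Detect.forward.
detections = np.zeros((1, 2, 750, 5))
detections[0, 1, 0] = [0.95, 0.25, 0.25, 0.5, 0.5]  # one confident face
detections[0, 1, 1] = [0.30, 0.5, 0.5, 0.7, 0.7]    # below threshold

w = h = 100  # image size to scale the normalised box back up
faces = []
j = 0
while detections[0, 1, j, 0] > 0.8:  # walk rows until score < conf_th
    score = detections[0, 1, j, 0]
    box = detections[0, 1, j, 1:] * np.array([w, h, w, h])
    faces.append([float(v) for v in box] + [float(score)])
    j += 1
print(faces)  # [[25.0, 25.0, 50.0, 50.0, 0.95]]
```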
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/download_model.sh b/hallo_root/evaluate_root/Evaluate/syncnet_python/download_model.sh
new file mode 100644
index 00000000..3e3a9dc2
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/download_model.sh
@@ -0,0 +1,9 @@
+# SyncNet model
+
+mkdir data
+wget http://www.robots.ox.ac.uk/~vgg/software/lipsync/data/syncnet_v2.model -O data/syncnet_v2.model
+wget http://www.robots.ox.ac.uk/~vgg/software/lipsync/data/example.avi -O data/example.avi
+
+# For the pre-processing pipeline
+mkdir detectors/s3fd/weights
+wget https://www.robots.ox.ac.uk/~vgg/software/lipsync/data/sfd_face.pth -O detectors/s3fd/weights/sfd_face.pth
\ No newline at end of file
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/img/ex1.jpg b/hallo_root/evaluate_root/Evaluate/syncnet_python/img/ex1.jpg
new file mode 100644
index 00000000..b20b57e1
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/syncnet_python/img/ex1.jpg differ
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/img/ex2.jpg b/hallo_root/evaluate_root/Evaluate/syncnet_python/img/ex2.jpg
new file mode 100644
index 00000000..851402cc
Binary files /dev/null and b/hallo_root/evaluate_root/Evaluate/syncnet_python/img/ex2.jpg differ
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/requirements.txt b/hallo_root/evaluate_root/Evaluate/syncnet_python/requirements.txt
new file mode 100644
index 00000000..89197409
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/requirements.txt
@@ -0,0 +1,7 @@
+torch>=1.4.0
+torchvision>=0.5.0
+numpy>=1.18.1
+scipy>=1.2.1
+scenedetect==0.5.1
+opencv-contrib-python
+python_speech_features
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/run_pipeline.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/run_pipeline.py
new file mode 100644
index 00000000..f5fc22e0
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/run_pipeline.py
@@ -0,0 +1,322 @@
+#!/usr/bin/python
+
+import sys, time, os, pdb, argparse, pickle, subprocess, glob, cv2
+import numpy as np
+from shutil import rmtree
+
+import scenedetect
+from scenedetect.video_manager import VideoManager
+from scenedetect.scene_manager import SceneManager
+from scenedetect.frame_timecode import FrameTimecode
+from scenedetect.stats_manager import StatsManager
+from scenedetect.detectors import ContentDetector
+
+from scipy.interpolate import interp1d
+from scipy.io import wavfile
+from scipy import signal
+
+from detectors import S3FD
+
+# ========== ========== ========== ==========
+# # PARSE ARGS
+# ========== ========== ========== ==========
+
+parser = argparse.ArgumentParser(description = "FaceTracker");
+parser.add_argument('--data_dir', type=str, default='data/work', help='Output directory');
+parser.add_argument('--videofile', type=str, default='', help='Input video file');
+parser.add_argument('--reference', type=str, default='', help='Video reference');
+parser.add_argument('--facedet_scale', type=float, default=0.25, help='Scale factor for face detection');
+parser.add_argument('--crop_scale', type=float, default=0.40, help='Scale bounding box');
+parser.add_argument('--min_track', type=int, default=100, help='Minimum facetrack duration');
+parser.add_argument('--frame_rate', type=int, default=25, help='Frame rate');
+parser.add_argument('--num_failed_det', type=int, default=25, help='Number of missed detections allowed before tracking is stopped');
+parser.add_argument('--min_face_size', type=int, default=100, help='Minimum face size in pixels');
+opt = parser.parse_args();
+
+setattr(opt,'avi_dir',os.path.join(opt.data_dir,'pyavi'))
+setattr(opt,'tmp_dir',os.path.join(opt.data_dir,'pytmp'))
+setattr(opt,'work_dir',os.path.join(opt.data_dir,'pywork'))
+setattr(opt,'crop_dir',os.path.join(opt.data_dir,'pycrop'))
+setattr(opt,'frames_dir',os.path.join(opt.data_dir,'pyframes'))
+
+# ========== ========== ========== ==========
+# # IOU FUNCTION
+# ========== ========== ========== ==========
+
+def bb_intersection_over_union(boxA, boxB):
+
+ xA = max(boxA[0], boxB[0])
+ yA = max(boxA[1], boxB[1])
+ xB = min(boxA[2], boxB[2])
+ yB = min(boxA[3], boxB[3])
+
+ interArea = max(0, xB - xA) * max(0, yB - yA)
+
+ boxAArea = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
+ boxBArea = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
+
+ iou = interArea / float(boxAArea + boxBArea - interArea)
+
+ return iou
+
+# ========== ========== ========== ==========
+# # FACE TRACKING
+# ========== ========== ========== ==========
+
+def track_shot(opt,scenefaces):
+
+ iouThres = 0.5 # Minimum IOU between consecutive face detections
+ tracks = []
+
+ while True:
+ track = []
+ for framefaces in scenefaces:
+ for face in framefaces:
+ if track == []:
+ track.append(face)
+ framefaces.remove(face)
+ elif face['frame'] - track[-1]['frame'] <= opt.num_failed_det:
+ iou = bb_intersection_over_union(face['bbox'], track[-1]['bbox'])
+ if iou > iouThres:
+ track.append(face)
+ framefaces.remove(face)
+ continue
+ else:
+ break
+
+ if track == []:
+ break
+ elif len(track) > opt.min_track:
+
+ framenum = np.array([ f['frame'] for f in track ])
+ bboxes = np.array([np.array(f['bbox']) for f in track])
+
+ frame_i = np.arange(framenum[0],framenum[-1]+1)
+
+ bboxes_i = []
+ for ij in range(0,4):
+ interpfn = interp1d(framenum, bboxes[:,ij])
+ bboxes_i.append(interpfn(frame_i))
+ bboxes_i = np.stack(bboxes_i, axis=1)
+
+ if max(np.mean(bboxes_i[:,2]-bboxes_i[:,0]), np.mean(bboxes_i[:,3]-bboxes_i[:,1])) > opt.min_face_size:
+ tracks.append({'frame':frame_i,'bbox':bboxes_i})
+
+ return tracks
+
+# ========== ========== ========== ==========
+# # VIDEO CROP AND SAVE
+# ========== ========== ========== ==========
+
+def crop_video(opt,track,cropfile):
+
+ flist = glob.glob(os.path.join(opt.frames_dir,opt.reference,'*.jpg'))
+ flist.sort()
+
+ fourcc = cv2.VideoWriter_fourcc(*'XVID')
+ vOut = cv2.VideoWriter(cropfile+'t.avi', fourcc, opt.frame_rate, (224,224))
+
+ dets = {'x':[], 'y':[], 's':[]}
+
+ for det in track['bbox']:
+
+ dets['s'].append(max((det[3]-det[1]),(det[2]-det[0]))/2)
+    dets['y'].append((det[1]+det[3])/2) # crop center y
+    dets['x'].append((det[0]+det[2])/2) # crop center x
+
+ # Smooth detections
+ dets['s'] = signal.medfilt(dets['s'],kernel_size=13)
+ dets['x'] = signal.medfilt(dets['x'],kernel_size=13)
+ dets['y'] = signal.medfilt(dets['y'],kernel_size=13)
+
+ for fidx, frame in enumerate(track['frame']):
+
+ cs = opt.crop_scale
+
+ bs = dets['s'][fidx] # Detection box size
+ bsi = int(bs*(1+2*cs)) # Pad videos by this amount
+
+ image = cv2.imread(flist[frame])
+
+ frame = np.pad(image,((bsi,bsi),(bsi,bsi),(0,0)), 'constant', constant_values=(110,110))
+ my = dets['y'][fidx]+bsi # BBox center Y
+ mx = dets['x'][fidx]+bsi # BBox center X
+
+ face = frame[int(my-bs):int(my+bs*(1+2*cs)),int(mx-bs*(1+cs)):int(mx+bs*(1+cs))]
+
+ vOut.write(cv2.resize(face,(224,224)))
+
+ audiotmp = os.path.join(opt.tmp_dir,opt.reference,'audio.wav')
+ audiostart = (track['frame'][0])/opt.frame_rate
+ audioend = (track['frame'][-1]+1)/opt.frame_rate
+
+ vOut.release()
+
+ # ========== CROP AUDIO FILE ==========
+
+ command = ("ffmpeg -y -i %s -ss %.3f -to %.3f %s" % (os.path.join(opt.avi_dir,opt.reference,'audio.wav'),audiostart,audioend,audiotmp))
+ output = subprocess.call(command, shell=True, stdout=None)
+
+ if output != 0:
+ pdb.set_trace()
+
+ sample_rate, audio = wavfile.read(audiotmp)
+
+ # ========== COMBINE AUDIO AND VIDEO FILES ==========
+
+ command = ("ffmpeg -y -i %st.avi -i %s -c:v copy -c:a copy %s.avi" % (cropfile,audiotmp,cropfile))
+ output = subprocess.call(command, shell=True, stdout=None)
+
+ if output != 0:
+ pdb.set_trace()
+
+ print('Written %s'%cropfile)
+
+ os.remove(cropfile+'t.avi')
+
+ print('Mean pos: x %.2f y %.2f s %.2f'%(np.mean(dets['x']),np.mean(dets['y']),np.mean(dets['s'])))
+
+ return {'track':track, 'proc_track':dets}
+
+# ========== ========== ========== ==========
+# # FACE DETECTION
+# ========== ========== ========== ==========
+
+def inference_video(opt):
+
+ DET = S3FD(device='cuda')
+
+ flist = glob.glob(os.path.join(opt.frames_dir,opt.reference,'*.jpg'))
+ flist.sort()
+
+ dets = []
+
+ for fidx, fname in enumerate(flist):
+
+ start_time = time.time()
+
+ image = cv2.imread(fname)
+
+ image_np = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+ bboxes = DET.detect_faces(image_np, conf_th=0.9, scales=[opt.facedet_scale])
+
+ dets.append([]);
+ for bbox in bboxes:
+ dets[-1].append({'frame':fidx, 'bbox':(bbox[:-1]).tolist(), 'conf':bbox[-1]})
+
+ elapsed_time = time.time() - start_time
+
+ print('%s-%05d; %d dets; %.2f Hz' % (os.path.join(opt.avi_dir,opt.reference,'video.avi'),fidx,len(dets[-1]),(1/elapsed_time)))
+
+ savepath = os.path.join(opt.work_dir,opt.reference,'faces.pckl')
+
+ with open(savepath, 'wb') as fil:
+ pickle.dump(dets, fil)
+
+ return dets
+
+# ========== ========== ========== ==========
+# # SCENE DETECTION
+# ========== ========== ========== ==========
+
+def scene_detect(opt):
+
+ video_manager = VideoManager([os.path.join(opt.avi_dir,opt.reference,'video.avi')])
+ stats_manager = StatsManager()
+ scene_manager = SceneManager(stats_manager)
+ # Add ContentDetector algorithm (constructor takes detector options like threshold).
+ scene_manager.add_detector(ContentDetector())
+ base_timecode = video_manager.get_base_timecode()
+
+ video_manager.set_downscale_factor()
+
+ video_manager.start()
+
+ scene_manager.detect_scenes(frame_source=video_manager)
+
+ scene_list = scene_manager.get_scene_list(base_timecode)
+
+ savepath = os.path.join(opt.work_dir,opt.reference,'scene.pckl')
+
+ if scene_list == []:
+ scene_list = [(video_manager.get_base_timecode(),video_manager.get_current_timecode())]
+
+ with open(savepath, 'wb') as fil:
+ pickle.dump(scene_list, fil)
+
+ print('%s - scenes detected %d'%(os.path.join(opt.avi_dir,opt.reference,'video.avi'),len(scene_list)))
+
+ return scene_list
+
+
+# ========== ========== ========== ==========
+# # EXECUTE DEMO
+# ========== ========== ========== ==========
+
+# ========== DELETE EXISTING DIRECTORIES ==========
+
+if os.path.exists(os.path.join(opt.work_dir,opt.reference)):
+ rmtree(os.path.join(opt.work_dir,opt.reference))
+
+if os.path.exists(os.path.join(opt.crop_dir,opt.reference)):
+ rmtree(os.path.join(opt.crop_dir,opt.reference))
+
+if os.path.exists(os.path.join(opt.avi_dir,opt.reference)):
+ rmtree(os.path.join(opt.avi_dir,opt.reference))
+
+if os.path.exists(os.path.join(opt.frames_dir,opt.reference)):
+ rmtree(os.path.join(opt.frames_dir,opt.reference))
+
+if os.path.exists(os.path.join(opt.tmp_dir,opt.reference)):
+ rmtree(os.path.join(opt.tmp_dir,opt.reference))
+
+# ========== MAKE NEW DIRECTORIES ==========
+
+os.makedirs(os.path.join(opt.work_dir,opt.reference))
+os.makedirs(os.path.join(opt.crop_dir,opt.reference))
+os.makedirs(os.path.join(opt.avi_dir,opt.reference))
+os.makedirs(os.path.join(opt.frames_dir,opt.reference))
+os.makedirs(os.path.join(opt.tmp_dir,opt.reference))
+
+# ========== CONVERT VIDEO AND EXTRACT FRAMES ==========
+
+command = ("ffmpeg -y -i %s -qscale:v 2 -async 1 -r 25 %s" % (opt.videofile,os.path.join(opt.avi_dir,opt.reference,'video.avi')))
+output = subprocess.call(command, shell=True, stdout=None)
+
+command = ("ffmpeg -y -i %s -qscale:v 2 -threads 1 -f image2 %s" % (os.path.join(opt.avi_dir,opt.reference,'video.avi'),os.path.join(opt.frames_dir,opt.reference,'%06d.jpg')))
+output = subprocess.call(command, shell=True, stdout=None)
+
+command = ("ffmpeg -y -i %s -ac 1 -vn -acodec pcm_s16le -ar 16000 %s" % (os.path.join(opt.avi_dir,opt.reference,'video.avi'),os.path.join(opt.avi_dir,opt.reference,'audio.wav')))
+output = subprocess.call(command, shell=True, stdout=None)
+
+# ========== FACE DETECTION ==========
+
+faces = inference_video(opt)
+
+# ========== SCENE DETECTION ==========
+
+scene = scene_detect(opt)
+
+# ========== FACE TRACKING ==========
+
+alltracks = []
+vidtracks = []
+
+for shot in scene:
+
+ if shot[1].frame_num - shot[0].frame_num >= opt.min_track :
+ alltracks.extend(track_shot(opt,faces[shot[0].frame_num:shot[1].frame_num]))
+
+# ========== FACE TRACK CROP ==========
+
+for ii, track in enumerate(alltracks):
+ vidtracks.append(crop_video(opt,track,os.path.join(opt.crop_dir,opt.reference,'%05d'%ii)))
+
+# ========== SAVE RESULTS ==========
+
+savepath = os.path.join(opt.work_dir,opt.reference,'tracks.pckl')
+
+with open(savepath, 'wb') as fil:
+ pickle.dump(vidtracks, fil)
+
+rmtree(os.path.join(opt.tmp_dir,opt.reference))
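Face tracking in `track_shot` joins a detection to a track when its IoU against the track's last box exceeds 0.5. A small worked example using the same arithmetic as `bb_intersection_over_union`, on made-up boxes:

```python
def bb_iou(boxA, boxB):
    # same arithmetic as bb_intersection_over_union in run_pipeline.py
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    inter = max(0, xB - xA) * max(0, yB - yA)
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    return inter / float(areaA + areaB - inter)

# A 10x10 face shifted by one pixel between frames: IoU = 81/119 ~ 0.68,
# above the 0.5 association threshold, so both join the same track.
print(bb_iou([0, 0, 10, 10], [1, 1, 11, 11]))
# A disjoint detection starts a new track.
print(bb_iou([0, 0, 10, 10], [20, 20, 30, 30]))  # -> 0.0
```

Detections up to `num_failed_det` frames apart can still be associated; the bounding-box gap between them is later filled by linear interpolation (`interp1d`).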
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/run_syncnet.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/run_syncnet.py
new file mode 100644
index 00000000..45099fd6
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/run_syncnet.py
@@ -0,0 +1,45 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import time, pdb, argparse, subprocess, pickle, os, gzip, glob
+
+from SyncNetInstance import *
+
+# ==================== PARSE ARGUMENT ====================
+
+parser = argparse.ArgumentParser(description = "SyncNet");
+parser.add_argument('--initial_model', type=str, default="data/syncnet_v2.model", help='');
+parser.add_argument('--batch_size', type=int, default='20', help='');
+parser.add_argument('--vshift', type=int, default='15', help='');
+parser.add_argument('--data_dir', type=str, default='data/work', help='');
+parser.add_argument('--videofile', type=str, default='', help='');
+parser.add_argument('--reference', type=str, default='', help='');
+opt = parser.parse_args();
+
+setattr(opt,'avi_dir',os.path.join(opt.data_dir,'pyavi'))
+setattr(opt,'tmp_dir',os.path.join(opt.data_dir,'pytmp'))
+setattr(opt,'work_dir',os.path.join(opt.data_dir,'pywork'))
+setattr(opt,'crop_dir',os.path.join(opt.data_dir,'pycrop'))
+
+
+# ==================== LOAD MODEL AND FILE LIST ====================
+
+s = SyncNetInstance();
+
+s.loadParameters(opt.initial_model);
+print("Model %s loaded."%opt.initial_model);
+
+flist = glob.glob(os.path.join(opt.crop_dir,opt.reference,'0*.avi'))
+flist.sort()
+
+# ==================== GET OFFSETS ====================
+
+dists = []
+for idx, fname in enumerate(flist):
+ offset, conf, dist = s.evaluate(opt,videofile=fname)
+ dists.append(dist)
+
+# ==================== PRINT RESULTS TO FILE ====================
+
+with open(os.path.join(opt.work_dir,opt.reference,'activesd.pckl'), 'wb') as fil:
+ pickle.dump(dists, fil)
diff --git a/hallo_root/evaluate_root/Evaluate/syncnet_python/run_visualise.py b/hallo_root/evaluate_root/Evaluate/syncnet_python/run_visualise.py
new file mode 100644
index 00000000..85d89253
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/syncnet_python/run_visualise.py
@@ -0,0 +1,88 @@
+#!/usr/bin/python
+#-*- coding: utf-8 -*-
+
+import torch
+import numpy
+import time, pdb, argparse, subprocess, pickle, os, glob
+import cv2
+
+from scipy import signal
+
+# ==================== PARSE ARGUMENT ====================
+
+parser = argparse.ArgumentParser(description = "SyncNet");
+parser.add_argument('--data_dir', type=str, default='data/work', help='');
+parser.add_argument('--videofile', type=str, default='', help='');
+parser.add_argument('--reference', type=str, default='', help='');
+parser.add_argument('--frame_rate', type=int, default=25, help='Frame rate');
+opt = parser.parse_args();
+
+setattr(opt,'avi_dir',os.path.join(opt.data_dir,'pyavi'))
+setattr(opt,'tmp_dir',os.path.join(opt.data_dir,'pytmp'))
+setattr(opt,'work_dir',os.path.join(opt.data_dir,'pywork'))
+setattr(opt,'crop_dir',os.path.join(opt.data_dir,'pycrop'))
+setattr(opt,'frames_dir',os.path.join(opt.data_dir,'pyframes'))
+
+# ==================== LOAD FILES ====================
+
+with open(os.path.join(opt.work_dir,opt.reference,'tracks.pckl'), 'rb') as fil:
+ tracks = pickle.load(fil, encoding='latin1')
+
+with open(os.path.join(opt.work_dir,opt.reference,'activesd.pckl'), 'rb') as fil:
+ dists = pickle.load(fil, encoding='latin1')
+
+flist = glob.glob(os.path.join(opt.frames_dir,opt.reference,'*.jpg'))
+flist.sort()
+
+# ==================== SMOOTH FACES ====================
+
+faces = [[] for i in range(len(flist))]
+
+for tidx, track in enumerate(tracks):
+
+ mean_dists = numpy.mean(numpy.stack(dists[tidx],1),1)
+ minidx = numpy.argmin(mean_dists,0)
+ minval = mean_dists[minidx]
+
+ fdist = numpy.stack([dist[minidx] for dist in dists[tidx]])
+ fdist = numpy.pad(fdist, (3,3), 'constant', constant_values=10)
+
+ fconf = numpy.median(mean_dists) - fdist
+ fconfm = signal.medfilt(fconf,kernel_size=9)
+
+ for fidx, frame in enumerate(track['track']['frame'].tolist()) :
+ faces[frame].append({'track': tidx, 'conf':fconfm[fidx], 's':track['proc_track']['s'][fidx], 'x':track['proc_track']['x'][fidx], 'y':track['proc_track']['y'][fidx]})
+
+# ==================== ADD DETECTIONS TO VIDEO ====================
+
+first_image = cv2.imread(flist[0])
+
+fw = first_image.shape[1]
+fh = first_image.shape[0]
+
+fourcc = cv2.VideoWriter_fourcc(*'XVID')
+vOut = cv2.VideoWriter(os.path.join(opt.avi_dir,opt.reference,'video_only.avi'), fourcc, opt.frame_rate, (fw,fh))
+
+for fidx, fname in enumerate(flist):
+
+ image = cv2.imread(fname)
+
+ for face in faces[fidx]:
+
+ clr = max(min(face['conf']*25,255),0)
+
+ cv2.rectangle(image,(int(face['x']-face['s']),int(face['y']-face['s'])),(int(face['x']+face['s']),int(face['y']+face['s'])),(0,clr,255-clr),3)
+ cv2.putText(image,'Track %d, Conf %.3f'%(face['track'],face['conf']), (int(face['x']-face['s']),int(face['y']-face['s'])),cv2.FONT_HERSHEY_SIMPLEX,0.5,(255,255,255),2)
+
+ vOut.write(image)
+
+ print('Frame %d'%fidx)
+
+vOut.release()
+
+# ========== COMBINE AUDIO AND VIDEO FILES ==========
+
+command = ("ffmpeg -y -i %s -i %s -c:v copy -c:a copy %s" % (os.path.join(opt.avi_dir,opt.reference,'video_only.avi'),os.path.join(opt.avi_dir,opt.reference,'audio.wav'),os.path.join(opt.avi_dir,opt.reference,'video_out.avi'))) #-async 1
+output = subprocess.call(command, shell=True, stdout=None)
+
+
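run_visualise.py colours each drawn box by its median-filtered sync confidence: `clr = max(min(face['conf']*25, 255), 0)` ramps the box from red (out of sync or negative confidence) to green (confident) in BGR. A sketch of just that mapping, in isolation:

```python
def conf_to_bgr(conf):
    # linear ramp used when drawing: confident -> green, negative -> red
    clr = max(min(conf * 25, 255), 0)
    return (0, clr, 255 - clr)  # OpenCV uses BGR channel order

print(conf_to_bgr(11))  # fully green: (0, 255, 0)
print(conf_to_bgr(-1))  # fully red:   (0, 0, 255)
print(conf_to_bgr(5))   # mixed:       (0, 125, 130)
```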
diff --git a/hallo_root/evaluate_root/Evaluate/utils.py b/hallo_root/evaluate_root/Evaluate/utils.py
new file mode 100644
index 00000000..9021e340
--- /dev/null
+++ b/hallo_root/evaluate_root/Evaluate/utils.py
@@ -0,0 +1,385 @@
+import numpy as np
+import torch
+import cv2
+import math
+import torch.nn.functional as F
+
+def cubic(x):
+ """cubic function used for calculate_weights_indices."""
+ absx = torch.abs(x)
+ absx2 = absx**2
+ absx3 = absx**3
+ return (1.5 * absx3 - 2.5 * absx2 + 1) * (
+ (absx <= 1).type_as(absx)) + (-0.5 * absx3 + 2.5 * absx2 - 4 * absx + 2) * (((absx > 1) *
+ (absx <= 2)).type_as(absx))
+
+
+
+def calculate_weights_indices(in_length, out_length, scale, kernel, kernel_width, antialiasing):
+ """Calculate weights and indices, used for imresize function.
+ Args:
+ in_length (int): Input length.
+ out_length (int): Output length.
+ scale (float): Scale factor.
+        kernel (str): Kernel type; only 'cubic' is used.
+        kernel_width (int): Kernel width.
+        antialiasing (bool): Whether to apply anti-aliasing when downsampling.
+ """
+
+ if (scale < 1) and antialiasing:
+ # Use a modified kernel (larger kernel width) to simultaneously
+ # interpolate and antialias
+ kernel_width = kernel_width / scale
+
+ # Output-space coordinates
+ x = torch.linspace(1, out_length, out_length)
+
+ # Input-space coordinates. Calculate the inverse mapping such that 0.5
+ # in output space maps to 0.5 in input space, and 0.5 + scale in output
+ # space maps to 1.5 in input space.
+ u = x / scale + 0.5 * (1 - 1 / scale)
+
+ # What is the left-most pixel that can be involved in the computation?
+ left = torch.floor(u - kernel_width / 2)
+
+ # What is the maximum number of pixels that can be involved in the
+ # computation? Note: it's OK to use an extra pixel here; if the
+ # corresponding weights are all zero, it will be eliminated at the end
+ # of this function.
+ p = math.ceil(kernel_width) + 2
+
+ # The indices of the input pixels involved in computing the k-th output
+ # pixel are in row k of the indices matrix.
+ indices = left.view(out_length, 1).expand(out_length, p) + torch.linspace(0, p - 1, p).view(1, p).expand(
+ out_length, p)
+
+ # The weights used to compute the k-th output pixel are in row k of the
+ # weights matrix.
+ distance_to_center = u.view(out_length, 1).expand(out_length, p) - indices
+
+ # apply cubic kernel
+ if (scale < 1) and antialiasing:
+ weights = scale * cubic(distance_to_center * scale)
+ else:
+ weights = cubic(distance_to_center)
+
+ # Normalize the weights matrix so that each row sums to 1.
+ weights_sum = torch.sum(weights, 1).view(out_length, 1)
+ weights = weights / weights_sum.expand(out_length, p)
+
+ # If a column in weights is all zero, get rid of it. only consider the
+ # first and last column.
+ weights_zero_tmp = torch.sum((weights == 0), 0)
+ if not math.isclose(weights_zero_tmp[0], 0, rel_tol=1e-6):
+ indices = indices.narrow(1, 1, p - 2)
+ weights = weights.narrow(1, 1, p - 2)
+ if not math.isclose(weights_zero_tmp[-1], 0, rel_tol=1e-6):
+ indices = indices.narrow(1, 0, p - 2)
+ weights = weights.narrow(1, 0, p - 2)
+ weights = weights.contiguous()
+ indices = indices.contiguous()
+ sym_len_s = -indices.min() + 1
+ sym_len_e = indices.max() - in_length
+ indices = indices + sym_len_s - 1
+ return weights, indices, int(sym_len_s), int(sym_len_e)
+
+def imresize(img, scale, antialiasing=True):
+    """imresize function with the same behaviour as MATLAB's imresize.
+    Currently only the bicubic kernel is supported.
+ The same scale applies for both height and width.
+ Args:
+ img (Tensor | Numpy array):
+ Tensor: Input image with shape (c, h, w), [0, 1] range.
+ Numpy: Input image with shape (h, w, c), [0, 1] range.
+ scale (float): Scale factor. The same scale applies for both height
+ and width.
+        antialiasing (bool): Whether to apply anti-aliasing when downsampling.
+ Default: True.
+ Returns:
+ Tensor: Output image with shape (c, h, w), [0, 1] range, w/o round.
+ """
+ squeeze_flag = False
+ if type(img).__module__ == np.__name__: # numpy type
+ numpy_type = True
+ if img.ndim == 2:
+ img = img[:, :, None]
+ squeeze_flag = True
+ img = torch.from_numpy(img.transpose(2, 0, 1)).float()
+ else:
+ numpy_type = False
+ if img.ndim == 2:
+ img = img.unsqueeze(0)
+ squeeze_flag = True
+
+ in_c, in_h, in_w = img.size()
+ out_h, out_w = math.ceil(in_h * scale), math.ceil(in_w * scale)
+ kernel_width = 4
+ kernel = 'cubic'
+
+ # get weights and indices
+ weights_h, indices_h, sym_len_hs, sym_len_he = calculate_weights_indices(in_h, out_h, scale, kernel, kernel_width,
+ antialiasing)
+ weights_w, indices_w, sym_len_ws, sym_len_we = calculate_weights_indices(in_w, out_w, scale, kernel, kernel_width,
+ antialiasing)
+ # process H dimension
+ # symmetric copying
+ img_aug = torch.FloatTensor(in_c, in_h + sym_len_hs + sym_len_he, in_w)
+ img_aug.narrow(1, sym_len_hs, in_h).copy_(img)
+
+ sym_patch = img[:, :sym_len_hs, :]
+ inv_idx = torch.arange(sym_patch.size(1) - 1, -1, -1).long()
+ sym_patch_inv = sym_patch.index_select(1, inv_idx)
+ img_aug.narrow(1, 0, sym_len_hs).copy_(sym_patch_inv)
+
+ sym_patch = img[:, -sym_len_he:, :]
+ inv_idx = torch.arange(sym_patch.size(1) - 1, -1, -1).long()
+ sym_patch_inv = sym_patch.index_select(1, inv_idx)
+ img_aug.narrow(1, sym_len_hs + in_h, sym_len_he).copy_(sym_patch_inv)
+
+ out_1 = torch.FloatTensor(in_c, out_h, in_w)
+ kernel_width = weights_h.size(1)
+ for i in range(out_h):
+ idx = int(indices_h[i][0])
+ for j in range(in_c):
+ out_1[j, i, :] = img_aug[j, idx:idx + kernel_width, :].transpose(0, 1).mv(weights_h[i])
+
+ # process W dimension
+ # symmetric copying
+ out_1_aug = torch.FloatTensor(in_c, out_h, in_w + sym_len_ws + sym_len_we)
+ out_1_aug.narrow(2, sym_len_ws, in_w).copy_(out_1)
+
+ sym_patch = out_1[:, :, :sym_len_ws]
+ inv_idx = torch.arange(sym_patch.size(2) - 1, -1, -1).long()
+ sym_patch_inv = sym_patch.index_select(2, inv_idx)
+ out_1_aug.narrow(2, 0, sym_len_ws).copy_(sym_patch_inv)
+
+ sym_patch = out_1[:, :, -sym_len_we:]
+ inv_idx = torch.arange(sym_patch.size(2) - 1, -1, -1).long()
+ sym_patch_inv = sym_patch.index_select(2, inv_idx)
+ out_1_aug.narrow(2, sym_len_ws + in_w, sym_len_we).copy_(sym_patch_inv)
+
+ out_2 = torch.FloatTensor(in_c, out_h, out_w)
+ kernel_width = weights_w.size(1)
+ for i in range(out_w):
+ idx = int(indices_w[i][0])
+ for j in range(in_c):
+ out_2[j, :, i] = out_1_aug[j, :, idx:idx + kernel_width].mv(weights_w[i])
+
+ if squeeze_flag:
+ out_2 = out_2.squeeze(0)
+ if numpy_type:
+ out_2 = out_2.numpy()
+ if not squeeze_flag:
+ out_2 = out_2.transpose(1, 2, 0)
+
+ return out_2
+
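The "symmetric copying" steps in `imresize` mirror `sym_len` samples (edge pixel included) at each border before the kernel is applied, which is what the flipped `index_select` calls implement. A one-dimensional numpy sketch of the same padding:

```python
import numpy as np

# Mirror sym_len samples (edge included) at each border, as imresize does
# per dimension before convolving with the bicubic kernel.
x = np.arange(5)          # [0 1 2 3 4]
sym_len = 2
padded = np.concatenate([x[:sym_len][::-1], x, x[-sym_len:][::-1]])
print(padded)             # [1 0 0 1 2 3 4 4 3]
```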
+
+def _convert_input_type_range(img):
+ """Convert the type and range of the input image.
+ It converts the input image to np.float32 type and range of [0, 1].
+ It is mainly used for pre-processing the input image in colorspace
+ conversion functions such as rgb2ycbcr and ycbcr2rgb.
+ Args:
+ img (ndarray): The input image. It accepts:
+ 1. np.uint8 type with range [0, 255];
+ 2. np.float32 type with range [0, 1].
+ Returns:
+ (ndarray): The converted image with type of np.float32 and range of
+ [0, 1].
+ """
+ img_type = img.dtype
+ img = img.astype(np.float32)
+ if img_type == np.float32:
+ pass
+ elif img_type == np.uint8:
+ img /= 255.
+ else:
+ raise TypeError(f'The img type should be np.float32 or np.uint8, but got {img_type}')
+ return img
+
+
+def _convert_output_type_range(img, dst_type):
+ """Convert the type and range of the image according to dst_type.
+ It converts the image to desired type and range. If `dst_type` is np.uint8,
+ images will be converted to np.uint8 type with range [0, 255]. If
+ `dst_type` is np.float32, it converts the image to np.float32 type with
+ range [0, 1].
+ It is mainly used for post-processing images in colorspace conversion
+ functions such as rgb2ycbcr and ycbcr2rgb.
+ Args:
+ img (ndarray): The image to be converted with np.float32 type and
+ range [0, 255].
+ dst_type (np.uint8 | np.float32): If dst_type is np.uint8, it
+ converts the image to np.uint8 type with range [0, 255]. If
+ dst_type is np.float32, it converts the image to np.float32 type
+ with range [0, 1].
+ Returns:
+ (ndarray): The converted image with desired type and range.
+ """
+ if dst_type not in (np.uint8, np.float32):
+ raise TypeError(f'The dst_type should be np.float32 or np.uint8, but got {dst_type}')
+ if dst_type == np.uint8:
+ img = img.round()
+ else:
+ img /= 255.
+ return img.astype(dst_type)
+
+
+
+def rgb2ycbcr(img, y_only=False):
+    """Convert an RGB image to a YCbCr image.
+ This function produces the same results as Matlab's `rgb2ycbcr` function.
+ It implements the ITU-R BT.601 conversion for standard-definition
+ television. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
+ It differs from a similar function in cv2.cvtColor: `RGB <-> YCrCb`.
+ In OpenCV, it implements a JPEG conversion. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
+ Args:
+ img (ndarray): The input image. It accepts:
+ 1. np.uint8 type with range [0, 255];
+ 2. np.float32 type with range [0, 1].
+ y_only (bool): Whether to only return Y channel. Default: False.
+ Returns:
+ ndarray: The converted YCbCr image. The output image has the same type
+ and range as input image.
+ """
+ img_type = img.dtype
+ img = _convert_input_type_range(img)
+ if y_only:
+ out_img = np.dot(img, [65.481, 128.553, 24.966]) + 16.0
+ else:
+ out_img = np.matmul(
+ img, [[65.481, -37.797, 112.0], [128.553, -74.203, -93.786], [24.966, 112.0, -18.214]]) + [16, 128, 128]
+ out_img = _convert_output_type_range(out_img, img_type)
+ return out_img
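A quick sanity check of the BT.601 Y coefficients used above: for float RGB in [0, 1], `Y = 65.481 R + 128.553 G + 24.966 B + 16`, so white maps to 235 and black to 16 (the "studio swing" range, before `_convert_output_type_range` rescales float outputs back to [0, 1]).

```python
import numpy as np

# The Y-only branch of the conversion, applied to white and black pixels.
rgb = np.array([[1.0, 1.0, 1.0],    # white
                [0.0, 0.0, 0.0]])   # black
y = rgb @ np.array([65.481, 128.553, 24.966]) + 16.0
print(y)   # [235.  16.]
```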
+
+
+def bgr2ycbcr(img, y_only=False):
+    """Convert a BGR image to a YCbCr image.
+    The BGR version of rgb2ycbcr.
+ It implements the ITU-R BT.601 conversion for standard-definition
+ television. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
+ It differs from a similar function in cv2.cvtColor: `BGR <-> YCrCb`.
+ In OpenCV, it implements a JPEG conversion. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
+ Args:
+ img (ndarray): The input image. It accepts:
+ 1. np.uint8 type with range [0, 255];
+ 2. np.float32 type with range [0, 1].
+ y_only (bool): Whether to only return Y channel. Default: False.
+ Returns:
+ ndarray: The converted YCbCr image. The output image has the same type
+ and range as input image.
+ """
+ img_type = img.dtype
+ img = _convert_input_type_range(img)
+ if y_only:
+ out_img = np.dot(img, [24.966, 128.553, 65.481]) + 16.0
+ else:
+ out_img = np.matmul(
+ img, [[24.966, 112.0, -18.214], [128.553, -74.203, -93.786], [65.481, -37.797, 112.0]]) + [16, 128, 128]
+ out_img = _convert_output_type_range(out_img, img_type)
+ return out_img
+
+def ycbcr2rgb(img):
+    """Convert a YCbCr image to an RGB image.
+ This function produces the same results as Matlab's ycbcr2rgb function.
+ It implements the ITU-R BT.601 conversion for standard-definition
+ television. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
+ It differs from a similar function in cv2.cvtColor: `YCrCb <-> RGB`.
+ In OpenCV, it implements a JPEG conversion. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
+ Args:
+ img (ndarray): The input image. It accepts:
+ 1. np.uint8 type with range [0, 255];
+ 2. np.float32 type with range [0, 1].
+ Returns:
+ ndarray: The converted RGB image. The output image has the same type
+ and range as input image.
+ """
+ img_type = img.dtype
+ img = _convert_input_type_range(img) * 255
+ out_img = np.matmul(img, [[0.00456621, 0.00456621, 0.00456621], [0, -0.00153632, 0.00791071],
+ [0.00625893, -0.00318811, 0]]) * 255.0 + [-222.921, 135.576, -276.836] # noqa: E126
+ out_img = _convert_output_type_range(out_img, img_type)
+ return out_img
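A quick check that the inverse matrix above really undoes the forward conversion: studio-swing white (Y=235, Cb=Cr=128) should come back as RGB (255, 255, 255) before the final range conversion.

```python
import numpy as np

# The same matrix and bias as ycbcr2rgb, applied to studio-swing white.
mat = np.array([[0.00456621, 0.00456621, 0.00456621],
                [0.0, -0.00153632, 0.00791071],
                [0.00625893, -0.00318811, 0.0]])
bias = np.array([-222.921, 135.576, -276.836])
white = np.array([235.0, 128.0, 128.0]) @ mat * 255.0 + bias
print(white.round(2))   # [255. 255. 255.]
```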
+
+
+def to_y_channel(img):
+ """Change to Y channel of YCbCr.
+ Args:
+ img (ndarray): Images with range [0, 255].
+ Returns:
+        (ndarray): Images with range [0, 255] (float type) without rounding.
+ """
+ img = img.astype(np.float32) / 255.
+ if img.ndim == 3 and img.shape[2] == 3:
+ img = bgr2ycbcr(img, y_only=True)
+ img = img[..., None]
+ return img * 255.
+
+
+def reorder_image(img, input_order='HWC'):
+ """Reorder images to 'HWC' order.
+ If the input_order is (h, w), return (h, w, 1);
+ If the input_order is (c, h, w), return (h, w, c);
+ If the input_order is (h, w, c), return as it is.
+ Args:
+ img (ndarray): Input image.
+ input_order (str): Whether the input order is 'HWC' or 'CHW'.
+ If the input image shape is (h, w), input_order will not have
+ effects. Default: 'HWC'.
+ Returns:
+ ndarray: reordered image.
+ """
+
+ if input_order not in ['HWC', 'CHW']:
+ raise ValueError(f"Wrong input_order {input_order}. Supported input_orders are 'HWC' and 'CHW'")
+ if len(img.shape) == 2:
+ img = img[..., None]
+ if input_order == 'CHW':
+ img = img.transpose(1, 2, 0)
+ return img
+
+def rgb2ycbcr_pt(img, y_only=False):
+ """Convert RGB images to YCbCr images (PyTorch version).
+ It implements the ITU-R BT.601 conversion for standard-definition television. See more details in
+ https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
+ Args:
+ img (Tensor): Images with shape (n, 3, h, w), the range [0, 1], float, RGB format.
+ y_only (bool): Whether to only return Y channel. Default: False.
+ Returns:
+ (Tensor): converted images with the shape (n, 3/1, h, w), the range [0, 1], float.
+ """
+ if y_only:
+ weight = torch.tensor([[65.481], [128.553], [24.966]]).to(img)
+ out_img = torch.matmul(img.permute(0, 2, 3, 1), weight).permute(0, 3, 1, 2) + 16.0
+ else:
+ weight = torch.tensor([[65.481, -37.797, 112.0], [128.553, -74.203, -93.786], [24.966, 112.0, -18.214]]).to(img)
+ bias = torch.tensor([16, 128, 128]).view(1, 3, 1, 1).to(img)
+ out_img = torch.matmul(img.permute(0, 2, 3, 1), weight).permute(0, 3, 1, 2) + bias
+
+ out_img = out_img / 255.
+    return out_img
+
+def tensor2img(tensor):
+    """Convert a [0, 1] float tensor to a uint8 image array."""
+    im = (255. * tensor).data.cpu().numpy()
+ # clamp
+ im[im > 255] = 255
+ im[im < 0] = 0
+ im = im.astype(np.uint8)
+ return im
+
+def img2tensor(img):
+    """Convert a [0, 255] image (HW or HWC) to a (1, C, H, W) float tensor."""
+    img = (img / 255.).astype('float32')
+    if img.ndim == 2:
+        img = np.expand_dims(np.expand_dims(img, axis=0), axis=0)
+ else:
+ img = np.transpose(img, (2, 0, 1)) # C, H, W
+ img = np.expand_dims(img, axis=0)
+ img = np.ascontiguousarray(img, dtype=np.float32)
+ tensor = torch.from_numpy(img)
+ return tensor
diff --git a/hallo_root/evaluate_root/ImgsForFIDCalcu/hallo/0.jpg b/hallo_root/evaluate_root/ImgsForFIDCalcu/hallo/0.jpg
new file mode 100644
index 00000000..1557ec2d
Binary files /dev/null and b/hallo_root/evaluate_root/ImgsForFIDCalcu/hallo/0.jpg differ
diff --git a/hallo_root/evaluate_root/ImgsForFIDCalcu/source/0.jpg b/hallo_root/evaluate_root/ImgsForFIDCalcu/source/0.jpg
new file mode 100644
index 00000000..c94e3045
Binary files /dev/null and b/hallo_root/evaluate_root/ImgsForFIDCalcu/source/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/0.jpg
new file mode 100644
index 00000000..95611fed
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/0Res.jpg
new file mode 100644
index 00000000..9652a8f3
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1200.jpg
new file mode 100644
index 00000000..48613e36
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1200Res.jpg
new file mode 100644
index 00000000..5f9c566f
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1500.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1500.jpg
new file mode 100644
index 00000000..0a36b1e9
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1500.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1500Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1500Res.jpg
new file mode 100644
index 00000000..f6197b3a
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/1500Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/300.jpg
new file mode 100644
index 00000000..8618c2c3
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/300Res.jpg
new file mode 100644
index 00000000..f6428762
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/600.jpg
new file mode 100644
index 00000000..9bebc6ef
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/600Res.jpg
new file mode 100644
index 00000000..2f2a41c3
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/900.jpg
new file mode 100644
index 00000000..3e6132e3
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Jae-in/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/900Res.jpg
new file mode 100644
index 00000000..fb109c73
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Jae-in/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/0.jpg
new file mode 100644
index 00000000..0e284ba5
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/0Res.jpg
new file mode 100644
index 00000000..85475356
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/1200.jpg
new file mode 100644
index 00000000..c84c4f86
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/1200Res.jpg
new file mode 100644
index 00000000..88dbbc16
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/300.jpg
new file mode 100644
index 00000000..bb29a727
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/300Res.jpg
new file mode 100644
index 00000000..f9de7bfc
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/600.jpg
new file mode 100644
index 00000000..a8366ae6
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/600Res.jpg
new file mode 100644
index 00000000..848b283d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/900.jpg
new file mode 100644
index 00000000..f976969b
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Lieu/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Lieu/900Res.jpg
new file mode 100644
index 00000000..05ededd4
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Lieu/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/0.jpg
new file mode 100644
index 00000000..bc464512
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/0Res.jpg
new file mode 100644
index 00000000..0b7a9c57
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/100.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/100.jpg
new file mode 100644
index 00000000..b972713c
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/100.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1000.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1000.jpg
new file mode 100644
index 00000000..b9be549b
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1000.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1000Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1000Res.jpg
new file mode 100644
index 00000000..59ae5f68
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1000Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/100Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/100Res.jpg
new file mode 100644
index 00000000..1edef2be
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/100Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1100.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1100.jpg
new file mode 100644
index 00000000..d0c11b0e
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1100.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1100Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1100Res.jpg
new file mode 100644
index 00000000..b059e95a
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1100Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1200.jpg
new file mode 100644
index 00000000..53292e5c
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1200Res.jpg
new file mode 100644
index 00000000..6da33558
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1300.jpg
new file mode 100644
index 00000000..dbb9530d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1300Res.jpg
new file mode 100644
index 00000000..92458ce0
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1400.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1400.jpg
new file mode 100644
index 00000000..778ae716
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1400.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1400Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1400Res.jpg
new file mode 100644
index 00000000..d8786eab
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1400Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1500.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1500.jpg
new file mode 100644
index 00000000..a6b4f42d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1500.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/1500Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/1500Res.jpg
new file mode 100644
index 00000000..e34b4349
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/1500Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/200.jpg
new file mode 100644
index 00000000..c43deb0e
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/200Res.jpg
new file mode 100644
index 00000000..c49a3e40
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/300.jpg
new file mode 100644
index 00000000..3f81cd7a
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/300Res.jpg
new file mode 100644
index 00000000..c1e3acbf
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/400.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/400.jpg
new file mode 100644
index 00000000..b9b8df4b
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/400.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/400Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/400Res.jpg
new file mode 100644
index 00000000..61907932
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/400Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/500.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/500.jpg
new file mode 100644
index 00000000..b4e7f971
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/500.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/500Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/500Res.jpg
new file mode 100644
index 00000000..e93cb88c
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/500Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/600.jpg
new file mode 100644
index 00000000..03aac120
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/600Res.jpg
new file mode 100644
index 00000000..d36b57b2
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/700.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/700.jpg
new file mode 100644
index 00000000..e9e88f23
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/700.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/700Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/700Res.jpg
new file mode 100644
index 00000000..342867d5
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/700Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/800.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/800.jpg
new file mode 100644
index 00000000..fd614511
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/800.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/800Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/800Res.jpg
new file mode 100644
index 00000000..5678da48
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/800Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/900.jpg
new file mode 100644
index 00000000..b1836226
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Macron/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Macron/900Res.jpg
new file mode 100644
index 00000000..834d39c6
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Macron/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/0.jpg
new file mode 100644
index 00000000..aa36e0cd
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/0Res.jpg
new file mode 100644
index 00000000..84370488
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/1200.jpg
new file mode 100644
index 00000000..40575f1f
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/1200Res.jpg
new file mode 100644
index 00000000..d45334a7
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/1500.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/1500.jpg
new file mode 100644
index 00000000..79099275
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/1500.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/1500Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/1500Res.jpg
new file mode 100644
index 00000000..10704719
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/1500Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/300.jpg
new file mode 100644
index 00000000..f3fd792b
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/300Res.jpg
new file mode 100644
index 00000000..81b6ad32
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/600.jpg
new file mode 100644
index 00000000..2e2b933d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/600Res.jpg
new file mode 100644
index 00000000..8bc8ac8f
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/900.jpg
new file mode 100644
index 00000000..3c1a2de6
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/May/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/May/900Res.jpg
new file mode 100644
index 00000000..a2431dc1
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/May/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/0.jpg
new file mode 100644
index 00000000..7280c487
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/0Res.jpg
new file mode 100644
index 00000000..be8e96c9
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/1200.jpg
new file mode 100644
index 00000000..e9523d73
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/1200Res.jpg
new file mode 100644
index 00000000..e524f19e
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/300.jpg
new file mode 100644
index 00000000..a9255fd3
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/300Res.jpg
new file mode 100644
index 00000000..94312376
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/600.jpg
new file mode 100644
index 00000000..868cc17f
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/600Res.jpg
new file mode 100644
index 00000000..34e8b5b2
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/900.jpg
new file mode 100644
index 00000000..4cde83ba
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama/900Res.jpg
new file mode 100644
index 00000000..c17de1b9
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/0.jpg
new file mode 100644
index 00000000..b9d218f3
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/0Res.jpg
new file mode 100644
index 00000000..6cfc2616
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1200.jpg
new file mode 100644
index 00000000..c2a8626b
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1200Res.jpg
new file mode 100644
index 00000000..2403affa
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/1500.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1500.jpg
new file mode 100644
index 00000000..db262021
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1500.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/1500Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1500Res.jpg
new file mode 100644
index 00000000..8820038d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/1500Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/300.jpg
new file mode 100644
index 00000000..3e001c82
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/300Res.jpg
new file mode 100644
index 00000000..42319cae
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/600.jpg
new file mode 100644
index 00000000..ec4e82fb
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/600Res.jpg
new file mode 100644
index 00000000..ce753e87
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/900.jpg
new file mode 100644
index 00000000..4d99849d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama1/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama1/900Res.jpg
new file mode 100644
index 00000000..1a9b6d2e
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama1/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/0.jpg
new file mode 100644
index 00000000..77533140
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/0Res.jpg
new file mode 100644
index 00000000..eb0f26b6
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/1200.jpg
new file mode 100644
index 00000000..6d09f3f9
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/1200Res.jpg
new file mode 100644
index 00000000..a8aff1dd
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/300.jpg
new file mode 100644
index 00000000..c9d9e261
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/300Res.jpg
new file mode 100644
index 00000000..a6df826d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/600.jpg
new file mode 100644
index 00000000..7f27f397
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/600Res.jpg
new file mode 100644
index 00000000..c705ff75
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/900.jpg
new file mode 100644
index 00000000..2ed52ad6
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Obama2/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Obama2/900Res.jpg
new file mode 100644
index 00000000..5784a9c2
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Obama2/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/0.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/0.jpg
new file mode 100644
index 00000000..c94e3045
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/0.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/0Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/0Res.jpg
new file mode 100644
index 00000000..1557ec2d
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/0Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1200.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1200.jpg
new file mode 100644
index 00000000..c2977d1f
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1200.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1200Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1200Res.jpg
new file mode 100644
index 00000000..64b881a0
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1200Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1500.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1500.jpg
new file mode 100644
index 00000000..88248e3b
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1500.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1500Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1500Res.jpg
new file mode 100644
index 00000000..42a3c655
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/1500Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/300.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/300.jpg
new file mode 100644
index 00000000..a132efdf
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/300.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/300Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/300Res.jpg
new file mode 100644
index 00000000..42bc0499
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/300Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/600.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/600.jpg
new file mode 100644
index 00000000..36f911c5
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/600.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/600Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/600Res.jpg
new file mode 100644
index 00000000..27e72f19
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/600Res.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/900.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/900.jpg
new file mode 100644
index 00000000..a03c25cf
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/900.jpg differ
diff --git a/hallo_root/evaluate_root/JpgForQualitative/Shaheen/900Res.jpg b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/900Res.jpg
new file mode 100644
index 00000000..476e06a5
Binary files /dev/null and b/hallo_root/evaluate_root/JpgForQualitative/Shaheen/900Res.jpg differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Jae-in.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Jae-in.mp4
new file mode 100644
index 00000000..4ce9edc1
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Jae-in.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Lieu.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Lieu.mp4
new file mode 100644
index 00000000..a065414e
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Lieu.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Macron.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Macron.mp4
new file mode 100644
index 00000000..b676f01d
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Macron.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/May.mp4 b/hallo_root/evaluate_root/MP4/Hallo/May.mp4
new file mode 100644
index 00000000..3c14573d
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/May.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Obama.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Obama.mp4
new file mode 100644
index 00000000..f65454ad
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Obama.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Obama1.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Obama1.mp4
new file mode 100644
index 00000000..7b3799b6
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Obama1.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Obama2.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Obama2.mp4
new file mode 100644
index 00000000..9f2baffd
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Obama2.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Hallo/Shaheen.mp4 b/hallo_root/evaluate_root/MP4/Hallo/Shaheen.mp4
new file mode 100644
index 00000000..fe03c28e
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Hallo/Shaheen.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Jae-in.mp4 b/hallo_root/evaluate_root/MP4/Source/Jae-in.mp4
new file mode 100644
index 00000000..cd033fa3
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Jae-in.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Lieu.mp4 b/hallo_root/evaluate_root/MP4/Source/Lieu.mp4
new file mode 100644
index 00000000..1c185f0b
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Lieu.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Macron.mp4 b/hallo_root/evaluate_root/MP4/Source/Macron.mp4
new file mode 100644
index 00000000..1764e8f3
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Macron.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/May.mp4 b/hallo_root/evaluate_root/MP4/Source/May.mp4
new file mode 100644
index 00000000..adfd4d1e
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/May.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Obama.mp4 b/hallo_root/evaluate_root/MP4/Source/Obama.mp4
new file mode 100644
index 00000000..d560e05d
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Obama.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Obama1.mp4 b/hallo_root/evaluate_root/MP4/Source/Obama1.mp4
new file mode 100644
index 00000000..39821890
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Obama1.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Obama2.mp4 b/hallo_root/evaluate_root/MP4/Source/Obama2.mp4
new file mode 100644
index 00000000..1dde85bb
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Obama2.mp4 differ
diff --git a/hallo_root/evaluate_root/MP4/Source/Shaheen.mp4 b/hallo_root/evaluate_root/MP4/Source/Shaheen.mp4
new file mode 100644
index 00000000..188656ad
Binary files /dev/null and b/hallo_root/evaluate_root/MP4/Source/Shaheen.mp4 differ
diff --git a/hallo_root/evaluate_root/README.md b/hallo_root/evaluate_root/README.md
new file mode 100644
index 00000000..b7efd47f
--- /dev/null
+++ b/hallo_root/evaluate_root/README.md
@@ -0,0 +1,115 @@
+# Project Image Configuration Guide
+_Test status:_
++ _Source program:_
+    + _Local machine - NVIDIA GeForce RTX 3050/3060 Laptop + 16 GB RAM + WSL2 + Ubuntu 22.04.3 - computes all metrics except FID_
+    + _Cloud server - NVIDIA Tesla T4 / 1 x 16 GB + 32 GB RAM + Ubuntu 22.04 - computes all metrics_
++ _Docker image:_
+    + _Local machine - NVIDIA GeForce RTX 3050/3060 Laptop + 16 GB RAM + WSL2 + Ubuntu 22.04.3 + Docker 27.2.0 - computes all metrics except FID_
+## 1. Hardware Requirements
+### GPU
+Only the NVIDIA GeForce RTX 3050/3060 Laptop GPUs have been tested.
+## 2. Software Requirements
+### 2.1 Operating system
+Ubuntu 22.04 has been tested and meets the requirements.
+### 2.2 Docker version
+Docker 27.2.0 or later.
+### 2.3 Installing nvidia-docker2
+```
+# Install dependencies
+sudo apt-get update
+sudo apt-get install -y \
+ curl \
+ gnupg2 \
+ lsb-release \
+ sudo
+
+# Import NVIDIA's GPG key
+curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo tee /etc/apt/trusted.gpg.d/nvidia.asc
+
+# Add NVIDIA's Docker repository
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+
+# Update the apt package index and install nvidia-docker2 and docker.io
+sudo apt-get update
+sudo apt-get install -y nvidia-docker2 docker.io
+
+# Restart the Docker service
+sudo systemctl restart docker
+
+```
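+After restarting Docker, it is worth confirming that containers can actually see the GPU. A minimal check (the CUDA image tag below is an assumption - any CUDA base image that matches your driver will do):
+```
+docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu20.04 nvidia-smi
+```
+If `nvidia-smi` prints the driver version and the GPU table, the NVIDIA container runtime is working.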
+## 3. Using the Image
+### 3.1 Building the image
+Build the image from the provided Dockerfile.
+
+Clone the project source from GitHub:
+```
+git clone https://github.com/STF-Zero/talkingface-kit.git
+```
+Enter the evaluation root directory, evaluate_root:
+```
+cd talkingface-kit/hallo_root/evaluate_root
+```
+Build the image:
+```
+docker build -t evaluate_image .
+```
+During the build, the default apt mirror sometimes cannot be reached and the build fails. If that happens, switch the apt source to a mirror with the following commands:
+```
+vim /etc/apt/sources.list
+
+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
+deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
+# Append the four lines above to the end of sources.list to switch the apt mirror
+```
+The build takes roughly 20 minutes.
+
+### 3.2 Starting the image
+```
+docker run --runtime=nvidia \
+ --gpus all \
+ -it \
+ evaluate_image \
+ bash
+
+# or, as a single line:
+docker run --runtime=nvidia --gpus all -it evaluate_image bash
+```
++ evaluate_image:
+the name and tag of the Docker image to start.
++ bash:
+drops you into a bash console inside the container.
+#### Running the evaluation
+Once inside the console, run main.py to compute NIQE, PSNR, FID, SSIM, LSE-C, and LSE-D for the sample videos.
+```
+python3 main.py
+```
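+As a sanity check independent of main.py, ffmpeg's psnr and ssim filters can compare a generated clip against its source directly (a sketch - it assumes the two clips share the same resolution and comparable frame counts):
+```
+ffmpeg -i MP4/Hallo/Obama.mp4 -i MP4/Source/Obama.mp4 -lavfi "[0:v][1:v]psnr" -f null -
+ffmpeg -i MP4/Hallo/Obama.mp4 -i MP4/Source/Obama.mp4 -lavfi "[0:v][1:v]ssim" -f null -
+```
+The average PSNR/SSIM is printed at the end of each run's log.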
+If the run reports that syncnet_v2.model and sfd_face.pth are missing, go back up one directory level and run:
+```
+cd Evaluate/syncnet_python
+mkdir data
+cd data
+wget http://www.robots.ox.ac.uk/~vgg/software/lipsync/data/syncnet_v2.model
+
+cd ../detectors/s3fd
+mkdir weights
+cd weights
+wget https://www.robots.ox.ac.uk/~vgg/software/lipsync/data/sfd_face.pth
+
+```
+#### Changing parameters to evaluate other videos
+```
+apt-get install vim
+vim Evaluate/main.py
+```
+All videos available for evaluation are stored under the /MP4 directory.
+
+/MP4/Hallo contains the videos generated by the Hallo project; /MP4/Source contains the corresponding original videos.
+
+Change the relevant parameters in /Evaluate/main.py to the name of the video you want to evaluate.
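+The exact parameter names inside main.py are not reproduced here. Since generated and source clips share file names, one illustrative way to switch the clip under test from the shell is a literal substitution (a sketch - it assumes the current video name appears verbatim in main.py):
+```
+# e.g. switch the evaluation from Jae-in.mp4 to Obama.mp4
+sed -i 's/Jae-in\.mp4/Obama.mp4/g' Evaluate/main.py
+```
+Any clip present in both /MP4/Hallo and /MP4/Source (Jae-in, Lieu, Macron, May, Obama, Obama1, Obama2, Shaheen) can be substituted this way.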
+
+### Appendix: Dockerfile
+```
+FROM nvidia/cuda:12.1.0-devel-ubuntu20.04
+
+# Avoid interactive prompts during package installation
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    python3.10 \
+    python3-distutils \
+    python3-pip \
+    ffmpeg \
+    git \
+    build-essential \
+    curl \
+    wget \
+    && apt-get clean
+
+# Install Miniconda
+RUN curl -o miniconda.sh -L "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" && \
+    bash miniconda.sh -b -p /opt/conda && \
+    rm miniconda.sh && \
+    /opt/conda/bin/conda init bash
+
+# Add Conda to the PATH
+ENV PATH=/opt/conda/bin:$PATH
+
+# Copy the local project files into the container
+COPY . /app
+
+# Set the working directory
+WORKDIR /app
+
+# Create the Conda environment and install dependencies
+RUN /bin/bash -c "conda create -n hallo python=3.10 && \
+    source activate hallo && \
+    pip install -r requirements.txt && \
+    pip install ."
+
+# Pin huggingface_hub to 0.25.2
+RUN /opt/conda/bin/conda run -n hallo pip install huggingface_hub==0.25.2
+
+# Clone the pretrained models from Hugging Face
+RUN git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models
+
+# Expose a port (open ports as needed)
+EXPOSE 8000
+
+# Start a bash shell
+CMD ["/bin/bash"]
+```
\ No newline at end of file
diff --git a/hallo_root/image/Readme.md b/hallo_root/image/Readme.md
new file mode 100644
index 00000000..68366b29
--- /dev/null
+++ b/hallo_root/image/Readme.md
@@ -0,0 +1,106 @@
+# Project Image Configuration Guide
+_Because the project image is large (about 30 GB) and awkward to transfer, we did not test it on many machines. We have run it on a team member's machine (NVIDIA GeForce RTX 4060 Ti) under WSL2 with both Ubuntu 24.04 and Ubuntu 22.04.5; both tests succeeded._
+## 1. Hardware Requirements
+### GPU
+Only the NVIDIA GeForce RTX 4060 Ti has been tested.
+## 2. Software Requirements
+### 2.1 Operating system
+Both Ubuntu 24.04 and Ubuntu 22.04.5 have been tested and meet the requirements.
+### 2.2 GPU driver (important)
+While building the image we found it is demanding about the GPU driver: versions that are too new do not work. The following two driver versions have been verified:
++ 552.44 (we installed this one manually on Windows)
++ 550.120 (on Linux, installed with `apt install nvidia-utils-550`)
+### 2.3 Installing Docker
+We installed the latest version of Docker; the full procedure is covered in this blog post: [Docker installation tutorial](https://blog.csdn.net/u011278722/article/details/137673353). The basic steps:
+
+```
+# Before installing, remove any Docker packages shipped with the OS
+sudo apt-get remove docker docker-engine docker.io containerd runc
+
+# Install the required support packages
+sudo apt install apt-transport-https ca-certificates curl software-properties-common gnupg lsb-release
+
+# Add Docker's official GPG key (may currently be unreachable from mainland China)
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
+
+# Aliyun key (the Aliyun GPG key is recommended)
+curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
+
+# Add the apt source - Docker's official source
+echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+
+# ... or the Aliyun apt source
+echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+
+# Refresh the package index
+sudo apt update
+sudo apt-get update
+
+# Install the latest Docker release and wait for it to finish
+sudo apt install docker-ce docker-ce-cli containerd.io
+
+# Check the Docker version
+sudo docker version
+
+# Check that the Docker service is running
+sudo systemctl status docker
+```
+
+### 2.4 Installing nvidia-docker2
+**Note**: switch Docker to a registry mirror before this step, otherwise errors are likely.
+
+Configure the package source:
+```
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
+    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
+    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+```
+Start the installation:
+```
+sudo apt-get update
+sudo apt-get install -y nvidia-docker2
+```
+**Note**: during this step you will be asked whether to write daemon.json; remember to answer "N" - answering "Y" overwrites your existing daemon.json file.
+
+Restart Docker:
+```
+sudo systemctl restart docker
+```
+## 3. Using the Image
+### 3.1 Getting the image
++ Load it from our tar package:
+  `docker load --input /path/to/hallo_image.tar` (adjust the file path as needed).
++ Build it from the Dockerfile:
+  1. Clone the project from `git@github.com:fudan-generative-vision/hallo.git`, then place the Dockerfile in the project root.
+  2. Run `docker build -t hallo_image:2.0 .` to build the image. (You may choose a different image name and tag - we use the final version of our image here - but if you change the name or tag, adjust the subsequent commands accordingly.)
+
+### 3.2 Creating folders for mounted data
+Under the root path, create a data folder to hold the input images (jpg), input audio (wav), and output files:
+`mkdir -p data/output data/images data/audios`
+### 3.3 Starting the image
+#### Entering the console
+```
+docker run --rm --runtime=nvidia --gpus all -it \
+    -v /root/data/output:/app/.cache \
+    -v /root/data/images:/app/examples/reference_images \
+    -v /root/data/audios:/app/examples/driving_audios \
+    hallo_image:2.0 \
+    bash
+```
++ --rm:
+removes the container automatically when it stops.
++ --runtime=nvidia:
+tells Docker to use the nvidia runtime for GPU support. If GPU-related errors appear later, check whether the required entries have been written to /etc/docker/daemon.json.
++ --gpus all:
+gives the container access to all available GPUs.
++ -v /root/data/output:/app/.cache:
+mounts the host's /root/data/output directory at /app/.cache inside the container.
++ -v /root/data/images:/app/examples/reference_images:
+mounts the host's /root/data/images directory at /app/examples/reference_images, so image files on the host are visible to programs inside the container.
++ -v /root/data/audios:/app/examples/driving_audios:
+mounts the host's /root/data/audios directory at /app/examples/driving_audios, so audio files on the host can be used inside the container.
++ hallo_image:2.0:
+the name and tag of the Docker image to start.
++ bash:
+drops you into a bash console inside the container.
+#### Activating the virtual environment
+`conda activate hallo`
+#### Running inference
+Once in the console you can generate an MP4 from your own image and audio files. The example below uses 1.jpg from data/images and 1.wav from data/audios:
+`python scripts/inference.py --source_image examples/reference_images/1.jpg --driving_audio examples/driving_audios/1.wav`
+The generated video then appears under /root/data/output on the host.
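+The single-file command above extends naturally to a batch. The sketch below pairs every jpg in the mounted image folder with the same-named wav in the audio folder and runs inference on each pair (an assumption: files that belong together share a base name, e.g. 1.jpg with 1.wav):
+```
+for img in examples/reference_images/*.jpg; do
+    name=$(basename "$img" .jpg)
+    wav="examples/driving_audios/${name}.wav"
+    [ -f "$wav" ] || continue   # skip images with no matching audio
+    python scripts/inference.py --source_image "$img" --driving_audio "$wav"
+done
+```
+Each result appears under /root/data/output on the host, as before.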
\ No newline at end of file