fix: Fixed many issues in the source repository's data preparation and training pipeline, documented in the README #22

Open
UnderTurrets wants to merge 2 commits into weiqi-zhang:main from UnderTurrets:main

Conversation

@UnderTurrets

The details are as follows:


  • When installing dependencies, ERROR: Could not find a version that satisfies the requirement clip==1.0 (from versions: 0.0.1, 0.1.0, 0.2.0) occurs. CLIP must first be installed from git, and the corresponding entry in environment.yaml modified:
pip install git+https://github.com/openai/CLIP.git

Additionally, install the submodules manually:

pip install process_data/submodules/diff-gaussian-rasterization
pip install process_data/submodules/simple-knn

Finally, run the environment update command:

conda env update -f environment.yaml

Otherwise, dependency downloads will stall.


  • Data preparation requires installing Blender 2.9 (2.93.2 is used below):
cd /opt
wget https://download.blender.org/release/Blender2.93/blender-2.93.2-linux-x64.tar.xz
tar -xvf blender-2.93.2-linux-x64.tar.xz
echo 'export PATH=$PATH:/opt/blender-2.93.2-linux-x64/' >> ~/.bashrc
rm blender-2.93.2-linux-x64.tar.xz

  • On a headless server, running render_blender.py requires installing xvfb:
apt update && apt install xvfb -y

Then run the script as follows:

xvfb-run -a blender --background --python render_blender.py -- --output_folder {images_path} {mesh_path}

  • When rendering obj files, each category contains a very large number of objects; for example, the chair category has more than 9,000 folders, so running render_blender.py on each one individually is impractical. A batch script for obj files was therefore added, which renders many obj files in one go while preserving the corresponding paths.
python render_blender_batch.py -s {shapenet_folder}
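The batch script might be structured roughly like this (a sketch; the real render_blender_batch.py's flags and parallelism may differ, and find_obj_files is a hypothetical helper):

```python
import subprocess
from pathlib import Path

def find_obj_files(shapenet_root):
    """Collect every .obj under the category/object directory layout."""
    return sorted(Path(shapenet_root).rglob("*.obj"))

def render_all(shapenet_root):
    for obj_path in find_obj_files(shapenet_root):
        # Mirror the obj's parent folder as the image output folder.
        images_path = obj_path.parent / "images"
        cmd = [
            "xvfb-run", "-a", "blender", "--background",
            "--python", "render_blender.py", "--",
            "--output_folder", str(images_path), str(obj_path),
        ]
        subprocess.run(cmd, check=False)  # continue past individual failures
```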

  • When sampling point clouds, AttributeError: 'Scene' object has no attribute 'area' occurred. The corresponding code in sample_points.py was modified to manually convert the scene into a single mesh before calling mesh.sample to sample the point cloud. The save path for point-cloud files was also changed so they are stored in the folder matching the object id, which dataset_readers.py needs in order to read them correctly.

  • Running train_gaussian.py to prepare GS data raised ValueError: no field of name nx. Breakpoint debugging and single-stepping traced the error to the fetchPly function in dataset_readers.py: the point-cloud file being read contains no normal data, because point_cloud.export in sample_points.py does not save normals. In addition, the color values fetchPly read from the point-cloud file were all 0. The point_cloud.export method is therefore no longer used; instead the PLY file is built manually, ensuring it contains xyz, rgb, and normal data.
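Manually building a PLY with all three attribute groups can be sketched like this (an illustration using numpy and a hand-written ASCII header; the repository may construct the file with the plyfile package instead, and save_ply is a hypothetical name):

```python
import numpy as np

def save_ply(path, xyz, normals, rgb):
    """Write an ASCII PLY containing positions, normals, and uint8 colors,
    so fetchPly finds the nx/ny/nz and red/green/blue fields it expects."""
    n = xyz.shape[0]
    header = "\n".join([
        "ply", "format ascii 1.0", f"element vertex {n}",
        "property float x", "property float y", "property float z",
        "property float nx", "property float ny", "property float nz",
        "property uchar red", "property uchar green", "property uchar blue",
        "end_header",
    ])
    rows = np.hstack([xyz, normals, rgb])  # (n, 9), colors cast to float here
    with open(path, "w") as f:
        f.write(header + "\n")
        for x, y, z, nx_, ny_, nz_, r, g, b in rows:
            f.write(f"{x} {y} {z} {nx_} {ny_} {nz_} {int(r)} {int(g)} {int(b)}\n")
```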

  • Added train_gaussian_batch.py to prepare data for multiple scenes in batch. Output is saved under process_data/output by default, following the ShapeNetCore path convention with a two-level, category-then-object directory structure.

  • Fixed the bug where preparing stage-2 data with python test.py -e config/stage1/ -r {num epoch} raised an error: test.py was missing the -r parser argument. The command is now python test.py -e config/stage1/ -r {ckpt_fileName}.
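The fix in test.py amounts to registering the missing flag with argparse, roughly like this (argument names other than -r/--resume are assumptions):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-e", "--exp_directory", required=True)
# The fix: register -r so `-r {ckpt_fileName}` is accepted instead of erroring.
parser.add_argument("-r", "--resume", default=None,
                    help="checkpoint file name to resume/evaluate from")
args = parser.parse_args(["-e", "config/stage1/", "-r", "last.ckpt"])
```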

  • Fixed the stage-2 unconditional diffusion training error TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'NoneType'>. With unconditional input, the __getitem__ method of the ModulationLoader dataloader returns a dict containing None values, and torch crashes when stacking the values of the same key across the batch and hitting a None. The fix slightly adjusts the torch.utils.data.DataLoader constructor arguments in train.py, passing a lightly modified default_collate as collate_fn that skips stacking None values.
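A collate function along these lines would implement the fix (a sketch; the actual helper name and logic in train.py may differ):

```python
import torch
from torch.utils.data import default_collate

def collate_skip_none(batch):
    """Like default_collate, but keys whose values are None stay None
    instead of crashing when torch tries to stack them."""
    out = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        out[key] = None if any(v is None for v in values) else default_collate(values)
    return out

# loader = torch.utils.data.DataLoader(dataset, batch_size=4,
#                                      collate_fn=collate_skip_none)
```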

Copilot AI review requested due to automatic review settings December 11, 2025 09:23

Copilot AI left a comment


Pull request overview

This PR addresses multiple bugs and issues discovered during data preparation and training workflows for a 3D Gaussian Splatting project. The fixes span environment setup, data preprocessing scripts, training logic, and documentation.

Key Changes:

  • Fixed handling of None values in unconditional diffusion training by implementing custom batch collation
  • Enhanced point cloud sampling to properly include normals and colors in PLY files
  • Added batch processing scripts for rendering and training multiple ShapeNet objects
  • Improved error handling and robustness in dataset readers for missing PLY attributes

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 19 comments.

Reviewed files:

train.py: Adds custom collate function to handle None values in unconditional training; changes tensorboard log directory
test.py: Adds missing --resume parameter and fixes output path handling; removes incomplete code line
process_data/train_gaussian_batch.py: New batch processing script for training multiple Gaussian splatting scenes
process_data/train_gaussian.py: Fixes TENSORBOARD_FOUND flag from False to True
process_data/submodules/simple-knn/simple_knn.egg-info/PKG-INFO: Corrects package name from simple_knn to simple-knn
process_data/shapenetcore_extract.py: New utility script for extracting and filtering ShapeNet zip archives
process_data/shapenetcore_delete_unfinished.py: New cleanup script for removing incomplete rendering directories
process_data/scene/dataset_readers.py: Enhances fetchPly to gracefully handle missing color/normal attributes with fallback defaults
process_data/sample_points.py: Major refactoring to properly extract and save normals/colors; adds command-line argument support
process_data/render_blender_batch.py: New batch rendering script for processing multiple OBJ files in parallel
process_data/convert_data.py: Refactors to use argparse for command-line arguments instead of hardcoded paths
models/combined_model.py: Adds None check (unreachable) and uses .get() for safer context access
environment.yaml: Comments out clip==1.0 dependency (to be installed from git separately)
diff_utils/helpers.py: Removes 'metrics' folder from code backup list
dataloader/modulation_loader.py: Conditionalizes assertion check and modifies tensor loading for point cloud augmentation
config/stage2_conditional/specs.json: Fixes description from "unconditional" to "conditional"
README.md: Adds comprehensive documentation of all fixes and installation issues
.gitignore: Adds entries for build artifacts and data directories
Comments suppressed due to low confidence (2)

process_data/sample_points.py:21

  • This comment appears to contain commented-out code.
# def sample(arg):
#     path, name = arg
#     mesh = trimesh.load_mesh(os.path.join(path, name))
#
#     num_points = 100000
#     points = mesh.sample(num_points)
#
#     point_cloud = trimesh.points.PointCloud(points)
#
#     save_path = os.path.join(path, 'points3d.ply')
#     point_cloud.export(save_path)

process_data/sample_points.py:127

  • Unnecessary 'pass' statement.
        pass


files_to_process.append({
'path': item,
'name': item.name,
'stem': item.stem, #文件名不含后缀

Copilot AI Dec 11, 2025


The comment on line 55 has a typo: "文件名不含后缀" should use better phrasing. Consider "文件名(不含扩展名)" for clarity. While this is a minor documentation issue, clear comments improve code maintainability.

Suggested change
'stem': item.stem, #文件名不含后缀
'stem': item.stem, # 文件名(不含扩展名)

Comment on lines +11 to +21
# def sample(arg):
# path, name = arg
# mesh = trimesh.load_mesh(os.path.join(path, name))
#
# num_points = 100000
# points = mesh.sample(num_points)
#
# point_cloud = trimesh.points.PointCloud(points)
#
# save_path = os.path.join(path, 'points3d.ply')
# point_cloud.export(save_path)

Copilot AI Dec 11, 2025


A large block of old implementation code (lines 11-21) remains commented out. Since the new implementation is working and significantly different, this commented code should be removed to improve code maintainability and readability. If historical reference is needed, it can be retrieved from version control.

Suggested change
# def sample(arg):
# path, name = arg
# mesh = trimesh.load_mesh(os.path.join(path, name))
#
# num_points = 100000
# points = mesh.sample(num_points)
#
# point_cloud = trimesh.points.PointCloud(points)
#
# save_path = os.path.join(path, 'points3d.ply')
# point_cloud.export(save_path)

Comment on lines +36 to +37
if x is None:
return None

Copilot AI Dec 11, 2025


The check for x being None comes after using x to determine which training function to call, which means this check will never be reached if x is actually None. This dead code should be removed, or the logic should be restructured to check for None before attempting to access x's properties.

vertex_data = np.empty(num_points, dtype=[
('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
('nx', 'f4'), ('ny', 'f4'), ('nz', 'f4'),
('red', 'u1'), ('green', 'u1'), ('blue', 'u1') # 颜色通常是 8位无符号整数 (0-255)

Copilot AI Dec 11, 2025


In Chinese comments, there's a spacing inconsistency. The standard Chinese writing convention is to not use spaces between the color values and the unit. Consider using consistent spacing: either "0-255" or maintaining the current format throughout all comments.

Suggested change
('red', 'u1'), ('green', 'u1'), ('blue', 'u1') # 颜色通常是 8位无符号整数 (0-255)
('red', 'u1'), ('green', 'u1'), ('blue', 'u1') # 颜色通常是8位无符号整数(0-255)

Comment on lines +116 to +122
process = subprocess.Popen(command, stdout=sys.stdout, stderr=sys.stderr)
process.wait()

if process.returncode == 0:
print(f"✅ 训练成功: {source_path}\n")
else:
print(f"❌ 训练失败,返回码: {process.returncode}。源路径: {source_path}\n")

Copilot AI Dec 11, 2025


The subprocess.Popen call outputs directly to sys.stdout and sys.stderr without any buffering control. For long-running batch processes with many objects, this could lead to interleaved output from multiple processes making logs difficult to read. Consider redirecting to individual log files per object or using a logging framework.

Suggested change
process = subprocess.Popen(command, stdout=sys.stdout, stderr=sys.stderr)
process.wait()
if process.returncode == 0:
print(f"✅ 训练成功: {source_path}\n")
else:
print(f"❌ 训练失败,返回码: {process.returncode}。源路径: {source_path}\n")
log_file_path = os.path.join(target_path, "train.log")
with open(log_file_path, "w", encoding="utf-8") as log_file:
process = subprocess.Popen(command, stdout=log_file, stderr=log_file)
process.wait()
if process.returncode == 0:
print(f"✅ 训练成功: {source_path}\n")
print(f" 日志已保存到: {log_file_path}\n")
else:
print(f"❌ 训练失败,返回码: {process.returncode}。源路径: {source_path}\n")
print(f" 日志已保存到: {log_file_path}\n")

Comment on lines +126 to +127
# traceback.print_exc()
pass

Copilot AI Dec 11, 2025


Bare except clause catches all exceptions silently, which can hide important errors and make debugging difficult. Consider either removing the pass statement to let exceptions propagate, or at minimum logging the exception for debugging purposes.

Suggested change
# traceback.print_exc()
pass
traceback.print_exc()

import argparse
import os
import subprocess
import multiprocessing

Copilot AI Dec 11, 2025


Module 'multiprocessing' is imported with both 'import' and 'import from'.

Suggested change
import multiprocessing

Comment on lines +4 to +5
import sys


Copilot AI Dec 11, 2025


Import of 'sys' is not used.

Suggested change
import sys

@@ -0,0 +1,180 @@
import os

Copilot AI Dec 11, 2025


Import of 'os' is not used.

Suggested change
import os

@@ -0,0 +1,180 @@
import os
import shutil
import sys

Copilot AI Dec 11, 2025


Import of 'sys' is not used.

Suggested change
import sys
