fix: Fixed many issues in the data preparation and training pipeline of the source repository, documented in the README #22
UnderTurrets wants to merge 2 commits into weiqi-zhang:main
Pull request overview
This PR addresses multiple bugs and issues discovered during data preparation and training workflows for a 3D Gaussian Splatting project. The fixes span environment setup, data preprocessing scripts, training logic, and documentation.
Key Changes:
- Fixed handling of None values in unconditional diffusion training by implementing custom batch collation
- Enhanced point cloud sampling to properly include normals and colors in PLY files
- Added batch processing scripts for rendering and training multiple ShapeNet objects
- Improved error handling and robustness in dataset readers for missing PLY attributes
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| train.py | Adds custom collate function to handle None values in unconditional training; changes tensorboard log directory |
| test.py | Adds missing --resume parameter and fixes output path handling; removes incomplete code line |
| process_data/train_gaussian_batch.py | New batch processing script for training multiple Gaussian splatting scenes |
| process_data/train_gaussian.py | Fixes TENSORBOARD_FOUND flag from False to True |
| process_data/submodules/simple-knn/simple_knn.egg-info/PKG-INFO | Corrects package name from simple_knn to simple-knn |
| process_data/shapenetcore_extract.py | New utility script for extracting and filtering ShapeNet zip archives |
| process_data/shapenetcore_delete_unfinished.py | New cleanup script for removing incomplete rendering directories |
| process_data/scene/dataset_readers.py | Enhances fetchPly to gracefully handle missing color/normal attributes with fallback defaults |
| process_data/sample_points.py | Major refactoring to properly extract and save normals/colors; adds command-line argument support |
| process_data/render_blender_batch.py | New batch rendering script for processing multiple OBJ files in parallel |
| process_data/convert_data.py | Refactors to use argparse for command-line arguments instead of hardcoded paths |
| models/combined_model.py | Adds None check (unreachable) and uses .get() for safer context access |
| environment.yaml | Comments out clip==1.0 dependency (to be installed from git separately) |
| diff_utils/helpers.py | Removes 'metrics' folder from code backup list |
| dataloader/modulation_loader.py | Conditionalizes assertion check and modifies tensor loading for point cloud augmentation |
| config/stage2_conditional/specs.json | Fixes description from "unconditional" to "conditional" |
| README.md | Adds comprehensive documentation of all fixes and installation issues |
| .gitignore | Adds entries for build artifacts and data directories |
Comments suppressed due to low confidence (2)
process_data/sample_points.py:21
- This comment appears to contain commented-out code.
# def sample(arg):
# path, name = arg
# mesh = trimesh.load_mesh(os.path.join(path, name))
#
# num_points = 100000
# points = mesh.sample(num_points)
#
# point_cloud = trimesh.points.PointCloud(points)
#
# save_path = os.path.join(path, 'points3d.ply')
# point_cloud.export(save_path)
process_data/sample_points.py:127
- Unnecessary 'pass' statement.
pass
    files_to_process.append({
        'path': item,
        'name': item.name,
        'stem': item.stem, #文件名不含后缀

The comment on line 55, "文件名不含后缀" ("file name without suffix"), could be phrased more clearly. Consider "文件名(不含扩展名)" ("file name (without extension)"). This is a minor documentation issue, but clear comments improve code maintainability.

Suggested change:

    - 'stem': item.stem, #文件名不含后缀
    + 'stem': item.stem, # 文件名(不含扩展名)
    # def sample(arg):
    # path, name = arg
    # mesh = trimesh.load_mesh(os.path.join(path, name))
    #
    # num_points = 100000
    # points = mesh.sample(num_points)
    #
    # point_cloud = trimesh.points.PointCloud(points)
    #
    # save_path = os.path.join(path, 'points3d.ply')
    # point_cloud.export(save_path)

A large block of old implementation code (lines 11-21) remains commented out. Since the new implementation is working and significantly different, this commented code should be removed to improve code maintainability and readability. If historical reference is needed, it can be retrieved from version control.

Suggested change:

    - # def sample(arg):
    - # path, name = arg
    - # mesh = trimesh.load_mesh(os.path.join(path, name))
    - #
    - # num_points = 100000
    - # points = mesh.sample(num_points)
    - #
    - # point_cloud = trimesh.points.PointCloud(points)
    - #
    - # save_path = os.path.join(path, 'points3d.ply')
    - # point_cloud.export(save_path)
    if x is None:
        return None

The check for x being None comes after using x to determine which training function to call, which means this check will never be reached if x is actually None. This dead code should be removed, or the logic should be restructured to check for None before attempting to access x's properties.
    vertex_data = np.empty(num_points, dtype=[
        ('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
        ('nx', 'f4'), ('ny', 'f4'), ('nz', 'f4'),
        ('red', 'u1'), ('green', 'u1'), ('blue', 'u1') # 颜色通常是 8位无符号整数 (0-255)

The Chinese comment is spaced inconsistently: it has a space before "8位" and another before "(0-255)". Chinese text conventionally omits such spaces. Consider removing them, or else keep the current spacing consistently across all comments.

Suggested change:

    - ('red', 'u1'), ('green', 'u1'), ('blue', 'u1') # 颜色通常是 8位无符号整数 (0-255)
    + ('red', 'u1'), ('green', 'u1'), ('blue', 'u1') # 颜色通常是8位无符号整数(0-255)
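To make the field layout in the snippet above concrete, here is a minimal sketch of a PLY file that carries positions, normals and 8-bit colors. It is not the PR's code: the PR builds the file with `plyfile` from a NumPy structured array, whereas this sketch writes the ASCII variant of the format with the standard library only. The names `to_uint8` and `ascii_ply_lines` are hypothetical, and the assumption that colors arrive as floats in [0, 1] is mine.

```python
from typing import List, Sequence, Tuple

# One point: (xyz, normal, rgb), with rgb given as floats in [0.0, 1.0] (assumption).
Point = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def to_uint8(channel: float) -> int:
    """Map a color channel in [0.0, 1.0] to an integer in [0, 255], clamping out-of-range input."""
    return max(0, min(255, int(round(channel * 255))))


def ascii_ply_lines(points: List[Point]) -> List[str]:
    """Return the lines of an ASCII PLY file with x/y/z, nx/ny/nz and red/green/blue per vertex."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
    ]
    # Same field order as the structured dtype: float32 coordinates and normals, then uint8 colors.
    lines += [f"property float {name}" for name in ("x", "y", "z", "nx", "ny", "nz")]
    lines += [f"property uchar {name}" for name in ("red", "green", "blue")]
    lines.append("end_header")
    for xyz, normal, rgb in points:
        floats = " ".join(f"{value:.6f}" for value in (*xyz, *normal))
        colors = " ".join(str(to_uint8(channel)) for channel in rgb)
        lines.append(f"{floats} {colors}")
    return lines
```

Joining the returned lines with newlines and writing them to `points3d.ply` gives a file whose header declares every field explicitly, so a reader that looks up `nx` or `red` by name should find them.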
    process = subprocess.Popen(command, stdout=sys.stdout, stderr=sys.stderr)
    process.wait()

    if process.returncode == 0:
        print(f"✅ 训练成功: {source_path}\n")
    else:
        print(f"❌ 训练失败,返回码: {process.returncode}。源路径: {source_path}\n")

The subprocess.Popen call outputs directly to sys.stdout and sys.stderr without any buffering control. For long-running batch processes with many objects, this could lead to interleaved output from multiple processes making logs difficult to read. Consider redirecting to individual log files per object or using a logging framework.

Suggested change:

    - process = subprocess.Popen(command, stdout=sys.stdout, stderr=sys.stderr)
    - process.wait()
    - if process.returncode == 0:
    -     print(f"✅ 训练成功: {source_path}\n")
    - else:
    -     print(f"❌ 训练失败,返回码: {process.returncode}。源路径: {source_path}\n")
    + log_file_path = os.path.join(target_path, "train.log")
    + with open(log_file_path, "w", encoding="utf-8") as log_file:
    +     process = subprocess.Popen(command, stdout=log_file, stderr=log_file)
    +     process.wait()
    + if process.returncode == 0:
    +     print(f"✅ 训练成功: {source_path}\n")
    +     print(f" 日志已保存到: {log_file_path}\n")
    + else:
    +     print(f"❌ 训练失败,返回码: {process.returncode}。源路径: {source_path}\n")
    +     print(f" 日志已保存到: {log_file_path}\n")
    # traceback.print_exc()
    pass

Bare except clause catches all exceptions silently, which can hide important errors and make debugging difficult. Consider either removing the pass statement to let exceptions propagate, or at minimum logging the exception for debugging purposes.

Suggested change:

    - # traceback.print_exc()
    - pass
    + traceback.print_exc()
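One way to keep a batch running while still surfacing errors is to catch `Exception` (not a bare `except`), print the traceback, and remember which items failed. This is a sketch under assumptions, not the PR's code: `process_all` and its `handler` parameter are hypothetical names introduced for illustration.

```python
import traceback
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def process_all(items: Iterable[T], handler: Callable[[T], None]) -> List[T]:
    """Apply `handler` to every item; report failures instead of hiding them.

    Returns the items whose handler raised, in their original order.
    """
    failed: List[T] = []
    for item in items:
        try:
            handler(item)
        except Exception:
            # Print the full traceback to stderr so the error stays visible,
            # then continue with the remaining items.
            traceback.print_exc()
            failed.append(item)
    return failed
```

Catching `Exception` rather than using a bare `except` lets `KeyboardInterrupt` and `SystemExit` propagate, so a long batch can still be stopped with Ctrl+C.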
    import argparse
    import os
    import subprocess
    import multiprocessing

Module 'multiprocessing' is imported with both 'import' and 'import from'.

Suggested change:

    - import multiprocessing
    import sys

Import of 'sys' is not used.

Suggested change:

    - import sys
    @@ -0,0 +1,180 @@
    import os

Import of 'os' is not used.

Suggested change:

    - import os
    @@ -0,0 +1,180 @@
    import os
    import shutil
    import sys

Import of 'sys' is not used.

Suggested change:

    - import sys
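Details:

- `ERROR: Could not find a version that satisfies the requirement clip==1.0 (from versions: 0.0.1, 0.1.0, 0.2.0)`: clip has to be installed from git first, and the corresponding entry in the `environment.yml` file changed. The submodules also have to be installed manually. Only then should the environment update command be run; otherwise the download stalls.
- xvfb: `apt update && apt install xvfb -y`, then run the script as follows:
  `xvfb-run -a blender --background --python render_blender.py -- --output_folder {images_path} {mesh_path}`
- Rendering is done with the `render_blender.py` file. A batch script for obj files was therefore added; it renders multiple obj files in one go and preserves their corresponding paths:
  `python render_blender_batch.py -s {shapenet_folder}`
- `AttributeError: 'Scene' object has no attribute 'area'.`: the corresponding code in `sample_points.py` now manually converts the scene into a single mesh and only then calls `mesh.sample` to sample the point cloud. The point cloud file is also saved to a different path, under the folder of the corresponding object id, so that `dataset_readers.py` can read it correctly.
- `train_gaussian.py`: preparing the GS data raised `ValueError: no field of name nx`. Breakpoint debugging and single-stepping traced the error to the `fetchPly` function in `dataset_readers.py`. It means the point cloud file being read contains no normal data, presumably because `point_cloud.export` in `sample_points.py` does not save normals. In addition, the color values that `fetchPly` read from the point cloud file were all 0. The data is therefore no longer saved with `point_cloud.export`; the file is built manually with plyfile instead, so that it contains xyz, rgb and normal data.
- `train_gaussian_batch.py`: prepares the data for multiple scenes in batch. Output is saved under `process_data/output` by default, and the file paths mirror the ShapeNetCore layout: a two-level directory structure with the category first, then the object.
- Bug where `python test.py -e config/stage1/ -r {num epoch}` raised an error: `test.py` did not add the `-r` argument to its parser. The command is now `python test.py -e config/stage1/ -r {ckpt_fileName}`.
- Bug `TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'NoneType'>`: with unconditional input, the `__getitem__` function of the data loader `ModulationLoader` returns a dict that contains `None` values. When torch assembles a batch it stacks the values that share the same key, and it fails when it meets `None`. The fix slightly changes the arguments used to create the `torch.utils.data.DataLoader` object in `train.py`: the `collate_fn` argument receives a slightly modified `default_collate` that avoids stacking `None` values.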
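The collation fix described in the last item can be sketched as follows. This is not the code from `train.py`: it is a minimal, torch-free illustration in which `collate_allow_none` and its `collate_values` parameter are hypothetical names, and the choice to return `None` for a key as soon as any sample holds `None` is an assumption about the intended behavior.

```python
from typing import Any, Callable, Dict, List


def collate_allow_none(
    batch: List[Dict[str, Any]],
    collate_values: Callable[[List[Any]], Any] = list,
) -> Dict[str, Any]:
    """Collate a list of dict samples key by key, leaving None-valued keys as None.

    `collate_values` combines the per-sample values of one key. The default
    just keeps them as a list; a real training script would pass a function
    that stacks tensors instead.
    """
    collated: Dict[str, Any] = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        if any(value is None for value in values):
            # Do not try to stack: the unconditional case has no data for this key.
            collated[key] = None
        else:
            collated[key] = collate_values(values)
    return collated
```

With PyTorch, one would presumably pass the wrapper to the loader as `collate_fn=functools.partial(collate_allow_none, collate_values=torch.utils.data.default_collate)`, assuming a PyTorch version that exports `default_collate` from `torch.utils.data`. The downstream model code then has to accept `None` for the skipped keys.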