A CUDA-accelerated PyTorch implementation of the Earth Mover's Distance (EMD) for point clouds. The distance supports automatic differentiation, so it can be used directly as a training loss in deep learning applications.
Note: This repository is an updated and improved version of daerduoCarey/PyTorchEMD. Special thanks to the original authors for their foundational work.
- Fast CUDA Implementation: High-performance CUDA kernels for EMD computation
- PyTorch Integration: Seamless integration with PyTorch's autograd system
- Cross-Platform: Works on Windows, Linux, and macOS
- Modern PyTorch API: Compatible with PyTorch 1.8+ and CUDA 11.0+
- Flexible Input Formats: Supports both BNC and BCN tensor formats
- Robust Error Handling: Multiple fallback mechanisms for CUDA extension loading
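To make the quantity concrete: for two point sets of equal size, EMD is commonly defined as the cost of the cheapest one-to-one matching between them. The brute-force sketch below is illustrative only. `emd_bruteforce` is a hypothetical helper, not part of this package, and the CUDA kernel may use a different normalization or an approximate solver, so its values need not match this reference exactly.

```python
import itertools
import math


def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def emd_bruteforce(a, b):
    """Cost of the cheapest one-to-one matching between two equal-size point sets.

    Hypothetical reference helper for tiny inputs only: it tries every
    permutation, so the running time grows as n!.
    """
    if len(a) != len(b):
        raise ValueError("point sets must contain the same number of points")
    best = math.inf
    for perm in itertools.permutations(range(len(b))):
        # perm[i] is the index of the point in b matched to a[i]
        cost = sum(euclidean(a[i], b[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(emd_bruteforce(a, b))  # 1.0: a[1] matches b[0] exactly, a[0] moves one unit to b[1]
```

A reference like this is only practical for a handful of points, which is why a GPU implementation is needed for clouds with thousands of points.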
- CUDA: 11.0 or higher
- Python: 3.7 or higher
- PyTorch: 1.8.0 or higher with CUDA support
- C++ Compiler:
  - Windows: Visual Studio 2019/2022 Build Tools
  - Linux: GCC 7+ or Clang 6+
  - macOS: Xcode Command Line Tools
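The requirements above can be checked before building with a short script. This is a minimal sketch using only the standard library plus an optional `torch` import; `parse_version` and `check_environment` are hypothetical names, and the script is independent of the repository's own `tests/check_compatibility.py`, whose contents are not shown here.

```python
import importlib.util
import re
import shutil
import sys


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple like (2, 1, 0)."""
    numbers = []
    for piece in str(text).split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each part
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad '1.8' to (1, 8, 0) so tuple comparison is fair
        numbers.append(0)
    return tuple(numbers)


def check_environment():
    """Return a dict mapping each requirement to True or False."""
    report = {
        "python>=3.7": sys.version_info >= (3, 7),
        "nvcc on PATH": shutil.which("nvcc") is not None,
    }
    if importlib.util.find_spec("torch") is None:
        report["torch>=1.8.0"] = False
        report["torch sees CUDA"] = False
    else:
        import torch

        report["torch>=1.8.0"] = parse_version(torch.__version__) >= (1, 8, 0)
        report["torch sees CUDA"] = torch.cuda.is_available()
    return report


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The script does not check the C++ compiler, since the right executable name differs per platform.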
pip install "torch>=1.8.0" numpy
- Clone the repository:
git clone https://github.com/hieulhaiwork/EMD-Pytorch.git
cd EMD-Pytorch
- Build and install:
On Windows:
# Make sure you have Visual Studio Build Tools installed
python setup.py build_ext --inplace
pip install -e .
On Linux/macOS:
# Make sure you have GCC/Clang and CUDA toolkit installed
chmod +x build.sh
./build.sh
pip install -e .
For development or if you encounter issues:
git clone https://github.com/hieulhaiwork/EMD-Pytorch.git
cd EMD-Pytorch
pip install -e .
import torch
from emd import earth_mover_distance
# Create sample point clouds (batch_size=2, num_points=1000, dims=3)
xyz1 = torch.randn(2, 1000, 3).cuda()
xyz2 = torch.randn(2, 1000, 3).cuda()
# Compute EMD
distance = earth_mover_distance(xyz1, xyz2, transpose=False)
print(f"EMD: {distance}")  # Example output (values depend on the random input): tensor([123.45, 67.89], device='cuda:0')
import torch
import torch.nn as nn
from emd import EMDLoss
class PointCloudAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emd_loss = EMDLoss(transpose=False)
        # ... your model layers

    def forward(self, input_pc, reconstructed_pc):
        # ... model forward pass
        loss = self.emd_loss(input_pc, reconstructed_pc)
        return loss

# Usage
model = PointCloudAutoEncoder().cuda()
input_points = torch.randn(4, 2048, 3).cuda()
# Stand-in for the decoder output; forward() expects both point clouds
reconstructed = torch.randn(4, 2048, 3).cuda()
loss = model(input_points, reconstructed)
import torch
from emd import earth_mover_distance
# BNC format (Batch, Num_points, Channels) - Default
xyz1_bnc = torch.randn(2, 1000, 3).cuda()
xyz2_bnc = torch.randn(2, 1000, 3).cuda()
distance_bnc = earth_mover_distance(xyz1_bnc, xyz2_bnc, transpose=False)
# BCN format (Batch, Channels, Num_points)
xyz1_bcn = torch.randn(2, 3, 1000).cuda()
xyz2_bcn = torch.randn(2, 3, 1000).cuda()
distance_bcn = earth_mover_distance(xyz1_bcn, xyz2_bcn, transpose=True)
# Both should give similar results
print(f"BNC format: {distance_bnc}")
print(f"BCN format: {distance_bcn}")
import torch
from emd import earth_mover_distance
# Enable gradients
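# Create the tensors directly on the GPU so they remain leaf tensors.
# Calling .cuda() on a requires_grad tensor returns a non-leaf copy whose .grad stays None.
xyz1 = torch.randn(2, 1000, 3, device="cuda", requires_grad=True)
xyz2 = torch.randn(2, 1000, 3, device="cuda", requires_grad=True)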
# Forward pass
distance = earth_mover_distance(xyz1, xyz2, transpose=False)
loss = distance.sum()
# Backward pass
loss.backward()
print(f"xyz1 gradient shape: {xyz1.grad.shape}")
print(f"xyz2 gradient shape: {xyz2.grad.shape}")
Run the comprehensive test suite to verify everything is working:
# Basic functionality test
python tests/simple_test.py
# Loss function test
python tests/loss_test.py
# Complete validation suite
python tests/final_validation.py
# Performance benchmark
python -c "
import torch
from emd import earth_mover_distance
import time
# Benchmark
xyz1 = torch.randn(4, 2048, 3).cuda()
xyz2 = torch.randn(4, 2048, 3).cuda()
# Warmup
for _ in range(10):
    _ = earth_mover_distance(xyz1, xyz2, transpose=False)
# Timing
torch.cuda.synchronize()
start = time.time()
for _ in range(100):
    dist = earth_mover_distance(xyz1, xyz2, transpose=False)
torch.cuda.synchronize()
end = time.time()
print(f'Average time: {(end-start)/100:.4f} seconds')
"
1. "ImportError: DLL load failed while importing emd_cuda"
- Make sure Visual Studio Build Tools are installed (Windows)
- Verify CUDA toolkit version matches PyTorch CUDA version
- Try rebuilding:
python setup.py build_ext --inplace --force
2. "CUDA out of memory"
- Reduce batch size or number of points
- Use torch.cuda.empty_cache() between iterations
3. "RuntimeError: CUDA is not available"
- Install PyTorch with CUDA support:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
- Verify CUDA installation:
nvcc --version
4. Compilation errors on Linux
- Install build essentials:
sudo apt-get install build-essential
- Make sure GCC version is compatible with your CUDA version
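A mismatch between the CUDA toolkit and the CUDA version PyTorch was built with is a common cause of the import and compilation errors listed above. The sketch below prints both versions side by side. The helper names are hypothetical, and it assumes `nvcc --version` prints a line containing `release X.Y`, which is the usual format but is not guaranteed by this README.

```python
import importlib.util
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_release():
    """CUDA toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def torch_cuda_release():
    """CUDA release PyTorch was built with, or None (no torch, or CPU-only build)."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if torch.version.cuda is None:  # CPU-only wheel
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return (int(major), int(minor))


toolkit, built_with = toolkit_release(), torch_cuda_release()
print(f"nvcc toolkit release:    {toolkit}")
print(f"PyTorch built with CUDA: {built_with}")
if toolkit and built_with and toolkit != built_with:
    print("Versions differ: install a matching toolkit or PyTorch wheel, then rebuild.")
```

A value of `None` on either line points to a missing toolkit or a CPU-only PyTorch install rather than a version mismatch.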
Enable verbose compilation for debugging. Set the variables in the same shell session that runs the build, because the build process reads them from its environment (on Windows, use the equivalent set commands):
export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6"  # Adjust for your GPU
export MAX_JOBS=4  # Limit parallel compilation
# Then rebuild
python setup.py build_ext --inplace --force --verbose
Typical performance on modern hardware:
| GPU | Point Cloud Size | Batch Size | Time per Forward Pass |
|---|---|---|---|
| RTX 3080 | 2048 points | 4 | ~8ms |
| RTX 3090 | 4096 points | 8 | ~15ms |
| V100 | 2048 points | 16 | ~12ms |
Performance may vary based on point cloud distribution and system configuration.
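To reproduce timings like these on your own hardware, a small helper that handles warmup and synchronization is convenient. The following is a sketch, not part of this package: `benchmark` is a hypothetical name, and the demo call times a CPU-only placeholder workload so that the snippet runs without a GPU.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100, sync=None):
    """Time fn() and return (mean, stdev) in milliseconds per call.

    sync is an optional callable that blocks until queued GPU work has
    finished (for example torch.cuda.synchronize). Without it, asynchronous
    CUDA launches would make GPU timings look too short.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # warm up caches and lazy initialization
        fn()
    sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for the work to finish before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


# Placeholder workload so the example runs anywhere
mean_ms, std_ms = benchmark(lambda: sum(range(10_000)), warmup=2, iters=20)
print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms per call")
```

For the EMD kernel, pass `lambda: earth_mover_distance(xyz1, xyz2, transpose=False)` as `fn` and `torch.cuda.synchronize` as `sync`.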
EMD-Pytorch/
├── emd/                          # Main package
│   ├── __init__.py               # Package initialization
│   ├── emd.py                    # Python wrapper with robust loading
│   └── cuda/                     # CUDA implementation
│       ├── emd.cpp               # PyTorch C++ interface
│       └── emd_kernel.cu         # CUDA kernel implementation
├── tests/                        # Test suite
│   ├── __init__.py               # Test package initialization
│   ├── simple_test.py            # Basic functionality test
│   ├── loss_test.py              # Loss function test
│   ├── check_compatibility.py    # Platform compatibility checker
│   └── final_validation.py       # Complete validation suite
├── .github/workflows/            # CI/CD configuration
├── setup.py                      # Build configuration
├── pyproject.toml                # Project metadata
├── build.sh                      # Linux/macOS build script
├── MANIFEST.in                   # Package inclusion rules
├── CONTRIBUTING.md               # Contribution guidelines
├── LICENSE                       # MIT license
└── README.md                     # This file
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
git clone https://github.com/hieulhaiwork/EMD-Pytorch.git
cd EMD-Pytorch
# Install in development mode
pip install -e .
# Run tests
python tests/final_validation.py
This project is licensed under the MIT License. See the LICENSE file for details.
This project is based on the excellent work from daerduoCarey/PyTorchEMD. We extend our sincere gratitude to the original authors:
- Haoqiang Fan: Original CUDA implementation
- Kaichun Mo: PyTorch wrapper
- Jiayuan Gu: Additional contributions and improvements
- daerduoCarey: Project organization and improvements
This updated version includes enhanced cross-platform compatibility, improved error handling, and modern Python packaging standards.
If you use this code in your research, please consider citing:
@misc{emd-pytorch,
  title={Earth Mover Distance CUDA Extension for PyTorch},
  author={Fan, Haoqiang and Mo, Kaichun and Gu, Jiayuan},
  year={2025},
  url={https://github.com/hieulhaiwork/EMD-Pytorch}
}