COMP9517 Computer Vision 25T2 Group Project

The goal of this group project is to develop and compare different deep learning computer vision methods for the semantic segmentation of dead trees in aerial images. The dataset used for this project contains RGB and NRG (NIR-RG) images.

This project is implemented with PyTorch and PyTorch Lightning, and it covers over 60 deep learning methods for semantic segmentation built on Segmentation Models PyTorch (SMP). We also try zero-shot segmentation and fine-tuning with SAM2.

The software demonstration is powered by Gradio.

Table of contents:

  0. Project Zotero Library
  1. Project Files Structure
  2. Setup Environment Instructions
  3. Project Methods
  4. Training & Testing
  5. Experimental Results
  6. Gradio Demo
  7. Future Work

0. Project Zotero Library

This project has a Zotero library that contains references to papers, articles, and other resources relevant to the project. You can access the Zotero library at the following link:

https://www.zotero.org/groups/6056458/cvers

1. Project Files Structure

This project is organized into several directories and files, each serving a specific purpose. Below is an overview of the project structure:

CV9517_Group-Project/
├── assets/                      # Directory for assets (e.g., images, icons)
├── checkpoints/                 # Directory for model checkpoints (Local storage)
├── data/                        # Directory for data processing scripts
│   ├── __init__.py              # Init file for data processing module
│   ├── datamodule.py            # Script for lightning datamodule (Supports merged, RGB, and NRG modalities)
│   ├── dataset.py               # Script for dataset class
│   ├── transforms.py            # Script for data transformations
│   └── utils.py                 # Utility functions for data processing
├── datasplits                   # Directory for data splits csv files
│   └── data_split_42_70_10.csv  # seed 42, train 70%, val 10%, test 20%
├── gradio/                      # Directory for Gradio app script
│   └── app.py                   # Script for Gradio app (RGB modality only for now)
├── lighting_modules             # Directory for lightning modules
│   ├── __init__.py              # Init file for lightning modules
│   ├── sam2_module              # Script for SAM2 lightning module (Future work)
│   ├── segmentation_module      # Base lightning module for this dead tree segmentation project
│   ├── smp_module               # Script for Segmentation Models PyTorch lightning module
│   └── u2net_module             # Script for U2Net lightning module
├── logs/                        # Directory for logs (Local storage, e.g., TensorBoard logs)
├── models/                      # Directory for model scripts
│   ├── __init__.py              # Init file for models module
│   ├── smp_models_util.py       # Utility functions for models of Segmentation Models PyTorch
│   └── u2net.py                 # Script for U2Net model for semantic segmentation
├── notebooks/                   # Directory for Jupyter notebooks (It is for simple demonstration and testing)
│   └─ segmentation_models.ipynb # Notebook for testing Segmentation Models PyTorch
├── outputs/                     # Directory for output files (e.g., model predictions, evaluation logs)
├── sam2/                        # Directory for SAM2 (You should clone files from the SAM2 official GitHub repository)
├── scripts/                     # Directory for scripts
│   ├── sam2_fine_tune.py        # Script for fine-tuning SAM2 model
│   ├── sam2_ft_lightning.py     # Script for fine-tuning SAM2 model using PyTorch Lightning (Future work)
│   ├── sam2_inference_vis.py    # Script for inference using SAM2 model (Visualize the results)
│   ├── sam2_zero_shot.py        # Script for test zero-shot segmentation using SAM2
│   ├── test_sam2_ft.py          # Script for testing fine-tuned SAM2 model
│   ├── test_smp.py              # Script for testing Segmentation Models PyTorch model
│   ├── train_smp.py             # Script for training Segmentation Models PyTorch model (Over 60 models, with different architectures & feature extractors, on 3 modalities)
│   └── train_u2net.py           # Script for training U2Net model (Future work)
├── utils/                       # Directory for utility scripts
│   ├── __init__.py              # Init file for utils module
│   ├── callbacks.py             # Script for abstracting lightning trainer callbacks
│   ├── logger.py                # Script for abstracting logger construction for each kind of activity
│   └── paths.py                 # Script for paths management
├── .gitignore                   # Git ignore file to exclude unnecessary files from version control
├── environment.yaml             # Conda environment file for dependencies
├── README.md                    # Project overview and instructions
└── requirements.txt             # Pip requirements file

2. Setup Environment Instructions

2.1 Clone the repository

git clone https://github.com/ParzHe/CV9517_Group-Project.git
cd CV9517_Group-Project

2.2 Set up environment

We suggest using Linux to try this project. If you are using Windows, we suggest using WSL (Windows Subsystem for Linux).

If you do not have conda installed, you can install it from the Anaconda website.

2.2.1 Setup Environment with Conda

  • Create a CVers conda environment using the provided environment.yaml file:

    # in the root directory of the project
    conda env create -f ./environment.yaml
  • Activate the CVers environment:

    conda activate CVers

2.2.2 Setup for SAM2 (Optional)

  • Install sam2:

    git clone https://github.com/facebookresearch/sam2.git && cd sam2
    
    pip install -e .
    
  • Download the SAM2 model weights with the following commands (run from inside the sam2/ directory you just cloned):

    cd checkpoints
    bash download_ckpts.sh
  • Then go back to the root directory of the project:

    cd ../..

2.3 Activate the environment

Our environment can be activated using the following command:

conda activate CVers # For the PyTorch Lightning environment.

3. Project Methods

3.1 SMP (Segmentation Models PyTorch)

Person in Charge: Lintao He 何林涛

Participants (listed in alphabetical order by name): Bowei Cheng 程柏威, Chencan Que 阙晨灿, Zhen Yang 杨震, Zitong Wei 魏子童

You can find the papers by directly clicking the architecture and feature extractor names below.

Applied Architectures

  • Unet
  • Unet++: Does not support Mix Vision Transformer (MixViT) as the feature extractor (encoder).
  • Linknet: Does not support Mix Vision Transformer (MixViT) as the feature extractor (encoder).
  • FPN
  • PSPNet
  • PAN: Does not support DenseNet as the feature extractor (encoder).
  • DeepLabV3: Does not support DenseNet as the feature extractor (encoder).
  • DeepLabV3+: Does not support DenseNet as the feature extractor (encoder).
  • UperNet
  • SegFormer

Applied Feature Extractors (Encoders)

The following feature extractors (encoders) were selected because they have comparable numbers of parameters (a minimal code sketch follows the list):

  • ResNet50
  • ResNeXt50_32x4d
  • SE-ResNet50
  • SE-ResNeXt50_32x4d
  • DenseNet-161
  • EfficientNet-b5
  • MixViT-b2 (Mix Vision Transformer)
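Below is a minimal, hedged sketch (not the project's exact code, which lives in scripts/train_smp.py) of how one of the architecture/encoder combinations above can be built with Segmentation Models PyTorch. The arch and encoder_name strings follow SMP's naming conventions.

import segmentation_models_pytorch as smp
import torch

# Binary dead-tree segmentation: 1 output channel, 3-channel (RGB) input.
model = smp.create_model(
    arch="unet",                 # e.g. "unetplusplus", "fpn", "segformer", ...
    encoder_name="densenet161",  # e.g. "efficientnet-b5", "mit_b2", "se_resnet50", ...
    encoder_weights="imagenet",  # ImageNet pre-trained encoder weights
    in_channels=3,               # a merged RGB+NIR input would use 4 channels
    classes=1,
)

# Forward pass on a dummy batch to check the output shape.
x = torch.randn(2, 3, 256, 256)
with torch.no_grad():
    logits = model(x)            # shape (2, 1, 256, 256)
print(logits.shape)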

3.2 SAM2 (Zero-Shot Segmentation & Fine-Tuning)

Person in Charge: Zhen Yang 杨震

Participant: Lintao He 何林涛

Refer to:

  1. The GitHub repository
  2. Enabling Meta’s SAM 2 model for Geospatial AI on satellite imagery
  3. arXiv paper: "Customized SAM 2 for Referring Remote Sensing Image Segmentation"
  4. arXiv paper: "Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery"

Dev Environment: CVers or other environments as needed

Note

You need to manually clone the SAM2 repository and download the model weights as described in Subsection 2.2.2.

4. Training & Testing

4.1 Dataset & Pre-trained Weights

The code automatically downloads the dataset from Kaggle using the kagglehub package. If you want to delete the dataset, it is cached under the ~/.cache/kagglehub/datasets directory. The data split CSV files are stored in the datasplits directory; you can set different seeds and split ratios by modifying the parameters of AerialDeadTreeSegDataModule in the training and test scripts.
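A hedged sketch of how the data module might be configured in a training script is shown below. The class name AerialDeadTreeSegDataModule comes from this repository, but the exact constructor parameter names (seed, split ratios, modality, batch size) are assumptions for illustration only.

from data.datamodule import AerialDeadTreeSegDataModule

datamodule = AerialDeadTreeSegDataModule(
    seed=42,           # controls the random split; matches data_split_42_70_10.csv
    train_ratio=0.70,  # hypothetical parameter name
    val_ratio=0.10,    # hypothetical parameter name (test is the remaining 20%)
    modality="rgb",    # "merged", "rgb", or "nrg"
    batch_size=32,
)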

For Segmentation Models PyTorch, the pre-trained encoder weights are downloaded automatically. If you want to delete them, they are cached in the ~/.cache/huggingface/hub directory.

For SAM2, the pre-trained weights must be downloaded manually as described in Subsection 2.2.2.

4.2 Training

4.2.1 SMP (Segmentation Models PyTorch)

To train the Segmentation Models PyTorch models, you can use the following commands in the root directory of the project:

conda activate CVers  # Activate the CVers environment
python scripts/train_smp.py

This will train all the architectures and feature extractors specified in the arch_list and encoder_only variables in the scripts/train_smp.py file. The training will be performed on the merged, RGB and NRG modalities by default.

Tip

You can modify the BATCH_SIZE and ACCUMULATE_GRAD_BATCHES variables in the scripts/train_smp.py file to adjust the batch size and gradient accumulation. The default values are BATCH_SIZE = 32 and ACCUMULATE_GRAD_BATCHES = 1. If you encounter out-of-memory (OOM) errors, you can try reducing the batch size and increasing the gradient accumulation.
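As a minimal sketch of the OOM workaround described in the tip above, halving the batch size and doubling gradient accumulation keeps the effective batch size unchanged. The variable names mirror those in scripts/train_smp.py; the Trainer wiring here is illustrative.

import lightning as L

BATCH_SIZE = 16                # reduced from the default 32
ACCUMULATE_GRAD_BATCHES = 2    # increased from the default 1 (effective batch size stays 32)

trainer = L.Trainer(
    accelerator="auto",
    accumulate_grad_batches=ACCUMULATE_GRAD_BATCHES,
)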

4.2.1.1 Train on a specific architecture or feature extractor:

If you want to train a specific architecture or feature extractor, you can modify the arch_list and encoder_only variables in the scripts/train_smp.py file. For example, to train the Unet architecture with DenseNet161 as the feature extractor, you can set:

arch_list = ['Unet']
encoder_only = ['densenet161']

4.2.1.2 Train on a specific modality:

If you want to train on a specific modality, you can modify the modality_list variable in the scripts/train_smp.py file. For example, to train only on the RGB modality, you can set:

modality_list = ['rgb']

4.2.1.3 Loss Function:

The default loss function is a combination of JaccardLoss and FocalLoss. You can modify the loss function in the scripts/train_smp.py file by changing the LOSS1 and LOSS2 variables. For example, to use only JaccardLoss, you can set:

LOSS1 = 'JaccardLoss'
LOSS2 = None
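For reference, a hedged sketch of how the two default losses can be combined with segmentation_models_pytorch is shown below; the actual weighting logic lives in scripts/train_smp.py and may differ.

import torch
import segmentation_models_pytorch as smp

jaccard = smp.losses.JaccardLoss(mode="binary")
focal = smp.losses.FocalLoss(mode="binary")

def combined_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Equal weighting is an assumption; the project may weight the two terms differently.
    return jaccard(logits, target) + focal(logits, target)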

4.2.1.4 Early Stopping:

The training script will automatically stop if the validation loss does not improve for 30 epochs. This helps avoid overfitting and saves training time. You can modify the EARLY_STOP_PATIENCE variable in the scripts/train_smp.py file to change the patience value.
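A minimal sketch of this behaviour using Lightning's EarlyStopping callback is shown below; the monitored metric name "val_loss" is an assumption, and the project configures the callback in scripts/train_smp.py.

from lightning.pytorch.callbacks import EarlyStopping

EARLY_STOP_PATIENCE = 30

early_stopping = EarlyStopping(
    monitor="val_loss",            # assumed name of the logged validation loss
    patience=EARLY_STOP_PATIENCE,  # stop after 30 epochs without improvement
    mode="min",
)
# The callback is then passed to the Trainer, e.g. L.Trainer(callbacks=[early_stopping]).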

4.2.1.5 Checkpoints Saving:

The training script will automatically save the best 2 model checkpoints in the checkpoints/ directory, with three modality folders (merged, RGB, NRG) per model. The checkpoints are saved with the following naming format (a minimal ModelCheckpoint sketch follows the placeholder list below):

smp_{encoder}_{arch}/{modality}_{target size}_{version suffix}/{epoch}-{val per_image_mIou}.ckpt

Where:

  • {encoder}: The feature extractor (encoder) name.
  • {arch}: The architecture name.
  • {modality}: The modality name (merged, rgb, nrg).
  • {target size}: The target size of the input images.
  • {version suffix}: The version suffix, can be changed as needed.
  • {epoch}: The epoch number of the saved model.
  • {val per_image_mIou}: The validation image-wise mean Intersection over Union (mIoU) value.
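A hedged sketch of saving the best two checkpoints with Lightning's ModelCheckpoint callback is shown below. The dirpath and the monitored metric name "val_per_image_mIoU" are assumptions; the project's exact template may differ.

from lightning.pytorch.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath="checkpoints/smp_densenet161_Unet/rgb_256_v1",  # illustrative path only
    filename="{epoch}-{val_per_image_mIoU:.4f}",            # fills in epoch and metric value
    monitor="val_per_image_mIoU",                           # assumed metric name; higher is better
    mode="max",
    save_top_k=2,                                           # keep the best 2 checkpoints
)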

Note

  1. The training script will automatically search for a suggested learning rate using the learning-rate finder (find_lr) from the Lightning library (a minimal sketch follows this note).
  2. The training script will automatically log the training and validation metrics to the logs/ directory, which can be viewed with TensorBoard. A logging summary is also written to the corresponding checkpoints/ directory.
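A minimal sketch of Lightning's learning-rate finder is shown below; the project wraps this in scripts/train_smp.py, so the exact call may differ. Here `model` and `datamodule` are placeholders for the project's LightningModule and DataModule.

import lightning as L
from lightning.pytorch.tuner import Tuner

trainer = L.Trainer(max_epochs=1)   # illustrative trainer configuration
tuner = Tuner(trainer)

# `model` and `datamodule` stand in for the project's LightningModule and DataModule.
lr_finder = tuner.lr_find(model, datamodule=datamodule)
print(lr_finder.suggestion())       # suggested learning rate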

4.2.2 SAM2 (Fine-Tuning)

To fine-tune the SAM2 model, you can use the following command in the root directory of the project:

conda activate CVers  # Activate the CVers environment
python scripts/sam2_fine_tune.py

4.3 Testing

4.3.1 SMP (Segmentation Models PyTorch)

To test the Segmentation Models PyTorch, you can use the following command:

conda activate CVers  # Activate the CVers environment
python scripts/test_smp.py

This script will automatically load the best model checkpoints from the checkpoints/ directory and perform inference on the test dataset split with the specified split and modality. The results are written to a CSV file in the outputs/smp_test_results directory.

4.3.2 SAM2 (Zero-Shot Segmentation & Fine-Tuning)

4.3.2.1 Zero-Shot Segmentation

For zero-shot segmentation using SAM2, you can use the following command:

conda activate CVers  # Activate the CVers environment
python scripts/sam2_zero_shot.py

This script will perform zero-shot segmentation using the SAM2 model on the test dataset split. The results will be saved in the outputs/sam2_zs_inference directory.
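For orientation, a hedged sketch of prompting SAM2 for zero-shot segmentation on a single aerial image is shown below. The project's scripts/sam2_zero_shot.py may prompt differently; the config and checkpoint names are examples from the official SAM2 repository and should correspond to what download_ckpts.sh fetched in Subsection 2.2.2, and the image path and point prompt are placeholders.

import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Example config/checkpoint names; adjust to the variant you downloaded.
model = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "sam2/checkpoints/sam2.1_hiera_large.pt",
    device="cpu",  # or "cuda"
)
predictor = SAM2ImagePredictor(model)

image = np.array(Image.open("example_rgb_tile.png").convert("RGB"))  # placeholder path
predictor.set_image(image)

# A single foreground point prompt (label 1) on a suspected dead-tree crown.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[128, 128]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
print(masks.shape, scores)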

4.3.2.2 Fine-Tuning Test

To test the fine-tuned SAM2 model, you can use the following command:

conda activate CVers  # Activate the CVers environment
python scripts/test_sam2_ft.py

This script will load the fine-tuned SAM2 model and perform inference on the test dataset split. The results will be saved in the outputs/sam2_ft_inference directory.

5. Experimental Results

5.1 Metrics Description

  • Per Image IoU: IoU computed for each image separately and then averaged (see the sketch after this list)
  • Dataset IoU: IoU computed over the whole dataset at once
  • F1 Score: Harmonic mean of precision and recall
  • F2 Score: Weighted harmonic mean of precision and recall, with more emphasis on recall
  • Accuracy: Ratio of correct pixel predictions to the total number of predictions
  • Precision: Ratio of true positive predictions to all predicted positives
  • Recall: Ratio of true positive predictions to all actual positives
  • Sensitivity: True positive rate, same as recall
  • Specificity: True negative rate, ratio of true negative predictions to all actual negatives
  • Test Time (Seconds): Time taken to perform inference on the test dataset
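A minimal sketch of the difference between Per Image IoU and Dataset IoU, using the functional metrics from segmentation_models_pytorch, is shown below; the project's evaluation code may differ.

import torch
import segmentation_models_pytorch as smp

preds = torch.rand(4, 1, 256, 256)               # predicted probabilities for 4 images
targets = torch.randint(0, 2, (4, 1, 256, 256))  # ground-truth binary masks

tp, fp, fn, tn = smp.metrics.get_stats(preds, targets, mode="binary", threshold=0.5)

per_image_iou = smp.metrics.iou_score(tp, fp, fn, tn, reduction="micro-imagewise")  # average of per-image IoUs
dataset_iou = smp.metrics.iou_score(tp, fp, fn, tn, reduction="micro")              # IoU over all pixels at once
print(per_image_iou, dataset_iou)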

5.2 Modality Mean Metrics (SMP)

Metric          RGB-NIR   NIR-RG   RGB
Per Image IoU   0.4481    0.4290   0.4334
Dataset IoU     0.4605    0.4379   0.4472
F1 Score        0.6020    0.5847   0.5847
F2 Score        0.6089    0.5938   0.5898
Accuracy        0.9839    0.9830   0.9836
Precision       0.6342    0.6139   0.6239
Recall          0.6235    0.6107   0.6038
Sensitivity     0.6235    0.6107   0.6038
Specificity     0.9922    0.9918   0.9924

The results are the means over all architectures and feature extractors tested on the RGB-NIR (merged), NIR-RG, and RGB modalities.

5.3 Results Summary of Different Architectures

Architecture     Per Image IoU (mean / max)   Dataset IoU (mean / max)   Test Time in s (mean / max)
DeepLabV3        0.4216 / 0.4505              0.4358 / 0.4602            3.0977 / 5.7786
DeepLabV3Plus    0.4199 / 0.4528              0.4324 / 0.4655            2.5329 / 3.7419
FPN              0.4469 / 0.4759              0.4562 / 0.4911            2.7698 / 3.8235
Linknet          0.4272 / 0.4719              0.4400 / 0.4861            2.8194 / 4.2640
PAN              0.4427 / 0.4624              0.4538 / 0.4762            4.9664 / 37.4820
PSPNet           0.4202 / 0.4410              0.4369 / 0.4580            1.8923 / 3.0223
Segformer        0.4409 / 0.4630              0.4525 / 0.4802            2.6860 / 4.1915
UPerNet          0.4488 / 0.4742              0.4577 / 0.4812            2.8821 / 4.2409
Unet             0.4400 / 0.4810              0.4503 / 0.4930            2.8149 / 3.8503
Unet++           0.4497 / 0.4807              0.4587 / 0.4970            3.5455 / 5.4500
Unet with scse   0.4441 / 0.4790              0.4567 / 0.4959            3.1273 / 4.1764

The results are the means over all feature extractors and all three modalities for each architecture. The Unet with scse architecture is a modified version of the Unet architecture with Squeeze-and-Excitation (SE) blocks in the decoder.

5.4 Results Summary of Different Feature Extractors (Encoders/Backbones)

Backbone             Per Image IoU (mean / max)   Dataset IoU (mean / max)   Test Time in s (mean / max)
Densenet-161         0.4292 / 0.4546              0.4391 / 0.4745            3.7932 / 5.4500
EfficientNet-b5      0.4521 / 0.4810              0.4658 / 0.4970            4.8207 / 37.4820
MixViT-b2            0.4354 / 0.4759              0.4442 / 0.4911            3.1613 / 5.7786
Resnet50             0.4243 / 0.4580              0.4366 / 0.4762            2.1381 / 3.1704
ResneXt50_32x4d      0.4320 / 0.4696              0.4427 / 0.4794            2.3084 / 9.5732
SE-Resnet50          0.4397 / 0.4781              0.4532 / 0.4835            2.4685 / 3.2726
SE-ResneXt50_32x4d   0.4428 / 0.4686              0.4549 / 0.4812            2.4627 / 3.2390

The results are the means over all architectures and all three modalities for each feature extractor. MixViT-b2 is a smaller-sized variant of the Mix Vision Transformer (MixViT).

6. Gradio Demo

To run the Gradio demo, you can use the following command:

conda activate CVers  # Activate the CVers environment
python gradio/app.py

Then, you can open your web browser and go to http://localhost:7860 to see the demo. The demo allows you to upload an image and perform inference with the Segmentation Models PyTorch (SMP) models (SAM2 support is planned as future work). The results are displayed on the web page as shown below; a simplified sketch of such an app follows the notes.

Gradio Demo

Note

  1. The Gradio demo only supports local checkpoints, so you need to run the scripts/train_smp.py script to train the models and save their checkpoints in the checkpoints/ directory before running the Gradio demo.
  2. The Gradio demo currently only supports the RGB modality, so you need to use RGB images from the dataset. The NRG and merged modalities will be supported in future work.
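For orientation, a hedged and heavily simplified sketch of what gradio/app.py does is shown below: load a trained SMP model and expose an image-to-mask interface. The model construction, checkpoint loading, and pre/post-processing here are illustrative, not the project's exact code.

import gradio as gr
import numpy as np
import torch
import segmentation_models_pytorch as smp

model = smp.create_model(arch="unet", encoder_name="densenet161", classes=1)
# A local checkpoint from checkpoints/ would be loaded here, e.g. via the project's LightningModule.
model.eval()

def segment(image: np.ndarray) -> np.ndarray:
    x = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x = torch.nn.functional.interpolate(x, size=(256, 256), mode="bilinear")  # encoder-friendly size
    with torch.no_grad():
        prob = torch.sigmoid(model(x))[0, 0]
    return (prob > 0.5).numpy().astype(np.uint8) * 255  # binary dead-tree mask

demo = gr.Interface(fn=segment, inputs=gr.Image(type="numpy"), outputs=gr.Image())
demo.launch()  # serves at http://localhost:7860 by default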

7. Future Work

We plan to implement the following features in the future:

  • More Models: We will add more models to the project, including U2Net and other segmentation models, to help us compare the performance of different models on the same dataset.
  • SAM2 zero-shot with prompts: We will explore prompting techniques to improve the zero-shot segmentation capabilities of the SAM2 model on this task.
  • Different Loss Functions: We will experiment with different loss functions to improve the performance of the models, including trying out different combinations of loss functions and hyperparameters.
