Image segmentation is a computer vision and image processing technique that involves grouping or labeling similar regions or segments in an image at the pixel level. Each segment of pixels is represented by a class label or a mask.
In image segmentation, an image consists of two main components:
- Things: Countable objects in an image (e.g., people, flowers, birds, animals).
- Stuff: Amorphous regions (or repeating patterns) of similar material, which are uncountable (e.g., road, sky, grass).
Semantic segmentation assigns a class label to every pixel in an image. It classifies regions as belonging to a particular category, such as a car, tree, or road. However, it does not differentiate between multiple instances of the same object. For example, if an image contains two cars, semantic segmentation will classify both as "car" but will not distinguish between them.
Commonly used architectures for semantic segmentation include:
- SegNet
- U-Net
- DeconvNet
- Fully Convolutional Networks (FCNs)
Instance segmentation extends semantic segmentation by distinguishing between different instances of the same class. It assigns a unique mask or bounding box to each object in an image. This is useful for tasks where object counting or differentiation is required, such as detecting multiple cars or people in an image.
Panoptic segmentation combines the best aspects of both semantic and instance segmentation. Each pixel in an image is assigned both a semantic label (class) and a unique instance identifier. This approach enables a more comprehensive understanding of the scene by distinguishing between different objects while also classifying background regions.
U-Net is a widely used deep learning architecture for semantic segmentation. It follows a U-shaped design with an encoder-decoder structure:
- Encoder (Contracting Path): Uses convolutional layers to capture spatial features while downsampling the image.
- Decoder (Expanding Path): Uses upsampling layers to reconstruct the segmented image while preserving spatial details.
- Skip Connections: Connect corresponding layers in the encoder and decoder to retain high-resolution information.
U-Net is commonly used in medical image segmentation, satellite image processing, and other pixel-wise classification tasks.
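As a concrete reference, below is a minimal PyTorch sketch of this encoder-decoder structure. It is illustrative only, not the exact model defined in this repository's scr/model.py: the channel counts are arbitrary and only two resolution levels are shown, whereas a typical U-Net uses four or five.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with padding=1, so spatial size is preserved
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    # Illustrative two-level U-Net (real models usually have 4-5 levels)
    def __init__(self, in_ch=3, num_classes=1):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)
        self.enc2 = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec2 = double_conv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = double_conv(128, 64)
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                     # encoder features, saved for the skip
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections: upsample, then concatenate encoder features
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                  # raw logits; 1 channel for binary masks
```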
The choice of loss function significantly impacts the performance of a segmentation model. Some commonly used loss functions are listed below; the Dice and Soft IoU variants are sketched in code after the list.
- Cross-Entropy Loss: Measures the difference between predicted and ground-truth probability distributions.
- Intersection over Union (IoU) Loss: Measures the overlap between the predicted mask and ground truth. IoU loss penalizes cases where either precision or recall is low.
- Dice Loss: Computes the similarity between the predicted and actual segmentation masks. It is particularly useful for imbalanced datasets.
- Tversky Loss: A variant of Dice loss that allows adjusting the balance between false positives and false negatives, making it suitable for highly imbalanced datasets.
- Focal Loss: Focuses on hard-to-classify examples by down-weighting easy samples, improving performance on challenging datasets.
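Here is one way the soft Dice and soft IoU (Jaccard) losses can be written for binary masks in PyTorch. This is a sketch, not the repository's actual code; the smoothing constant `eps` and the assumption that the model outputs raw logits (passed through a sigmoid) are choices made here for illustration.

```python
import torch

def dice_loss(logits, targets, eps=1e-6):
    # Soft Dice: 1 - 2|A∩B| / (|A| + |B|), computed on sigmoid probabilities
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def soft_iou_loss(logits, targets, eps=1e-6):
    # Soft Jaccard: 1 - |A∩B| / |A∪B|, a differentiable surrogate for IoU
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    union = (probs + targets - probs * targets).sum(dim=(1, 2, 3))
    return (1 - (inter + eps) / (union + eps)).mean()
```

Both expect tensors of shape (N, 1, H, W) with targets in {0, 1}; the soft union in the IoU loss uses the identity |A∪B| = |A| + |B| - |A∩B|.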
To assess model performance, various evaluation metrics are used (a short code sketch follows the list):
- Pixel Accuracy: Measures the percentage of correctly classified pixels.
- Mean Intersection over Union (mIoU): Measures the overlap between the predicted and ground-truth segmentation masks, averaged over all classes.
- Precision, Recall, and F1 Score: Measures model performance in detecting true positives and avoiding false positives/negatives.
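For binary segmentation, pixel accuracy and IoU can be computed directly from thresholded predictions. A minimal sketch (the 0.5 threshold is an assumed default, not taken from this repository):

```python
import torch

def pixel_accuracy(logits, targets, thresh=0.5):
    # Fraction of pixels whose thresholded prediction matches the mask
    preds = (torch.sigmoid(logits) > thresh).float()
    return (preds == targets).float().mean().item()

def iou_score(logits, targets, thresh=0.5, eps=1e-6):
    # IoU on hard (thresholded) masks, unlike the soft loss used for training
    preds = (torch.sigmoid(logits) > thresh).float()
    inter = (preds * targets).sum()
    union = ((preds + targets) > 0).float().sum()
    return ((inter + eps) / (union + eps)).item()
```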
This repository aims to build UNet from scratch for binary semantic segmentation.
The original UNet paper used unpadded ("valid") convolutions, so every convolution lost border pixels and the output was smaller than the input (e.g., a 572×572 input produced a 388×388 output). Feature maps from the contracting path therefore had to be cropped before being concatenated with those in the expanding path. In this implementation, padding is used in the contracting path, ensuring that the output feature map has the same size as the input image.
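The size difference is easy to verify in PyTorch; the snippet below contrasts an unpadded ("valid") 3×3 convolution with the padded ("same") variant used here (channel counts are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 572, 572)

# Original paper: unpadded 3x3 convolutions lose a 1-pixel border each time
valid = nn.Conv2d(3, 64, kernel_size=3, padding=0)
print(valid(x).shape)   # torch.Size([1, 64, 570, 570]) -- shrinks

# This implementation: padding=1 keeps the feature map at the input size
same = nn.Conv2d(3, 64, kernel_size=3, padding=1)
print(same(x).shape)    # torch.Size([1, 64, 572, 572]) -- size preserved
```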
The dataset used is a Kaggle person segmentation dataset.
- Two UNet models were trained with different loss functions: one with Dice Loss and the other with Soft IoU (Jaccard) Loss.
- Using Soft IoU as the loss function aligned the training objective more closely with the metric we actually evaluate (IoU).
- Model checkpoints were saved at regular intervals to preserve training progress (see the sketch after this list).
- The UNet model trained with the Soft IoU loss significantly outperformed the Dice loss model, as seen in the inference results.
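Checkpointing can be done with plain torch.save/torch.load. A minimal sketch; the dictionary keys and helper names are illustrative, not the repository's actual API, though the checkpoint path matches the checkpoints/ directory in the project structure below:

```python
import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoints/checkpoint1.pth"):
    # Store everything needed to resume training, not just the weights
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoints/checkpoint1.pth"):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"]  # resume from the next epoch
```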
Learning plot for the UNet model trained with Dice Loss

```
UNET_SEGMENTATION/
│
├── checkpoints/                  # Directory for storing model checkpoints
│   └── checkpoint1.pth           # Model checkpoint file
│
├── Data/                         # Directory for dataset
│   ├── test/                     # Test dataset directory
│   └── train/                    # Train dataset directory
│
├── input_images/                 # Directory for input images
│
├── output_images/                # Directory for output images
│
├── trained_model/                # Directory for saved model
│   └── unet_segmentation.pth     # Saved model file
│
├── scr/                          # Directory for project scripts
│   ├── load_data.py              # Dataset loading script
│   ├── model.py                  # UNet model definition
│   ├── train.py                  # Training script
│   ├── utils.py                  # Utility functions
│   └── inference.py              # Inference script
│
├── config.json                   # JSON configuration file
│
└── README.md                     # Project README file
```



