# PortionNet: Distilling 3D Geometric Knowledge for Food Nutrition Estimation

Official PyTorch implementation of PortionNet, a novel cross-modal knowledge distillation framework for accurate food nutrition estimation from single RGB images.

**Darrin Bright, Rakshith Raj, Kanchan Keisham**
Vellore Institute of Technology

Accepted at CVIS 2025 (11th Annual Conference on Vision and Intelligent Systems)
arXiv preprint arXiv:2512.22304, 2025
## Overview

PortionNet addresses the challenge of accurate food nutrition estimation from single RGB images by learning 3D geometric features from point clouds during training, while requiring only RGB images at inference.

*Figure 1: Overview of the PortionNet framework. During training, the model uses both RGB images and point clouds with cross-modal knowledge distillation. At inference, only RGB images are required.*
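The training/inference asymmetry described above can be illustrated with a minimal sketch: the point-cloud branch runs only when a point cloud is supplied, so the same forward pass covers both regimes. The stand-in mean-pooling "encoders" and all names here are hypothetical simplifications, not the repository's actual modules.

```python
def encode_rgb(image):
    """Stand-in RGB encoder (the real model uses a learned image backbone)."""
    return [sum(image) / len(image)]

def encode_points(points):
    """Stand-in point-cloud encoder (the real model uses a 3D network)."""
    return [sum(points) / len(points)]

def forward(rgb, point_cloud=None):
    """Training passes both modalities so RGB features can be distilled
    toward point-cloud features; inference passes only the RGB image."""
    rgb_feat = encode_rgb(rgb)
    if point_cloud is None:
        # Inference: the RGB branch alone produces the prediction.
        return {"pred": rgb_feat}
    # Training: also expose both feature sets for the distillation loss.
    pc_feat = encode_points(point_cloud)
    return {"pred": rgb_feat, "rgb_feat": rgb_feat, "pc_feat": pc_feat}
```

At deployment time the point-cloud encoder can therefore be dropped entirely, which is what makes single-image inference possible.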
## Requirements

- Python 3.8+
- PyTorch 2.0+
- CUDA 11.0+ (for GPU training)
## Installation

```bash
git clone https://github.com/darrinbright/PortionNet.git
cd PortionNet
conda create -n portionnet python=3.8
conda activate portionnet
pip install -r requirements.txt
pip install open3d
```

## Dataset

- Download the MetaFood3D dataset from here
- Extract and organize as follows:
```
MetaFood3D/
├── RGB/
│   ├── apple/
│   │   ├── apple_1.png
│   │   └── ...
│   └── ...
├── Point_Cloud/
│   ├── apple/
│   │   ├── apple_1.ply
│   │   └── ...
│   └── ...
└── _MetaFood3D_new_complete_dataset_nutrition_v2.xlsx
```
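A quick way to confirm the layout above is to check that every RGB image has a matching point cloud. The helper below is a small sketch assuming the directory structure shown (class subfolders under `RGB/` and `Point_Cloud/`, matching file stems); it is not part of the repository.

```python
from pathlib import Path

def check_pairs(root):
    """Return names of RGB images under <root>/RGB that have no matching
    .ply file under <root>/Point_Cloud (same class folder, same stem)."""
    root = Path(root)
    missing = []
    for img in (root / "RGB").rglob("*.png"):
        # e.g. RGB/apple/apple_1.png -> Point_Cloud/apple/apple_1.ply
        ply = root / "Point_Cloud" / img.parent.name / (img.stem + ".ply")
        if not ply.exists():
            missing.append(img.name)
    return missing
```

Running `check_pairs("/path/to/MetaFood3D")` should return an empty list for a correctly organized dataset.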
Download SimpleFood45 from here and organize similarly.
## Training

```bash
python src/train.py \
    --data_dir /path/to/MetaFood3D \
    --excel_path /path/to/MetaFood3D/_MetaFood3D_new_complete_dataset_nutrition_v2.xlsx \
    --output_dir ./outputs \
    --epochs 25 \
    --batch_size 16 \
    --seed 7
```

For reproducible results as reported in the paper:
```bash
# Seed 7
python src/train.py --data_dir /path/to/MetaFood3D --excel_path /path/to/excel --seed 7 --output_dir ./outputs/seed7

# Seed 13
python src/train.py --data_dir /path/to/MetaFood3D --excel_path /path/to/excel --seed 13 --output_dir ./outputs/seed13

# Seed 2023
python src/train.py --data_dir /path/to/MetaFood3D --excel_path /path/to/excel --seed 2023 --output_dir ./outputs/seed2023
```

Key training arguments:

- `--rgb_only_ratio`: proportion of batches trained in RGB-only mode (default: 0.3)
- `--lambda_distill`: weight for the distillation loss (default: 0.5)
- `--lambda_reg`: weight for the regression loss (default: 0.1)
## Evaluation

```bash
# RGB-only mode (inference mode)
python src/evaluate.py \
    --data_dir /path/to/MetaFood3D \
    --excel_path /path/to/excel \
    --checkpoint ./outputs/best_model_seed7.pt \
    --mode rgb_only \
    --output_file results_rgb.json

# Multimodal mode (with point clouds)
python src/evaluate.py \
    --data_dir /path/to/MetaFood3D \
    --excel_path /path/to/excel \
    --checkpoint ./outputs/best_model_seed7.pt \
    --mode multimodal \
    --output_file results_multimodal.json
```

To evaluate on SimpleFood45:

```bash
python src/evaluate.py \
    --data_dir /path/to/SimpleFood45 \
    --excel_path /path/to/SimpleFood45/labels.xlsx \
    --checkpoint ./outputs/best_model_seed7.pt \
    --mode rgb_only \
    --num_classes 12 \
    --output_file results_simplefood45.json
```

## Citation

If you find this work useful, please cite our paper:
```bibtex
@article{bright2025portionnet,
  title={PortionNet: Distilling 3D Geometric Knowledge for Food Nutrition Estimation},
  author={Bright, Darrin and Raj, Rakshith and Keisham, Kanchan},
  journal={arXiv preprint arXiv:2512.22304},
  year={2025}
}
```

## Contact

- Darrin Bright: darrin.bright2022@vitstudent.ac.in
- Rakshith Raj: rakshith.raj2022@vitstudent.ac.in