Learning to Retrieve for Environmental Knowledge Discovery: An Augmentation-Adaptive Self-Supervised Learning Framework
- Python 3.8 or higher
- CUDA-compatible GPU (recommended for training)
torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
numpy>=1.24.0
pandas>=2.0.0
scikit-learn>=1.3.0
matplotlib>=3.8.0
scipy>=1.11.0
To train with default settings:

    python train.py

To specify the task and output directory:

    python train.py --task DO --save_path results_experiment1

Command-line arguments:
- --task: Task type - 'DO' for dissolved oxygen or 'Temp' for temperature (default: 'DO')
- --seed: Random seed for reproducibility (default: 21)
- --save_path: Directory to save all training outputs (default: 'top_n_100')
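The flags above suggest a small argparse-based CLI. A minimal sketch of how such a parser could be reconstructed (the function name `build_parser` is illustrative, not the repository's actual code):

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of train.py's CLI from the documented flags.
    parser = argparse.ArgumentParser(
        description="Train the retrieval-augmented DO/Temp model")
    parser.add_argument("--task", choices=["DO", "Temp"], default="DO",
                        help="'DO' for dissolved oxygen, 'Temp' for temperature")
    parser.add_argument("--seed", type=int, default=21,
                        help="Random seed for reproducibility")
    parser.add_argument("--save_path", default="top_n_100",
                        help="Directory to save all training outputs")
    return parser

# Parsing an explicit argument list (instead of sys.argv) keeps the example testable.
args = build_parser().parse_args(["--task", "Temp", "--save_path", "results_experiment1"])
print(args.task, args.seed, args.save_path)  # Temp 21 results_experiment1
```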
The model follows a 5-phase training pipeline:
Phase 1 - Contrastive encoder pretraining:
- Model: BiLSTM encoder
- Objective: Learn lake embeddings using contrastive learning
- Loss: Contrastive loss with positive, semi-positive, and negative pairs
- Output: Trained encoder + precomputed embeddings for retrieval
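The three-way pair structure above can be sketched as a margin-based contrastive loss, where positives are pulled close, semi-positives are kept within a small margin, and negatives are pushed apart (an illustrative formulation with made-up margin values; the repository's exact loss may differ):

```python
import torch
import torch.nn.functional as F

def three_way_contrastive_loss(anchor, positive, semi_positive, negative,
                               margin_semi=0.5, margin_neg=1.0):
    """Sketch of a contrastive loss over positive, semi-positive, and
    negative pairs, using cosine distance between embeddings."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_semi = 1 - F.cosine_similarity(anchor, semi_positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    loss = (d_pos                                  # pull positives close
            + F.relu(d_semi - margin_semi)         # penalize semi-positives only past a margin
            + F.relu(margin_neg - d_neg))          # push negatives beyond a larger margin
    return loss.mean()

emb = torch.randn(8, 64)
loss = three_way_contrastive_loss(emb, emb + 0.01 * torch.randn(8, 64),
                                  torch.randn(8, 64), torch.randn(8, 64))
```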
Phase 2 - Yearly model training:
- Model: LSTM
- Objective: Learn annual dissolved oxygen patterns
- Training: Two-step process:
- Pretraining with simulation data
- Fine-tuning with real observations
- Output: Trained yearly prediction model
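The two-step pretrain/fine-tune process can be sketched as the same fitting loop run first on simulated targets and then on observations. The class and function names below are illustrative stand-ins, not the repository's modules:

```python
import torch
import torch.nn as nn

class YearlyLSTM(nn.Module):
    """Minimal stand-in for the yearly prediction model."""
    def __init__(self, n_features=45, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # epilimnion and hypolimnion DO

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out)

def fit(model, x, y, epochs=3, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

model = YearlyLSTM()
x = torch.randn(4, 365, 45)                        # 4 lakes, 365 daily steps
fit(model, x, torch.randn(4, 365, 2))              # step 1: pretrain on simulated DO
final = fit(model, x, torch.randn(4, 365, 2))      # step 2: fine-tune on observations
```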
Phase 3 - Retrieval-augmented monthly training:
- Models: Encoder + Monthly decoders
- Objective: Jointly optimize encoder and monthly prediction decoders
- Features: Retrieval-augmented learning with top-N similar samples
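The retrieval step amounts to finding the top-N most similar precomputed lake embeddings. A minimal sketch using cosine similarity (the repository may use a different similarity measure or index):

```python
import numpy as np

def retrieve_top_n(query_emb, bank_embs, n=5):
    """Return indices of the n most similar embeddings in the bank,
    ranked by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    sims = b @ q                       # cosine similarity to every bank entry
    return np.argsort(-sims)[:n]       # indices of the n highest similarities

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 64))      # precomputed embeddings from Phase 1
idx = retrieve_top_n(bank[7] + 0.01 * rng.normal(size=64), bank, n=5)
# a near-duplicate of embedding 7 should retrieve index 7 first
```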
Phase 4 - Discriminator training:
- Models: Two discriminators, one each for the epilimnion and hypolimnion
- Objective: Learn to discriminate between yearly vs monthly model predictions
- Training: Binary classification on synthetic vs retrieved predictions
- Output: Trained discriminators with optimal thresholds
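Training a binary discriminator and choosing an operating threshold can be sketched as follows. This uses a logistic-regression stand-in on synthetic data; the repository's discriminator architecture and threshold criterion (here, Youden's J) are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(21)
# Synthetic stand-ins for features derived from the two prediction sources.
X = np.vstack([rng.normal(0, 1, (200, 8)),     # class 0: yearly-model predictions
               rng.normal(1, 1, (200, 8))])    # class 1: monthly-model predictions
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

# Pick the threshold maximizing Youden's J = TPR - FPR on the ROC curve.
fpr, tpr, thresholds = roc_curve(y, scores)
best_threshold = thresholds[np.argmax(tpr - fpr)]
```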
Phase 5 - Final prediction selection:
- Process: Use discriminators to select between yearly and monthly predictions
- Output: Final predictions and performance metrics
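The selection step reduces to a per-sample switch on the discriminator score. A minimal sketch (the gating direction and threshold are illustrative):

```python
import numpy as np

def select_predictions(disc_scores, yearly_pred, monthly_pred, threshold):
    """Take the monthly prediction where the discriminator score exceeds
    the threshold, otherwise fall back to the yearly prediction."""
    use_monthly = disc_scores > threshold
    return np.where(use_monthly, monthly_pred, yearly_pred)

scores = np.array([0.9, 0.2, 0.7, 0.4])
final = select_predictions(scores,
                           np.array([1., 2., 3., 4.]),     # yearly predictions
                           np.array([10., 20., 30., 40.]), # monthly predictions
                           threshold=0.5)
# → [10.  2. 30.  4.]
```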
The dataset contains 50 columns: 47 model features, plus datetime and the two observation columns (obs_epi, obs_hyp), which serve as prediction targets rather than inputs:
- datetime: Date and time information
- sat_hypo: Simulated hypolimnion DO saturation concentration
- thermocline_depth: Simulated thermocline depth
- temperature_epi: Simulated epilimnion water temperature
- temperature_hypo: Simulated hypolimnion water temperature
- volume_epi: Simulated epilimnion volume
- volume_hypo: Simulated hypolimnion volume
- wind: Derived wind speed
- airtemp: Derived air temperature
- fnep: Simulated net ecosystem production flux
- fmineral: Simulated mineralisation flux
- fsed: Simulated net sedimentation flux
- fatm: Simulated atmospheric exchange flux
- fdiff: Simulated diffusion flux
- fentr_epi: Simulated entrainment flux (epilimnion)
- fentr_hyp: Simulated entrainment flux (hypolimnion)
- eutro: Derived classification for eutrophic state
- oligo: Derived classification for oligotrophic state
- dys: Derived classification for dystrophic state
- water: Derived classification proportion for water landuse
- developed: Derived classification proportion for developed landuse
- barren: Derived classification proportion for barren landuse
- forest: Derived classification proportion for forest landuse
- shrubland: Derived classification proportion for shrubland landuse
- herbaceous: Derived classification proportion for herbaceous landuse
- cultivated: Derived classification proportion for cultivated landuse
- wetlands: Derived classification proportion for wetlands landuse
- depth: Derived maximum lake depth
- area: Derived maximum lake surface area
- elev: Derived lake elevation
- Shore_len: Shore length
- Vol_total: Lake volume
- Vol_res: Lake volume residual
- Vol_src: Lake volume supplement
- Depth_avg: Average lake depth
- Dis_avg: Average inflow discharge
- Res_time: Lake residence time
- Elevation: Alternative lake elevation
- Slope_100: Lake slope information
- Wshd_area: Watershed area
- ShortWave: Daily shortwave radiation
- LongWave: Daily longwave radiation
- RelHum: Daily relative humidity
- Rain: Daily precipitation (rain)
- Snow: Daily precipitation (snow)
- ice: Ice cover indicator
- sim_epi: Simulated epilimnion DO concentration
- sim_hyp: Simulated hypolimnion DO concentration
- obs_epi: Observed epilimnion DO concentration
- obs_hyp: Observed hypolimnion DO concentration
- Encoder (BiLSTM): Uses all 47 features including sim_epi and sim_hyp
- Yearly model (LSTM): Uses 45 features (excludes sim_epi and sim_hyp)
- Monthly decoder: Uses 26 features (subset of the core features for monthly predictions)
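The per-model feature subsets above amount to column selection on the feature table. A minimal sketch with a few illustrative columns (the actual column lists, in particular the monthly decoder's 26-feature subset, are defined in the repository code):

```python
import pandas as pd

# A tiny stand-in for the full 47-column feature table.
all_features = ["sat_hypo", "thermocline_depth", "wind", "sim_epi", "sim_hyp"]
df = pd.DataFrame([[0.1, 5.0, 3.2, 8.0, 6.5]], columns=all_features)

# Encoder: all features, including the simulated DO columns.
encoder_input = df[all_features]

# Yearly model: everything except the simulated DO columns.
yearly_cols = [c for c in all_features if c not in ("sim_epi", "sim_hyp")]
yearly_input = df[yearly_cols]
```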
For questions, issues, or collaborations related to this project, please contact:
- Shiyuan Luo - shl298@pitt.edu
- Runlong Yu - ryu5@ua.edu
