A time-series forecasting project designed to predict strawberry prices with a 2-week horizon using a simple Transformer model. The dataset includes weekly data between 2013 and 2023 with weather and pricing features.

├── data/
│   ├── dataset.py       # PyTorch Dataset and DataLoader split logic
│   └── preprocess.py    # Preprocessing, time-aware imputation logic
│
├── local_data/
│   ├── models/          # Trained model weights (e.g. model.pt)
│   ├── plots/           # Saved performance plots
│   ├── processed/       # Preprocessed parquet files
│   └── raw/             # Raw CSV data
│
├── model/
│   ├── model.py         # Transformer model definition
│   ├── simulate.py      # Evaluation script with metrics + matplotlib plots
│   └── train.py         # Training loop with wandb integration

- Safe handling of missing values
- Weekly date reconstruction
- Optional time-aware imputation of prices with Random Forest regressors
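
As a rough illustration of what "time-aware" imputation can mean, the sketch below fills each missing price with a Random Forest fitted only on rows that come strictly earlier in time. It is not the code in `data/preprocess.py`; the function name `impute_prices_past_only`, the columns `date`, `temp` and `price`, and the `min_history` threshold are all assumptions made for the example, and the feature columns are assumed to have no gaps.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def impute_prices_past_only(df, feature_cols, target_col="price", min_history=8):
    """Fill missing targets with a model trained only on earlier, observed rows."""
    df = df.sort_values("date").reset_index(drop=True).copy()
    observed = df[target_col].notna().to_numpy()  # mask of originally observed rows
    for i in np.flatnonzero(~observed):
        # Only rows before position i, and only values that were really observed.
        history = df.iloc[:i][observed[:i]]
        if len(history) < min_history:
            continue  # too little past data: leave the gap rather than guess
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(history[feature_cols], history[target_col])
        df.loc[i, target_col] = model.predict(df.loc[[i], feature_cols])[0]
    return df


# Synthetic weekly data (hypothetical columns) with three missing prices.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-07", periods=30, freq="W-MON")
temp = rng.normal(15, 5, size=30)
price = 2.0 + 0.1 * temp
price[[3, 20, 25]] = np.nan
demo = pd.DataFrame({"date": dates, "temp": temp, "price": price})

filled = impute_prices_past_only(demo, ["temp"])
print(filled["price"].isna().sum())  # the week-3 gap stays: fewer than 8 past rows
```

Refitting a model per gap is slow on large datasets, but it keeps the "only past data" guarantee easy to verify.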

python -m data.preprocess --data local_data/raw/senior_ds_test.csv --impute

- Custom `PriceForecastDataset` for rolling-window normalization
- Splits data by year into train/val/test
- Efficient `DataLoader` setup
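
The sketch below shows one way rolling-window normalization and a year-based split could fit together. It is an assumption-laden illustration, not the project's `PriceForecastDataset`: the class name `RollingWindowDataset`, the window length of 8 weeks, and the split years 2020 and 2021 are hypothetical. Each sample is scaled with statistics from its own input window, so the target week never contributes to the normalization.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class RollingWindowDataset(Dataset):
    """Windows of `seq_len` weeks, target `horizon` weeks after the last one."""

    def __init__(self, features, prices, seq_len=8, horizon=2):
        self.features = np.asarray(features, dtype=np.float32)  # (T, F)
        self.prices = np.asarray(prices, dtype=np.float32)      # (T,)
        self.seq_len, self.horizon = seq_len, horizon

    def __len__(self):
        return max(0, len(self.prices) - self.seq_len - self.horizon + 1)

    def __getitem__(self, idx):
        end = idx + self.seq_len
        window = self.features[idx:end]
        # Normalize with statistics of this window only (past data).
        x = (window - window.mean(axis=0)) / (window.std(axis=0) + 1e-6)
        past = self.prices[idx:end]
        target = self.prices[end + self.horizon - 1]
        y = (target - past.mean()) / (past.std() + 1e-6)
        return torch.from_numpy(x), torch.tensor(float(y), dtype=torch.float32)


def split_by_year(years, train_end=2020, val_end=2021):
    """Boolean masks for train/val/test; the cut-off years are assumptions."""
    years = np.asarray(years)
    return years <= train_end, (years > train_end) & (years <= val_end), years > val_end


# Synthetic stand-in for 11 years of weekly data with 3 features.
T = 52 * 11
years = 2013 + np.arange(T) // 52
rng = np.random.default_rng(0)
features = rng.normal(size=(T, 3))
prices = rng.uniform(1.0, 5.0, size=T)

train_mask, val_mask, test_mask = split_by_year(years)
train_ds = RollingWindowDataset(features[train_mask], prices[train_mask])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```

In this simplified version the windows of one split never reach back into the previous split, which costs a few samples at each boundary but keeps the splits cleanly separated.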
- Minimal Transformer encoder
- Sequence-to-one prediction
- Continuous price output
- Logs to Weights & Biases (wandb)
- Early saving of best model
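
For orientation, a minimal Transformer encoder with sequence-to-one, continuous output might look like the sketch below. This is not the contents of `model/model.py`; the class name `TinyPriceTransformer`, the learned positional embedding, and all layer sizes are assumptions chosen to keep the example small.

```python
import torch
from torch import nn


class TinyPriceTransformer(nn.Module):
    """Encoder-only model: a sequence of feature vectors in, one price out."""

    def __init__(self, n_features, d_model=32, n_heads=4, n_layers=2, max_len=64):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        # Learned positional embedding (an assumption; sinusoidal also works).
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        h = self.input_proj(x) + self.pos_emb[:, : x.size(1)]
        h = self.encoder(h)
        # Sequence-to-one: read the prediction off the last time step.
        return self.head(h[:, -1]).squeeze(-1)  # (batch,)


model = TinyPriceTransformer(n_features=3)
model.eval()
with torch.no_grad():
    out = model(torch.randn(5, 8, 3))
print(out.shape)
```

Reading the output from the last time step is one common choice; mean-pooling over the sequence is an equally simple alternative.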

python -m model.train --data local_data/processed/data_2013_2023.parquet

- MAE, RMSE, MAPE, R² metrics
- Matplotlib plots for predictions vs actuals
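
The four metrics can be computed in a few lines of NumPy. The helper below is a sketch with a hypothetical name, `regression_metrics`, and is not taken from `model/simulate.py`. MAPE is reported in percent and is undefined when an actual value is zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y_true))  # assumes no zero actuals
    ss_res = np.sum(err ** 2)                          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


metrics = regression_metrics([2.0, 4.0, 5.0], [2.5, 3.5, 5.0])
print(metrics)
```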

python -m model.simulate --model_path local_data/models/original.pt

Install everything with:

conda env create -f environment.yml

Then install your preferred PyTorch build in that environment, e.g.:

conda activate dylan-merqato
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

- All preprocessing is done without data leakage (only past data is used).
- The model is very lightweight and intended as a prototype; you can easily extend it with more features, attention masks, or richer architectures.
- Data is stored locally to prevent large files in version control.

Dylan Prins @ Neurality for Merqato · 2025