This project provides an automated pipeline for forecasting influenza hospitalizations using the TEMPO model. The pipeline downloads the latest data, runs historical model fits, generates current season forecasts, and produces visualizations.
The complete forecasting pipeline is managed through the Makefile and consists of the following stages:
Creates a Python virtual environment (.forecast/) and installs all required dependencies from requirements.txt.
make build_env

The pipeline downloads multiple data sources required for forecasting:
- Downloads public laboratory data using an R script
- Formats and processes the lab data
- Outputs: analysis_data/clinical_and_public_lab_data__formatted.csv
make download_clinical_data

- Downloads recent ILINet (Influenza-Like Illness Network) data
- Outputs: analysis_data/ili_data_all_states_2021_present.csv
make download_ili

- Downloads NHSN percent hospital reporting data
- Outputs: analysis_data/pct_hospital_reporting.csv
make download_hosp_pct_data

- Downloads weather data that may influence flu transmission
make download_weather_data

Shortcut to run all data downloads:
make run_data

Runs the TEMPO model on historical flu seasons to estimate model parameters and validate performance:
- Processes all past seasons for all states
- Combines results into parameter estimates
- Outputs: historical_model_run_for_tempo/all_past_param_estimates__tempo4.csv
make run_historical_forecasts

Generates forecasts for the current flu season:
- Uses the TEMPO model with historical parameters
- Produces individual state-level forecasts
- Combines all forecasts into a single file
- Outputs: individual forecasts in the forecasts/ directory and a combined timestamped file in time_stamped_forecasts/
make run_current_season_forecasts

Creates visualizations of state-level forecasts for all locations.
make visualize_state_level_forecasts

To run the entire pipeline from start to finish:
make forecast

This single command executes all stages in the correct order:
- Build environment
- Download all data sources
- Run historical forecasts
- Generate current season forecasts
- Create visualizations
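For debugging a single stage, the same order can be driven from Python. The following is a minimal sketch, not part of the project: it assumes `make forecast` simply chains the targets named in this README, and `run_pipeline` and its `runner` parameter are hypothetical names introduced for illustration.

```python
import subprocess

# Make targets named in this README, in the order the stages are listed.
# Assumption: `make forecast` chains exactly these targets.
PIPELINE_TARGETS = [
    "build_env",
    "run_data",
    "run_historical_forecasts",
    "run_current_season_forecasts",
    "visualize_state_level_forecasts",
]


def run_pipeline(targets=PIPELINE_TARGETS, dry_run=False, runner=subprocess.run):
    """Run each make target in order and return the commands that were run.

    With dry_run=True nothing is executed; the planned commands are returned.
    """
    commands = [["make", target] for target in targets]
    if dry_run:
        return commands
    executed = []
    for command in commands:
        # check=True raises CalledProcessError if make exits non-zero,
        # so later stages never run on top of a failed one.
        runner(command, check=True)
        executed.append(command)
    return executed
```

Calling `run_pipeline(dry_run=True)` lists the planned commands without running them; passing a shorter `targets` list reruns only the stages of interest.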
The project includes an interactive web application built with Streamlit that displays influenza hospitalization forecasts. The app allows users to select multiple locations and view forecasts with prediction intervals.
- Interactive location selection (US states and national level)
- Median forecast lines with 50% and 80% prediction intervals
- Dynamic visualization that adjusts based on number of selected locations
- Real-time data from AWS S3 storage
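To make the interval feature concrete: a central 50% prediction interval spans the 25th to 75th percentiles, and a central 80% interval spans the 10th to 90th. The sketch below computes these from a list of forecast samples. It is illustrative only; the sample-based input and the names `quantile` and `summarize_forecast` are assumptions, and the app may instead read precomputed quantiles from the forecast files.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize_forecast(samples):
    """Return the median and central 50% / 80% intervals for one forecast."""
    ordered = sorted(samples)
    return {
        "median": quantile(ordered, 0.50),
        "interval_50": (quantile(ordered, 0.25), quantile(ordered, 0.75)),
        "interval_80": (quantile(ordered, 0.10), quantile(ordered, 0.90)),
    }
```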
- Navigate to the webapp directory:
cd webapp
- Install webapp-specific requirements:
pip install -r requirements.txt
- Set up AWS credentials (required for data access). Create a .streamlit/secrets.toml file with your AWS credentials:
AWS_ACCESS_KEY_ID = "your_access_key"
AWS_SECRET_ACCESS_KEY = "your_secret_key"
- Run the Streamlit app:
streamlit run main.py
- The app will open automatically in your web browser (typically at http://localhost:8501)
If deployed, the web app can be accessed at the URL provided by your Streamlit hosting service.
- analysis_data/: Scripts and data for downloading and formatting source data
- data/: Reference data including location information and target data
- forecasts/: Individual forecast files for each location
- time_stamped_forecasts/: Combined forecast files with timestamps
- historical_model_run_for_tempo/: Historical model runs and parameter estimates
- model/tempo/: TEMPO model implementation
- webapp/: Streamlit web application for visualizing forecasts
- Makefile: Automated pipeline workflow
- requirements.txt: Python dependencies for forecasting pipeline
- Python 3.x
- R (for downloading clinical data)
- See requirements.txt for Python package dependencies
- See webapp/requirements.txt for web app dependencies
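Before running the pipeline, it can help to confirm that the tools listed above are available. This is a rough sketch under stated assumptions: it assumes the R download script is launched through the `Rscript` executable, which this README does not confirm, and `check_prerequisites` is a hypothetical helper name.

```python
import shutil
import sys


def check_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if version[0] < 3:
        problems.append("Python 3.x is required")
    # Assumption: the R download script is launched through `Rscript`.
    for tool in ("make", "Rscript"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```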
- The pipeline uses the TEMPO model (version 4) for forecasting
- Forecasts are generated for all US states and national level
- Data sources include CDC's ILINet, NHSN hospital data, and public laboratory data
- The virtual environment is created in the .forecast/ directory
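After the data download and historical stages finish, one quick sanity check is to confirm that the output files named in this README exist. The sketch below does only that, relative to the repository root; `missing_outputs` is a hypothetical helper, and the weather download is omitted because its output path is not listed here.

```python
from pathlib import Path

# Output paths listed in this README for the data and historical stages.
EXPECTED_OUTPUTS = [
    "analysis_data/clinical_and_public_lab_data__formatted.csv",
    "analysis_data/ili_data_all_states_2021_present.csv",
    "analysis_data/pct_hospital_reporting.csv",
    "historical_model_run_for_tempo/all_past_param_estimates__tempo4.csv",
]


def missing_outputs(root="."):
    """Return the expected output paths that do not exist under root."""
    base = Path(root)
    return [path for path in EXPECTED_OUTPUTS if not (base / path).is_file()]
```

An empty list means every listed file is present; it says nothing about whether the contents are current.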
Thomas McAndrew
Associate Professor
Department of Biostatistics
Lehigh University
Email: mcandrew@lehigh.edu
Lab Website: Computational Uncertainty Lab