A Python class to help download ComStock data locally for analysis. The ComStockProcessor class provides an easy interface to download metadata and time series data from the ComStock dataset hosted on AWS S3.
Install dependencies with Poetry:

```shell
pip install poetry
poetry install
```

The ComStockProcessor class is located in lib/comstock_processor.py and provides methods to download and process ComStock building data.
```python
from pathlib import Path
from lib.comstock_processor import ComStockProcessor

# Initialize the processor
processor = ComStockProcessor(
    state="CA",                            # 2-letter state abbreviation
    county_name="All",                     # County name or "All"
    building_type="All",                   # Building type or "All"
    upgrade="0",                           # Upgrade identifier (0 = baseline)
    base_dir=Path("./datasets/comstock"),  # Local directory to save data
)
```

`process_metadata` downloads and processes ComStock metadata, filtering it according to the constraints set on the class.
- Downloads the baseline metadata parquet file if not already present
- Filters by state, county, and building type as specified during initialization
- Saves filtered results as a CSV file
- Returns a pandas DataFrame with the filtered metadata
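Conceptually, the filtering step works like the sketch below. Both the function and the column names (`in.state`, `in.county_name`, `in.comstock_building_type`) are assumptions for illustration, not the actual implementation in `lib/comstock_processor.py`; check the metadata schema before relying on them:

```python
import pandas as pd

def filter_metadata(df: pd.DataFrame, state: str, county_name: str = "All",
                    building_type: str = "All") -> pd.DataFrame:
    """Apply the state/county/building-type constraints to metadata rows.

    "All" means the corresponding constraint is not applied.
    """
    mask = df["in.state"] == state
    if county_name != "All":
        mask &= df["in.county_name"] == county_name
    if building_type != "All":
        mask &= df["in.comstock_building_type"] == building_type
    return df[mask]
```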
`process_building_time_series` downloads time series data for the buildings listed in the input DataFrame using parallel execution.
- Uses multi-threading to download building time series files efficiently
- Skips downloading files that already exist locally
- Downloads from the ComStock AWS S3 bucket
- Returns paths and building IDs of downloaded files
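The parallel-download-with-caching pattern can be sketched as follows. This is not the processor's actual internals: `fetch` stands in for the real S3 download, and the `{building_id}-0.parquet` file-name pattern is a placeholder assumption:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def download_all(building_ids, save_dir: Path, fetch, max_workers=8):
    """Download one file per building in parallel, skipping existing files.

    `fetch(building_id, dest)` is a caller-supplied callable that writes a
    single building's time series file to `dest`.
    """
    save_dir.mkdir(parents=True, exist_ok=True)

    def worker(bldg_id):
        dest = save_dir / f"{bldg_id}-0.parquet"
        if dest.exists():  # smart caching: reuse the local copy
            return dest, bldg_id
        fetch(bldg_id, dest)
        return dest, bldg_id

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, building_ids))
    paths = [p for p, _ in results]
    ids = [b for _, b in results]
    return paths, ids
```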
```python
from pathlib import Path
from lib.comstock_processor import ComStockProcessor

# Set up directories
base_dir = Path("./datasets/comstock")
timeseries_dir = base_dir / "timeseries"
for d in [base_dir, timeseries_dir]:
    d.mkdir(parents=True, exist_ok=True)

# Initialize processor for California data
processor = ComStockProcessor(
    state="CA",
    county_name="All",
    building_type="All",
    upgrade="0",
    base_dir=base_dir,
)

# Download and filter metadata
metadata_df = processor.process_metadata(save_dir=base_dir)

# Download time series data for buildings in metadata
paths, building_ids = processor.process_building_time_series(
    metadata_df,
    save_dir=timeseries_dir,
)
```

The processor downloads data from the ComStock dataset hosted on AWS S3:
- Base URL: `https://oedi-data-lake.s3.amazonaws.com/nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2024/comstock_amy2018_release_1/`
- Data Explorer: OpenEI Data Lake Explorer
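As an illustration, a per-building time series URL could be composed from the base URL like this. The `timeseries_individual_buildings/by_state/...` path layout is an assumption based on the OEDI data lake's typical structure, not confirmed by this project; verify it in the Data Explorer:

```python
BASE_URL = (
    "https://oedi-data-lake.s3.amazonaws.com/nrel-pds-building-stock/"
    "end-use-load-profiles-for-us-building-stock/2024/"
    "comstock_amy2018_release_1/"
)

def timeseries_url(building_id: int, state: str, upgrade: str = "0") -> str:
    # Assumed path layout: partitioned by upgrade and state, one parquet
    # file per building.
    return (
        f"{BASE_URL}timeseries_individual_buildings/by_state/"
        f"upgrade={upgrade}/state={state}/{building_id}-{upgrade}.parquet"
    )
```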
- Parallel Downloads: Uses ThreadPoolExecutor for concurrent file downloads
- Smart Caching: Skips downloading files that already exist locally
- Progress Tracking: Shows download progress with tqdm progress bars
- Efficient Filtering: Uses pandas parquet filtering for large datasets
The ComStock processor includes comprehensive unit and integration tests that validate the downloading and processing functionality.
Run specific test categories:

```shell
# Unit tests only (fast)
poetry run pytest tests/ -m "unit" -v

# Integration tests (downloads small datasets)
poetry run pytest tests/ -m "integration" -v

# All tests including large dataset downloads
TEST_DATA=true poetry run pytest tests/ -m "integration" -v

# Run all tests
poetry run pytest tests -v
```

- Unit tests: Fast tests that verify initialization and basic functionality
- Integration tests: Tests that download and process real ComStock data
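One way the `TEST_DATA` environment gate could be wired up in the test suite (an assumption shown for illustration; the test names and marker usage here are hypothetical):

```python
import os
import pytest

# Skip large-download tests unless TEST_DATA=true is set in the environment.
requires_large_data = pytest.mark.skipif(
    os.environ.get("TEST_DATA", "").lower() != "true",
    reason="set TEST_DATA=true to run large dataset downloads",
)

@requires_large_data
@pytest.mark.integration
def test_full_state_download():
    ...  # would download and validate a full state's time series files
```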
Before pushing changes to GitHub, run pre-commit to format the code consistently:

```shell
pre-commit run --all-files
```

If this doesn't work, try:

```shell
poetry update
poetry run pre-commit run --all-files
```