9 changes: 8 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -173,4 +173,11 @@ poetry.toml
# ruff
.ruff_cache/

# End of https://www.toptal.com/developers/gitignore/api/python

# Project specific
results/

# OSMnx cache (temporary files)
cache/
analysis/cache/
97 changes: 97 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,97 @@
# Changelog - Bikenv Prediction Platform

## 2025-12-27 - Initial Implementation

### Added
- **Data Retrieval Script** (`scripts/retrieve_data.py`)
  - Manual data entry from Copenhagenize Index 2025 edition
  - Function to fetch and save top 30 cities with scores
  - Notes for future automated scraping implementation

- **Index Calculation Functions** (`scripts/calculate_indices.py`)
  - `calculate_altitude_index()`: Measures city hilliness using OSM elevation data
  - `calculate_distance_index()`: Measures network connectivity/compactness
  - Both functions integrated with OSMnx for real geographic data

- **Analysis Platform** (`analysis/prediction_platform.py`)
  - Comprehensive hypothesis testing framework
  - Statistical analysis (Pearson, Spearman correlations)
  - Linear regression modeling
  - Automated visualization generation
  - CSV export of results

- **Demo Mode** (`analysis/demo_platform.py`)
  - Simplified version with synthetic data
  - No API dependencies required
  - Quick testing and validation

- **Project Structure**
  - `data/` - Reference datasets
  - `scripts/` - Data retrieval and calculation utilities
  - `analysis/` - Main platform and demo scripts
  - `results/` - Output directory for plots and CSVs

- **Documentation**
  - Comprehensive README with methodology and usage
  - Structure verification script
  - Requirements file for dependencies

### Changed
- **Updated to Copenhagenize Index 2025 Edition**
  - Previous: Referenced "Global Bicycle Cities Index 2022"
  - Current: **Copenhagenize Index 2025 (EIT Urban Mobility Edition)**
  - Reason: 2025 is the latest available edition
  - Source: https://copenhagenizeindex.eu/

- **Data Attribution Improvements**
  - Added full source citation: "The Global Ranking of Bicycle-Friendly Cities"
  - Included publisher: Copenhagenize Design Company & EIT Urban Mobility
  - Added direct link to official website
  - Clarified data retrieval date and method

### Dataset Details

**Copenhagenize Index 2025 Edition**
- Top 30 cities included (from 100 total ranked)
- Score range: 50.3 (Vancouver) to 71.1 (Utrecht)
- Countries represented: 15
- Top countries: France (5), Netherlands (4), Germany (3), Canada (3)

### Hypotheses Tested

1. **H1**: Lower altitude index (A_i) correlates with higher bicycle scores
   - Expected: Flat cities are more bike-friendly

2. **H2**: Distance index (D_i) closer to 1 correlates with higher bicycle scores
   - Expected: Better-connected networks are more bike-friendly

### Technical Stack

- Python 3.12+
- pandas, numpy, matplotlib, seaborn
- scipy (statistical analysis)
- scikit-learn (regression)
- osmnx, networkx (geographic analysis)
- geopandas (spatial data)

### Known Limitations

1. Sample size limited to 15 cities for computational efficiency
2. Requires OpenStreetMap API access for real data
3. Elevation data may require Google Elevation API key
4. Analysis time: 10-30 minutes per run with real data

### Future Enhancements

- [ ] Automated web scraping for data updates
- [ ] Expand to all 100 cities in index
- [ ] Add weather/climate indices
- [ ] Integrate bike infrastructure metrics
- [ ] Develop combined predictive model
- [ ] Real-time data validation

---

**Contributors**: Brandon Trigueros Lara
**Project**: TCU - SIMOVI Lab, Universidad de Costa Rica
**Issue**: bikenv#2
14 changes: 13 additions & 1 deletion README.md
@@ -1,3 +1,15 @@
# bikenv: Environmental factors that affect cycling

Topographical and climatic indexes to quantify their effect on cycling.

## Project Structure

This is a research analysis project, not a Python package. The structure is:

- `scripts/` - Core calculation functions (altitude_index, distance_index)
- `analysis/` - Statistical analysis and hypothesis testing platform
- `data/` - Copenhagenize Index 2025 Edition reference data
- `results/` - Generated analysis outputs (CSV, plots)
- `requirements-platform.txt` - Python dependencies

**Note:** This project was previously structured as an installable package with `setup.py` and a `bikenv/` module, but has been refactored into a scripts-based analysis platform. All dependencies are managed via `requirements-platform.txt`.
227 changes: 227 additions & 0 deletions analysis/README.md
@@ -0,0 +1,227 @@
# Bikenv Prediction Platform

**Issue #2**: Platform to test prediction capabilities of altitude and distance indices

## Overview

This platform evaluates the prediction capabilities of the proposed `altitude_index (A_i)` and `distance_index (D_i)` using data from the **Copenhagenize Index 2025 Edition** as a reference.

**Data Source**: [The Global Ranking of Bicycle-Friendly Cities](https://copenhagenizeindex.eu/) (Copenhagenize Index - EIT Urban Mobility Edition 2025)

## Hypotheses

This platform tests two hypotheses:

1. **Hypothesis 1**: The lower the `A_i` (altitude index), the better for cycling
2. **Hypothesis 2**: The closer to 1 the `D_i` (distance index), the better for cycling

## Methodology

### Altitude Index (A_i)

The altitude index quantifies the hilliness of a city by measuring elevation changes across the road network:

```
A_i = (mean_elevation_change / mean_edge_length) × 100
```

Where:
- `mean_elevation_change`: Average elevation difference across road segments (meters)
- `mean_edge_length`: Average length of road segments (meters)

**Interpretation**: Lower values indicate flatter terrain, which is expected to correlate with better cycling conditions.
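
The formula above can be illustrated with a minimal sketch. It is not the implementation in `scripts/calculate_indices.py`: the function name `altitude_index`, the edge tuple layout, and the use of the absolute elevation difference per segment are all assumptions made for illustration, and the sample edges are made-up values.

```python
from statistics import mean


def altitude_index(edges):
    """Compute A_i from (start_elevation_m, end_elevation_m, length_m) tuples.

    Assumption: elevation change per segment is the absolute difference
    between its end points, so climbs and descents count equally.
    """
    usable = [e for e in edges if e[2] > 0]  # skip zero-length segments
    if not usable:
        raise ValueError("need at least one edge with positive length")
    mean_elevation_change = mean(abs(end - start) for start, end, _ in usable)
    mean_edge_length = mean(length for _, _, length in usable)
    return mean_elevation_change / mean_edge_length * 100


# Made-up edges: a nearly flat network and a hilly one with the same lengths.
flat_edges = [(2.0, 2.5, 100.0), (2.5, 2.0, 50.0)]
hilly_edges = [(10.0, 18.0, 100.0), (18.0, 12.0, 50.0)]

print(f"flat:  {altitude_index(flat_edges):.2f}")
print(f"hilly: {altitude_index(hilly_edges):.2f}")
```

Because both means are taken over the same set of segments, the result behaves like an average grade in percent, which is why lower values correspond to flatter networks.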

### Distance Index (D_i)

The distance index measures the connectivity and compactness of a city's cycling network:

```
D_i = circuity / (1 + normalized_node_density)
```

Where:
- `circuity`: Ratio of network distances to straight-line distances (1.0 = perfectly direct routes)
- `normalized_node_density`: Number of intersections per km², normalized to [0, 1]

**Interpretation**: Values closer to 1 indicate better connectivity with more direct routes.
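
A minimal sketch of the formula follows. The README does not say how node density is normalized, so min-max scaling against reference bounds is an assumption here, as are the function name `distance_index` and the parameters `min_density` and `max_density`; the input numbers are made up.

```python
def distance_index(circuity, node_density, min_density, max_density):
    """Compute D_i = circuity / (1 + normalized_node_density).

    Assumption: node density (intersections per km^2) is min-max scaled
    against reference bounds and clamped to [0, 1].
    """
    if circuity < 1.0:
        raise ValueError("circuity cannot be below 1.0")
    if max_density <= min_density:
        raise ValueError("max_density must exceed min_density")
    normalized = (node_density - min_density) / (max_density - min_density)
    normalized = min(1.0, max(0.0, normalized))
    return circuity / (1.0 + normalized)


# Made-up inputs: slightly indirect routes, moderately dense intersections.
d_i = distance_index(circuity=1.05, node_density=80.0,
                     min_density=20.0, max_density=120.0)
print(f"D_i = {d_i:.3f}")
print(f"distance from optimal = {abs(d_i - 1.0):.3f}")
```

Since circuity is at least 1 and the denominator lies between 1 and 2, `D_i` under this formula is never below 0.5, and it can fall below 1 when intersections are dense.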

### Data Source

The **Copenhagenize Index 2025 Edition** (official name: "The Global Ranking of Bicycle-Friendly Cities") ranks the top 100 bicycle-friendly cities globally based on 13 indicators across 3 pillars: Infrastructure, Usage, and Policy.

This platform uses the **top 30 cities** as reference data, with scores ranging from 50.3 (Vancouver) to 71.1 (Utrecht).

**Source**: https://copenhagenizeindex.eu/
**Publisher**: Copenhagenize Design Company & EIT Urban Mobility
**Data retrieved**: December 2025

## Project Structure

```
bikenv/
├── data/
│   ├── copenhagenize_index_2025.csv      # Reference data (2025 edition)
│   └── copenhagenize_index_2022.csv      # Legacy data (deprecated)
├── scripts/
│   ├── retrieve_data.py                  # Script to fetch latest index data
│   └── calculate_indices.py              # Functions to calculate A_i and D_i
├── analysis/
│   └── prediction_platform.py            # Main analysis script
├── results/                              # Output directory
│   ├── cities_with_indices.csv           # Cities with calculated indices
│   ├── statistical_results.csv           # Correlation and regression results
│   └── hypothesis_testing_results.png    # Visualization plots
└── requirements-platform.txt             # Python dependencies
```

Copilot AI Dec 28, 2025

The reference to 'copenhagenize_index_2022.csv' in the documentation appears to be outdated. According to the PR description and other parts of the codebase, the project now uses the 2025 edition. This file is listed as 'Legacy data (deprecated)' but may not actually exist in the repository, which could confuse users.

Suggested change
│ └── copenhagenize_index_2022.csv # Legacy data (deprecated)
│ └── legacy/ # Optional legacy data (e.g., previous index editions)

## Installation

1. **Clone the repository** (if not already done):
```bash
git clone https://github.com/simovilab/bikenv.git
cd bikenv
```

2. **Install dependencies**:
```bash
pip install -r requirements-platform.txt
```

Or using conda:
```bash
conda install pandas numpy matplotlib seaborn scipy scikit-learn
conda install -c conda-forge osmnx
```

## Usage

### Running the Complete Analysis

From the `analysis/` directory:

```bash
cd analysis
python prediction_platform.py
```

This will:
1. Load the Copenhagenize Index 2025 data
2. Sample 15 cities across different performance tiers
3. Calculate `A_i` and `D_i` for each city using OpenStreetMap data
4. Perform statistical analysis (correlation, regression)
5. Generate visualizations
6. Save results to the `results/` directory

**Note**: The analysis may take 10-30 minutes depending on network speed and API rate limits, as it downloads geographic data for each city.
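
Step 2 above, sampling across performance tiers, could look roughly like the sketch below. How `prediction_platform.py` actually draws its sample is not shown here, so the three equal tiers, the equal allocation per tier, the fixed seed, and the name `sample_across_tiers` are all assumptions. The city names and scores are placeholders, not Copenhagenize data.

```python
import random


def sample_across_tiers(cities, n_total=15, n_tiers=3, seed=42):
    """Draw an equal number of cities from each score tier.

    `cities` is a list of (name, score) tuples. Assumption: tiers are
    equal-sized slices of the score-ranked list.
    """
    ranked = sorted(cities, key=lambda c: c[1], reverse=True)
    per_tier = n_total // n_tiers
    tier_size = len(ranked) // n_tiers
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for t in range(n_tiers):
        start = t * tier_size
        # The last tier absorbs any remainder from integer division.
        end = len(ranked) if t == n_tiers - 1 else start + tier_size
        tier = ranked[start:end]
        sample.extend(rng.sample(tier, min(per_tier, len(tier))))
    return sample


# Placeholder data: 30 hypothetical cities with evenly spaced scores.
placeholder_cities = [(f"city_{i:02d}", 71.0 - 0.7 * i) for i in range(30)]
selected = sample_across_tiers(placeholder_cities)
print(len(selected), "cities selected")
```

Sampling per tier keeps high-, mid-, and low-scoring cities in the sample, which matters when only half of the 30 reference cities are analysed.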

### Calculating Indices for Individual Cities

```python
from scripts.calculate_indices import calculate_indices_for_city

# Calculate indices for a single city
altitude_idx, distance_idx = calculate_indices_for_city("Amsterdam", "Netherlands")

print(f"Altitude Index: {altitude_idx:.3f}")
print(f"Distance Index: {distance_idx:.3f}")
```

## Statistical Methods

The platform employs multiple statistical approaches:

1. **Pearson Correlation**: Measures linear relationship strength
2. **Spearman Correlation**: Measures monotonic relationship (rank-based)
3. **Linear Regression**: Models the relationship and calculates R² score
4. **Significance Testing**: p-values < 0.05 indicate statistical significance
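
To make the difference between the two correlation measures concrete, here is a dependency-free sketch. It is illustrative only: the project lists scipy for statistical analysis, where `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return the coefficient together with a p-value, which this sketch does not compute. The helper names and the data points are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1..n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up values: scores fall as A_i rises, but not along a straight line.
a_i_values = [0.5, 1.0, 2.0, 4.0, 8.0]
scores = [70.0, 66.0, 60.0, 58.0, 51.0]

print(f"Pearson r:    {pearson_r(a_i_values, scores):.3f}")
print(f"Spearman rho: {spearman_rho(a_i_values, scores):.3f}")
```

The relationship in this toy data is perfectly monotonic but not perfectly linear, so the Spearman coefficient reaches its extreme while the Pearson coefficient does not.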

### Interpretation Criteria

- **Strong support**: |r| ≥ 0.5 and p < 0.05
- **Moderate support**: 0.3 ≤ |r| < 0.5 and p < 0.05
- **Weak support**: |r| < 0.3 and p < 0.05
- **Not significant**: p ≥ 0.05
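
These criteria map directly onto a small function. This is a minimal sketch: the name `classify_support` is hypothetical, and placing values exactly at 0.3 or 0.5 in the higher tier is a choice made here.

```python
def classify_support(r, p_value, alpha=0.05):
    """Label a correlation result using the interpretation criteria.

    Significance is checked first; the strength tiers only apply
    to results with p < alpha.
    """
    if p_value >= alpha:
        return "not significant"
    strength = abs(r)  # the sign carries direction, not strength
    if strength >= 0.5:
        return "strong"
    if strength >= 0.3:
        return "moderate"
    return "weak"


print(classify_support(-0.62, 0.01))
print(classify_support(0.40, 0.03))
print(classify_support(0.90, 0.20))
```

The label says nothing about direction. For Hypothesis 1 the sign of `r` still has to be checked separately, since a strong positive correlation between `A_i` and score would contradict it.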

## Output Files

After running the analysis, the following files are generated in `results/`:

1. **cities_with_indices.csv**: Complete dataset with calculated indices
2. **statistical_results.csv**: Summary of correlation and regression analysis
3. **hypothesis_testing_results.png**: 4-panel visualization showing:
   - Altitude Index vs Score scatter plot
   - Distance Index vs Score scatter plot
   - Distance from Optimal D_i vs Score
   - Correlation heatmap

## Expected Results

Based on urban cycling research, we expect:

- **Negative correlation** between `A_i` and cycling scores (flatter cities rank higher)
- **Cities with D_i ≈ 1** to have higher scores (better network connectivity)

Top-performing cities like Utrecht, Copenhagen, and Amsterdam are expected to have:
- Low `A_i` values (< 2.0, indicating flat terrain)
- `D_i` values close to 1.0 (indicating efficient, direct networks)

## Limitations

1. **Sample Size**: Analysis uses 15 cities for computational efficiency
2. **API Dependencies**: Requires OpenStreetMap data access
3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations

Copilot AI Dec 28, 2025

The documentation mentions "Elevation data may require Google Elevation API key for accurate altitude calculations" but the code in calculate_indices.py actually uses the Open Topo Data API (which is free and doesn't require an API key). This is misleading and should be corrected to accurately reflect the implementation.

Suggested change
3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations
3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution

4. **Network Complexity**: Simplified metrics may not capture all aspects of cyclability

## Future Improvements

- [ ] Expand sample size to all 30 cities
- [ ] Add weather/climate index
- [ ] Incorporate bike infrastructure data (protected lanes, bike parking)
- [ ] Test against modal share data (% of trips by bicycle)
- [ ] Develop combined predictive model

## References

- **Copenhagenize Index**: https://copenhagenizeindex.eu/
- **OSMnx Documentation**: https://osmnx.readthedocs.io/
- **GTFS and Urban Mobility**: https://gtfs.org/

## License

MIT License - See repository LICENSE file

## Author

**Brandon Trigueros Lara**
TCU Project - SIMOVI Lab, Universidad de Costa Rica
December 2025

---

## Quick Start Example

```python
# Quick test with sample cities
import pandas as pd
from scripts.calculate_indices import calculate_indices_for_city

# Test with Amsterdam
print("Calculating indices for Amsterdam...")
a_i, d_i = calculate_indices_for_city("Amsterdam", "Netherlands")

print(f"\nAmsterdam Results:")
print(f" Altitude Index: {a_i:.3f} (lower is better)")
print(f" Distance Index: {d_i:.3f} (closer to 1 is better)")

# Load reference data to compare
df = pd.read_csv('../data/copenhagenize_index_2022.csv')
amsterdam_score = df[df['city'] == 'Amsterdam']['score'].values[0]

print(f" Copenhagenize Score: {amsterdam_score} (rank #4)")
```

Copilot AI Dec 28, 2025

This code example references 'copenhagenize_index_2022.csv' which appears to be outdated. The project now uses the 2025 edition ('copenhagenize_index_2025.csv'). Update this example to use the correct filename.

Suggested change
df = pd.read_csv('../data/copenhagenize_index_2022.csv')
df = pd.read_csv('../data/copenhagenize_index_2025.csv')

## Contact

For questions or issues, please open an issue on GitHub or contact:
- brandon.trigueros@ucr.ac.cr
- Laboratory: SIMOVI - UCR