simovilab · BrandonTrigueros · Dec 27, 2025
diff --git a/.gitignore b/.gitignore
@@ -173,4 +173,11 @@ poetry.toml
 # ruff
 .ruff_cache/
 
-# End of https://www.toptal.com/developers/gitignore/api/python
+# End of https://www.toptal.com/developers/gitignore/api/python
+
+# Project specific
+results/
+
+# OSMnx cache (temporary files)
+cache/
+analysis/cache/
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,97 @@
+# Changelog - Bikenv Prediction Platform
+
+## 2025-12-27 - Initial Implementation
+
+### Added
+- **Data Retrieval Script** (`scripts/retrieve_data.py`)
+  - Manual data entry from Copenhagenize Index 2025 edition
+  - Function to fetch and save top 30 cities with scores
+  - Notes for future automated scraping implementation
+
+- **Index Calculation Functions** (`scripts/calculate_indices.py`)
+  - `calculate_altitude_index()`: Measures city hilliness using OSM elevation data
+  - `calculate_distance_index()`: Measures network connectivity/compactness
+  - Both functions integrated with OSMnx for real geographic data
+
+- **Analysis Platform** (`analysis/prediction_platform.py`)
+  - Comprehensive hypothesis testing framework
+  - Statistical analysis (Pearson, Spearman correlations)
+  - Linear regression modeling
+  - Automated visualization generation
+  - CSV export of results
+
+- **Demo Mode** (`analysis/demo_platform.py`)
+  - Simplified version with synthetic data
+  - No API dependencies required
+  - Quick testing and validation
+
+- **Project Structure**
+  - `data/` - Reference datasets
+  - `scripts/` - Data retrieval and calculation utilities
+  - `analysis/` - Main platform and demo scripts
+  - `results/` - Output directory for plots and CSVs
+
+- **Documentation**
+  - Comprehensive README with methodology and usage
+  - Structure verification script
+  - Requirements file for dependencies
+
+### Changed
+- **Updated to Copenhagenize Index 2025 Edition**
+  - Previous: Referenced "Global Bicycle Cities Index 2022"
+  - Current: **Copenhagenize Index 2025 (EIT Urban Mobility Edition)**
+  - Reason: 2025 is the latest available edition
+  - Source: https://copenhagenizeindex.eu/
+
+- **Data Attribution Improvements**
+  - Added full source citation: "The Global Ranking of Bicycle-Friendly Cities"
+  - Included publisher: Copenhagenize Design Company & EIT Urban Mobility
+  - Added direct link to official website
+  - Clarified data retrieval date and method
+
+### Dataset Details
+
+**Copenhagenize Index 2025 Edition**
+- Top 30 cities included (from 100 total ranked)
+- Score range: 50.3 (Vancouver) to 71.1 (Utrecht)
+- Countries represented: 15
+- Top countries: France (5), Netherlands (4), Germany (3), Canada (3)
+
+### Hypotheses Tested
+
+1. **H1**: Lower altitude index (A_i) correlates with higher bicycle scores
+   - Expected: Flat cities are more bike-friendly
+
+2. **H2**: Distance index (D_i) closer to 1 correlates with higher bicycle scores
+   - Expected: Better-connected networks are more bike-friendly
+
+### Technical Stack
+
+- Python 3.12+
+- pandas, numpy, matplotlib, seaborn
+- scipy (statistical analysis)
+- scikit-learn (regression)
+- osmnx, networkx (geographic analysis)
+- geopandas (spatial data)
+
+### Known Limitations
+
+1. Sample size limited to 15 cities for computational efficiency
+2. Requires OpenStreetMap API access for real data
+3. Elevation data comes from the Open Topo Data API (no API key required); accuracy depends on its coverage and resolution
+4. Analysis time: 10-30 minutes per run with real data
+
+### Future Enhancements
+
+- [ ] Automated web scraping for data updates
+- [ ] Expand to all 100 cities in index
+- [ ] Add weather/climate indices
+- [ ] Integrate bike infrastructure metrics
+- [ ] Develop combined predictive model
+- [ ] Real-time data validation
+
+---
+
+**Contributors**: Brandon Trigueros Lara  
+**Project**: TCU - SIMOVI Lab, Universidad de Costa Rica  
+**Issue**: bikenv#2
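As a rough illustration of the two indices this changelog introduces, the formulas given in `analysis/README.md` can be evaluated on precomputed network summaries. This is a minimal sketch, not the code in `scripts/calculate_indices.py`: the function names `altitude_index` and `distance_index`, their parameters, and the input numbers are hypothetical, and it assumes elevation changes are taken as absolute differences per edge.

```python
from statistics import fmean


def altitude_index(elevation_changes_m, edge_lengths_m):
    """A_i = (mean_elevation_change / mean_edge_length) * 100."""
    if not elevation_changes_m or len(elevation_changes_m) != len(edge_lengths_m):
        raise ValueError("need one elevation change per edge")
    # Assumption: hilliness ignores direction, so use absolute differences.
    mean_change = fmean([abs(dz) for dz in elevation_changes_m])
    mean_length = fmean(edge_lengths_m)
    return mean_change / mean_length * 100


def distance_index(circuity, normalized_node_density):
    """D_i = circuity / (1 + normalized_node_density)."""
    if not 0.0 <= normalized_node_density <= 1.0:
        raise ValueError("normalized_node_density must lie in [0, 1]")
    return circuity / (1 + normalized_node_density)


# Made-up inputs for illustration only (not measurements of any city).
a_i = altitude_index([1.0, -2.0, 3.0], [100.0, 200.0, 300.0])
d_i = distance_index(circuity=1.2, normalized_node_density=0.2)
print(f"A_i = {a_i:.3f}, D_i = {d_i:.3f}")
```

Keeping the formulas separate from the OpenStreetMap download step, as sketched here, would let them be checked on small hand-made inputs without network access.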
diff --git a/README.md b/README.md
@@ -1,3 +1,15 @@
 # bikenv: Environmental factors that affect cycling
 
-Topographical and climatic indexes to quantify their effect on cycling.
+Topographical and climatic indexes to quantify their effect on cycling.
+
+## Project Structure
+
+This is a research analysis project, not a Python package. The structure is:
+
+- `scripts/` - Core calculation functions (altitude_index, distance_index)
+- `analysis/` - Statistical analysis and hypothesis testing platform
+- `data/` - Copenhagenize Index 2025 Edition reference data
+- `results/` - Generated analysis outputs (CSV, plots)
+- `requirements-platform.txt` - Python dependencies
+
+**Note:** This project was previously structured as an installable package with `setup.py` and a `bikenv/` module, but has been refactored into a scripts-based analysis platform. All dependencies are managed via `requirements-platform.txt`.
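The statistical analysis listed for `analysis/` relies on Pearson and Spearman correlations, according to the changelog. The sketch below shows what those two coefficients compute, in pure Python so that it runs without dependencies; the platform itself lists scipy in its technical stack. The helper names `pearson`, `average_ranks`, and `spearman` are hypothetical, and the data are synthetic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Synthetic values for illustration only (not real index data).
altitude = [0.5, 1.0, 2.0, 4.0, 8.0]
score = [70.0, 66.0, 60.0, 58.0, 51.0]
print(f"Pearson r    = {pearson(altitude, score):+.3f}")
print(f"Spearman rho = {spearman(altitude, score):+.3f}")
```

Because the synthetic scores fall steadily as altitude rises, the Spearman coefficient is exactly -1, while the Pearson coefficient is weaker because the relationship is not linear.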
diff --git a/analysis/README.md b/analysis/README.md
@@ -0,0 +1,227 @@
+# Bikenv Prediction Platform
+
+**Issue #2**: Platform to test prediction capabilities of altitude and distance indices
+
+## Overview
+
+This platform evaluates how well the proposed `altitude_index (A_i)` and `distance_index (D_i)` predict a city's bicycle-friendliness, using scores from the **Copenhagenize Index 2025 Edition** as the reference.
+
+**Data Source**: [The Global Ranking of Bicycle-Friendly Cities](https://copenhagenizeindex.eu/) (Copenhagenize Index - EIT Urban Mobility Edition 2025)
+
+## Hypotheses
+
+This platform tests two hypotheses:
+
+1. **Hypothesis 1**: The lower the `A_i` (altitude index), the better for cycling
+2. **Hypothesis 2**: The closer to 1 the `D_i` (distance index), the better for cycling
+
+## Methodology
+
+### Altitude Index (A_i)
+
+The altitude index quantifies the hilliness of a city by measuring elevation changes across the road network:
+
+```
+A_i = (mean_elevation_change / mean_edge_length) × 100
+```
+
+Where:
+- `mean_elevation_change`: Average elevation difference across road segments (meters)
+- `mean_edge_length`: Average length of road segments (meters)
+
+**Interpretation**: Both quantities are in meters, so `A_i` is a percentage comparable to an average road grade. Lower values indicate flatter terrain, which is expected to correlate with better cycling conditions.
+
+### Distance Index (D_i)
+
+The distance index measures the connectivity and compactness of a city's cycling network:
+
+```
+D_i = circuity / (1 + normalized_node_density)
+```
+
+Where:
+- `circuity`: Ratio of network distances to straight-line distances (1.0 = perfectly direct routes)
+- `normalized_node_density`: Number of intersections per km², normalized to [0, 1]
+
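+**Interpretation**: Values closer to 1 indicate better connectivity with more direct routes. Because circuity is at least 1 and the normalized density lies in [0, 1], `D_i` falls between half the circuity and the circuity itself, so it can be below 1 as well as above it.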
+
+### Data Source
+
+The **Copenhagenize Index 2025 Edition** (official name: "The Global Ranking of Bicycle-Friendly Cities") ranks the top 100 bicycle-friendly cities globally based on 13 indicators across 3 pillars: Infrastructure, Usage, and Policy.
+
+This platform uses the **top 30 cities** as reference data, with scores ranging from 50.3 (Vancouver) to 71.1 (Utrecht).
+
+**Source**: https://copenhagenizeindex.eu/  
+**Publisher**: Copenhagenize Design Company & EIT Urban Mobility  
+**Data retrieved**: December 2025
+
+## Project Structure
+
+```
+bikenv/
+├── data/
+│   ├── copenhagenize_index_2025.csv    # Reference data (2025 edition)
+│   └── legacy/                         # Optional legacy data (e.g., previous index editions)
+├── scripts/
+│   ├── retrieve_data.py                # Script to fetch latest index data
+│   └── calculate_indices.py            # Functions to calculate A_i and D_i
+├── analysis/
+│   └── prediction_platform.py          # Main analysis script
+├── results/                            # Output directory
+│   ├── cities_with_indices.csv         # Cities with calculated indices
+│   ├── statistical_results.csv         # Correlation and regression results
+│   └── hypothesis_testing_results.png  # Visualization plots
+└── requirements-platform.txt           # Python dependencies
+```
+
+## Installation
+
+1. **Clone the repository** (if not already done):
+   ```bash
+   git clone https://github.com/simovilab/bikenv.git
+   cd bikenv
+   ```
+
+2. **Install dependencies**:
+   ```bash
+   pip install -r requirements-platform.txt
+   ```
+
+   Or using conda:
+   ```bash
+   conda install pandas numpy matplotlib seaborn scipy scikit-learn
+   conda install -c conda-forge osmnx
+   ```
+
+## Usage
+
+### Running the Complete Analysis
+
+From the `analysis/` directory:
+
+```bash
+cd analysis
+python prediction_platform.py
+```
+
+This will:
+1. Load the Copenhagenize Index 2025 data
+2. Sample 15 cities across different performance tiers
+3. Calculate `A_i` and `D_i` for each city using OpenStreetMap data
+4. Perform statistical analysis (correlation, regression)
+5. Generate visualizations
+6. Save results to the `results/` directory
+
+**Note**: The analysis may take 10-30 minutes depending on network speed and API rate limits, as it downloads geographic data for each city.
+
+### Calculating Indices for Individual Cities
+
+```python
+from scripts.calculate_indices import calculate_indices_for_city
+
+# Calculate indices for a single city
+altitude_idx, distance_idx = calculate_indices_for_city("Amsterdam", "Netherlands")
+
+print(f"Altitude Index: {altitude_idx:.3f}")
+print(f"Distance Index: {distance_idx:.3f}")
+```
+
+## Statistical Methods
+
+The platform employs multiple statistical approaches:
+
+1. **Pearson Correlation**: Measures linear relationship strength
+2. **Spearman Correlation**: Measures monotonic relationship (rank-based)
+3. **Linear Regression**: Models the relationship and calculates R² score
+4. **Significance Testing**: p-values < 0.05 indicate statistical significance
+
+### Interpretation Criteria
+
+- **Strong support**: |r| ≥ 0.5 and p < 0.05
+- **Moderate support**: 0.3 ≤ |r| < 0.5 and p < 0.05
+- **Weak support**: |r| < 0.3 and p < 0.05
+- **Not significant**: p ≥ 0.05
+
+## Output Files
+
+After running the analysis, the following files are generated in `results/`:
+
+1. **cities_with_indices.csv**: Complete dataset with calculated indices
+2. **statistical_results.csv**: Summary of correlation and regression analysis
+3. **hypothesis_testing_results.png**: 4-panel visualization showing:
+   - Altitude Index vs Score scatter plot
+   - Distance Index vs Score scatter plot
+   - Distance from Optimal D_i vs Score
+   - Correlation heatmap
+
+## Expected Results
+
+Based on urban cycling research, we expect:
+
+- **Negative correlation** between `A_i` and cycling scores (flatter cities rank higher)
+- **Cities with D_i ≈ 1** to have higher scores (better network connectivity)
+
+Top-performing cities like Utrecht, Copenhagen, and Amsterdam are expected to have:
+- Low `A_i` values (< 2.0, indicating flat terrain)
+- `D_i` values close to 1.0 (indicating efficient, direct networks)
+
+## Limitations
+
+1. **Sample Size**: Analysis uses 15 cities for computational efficiency
+2. **API Dependencies**: Requires OpenStreetMap data access
+3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution
+4. **Network Complexity**: Simplified metrics may not capture all aspects of cyclability
+
+## Future Improvements
+
+- [ ] Expand sample size to all 30 cities
+- [ ] Add weather/climate index
+- [ ] Incorporate bike infrastructure data (protected lanes, bike parking)
+- [ ] Test against modal share data (% of trips by bicycle)
+- [ ] Develop combined predictive model
+
+## References
+
+- **Copenhagenize Index**: https://copenhagenizeindex.eu/
+- **OSMnx Documentation**: https://osmnx.readthedocs.io/
+- **GTFS and Urban Mobility**: https://gtfs.org/
+
+## License
+
+MIT License - See repository LICENSE file
+
+## Author
+
+**Brandon Trigueros Lara**  
+TCU Project - SIMOVI Lab, Universidad de Costa Rica  
+December 2025
+
+---
+
+## Quick Start Example
+
+```python
+# Quick test with sample cities
+import pandas as pd
+from scripts.calculate_indices import calculate_indices_for_city
+
+# Test with Amsterdam
+print("Calculating indices for Amsterdam...")
+a_i, d_i = calculate_indices_for_city("Amsterdam", "Netherlands")
+
+print("\nAmsterdam Results:")
+print(f"  Altitude Index: {a_i:.3f} (lower is better)")
+print(f"  Distance Index: {d_i:.3f} (closer to 1 is better)")
+
+# Load reference data to compare
+df = pd.read_csv('../data/copenhagenize_index_2025.csv')
+amsterdam_score = df[df['city'] == 'Amsterdam']['score'].values[0]
+
+print(f"  Copenhagenize Score: {amsterdam_score} (rank #4)")
+```
+
+## Contact
+
+For questions or issues, please open an issue on GitHub or contact:
+- brandon.trigueros@ucr.ac.cr
+- Laboratory: SIMOVI - UCR
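The Interpretation Criteria in the Statistical Methods section map a correlation coefficient and its p-value to a support level. A minimal sketch of that mapping follows. The function name `classify_support` and the `alpha` parameter are hypothetical, and the sketch assumes that boundary values (|r| of exactly 0.5 or 0.3) fall in the stronger category.

```python
def classify_support(r, p, alpha=0.05):
    """Map a correlation coefficient and p-value to a support level."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if p >= alpha:
        return "not significant"
    strength = abs(r)
    # Assumption: boundary values count toward the stronger category.
    if strength >= 0.5:
        return "strong"
    if strength >= 0.3:
        return "moderate"
    return "weak"


# Hypothetical (r, p) pairs, not results produced by the platform.
for r, p in [(-0.62, 0.01), (0.41, 0.03), (0.12, 0.04), (-0.70, 0.20)]:
    print(f"r = {r:+.2f}, p = {p:.2f} -> {classify_support(r, p)}")
```

The function only grades the strength of a relationship. The sign of `r` still has to be checked against the hypothesis, since Hypothesis 1 expects a negative correlation between `A_i` and the score.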