Create prediction platform to test altitude and distance indices #3
base: main
Changes from all commits
8704d4d
d3b5f77
77072c4
fb9566a
6455f07
355f240
fc0d6cf
76b2098
e94f8f6
98cc6d8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # Changelog - Bikenv Prediction Platform | ||
|
|
||
| ## 2025-12-27 - Initial Implementation | ||
|
|
||
| ### Added | ||
| - **Data Retrieval Script** (`scripts/retrieve_data.py`) | ||
| - Manual data entry from Copenhagenize Index 2025 edition | ||
| - Function to fetch and save top 30 cities with scores | ||
| - Notes for future automated scraping implementation | ||
|
|
||
| - **Index Calculation Functions** (`scripts/calculate_indices.py`) | ||
| - `calculate_altitude_index()`: Measures city hilliness using OSM elevation data | ||
| - `calculate_distance_index()`: Measures network connectivity/compactness | ||
| - Both functions integrated with OSMnx for real geographic data | ||
|
|
||
| - **Analysis Platform** (`analysis/prediction_platform.py`) | ||
| - Comprehensive hypothesis testing framework | ||
| - Statistical analysis (Pearson, Spearman correlations) | ||
| - Linear regression modeling | ||
| - Automated visualization generation | ||
| - CSV export of results | ||
|
|
||
| - **Demo Mode** (`analysis/demo_platform.py`) | ||
| - Simplified version with synthetic data | ||
| - No API dependencies required | ||
| - Quick testing and validation | ||
|
|
||
| - **Project Structure** | ||
| - `data/` - Reference datasets | ||
| - `scripts/` - Data retrieval and calculation utilities | ||
| - `analysis/` - Main platform and demo scripts | ||
| - `results/` - Output directory for plots and CSVs | ||
|
|
||
| - **Documentation** | ||
| - Comprehensive README with methodology and usage | ||
| - Structure verification script | ||
| - Requirements file for dependencies | ||
|
|
||
| ### Changed | ||
| - **Updated to Copenhagenize Index 2025 Edition** | ||
| - Previous: Referenced "Global Bicycle Cities Index 2022" | ||
| - Current: **Copenhagenize Index 2025 (EIT Urban Mobility Edition)** | ||
| - Reason: 2025 is the latest available edition | ||
| - Source: https://copenhagenizeindex.eu/ | ||
|
|
||
| - **Data Attribution Improvements** | ||
| - Added full source citation: "The Global Ranking of Bicycle-Friendly Cities" | ||
| - Included publisher: Copenhagenize Design Company & EIT Urban Mobility | ||
| - Added direct link to official website | ||
| - Clarified data retrieval date and method | ||
|
|
||
| ### Dataset Details | ||
|
|
||
| **Copenhagenize Index 2025 Edition** | ||
| - Top 30 cities included (from 100 total ranked) | ||
| - Score range: 50.3 (Vancouver) to 71.1 (Utrecht) | ||
| - Countries represented: 15 | ||
| - Top countries: France (5), Netherlands (4), Germany (3), Canada (3) | ||
|
|
||
| ### Hypotheses Tested | ||
|
|
||
| 1. **H1**: Lower altitude index (A_i) correlates with higher bicycle scores | ||
| - Expected: Flat cities are more bike-friendly | ||
|
|
||
| 2. **H2**: Distance index (D_i) closer to 1 correlates with higher bicycle scores | ||
| - Expected: Better-connected networks are more bike-friendly | ||
|
|
||
| ### Technical Stack | ||
|
|
||
| - Python 3.12+ | ||
| - pandas, numpy, matplotlib, seaborn | ||
| - scipy (statistical analysis) | ||
| - scikit-learn (regression) | ||
| - osmnx, networkx (geographic analysis) | ||
| - geopandas (spatial data) | ||
|
|
||
| ### Known Limitations | ||
|
|
||
| 1. Sample size limited to 15 cities for computational efficiency | ||
| 2. Requires OpenStreetMap API access for real data | ||
| 3. Elevation data may require Google Elevation API key | ||
| 4. Analysis time: 10-30 minutes per run with real data | ||
|
|
||
| ### Future Enhancements | ||
|
|
||
| - [ ] Automated web scraping for data updates | ||
| - [ ] Expand to all 100 cities in index | ||
| - [ ] Add weather/climate indices | ||
| - [ ] Integrate bike infrastructure metrics | ||
| - [ ] Develop combined predictive model | ||
| - [ ] Real-time data validation | ||
|
|
||
| --- | ||
|
|
||
| **Contributors**: Brandon Trigueros Lara | ||
| **Project**: TCU - SIMOVI Lab, Universidad de Costa Rica | ||
| **Issue**: bikenv#2 |
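Reviewer sketch (not part of the diff): the platform README added in this PR grades each hypothesis by the size of the correlation coefficient and its p-value. The function below is one possible reading of those criteria, not the code in `analysis/prediction_platform.py`. The name `classify_support`, the `alpha` parameter, and the treatment of the exact boundaries |r| = 0.3 and |r| = 0.5 (counted here as moderate) are assumptions made for illustration.

```python
def classify_support(r: float, p: float, alpha: float = 0.05) -> str:
    """Map a correlation coefficient and its p-value to a support label.

    Hypothetical helper: thresholds follow the README's interpretation
    criteria; boundary values (|r| == 0.3 or 0.5) are assumed to be "moderate".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if p >= alpha:
        # Without significance the README reports no support level at all.
        return "not significant"
    strength = abs(r)
    if strength > 0.5:
        return "strong"
    if strength >= 0.3:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from the analysis.
    for r, p in [(-0.62, 0.01), (0.41, 0.03), (-0.10, 0.04), (0.70, 0.20)]:
        print(f"r={r:+.2f}, p={p:.2f} -> {classify_support(r, p)}")
```

The label only reflects the strength of the relationship. For H1 the sign of `r` would still need to be checked separately, since the hypothesis expects a negative correlation.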
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,15 @@ | ||
| # bikenv: Environmental factors that affect cycling | ||
|
|
||
| Topographical and climatic indexes to quantify their effect on cycling. | ||
| Topographical and climatic indexes to quantify their effect on cycling. | ||
|
|
||
| ## Project Structure | ||
|
|
||
| This is a research analysis project, not a Python package. The structure is: | ||
|
|
||
| - `scripts/` - Core calculation functions (altitude_index, distance_index) | ||
| - `analysis/` - Statistical analysis and hypothesis testing platform | ||
| - `data/` - Copenhagenize Index 2025 Edition reference data | ||
| - `results/` - Generated analysis outputs (CSV, plots) | ||
| - `requirements-platform.txt` - Python dependencies | ||
|
|
||
| **Note:** This project was previously structured as an installable package with `setup.py` and a `bikenv/` module, but has been refactored into a scripts-based analysis platform. All dependencies are managed via `requirements-platform.txt`. |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,227 @@ | ||||||
| # Bikenv Prediction Platform | ||||||
|
|
||||||
| **Issue #2**: Platform to test prediction capabilities of altitude and distance indices | ||||||
|
|
||||||
| ## Overview | ||||||
|
|
||||||
| This platform evaluates the prediction capabilities of the proposed `altitude_index (A_i)` and `distance_index (D_i)` using data from the **Copenhagenize Index 2025 Edition** as a reference. | ||||||
|
|
||||||
| **Data Source**: [The Global Ranking of Bicycle-Friendly Cities](https://copenhagenizeindex.eu/) (Copenhagenize Index - EIT Urban Mobility Edition 2025) | ||||||
|
|
||||||
| ## Hypotheses | ||||||
|
|
||||||
| This platform tests two hypotheses: | ||||||
|
|
||||||
| 1. **Hypothesis 1**: The lower the `A_i` (altitude index), the better for cycling | ||||||
| 2. **Hypothesis 2**: The closer to 1 the `D_i` (distance index), the better for cycling | ||||||
|
|
||||||
| ## Methodology | ||||||
|
|
||||||
| ### Altitude Index (A_i) | ||||||
|
|
||||||
| The altitude index quantifies the hilliness of a city by measuring elevation changes across the road network: | ||||||
|
|
||||||
| ``` | ||||||
| A_i = (mean_elevation_change / mean_edge_length) × 100 | ||||||
| ``` | ||||||
|
|
||||||
| Where: | ||||||
| - `mean_elevation_change`: Average elevation difference across road segments (meters) | ||||||
| - `mean_edge_length`: Average length of road segments (meters) | ||||||
|
|
||||||
| **Interpretation**: Lower values indicate flatter terrain, which is expected to correlate with better cycling conditions. | ||||||
|
|
||||||
| ### Distance Index (D_i) | ||||||
|
|
||||||
| The distance index measures the connectivity and compactness of a city's cycling network: | ||||||
|
|
||||||
| ``` | ||||||
| D_i = circuity / (1 + normalized_node_density) | ||||||
| ``` | ||||||
|
|
||||||
| Where: | ||||||
| - `circuity`: Ratio of network distances to straight-line distances (1.0 = perfectly direct routes) | ||||||
| - `normalized_node_density`: Number of intersections per km², normalized to [0, 1] | ||||||
|
|
||||||
| **Interpretation**: Values closer to 1 indicate better connectivity with more direct routes. | ||||||
|
|
||||||
| ### Data Source | ||||||
|
|
||||||
| The **Copenhagenize Index 2025 Edition** (official name: "The Global Ranking of Bicycle-Friendly Cities") ranks the top 100 bicycle-friendly cities globally based on 13 indicators across 3 pillars: Infrastructure, Usage, and Policy. | ||||||
|
|
||||||
| This platform uses the **top 30 cities** as reference data, with scores ranging from 50.3 (Vancouver) to 71.1 (Utrecht). | ||||||
|
|
||||||
| **Source**: https://copenhagenizeindex.eu/ | ||||||
| **Publisher**: Copenhagenize Design Company & EIT Urban Mobility | ||||||
| **Data retrieved**: December 2025 | ||||||
|
|
||||||
| ## Project Structure | ||||||
|
|
||||||
| ``` | ||||||
| bikenv/ | ||||||
| ├── data/ | ||||||
| │ ├── copenhagenize_index_2025.csv # Reference data (2025 edition) | ||||||
| │ └── copenhagenize_index_2022.csv # Legacy data (deprecated) | ||||||
| ├── scripts/ | ||||||
| │ ├── retrieve_data.py # Script to fetch latest index data | ||||||
| │ └── calculate_indices.py # Functions to calculate A_i and D_i | ||||||
| ├── analysis/ | ||||||
| │ └── prediction_platform.py # Main analysis script | ||||||
| ├── results/ # Output directory | ||||||
| │ ├── cities_with_indices.csv # Cities with calculated indices | ||||||
| │ ├── statistical_results.csv # Correlation and regression results | ||||||
| │ └── hypothesis_testing_results.png # Visualization plots | ||||||
| └── requirements-platform.txt # Python dependencies | ||||||
| ``` | ||||||
|
|
||||||
| ## Installation | ||||||
|
|
||||||
| 1. **Clone the repository** (if not already done): | ||||||
| ```bash | ||||||
| git clone https://github.com/simovilab/bikenv.git | ||||||
| cd bikenv | ||||||
| ``` | ||||||
|
|
||||||
| 2. **Install dependencies**: | ||||||
| ```bash | ||||||
| pip install -r requirements-platform.txt | ||||||
| ``` | ||||||
|
|
||||||
| Or using conda: | ||||||
| ```bash | ||||||
| conda install pandas numpy matplotlib seaborn scipy scikit-learn | ||||||
| conda install -c conda-forge osmnx | ||||||
| ``` | ||||||
|
|
||||||
| ## Usage | ||||||
|
|
||||||
| ### Running the Complete Analysis | ||||||
|
|
||||||
| From the `analysis/` directory: | ||||||
|
|
||||||
| ```bash | ||||||
| cd analysis | ||||||
| python prediction_platform.py | ||||||
| ``` | ||||||
|
|
||||||
| This will: | ||||||
| 1. Load the Copenhagenize Index 2025 data | ||||||
| 2. Sample 15 cities across different performance tiers | ||||||
| 3. Calculate `A_i` and `D_i` for each city using OpenStreetMap data | ||||||
| 4. Perform statistical analysis (correlation, regression) | ||||||
| 5. Generate visualizations | ||||||
| 6. Save results to the `results/` directory | ||||||
|
|
||||||
| **Note**: The analysis may take 10-30 minutes depending on network speed and API rate limits, as it downloads geographic data for each city. | ||||||
|
|
||||||
| ### Calculating Indices for Individual Cities | ||||||
|
|
||||||
| ```python | ||||||
| from scripts.calculate_indices import calculate_indices_for_city | ||||||
|
|
||||||
| # Calculate indices for a single city | ||||||
| altitude_idx, distance_idx = calculate_indices_for_city("Amsterdam", "Netherlands") | ||||||
|
|
||||||
| print(f"Altitude Index: {altitude_idx:.3f}") | ||||||
| print(f"Distance Index: {distance_idx:.3f}") | ||||||
| ``` | ||||||
|
|
||||||
| ## Statistical Methods | ||||||
|
|
||||||
| The platform employs multiple statistical approaches: | ||||||
|
|
||||||
| 1. **Pearson Correlation**: Measures linear relationship strength | ||||||
| 2. **Spearman Correlation**: Measures monotonic relationship (rank-based) | ||||||
| 3. **Linear Regression**: Models the relationship and calculates R² score | ||||||
| 4. **Significance Testing**: p-values < 0.05 indicate statistical significance | ||||||
|
|
||||||
| ### Interpretation Criteria | ||||||
|
|
||||||
| - **Strong support**: |r| > 0.5 and p < 0.05 | ||||||
| - **Moderate support**: 0.3 ≤ |r| ≤ 0.5 and p < 0.05 | ||||||
| - **Weak support**: |r| < 0.3 and p < 0.05 | ||||||
| - **Not significant**: p ≥ 0.05 | ||||||
|
|
||||||
| ## Output Files | ||||||
|
|
||||||
| After running the analysis, the following files are generated in `results/`: | ||||||
|
|
||||||
| 1. **cities_with_indices.csv**: Complete dataset with calculated indices | ||||||
| 2. **statistical_results.csv**: Summary of correlation and regression analysis | ||||||
| 3. **hypothesis_testing_results.png**: 4-panel visualization showing: | ||||||
| - Altitude Index vs Score scatter plot | ||||||
| - Distance Index vs Score scatter plot | ||||||
| - Distance from Optimal D_i vs Score | ||||||
| - Correlation heatmap | ||||||
|
|
||||||
| ## Expected Results | ||||||
|
|
||||||
| Based on urban cycling research, we expect: | ||||||
|
|
||||||
| - **Negative correlation** between `A_i` and cycling scores (flatter cities rank higher) | ||||||
| - **Cities with D_i ≈ 1** to have higher scores (better network connectivity) | ||||||
|
|
||||||
| Top-performing cities like Utrecht, Copenhagen, and Amsterdam are expected to have: | ||||||
| - Low `A_i` values (< 2.0, indicating flat terrain) | ||||||
| - `D_i` values close to 1.0 (indicating efficient, direct networks) | ||||||
|
|
||||||
| ## Limitations | ||||||
|
|
||||||
| 1. **Sample Size**: Analysis uses 15 cities for computational efficiency | ||||||
| 2. **API Dependencies**: Requires OpenStreetMap data access | ||||||
| 3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations | ||||||
|
||||||
| 3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations | |
| 3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution |
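Reviewer sketch (not part of the diff): the two formulas in the Methodology section of the platform README can be written out directly, which makes their units and ranges easier to check. This is a minimal illustration under stated assumptions, not the implementation in `scripts/calculate_indices.py`. The function names, the use of absolute elevation differences, and the input validation are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def altitude_index(elevation_changes_m: Sequence[float],
                   edge_lengths_m: Sequence[float]) -> float:
    """A_i = (mean_elevation_change / mean_edge_length) * 100.

    Assumption: elevation changes are taken as absolute values, so uphill
    and downhill segments do not cancel each other out.
    """
    if not elevation_changes_m or not edge_lengths_m:
        raise ValueError("both inputs must be non-empty")
    mean_change = mean(abs(delta) for delta in elevation_changes_m)
    mean_length = mean(edge_lengths_m)
    if mean_length <= 0:
        raise ValueError("mean edge length must be positive")
    return mean_change / mean_length * 100


def distance_index(circuity: float, normalized_node_density: float) -> float:
    """D_i = circuity / (1 + normalized_node_density)."""
    if circuity < 1.0:
        # A network route cannot be shorter than the straight line.
        raise ValueError("circuity must be >= 1.0")
    if not 0.0 <= normalized_node_density <= 1.0:
        raise ValueError("normalized_node_density must lie in [0, 1]")
    return circuity / (1 + normalized_node_density)


if __name__ == "__main__":
    # Made-up inputs: three 100 m segments with 1 m, 2 m and 3 m of change.
    print(altitude_index([1.0, -2.0, 3.0], [100.0, 100.0, 100.0]))
    print(distance_index(circuity=1.05, normalized_node_density=0.05))
```

Under these definitions `A_i` reads as a mean grade in percent. Because circuity is at least 1 and the denominator lies in [1, 2], `D_i` cannot fall below 0.5, which may be worth stating next to the "closer to 1" interpretation.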
Copilot
AI
Dec 28, 2025
This code example references 'copenhagenize_index_2022.csv', which appears to be outdated. The project now uses the 2025 edition ('copenhagenize_index_2025.csv'). Update this example to use the correct filename.
| df = pd.read_csv('../data/copenhagenize_index_2022.csv') | |
| df = pd.read_csv('../data/copenhagenize_index_2025.csv') |
The reference to 'copenhagenize_index_2022.csv' in the documentation appears to be outdated. According to the PR description and other parts of the codebase, the project now uses the 2025 edition. This file is listed as 'Legacy data (deprecated)' but may not actually exist in the repository, which could confuse users.