Clean, structured datasets from Sri Lankan government sources
5 Years of Data | 4 Key Ministries | Multiple Departments
- Foreign Affairs & Relations
- Immigration & Emigration
- Foreign Employment
- Tourism Development
- Foreign Affairs: Diplomatic missions, communications, organizational data
- Immigration: Asylum seekers, visas, passports, refugee statistics
- Employment: Worker complaints, remittances, registration data, legal performance
- Tourism: Arrivals, accommodations, occupancy rates, revenue statistics
Note
Action Required: View the Missing Datasets Report to see which datasets need to be populated.
| Data Source | Dataset Category | Years Available | Collection Status | Verification Status |
|---|---|---|---|---|
| Ministry of Foreign Affairs | Diplomatic Missions | 2019-2023 | Collected | |
| Ministry of Foreign Affairs | Official Communications | 2019-2023 | Collected | |
| Department of Immigration and Emigration | Asylum Seekers & Refugees | 2019-2023 | Collected | |
| Department of Immigration and Emigration | Visas & Passports | 2019-2023 | Collected | |
| Sri Lanka Bureau of Foreign Employment | Worker Complaints | 2019-2023 | Collected | |
| Sri Lanka Bureau of Foreign Employment | Remittances & Earnings | 2019-2023 | Collected | |
| Sri Lanka Bureau of Foreign Employment | Registrations (SLBFE) | 2019-2023 | Collected | |
| Sri Lanka Tourism Development Authority | Tourist Arrivals | 2019-2024 | Collected | Verified (2024 partial) |
| Sri Lanka Tourism Development Authority | Accommodations & Occupancy | 2019-2024 | Collected | Verified (2024 partial) |
| Sri Lanka Tourism Development Authority | Revenue Statistics | 2019-2024 | Collected | Verified (2024 partial) |
- 2019
- 2020-2021
- 2022-2023
- 2024
Browse all data interactively
View online at GitHub Pages
All datasets are in clean JSON format with metadata.
This repository contains cleaned and organized datasets from various Sri Lankan government public sources, compiled by the Lanka Data Foundation. The data spans from 2019 to 2024 and covers multiple ministries and departments.
To run the data ingestion and utility scripts, you'll need to set up the Python environment. We recommend using Mamba (or Conda).
- Create the environment:
    mamba env create -f environment.yml
  (If using Conda: conda env create -f environment.yml)
- Activate the environment:
    mamba activate datasets_env
- Run the scripts:
    # Run the optimized ingestion script
    python insert.py
    # Run the attribute writer (optional year filter)
    python write_attributes.py --year 2023
- Total Years: 6 (2019-2024)
- Total Datasets: 175+ JSON files
- Ministries Covered: 4 main categories
- Data Sources: Public government sources
datasets/
├── data/                       # Main data directory
│   ├── 2019/                   # Year-based organization
│   ├── 2020/
│   ├── 2021/
│   ├── 2022/
│   └── 2023/
├── generate_static_html.py     # HTML generator script
├── index.html                  # Generated static HTML
├── styles.css                  # CSS stylesheet
└── README.md                   # This file
Data is organized hierarchically:
- Year → Government → President → Ministry → Department → Data Files
Each dataset contains:
- data.json - The main dataset
- metadata.json - Metadata about the dataset (optional)
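As a rough illustration of how this layout can be consumed, the sketch below walks a data directory and loads each `data.json` together with its `metadata.json` when one exists. The helper name `iter_datasets` is hypothetical and not part of this repository; the only assumption taken from the README is that every dataset folder holds a `data.json` and, optionally, a `metadata.json`.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Optional, Tuple


def iter_datasets(root: Path) -> Iterator[Tuple[Path, Any, Optional[dict]]]:
    """Yield (folder, data, metadata) for every folder that holds a data.json.

    Hypothetical helper for illustration; not one of the repository's scripts.
    """
    for data_file in sorted(root.rglob("data.json")):
        folder = data_file.parent
        data = json.loads(data_file.read_text(encoding="utf-8"))
        meta_file = folder / "metadata.json"
        # metadata.json is optional, so fall back to None when it is absent.
        metadata = (
            json.loads(meta_file.read_text(encoding="utf-8"))
            if meta_file.is_file()
            else None
        )
        yield folder, data, metadata


if __name__ == "__main__":
    for folder, data, metadata in iter_datasets(Path("data")):
        label = "with metadata" if metadata is not None else "no metadata"
        print(f"{folder} ({label})")
```

Run from the repository root, this would print one line per dataset folder found under `data/`.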
- Create a new folder under data/ (e.g., data/2024/)
- Follow the existing folder structure:
data/2024/
└── Government of Sri Lanka(government)/
    └── [President Name](citizen)/
        └── [Ministry Name](minister)/
            └── [Department Name](department)/
                └── [category]/
                    ├── data.json
                    └── metadata.json (optional)
- Navigate to the appropriate year folder in data/
- Follow the existing hierarchy to find the correct ministry/department
- Add your data.json and optional metadata.json files
- data.json: Must contain valid JSON data
- metadata.json: Optional, should contain dataset metadata (description, source, etc.)
- Files must be placed in appropriately named folders with category indicators
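Before committing a new dataset, the requirements above can be checked locally. The following is a minimal sketch, not a script shipped with this repository: `check_dataset_folder` is a hypothetical name, and it only verifies that `data.json` exists and that both files parse as JSON. It does not check metadata fields or folder naming.

```python
import json
from pathlib import Path
from typing import List


def check_dataset_folder(folder: Path) -> List[str]:
    """Return a list of problems found in one dataset folder (empty if none).

    Hypothetical helper for illustration; not one of the repository's scripts.
    """
    problems: List[str] = []
    if not (folder / "data.json").is_file():
        problems.append("missing data.json")
    for name in ("data.json", "metadata.json"):
        path = folder / name
        if not path.is_file():
            # metadata.json is optional; a missing data.json is reported above.
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append(f"{name} is not valid JSON: {exc}")
    return problems
```

An empty list means the folder passed these basic checks.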
The API documentation website is built with Jekyll on GitHub Pages. The data listing is auto-generated and injected into docs/index.md.
To update the data listing:
- Run the update script:
    python3 update_dataset_index.py
- This will:
  - Scan the data/ directory.
  - Generate ZIP files for each year.
  - Inject the file listing into docs/index.md.
- Commit and push changes to the main branch.
- Automatically created for each year folder
- Contains all JSON files from that year
- Named as [YEAR]_Data.zip (e.g., 2019_Data.zip)
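To make the ZIP behaviour concrete, here is a small sketch of how one yearly archive could be produced with the standard library. It is an assumption about the general approach, not the actual code in `update_dataset_index.py`; the function name `zip_year` and the choice to store paths relative to the data root are illustrative.

```python
import zipfile
from pathlib import Path


def zip_year(data_root: Path, year: str, out_dir: Path) -> Path:
    """Bundle every .json file under data_root/year into out_dir/[YEAR]_Data.zip.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    year_dir = data_root / year
    archive = out_dir / f"{year}_Data.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(year_dir.rglob("*.json")):
            # Store paths relative to data_root so the archive keeps the hierarchy.
            zf.write(path, arcname=path.relative_to(data_root).as_posix())
    return archive
```

For example, `zip_year(Path("data"), "2019", Path("."))` would write `2019_Data.zip` into the current directory.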
- Interactive collapsible sections
- Download buttons for yearly ZIP files
- In-browser JSON viewer with copy/download functionality
- Responsive design with CSS styling
- Use (government), (citizen), (minister), (department) suffixes for proper categorization
- Use (AS_CATEGORY) for sub-categories
- Underscores in folder names will be converted to spaces in display
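The naming conventions above can be illustrated with a short parser that splits a folder name into a display name and its type suffix. This is a sketch under assumptions: `parse_folder_name` is a hypothetical function, the generator script may handle names differently, and the folder names in the usage lines are made-up examples.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "(suffix)" such as "(minister)" or "(AS_CATEGORY)".
_SUFFIX = re.compile(r"^(?P<name>.*?)\s*\((?P<kind>[^()]+)\)$")


def parse_folder_name(folder: str) -> Tuple[str, Optional[str]]:
    """Split 'Name(kind)' into (display name, kind); kind is None without a suffix.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    match = _SUFFIX.match(folder)
    if match:
        name, kind = match.group("name"), match.group("kind")
    else:
        name, kind = folder, None
    # Underscores are shown as spaces in the display name.
    return name.replace("_", " ").strip(), kind


if __name__ == "__main__":
    print(parse_folder_name("Government of Sri Lanka(government)"))
    print(parse_folder_name("Tourist_Arrivals(AS_CATEGORY)"))
    print(parse_folder_name("annual_reports"))
```

The suffix is returned unchanged so a caller can decide how to map it to an icon or section type.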
Edit the get_emoji_for_type() function in generate_static_html.py:
emoji_map = {
    'your_category': '🎯',
    # ... existing mappings
}
Edit styles.css to customize the appearance:
- Colors, fonts, spacing
- Responsive breakpoints
- Modal styling for JSON viewer
The script automatically counts datasets, but you can manually update the description in the main() function.
The generated index.html is ready for deployment on:
- GitHub Pages
- Any static hosting service
- Local web servers
For any enquiries please contact: contact@datafoundation.lk
Codebase at: https://github.com/LDFLK/datasets
See LICENSE file for details.