🇱🇰 Sri Lanka Government Statistics Datasets (2019–2024)

Clean, structured datasets from Sri Lankan government sources

📊 What's Inside

6 Years of Data | 4 Key Ministries | Multiple Departments

Foreign Affairs & Relations
Immigration & Emigration
Foreign Employment
Tourism Development

🗂️ Data Categories

🏛️ Foreign Affairs: Diplomatic missions, communications, organizational data
🛂 Immigration: Asylum seekers, visas, passports, refugee statistics
💼 Employment: Worker complaints, remittances, registration data, legal performance
🏖️ Tourism: Arrivals, accommodations, occupancy rates, revenue statistics

📋 Data Matrix

Note

🚨 Action Required: View the Missing Datasets Report to see which datasets need to be populated.

Data Source	Dataset Category	Years Available	Collection Status	Verification Status
Ministry of Foreign Affairs	Diplomatic Missions	2019-2023	✅ Collected	⚠️ Pending (2024)
Ministry of Foreign Affairs	Official Communications	2019-2023	✅ Collected	⚠️ Pending (2024)
Department of Immigration and Emigration	Asylum Seekers & Refugees	2019-2023	✅ Collected	⚠️ Pending (2024)
Department of Immigration and Emigration	Visas & Passports	2019-2023	✅ Collected	⚠️ Pending (2024)
Sri Lanka Bureau of Foreign Employment	Worker Complaints	2019-2023	✅ Collected	⚠️ Pending (2024)
Sri Lanka Bureau of Foreign Employment	Remittances & Earnings	2019-2023	✅ Collected	⚠️ Pending (2024)
Sri Lanka Bureau of Foreign Employment	Registrations (SLBFE)	2019-2023	✅ Collected	⚠️ Pending (2024)
Sri Lanka Tourism Development Authority	Tourist Arrivals	2019-2024	✅ Collected	✅ Verified (2024 Partial)
Sri Lanka Tourism Development Authority	Accommodations & Occupancy	2019-2024	✅ Collected	✅ Verified (2024 Partial)
Sri Lanka Tourism Development Authority	Revenue Statistics	2019-2024	✅ Collected	✅ Verified (2024 Partial)

📅 Years Available

2019
2020-2021
2022-2023
2024

🚀 Quick Start

📖 Browse all data interactively →

🌐 View online at GitHub Pages →

All datasets are in clean JSON format with metadata.

This repository contains cleaned and organized datasets from various Sri Lankan government public sources, compiled by the Lanka Data Foundation. The data spans from 2019 to 2024 and covers multiple ministries and departments.

🛠️ Installation & Setup

To run the data ingestion and utility scripts, you'll need to set up the Python environment. We recommend using Mamba (or Conda).

Create the environment:
```
mamba env create -f environment.yml
```
(If using Conda: conda env create -f environment.yml)
Activate the environment:
```
mamba activate datasets_env
```

Run the scripts:

# Run the optimized ingestion script
python insert.py

# Run the attribute writer (optional year filter)
python write_attributes.py --year 2023

📊 Dataset Overview

Total Years: 6 (2019-2024)
Total Datasets: 175+ JSON files
Ministries Covered: 4 main categories
Data Sources: Public government sources

🏗️ Repository Structure

datasets/
├── data/                           # Main data directory
│   ├── 2019/                      # Year-based organization
│   ├── 2020/
│   ├── 2021/
│   ├── 2022/
│   └── 2023/
├── generate_static_html.py         # HTML generator script
├── index.html                      # Generated static HTML
├── styles.css                      # CSS stylesheet
└── README.md                       # This file

📁 Data Organization

Data is organized hierarchically:

Year → Government → President → Ministry → Department → Data Files

Data File Structure

Each dataset contains:

data.json - The main dataset
metadata.json - Metadata about the dataset (optional)
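
As an illustration of the layout described above, the following is a minimal sketch of how the datasets could be read from Python. The helper name `iter_datasets` is hypothetical and not part of this repository; the sketch assumes only that each dataset folder holds a `data.json` and, optionally, a `metadata.json`.

```python
import json
from pathlib import Path


def iter_datasets(root):
    """Yield (folder, data, metadata) for every data.json found under root.

    Hypothetical helper for illustration; not one of the repository scripts.
    """
    for data_file in sorted(Path(root).rglob("data.json")):
        with data_file.open(encoding="utf-8") as fh:
            data = json.load(fh)

        # metadata.json is optional, so fall back to None when it is absent.
        meta_file = data_file.with_name("metadata.json")
        metadata = None
        if meta_file.is_file():
            with meta_file.open(encoding="utf-8") as fh:
                metadata = json.load(fh)

        yield data_file.parent, data, metadata


# Example usage (assumes the script runs from the repository root):
# for folder, data, metadata in iter_datasets("data/2023"):
#     print(folder, "has metadata" if metadata is not None else "no metadata")
```

Because the search is recursive, the sketch does not depend on how many levels of ministry, department, or category folders sit between the year folder and the data files.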

🔄 How to Update Data and Regenerate HTML

1. Adding New Data

Adding Data for a New Year

Create a new folder under data/ (e.g., data/2024/)

Follow the existing folder structure:

data/2024/
└── Government of Sri Lanka(government)/
    └── [President Name](citizen)/
        └── [Ministry Name](minister)/
            └── [Department Name](department)/
                ├── [category]/
                │   ├── data.json
                │   └── metadata.json (optional)

Adding Data to Existing Year

Navigate to the appropriate year folder in data/
Follow the existing hierarchy to find the correct ministry/department
Add your data.json and optional metadata.json files

Data File Requirements

data.json: Must contain valid JSON data
metadata.json: Optional, should contain dataset metadata (description, source, etc.)
Files must be placed in folders named with the appropriate category suffix, such as (department); see Folder Structure Guidelines
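
Before committing new files, it can help to confirm that every JSON file parses. The following is a minimal sketch of such a check; `find_invalid_json` is a hypothetical helper written for this example, and it only tests that files are parseable, not that they follow any particular schema.

```python
import json
from pathlib import Path


def find_invalid_json(year_dir):
    """Return a list of (path, error message) for JSON files that fail to parse.

    Hypothetical helper for illustration; it checks syntax only.
    """
    problems = []
    for path in sorted(Path(year_dir).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # Record the failure and keep scanning the remaining files.
            problems.append((path, str(exc)))
    return problems


# Example usage (assumes a data/2024 folder exists):
# for path, error in find_invalid_json("data/2024"):
#     print(f"{path}: {error}")
```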

2. Update the Website (Optional)

The API documentation website is built with Jekyll on GitHub Pages. The data listing is auto-generated and injected into docs/index.md.

To update the data listing:

Run the update script:
```
python3 update_dataset_index.py
```
This will:
- Scan the data/ directory.
- Generate ZIP files for each year.
- Inject the file listing into docs/index.md.
Commit and push the changes to the main branch.

3. What Gets Generated

ZIP Files

Automatically created for each year folder
Contains all JSON files from that year
Named as [YEAR]_Data.zip (e.g., 2019_Data.zip)
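
The packaging step described above can be approximated with the standard library. This is a sketch under stated assumptions, not the code in update_dataset_index.py: the function name `zip_year`, the output directory, and the choice to keep paths relative to the data root inside the archive are all illustrative.

```python
import zipfile
from pathlib import Path


def zip_year(data_root, year, out_dir):
    """Bundle every JSON file under data_root/<year> into <year>_Data.zip.

    Hypothetical helper for illustration; returns the path of the archive.
    """
    data_root = Path(data_root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    zip_path = out_dir / f"{year}_Data.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for json_file in sorted((data_root / str(year)).rglob("*.json")):
            # Store paths relative to data_root so the archive keeps the year folder.
            arcname = json_file.relative_to(data_root).as_posix()
            archive.write(json_file, arcname=arcname)
    return zip_path


# Example usage (assumes a data/2019 folder exists):
# zip_year("data", 2019, "downloads")
```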

HTML Features

Interactive collapsible sections
Download buttons for yearly ZIP files
In-browser JSON viewer with copy/download functionality
Responsive design with CSS styling

4. Folder Structure Guidelines

Special Naming Conventions

Use (government), (citizen), (minister), (department) suffixes for proper categorization
Use (AS_CATEGORY) for sub-categories
Underscores in folder names will be converted to spaces in display
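
To make these conventions concrete, here is a minimal sketch that splits a folder name into a display name and its parenthesised suffix. `parse_folder_name` is a hypothetical helper, and treating (AS_CATEGORY) as just another trailing suffix is an assumption made for this example; the generator script may handle it differently.

```python
import re

# Matches a trailing "(suffix)" such as "(government)" or "(department)".
_SUFFIX = re.compile(r"^(?P<name>.*?)\s*\((?P<kind>[^()]+)\)$")


def parse_folder_name(folder):
    """Return (display_name, kind); kind is None when there is no suffix.

    Hypothetical helper for illustration only.
    """
    match = _SUFFIX.match(folder)
    if match:
        name, kind = match.group("name"), match.group("kind")
    else:
        name, kind = folder, None
    # Underscores in folder names are shown as spaces.
    return name.replace("_", " ").strip(), kind


# Example usage:
# parse_folder_name("Government of Sri Lanka(government)")
# parse_folder_name("tourist_arrivals")
```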

5. Customization

Adding New Emojis

Edit the get_emoji_for_type() function in generate_static_html.py:

emoji_map = {
    'your_category': '🎯',
    # ... existing mappings
}

Modifying CSS

Edit styles.css to customize the appearance:

Colors, fonts, spacing
Responsive breakpoints
Modal styling for JSON viewer

Updating Statistics

The script automatically counts datasets, but you can manually update the description in the main() function.

🚀 Deployment

The generated index.html is ready for deployment on:

GitHub Pages
Any static hosting service
Local web servers

📞 Contact

For any enquiries please contact: contact@datafoundation.lk

Codebase at: https://github.com/LDFLK/datasets

📄 License

See LICENSE file for details.
