A comprehensive toolkit for analyzing the quality of open data metadata. Based on the European Data Portal's Metadata Quality Assessment (MQA) methodology and SHACL validation for the DCAT-AP, DCAT-AP-ES and NTI-RISP (2013) profiles.
Try the simplified browser version (no installation needed); more info at mjanez/metadata-quality-react/README.md.
> [!TIP]
> Live Demo: metadata-quality.mjanez.dev/
>
> This edition runs entirely client-side and includes the core MQA and SHACL validator for instant metadata quality checks in your browser.
For complete features including historical tracking and API access:
```bash
git clone https://github.com/your-organization/metadata-quality-stack.git
cd metadata-quality-stack
docker-compose up
```

This tool helps data publishers and consumers evaluate and improve the quality of metadata in open data catalogs. It analyzes metadata against the FAIR+C principles (Findability, Accessibility, Interoperability, Reusability, and Contextuality) and provides detailed reports on quality metrics.
- Static Version - GitHub Pages compatible, no backend required
- Docker Version - Full-featured with database and API
- Quality Assessment: Evaluate metadata according to the MQA methodology
- Multiple Profiles: Support for DCAT-AP, DCAT-AP-ES, and NTI-RISP standards
- Real-time Validation: Instant feedback on metadata quality
- Interactive Visualizations: Radar charts and detailed metrics breakdown
- Multilingual Support: English and Spanish interfaces
- API Integration: REST API for programmatic access (Docker version)
- Historical Tracking: Store and visualize quality evolution over time (Docker version)
- SHACL Validation: Check compliance with official shapes from:
  - DCAT-AP: European data portal standard
  - DCAT-AP-ES: Spanish national profile
  - NTI-RISP: Spanish interoperability standard
- Location: https://metadata-quality.mjanez.dev/
- Technology: HTML, CSS, TypeScript with `N3.js`, `shacl-engine`, `rdfxml-streaming-parser.js` and React
- Deployment: GitHub Pages, any static hosting
- Features: Full MQA and SHACL validation, visualization, no backend required
- Use Case: Quick deployment, demo environments, edge cases
- Technology: FastAPI backend + Streamlit frontend + nginx proxy
- Features: Complete functionality + database + API + historical tracking
- Use Case: Production environments, enterprise deployment
The project consists of these main components:
- API: FastAPI-based backend that validates metadata and generates reports
- Frontend: Streamlit-based web interface for visualizing reports
- Static Version: Client-side implementation for easy deployment
Use the hosted static version at https://metadata-quality.mjanez.dev/
Features: Full metadata validation, SHACL compliance checking with official shapes, no backend required.
For complete functionality with database and API:
- Clone the repository:

  ```bash
  git clone https://github.com/your-organization/metadata-quality-stack.git
  cd metadata-quality-stack
  ```

- Start the services using Docker Compose:

  ```bash
  docker-compose up
  ```
To run the API and frontend locally without Docker:

- Clone the repository:

  ```bash
  git clone https://github.com/your-organization/metadata-quality-stack.git
  cd metadata-quality-stack
  ```

- Install dependencies:

  ```bash
  pip install -e .
  ```

- Start the API:

  ```bash
  uvicorn src.api.main:app --host 0.0.0.0 --port 8000
  ```

- Start the frontend (in a separate terminal):

  ```bash
  streamlit run src/frontend/app.py
  ```
The API provides the following endpoints:
- Base API: http://localhost:80/
- Swagger UI Documentation. Interactive interface to test the API: http://localhost:80/docs
- ReDoc Documentation. Detailed documentation in a more readable format: http://localhost:80/redoc
- Endpoints:
  - `POST /validate`: Validate metadata from a URL
  - `POST /validate-content`: Validate metadata provided directly as content
  - `GET /report/{url}`: Get the most recent report for a URL
  - `GET /history/{url}`: Get report history for a URL
  - `GET /reports/by-date`: Fetch reports within a specified date range
  - `GET /reports/by-rating/{rating}`: Get reports with a specific quality rating
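For programmatic use, a validation request might look like the minimal sketch below. It assumes the `requests` library and a JSON payload with `url` and `profile` fields; these field names are illustrative assumptions, so check the Swagger UI at http://localhost:80/docs for the authoritative request and response schema.

```python
import requests

# Hypothetical payload: the "url" and "profile" field names are assumptions
# for illustration; see the Swagger UI (/docs) for the actual request schema.
payload = {
    "url": "https://example.org/catalog.rdf",  # catalog to evaluate (example value)
    "profile": "dcat_ap_es",                   # profile identifier (assumed value)
}

response = requests.post("http://localhost:80/validate", json=payload, timeout=120)
response.raise_for_status()

# The report structure is not shown here; print it to inspect the real fields.
print(response.json())
```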
- Web Interface: http://localhost:8501/
- Main sections:
  - Validation Options:
    - Enter a URL to a catalog (`RDF/XML`, `TTL`, `JSON-LD` and `N3` formats)
    - Paste RDF content directly for validation
    - Select different compliance profiles (DCAT-AP, DCAT-AP-ES, NTI-RISP)
  - Visualization Features:
    - Hierarchical chart showing dimension and metric relationships
    - Radar chart displaying performance across FAIR+C dimensions
    - Detailed metrics breakdown with counts and percentages
  - Report Management:
    - View historical reports and track quality evolution over time
    - Export reports in both JSON and JSON-LD (DQV vocabulary) formats
    - Score evolution charts for long-term quality tracking
  - Analytics Dashboard:
    - Overview statistics of catalogs evaluated
    - Distribution of quality ratings
    - Comparison of dimension averages
    - Top and bottom performing catalogs
    - Dimension correlation analysis
  - Multilingual Support:
    - Toggle between English and Spanish interfaces
    - Localized metric descriptions and labels
For development, we recommend using VS Code with the Dev Container configuration provided:
- Install the VS Code Remote - Containers extension
- Open the project in VS Code
- Click on "Reopen in Container" when prompted
- Wait for the container to build and configure
After updating the translation file (mqa.po), don't forget to compile it to generate the .mo file, e.g. for Spanish:
```bash
cd metadata-quality-stack
# Extract i18n texts and update POT of apps (e.g. app.py)
xgettext -d mqa --from-code=UTF-8 -o locales/mqa.pot src/frontend/app.py
# Compile MO files (Spanish)
msgfmt -o locale/es/LC_MESSAGES/mqa.mo locale/es/LC_MESSAGES/mqa.po
```

The system is designed to be modular, allowing you to easily extend or customize metrics for specific profiles (DCAT-AP, DCAT-AP-ES, NTI-RISP, etc.). Follow these steps to extend or create metrics for a profile:
Each metric is defined as a dictionary with ID, dimension, and weight. To add metrics for a new or existing profile:
```python
# Define specific metrics for your profile
MY_PROFILE_SPECIFIC_METRICS = [
    {"id": "my_new_metric", "dimension": "interoperability", "weight": 20},
    {"id": "another_metric", "dimension": "reusability", "weight": 15}
]

# Add your metrics to the METRICS_BY_PROFILE dictionary
METRICS_BY_PROFILE["my_profile"] = COMMON_METRICS + MY_PROFILE_SPECIFIC_METRICS
```

For each new metric, create a checker that implements the validation logic:
```python
# Create a checker class if existing ones don't fit your needs
class MyCustomChecker(MetricChecker):
    def __init__(self, property_uri: URIRef):
        self.property_uri = property_uri

    def check(self, g: Graph, resources: List[URIRef], context: Dict[str, Any] = None) -> Tuple[int, int]:
        # Implement your checking logic here and return (successful count, total count);
        # this simple example counts resources that declare the property
        total = len(resources)
        count = sum(1 for r in resources if (r, self.property_uri, None) in g)
        return (count, total)

# Add your checker to the CHECKER_DEFINITIONS dictionary
CHECKER_DEFINITIONS.update({
    "my_new_metric": lambda: MyCustomChecker(MY_PROPERTY_URI),
    "another_metric": lambda: ExistingCheckerClass(MY_OTHER_PROPERTY)
})
```

If you're adding metrics to a new dimension, ensure the dimension is registered in the DimensionType enum in models.py and update the calculate_dimension_scores function in validators.py to include your new dimension.
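As a rough sketch of that step, extending the enum could look like the snippet below; the existing member names and values are assumptions based on the FAIR+C dimensions described earlier, not a copy of models.py.

```python
from enum import Enum

class DimensionType(str, Enum):
    # Existing FAIR+C dimensions (member names/values assumed for illustration)
    FINDABILITY = "findability"
    ACCESSIBILITY = "accessibility"
    INTEROPERABILITY = "interoperability"
    REUSABILITY = "reusability"
    CONTEXTUALITY = "contextuality"
    # New dimension referenced by your custom metrics
    MY_DIMENSION = "my_dimension"
```

calculate_dimension_scores in validators.py would then need to aggregate the metrics whose "dimension" value is "my_dimension" so the new dimension appears in reports.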
To make your profile selectable in the UI, update the PROFILES dictionary in frontend/config.py:
```python
PROFILES = {
    "dcat_ap": "DCAT-AP 2.0",
    "dcat_ap_es": "DCAT-AP-ES 2.0",
    "nti_risp": "NTI-RISP",
    "my_profile": "My Custom Profile"
}
```

Here's an example of extending NTI-RISP with a label-based format checker:
- Create the specialized checker class:

  ```python
  class VocabularyLabelComplianceChecker(MetricChecker):
      """Check if property labels comply with a CSV-based vocabulary."""

      def __init__(self, property_uris: List[URIRef], csv_path: str,
                   compare_column: str = None, label_property: URIRef = RDFS.label):
          self.property_uris = property_uris
          self.csv_path = csv_path
          self.compare_column = compare_column
          self.label_property = label_property
          # Initialize allowed values from CSV file
          # ...

      def check(self, g: Graph, resources: List[URIRef], context: Dict[str, Any] = None) -> Tuple[int, int]:
          # Check values against the allowed values, considering labels
          # ...
          ...
  ```

- Add to CHECKER_DEFINITIONS:

  ```python
  CHECKER_DEFINITIONS.update({
      "dct_format_nonproprietary_nti": lambda: VocabularyLabelComplianceChecker(
          [DCTERMS.format], MQA_VOCABS['non_proprietary']
      )
  })
  ```

- Add the metric to NTI_RISP_SPECIFIC_METRICS:

  ```python
  NTI_RISP_SPECIFIC_METRICS.append(
      {"id": "dct_format_nonproprietary_nti", "dimension": "interoperability", "weight": 25}
  )
  ```

The API now uses remote SHACL files directly from official repositories, ensuring you always have the latest validation rules:
- DCAT-AP: Files loaded from SEMICeu/DCAT-AP GitHub repository
- DCAT-AP-ES: Files loaded from datosgobes/DCAT-AP-ES GitHub repository
- NTI-RISP: Files loaded from datosgobes/NTI-RISP GitHub repository
- ✅ Always up-to-date: Latest SHACL shapes automatically available
- ✅ No local maintenance: No need to manually update SHACL files
- ✅ Reduced repository size: No local SHACL files stored
- ✅ Official sources: Direct from standards organizations
The SHACL URLs are configured in src/api/config.py:
```python
# DCAT-AP SHACL files by level - using remote URLs
DCAT_AP_SHACL_FILES = {
    SHACLLevel.LEVEL_1: [
        "https://raw.githubusercontent.com/SEMICeu/DCAT-AP/refs/heads/master/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes.ttl",
        # ... more URLs
    ]
}
```

> [!NOTE]
> The API includes fallback mechanisms in case remote URLs are temporarily unavailable.
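As an illustration of what such a fallback can look like (the function name, local cache path and behaviour below are assumptions for the sketch, not the actual implementation in src/api/config.py):

```python
from pathlib import Path
import requests

def load_shacl_shapes(remote_url: str, local_fallback: Path) -> str:
    """Fetch the official remote SHACL shapes, falling back to a cached local copy."""
    try:
        response = requests.get(remote_url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        # Remote repository unreachable: reuse the locally cached shapes instead
        return local_fallback.read_text(encoding="utf-8")
```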
To update the local SSL certificate, follow these steps:
- Generate a new certificate and private key:

  ```bash
  openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout nginx/setup/metadata-quality-stack.key \
    -out nginx/setup/metadata-quality-stack.crt \
    -subj "/C=ES/ST=Madrid/L=Madrid/O=Development/CN=localhost"
  ```

- Verify that the files have been created correctly:

  ```bash
  ls -l nginx/setup/metadata-quality-stack.*
  ```

- Restart the nginx container to apply the changes:

  ```bash
  docker compose restart nginx
  ```

> [!CAUTION]
> This certificate is for local development only. In production, use a valid certificate from a certificate authority.
See the LICENSE file for license rights and limitations (MIT).