Metadata Quality Stack is a comprehensive toolkit for analyzing metadata quality. It implements the European Data Portal's MQA methodology and ships as a Docker Compose deployment and a React web application.


Metadata Quality Stack


A comprehensive toolkit for analyzing the quality of open data metadata. Based on the European Data Portal's Metadata Quality Assessment (MQA) methodology and SHACL validation for the DCAT-AP, DCAT-AP-ES and NTI-RISP (2013) profiles.

Quick Start

Try the simplified React browser version (no installation needed); more info at mjanez/metadata-quality-react/README.md.

Tip

Live Demo: metadata-quality.mjanez.dev/
This edition runs entirely client-side and includes the core MQA and SHACL validator for instant metadata quality checks in your browser.

Full Docker deployment (recommended for more complex validations)

For complete features including historical tracking and API access:

git clone https://github.com/your-organization/metadata-quality-stack.git
cd metadata-quality-stack
docker-compose up

Overview

This tool helps data publishers and consumers evaluate and improve the quality of metadata in open data catalogs. It analyzes metadata against the FAIR+C principles (Findability, Accessibility, Interoperability, Reusability, and Contextuality) and provides detailed reports on quality metrics.

Features

Two Deployment Options

  1. Static Version - GitHub Pages compatible, no backend required
  2. Docker Version - Full-featured with database and API

Core Capabilities

  • Quality Assessment: Evaluate metadata according to the MQA methodology
  • Multiple Profiles: Support for DCAT-AP, DCAT-AP-ES, and NTI-RISP standards
  • Real-time Validation: Instant feedback on metadata quality
  • Interactive Visualizations: Radar charts and detailed metrics breakdown
  • Multilingual Support: English and Spanish interfaces
  • API Integration: REST API for programmatic access (Docker version)
  • Historical Tracking: Store and visualize quality evolution over time (Docker version)
  • SHACL Validation: Check compliance with official shapes fetched from the standards' official repositories

Static version

(Screenshots of the static React web interface)

Docker version

(Screenshots of the Docker web interface)

Architecture

Deployment Options

1. Static Version (Client-Side Only)

  • Technology: React application running entirely in the browser
  • Features: MQA scoring and SHACL validation with official shapes, no backend required
  • Use Case: GitHub Pages hosting, quick quality checks

2. Docker Version (Full Stack)

  • Technology: FastAPI backend + Streamlit frontend + nginx proxy
  • Features: Complete functionality + database + API + historical tracking
  • Use Case: Production environments, enterprise deployment

The project consists of these main components:

  1. API: FastAPI-based backend that validates metadata and generates reports
  2. Frontend: Streamlit-based web interface for visualizing reports
  3. Static Version: Client-side implementation for easy deployment

Installation

Static Version (Quick Start)

Use the hosted static version at https://metadata-quality.mjanez.dev/

Features: Full metadata validation, SHACL compliance checking with official shapes, no backend required.

Docker Version (Production)

For complete functionality with database and API:

  1. Clone the repository:

    git clone https://github.com/your-organization/metadata-quality-stack.git
    cd metadata-quality-stack
  2. Start the services using Docker Compose:

    docker-compose up

Manual Installation

  1. Clone the repository:

    git clone https://github.com/your-organization/metadata-quality-stack.git
    cd metadata-quality-stack
  2. Install dependencies:

    pip install -e .
  3. Start the API:

    uvicorn src.api.main:app --host 0.0.0.0 --port 8000
  4. Start the frontend (in a separate terminal):

    streamlit run src/frontend/app.py

Usage

Backend (API)

The API provides the following endpoints:

  • Base API: http://localhost:80/
  • Swagger UI: interactive documentation for testing the API: http://localhost:80/docs
  • ReDoc: detailed documentation in a more readable format: http://localhost:80/redoc
  • Endpoints:
    • POST /validate: Validate metadata from a URL
    • POST /validate-content: Validate metadata provided directly as content
    • GET /report/{url}: Get the most recent report for a URL
    • GET /history/{url}: Get report history for a URL
    • GET /reports/by-date: Fetch reports within a specified date range
    • GET /reports/by-rating/{rating}: Get reports with a specific quality rating
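As a sketch, the validation endpoints can be called from Python's standard library. The payload field names (content, profile) used here are assumptions; check the Swagger UI at /docs for the actual request schema:

```python
import json
from urllib import request

API_BASE = "http://localhost:80"

def build_validation_request(content: str, profile: str = "dcat_ap") -> request.Request:
    """Build a POST /validate-content request.

    The payload field names ("content", "profile") are assumptions;
    verify them against the Swagger UI at /docs.
    """
    payload = json.dumps({"content": content, "profile": profile}).encode("utf-8")
    return request.Request(
        f"{API_BASE}/validate-content",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_validation_request("@prefix dcat: <http://www.w3.org/ns/dcat#> .", "dcat_ap_es")
# Send the request once the Docker stack is running:
# with request.urlopen(req) as resp:
#     report = json.load(resp)
```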

Frontend (Web Interface)

  • Web Interface: http://localhost:8501/
  • Main sections:
    1. Validation Options:

      • Enter a URL to a catalog (RDF/XML, TTL, JSON-LD and N3 formats)
      • Paste RDF content directly for validation
      • Select different compliance profiles (DCAT-AP, DCAT-AP-ES, NTI-RISP)
    2. Visualization Features:

      • Hierarchical chart showing dimension and metric relationships
      • Radar chart displaying performance across FAIR+C dimensions
      • Detailed metrics breakdown with counts and percentages
    3. Report Management:

      • View historical reports and track quality evolution over time
      • Export reports in both JSON and JSON-LD (DQV vocabulary) formats
      • Score evolution charts for long-term quality tracking
    4. Analytics Dashboard:

      • Overview statistics of catalogs evaluated
      • Distribution of quality ratings
      • Comparison of dimension averages
      • Top and bottom performing catalogs
      • Dimension correlation analysis
    5. Multilingual Support:

      • Toggle between English and Spanish interfaces
      • Localized metric descriptions and labels

Development

For development, we recommend using VS Code with the Dev Container configuration provided:

  1. Install the VS Code Remote - Containers extension
  2. Open the project in VS Code
  3. Click on "Reopen in Container" when prompted
  4. Wait for the container to build and configure

Translation

After updating the translation file (mqa.po), don't forget to compile it to generate the .mo file, e.g. for Spanish:

cd metadata-quality-stack

# Extract i18n texts and update POT of apps (e.g. app.py)
xgettext -d mqa --from-code=UTF-8 -o locales/mqa.pot src/frontend/app.py

# Compile MO files (Spanish)
msgfmt -o locale/es/LC_MESSAGES/mqa.mo locale/es/LC_MESSAGES/mqa.po

Extending Profile Metrics

The system is designed to be modular, allowing you to easily extend or customize metrics for specific profiles (DCAT-AP, DCAT-AP-ES, NTI-RISP, etc.). Follow these steps to extend or create metrics for a profile:

1. Define Your Metrics in config.py

Each metric is defined as a dictionary with ID, dimension, and weight. To add metrics for a new or existing profile:

# Define specific metrics for your profile
MY_PROFILE_SPECIFIC_METRICS = [
    {"id": "my_new_metric", "dimension": "interoperability", "weight": 20},
    {"id": "another_metric", "dimension": "reusability", "weight": 15}
]

# Add your metrics to the METRICS_BY_PROFILE dictionary
METRICS_BY_PROFILE["my_profile"] = COMMON_METRICS + MY_PROFILE_SPECIFIC_METRICS

2. Create Checkers for Your Metrics in validators.py

For each new metric, create a checker that implements the validation logic:

# Create a checker class if existing ones don't fit your needs
class MyCustomChecker(MetricChecker):
    def __init__(self, property_uri: URIRef):
        self.property_uri = property_uri
    
    def check(self, g: Graph, resources: List[URIRef], context: Dict[str, Any] = None) -> Tuple[int, int]:
        # Implement your checking logic and return a tuple of
        # (successful count, total count), for example:
        count = sum(1 for r in resources if (r, self.property_uri, None) in g)
        return (count, len(resources))

# Add your checker to the CHECKER_DEFINITIONS dictionary
CHECKER_DEFINITIONS.update({
    "my_new_metric": lambda: MyCustomChecker(MY_PROPERTY_URI),
    "another_metric": lambda: ExistingCheckerClass(MY_OTHER_PROPERTY)
})

3. Update Dimension Scores (If Needed)

If you're adding metrics to a new dimension, ensure the dimension is registered in the DimensionType enum in models.py and update the calculate_dimension_scores function in validators.py to include your new dimension.
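The wiring above can be sketched as follows. The field names in metric_results and the exact aggregation are assumptions for illustration; the real logic lives in models.py and validators.py:

```python
from enum import Enum
from typing import Dict, List

class DimensionType(str, Enum):
    # Existing FAIR+C dimensions (per the Overview) plus a hypothetical new one
    FINDABILITY = "findability"
    ACCESSIBILITY = "accessibility"
    INTEROPERABILITY = "interoperability"
    REUSABILITY = "reusability"
    CONTEXTUALITY = "contextuality"
    MY_DIMENSION = "my_dimension"  # hypothetical new dimension

def calculate_dimension_scores(metric_results: List[dict]) -> Dict[str, float]:
    """Sketch: accumulate weight * pass-ratio per dimension.

    metric_results items look like {"dimension": str, "weight": int,
    "passed": int, "total": int} (field names are assumptions).
    """
    scores: Dict[str, float] = {d.value: 0.0 for d in DimensionType}
    for m in metric_results:
        ratio = m["passed"] / m["total"] if m["total"] else 0.0
        scores[m["dimension"]] += m["weight"] * ratio
    return scores

example = [{"dimension": "my_dimension", "weight": 20, "passed": 1, "total": 2}]
print(calculate_dimension_scores(example)["my_dimension"])  # 10.0
```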

4. Register Your Profile in Frontend (Optional)

To make your profile selectable in the UI, update the PROFILES dictionary in frontend/config.py:

PROFILES = {
    "dcat_ap": "DCAT-AP 2.0",
    "dcat_ap_es": "DCAT-AP-ES 2.0",
    "nti_risp": "NTI-RISP",
    "my_profile": "My Custom Profile"
}

Example: Adding Label-Based Format Checker for NTI-RISP

Here's an example of extending NTI-RISP with a label-based format checker:

  1. Create the specialized checker class:
class VocabularyLabelComplianceChecker(MetricChecker):
    """Check if property labels comply with a CSV-based vocabulary."""
    
    def __init__(self, property_uris: List[URIRef], csv_path: str, 
                 compare_column: str = None, label_property: URIRef = RDFS.label):
        self.property_uris = property_uris
        self.csv_path = csv_path
        self.compare_column = compare_column
        self.label_property = label_property
        # Initialize allowed values from CSV file
        # ...
    
    def check(self, g: Graph, resources: List[URIRef], context: Dict[str, Any] = None) -> Tuple[int, int]:
        # Check values against the allowed values, considering labels
        # ...
  2. Add to CHECKER_DEFINITIONS:
CHECKER_DEFINITIONS.update({
    "dct_format_nonproprietary_nti": lambda: VocabularyLabelComplianceChecker(
        [DCTERMS.format], MQA_VOCABS['non_proprietary']
    )
})
  3. Add the metric to NTI_RISP_SPECIFIC_METRICS:
NTI_RISP_SPECIFIC_METRICS.append(
    {"id": "dct_format_nonproprietary_nti", "dimension": "interoperability", "weight": 25}
)

SHACL Validation

The API now uses remote SHACL files directly from official repositories, ensuring you always have the latest validation rules:

Automatic Remote Loading

Benefits

  • Always up-to-date: Latest SHACL shapes automatically available
  • No local maintenance: No need to manually update SHACL files
  • Reduced repository size: No local SHACL files stored
  • Official sources: Direct from standards organizations

Configuration

The SHACL URLs are configured in src/api/config.py:

# DCAT-AP SHACL files by level - using remote URLs
DCAT_AP_SHACL_FILES = {
    SHACLLevel.LEVEL_1: [
        "https://raw.githubusercontent.com/SEMICeu/DCAT-AP/refs/heads/master/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes.ttl",
        # ... more URLs
    ]
}

Note

The API includes fallback mechanisms in case remote URLs are temporarily unavailable.
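A minimal sketch of the remote-loading-with-fallback idea. The function name, signature, and fallback behaviour here are assumptions for illustration; the actual implementation in the API may differ:

```python
from typing import Callable, Iterable, Optional
from urllib import request

def load_shacl_shapes(urls: Iterable[str],
                      fallback: Optional[str] = None,
                      fetch: Optional[Callable[[str], str]] = None) -> str:
    """Return the first SHACL shapes document that loads successfully.

    Tries each remote URL in order and falls back to a local copy when
    every URL fails -- a sketch of the fallback behaviour described
    above, not the project's actual implementation.
    """
    if fetch is None:
        def fetch(url: str) -> str:
            with request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8")
    for url in urls:
        try:
            return fetch(url)
        except OSError:
            continue  # URLError subclasses OSError; try the next source
    if fallback is not None:
        return fallback
    raise RuntimeError("No SHACL shapes could be loaded")
```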

Update SSL Certificate

To update the local SSL certificate, follow these steps:

  1. Generate a new certificate and private key:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout nginx/setup/metadata-quality-stack.key \
  -out nginx/setup/metadata-quality-stack.crt \
  -subj "/C=ES/ST=Madrid/L=Madrid/O=Development/CN=localhost"
  2. Verify that the files have been created correctly:
ls -l nginx/setup/metadata-quality-stack.*
  3. Restart the nginx container to apply the changes:
docker compose restart nginx

Caution

This certificate is for local development only. In production, use a valid certificate from a certificate authority.

License

See the LICENSE file for license rights and limitations (MIT).
