Metadata Quality Stack is a comprehensive toolkit for analyzing metadata quality. It implements the European Data Portal's MQA methodology and ships as a Docker Compose deployment and a React web application.


Metadata Quality Stack


A comprehensive toolkit for analyzing the quality of open data metadata. Based on the European Data Portal's Metadata Quality Assessment (MQA) methodology and SHACL validation for the DCAT-AP, DCAT-AP-ES and NTI-RISP (2013) profiles.

Quick Start

Try the simplified React browser version (no installation needed); more info at mjanez/metadata-quality-react/README.md.

Tip

Live Demo: metadata-quality.mjanez.dev/
This edition runs entirely client-side and includes the core MQA and SHACL validator for instant metadata quality checks in your browser.

Full Docker deployment (recommended for more complex validations)

For complete features including historical tracking and API access:

git clone https://github.com/your-organization/metadata-quality-stack.git
cd metadata-quality-stack
docker-compose up

Overview

This tool helps data publishers and consumers evaluate and improve the quality of metadata in open data catalogs. It analyzes metadata against the FAIR+C principles (Findability, Accessibility, Interoperability, Reusability, and Contextuality) and provides detailed reports on quality metrics.

Features

Two Deployment Options

  1. Static Version - GitHub Pages compatible, no backend required
  2. Docker Version - Full-featured with database and API

Core Capabilities

  • Quality Assessment: Evaluate metadata according to the MQA methodology
  • Multiple Profiles: Support for DCAT-AP, DCAT-AP-ES, and NTI-RISP standards
  • Real-time Validation: Instant feedback on metadata quality
  • Interactive Visualizations: Radar charts and detailed metrics breakdown
  • Multilingual Support: English and Spanish interfaces
  • API Integration: REST API for programmatic access (Docker version)
  • Historical Tracking: Store and visualize quality evolution over time (Docker version)
  • SHACL Validation: Check compliance with official shapes fetched from the standards' official repositories

Static version

(Screenshots of the static React web interface)

Docker version

(Screenshots of the Docker web interface)

Architecture

Deployment Options

1. Static Version (Client-Side Only)

  • Technology: React application running entirely in the browser
  • Features: MQA scoring and SHACL validation with official shapes, no backend required
  • Use Case: GitHub Pages hosting, quick quality checks

2. Docker Version (Full Stack)

  • Technology: FastAPI backend + Streamlit frontend + nginx proxy
  • Features: Complete functionality + database + API + historical tracking
  • Use Case: Production environments, enterprise deployment

The project consists of these main components:

  1. API: FastAPI-based backend that validates metadata and generates reports
  2. Frontend: Streamlit-based web interface for visualizing reports
  3. Static Version: Client-side implementation for easy deployment

Installation

Static Version (Quick Start)

Use the hosted static version at https://metadata-quality.mjanez.dev/

Features: Full metadata validation, SHACL compliance checking with official shapes, no backend required.

Docker Version (Production)

For complete functionality with database and API:

  1. Clone the repository:

    git clone https://github.com/your-organization/metadata-quality-stack.git
    cd metadata-quality-stack
  2. Start the services using Docker Compose:

    docker-compose up

Manual Installation

  1. Clone the repository:

    git clone https://github.com/your-organization/metadata-quality-stack.git
    cd metadata-quality-stack
  2. Install dependencies:

    pip install -e .
  3. Start the API:

    uvicorn src.api.main:app --host 0.0.0.0 --port 8000
  4. Start the frontend (in a separate terminal):

    streamlit run src/frontend/app.py

Usage

Backend (API)

The API provides the following endpoints:

  • Base API: http://localhost:80/
  • Swagger UI: interactive documentation for testing the API: http://localhost:80/docs
  • ReDoc: detailed documentation in a more readable format: http://localhost:80/redoc
  • Endpoints:
    • POST /validate: Validate metadata from a URL
    • POST /validate-content: Validate metadata provided directly as content
    • GET /report/{url}: Get the most recent report for a URL
    • GET /history/{url}: Get report history for a URL
    • GET /reports/by-date: Fetch reports within a specified date range
    • GET /reports/by-rating/{rating}: Get reports with a specific quality rating
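As a sketch, the validation endpoints can be called from Python's standard library. The payload field names (content, profile) used here are assumptions; check the Swagger UI at /docs for the actual request schema:

```python
import json
from urllib import request

API_BASE = "http://localhost:80"

def build_validation_request(content: str, profile: str = "dcat_ap") -> request.Request:
    """Build a POST /validate-content request.

    The payload field names ("content", "profile") are assumptions;
    verify them against the Swagger UI at /docs.
    """
    payload = json.dumps({"content": content, "profile": profile}).encode("utf-8")
    return request.Request(
        f"{API_BASE}/validate-content",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_validation_request("@prefix dcat: <http://www.w3.org/ns/dcat#> .", "dcat_ap_es")
# Send the request once the Docker stack is running:
# with request.urlopen(req) as resp:
#     report = json.load(resp)
```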

Frontend (Web Interface)

  • Web Interface: http://localhost:8501/
  • Main sections:
    1. Validation Options:

      • Enter a URL to a catalog (RDF/XML, TTL, JSON-LD and N3 formats)
      • Paste RDF content directly for validation
      • Select different compliance profiles (DCAT-AP, DCAT-AP-ES, NTI-RISP)
    2. Visualization Features:

      • Hierarchical chart showing dimension and metric relationships
      • Radar chart displaying performance across FAIR+C dimensions
      • Detailed metrics breakdown with counts and percentages
    3. Report Management:

      • View historical reports and track quality evolution over time
      • Export reports in both JSON and JSON-LD (DQV vocabulary) formats
      • Score evolution charts for long-term quality tracking
    4. Analytics Dashboard:

      • Overview statistics of catalogs evaluated
      • Distribution of quality ratings
      • Comparison of dimension averages
      • Top and bottom performing catalogs
      • Dimension correlation analysis
    5. Multilingual Support:

      • Toggle between English and Spanish interfaces
      • Localized metric descriptions and labels

Development

For development, we recommend using VS Code with the Dev Container configuration provided:

  1. Install the VS Code Remote - Containers extension
  2. Open the project in VS Code
  3. Click on "Reopen in Container" when prompted
  4. Wait for the container to build and configure

Translation

After updating the translation file (mqa.po), don't forget to compile it to generate the .mo file, e.g. for Spanish:

cd metadata-quality-stack

# Extract i18n texts and update POT of apps (e.g. app.py)
xgettext -d mqa --from-code=UTF-8 -o locales/mqa.pot src/frontend/app.py

# Compile MO files (Spanish)
msgfmt -o locale/es/LC_MESSAGES/mqa.mo locale/es/LC_MESSAGES/mqa.po

Extending Profile Metrics

The system is designed to be modular, allowing you to easily extend or customize metrics for specific profiles (DCAT-AP, DCAT-AP-ES, NTI-RISP, etc.). Follow these steps to extend or create metrics for a profile:

1. Define Your Metrics in config.py

Each metric is defined as a dictionary with ID, dimension, and weight. To add metrics for a new or existing profile:

# Define specific metrics for your profile
MY_PROFILE_SPECIFIC_METRICS = [
    {"id": "my_new_metric", "dimension": "interoperability", "weight": 20},
    {"id": "another_metric", "dimension": "reusability", "weight": 15}
]

# Add your metrics to the METRICS_BY_PROFILE dictionary
METRICS_BY_PROFILE["my_profile"] = COMMON_METRICS + MY_PROFILE_SPECIFIC_METRICS

2. Create Checkers for Your Metrics in validators.py

For each new metric, create a checker that implements the validation logic:

# Create a checker class if existing ones don't fit your needs
class MyCustomChecker(MetricChecker):
    def __init__(self, property_uri: URIRef):
        self.property_uri = property_uri
    
    def check(self, g: Graph, resources: List[URIRef], context: Dict[str, Any] = None) -> Tuple[int, int]:
        # Implement your checking logic and return a tuple of
        # (successful count, total count), for example:
        count = sum(1 for r in resources if (r, self.property_uri, None) in g)
        return (count, len(resources))

# Add your checker to the CHECKER_DEFINITIONS dictionary
CHECKER_DEFINITIONS.update({
    "my_new_metric": lambda: MyCustomChecker(MY_PROPERTY_URI),
    "another_metric": lambda: ExistingCheckerClass(MY_OTHER_PROPERTY)
})

3. Update Dimension Scores (If Needed)

If you're adding metrics to a new dimension, ensure the dimension is registered in the DimensionType enum in models.py and update the calculate_dimension_scores function in validators.py to include your new dimension.
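The wiring above can be sketched as follows. The field names in metric_results and the exact aggregation are assumptions for illustration; the real logic lives in models.py and validators.py:

```python
from enum import Enum
from typing import Dict, List

class DimensionType(str, Enum):
    # Existing FAIR+C dimensions (per the Overview) plus a hypothetical new one
    FINDABILITY = "findability"
    ACCESSIBILITY = "accessibility"
    INTEROPERABILITY = "interoperability"
    REUSABILITY = "reusability"
    CONTEXTUALITY = "contextuality"
    MY_DIMENSION = "my_dimension"  # hypothetical new dimension

def calculate_dimension_scores(metric_results: List[dict]) -> Dict[str, float]:
    """Sketch: accumulate weight * pass-ratio per dimension.

    metric_results items look like {"dimension": str, "weight": int,
    "passed": int, "total": int} (field names are assumptions).
    """
    scores: Dict[str, float] = {d.value: 0.0 for d in DimensionType}
    for m in metric_results:
        ratio = m["passed"] / m["total"] if m["total"] else 0.0
        scores[m["dimension"]] += m["weight"] * ratio
    return scores

example = [{"dimension": "my_dimension", "weight": 20, "passed": 1, "total": 2}]
print(calculate_dimension_scores(example)["my_dimension"])  # 10.0
```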

4. Register Your Profile in Frontend (Optional)

To make your profile selectable in the UI, update the PROFILES dictionary in frontend/config.py:

PROFILES = {
    "dcat_ap": "DCAT-AP 2.0",
    "dcat_ap_es": "DCAT-AP-ES 2.0",
    "nti_risp": "NTI-RISP",
    "my_profile": "My Custom Profile"
}

Example: Adding Label-Based Format Checker for NTI-RISP

Here's an example of extending NTI-RISP with a label-based format checker:

  1. Create the specialized checker class:
class VocabularyLabelComplianceChecker(MetricChecker):
    """Check if property labels comply with a CSV-based vocabulary."""
    
    def __init__(self, property_uris: List[URIRef], csv_path: str, 
                 compare_column: str = None, label_property: URIRef = RDFS.label):
        self.property_uris = property_uris
        self.csv_path = csv_path
        self.compare_column = compare_column
        self.label_property = label_property
        # Initialize allowed values from CSV file
        # ...
    
    def check(self, g: Graph, resources: List[URIRef], context: Dict[str, Any] = None) -> Tuple[int, int]:
        # Check values against the allowed values, considering labels
        # ...
  2. Add to CHECKER_DEFINITIONS:
CHECKER_DEFINITIONS.update({
    "dct_format_nonproprietary_nti": lambda: VocabularyLabelComplianceChecker(
        [DCTERMS.format], MQA_VOCABS['non_proprietary']
    )
})
  3. Add the metric to NTI_RISP_SPECIFIC_METRICS:
NTI_RISP_SPECIFIC_METRICS.append(
    {"id": "dct_format_nonproprietary_nti", "dimension": "interoperability", "weight": 25}
)

SHACL Validation

The API now uses remote SHACL files directly from official repositories, ensuring you always have the latest validation rules:

Automatic Remote Loading

Benefits

  • Always up-to-date: Latest SHACL shapes automatically available
  • No local maintenance: No need to manually update SHACL files
  • Reduced repository size: No local SHACL files stored
  • Official sources: Direct from standards organizations

Configuration

The SHACL URLs are configured in src/api/config.py:

# DCAT-AP SHACL files by level - using remote URLs
DCAT_AP_SHACL_FILES = {
    SHACLLevel.LEVEL_1: [
        "https://raw.githubusercontent.com/SEMICeu/DCAT-AP/refs/heads/master/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes.ttl",
        # ... more URLs
    ]
}

Note

The API includes fallback mechanisms in case remote URLs are temporarily unavailable.
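A minimal sketch of the remote-loading-with-fallback idea. The function name, signature, and fallback behaviour here are assumptions for illustration; the actual implementation in the API may differ:

```python
from typing import Callable, Iterable, Optional
from urllib import request

def load_shacl_shapes(urls: Iterable[str],
                      fallback: Optional[str] = None,
                      fetch: Optional[Callable[[str], str]] = None) -> str:
    """Return the first SHACL shapes document that loads successfully.

    Tries each remote URL in order and falls back to a local copy when
    every URL fails -- a sketch of the fallback behaviour described
    above, not the project's actual implementation.
    """
    if fetch is None:
        def fetch(url: str) -> str:
            with request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8")
    for url in urls:
        try:
            return fetch(url)
        except OSError:
            continue  # URLError subclasses OSError; try the next source
    if fallback is not None:
        return fallback
    raise RuntimeError("No SHACL shapes could be loaded")
```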

Update SSL Certificate

To update the local SSL certificate, follow these steps:

  1. Generate a new certificate and private key:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout nginx/setup/metadata-quality-stack.key \
  -out nginx/setup/metadata-quality-stack.crt \
  -subj "/C=ES/ST=Madrid/L=Madrid/O=Development/CN=localhost"
  2. Verify that the files have been created correctly:
ls -l nginx/setup/metadata-quality-stack.*
  3. Restart the nginx container to apply the changes:
docker compose restart nginx

Caution

This certificate is for local development only. In production, use a valid certificate from a certificate authority.

License

See the LICENSE file for license rights and limitations (MIT).
