setlr: Semantic Extract, Transform and Load

SETLr is a Python tool that generates RDF graphs from tabular and other structured data using declarative SETL (Semantic Extract, Transform, Load) scripts.

Features

✨ Multiple Data Sources: CSV, Excel, JSON, XML, RDF, SAS files
🔄 Flexible Transformations: JSON-LD templates with Jinja2, Python functions, SPARQL
⚡ High Performance: Streaming XML parsing, pandas DataFrames, progress tracking
🐍 Python Integration: Use as library or CLI tool
✅ Validation: Built-in SHACL validation
📝 Well Documented: Comprehensive guides and API reference

Quick Start

Installation

pip install setlr

Simple Example

Create data.csv:

ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com

Create transform.setl.ttl:

@prefix setl: <http://purl.org/twc/vocab/setl/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <http://example.com/> .

:table a csvw:Table, setl:Table ;
    prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .

:output a void:Dataset ;
    prov:wasGeneratedBy [
        a setl:Transform, setl:JSLDT ;
        prov:used :table ;
        prov:value '''[{
            "@id": "http://example.com/person/{{row.ID}}",
            "@type": "http://xmlns.com/foaf/0.1/Person",
            "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
            "http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
        }]'''
    ] .

Run SETLr:

setlr transform.setl.ttl
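
To make the template step concrete, the sketch below mimics what the JSON-LD template above does with each row of data.csv, using only the Python standard library. It is an illustration under stated assumptions, not SETLr's implementation: SETLr renders the template with Jinja2, whereas the hypothetical helpers `expand_row` and `expand_table` use plain f-string substitution.

```python
import csv
import io
import json

# Same rows as data.csv in the example above.
CSV_TEXT = """ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
"""

FOAF = "http://xmlns.com/foaf/0.1/"


def expand_row(row):
    """Hypothetical stand-in for rendering the template against one row."""
    return {
        "@id": f"http://example.com/person/{row['ID']}",
        "@type": FOAF + "Person",
        FOAF + "name": row["Name"],
        FOAF + "mbox": f"mailto:{row['Email']}",
    }


def expand_table(csv_text):
    """Apply expand_row to every data row of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [expand_row(row) for row in reader]


if __name__ == "__main__":
    # Print the JSON-LD node objects, one per CSV row.
    print(json.dumps(expand_table(CSV_TEXT), indent=2))
```

Each CSV row yields one JSON-LD node object, which is why the template is written as a list containing a single object.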

Using from Python

from rdflib import Graph, URIRef
import setlr

# Load SETL script
setl_graph = Graph()
setl_graph.parse("transform.setl.ttl", format="turtle")

# Execute ETL pipeline
resources = setlr.run_setl(setl_graph)

# Access generated RDF
output = resources[URIRef('http://example.com/output')]
print(f"Generated {len(output)} RDF triples")

Documentation

📚 Complete Documentation - Full guides and references

Key Concepts

SETLr uses RDF (with PROV-O vocabulary) to describe ETL workflows:

  1. Extract: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
  2. Transform: Apply templates or Python scripts to generate RDF
  3. Load: Save to files or SPARQL endpoints
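
The three stages can be pictured as plain functions chained together. The following is a minimal standard-library sketch, not SETLr's code: the function names `extract`, `transform`, `load`, and `escape_literal` are hypothetical, and the transform writes N-Triples statements directly instead of rendering a JSON-LD template.

```python
import csv
import io

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def extract(csv_text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape characters that N-Triples string literals require escaped."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def transform(rows):
    """Transform: turn each row into one N-Triples statement."""
    triples = []
    for row in rows:
        subject = f"http://example.com/person/{row['ID']}"
        name = escape_literal(row["Name"])
        triples.append(f'<{subject}> <{FOAF_NAME}> "{name}" .')
    return triples


def load(triples, sink):
    """Load: write the statements to a file-like object, one per line."""
    for triple in triples:
        sink.write(triple + "\n")


if __name__ == "__main__":
    data = "ID,Name\n1,Alice\n2,Bob\n"
    out = io.StringIO()
    load(transform(extract(data)), out)
    print(out.getvalue(), end="")
```

In SETLr the same chain is declared in RDF rather than coded by hand: each stage is a resource linked to its inputs with `prov:used` and to its result with `prov:wasGeneratedBy`, as in the Simple Example.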

Supported Formats

Input:

  • Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
  • Structured: JSON (with ijson selectors), XML (with XPath streaming)
  • Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies

Output:

  • RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
  • Destinations: Files, SPARQL Update endpoints
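
When a generated graph is post-processed with rdflib, the output syntax is selected by a serializer name. The sketch below maps file extensions to rdflib serializer names for the output formats listed above. `guess_rdf_format` is a hypothetical helper and not part of SETLr, and the choice of extensions is an assumption.

```python
from pathlib import PurePosixPath

# File extension -> rdflib serializer name (extension choices are assumed).
RDFLIB_FORMATS = {
    ".ttl": "turtle",
    ".trig": "trig",
    ".nt": "nt",
    ".n3": "n3",
    ".rdf": "xml",
    ".jsonld": "json-ld",
}


def guess_rdf_format(filename, default="turtle"):
    """Return a serializer name based on the last extension of filename."""
    suffix = PurePosixPath(filename).suffix.lower()
    return RDFLIB_FORMATS.get(suffix, default)


if __name__ == "__main__":
    for name in ("people.ttl", "people.jsonld", "people.nt", "people"):
        print(name, "->", guess_rdf_format(name))
```

The returned name could then be passed to rdflib's `Graph.serialize`, for example `graph.serialize(destination=path, format=guess_rdf_format(path))`.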

Examples

See the examples/ directory for complete working examples:

  • social.setl.ttl - Basic CSV to RDF with conditionals and loops
  • ontology.setl.ttl - OWL ontology transformation with SHACL shapes

Development

# Clone repository
git clone https://github.com/tetherless-world/setlr.git
cd setlr

# Bootstrap (creates venv and installs dependencies)
./script/bootstrap

# Activate virtual environment  
source venv/bin/activate

# Run tests
./script/build

# Run linter
flake8 setlr/

Contributing

Contributions are welcome! Please see our Contributing Guide for details on:

  • Development setup and workflow
  • Code standards and style guidelines
  • Testing requirements
  • Pull request process

Please note that this project follows a Code of Conduct. By participating, you are expected to uphold this code.

License

Apache License 2.0 - see LICENSE file for details.

Citation

If you use SETLr in your research, please cite:

@software{setlr,
  title = {SETLr: Semantic Extract, Transform and Load},
  author = {McCusker, Jamie},
  year = {2024},
  url = {https://github.com/tetherless-world/setlr}
}
