SETLr is a powerful Python tool for generating RDF graphs from tabular data using declarative SETL (Semantic Extract, Transform, Load) scripts.
Features:
- Multiple Data Sources: CSV, Excel, JSON, XML, RDF, SAS files
- Flexible Transformations: JSON-LD templates with Jinja2, Python functions, SPARQL
- High Performance: Streaming XML parsing, pandas DataFrames, progress tracking
- Python Integration: Use as a library or a CLI tool
- Validation: Built-in SHACL validation
- Well Documented: Comprehensive guides and API reference
Install SETLr from PyPI:
pip install setlr
Create data.csv:
ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
Create transform.setl.ttl:
@prefix setl: <http://purl.org/twc/vocab/setl/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <http://example.com/> .
:table a csvw:Table, setl:Table ;
prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .
:output a void:Dataset ;
prov:wasGeneratedBy [
a setl:Transform, setl:JSLDT ;
prov:used :table ;
prov:value '''[{
"@id": "http://example.com/person/{{row.ID}}",
"@type": "http://xmlns.com/foaf/0.1/Person",
"http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
"http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
}]'''
] .
Run SETLr from the command line:
setlr transform.setl.ttl
Or use SETLr as a Python library:
from rdflib import Graph, URIRef
import setlr
# Load SETL script
setl_graph = Graph()
setl_graph.parse("transform.setl.ttl", format="turtle")
# Execute ETL pipeline
resources = setlr.run_setl(setl_graph)
# Access generated RDF
output = resources[URIRef('http://example.com/output')]
print(f"Generated {len(output)} RDF triples")
Complete Documentation - Full guides and references
Quick Links:
- Tutorial - Step-by-step guide to SETLr
- JSLDT Template Language - Transform syntax reference
- Python API - Using SETLr from Python
- Quick Start - Get started in 5 minutes
- Examples - Real-world examples
Advanced Topics:
- Streaming XML with XPath - Efficient large file processing
- Python Functions - Custom Python transforms
- SPARQL Support - Query and update endpoints
- SHACL Validation - Validate your RDF output
SETLr uses RDF (with PROV-O vocabulary) to describe ETL workflows:
- Extract: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
- Transform: Apply templates or Python scripts to generate RDF
- Load: Save to files or SPARQL endpoints
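For the quick-start example, the Transform step amounts to rendering the JSLDT template once for each extracted CSV row. The standard-library sketch below approximates that behavior and collects the resulting JSON-LD nodes. It is not SETLr's implementation. SETLr uses Jinja2, which supports far more than the plain {{row.Column}} substitution handled here, and the helper names render_row and transform are hypothetical.

```python
import csv
import io
import json
import re

# The JSLDT template from transform.setl.ttl, copied verbatim.
TEMPLATE = '''[{
"@id": "http://example.com/person/{{row.ID}}",
"@type": "http://xmlns.com/foaf/0.1/Person",
"http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
"http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
}]'''

# Contents of data.csv from the quick start.
DATA = "ID,Name,Email\n1,Alice,alice@example.com\n2,Bob,bob@example.com\n"

# Matches only the simple {{row.Column}} form; Jinja2 supports much more.
PLACEHOLDER = re.compile(r"\{\{\s*row\.(\w+)\s*\}\}")


def render_row(template, row):
    """Replace each {{row.Column}} placeholder with that column's value.

    Values are JSON-escaped because every placeholder in this template
    sits inside a JSON string literal.
    """
    return PLACEHOLDER.sub(lambda m: json.dumps(row[m.group(1)])[1:-1], template)


def transform(template, csv_text):
    """Render the template once per CSV row and collect the JSON-LD nodes."""
    nodes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nodes.extend(json.loads(render_row(template, row)))
    return nodes


if __name__ == "__main__":
    for node in transform(TEMPLATE, DATA):
        print(node["@id"], node["http://xmlns.com/foaf/0.1/name"])
```

Each rendered row is parsed as JSON, so a template that produces invalid JSON for some row fails immediately rather than silently emitting bad data.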
Input:
- Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
- Structured: JSON (with ijson selectors), XML (with XPath streaming)
- Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies
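The XPath streaming support listed above rests on incremental XML parsing. Each repeating record is handled as soon as its closing tag is read and is then discarded, so the whole document never has to sit in memory. The sketch below shows that general technique using the standard library's xml.etree.ElementTree.iterparse. It is not SETLr's XML extractor, and the stream_records helper and the person record tag are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield one dict per <tag> element while the document is being read.

    `source` is a filename or binary file object. Only the record's direct
    children are collected, mapped from child tag to child text.
    Namespaced tags must be passed in "{namespace}local" form.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: (child.text or "") for child in elem}
            # Free the record's contents; an empty shell stays in its parent.
            elem.clear()


XML = b"""<people>
  <person><id>1</id><name>Alice</name></person>
  <person><id>2</id><name>Bob</name></person>
</people>"""

if __name__ == "__main__":
    for record in stream_records(io.BytesIO(XML), "person"):
        print(record)
```

Running it prints {'id': '1', 'name': 'Alice'} followed by {'id': '2', 'name': 'Bob'}.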
Output:
- RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
- Destinations: Files, SPARQL Update endpoints
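As a rough illustration of the Extract, Transform, Load split ending in a file destination, the sketch below writes each stage as a plain Python function over the quick-start data and serializes the result as N-Triples. It uses only the standard library and hard-codes the quick-start mapping, keeping foaf:mbox as a plain literal just as the template does. The function names are hypothetical. In SETLr itself these stages are declared in RDF, not written as Python.

```python
import csv
import io
import tempfile
from pathlib import Path

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def extract(csv_text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: yield (subject, predicate, object) triples for each row."""
    for row in rows:
        subject = iri(f"http://example.com/person/{row['ID']}")
        yield (subject, iri(RDF_TYPE), iri(FOAF + "Person"))
        yield (subject, iri(FOAF + "name"), literal(row["Name"]))
        yield (subject, iri(FOAF + "mbox"), literal(f"mailto:{row['Email']}"))


def load(triples, path):
    """Load: write triples as N-Triples lines and return how many were written."""
    lines = [f"{s} {p} {o} .\n" for s, p, o in triples]
    Path(path).write_text("".join(lines), encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    data = "ID,Name,Email\n1,Alice,alice@example.com\n2,Bob,bob@example.com\n"
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "output.nt"
        count = load(transform(extract(data)), out)
        print(f"Wrote {count} triples to {out.name}")
        print(out.read_text(encoding="utf-8"))
```

N-Triples was chosen here because each triple is one self-contained line, which keeps a hand-written serializer short. For Turtle, JSON-LD or the other listed formats, a real RDF library such as rdflib is the better tool.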
See the examples/ directory for complete working examples:
- social.setl.ttl - Basic CSV to RDF with conditionals and loops
- ontology.setl.ttl - OWL ontology transformation with SHACL shapes
Development:
# Clone repository
git clone https://github.com/tetherless-world/setlr.git
cd setlr
# Bootstrap (creates venv and installs dependencies)
./script/bootstrap
# Activate virtual environment
source venv/bin/activate
# Run tests
./script/build
# Run linter
flake8 setlr/
Contributions are welcome! Please see our Contributing Guide for details on:
- Development setup and workflow
- Code standards and style guidelines
- Testing requirements
- Pull request process
Please note that this project follows a Code of Conduct. By participating, you are expected to uphold this code.
Apache License 2.0 - see LICENSE file for details.
If you use SETLr in your research, please cite:
@software{setlr,
title = {SETLr: Semantic Extract, Transform and Load},
author = {McCusker, Jamie},
year = {2024},
url = {https://github.com/tetherless-world/setlr}
}
Links:
- Documentation
- Issue Tracker
- Discussions
- Security Policy - Report security vulnerabilities