Implement transparent multicore processing for json_transform #26
The json_transform function processes rows sequentially, leaving CPU cores idle on large datasets. Row processing (template evaluation, Jinja2 rendering, SHACL validation) is CPU-bound and independent across rows.

Implementation
- ThreadPoolExecutor distributes row processing across all CPU cores
- Worker count defaults to multiprocessing.cpu_count(), configurable via SETLR_MAX_WORKERS
- _process_single_row() isolates parallelizable logic with thread-local logging

Performance
For a 100-row dataset with complex templates:
Actual speedup depends on template complexity and SHACL validation overhead.
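
The worker-pool approach listed under Implementation can be illustrated with a minimal sketch. This is not the code from this PR: `resolve_max_workers`, `process_single_row`, and `transform_rows` are hypothetical names, the per-row work is a placeholder, and the fallback to the CPU count when SETLR_MAX_WORKERS is unset or invalid is an assumption.

```python
import multiprocessing
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local storage so each worker keeps its own logging context.
_local = threading.local()


def resolve_max_workers():
    """Return SETLR_MAX_WORKERS if it is a positive integer, else the CPU count.

    Falling back on invalid values is an assumption of this sketch.
    """
    raw = os.environ.get("SETLR_MAX_WORKERS")
    if raw is not None:
        try:
            value = int(raw)
            if value > 0:
                return value
        except ValueError:
            pass
    return multiprocessing.cpu_count()


def process_single_row(index, row):
    """Hypothetical stand-in for template evaluation, rendering, and validation."""
    _local.row_index = index  # per-thread context a log formatter could read
    return {"index": index, "value": row["value"] * 2}


def transform_rows(rows):
    """Process rows on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=resolve_max_workers()) as pool:
        # Executor.map yields results in the order of its inputs,
        # regardless of which worker finishes first.
        return list(pool.map(process_single_row, range(len(rows)), rows))
```

Using `Executor.map` rather than `as_completed` keeps the output order identical to the sequential version, which matters if downstream steps depend on row order.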
Testing
- test_multicore_performance.py with 100-row dataset validation
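
One way to validate a 100-row dataset is to check that parallel output matches sequential output row for row. The sketch below shows that idea only; it does not reproduce the contents of test_multicore_performance.py, and `render_row`, `run_sequential`, and `run_parallel` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def render_row(row):
    """Hypothetical per-row work standing in for the real transform."""
    return f"{row['id']}:{row['name'].upper()}"


def run_sequential(rows):
    return [render_row(r) for r in rows]


def run_parallel(rows, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results are directly comparable.
        return list(pool.map(render_row, rows))


def test_parallel_matches_sequential():
    rows = [{"id": i, "name": f"row{i}"} for i in range(100)]
    assert run_parallel(rows) == run_sequential(rows)
```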