Description
Our current test dataset comprises all of chr1 in two different samples: the Jurkat and MOLT4 cell lines. It takes about an hour to run the entire pipeline on this dataset.
Ideally, we would have a dataset that runs in under 10 minutes or so. This could then be incorporated into a GitHub CI pipeline that runs automatically on each major and minor version release, so that we can tell when a change we've made to the code leads to a change in the results.
- find SNVs and indels supported by all callers
- choose just one or two peaks that overlap those variants from each of the two samples
- subset the example dataset to only the reads that overlap those peaks
- also try to subset the reference genome that is packaged with the example data, since the ref genome currently appears to be the largest file
- rerun the pipeline with the smaller dataset and tweak the dataset as necessary to make it run quickly
- use `snakemake --generate-unit-tests` to create a bunch of tests that can be executed using `pytest`
  - I'm running into issues with this. It doesn't work for outputs marked as `pipe`, and there are some problems with other directories (see "edge cases fail with `--generate-unit-tests`", snakemake/snakemake#1104)
  - fix issues and ensure test coverage is appropriate
  - remove any unnecessary tests to ensure the test directory is small and can be properly included in version history (edit: this won't be possible after all, because the test directory has to include the outputs of each rule, ugh)
- (optionally) create a GitHub action like this one to execute `pytest` upon each major or minor version increment and confirm the tests pass successfully
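The first two steps of the plan (finding variants supported by every caller, then choosing one or two peaks that overlap them) could be sketched in plain Python roughly as follows. This is a hypothetical helper, not part of the pipeline: it assumes each caller's VCF has already been parsed into `(chrom, pos, ref, alt)` tuples with 1-based positions, and that peaks are BED-style `(chrom, start, end)` intervals. The function names, caller labels, and coordinates are made up for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record types (assumptions, not the pipeline's own formats):
# a variant is (chrom, 1-based VCF position, ref, alt) and a peak is a
# BED-style interval (chrom, 0-based start, exclusive end).
Variant = Tuple[str, int, str, str]
Peak = Tuple[str, int, int]


def consensus_variants(calls: Dict[str, Iterable[Variant]]) -> Set[Variant]:
    """Return the variants reported by every caller (a set intersection)."""
    per_caller = [set(variants) for variants in calls.values()]
    if not per_caller:
        return set()
    return set.intersection(*per_caller)


def peaks_with_variants(
    peaks: Iterable[Peak], variants: Set[Variant], max_peaks: int = 2
) -> List[Peak]:
    """Return up to max_peaks peaks, ranked by how many variants they contain."""
    scored = []
    for chrom, start, end in peaks:
        # A 1-based position p falls in the BED interval [start, end)
        # exactly when start < p <= end.
        n = sum(1 for v in variants if v[0] == chrom and start < v[1] <= end)
        if n > 0:
            scored.append((n, (chrom, start, end)))
    # Most variants first; sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [peak for _, peak in scored[:max_peaks]]


# Toy example with made-up caller names and coordinates.
calls = {
    "caller_a": [("chr1", 1000, "A", "G"), ("chr1", 5000, "C", "T")],
    "caller_b": [("chr1", 1000, "A", "G"), ("chr1", 7000, "G", "A")],
    "caller_c": [("chr1", 1000, "A", "G"), ("chr1", 5000, "C", "T")],
}
peaks = [("chr1", 900, 1100), ("chr1", 4900, 5100), ("chr1", 0, 500)]

shared = consensus_variants(calls)
print(shared)                              # {('chr1', 1000, 'A', 'G')}
print(peaks_with_variants(peaks, shared))  # [('chr1', 900, 1100)]
```

Ranking peaks by how many consensus variants they contain is only one possible heuristic. The selected peaks could then be written out as a BED file and used to subset the reads (e.g. with `samtools view -L`).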
Metadata
Labels: enhancement (New feature or request)