69 datamodule and data import are combined#70
Conversation
| self.batch_size = batch_size | ||
| self.num_workers = num_workers | ||
| self.visualise_training_data = visualise_training_data | ||
| self.image_series_option = image_series_option |
There was a problem hiding this comment.
is this only relevant for image data from XNAT? if so, could that be reflected in a comment?
There was a problem hiding this comment.
Leave for now, can always delete or refactor at a later stage.
There was a problem hiding this comment.
add comments in the code to prompt future users to delete as appropriate
| ] | ||
|
|
||
| def prepare_data(self, *args, **kwargs): | ||
| def setup(self, *args, **kwargs): |
There was a problem hiding this comment.
I would like to have an in-depth discussion around setup() versus prepare_data(), potentially at the next TT.
There was a problem hiding this comment.
Keep as is for now, but explore in the future.
There was a problem hiding this comment.
add comment in the code with a brief summary of the decision to change this and point to resources where relevant considerations are listed for future use
| return(data_builder.dataset) | ||
|
|
||
| @staticmethod | ||
| def fetch_xr(subject_data: SubjectData = None) -> List[ImageScanData]: |
There was a problem hiding this comment.
"CR" and "DX" are I assume specific for X-rays. I recommend we add various fetch functions with good documentation and prompt the developer of the specific project to choose and adapt the available functions as applicable for their project.
There was a problem hiding this comment.
Create as separate issue!
| import pandas as pd | ||
| import numpy as np | ||
|
|
||
| from utils.tools import DataBuilderXNAT |
There was a problem hiding this comment.
Could XNATDataImport be merged/refactored with DataBuilderXNAT? What's the benefit of having them separate?
A merged version is easier to get rid of/ignore for not image-based projects.
There was a problem hiding this comment.
Different functionalities so keep separate.
There was a problem hiding this comment.
make plans to move both to CSC-XNAT package
| Note: The values present in the template config files are examples, you can remove any except those in `[server]` and `[project]` which are necessary for MLOps. Outside of these you are encouraged to add and modify the config files as relevant to your project. | ||
|
|
||
| ### 2. `project/Network.py` | ||
| ### 2. `project/XNATDataImport.py` |
There was a problem hiding this comment.
I think this should stay to be part of the DataModule component.
There was a problem hiding this comment.
As it is its own component, I think it should have its own explanation.
Will create an issue to make a code structure/architecture image.
Summary of changes
Added a new file called XNATDataImport.py and adapted train.py and DataModule.py accordingly.
Reason for changes
The current implementation of pulling data from XNAT during the initialising of the PyTorch Lightning DataModule is messy and confusing. Adding the data import as a separate step is clearer, and additionally allows for simpler unit testing and data validation.
In this PR I've also updated the DataModule to align with best practices (https://lightning.ai/docs/pytorch/stable/data/datamodule.html). This includes removing the get_data() step as it is now not needed and renaming prepare_data() as setup().