Skip to content

69 datamodule and data import are combined#70

Open
mikewoodward94 wants to merge 6 commits intomasterfrom
69-datamodule-and-data-import-are-combined
Open

69 datamodule and data import are combined#70
mikewoodward94 wants to merge 6 commits intomasterfrom
69-datamodule-and-data-import-are-combined

Conversation

@mikewoodward94
Copy link
Collaborator

Summary of changes

Added a new file called XNATDataImport.py and adapted train.py and DataModule.py accordingly.

Reason for changes

The current implementation of pulling data from XNAT during the initialising of the PyTorch Lightning DataModule is messy and confusing. Adding the data import as a separate step is clearer, and additionally allows for simpler unit testing and data validation.

In this PR I've also updated the DataModule to align with best practices (https://lightning.ai/docs/pytorch/stable/data/datamodule.html). This includes removing the get_data() step as it is now not needed and renaming prepare_data() as setup().

@mikewoodward94 mikewoodward94 linked an issue Aug 13, 2024 that may be closed by this pull request
@mikewoodward94 mikewoodward94 self-assigned this Aug 13, 2024
self.batch_size = batch_size
self.num_workers = num_workers
self.visualise_training_data = visualise_training_data
self.image_series_option = image_series_option
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this only relevant for image data from XNAT? if so, could that be reflected in a comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave for now, can always delete or refactor at a later stage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments in the code to prompt future users to delete as appropriate

]

def prepare_data(self, *args, **kwargs):
def setup(self, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have an in-depth discussion around setup() versus prepare_data(), potentially at the next TT.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep as is for now, but explore in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment in the code with a brief summary of the decision to change this and point to resources where relevant considerations are listed for future use

return(data_builder.dataset)

@staticmethod
def fetch_xr(subject_data: SubjectData = None) -> List[ImageScanData]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"CR" and "DX" are I assume specific for X-rays. I recommend we add various fetch functions with good documentation and prompt the developer of the specific project to choose and adapt the available functions as applicable for their project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create as separate issue!

import pandas as pd
import numpy as np

from utils.tools import DataBuilderXNAT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could XNATDataImport be merged/refactored with DataBuilderXNAT? What's the benefit of having them separate?

A merged version is easier to get rid of/ignore for not image-based projects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different functionalities so keep separate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make plans to move both to CSC-XNAT package

Note: The values present in the template config files are examples, you can remove any except those in `[server]` and `[project]` which are necessary for MLOps. Outside of these you are encouraged to add and modify the config files as relevant to your project.

### 2. `project/Network.py`
### 2. `project/XNATDataImport.py`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should stay to be part of the DataModule component.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is its own component, I think it should have its own explanation.

Will create an issue to make a code structure/architecture image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataModule and Data Import are combined

2 participants