
separately save and load data (collected samples), using torch #9

Open

sheim wants to merge 3 commits into PFMassiani:master from sheim:save_gp

Conversation

@sheim (Contributor) commented Oct 18, 2020

Forgot to commit and open the PR before going to Ticino. I ended up saving with torch instead of numpy, since train_x and train_y are torch tensors anyway. Although... I think torch.save uses pickle in the background.
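For context, saving the tensors directly with torch (rather than via numpy) might look roughly like the following sketch; the helper names, file name, and dict keys are illustrative, not the PR's actual code:

    import torch

    def save_data(train_x, train_y, path="dataset.pt"):
        # torch.save serializes the dict (via pickle) to disk.
        torch.save({"train_x": train_x, "train_y": train_y}, path)

    def load_data(path="dataset.pt"):
        data = torch.load(path)  # unpickles the dict saved above
        return data["train_x"], data["train_y"]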

@PFMassiani (Owner) commented:
Hey,

Yes, saving using torch.save is better.
I find something a bit odd with GP.load, though: the user still needs to specify train_x and train_y even if load_dataset is specified. A way out could be to make Dataset.load a @staticmethod, make train_x and train_y optional arguments in GP.load when load_dataset is specified, and load the GP with the saved data. Do you see a problem with this?
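Roughly, that proposal could look like the sketch below; the @staticmethod on Dataset.load and the optional train_x/train_y in GP.load come from the discussion, while the constructors and attribute names are assumptions:

    import torch

    class Dataset:
        def __init__(self, train_x, train_y):
            self.train_x = train_x
            self.train_y = train_y

        @staticmethod
        def load(path):
            # Assumes the file holds a dict of tensors written by torch.save.
            data = torch.load(path)
            return Dataset(data["train_x"], data["train_y"])

    class GP:
        def __init__(self, train_x, train_y):
            self.dataset = Dataset(train_x, train_y)

        @classmethod
        def load(cls, save_path, train_x=None, train_y=None, load_dataset=None):
            if load_dataset is not None:
                # Take the training data from the saved dataset instead of
                # requiring the caller to pass (dummy) tensors.
                ds = Dataset.load(load_dataset)
                train_x, train_y = ds.train_x, ds.train_y
            model = cls(train_x, train_y)
            # Restoring the model's parameters from save_path is omitted here.
            return model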

@sheim (Contributor, Author) commented Oct 21, 2020

No, not really. I thought about it, and decided that forcing the user to specify a dummy train_x and train_y is actually not such a bad idea, since it makes them learn how to initialize it.
Side note: I do prefer having a non-static method, so you can load the data in place into an existing GP instead of instantiating and returning a new GP to replace the existing one.
No strong opinion either way, though.

@PFMassiani (Owner) commented Oct 21, 2020

Okay, I understand. When specifying a path to load the Dataset, though, I think the user already knows how to initialize it. I'd find it a bit more natural to make train_x and train_y optional, but I don't have strong opinions either.

On the other hand, I disagree about the "staticness" of Dataset.load. Indeed, calling gp.dataset.load does not ensure that the GP object updates its internal state when its data changes (currently, this is done by gp._set_gp_data_to_dataset). Therefore, just calling gp.dataset.load may lead to unexpected behaviour.
Nonetheless, the feature you describe is useful: it could be done in another method that loads the given dataset and then updates the internal state (GP.load_dataset or something)?
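In code, that separate method might look like this sketch (the dataset attribute and the _set_gp_data_to_dataset hook are named as in the discussion; the rest is assumed):

    # Inside the GP class:
    def load_dataset(self, path):
        # Refresh the existing Dataset in place from the saved samples...
        self.dataset.load(path)
        # ...then sync the GP's training data with the refreshed dataset,
        # so the model's internal state matches the new samples.
        self._set_gp_data_to_dataset()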

Review comment on the following lines of the diff:

    model.load_state_dict(save_dict['state_dict'])

    if load_dataset:
        model.dataset.load(load_dataset)
@PFMassiani (Owner) commented:
I think we also need to update the GP's internal state after this: self._set_gp_data_to_dataset()

@sheim (Contributor, Author) replied:

I see. So how about making these two things fully separate? load only loads the model, and, separately, load_dataset loads a dataset. This doesn't make train_x and train_y optional, though...

@sheim (Contributor, Author) commented Oct 22, 2020

So currently, I split up saving/loading the dataset from saving and loading the model, so the original save and load work the same as before.

I tried making train_x and train_y optional by defaulting them to None and checking that the caller provides either a path to a saved dataset or training data... but this breaks because of the @tensorwrap decorator, which doesn't know how to handle None. I didn't want to go messing around with the decorator, though.
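One hedged way around that, assuming the repo's @tensorwrap roughly converts array-like arguments to tensors (its real implementation may differ), is to have the wrapper pass None through untouched so optional arguments survive. A sketch with an illustrative stand-in:

    import functools
    import numbers

    import torch

    def tensorwrap(func):
        # Illustrative stand-in for the repo's decorator: convert
        # array-like arguments to tensors, but let None (and anything
        # that isn't array-like, e.g. self or a path) pass through,
        # so optional tensor arguments can default to None.
        def _maybe_tensor(a):
            if a is None or torch.is_tensor(a):
                return a
            if isinstance(a, (list, tuple, numbers.Number)):
                return torch.as_tensor(a)
            return a

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*(_maybe_tensor(a) for a in args),
                        **{k: _maybe_tensor(v) for k, v in kwargs.items()})
        return wrapper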
