Description
There is currently a system for checkpointing reconstructions, which is essentially undocumented and has only been used (I believe) once, by me, in a personal project. The idea is to make it possible to break a long reconstruction script up into many small Slurm jobs.
The switch to the reconstructors class #47 currently breaks the checkpointing system. However, given that it is undocumented, I think we can still merge the PR without fixing this issue.
At the same time, I think there is a better way to run the checkpointing system. With a very small tweak, it would be possible to simply wrap the script in a with statement, e.g.:

    with model.use_checkpoints(checkpoint_file_stem, dataset):
        # your normal code

That would then automatically checkpoint whenever the Slurm job finishes. This would make it easier to write scripts for checkpointing, since you wouldn't need to checkpoint manually, and every subtask would use its full time allotment. We could keep the current model.checkpoint(dataset) function around, perhaps renaming it to model.manually_checkpoint(dataset), for specific points where we want a checkpoint saved. But it might be better to just use model.save_results() for that. Open to discussion/I will think about it.
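To make the idea concrete, here is a rough sketch of how such a with statement could work. This is not the existing implementation. It assumes the scheduler delivers SIGTERM to the Python process shortly before the time limit (with Slurm this may require a `--signal` option in the submission script), and `save_checkpoint` and `JobEnding` are hypothetical names standing in for whatever the model would actually use. It is written as a free function taking a callable rather than as `model.use_checkpoints(checkpoint_file_stem, dataset)`, just to keep it self-contained.

```python
import contextlib
import signal


class JobEnding(Exception):
    """Hypothetical exception raised in the with-block when the job is told to stop."""


@contextlib.contextmanager
def use_checkpoints(save_checkpoint, signum=signal.SIGTERM):
    """Call save_checkpoint() when the block exits, including on signum.

    save_checkpoint is a hypothetical zero-argument callable that writes
    the current state to disk. Must be used from the main thread, because
    signal handlers can only be installed there.
    """
    def _raise_job_ending(received_signum, frame):
        # Turn the signal into an exception so the block unwinds normally.
        raise JobEnding(f"received signal {received_signum}")

    # Install our handler, remembering the old one so it can be restored.
    previous = signal.signal(signum, _raise_job_ending)
    try:
        yield
    except JobEnding:
        # Swallow the interruption: saving in `finally` is the intended
        # outcome, and the script can then exit cleanly.
        pass
    finally:
        signal.signal(signum, previous)
        # Runs on normal completion, on the signal, and on other errors.
        save_checkpoint()
```

One thing to decide would be whether a checkpoint should also be written when the block fails with an unrelated exception. The sketch does write one, which may or may not be what we want.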