Switch to a better system for checkpointing #48

@allevitan

There is currently a system for checkpointing reconstructions. It is essentially undocumented and has only been used (I believe) once, by me, in a personal project. The idea is to make it possible to break a long reconstruction script up into many small Slurm jobs.

The switch to the reconstructors class (#47) currently breaks the checkpointing system. However, since the system is undocumented, I think we can still merge the PR without fixing this issue.

At the same time, I think there is a better way to run the checkpointing system. With a very small tweak, it should be possible to simply wrap the script in a with statement, e.g.:

with model.use_checkpoints(checkpoint_file_stem, dataset):
    # your normal code

That would then automatically checkpoint whenever the Slurm job finishes. This would make checkpointed scripts easier to write, since you wouldn't need to checkpoint manually, and every subtask would use its full allotment of time. We could keep the current model.checkpoint(dataset) function, perhaps renamed to model.manually_checkpoint(dataset), for specific points where we want a checkpoint saved. But it might be better to just use model.save_results() for that. Open to discussion; I will think about it.
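To make the proposal concrete, here is a rough sketch of how such a context manager could behave. It is not the existing implementation: ToyModel, its state dict, the CheckpointInterrupt exception, and the ".pkl" file layout are all hypothetical stand-ins, and it assumes the job receives SIGTERM shortly before it is killed (for example, requested through sbatch's --signal option). The dataset argument is accepted only to match the proposed signature and is ignored here.

```python
import contextlib
import os
import pickle
import signal
import threading


class CheckpointInterrupt(Exception):
    """Hypothetical exception raised when the scheduler asks the job to stop."""


class ToyModel:
    """Hypothetical stand-in for a reconstruction model, for illustration only."""

    def __init__(self):
        # Whatever needs to survive between jobs lives in this dict.
        self.state = {"iteration": 0}

    @contextlib.contextmanager
    def use_checkpoints(self, checkpoint_file_stem, dataset=None):
        path = checkpoint_file_stem + ".pkl"  # assumed file layout

        # Resume from an earlier job's checkpoint, if there is one.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.state = pickle.load(f)

        def on_sigterm(signum, frame):
            # Turn the signal into an exception so the with-block unwinds.
            raise CheckpointInterrupt()

        # Signal handlers can only be installed from the main thread.
        in_main = threading.current_thread() is threading.main_thread()
        if in_main:
            previous = signal.signal(signal.SIGTERM, on_sigterm)
        try:
            yield
        except CheckpointInterrupt:
            pass  # expected stop request; the checkpoint is written below
        finally:
            if in_main:
                signal.signal(signal.SIGTERM, previous)
            # Save on every exit path: normal completion, stop request, or error.
            with open(path, "wb") as f:
                pickle.dump(self.state, f)
```

With something like this, each Slurm job would run the same script: the first job starts from scratch, and later jobs pick up from the saved state when they enter the with block. A real version would also need some way to skip work that has already been completed, which this sketch does not attempt.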

Labels: enhancement