I’m working on benchmarking different sampling strategies and I could use your input on how to set this up efficiently with an ado custom experiment.
Here’s the situation:
• For each combination of points drawn from the training set, I need to train a separate model; the order of the points doesn’t matter.
• I want to make sure we don’t train the same model twice on the same set of points.
• My idea is to use the orchestrator, so it feels natural that the fundamental unit should be a unique identifier for each set of points. However, setting up a space this way sounds impractical.
• Each set should then be linked to a set of performance metrics (measured properties).
Basically, I would like to:
- Generate an identifier for each training set.
- Associate each identifier with its performance metric.
- Avoid duplicate training runs.
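To make the first point concrete, here is a minimal sketch of one way to build an order-independent identifier for a set of points (the function name and hashing scheme are my own illustration, not anything provided by ado):

```python
import hashlib

def training_set_id(points):
    """Deterministic identifier for a set of points.

    Sorting the points first removes any order dependence, so every
    permutation of the same set maps to the same identifier.
    """
    canonical = ",".join(sorted(str(p) for p in points))
    return hashlib.sha256(canonical.encode()).hexdigest()

# The same set in a different order yields the same identifier:
assert training_set_id([3, 1, 2]) == training_set_id([2, 3, 1])
```

Any canonicalization that is stable across runs would work equally well; the hash just keeps the identifier short and uniform.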
A solution that I would like:
- Generate the identifier for a training set at runtime (in a consistent way), and use it so that if I am about to rerun the training for an identifier that has already been processed, the memoized result is reused instead.
Is this already possible within ado? What would be the recommended practice?
Additional detail
The size of the benchmark sampling must be suited to the caching approach, i.e. sampling O(10) points from a total of O(10), with constraints on the sampling.
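For sizing the cache, a quick combinatorial check shows why these scales stay tractable (the concrete numbers below are illustrative, not values from my actual benchmark):

```python
import math

# With O(10) candidate points, the number of distinct training sets
# (and hence cache entries) is small even before any sampling constraints.
n = 10
subsets_of_5 = math.comb(n, 5)                          # 252
all_subsets = sum(math.comb(n, k) for k in range(n + 1))  # 2**10 = 1024

print(subsets_of_5, all_subsets)
```

So even in the worst case, exhaustively memoizing every subset remains cheap at this scale.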