I’m working on benchmarking different sampling strategies and I could use your input on how to set this up efficiently with an ado custom experiment.
Here’s the situation:
• For each combination of points drawn from the training set, I need to train a separate model; the order of the points doesn’t matter.
• I want to make sure we don’t train the same model twice on the same set of points.
• My idea is to use the orchestrator, so it feels natural that the fundamental unit should be a unique identifier for each set of points. However, setting up a space this way sounds impractical.
• Each set should then be linked to a set of performance metrics (measured properties).
Basically, I would like to:
- Generate an identifier for each training set.
- Associate each identifier with its performance metric.
- Avoid duplicate training runs.
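To make the first point concrete, here is a minimal sketch of one way to build an order-independent identifier for a set of points (the function name and hashing scheme are my own illustration, not anything provided by ado):

```python
import hashlib

def training_set_id(points):
    """Deterministic identifier for a set of points.

    Sorting the points first removes any order dependence, so every
    permutation of the same set maps to the same identifier.
    """
    canonical = ",".join(sorted(str(p) for p in points))
    return hashlib.sha256(canonical.encode()).hexdigest()

# The same set in a different order yields the same identifier:
assert training_set_id([3, 1, 2]) == training_set_id([2, 3, 1])
```

Any canonicalization that is stable across runs would work equally well; the hash just keeps the identifier short and uniform.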
A solution that I would like:
- Generate the identifier for a training set at runtime (in a consistent way), and use it so that if I am about to rerun the training for an identifier that has already been processed, the memoized result is reused instead.
Is this already possible within ado? What would be the recommended practice?
Additional detail
The size of the benchmark sampling must be suited to the caching approach, i.e. sampling O(10) points from a total of O(10), with constraints on the sampling.
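For sizing the cache, a quick combinatorial check shows why these scales stay tractable (the concrete numbers below are illustrative, not values from my actual benchmark):

```python
import math

# With O(10) candidate points, the number of distinct training sets
# (and hence cache entries) is small even before any sampling constraints.
n = 10
subsets_of_5 = math.comb(n, 5)                          # 252
all_subsets = sum(math.comb(n, k) for k in range(n + 1))  # 2**10 = 1024

print(subsets_of_5, all_subsets)
```

So even in the worst case, exhaustively memoizing every subset remains cheap at this scale.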