"sliding window" bigtable training mode #713
preprocessing.py (Outdated)

    def get_many_tpu_bt_input_tensors(games, games_nr, batch_size,
        start_at, num_datasets,
nit: indenting is wrong
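For reference, a sketch of the alignment the nit is asking for, with the continuation arguments lined up under the opening parenthesis. The parameter list after num_datasets is cut off in the diff, so everything past that point here is a guess:

```python
# Continuation arguments aligned under the opening parenthesis
# (PEP 8 style). The real signature continues past num_datasets;
# the diff truncates it, so the body here is a placeholder.
def get_many_tpu_bt_input_tensors(games, games_nr, batch_size,
                                  start_at, num_datasets):
    pass
```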
preprocessing.py (Outdated)

    # is proportionally along compared to last_game_number? comparing
    # timestamps?)
    ds = games.moves_from_games(start_at + (i * window_increment),
                                start_at + (i * window_increment) + window_size,
preprocessing.py (Outdated)

        shuffle=True,
        column_family=bigtable_input.TFEXAMPLE,
        column='example')
    ds = ds.repeat(1)
you can probably move the repeat and map out of this loop
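A pure-Python sketch of the suggested refactor (plain lists stand in for tf.data.Dataset objects here; the hypothetical names build_naive/build_hoisted are mine). Since repeat(1) is a no-op per shard and the map is elementwise, applying the map once after concatenation gives the same result as applying it inside the loop:

```python
from functools import reduce

def build_naive(shards, transform):
    # Before: the transform is applied per shard inside the loop,
    # mirroring ds.repeat(1) / ds.map(...) on each window.
    out = None
    for shard in shards:
        ds = [transform(x) for x in shard]
        out = out + ds if out is not None else ds
    return out

def build_hoisted(shards, transform):
    # After: concatenate all shards first, then transform once.
    combined = reduce(lambda a, b: a + b, shards, [])
    return [transform(x) for x in combined]
```

Both produce identical output, but the hoisted version adds one map node to the pipeline instead of one per window.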
        column='example')
    ds = ds.repeat(1)
    ds = ds.map(lambda row_name, s: s)
    dataset = dataset.concatenate(ds) if dataset else ds
Regarding the general approach: if the training loop does multiple scans, I would expect to create a new dataset for each pass, rather than try to create a single enormous dataset, which I imagine would be harder to debug, inspect, etc.
yes, but multiple calls to tpuestimator.train will create new graphs :( I am not sure what a good solution for lazy evaluation of these Datasets would be. As it is, it takes a really long time to build the datasets before training even starts -- I suspect the concatenate is doing something bad, as things get slower and slower.
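One hedged guess at the slowdown: each concatenate wraps the previous dataset, so N windows build a nested chain of depth N. A sketch of flattening in one step instead (plain iterators stand in for tf.data here; in tf.data terms this would be collecting the per-window datasets and combining them with a single flat_map or interleave rather than chained concatenate calls):

```python
import itertools

def flatten_once(shards):
    # One flat pass over all shards, instead of a depth-N chain of
    # pairwise concatenations. Works on any iterable of iterables.
    return itertools.chain.from_iterable(shards)
```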
train.py (Outdated)

    self.before_weights = None

    def train_many(start_at=1000000, num_datasets=3):
can you expose moves here also.
what do you mean? number of steps?
@amj: The following test failed.
Instead of repeatedly calling train, create repeated datasets by incrementing along the dataset window.
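The window arithmetic this describes can be sketched as follows (the function name window_bounds is mine; the bounds mirror the start_at + (i * window_increment) and + window_size expressions passed to moves_from_games in the diff above):

```python
def window_bounds(start_at, window_size, window_increment, num_datasets):
    """Yield (lo, hi) game-number ranges, one per training pass.

    Each window slides forward by window_increment, so successive
    passes train on overlapping slices of the game history.
    """
    for i in range(num_datasets):
        lo = start_at + i * window_increment
        yield lo, lo + window_size
```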
@gitosaurus this is kind of a first cut -- it has to do all the full key retrieval and shuffle before it can start, and it'd be great if I could make that lazy somehow.
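One possible shape for the laziness being asked about: wrap the expensive per-window fetch in a generator, so no keys are retrieved until the consumer actually iterates. This is a sketch only -- fetch_window is a hypothetical stand-in for the Bigtable key retrieval, and a real tf.data integration would need something like Dataset.from_generator:

```python
def lazy_windows(bounds, fetch_window):
    """Yield examples window by window, fetching each window on demand.

    bounds:       iterable of (lo, hi) game-number ranges.
    fetch_window: hypothetical callable doing the expensive retrieval
                  for one range (stands in for the Bigtable scan).
    """
    for lo, hi in bounds:
        # fetch_window runs only when this window is first consumed,
        # not when lazy_windows is constructed.
        yield from fetch_window(lo, hi)
```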