Allow choosing the torch device to store replay buffer #44
lostmsu wants to merge 2 commits into laboroai:main from
Conversation
@lostmsu Thank you for your PR. To reduce trait bounds for observation in agents like DQN, I added However, I don't see any speed-up with a replay buffer on GPU in an example (
@taku-y I tried your With GPU buffer:
Without (e.g. replaced the last
The difference in evaluation time seems suspicious though, as it should not be affected. If you want, I can run more iterations to get a better idea of whether those 3% were a random occurrence.
@lostmsu Thank you for your report. I think there is no statistical significance in speed for the ant environment. I will do an experiment on Atari Pong with a small buffer size. And regardless of the result, if the updated code looks good to you, I will merge this PR.
@taku-y I find it weird that the buffer needs to know where the model is. This would not work for multiple GPUs: the training loop needs to move data from where it is to where it has to be, not the buffer.
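The point above could be sketched as follows. This is a device-agnostic mock with hypothetical types (`Device`, `Batch`, `ReplayBuffer`, `train_step` are illustrative stand-ins, not the library's actual API): the buffer hands out batches on whatever device it stores them, and the training loop, which knows the model's device, performs the move.

```rust
// Illustrative mock: the *training loop* moves each sampled batch to
// the model's device; the buffer stays device-agnostic. None of these
// types come from the actual crate.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Device {
    Cpu,
    Cuda(usize),
}

#[derive(Clone)]
struct Batch {
    data: Vec<f32>,
    device: Device,
}

impl Batch {
    // Analogue of a tensor `to_device` call: copies to the target device.
    fn to(&self, device: Device) -> Batch {
        Batch { data: self.data.clone(), device }
    }
}

struct ReplayBuffer {
    storage: Vec<Batch>,
}

impl ReplayBuffer {
    // Returns a batch on whatever device the buffer happens to use.
    fn sample(&self) -> Batch {
        self.storage[0].clone()
    }
}

// The loop, not the buffer, decides where the data must go; this is
// what keeps a multi-GPU setup possible.
fn train_step(model_device: Device, buffer: &ReplayBuffer) -> Batch {
    buffer.sample().to(model_device)
}

fn main() {
    let buffer = ReplayBuffer {
        storage: vec![Batch { data: vec![0.0; 4], device: Device::Cpu }],
    };
    // Each model pulls the batch onto its own device.
    let batch = train_step(Device::Cuda(1), &buffer);
    assert_eq!(batch.device, Device::Cuda(1));
}
```

With this split, the same buffer can feed models on different devices without knowing about any of them.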
@taku-y also, is there a reason to keep
That sounds reasonable. Supporting multiple GPUs is a nice feature, but I don't have any idea how to implement it. I need to have a look at some papers and slides (e.g., paper and slides) and other RL libraries like Ray, PFRL. On the other hand, I would like agents to be generic, supporting mini-batch which could be a set of
I just wasn't aware of it. These buffers should be on a specified device.
@taku-y I spent some time trying to work this up to look pristine, and started to believe that making A good argument for this is that Do you have a scenario where this behavior (e.g. convert observations and actions to
What about
@lostmsu I think I should add an example to demonstrate the flexibility of the library.
This avoids the need to copy replay buffer samples from CPU to GPU on every training step. However, if GPU RAM is insufficient, the replay buffer can be kept on the CPU.
This requires `Obs` to be a `Tensor` (`Act` already is).
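A minimal sketch of the idea, with hypothetical stand-in types rather than the crate's real API: the buffer takes a storage device at construction and converts every pushed transition to it once, so sampling during training never triggers a host-to-device copy, while passing a CPU device keeps everything in host RAM when GPU memory is tight.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Device {
    Cpu,
    Cuda(usize),
}

// Stand-in for a tensor that knows which device it lives on.
#[derive(Clone, Debug)]
struct Tensor {
    data: Vec<f32>,
    device: Device,
}

impl Tensor {
    fn to(&self, device: Device) -> Tensor {
        Tensor { data: self.data.clone(), device }
    }
}

// Replay buffer parameterized by a storage device chosen up front.
struct ReplayBuffer {
    device: Device,
    obs: Vec<Tensor>,
}

impl ReplayBuffer {
    fn new(device: Device) -> Self {
        ReplayBuffer { device, obs: Vec::new() }
    }

    // Transitions are moved to the storage device once, on push,
    // instead of on every sample during training.
    fn push(&mut self, obs: Tensor) {
        self.obs.push(obs.to(self.device));
    }

    fn sample(&self) -> Tensor {
        self.obs.last().unwrap().clone()
    }
}

fn main() {
    // GPU-resident buffer: samples come back already on the GPU.
    let mut buf = ReplayBuffer::new(Device::Cuda(0));
    buf.push(Tensor { data: vec![0.5; 8], device: Device::Cpu });
    assert_eq!(buf.sample().device, Device::Cuda(0));

    // Fallback when GPU RAM is insufficient: keep storage on the CPU.
    let mut cpu_buf = ReplayBuffer::new(Device::Cpu);
    cpu_buf.push(Tensor { data: vec![0.5; 8], device: Device::Cpu });
    assert_eq!(cpu_buf.sample().device, Device::Cpu);
}
```

The design choice this illustrates is that device placement is a constructor argument, so the CPU fallback needs no code changes, only a different configuration value.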