Allow choosing the torch device to store replay buffer #44
lostmsu wants to merge 2 commits into laboroai:main from
Conversation
@lostmsu Thank you for your PR. To reduce trait bounds for observation in agents like DQN, I added However, I don't see any speed-up with a replay buffer on GPU in an example (
@taku-y I tried your With GPU buffer:
Without (e.g. replaced the last
The difference in evaluation time seems suspicious though, as it should not be affected. If you want, I can run more iterations to get a better idea of whether those 3% were a random occurrence.
@lostmsu Thank you for your report. I think there is no statistical significance in speed for the ant environment. I will do an experiment on Atari Pong with a small buffer size. And regardless of the result, if the updated code looks good to you, I will merge this PR.
@taku-y I find it weird that the buffer needs to know where the model is. This would not work for multiple GPUs: the training loop needs to move data from where it is to where it has to be, not the buffer.
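The point above could be sketched as follows. This is a device-agnostic mock with hypothetical types (`Device`, `Batch`, `ReplayBuffer`, `train_step` are illustrative stand-ins, not the library's actual API): the buffer hands out batches on whatever device it stores them, and the training loop, which knows the model's device, performs the move.

```rust
// Illustrative mock: the *training loop* moves each sampled batch to
// the model's device; the buffer stays device-agnostic. None of these
// types come from the actual crate.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Device {
    Cpu,
    Cuda(usize),
}

#[derive(Clone)]
struct Batch {
    data: Vec<f32>,
    device: Device,
}

impl Batch {
    // Analogue of a tensor `to_device` call: copies to the target device.
    fn to(&self, device: Device) -> Batch {
        Batch { data: self.data.clone(), device }
    }
}

struct ReplayBuffer {
    storage: Vec<Batch>,
}

impl ReplayBuffer {
    // Returns a batch on whatever device the buffer happens to use.
    fn sample(&self) -> Batch {
        self.storage[0].clone()
    }
}

// The loop, not the buffer, decides where the data must go; this is
// what keeps a multi-GPU setup possible.
fn train_step(model_device: Device, buffer: &ReplayBuffer) -> Batch {
    buffer.sample().to(model_device)
}

fn main() {
    let buffer = ReplayBuffer {
        storage: vec![Batch { data: vec![0.0; 4], device: Device::Cpu }],
    };
    // Each model pulls the batch onto its own device.
    let batch = train_step(Device::Cuda(1), &buffer);
    assert_eq!(batch.device, Device::Cuda(1));
}
```

With this split, the same buffer can feed models on different devices without knowing about any of them.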
@taku-y also, is there a reason to keep
That sounds reasonable. Supporting multiple GPUs is a nice feature, but I don't have any idea how to implement it. I need to have a look at some papers and slides (e.g., paper and slides) and other RL libraries like Ray, PFRL. On the other hand, I would like agents to be generic, supporting mini-batch which could be a set of
I just wasn't aware of it. These buffers should be on a specified device.
@taku-y I spent some time trying to work this up to look pristine, and started to believe that making A good argument for this is that Do you have a scenario where this behavior (e.g. convert observations and actions to
What about
@lostmsu I think I should add an example to demonstrate the flexibility of the library.
This avoids the need to copy replay buffer samples from CPU to GPU on every training step. However, if GPU RAM is insufficient, the replay buffer can be kept on the CPU.
This requires `Obs` to be a `Tensor` (`Act` already is).
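A minimal sketch of the idea, with hypothetical stand-in types rather than the crate's real API: the buffer takes a storage device at construction and converts every pushed transition to it once, so sampling during training never triggers a host-to-device copy, while passing a CPU device keeps everything in host RAM when GPU memory is tight.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Device {
    Cpu,
    Cuda(usize),
}

// Stand-in for a tensor that knows which device it lives on.
#[derive(Clone, Debug)]
struct Tensor {
    data: Vec<f32>,
    device: Device,
}

impl Tensor {
    fn to(&self, device: Device) -> Tensor {
        Tensor { data: self.data.clone(), device }
    }
}

// Replay buffer parameterized by a storage device chosen up front.
struct ReplayBuffer {
    device: Device,
    obs: Vec<Tensor>,
}

impl ReplayBuffer {
    fn new(device: Device) -> Self {
        ReplayBuffer { device, obs: Vec::new() }
    }

    // Transitions are moved to the storage device once, on push,
    // instead of on every sample during training.
    fn push(&mut self, obs: Tensor) {
        self.obs.push(obs.to(self.device));
    }

    fn sample(&self) -> Tensor {
        self.obs.last().unwrap().clone()
    }
}

fn main() {
    // GPU-resident buffer: samples come back already on the GPU.
    let mut buf = ReplayBuffer::new(Device::Cuda(0));
    buf.push(Tensor { data: vec![0.5; 8], device: Device::Cpu });
    assert_eq!(buf.sample().device, Device::Cuda(0));

    // Fallback when GPU RAM is insufficient: keep storage on the CPU.
    let mut cpu_buf = ReplayBuffer::new(Device::Cpu);
    cpu_buf.push(Tensor { data: vec![0.5; 8], device: Device::Cpu });
    assert_eq!(cpu_buf.sample().device, Device::Cpu);
}
```

The design choice this illustrates is that device placement is a constructor argument, so the CPU fallback needs no code changes, only a different configuration value.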