From 6f47a086113e640613e14c13bdda6f39347c6824 Mon Sep 17 00:00:00 2001
From: Dylan Madisetti
Date: Tue, 12 May 2020 15:21:00 -0400
Subject: [PATCH] Updated stale GDR link (contrib removed in tensorflow 2.x)
 :tada:

---
 tensorflow_networking/verbs/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tensorflow_networking/verbs/README.md b/tensorflow_networking/verbs/README.md
index 3137bfd..e7aedb7 100644
--- a/tensorflow_networking/verbs/README.md
+++ b/tensorflow_networking/verbs/README.md
@@ -27,7 +27,7 @@ During the server setup, an RDMA manager is created to manage low-level RDMA com
 TensorFlow dynamically allocates memory for tensors that are to be sent or received. This causes difficulty for RDMA operations where pinned memory is required. Few remedies are possible:
 1. The memory is pinned, transferred, then unpinned for each and every tensor to be transferred. This incurs significant operation overhead since pinning and unpinning memory for each dynamically generated tensor is slow.
 2. Buffer is pre-allocated and pinned for each tensor. This incurs large memory overhead and extra copying from the tensor to its pinned buffer, but may still be faster than the former.
-3. Following HKUST research on the use of GPU direct, and their [GDR implementation](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/gdr/README.md), there is a smart way to benefit from the TensorFlow allocation theme which is mostly pool based, i.e allocators pre-allocate a large memory block, and allocate the tensors from there. By attaching a custom Visitor to relevant allocators, we can do a single registration of the entire memory block, which zeros the registration overhead. Once the block is registered, each new tensor allocated will be at a registered address, which will allow us to do direct RDMA writes to it.
+3. Following HKUST research on the use of GPU direct, and their [GDR implementation](https://github.com/tensorflow/networking/blob/master/tensorflow_networking/gdr/README.md), there is a smart way to benefit from the TensorFlow allocation theme which is mostly pool based, i.e allocators pre-allocate a large memory block, and allocate the tensors from there. By attaching a custom Visitor to relevant allocators, we can do a single registration of the entire memory block, which zeros the registration overhead. Once the block is registered, each new tensor allocated will be at a registered address, which will allow us to do direct RDMA writes to it.
 For best performance, we will adopt HKUST 0 copies approach in our solution. This means:
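The paragraph touched by this patch (item 3) describes a one-time registration of a pool allocator's backing block, so that every tensor carved out of the pool already sits at an RDMA-registered address. Below is a minimal, self-contained C++ sketch of that cost model, not the actual TensorFlow code: `PoolAllocator` and `RegisterWithRdma` are hypothetical stand-ins, and a real implementation would attach the visitor through TensorFlow's allocator hooks and register the block with `ibv_reg_mr`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical stand-in for RDMA memory registration (real code would call
// ibv_reg_mr on the block and keep the returned memory region handle).
void RegisterWithRdma(void* base, size_t length) {
  std::printf("registered block at %p, %zu bytes\n", base, length);
}

// Minimal pool allocator sketch: one large block is allocated and registered
// up front; tensor buffers are carved out of it afterwards, so every tensor
// address already lies inside a registered region.
class PoolAllocator {
 public:
  using Visitor = std::function<void(void*, size_t)>;

  PoolAllocator(size_t pool_bytes, Visitor alloc_visitor)
      : pool_(pool_bytes), offset_(0) {
    // Single registration of the whole pool: zero per-tensor overhead.
    alloc_visitor(pool_.data(), pool_.size());
  }

  // Bump-pointer sub-allocation; no further registration is needed.
  void* Allocate(size_t bytes) {
    if (offset_ + bytes > pool_.size()) return nullptr;
    void* p = pool_.data() + offset_;
    offset_ += bytes;
    return p;
  }

 private:
  std::vector<uint8_t> pool_;
  size_t offset_;
};

int main() {
  PoolAllocator allocator(1 << 20, RegisterWithRdma);
  void* tensor_a = allocator.Allocate(4096);  // already RDMA-accessible
  void* tensor_b = allocator.Allocate(8192);  // already RDMA-accessible
  std::printf("tensor_a=%p tensor_b=%p\n", tensor_a, tensor_b);
  return 0;
}
```

The point of the sketch is that registration happens once per block rather than once per tensor, so the per-tensor allocation path stays a plain pointer bump and direct RDMA writes to freshly allocated tensors remain possible.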