github.com/torch/cutorch.git
author: Sam Gross <colesbury@gmail.com>, 2016-09-26 22:57:08 +0300
committer: Sam Gross <sgross@fb.com>, 2016-09-26 23:02:48 +0300
commit: d4ab31984d222ec5c02e099e6eb030d039ba778e
tree: ce20d4039b981486b03d99fd1ba9f6036159eba1 /README.md
parent: 7d5e3f89c34c1e215a3e9ef4d4439b5ba0f664a9
Add THC_CACHING_ALLOCATOR=1 to README.md
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 7
1 files changed, 6 insertions, 1 deletions
diff --git a/README.md b/README.md
index 89d8129..abdc00e 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,11 @@ Most other (besides float) CPU torch tensor types now have a cutorch equivalent,
**Note:** these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. `narrow`, `select`, `unfold`, `transpose`).
+### CUDA memory allocation
+Set the environment variable `THC_CACHING_ALLOCATOR=1` to enable the caching CUDA memory allocator.
+
+By default, cutorch calls `cudaMalloc` and `cudaFree` when CUDA tensors are allocated and freed. This is expensive because `cudaFree` synchronizes the CPU with the GPU. Setting `THC_CACHING_ALLOCATOR=1` will cause cutorch to cache and re-use CUDA allocations to avoid synchronizations.
+
###`cutorch.*` API
- `cutorch.synchronize()` : All of the CUDA API is asynchronous (barring a few functions), which means that you can queue up operations. To wait for the operations to finish, you can issue `cutorch.synchronize()` in your code, when the code waits for all GPU operations on the current GPU to finish. WARNING: synchronizes the CPU host with respect to the current device (as per `cutorch.getDevice()`) only.
- `cutorch.synchronizeAll()` : Same as `cutorch.synchronize()` except synchronizes the CPU host with all visible GPU devices in the system. Equivalent to calling `cutorch.synchronize()` once per each device.
@@ -104,4 +109,4 @@ Compared to version 1.0, these are the following API changes:
## Inconsistencies with CPU API
| operators | CPU | CUDA |
-|---|---|---|
\ No newline at end of file
+|---|---|---|
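
The README text added by this commit describes enabling the caching allocator through an environment variable. A minimal sketch of how that might look from a shell follows. The variable name and value come from the commit itself; the `train.lua` script name is hypothetical, and the sketch assumes Torch's `th` launcher is how the program is started.

```shell
# Enable the caching CUDA allocator for every process started from this shell.
export THC_CACHING_ALLOCATOR=1

# Confirm that child processes inherit the variable.
sh -c 'echo "THC_CACHING_ALLOCATOR=$THC_CACHING_ALLOCATOR"'

# To enable it for a single run instead, prefix the command.
# (train.lua is a hypothetical script name, not part of the commit.)
#   THC_CACHING_ALLOCATOR=1 th train.lua
```

The variable has to be set in the environment of the process that loads cutorch. Exporting it in the launching shell or prefixing the command both do that.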