Repository: github.com/torch/cutorch.git
author    Soumith Chintala <soumith@gmail.com>  2016-09-26 23:03:27 +0300
committer GitHub <noreply@github.com>  2016-09-26 23:03:27 +0300
commit    be61b4dd711d7c2dc5d0a706a1e17047ed593628 (patch)
tree      ce20d4039b981486b03d99fd1ba9f6036159eba1
parent    7d5e3f89c34c1e215a3e9ef4d4439b5ba0f664a9 (diff)
parent    d4ab31984d222ec5c02e099e6eb030d039ba778e (diff)

Merge pull request #512 from colesbury/master

Add THC_CACHING_ALLOCATOR=1 to README.md
-rw-r--r--  README.md              | 7
-rw-r--r--  lib/THC/CMakeLists.txt | 2
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 89d8129..abdc00e 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,11 @@ Most other (besides float) CPU torch tensor types now have a cutorch equivalent,
**Note:** these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. `narrow`, `select`, `unfold`, `transpose`).
+### CUDA memory allocation
+Set the environment variable `THC_CACHING_ALLOCATOR=1` to enable the caching CUDA memory allocator.
+
+By default, cutorch calls `cudaMalloc` and `cudaFree` when CUDA tensors are allocated and freed. This is expensive because `cudaFree` synchronizes the CPU with the GPU. Setting `THC_CACHING_ALLOCATOR=1` will cause cutorch to cache and re-use CUDA allocations to avoid synchronizations.
+
###`cutorch.*` API
- `cutorch.synchronize()` : All of the CUDA API is asynchronous (barring a few functions), which means that you can queue up operations. To wait for the operations to finish, you can issue `cutorch.synchronize()` in your code, when the code waits for all GPU operations on the current GPU to finish. WARNING: synchronizes the CPU host with respect to the current device (as per `cutorch.getDevice()`) only.
- `cutorch.synchronizeAll()` : Same as `cutorch.synchronize()` except synchronizes the CPU host with all visible GPU devices in the system. Equivalent to calling `cutorch.synchronize()` once per each device.
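The mechanism the added README text describes — caching freed blocks for re-use instead of returning them with a synchronizing `cudaFree` — can be sketched in Python. This is a hypothetical, simplified free-list keyed by allocation size, for illustration only; it is not cutorch's actual implementation.

```python
# Sketch of a size-bucketed caching allocator, analogous in spirit to the
# THC_CACHING_ALLOCATOR behaviour described in the patch (hypothetical).

class CachingAllocator:
    def __init__(self):
        self.free_blocks = {}   # size -> list of cached (freed) blocks
        self.next_id = 0
        self.raw_allocs = 0     # times we hit the "expensive" path (cudaMalloc)

    def malloc(self, size):
        cached = self.free_blocks.get(size)
        if cached:
            return cached.pop()          # re-use a cached block: no cudaMalloc
        self.raw_allocs += 1             # a real allocator would cudaMalloc here
        self.next_id += 1
        return (size, self.next_id)

    def free(self, block):
        size = block[0]
        # Instead of an expensive, synchronizing cudaFree, cache the block.
        self.free_blocks.setdefault(size, []).append(block)

alloc = CachingAllocator()
a = alloc.malloc(1024)
alloc.free(a)
b = alloc.malloc(1024)      # served from the cache, not a fresh allocation
print(alloc.raw_allocs)     # -> 1
print(a is b)               # -> True: the same block was re-used
```

The design choice this illustrates: `free` becomes cheap and asynchronous-friendly, at the cost of holding on to device memory that other processes cannot see as free.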
@@ -104,4 +109,4 @@ Compared to version 1.0, these are the following API changes:
## Inconsistencies with CPU API
| operators | CPU | CUDA |
-|---|---|---|
\ No newline at end of file
+|---|---|---|
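The flag this patch documents is an ordinary environment variable read at startup, so it can be set in the shell before launching a Torch process. A minimal sketch (the commented launch line assumes a working Torch/cutorch install):

```shell
# Enable the caching CUDA memory allocator for child processes.
export THC_CACHING_ALLOCATOR=1

# A Torch process started now would pick it up, e.g.:
#   th my_script.lua
echo "$THC_CACHING_ALLOCATOR"
```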
diff --git a/lib/THC/CMakeLists.txt b/lib/THC/CMakeLists.txt
index f2eab04..ac50349 100644
--- a/lib/THC/CMakeLists.txt
+++ b/lib/THC/CMakeLists.txt
@@ -103,7 +103,7 @@ ENDIF()
INCLUDE_DIRECTORIES("${CMAKE_CURRENT_BINARY_DIR}")
CONFIGURE_FILE(THCGeneral.h.in "${CMAKE_CURRENT_BINARY_DIR}/THCGeneral.h")
-SET(CMAKE_C_FLAGS "-std=c99 ${CMAKE_C_FLAGS} -g -O0")
+SET(CMAKE_C_FLAGS "-std=c99 ${CMAKE_C_FLAGS}")
SET(CMAKE_CXX_STANDARD 11)
SET(src
THCAllocator.c