| Field | Value | Date |
|---|---|---|
| author | Soumith Chintala <soumith@gmail.com> | 2016-09-26 23:03:27 +0300 |
| committer | GitHub <noreply@github.com> | 2016-09-26 23:03:27 +0300 |
| commit | be61b4dd711d7c2dc5d0a706a1e17047ed593628 | |
| tree | ce20d4039b981486b03d99fd1ba9f6036159eba1 | |
| parent | 7d5e3f89c34c1e215a3e9ef4d4439b5ba0f664a9 | |
| parent | d4ab31984d222ec5c02e099e6eb030d039ba778e | |
Merge pull request #512 from colesbury/master
Add THC_CACHING_ALLOCATOR=1 to README.md
-rw-r--r-- | README.md | 7 |
-rw-r--r-- | lib/THC/CMakeLists.txt | 2 |
2 files changed, 7 insertions, 2 deletions
@@ -27,6 +27,11 @@ Most other (besides float) CPU torch tensor types now have a cutorch equivalent,
 
 **Note:** these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. `narrow`, `select`, `unfold`, `transpose`).
 
+### CUDA memory allocation
+Set the environment variable `THC_CACHING_ALLOCATOR=1` to enable the caching CUDA memory allocator.
+
+By default, cutorch calls `cudaMalloc` and `cudaFree` when CUDA tensors are allocated and freed. This is expensive because `cudaFree` synchronizes the CPU with the GPU. Setting `THC_CACHING_ALLOCATOR=1` will cause cutorch to cache and re-use CUDA allocations to avoid synchronizations.
+
 ###`cutorch.*` API
 - `cutorch.synchronize()` : All of the CUDA API is asynchronous (barring a few functions), which means that you can queue up operations. To wait for the operations to finish, you can issue `cutorch.synchronize()` in your code, when the code waits for all GPU operations on the current GPU to finish. WARNING: synchronizes the CPU host with respect to the current device (as per `cutorch.getDevice()`) only.
 - `cutorch.synchronizeAll()` : Same as `cutorch.synchronize()` except synchronizes the CPU host with all visible GPU devices in the system. Equivalent to calling `cutorch.synchronize()` once per each device.
@@ -104,4 +109,4 @@ Compared to version 1.0, these are the following API changes:
 ## Inconsistencies with CPU API
 
 | operators | CPU | CUDA |
-|---|---|---|
\ No newline at end of file
+|---|---|---|
diff --git a/lib/THC/CMakeLists.txt b/lib/THC/CMakeLists.txt
index f2eab04..ac50349 100644
--- a/lib/THC/CMakeLists.txt
+++ b/lib/THC/CMakeLists.txt
@@ -103,7 +103,7 @@ ENDIF()
 
 INCLUDE_DIRECTORIES("${CMAKE_CURRENT_BINARY_DIR}")
 CONFIGURE_FILE(THCGeneral.h.in "${CMAKE_CURRENT_BINARY_DIR}/THCGeneral.h")
-SET(CMAKE_C_FLAGS "-std=c99 ${CMAKE_C_FLAGS} -g -O0")
+SET(CMAKE_C_FLAGS "-std=c99 ${CMAKE_C_FLAGS}")
 SET(CMAKE_CXX_STANDARD 11)
 SET(src
     THCAllocator.c
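
The README text added in this commit says the caching allocator re-uses CUDA allocations instead of calling `cudaFree` each time. The idea can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not cutorch's actual implementation: the class `CachingAllocator`, its counters, and the exact-size matching policy are all hypothetical, and integer ids stand in for device pointers.

```python
from collections import defaultdict


class CachingAllocator:
    """Toy model of a caching allocator (hypothetical, not cutorch's code)."""

    def __init__(self):
        self.free_blocks = defaultdict(list)  # size -> cached block ids
        self.live = {}                        # block id -> size
        self.next_id = 0
        self.device_mallocs = 0  # stand-in for the number of cudaMalloc calls
        self.device_frees = 0    # stand-in for the number of cudaFree calls

    def malloc(self, size):
        # Re-use a cached block of the same size when one is available.
        if self.free_blocks[size]:
            block = self.free_blocks[size].pop()
        else:
            # Cache miss: this is where a real allocator would call cudaMalloc.
            block = self.next_id
            self.next_id += 1
            self.device_mallocs += 1
        self.live[block] = size
        return block

    def free(self, block):
        # Keep the block in the cache instead of calling cudaFree,
        # so no CPU/GPU synchronization is triggered here.
        size = self.live.pop(block)
        self.free_blocks[size].append(block)

    def empty_cache(self):
        # Release every cached block; a real allocator would call cudaFree here.
        for blocks in self.free_blocks.values():
            self.device_frees += len(blocks)
            blocks.clear()


def run_demo():
    """Allocate and free a same-sized block repeatedly; return call counts."""
    alloc = CachingAllocator()
    for _ in range(100):
        block = alloc.malloc(1024)
        alloc.free(block)
    return alloc.device_mallocs, alloc.device_frees
```

In this model, a loop that repeatedly allocates and frees a same-sized block reaches the device-level allocation path once, and the free path only when `empty_cache()` is called. Per the README text, the real allocator is switched on by setting `THC_CACHING_ALLOCATOR=1` in the environment before the program starts.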