Repository: github.com/torch/cutorch.git
author    Soumith Chintala <soumith@gmail.com>  2016-09-26 23:03:27 +0300
committer GitHub <noreply@github.com>  2016-09-26 23:03:27 +0300
commit    be61b4dd711d7c2dc5d0a706a1e17047ed593628 (patch)
tree      ce20d4039b981486b03d99fd1ba9f6036159eba1
parent    7d5e3f89c34c1e215a3e9ef4d4439b5ba0f664a9 (diff)
parent    d4ab31984d222ec5c02e099e6eb030d039ba778e (diff)

Merge pull request #512 from colesbury/master

Add THC_CACHING_ALLOCATOR=1 to README.md
-rw-r--r--  README.md              | 7
-rw-r--r--  lib/THC/CMakeLists.txt | 2
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 89d8129..abdc00e 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,11 @@ Most other (besides float) CPU torch tensor types now have a cutorch equivalent,
**Note:** these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. `narrow`, `select`, `unfold`, `transpose`).
+### CUDA memory allocation
+Set the environment variable `THC_CACHING_ALLOCATOR=1` to enable the caching CUDA memory allocator.
+
+By default, cutorch calls `cudaMalloc` and `cudaFree` when CUDA tensors are allocated and freed. This is expensive because `cudaFree` synchronizes the CPU with the GPU. Setting `THC_CACHING_ALLOCATOR=1` will cause cutorch to cache and re-use CUDA allocations to avoid synchronizations.
+
###`cutorch.*` API
- `cutorch.synchronize()` : All of the CUDA API is asynchronous (barring a few functions), which means that you can queue up operations. To wait for the operations to finish, you can issue `cutorch.synchronize()` in your code, when the code waits for all GPU operations on the current GPU to finish. WARNING: synchronizes the CPU host with respect to the current device (as per `cutorch.getDevice()`) only.
- `cutorch.synchronizeAll()` : Same as `cutorch.synchronize()` except synchronizes the CPU host with all visible GPU devices in the system. Equivalent to calling `cutorch.synchronize()` once per each device.
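The mechanism the added README text describes — caching freed blocks for re-use instead of returning them with a synchronizing `cudaFree` — can be sketched in Python. This is a hypothetical, simplified free-list keyed by allocation size, for illustration only; it is not cutorch's actual implementation.

```python
# Sketch of a size-bucketed caching allocator, analogous in spirit to the
# THC_CACHING_ALLOCATOR behaviour described in the patch (hypothetical).

class CachingAllocator:
    def __init__(self):
        self.free_blocks = {}   # size -> list of cached (freed) blocks
        self.next_id = 0
        self.raw_allocs = 0     # times we hit the "expensive" path (cudaMalloc)

    def malloc(self, size):
        cached = self.free_blocks.get(size)
        if cached:
            return cached.pop()          # re-use a cached block: no cudaMalloc
        self.raw_allocs += 1             # a real allocator would cudaMalloc here
        self.next_id += 1
        return (size, self.next_id)

    def free(self, block):
        size = block[0]
        # Instead of an expensive, synchronizing cudaFree, cache the block.
        self.free_blocks.setdefault(size, []).append(block)

alloc = CachingAllocator()
a = alloc.malloc(1024)
alloc.free(a)
b = alloc.malloc(1024)      # served from the cache, not a fresh allocation
print(alloc.raw_allocs)     # -> 1
print(a is b)               # -> True: the same block was re-used
```

The design choice this illustrates: `free` becomes cheap and asynchronous-friendly, at the cost of holding on to device memory that other processes cannot see as free.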
@@ -104,4 +109,4 @@ Compared to version 1.0, these are the following API changes:
## Inconsistencies with CPU API
| operators | CPU | CUDA |
-|---|---|---|
\ No newline at end of file
+|---|---|---|
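The flag this patch documents is an ordinary environment variable read at startup, so it can be set in the shell before launching a Torch process. A minimal sketch (the commented launch line assumes a working Torch/cutorch install):

```shell
# Enable the caching CUDA memory allocator for child processes.
export THC_CACHING_ALLOCATOR=1

# A Torch process started now would pick it up, e.g.:
#   th my_script.lua
echo "$THC_CACHING_ALLOCATOR"
```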
diff --git a/lib/THC/CMakeLists.txt b/lib/THC/CMakeLists.txt
index f2eab04..ac50349 100644
--- a/lib/THC/CMakeLists.txt
+++ b/lib/THC/CMakeLists.txt
@@ -103,7 +103,7 @@ ENDIF()
INCLUDE_DIRECTORIES("${CMAKE_CURRENT_BINARY_DIR}")
CONFIGURE_FILE(THCGeneral.h.in "${CMAKE_CURRENT_BINARY_DIR}/THCGeneral.h")
-SET(CMAKE_C_FLAGS "-std=c99 ${CMAKE_C_FLAGS} -g -O0")
+SET(CMAKE_C_FLAGS "-std=c99 ${CMAKE_C_FLAGS}")
SET(CMAKE_CXX_STANDARD 11)
SET(src
THCAllocator.c