Adds a caching allocator for CUDA pinned (page-locked) memory. This
avoids the implicit synchronization caused by cudaFreeHost or
cudaHostUnregister, at the expense of potentially higher host memory usage.
Correctness is preserved by recording CUDA events after each
cudaMemcpyAsync involving the pinned memory. A pinned memory
allocation is not reused until all events associated with it have
completed.
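
The reuse rule above can be sketched as a small simulation. This is a hypothetical Python model, not the cutorch code: `FakeEvent`, `Block`, `CachingHostAllocator`, `record_event` and the fake addresses are illustrative names only. A real allocator would obtain memory with cudaHostAlloc, record events with cudaEventRecord after each cudaMemcpyAsync, and poll them with cudaEventQuery.

```python
from dataclasses import dataclass, field


class FakeEvent:
    """Hypothetical stand-in for a CUDA event recorded after cudaMemcpyAsync."""

    def __init__(self):
        self.done = False

    def query(self):
        # A real allocator would call cudaEventQuery here.
        return self.done


@dataclass
class Block:
    ptr: int
    size: int
    events: list = field(default_factory=list)  # outstanding copies using this block


class CachingHostAllocator:
    """Simulated caching allocator for pinned host memory (no real CUDA calls)."""

    def __init__(self):
        self._next_ptr = 0x1000  # fake addresses instead of cudaHostAlloc results
        self._blocks = {}        # ptr -> Block, every block ever allocated
        self._cached = []        # freed blocks kept for reuse instead of cudaFreeHost
        self.raw_allocs = 0      # how many times fresh pinned memory was "allocated"

    def malloc(self, size):
        for block in self._cached:
            # Reuse only if big enough and every recorded copy has completed.
            if block.size >= size and all(e.query() for e in block.events):
                self._cached.remove(block)
                block.events.clear()
                return block.ptr
        # No reusable block: fall back to a fresh (simulated) pinned allocation.
        ptr = self._next_ptr
        self._next_ptr += size
        self._blocks[ptr] = Block(ptr, size)
        self.raw_allocs += 1
        return ptr

    def record_event(self, ptr, event):
        # Call after each cudaMemcpyAsync that reads or writes this block.
        self._blocks[ptr].events.append(event)

    def free(self, ptr):
        # No cudaFreeHost, hence no synchronization: the block is only cached.
        self._cached.append(self._blocks[ptr])
```

Freeing a block whose copy is still in flight is cheap, but the next request of the same size gets fresh memory. Once the event completes, the cached block is handed out again. The extra blocks held in the cache are the "potentially higher host memory usage" mentioned above.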
Use a single, global THCCachingAllocator instance.
Previously, each Lua thread had its own THCCachingAllocator instance.
However, threads can share storages, so a segment could be allocated
from one THCCachingAllocator and freed on another. That breaks, because
the freeing instance has no record of the segment.
Fixes #539
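
The failure mode and the fix can be illustrated with a hypothetical Python model. `DeviceAllocator`, `get_allocator` and the fake pointer values are illustrative assumptions, and the caching logic is omitted. The point is that an instance only knows about blocks it handed out, so a single lock-protected, process-wide instance lets any thread free what another thread allocated.

```python
import threading


class DeviceAllocator:
    """Hypothetical, simplified allocator; the caching logic is omitted."""

    def __init__(self):
        self._lock = threading.Lock()  # one instance is now shared by all threads
        self._allocated = {}           # ptr -> size for blocks handed out by this instance
        self._next_ptr = 0x1000

    def malloc(self, size):
        with self._lock:
            ptr = self._next_ptr
            self._next_ptr += size
            self._allocated[ptr] = size
            return ptr

    def free(self, ptr):
        with self._lock:
            if ptr not in self._allocated:
                # The per-thread failure mode: this instance never allocated ptr.
                raise ValueError(f"unknown pointer {ptr:#x}")
            del self._allocated[ptr]


# A single process-wide instance, used by every thread.
_GLOBAL_ALLOCATOR = DeviceAllocator()


def get_allocator():
    return _GLOBAL_ALLOCATOR
```

With separate instances, `b.free(p)` on a pointer from `a.malloc(...)` raises. With `get_allocator()`, a block allocated on one thread can be freed on another.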
This reverts commit d88ac24c712e3a40d4aaf3ac2d043bd79ba4280e.

Revert "Auto device mode, plus allocation helper functions."
This reverts commit 47a2f6de252c2254234edfc1c6115229b5383bac.
- Change :cuda(device) overload to :cudaOn(device)
- Add :cloneOn(device)
- Fix a bug in the +, -, *, / metamethods: checkGPU was not being
  called in these metamethods.
- Add a description of auto-device mode to README.md
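
The metamethod fix can be illustrated with a Python analogue of the Lua `__add`, `__sub`, `__mul` and `__div` metamethods. This is a hypothetical sketch, not the cutorch code. It assumes that checkGPU's job is to reject operands that live on different devices. `Tensor`, `check_gpu` and `DeviceMismatchError` are illustrative names.

```python
class DeviceMismatchError(RuntimeError):
    pass


def check_gpu(*tensors):
    # Hypothetical analogue of checkGPU: every operand must be on the same device.
    devices = {t.device for t in tensors}
    if len(devices) > 1:
        raise DeviceMismatchError(f"operands on devices {sorted(devices)}")


class Tensor:
    """Toy stand-in for a GPU tensor: a list of values plus a device index."""

    def __init__(self, values, device=0):
        self.values = list(values)
        self.device = device

    def _binary(self, other, op):
        check_gpu(self, other)  # the check that was missing from the metamethods
        return Tensor([op(a, b) for a, b in zip(self.values, other.values)], self.device)

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._binary(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._binary(other, lambda a, b: a * b)

    def __truediv__(self, other):
        return self._binary(other, lambda a, b: a / b)
```

Routing all four operators through one helper keeps the check in a single place, so a new operator cannot silently skip it.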