
github.com/torch/cutorch.git
path: root/init.c
Age | Commit message | Author
2017-03-08 | Add CUDA caching allocator accessor | Guillaume Klein
2017-02-07 | Static build support + Query CUDA driver, runtime versions (#695) | Pavan Yalamanchili
2016-12-29 | Add THHalfTensor support to cutorch (#655) | gchanan
* Add THHalfTensor support to cutorch.
2016-12-22 | Enable the CUDA caching allocators by default | Sam Gross
A few of us have been using this extensively without problems. This avoids synchronizations due to cudaFree calls, which makes it much easier to write performant CUDA code.
2016-12-02 | Add caching allocator for pinned (host) memory | Sam Gross
Adds a caching allocator for CUDA pinned (page-locked) memory. This avoids synchronization due to cudaFreeHost or cudaHostUnregister at the expense of potentially higher host memory usage. Correctness is preserved by recording CUDA events after each cudaMemcpyAsync involving the pinned memory. The pinned memory allocations are not reused until all events associated with them have completed.
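A minimal sketch of that event-tracking scheme, for illustration only (the PinnedBlock type and the function names below are invented, not cutorch's):

    #include <cuda_runtime.h>
    #include <vector>

    // A cached pinned buffer plus the events that must complete before the
    // buffer may be handed out again.
    struct PinnedBlock {
      void* ptr;
      size_t size;
      std::vector<cudaEvent_t> events;
    };

    // Record an event right after an async copy that reads from the block,
    // so the cache knows when the copy has actually finished.
    void copy_to_device_async(PinnedBlock& block, void* dst, size_t bytes,
                              cudaStream_t stream) {
      cudaMemcpyAsync(dst, block.ptr, bytes, cudaMemcpyHostToDevice, stream);
      cudaEvent_t ev;
      cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);
      cudaEventRecord(ev, stream);
      block.events.push_back(ev);
    }

    // The block may be reused only once every recorded event has completed.
    bool block_reusable(PinnedBlock& block) {
      for (cudaEvent_t ev : block.events)
        if (cudaEventQuery(ev) != cudaSuccess) return false;  // copy still in flight
      for (cudaEvent_t ev : block.events) cudaEventDestroy(ev);
      block.events.clear();
      return true;
    }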
2016-12-01 | Adds a CUDA "sleep" kernel | Sam Gross
Adds a CUDA "sleep" kernel which spins for the given number of iterations. This is useful for testing correct synchronization with streams.
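A spin kernel in this spirit can be written with clock64() (a sketch; the actual cutorch kernel may count iterations differently):

    #include <cuda_runtime.h>

    // Busy-wait for roughly `cycles` clock cycles so that anything queued
    // behind this kernel on the same stream is visibly delayed.
    __global__ void spin_kernel(long long cycles) {
      long long start = clock64();
      while (clock64() - start < cycles) {
        // spin
      }
    }

    // Usage: launch on a stream, then verify that dependent work waits.
    //   spin_kernel<<<1, 1, 0, stream>>>(100000000LL);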
2016-11-26 | Lazily initialize CUDA devices | Sam Gross
Previously, cutorch would initialize every CUDA device and enable P2P access between all pairs. This slows down start-up, especially with 8 devices. Now, THCudaInit does not initialize any devices and P2P access is enabled lazily. Setting the random number generator seed also does not initialize the device until random numbers are actually used.
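A rough sketch of that lazy pattern (illustrative only; the flag array and the helper below are invented for this example):

    #include <cuda_runtime.h>

    #define MAX_DEVICES 64
    static int device_initialized[MAX_DEVICES];  // zero until first real use

    // Called from any code path that actually touches `dev`; start-up itself
    // no longer initializes anything.
    static void ensure_device_initialized(int dev) {
      if (device_initialized[dev]) return;
      cudaSetDevice(dev);
      cudaFree(0);                 // forces context creation for this device only
      device_initialized[dev] = 1;
      // Peer-to-peer access would likewise be enabled here on demand,
      // rather than for every pair of devices up front.
    }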
2016-11-24 | Revert "Lazily initialize CUDA devices" [revert-610-lazy] | Soumith Chintala
2016-11-24 | Merge pull request #610 from colesbury/lazy | Soumith Chintala
Lazily initialize CUDA devices
2016-11-24 | Implemented cudaMemGetInfo for caching allocator (#600) | Boris Fomitchev
* Implemented cudaMemGetInfo for caching allocator
2016-11-23 | Lazily initialize CUDA devices | Sam Gross
Previously, cutorch would initialize every CUDA device and enable P2P access between all pairs. This slows down start-up, especially with 8 devices. Now, THCudaInit does not initialize any devices and P2P access is enabled lazily. Setting the random number generator seed also does not initialize the device until random numbers are actually used.
2016-11-05 | THC UVA Allocator | Nicolas Vasilache
2016-10-18 | correct input types to lua_pushboolean | soumith
2016-10-17 | guards for half | Soumith Chintala
2016-10-15 | Add stream API that is not based on indices | Sam Gross
This implements the THC code so that we can expose streams as objects instead of simply referring to them by indices. This is not exposed in Lua yet.
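One way to picture the object form, as opposed to a bare index (a sketch; THCStream's real definition and reference counting differ):

    #include <cuda_runtime.h>
    #include <stdlib.h>

    // A stream handle passed around by pointer instead of by integer index.
    typedef struct {
      cudaStream_t stream;
      int device;
      int refcount;
    } StreamHandle;

    StreamHandle* stream_new(int device) {
      StreamHandle* s = (StreamHandle*)malloc(sizeof(StreamHandle));
      cudaSetDevice(device);
      cudaStreamCreateWithFlags(&s->stream, cudaStreamNonBlocking);
      s->device = device;
      s->refcount = 1;
      return s;
    }

    void stream_release(StreamHandle* s) {
      if (--s->refcount == 0) {
        cudaStreamDestroy(s->stream);
        free(s);
      }
    }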
2016-10-14 | Fix caching allocator when used from multiple Lua threads | Sam Gross
Use a single, global THCCachingAllocator instance. Previously, each Lua thread had its own THCCachingAllocator instance. However, threads can share storages, which means a segment could be allocated on one THCCachingAllocator and freed on another, which breaks. Fixes #539
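Schematically, the fix replaces per-thread allocators with one shared instance (a hypothetical sketch; the class below merely stands in for THCCachingAllocator and uses malloc where the real code manages cudaMalloc'd blocks):

    #include <cstdlib>
    #include <mutex>
    #include <unordered_map>

    class SharedCachingAllocator {
      std::mutex lock_;
      std::unordered_map<void*, size_t> live_;  // stand-in for real bookkeeping
     public:
      void* allocate(size_t n) {
        std::lock_guard<std::mutex> g(lock_);
        void* p = std::malloc(n);
        live_[p] = n;
        return p;
      }
      void release(void* p) {
        std::lock_guard<std::mutex> g(lock_);
        live_.erase(p);   // valid regardless of which thread allocated p
        std::free(p);
      }
    };

    // A single process-wide instance: a block allocated on one thread can be
    // freed on another because both go through the same bookkeeping.
    static SharedCachingAllocator g_allocator;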
2016-10-14 | adding hasHalfInstructions and hasFastHalfInstructions exposed to lua | soumith
2016-09-30 | Make some basic THC operations thread-safe | Sam Gross
Switching the device, setting the stream, and switching BLAS handles are now thread-safe. Some other operations, like reserveStreams, are still not thread-safe.
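One common way to get there is to keep the per-thread pieces of state in thread-local storage (a sketch under that assumption; not necessarily how THC implements it):

    #include <cuda_runtime.h>

    // Current device and stream tracked per thread, so concurrent callers
    // cannot clobber each other's selection.
    static __thread int current_device = 0;
    static __thread cudaStream_t current_stream = 0;  // 0 = default stream

    void set_device(int dev) {
      cudaSetDevice(dev);   // CUDA's current device is already per-thread
      current_device = dev;
    }

    void set_stream(cudaStream_t s) {
      current_stream = s;   // affects only the calling thread
    }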
2016-09-25 | Add CUDA caching allocator | Sam Gross
The allocator can be enabled by setting the environment variable THC_CACHING_ALLOCATOR=1.
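Checking such a flag at start-up might look like this (a sketch; only the variable name THC_CACHING_ALLOCATOR comes from the commit message):

    #include <stdlib.h>
    #include <string.h>

    // Returns non-zero when the caching allocator should be installed.
    static int caching_allocator_enabled(void) {
      const char* v = getenv("THC_CACHING_ALLOCATOR");
      return v != NULL && strcmp(v, "1") == 0;
    }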
2016-07-29 | Merge pull request #456 from torch/more-cutorch-template-types | Soumith Chintala
reduce and BLAS work
2016-07-29 | reduce and BLAS work | Jeff Johnson
2016-06-29 | added field driverVersion to cutorch | Lukas Cavigelli
2016-06-11 | add half cwrap type and enable math for CudaHalfTensor | soumith
2016-06-11 | template work | Jeff Johnson
2016-03-28 | Merge pull request #355 from apaszke/fp16 | Soumith Chintala
Add FP16 support (CudaHalfStorage, CudaHalfTensor)
2016-03-14 | kernel p2p access and non-blocking streams | Jeff Johnson
2016-03-13 | Add FP16 support (CudaHalfStorage, CudaHalfTensor) | Adam Paszke
2016-02-26 | properly shutdown and free cutorch on exit | soumith
2015-12-29 | synchronizeAll | Jeff Johnson
2015-12-26 | Add generic CudaTensor types to cutorch | Adam Lerer
2015-11-13 | cutorch copy/event changes | Jeff Johnson
2015-08-21 | Merge pull request #222 from torch/streamfixes | Soumith Chintala
stream event fixes
2015-08-21 | streams patch from nvidia | Soumith Chintala
2015-08-19 | cutorch gc | Adam Lerer
2015-06-24 | Add MAGMA implementations of Torch LAPACK functions | Sam Gross
2015-06-24 | Stream support for BLAS Handles. | soumith
maskedCopy implemented; generic Reduce kernels
2015-05-23 | Fixing call to cudaMemGetInfo to use the correct device. | ztaylor
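cudaMemGetInfo always reports on the current device, so querying a specific device means switching and restoring around the call (a sketch; error checking omitted):

    #include <cuda_runtime.h>
    #include <stddef.h>

    // Query free/total memory of `dev` without permanently changing the
    // caller's current device.
    void device_mem_info(int dev, size_t* free_bytes, size_t* total_bytes) {
      int prev;
      cudaGetDevice(&prev);
      cudaSetDevice(dev);
      cudaMemGetInfo(free_bytes, total_bytes);
      cudaSetDevice(prev);
    }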
2015-05-19 | Add CudaHostAllocator | Dominik Grewe
* A new allocator that uses cudaMallocHost.
* cutorch.createCudaHostTensor(...) to create FloatTensor allocated with CudaHostAllocator.
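The allocator essentially routes allocations through the pinned-memory API (a sketch with invented function names; the real allocator also has to fit the TH allocator interface):

    #include <cuda_runtime.h>
    #include <stddef.h>

    // Allocate page-locked host memory instead of ordinary heap memory, so
    // host<->device copies involving it can run asynchronously.
    static void* host_alloc(size_t size) {
      void* ptr = NULL;
      if (cudaMallocHost(&ptr, size) != cudaSuccess) return NULL;
      return ptr;
    }

    static void host_free(void* ptr) {
      if (ptr) cudaFreeHost(ptr);
    }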
2015-05-13 | Revert "Auto device: API changes, bug fixes, README.md" | Adam Lerer
This reverts commit d88ac24c712e3a40d4aaf3ac2d043bd79ba4280e. Revert "Auto device mode, plus allocation helper functions." This reverts commit 47a2f6de252c2254234edfc1c6115229b5383bac.
2015-04-29 | Lua 5.2 compatibility | Sam Gross
2015-04-29 | Auto device mode, plus allocation helper functions. | Adam Lerer
This diff introduces an alternative way of writing multi-GPU cutorch code. In this mode, the location of each tensor is specified, and the appropriate GPU for each kernel is determined automatically based on the location of its argument tensors. It's backwards-compatible and interoperable with the old-style multi-GPU API.
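In spirit, the automatic dispatch looks like this (a hypothetical sketch; the tensor struct and launcher are invented, and the feature itself was later reverted, as the 2015-05-13 entry shows):

    #include <cuda_runtime.h>

    // Hypothetical tensor record carrying the device it lives on.
    typedef struct {
      float* data;
      int n;
      int device;   // GPU on which `data` was allocated
    } DeviceTensor;

    __global__ void scale_kernel(float* p, int n, float a) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) p[i] *= a;
    }

    // The launcher switches to the tensor's own device before launching, so
    // callers never have to pick the GPU themselves.
    void scale(DeviceTensor* t, float a) {
      int prev;
      cudaGetDevice(&prev);
      cudaSetDevice(t->device);
      scale_kernel<<<(t->n + 255) / 256, 256>>>(t->data, t->n, a);
      cudaSetDevice(prev);
    }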
2015-04-09 | adding optional device id to getMemoryUsage | soumith
2015-04-09 | deprecating deviceReset | soumith
2015-04-07 | adding cutorch streams | soumith
2015-04-01 | revamps TensorMath to remove sync points at many places, adds maskedSelect and maskedFill operations (and tests). Also adds generic Reduce and Apply kernels that can be reused. | soumith
2015-03-27 | Recreate cuBLAS handle on deviceReset. | Dominik Grewe
Only need to reset the cuBLAS handle for the current device, because only resources associated with the current device will be reset by cudaDeviceReset.
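Concretely, the handle for the current device is torn down and recreated around the reset (a sketch; error handling omitted):

    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // cudaDeviceReset invalidates resources tied to the current device only,
    // so only that device's cuBLAS handle needs to be rebuilt.
    void reset_current_device(cublasHandle_t* handle) {
      cublasDestroy(*handle);   // old handle belongs to the destroyed context
      cudaDeviceReset();
      cublasCreate(handle);     // fresh handle bound to the new context
    }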
2015-01-14 | Pass a state to every THC function. | Dominik Grewe
Every THC function gets a THCState pointer as the first argument. Some generic files that were previously included have been instantiated because TH functions currently don't get a state parameter.
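The convention amounts to threading an explicit context argument through every call, roughly as below (a sketch; the fields are placeholders, not THCState's actual layout):

    #include <cuda_runtime.h>

    // Placeholder for the library-wide context that replaces global state.
    typedef struct {
      int num_devices;
      cudaStream_t current_stream;
      /* RNG state, BLAS handles, allocators, ... */
    } StateSketch;

    // Every function takes the state as its first argument instead of
    // reaching for globals.
    void sketch_synchronize(StateSketch* state) {
      cudaStreamSynchronize(state->current_stream);
    }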
2014-11-19 | Reset RNG state after device reset. | Dominik Grewe
A device reset destroys the state of the RNG, so we have to re-initialize it after each reset.
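With cuRAND, for example, that means rebuilding the generator after the reset and reapplying the seed (a sketch; cutorch's own RNG bookkeeping is more involved):

    #include <curand.h>
    #include <cuda_runtime.h>

    // After cudaDeviceReset the old generator is gone with its context;
    // create a new one and reapply the seed so results stay reproducible.
    void reset_device_and_rng(curandGenerator_t* gen, unsigned long long seed) {
      curandDestroyGenerator(*gen);
      cudaDeviceReset();
      curandCreateGenerator(gen, CURAND_RNG_PSEUDO_DEFAULT);
      curandSetPseudoRandomGeneratorSeed(*gen, seed);
    }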
2014-11-12 | fixed two implicit declaration bugs (found with -Werror) | soumith
2014-11-12 | adding getDevice for tensor, manualSeedAll and seedAll | soumith