Lazily initialize CUDA devices
* Implemented cudaMemGetInfo for caching allocator
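
A minimal sketch of how such a query might be wired up, assuming a
hypothetical cached_bytes counter for the allocator's bookkeeping (the
actual THC caching-allocator state is not shown here):

    #include <cuda_runtime.h>
    #include <stddef.h>

    /* Sketch: report device memory, counting bytes held in the allocator's
     * cache as free. cached_bytes is a hypothetical stand-in for the caching
     * allocator's internal bookkeeping, not the actual THC state. */
    static cudaError_t allocatorMemGetInfo(size_t cached_bytes,
                                           size_t *freeBytes,
                                           size_t *totalBytes) {
      cudaError_t err = cudaMemGetInfo(freeBytes, totalBytes);
      if (err != cudaSuccess) return err;
      /* Cached blocks are reserved from CUDA's point of view but are
       * immediately reusable by callers, so count them as free. */
      *freeBytes += cached_bytes;
      return cudaSuccess;
    }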
Previously, cutorch would initialize every CUDA device and enable P2P
access between all pairs. This slowed down start-up, especially with 8
devices. Now, THCudaInit does not initialize any devices, and P2P access
is enabled lazily. Setting the random number generator seed also does
not initialize the device until random numbers are actually used.
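
Roughly, the lazy P2P pattern looks like the sketch below. This is an
illustrative rewrite, not the actual THCudaInit code; the flag table and
lazyEnableP2P are hypothetical names:

    #include <cuda_runtime.h>

    #define MAX_DEVICES 64

    /* Hypothetical per-pair flag table standing in for cutorch's state. */
    static int p2pEnabled[MAX_DEVICES][MAX_DEVICES];

    /* Enable peer access between dev and peerDev on first use, rather
     * than for every pair of devices at start-up. */
    static void lazyEnableP2P(int dev, int peerDev) {
      if (dev == peerDev || p2pEnabled[dev][peerDev]) return;
      int canAccess = 0;
      cudaDeviceCanAccessPeer(&canAccess, dev, peerDev);
      if (canAccess) {
        cudaSetDevice(dev);
        cudaDeviceEnablePeerAccess(peerDev, 0);  /* flags must be 0 */
      }
      p2pEnabled[dev][peerDev] = 1;  /* remember the check either way */
    }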
guard random functions for half
Add half support for addmv and addr.
fix memory leak in (equal)
Implement fmod, remainder, equal in Cutorch
Magma functions to generic
[cutorch] remove syncing point from baddbmm
This change removes HtoD copies inside baddbmm. These copies
introduce a syncing point, which causes slowdowns in multi-GPU
training.
Test plan: run unit tests for baddbmm.
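
For context: cudaMemcpy from pageable host memory blocks until the copy
completes, which is where the syncing point came from. Batched GEMM
(cublasSgemmBatched) needs an array of per-batch pointers; one way to
avoid copying that array from the host is to build it on the device. A
hedged sketch, with createBatchPointers as an illustrative kernel name:

    #include <cuda_runtime.h>

    /* Sketch: compute each batch's matrix pointer in a kernel instead of
     * filling the array on the host and cudaMemcpy'ing it over (that HtoD
     * copy is what synchronized the stream). Names are illustrative. */
    __global__ void createBatchPointers(const float *base, const float **out,
                                        long batchStride, int numBatches) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < numBatches)
        out[i] = base + (long)i * batchStride;
    }

The device-resident pointer array can then be handed to cublasSgemmBatched
with no host round-trip.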
move random functions to generic (attempt 2)
Revert "Move random functions to generic"
Move random functions to generic
unit tests
function generic
generic