Age | Commit message | Author | |
---|---|---|---|
2016-12-17 | Revert "Bugfix of type in THCTensor macro." (revert-639-patch-1) | Soumith Chintala | |
2016-12-16 | Merge pull request #639 from popol1991/patch-1 | Soumith Chintala | |
Bugfix of type in THCTensor macro. | |||
2016-12-16 | Bugfix of type in THCTensor macro. | Gao Yingkai | |
A fix for issue #632. | |||
2016-12-16 | Merge pull request #637 from pavanky/test-fixes | Soumith Chintala | |
Fixing various tests | |||
2016-12-16 | Fixing various tests | Pavan Yalamanchili | |
- Increased the number of elements being used by distributions. - Fixed indexFill to generate a number that can be used by all types. | |||
2016-12-15 | fix wrong export directive for THCCachingHostAllocator (#633) | Eric Cosatto | |
2016-12-14 | Merge pull request #630 from apaszke/bernoulli | Soumith Chintala | |
Implement bernoulli with element-wise probabilities for all types | |||
2016-12-13 | Implement bernoulli with element-wise probabilities for all types | Adam Paszke | |
2016-12-12 | Merge pull request #628 from killeent/more-documentation | Soumith Chintala | |
TensorInfo related code documentation | |||
2016-12-12 | TensorInfo related code documentation | Trevor Killeen | |
2016-12-02 | Merge pull request #619 from colesbury/cached_pinned_memory_fix | Soumith Chintala | |
Process outstanding CUDA events in recordEvent | |||
2016-12-02 | Process outstanding CUDA events in recordEvent | Sam Gross | |
Without this, the cuda_events queue could grow continuously from calls to cudaMemcpyAsync but would never be processed if there were no new pinned memory allocations. For example: `t1 = cutorch.createCudaHostTensor(10); t2 = torch.CudaTensor(10); while true do t2:copyAsync(t1) end` | |||
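The fix described above can be sketched as follows. This is a hypothetical host-side illustration, not the actual cutorch code: `Event` stands in for a `cudaEvent_t`, and `done` mimics `cudaEventQuery`. The key point is that recordEvent drains completed events from the front of the queue before appending a new one, so the queue stays bounded even when no new pinned allocations occur.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch of the fix: recordEvent both records a new event and
// drains already-completed events from the front of the queue.
struct Event {
    std::function<bool()> done;  // true once the async copy has finished
};

struct EventQueue {
    std::deque<Event> cuda_events;

    void recordEvent(Event e) {
        // Process outstanding events first (the bug fix): pop every event
        // that has completed, stopping at the first one still in flight.
        while (!cuda_events.empty() && cuda_events.front().done())
            cuda_events.pop_front();
        cuda_events.push_back(std::move(e));
    }
};
```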
2016-12-02 | Merge pull request #618 from colesbury/cached_pinned_memory | Soumith Chintala | |
Add caching allocator for pinned (page-locked) memory | |||
2016-12-02 | Add caching allocator for pinned (host) memory | Sam Gross | |
Adds a caching allocator for CUDA pinned (page-locked) memory. This avoids synchronization due to cudaFreeHost or cudaHostUnregister, at the expense of potentially higher host memory usage. Correctness is preserved by recording CUDA events after each cudaMemcpyAsync involving the pinned memory; an allocation is not reused until all events associated with it have completed. | |||
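The caching scheme described above can be sketched in a few lines. This is a simplified, hypothetical illustration, not the cutorch implementation: `::operator new` stands in for `cudaHostAlloc`, and event tracking is reduced to a per-block outstanding-event count. Freed blocks go to a size-keyed cache instead of being returned to the OS (avoiding the synchronizing `cudaFreeHost`), and a cached block is only handed out again once its events have completed.

```cpp
#include <cstddef>
#include <map>

// Hypothetical sketch of a caching pinned-memory allocator.
struct Block {
    std::size_t size;
    void* ptr;
    int outstanding_events = 0;  // events recorded after cudaMemcpyAsync
};

struct PinnedAllocator {
    std::multimap<std::size_t, Block*> free_blocks;  // cached, reusable blocks

    void* allocate(std::size_t size) {
        // Reuse the smallest cached block that fits and has no pending events.
        auto it = free_blocks.lower_bound(size);
        if (it != free_blocks.end() && it->second->outstanding_events == 0) {
            void* p = it->second->ptr;
            free_blocks.erase(it);
            return p;
        }
        return ::operator new(size);  // stands in for cudaHostAlloc
    }

    void free(Block* b) {
        // Do not release to the OS; cache for reuse (no cudaFreeHost sync).
        free_blocks.emplace(b->size, b);
    }
};
```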
2016-12-01 | Adds a CUDA "sleep" kernel | Sam Gross | |
Adds a CUDA "sleep" kernel which spins for the given number of iterations. This is useful for testing correct synchronization with streams. | |||
2016-11-28 | Merge pull request #614 from BTNC/win | Soumith Chintala | |
use local modified select_compute_arch.cmake for msvc | |||
2016-11-28 | use local modified select_compute_arch.cmake for msvc | Rui Guo | |
2016-11-26 | Merge pull request #613 from colesbury/lazy | Soumith Chintala | |
Lazily initialize CUDA devices (take 2) | |||
2016-11-26 | Lazily initialize CUDA devices | Sam Gross | |
Previously, cutorch would initialize every CUDA device and enable P2P access between all pairs. This slows down start-up, especially with 8 devices. Now, THCudaInit does not initialize any devices and P2P access is enabled lazily. Setting the random number generator seed also does not initialize the device until random numbers are actually used. | |||
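The lazy-initialization pattern described above can be sketched with a per-device once-flag. This is a hypothetical host-side illustration (the names `LazyCudaState` and `initDevice` are invented for the sketch); real code would create handles and streams and enable P2P access with peers inside the once-callback.

```cpp
#include <array>
#include <mutex>

// Hypothetical sketch of lazy device initialization: instead of touching
// every device at init time, each device is initialized on first use.
constexpr int kMaxDevices = 8;

struct LazyCudaState {
    std::array<std::once_flag, kMaxDevices> init_flags;
    std::array<bool, kMaxDevices> initialized{};  // observable for the sketch
    int init_calls = 0;

    void initDevice(int dev) {
        // Runs at most once per device, no matter how many threads call it.
        std::call_once(init_flags[dev], [&] {
            // Real code would cudaSetDevice(dev), create cuBLAS handles and
            // streams, and enable P2P access lazily as well.
            initialized[dev] = true;
            ++init_calls;
        });
    }
};
```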
2016-11-24 | Merge pull request #611 from torch/revert-610-lazy | Soumith Chintala | |
Revert "Lazily initialize CUDA devices" | |||
2016-11-24 | Revert "Lazily initialize CUDA devices" (revert-610-lazy) | Soumith Chintala | |
2016-11-24 | remove spurious prints in tests | soumith | |
2016-11-24 | Merge pull request #610 from colesbury/lazy | Soumith Chintala | |
Lazily initialize CUDA devices | |||
2016-11-24 | Implemented cudaMemGetInfo for caching allocator (#600) | Boris Fomitchev | |
* Implemented cudaMemGetInfo for caching allocator | |||
2016-11-23 | Lazily initialize CUDA devices | Sam Gross | |
Previously, cutorch would initialize every CUDA device and enable P2P access between all pairs. This slows down start-up, especially with 8 devices. Now, THCudaInit does not initialize any devices and P2P access is enabled lazily. Setting the random number generator seed also does not initialize the device until random numbers are actually used. | |||
2016-11-18 | Merge pull request #607 from killeent/half-guard | Soumith Chintala | |
guard random functions for half | |||
2016-11-18 | guard random functions for half | Trevor Killeen | |
2016-11-18 | Merge pull request #605 from gchanan/halfAddrAddmv | Soumith Chintala | |
Add half support for addmv and addr. | |||
2016-11-18 | Add half support for addmv and addr. | Gregory Chanan | |
2016-11-17 | Merge pull request #604 from killeent/memleak | Soumith Chintala | |
fix memory leak in (equal) | |||
2016-11-17 | fix memory leak in (equal) | Trevor Killeen | |
2016-11-17 | Merge pull request #603 from killeent/remainder | Soumith Chintala | |
Implement fmod, remainder, equal in Cutorch | |||
2016-11-17 | add support for equal in cutorch | Trevor Killeen | |
2016-11-17 | Merge pull request #602 from killeent/magma | Soumith Chintala | |
Magma functions to generic | |||
2016-11-16 | add support for fmod in cutorch | Trevor Killeen | |
2016-11-16 | add support for remainder in cutorch | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] more cleanup | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] some cleanup | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move qr to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move potr* to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move inverse to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move svd to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move eig to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move symeig to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] move gels to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] code refactor to support generics; move gesv to generic | Trevor Killeen | |
2016-11-16 | [cutorch mag2gen] generic MAGMA memory allocator function | Trevor Killeen | |
2016-11-16 | [cutorch potr*] API parity for potr* functions in cutorch | Trevor Killeen | |
2016-11-15 | Merge pull request #601 from 1nadequacy/fix_baddbmm | Soumith Chintala | |
[cutorch] remove syncing point from baddbmm | |||
2016-11-15 | [cutorch] remove syncing point from baddbmm | Denis Yarats | |
This change removes HtoD copies inside baddbmm. These copies introduce a syncing point, which causes slowdowns in multi-GPU training. Test plan: run unit tests for baddbmm.
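The idea behind removing the HtoD copy can be sketched as follows. Batched GEMM interfaces take an array of per-matrix pointers; building that array on the host and copying it to the device forces a synchronizing transfer. When the batch matrices are laid out at a fixed stride, each pointer can instead be derived from the base address on the device itself. This is a hypothetical illustration (the function `batch_pointers` is invented); on the GPU the loop would run in a small kernel rather than on the host.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: derive per-batch matrix pointers from a base address
// and a fixed stride, instead of building the pointer array on the host and
// copying it to the device (a synchronizing cudaMemcpy).
std::vector<const float*> batch_pointers(const float* base,
                                         std::int64_t stride,
                                         int batches) {
    std::vector<const float*> ptrs(static_cast<std::size_t>(batches));
    for (int i = 0; i < batches; ++i)
        ptrs[static_cast<std::size_t>(i)] = base + i * stride;
    return ptrs;
}
```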