Age | Commit message | Author | |
---|---|---|---|
2017-09-12 | fix alignment warning | Natalia Gimelshein | |
2017-09-10 | Optimize pow for different exponents and add tests | Francisco Massa | |
2017-09-07 | fix static linkage and make THD statically linked | Soumith Chintala | |
2017-08-29 | Fix grid size for batch cat tensor now that getApplyGrid has been changed. | Christian Sarofeen | |
2017-08-27 | Allowing larger grids for THCApply shows improved performance. | Christian Sarofeen | |
2017-08-25 | Fix typos. | Zhou Mo | |
2017-08-25 | add ones_like and zeros_like | Alykhan Tejani | |
2017-08-25 | cuda 9 hgemm fix | Soumith Chintala | |
2017-08-25 | Updates for CUDA 9 | Christian Sarofeen | |
2017-08-17 | fixing the bug with squeezing a singleton dimension in torch.min and torch.max | Anton Osokin | |
2017-08-17 | Add CUDA version of eye | Francisco Massa | |
2017-08-15 | accumulate in accType for reductions over dimensions | Natalia Gimelshein | |
2017-08-15 | Support __neg__, .neg(), and neg_() for Long, Int, Short tensor types. | Gregory Chanan | |
2017-08-10 | call gemmStridedBatched for cuda >=8 to avoid calling kernels to set up pointers (#794) | ngimel | |
2017-08-05 | move normal variants to TH/THC | Trevor Killeen | |
2017-07-21 | Fix torch.inverse when magma is not available. Fixes #2156 | Sam Gross | |
2017-07-19 | Add CUDA support for arange; also enables CUDA for range | Francisco Massa | |
2017-07-19 | add explicit BLAS linkage to THC when linked against magma (in binary build) | Soumith Chintala | |
2017-07-19 | move to model with cuda indexing tensors for cuda tensor adv indexing | Trevor Killeen | |
2017-07-18 | fix baddbmm for expanded tensors | Natalia Gimelshein | |
2017-07-17 | fix cwrap | soumith | |
2017-07-15 | fix cwrap for std/var | Soumith Chintala | |
2017-07-15 | Wrap unbiased flag in var, std, varall, stdall | Luca Antiga | |
2017-07-14 | add launch_bounds to greedy kernels | Natalia Gimelshein | |
2017-07-13 | Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors | Trevor Killeen | |
2017-07-13 | Check for shared_mem size in multinomial single-sample implementation; handle limited shared memory in torch.multinomial (THCTensorRandom.cu) | Pan He | |
2017-07-12 | Avoid two unnecessary copies in addmm backward. The `r_` and `t` tensors become different objects even though they point to the same data; avoid the copy whenever beta=0 | Sam Gross | |
2017-07-11 | Alias multinomial sampling in Cuda (#784): support multinomial alias sampling in CUDA; move benchmark file; review changes | Amartya Sanyal | |
2017-07-04 | add missing definition | Soumith Chintala | |
2017-07-04 | Have median reduce over all dims and return just the value when dim is not provided | Luca Antiga | |
2017-07-03 | Add a nonContigDim reduction kernel to improve latency for small tensors. (#768) | Christian Sarofeen | |
2017-07-03 | Make reduction functors accept only constant arguments (#753), similar to MaxValuePair and MinValuePair | ngimel | |
2017-06-29 | Warp intrinsic fixes (#785) | ngimel | |
2017-06-26 | support more than 8 gpus (#774) | Sergey Zagoruyko | |
2017-06-26 | Fp16 fixes for CUDA 9 (#783) | Christian Sarofeen | |
2017-06-23 | Advanced Indexing Part 1 -- Purely Integer Array Indexing | Trevor Killeen | |
2017-06-22 | Remove THCTensor_(expand2) and THCTensor_(expand3); they are no longer needed and the corresponding TH versions have been removed | Gregory Chanan | |
2017-06-22 | btrifact: Make pivoting optional. | Brandon Amos | |
2017-06-13 | Short-circuit copy if src and dest are equal. | Edward Z. Yang | |
2017-06-11 | Remove raiseErrors from THTensor functions; have THStorage functions take an error_buffer to return a proper error message while handling memory management correctly from the calling function | Gregory Chanan | |
2017-06-11 | Incorporate review comments: 1) line up trailing dimensions in broadcast docs; 2) remove unnecessary expand_as in common_nn test; 3) use view in tensor_str instead of resize_; 4) remove the raiseErrors change from newExpand; 5) clarify the expandedSizes/expandedStrides parameters in inferExpandGeometry; 6) simplify the inferSize2/inferSizeN implementations; 7) use new-style classes for warning | Gregory Chanan | |
2017-06-11 | Add broadcasting support for copy_; simplify code generation by moving a lot of currently generated code to expand_utils | Gregory Chanan | |
2017-06-11 | Support "fused" ops: addcmul/addcdiv. | Gregory Chanan | |
2017-06-11 | Expand improvements: 1) rename calculateExpandGeometry to inferExpandGeometry for consistency; 2) simplify the inferExpandGeometry implementation by using a single pass through dimensions; 3) implement a two-operand expansion, expand2; 4) implement versions that return an error code, used for fallback to equal-nElem support | Gregory Chanan | |
2017-06-05 | Fix sharing of CUDA tensors on non-current devices | Sam Gross | |
2017-06-02 | substitute cudnnFind* functions with cudnnFind*Ex | Alexey Romanenko | |
2017-05-25 | Add scatterAdd | Adam Paszke | |
2017-05-15 | Cuda reduce in a consistent direction | Rudy Bunel | |
2017-05-10 | Make torch.cat not synchronize the host and device | Sam Gross | |
2017-05-10 | Add keepdim to lua cwrap. (#763) | gchanan | |