* Add THHalfTensor support to cutorch.
A few of us have been using this extensively without problems. This
avoids synchronizations due to cudaFree calls, which makes it much easier
to write performant CUDA code.
Adds a caching allocator for CUDA pinned (page-locked) memory. This
avoids synchronization due to cudaFreeHost or cudaHostUnregister at the
expense of potentially higher host memory usage.
Correctness is preserved by recording CUDA events after each
cudaMemcpyAsync involving the pinned memory. Pinned memory
allocations are not reused until all events associated with them have
completed.
Adds a CUDA "sleep" kernel which spins for the given number of
iterations. This is useful for testing correct synchronization with
streams.
Previously, cutorch would initialize every CUDA device and enable P2P
access between all pairs. This slows down start-up, especially with 8
devices. Now, THCudaInit does not initialize any devices and P2P access
is enabled lazily. Setting the random number generator seed also does
not initialize the device until random numbers are actually used.
Lazily initialize CUDA devices
* Implemented cudaMemGetInfo for caching allocator
This implements the THC code so that we can expose streams as objects
instead of simply referring to them by indices. This is not exposed in
Lua yet.
Use a single, global THCCachingAllocator instance.
Previously, each Lua thread had its own THCCachingAllocator instance.
However, threads can share storages, which means a segment could be
allocated from one THCCachingAllocator and freed on another, which
breaks.
Fixes #539
Switching the device, setting the stream, and switching BLAS handles is
now thread-safe. Some other operations, like reserveStreams, are still
not thread-safe.
The allocator can be enabled by setting the environment variable
THC_CACHING_ALLOCATOR=1
reduce and BLAS work
Add FP16 support (CudaHalfStorage, CudaHalfTensor)
stream event fixes
maskedCopy implemented
generic Reduce kernels
* A new allocator that uses cudaMallocHost.
* cutorch.createCudaHostTensor(...) to create FloatTensor allocated with
CudaHostAllocator.
This reverts commit d88ac24c712e3a40d4aaf3ac2d043bd79ba4280e.
Revert "Auto device mode, plus allocation helper functions."
This reverts commit 47a2f6de252c2254234edfc1c6115229b5383bac.
This diff introduces an alternative way of writing multi-GPU cutorch
code. In this mode, the location of each tensor is specified, and the
appropriate GPU for each kernel is determined automatically based on the
location of its argument tensors. It's backwards-compatible and interoperable
with the old-style multi-GPU API.
and maskedFill operations (and tests).
Also adds generic Reduce and Apply kernels that can be reused.
Only need to reset the cuBLAS handle for the current device, because
only resources associated with the current device will be reset by
cudaDeviceReset.
Every THC function gets a THCState pointer as the first argument.
Some generic files that were previously included have been instantiated
because TH functions currently don't get a state parameter.
A device reset destroys the state of the RNG, so we have to re-initialize
it after each reset.