github.com/marian-nmt/FBGEMM.git

Age	Commit message	Author
2019-03-21	Further optimize acc16 kernel and cache blocking dimension for B matrix is no...	Daya S Khudia
2019-03-21	Further optimize acc32 kernel and cache blocking dimension for B matrix is no...	Daya S Khudia
2019-03-19	Dump generated kernels in files	Daya S Khudia
2019-03-18	Add the Naive bfloat16 implementation based on MKL	Jianyu Huang
2019-03-13	optimize requantize for float out processing (#85)	Jongsoo Park
2019-03-08	No need for PackA when m==1 (#83)	Jianyu Huang
2019-03-08	Fixes for FBGEMM FP16 performance (#82)	Jianyu Huang
2019-03-06	Add Avx512BW/VL/DQ check (#84)	Jianyu Huang
2019-03-01	Add documentations for the cache/register blocking parameters (#81)	Jianyu Huang
2019-02-26	barebone int8-acc16 and int8-acc32 benchmarks	Daya S Khudia
2019-02-23	specialization for first conv (#80)	Jongsoo Park
2019-02-22	Optimize PackB routine by removing addr function	Jianyu Huang
2019-02-20	optimize PackAWithIm2Col for symmetric b quant	Jongsoo Park
2019-02-20	increase test coverage (#78)	Jongsoo Park
2019-02-19	remove unused member var kBlock_ (#77)	Jongsoo Park
2019-02-15	simple spmdm optimization (#76)	Jongsoo Park
2019-02-14	clean up depthwise conv interface (#72)	Jongsoo Park
2019-02-14	fix bug in group conv + avx512 (#75)	Jongsoo Park
2019-02-14	JIT kernel should only handle a small portion of NCB for the last block: mult...	Jianyu Huang
2019-02-14	Fix PackBMatrix<T, accT>::printPackedMatrix issues	Jianyu Huang
2019-02-13	optimize gconv for b symmetric quantization (#70)	Jongsoo Park
2019-02-13	no need to subtract col offset if a_zp is 0 (#69)	Jongsoo Park
2019-02-13	isZeroPointZero_ -> isAZeroPointZero_ (#71)	Jongsoo Park
2019-02-13	group conv optimized for 16 channels per group (#68)	Jongsoo Park
2019-02-02	gconv optimized for 8 channels per group (#65)	Jongsoo Park
2019-02-02	minor optimization in handling zero points for row offset (#63)	Jongsoo Park
2019-02-02	Remove inappropriate consts (#67)	Lu Fang
2019-02-01	make G slowest moving dim of packed weight of gconv (#62)	Jongsoo Park
2019-02-01	more careful about movaps (#60)	Jongsoo Park
2019-02-01	specialized requantization for gconv (#61)	Jongsoo Park
2019-01-31	optimize requantization remainder (#64)	Jongsoo Park
2019-01-31	use 1 thread in benchmarks if OMP_NUM_THREADS is not explicitly set (#66)	Jongsoo Park
2019-01-31	Add threading for FBGEMM FP16	Jianyu Huang
2019-01-23	add missing include files to public headers so that they get installed properly	Daya S Khudia
2019-01-15	mac build fix (#58)	Daya S Khudia
2019-01-14	Groupwise direct convolution when number of channels per group is small	Daya S Khudia
2019-01-14	FP16Benchmark: Allow fp32 comparison using cblas (#56)	WilliamTambellini
2019-01-12	3x3x3 depthwise convolution with per channel quantization (#15775)	Jongsoo Park
2019-01-11	don't keep conv_param_p member as a const reference (#57)	Jongsoo Park
2019-01-04	missing copyright headers	Daya S Khudia
2019-01-03	fix shared lib build	Daya S Khudia
2019-01-03	optimize remainder loops of requantization and rowoffset (#54)	Jongsoo Park
2019-01-02	Fix a bug in FbgemmFP16 (#52)	Feiteng
2019-01-02	use 1 omp thread unless OMP_NUM_THREADS is explicitly set (#53)	Jongsoo Park
2018-12-21	Update the profiling format for Acc32 Benchmark (#50)	Jianyu Huang
2018-12-21	Update with clang format (#51)	Jianyu Huang
2018-12-19	Refactor to use FbgemmFP16 in packed gemm operator (#49)	Amy Yang
2018-12-17	add comments on col_offsets (#48)	Jongsoo Park
2018-12-11	instantiate more kernels for PackAmatrix (#47)	Jongsoo Park
2018-12-06	Fix duplicate symbols for thread local member variables (#43)	James Reed