github.com/marian-nmt/FBGEMM.git
Age | Commit message | Author
2019-03-21 | Further optimize acc16 kernel and cache blocking dimension for B matrix is no... | Daya S Khudia
2019-03-21 | Further optimize acc32 kernel and cache blocking dimension for B matrix is no... | Daya S Khudia
2019-03-19 | Dump generated kernels in files | Daya S Khudia
2019-03-18 | Add the Naive bfloat16 implementation based on MKL | Jianyu Huang
2019-03-13 | optimize requantize for float out processing (#85) | Jongsoo Park
2019-03-08 | No need for PackA when m==1 (#83) | Jianyu Huang
2019-03-08 | Fixes for FBGEMM FP16 performance (#82) | Jianyu Huang
2019-03-06 | Add Avx512BW/VL/DQ check (#84) | Jianyu Huang
2019-03-01 | Add documentations for the cache/register blocking parameters (#81) | Jianyu Huang
2019-02-26 | barebone int8-acc16 and int8-acc32 benchmarks | Daya S Khudia
2019-02-23 | specialization for first conv (#80) | Jongsoo Park
2019-02-22 | Optimize PackB routine by removing addr function | Jianyu Huang
2019-02-20 | optimize PackAWithIm2Col for symmetric b quant | Jongsoo Park
2019-02-20 | increase test coverage (#78) | Jongsoo Park
2019-02-19 | remove unused member var kBlock_ (#77) | Jongsoo Park
2019-02-15 | simple spmdm optimization (#76) | Jongsoo Park
2019-02-14 | clean up depthwise conv interface (#72) | Jongsoo Park
2019-02-14 | fix bug in group conv + avx512 (#75) | Jongsoo Park
2019-02-14 | JIT kernel should only handle a small portion of NCB for the last block: mult... | Jianyu Huang
2019-02-14 | Fix PackBMatrix<T, accT>::printPackedMatrix issues | Jianyu Huang
2019-02-13 | optimize gconv for b symmetric quantization (#70) | Jongsoo Park
2019-02-13 | no need to subtract col offset if a_zp is 0 (#69) | Jongsoo Park
2019-02-13 | isZeroPointZero_ -> isAZeroPointZero_ (#71) | Jongsoo Park
2019-02-13 | group conv optimized for 16 channels per group (#68) | Jongsoo Park
2019-02-02 | gconv optimized for 8 channels per group (#65) | Jongsoo Park
2019-02-02 | minor optimization in handling zero points for row offset (#63) | Jongsoo Park
2019-02-02 | Remove inappropriate consts (#67) | Lu Fang
2019-02-01 | make G slowest moving dim of packed weight of gconv (#62) | Jongsoo Park
2019-02-01 | more careful about movaps (#60) | Jongsoo Park
2019-02-01 | specialized requantization for gconv (#61) | Jongsoo Park
2019-01-31 | optimize requantization remainder (#64) | Jongsoo Park
2019-01-31 | use 1 thread in benchmarks if OMP_NUM_THREADS is not explicitly set (#66) | Jongsoo Park
2019-01-31 | Add threading for FBGEMM FP16 | Jianyu Huang
2019-01-23 | add missing include files to public headers so that they get installed properly | Daya S Khudia
2019-01-15 | mac build fix (#58) | Daya S Khudia
2019-01-14 | Groupwise direct convolution when number of channels per group is small | Daya S Khudia
2019-01-14 | FP16Benchmark: Allow fp32 comparison using cblas (#56) | WilliamTambellini
2019-01-12 | 3x3x3 depthwise convolution with per channel quantization (#15775) | Jongsoo Park
2019-01-11 | don't keep conv_param_p member as a const reference (#57) | Jongsoo Park
2019-01-04 | missing copyright headers | Daya S Khudia
2019-01-03 | fix shared lib build | Daya S Khudia
2019-01-03 | optimize remainder loops of requantization and rowoffset (#54) | Jongsoo Park
2019-01-02 | Fix a bug in FbgemmFP16 (#52) | Feiteng
2019-01-02 | use 1 omp thread unless OMP_NUM_THREADS is explicitly set (#53) | Jongsoo Park
2018-12-21 | Update the profiling format for Acc32 Benchmark (#50) | Jianyu Huang
2018-12-21 | Update with clang format (#51) | Jianyu Huang
2018-12-19 | Refactor to use FbgemmFP16 in packed gemm operator (#49) | Amy Yang
2018-12-17 | add comments on col_offsets (#48) | Jongsoo Park
2018-12-11 | instantiate more kernels for PackAmatrix (#47) | Jongsoo Park
2018-12-06 | Fix duplicate symbols for thread local member variables (#43) | James Reed