github.com/marian-nmt/FBGEMM.git
path: root/src
Age | Commit message | Author
2019-05-24 | Fix kernel logging | Mike Tsai
2019-05-16 | fixing compiler warnings for uninitialized MR, NCB, KCB | Protonu Basu
2019-04-19 | make sure cpuinfo_initialize called before fbgemmHasAvx2/512Support (#94) | Jongsoo Park
2019-04-03 | optimize dw conv for symmetric quant (#73) | Jongsoo Park
2019-04-02 | Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave) ... | Protonu Basu
2019-03-25 | Packing B documentation | Daya S Khudia
2019-03-21 | Improves small N cases back to what they were | Daya S Khudia
2019-03-21 | Allocate some registers for B matrix loading and reuse loaded results | Daya S Khudia
2019-03-21 | Further optimize acc16 kernel and cache blocking dimension for B matrix is no... | Daya S Khudia
2019-03-21 | Further optimize acc32 kernel and cache blocking dimension for B matrix is no... | Daya S Khudia
2019-03-19 | Dump generated kernels in files | Daya S Khudia
2019-03-18 | Add the Naive bfloat16 implementation based on MKL | Jianyu Huang
2019-03-13 | optimize requantize for float out processing (#85) | Jongsoo Park
2019-03-08 | No need for PackA when m==1 (#83) | Jianyu Huang
2019-03-08 | Fixes for FBGEMM FP16 performance (#82) | Jianyu Huang
2019-03-06 | Add Avx512BW/VL/DQ check (#84) | Jianyu Huang
2019-02-26 | barebone int8-acc16 and int8-acc32 benchmarks | Daya S Khudia
2019-02-23 | specialization for first conv (#80) | Jongsoo Park
2019-02-22 | Optimize PackB routine by removing addr function | Jianyu Huang
2019-02-20 | optimize PackAWithIm2Col for symmetric b quant | Jongsoo Park
2019-02-20 | increase test coverage (#78) | Jongsoo Park
2019-02-19 | remove unused member var kBlock_ (#77) | Jongsoo Park
2019-02-15 | simple spmdm optimization (#76) | Jongsoo Park
2019-02-14 | clean up depthwise conv interface (#72) | Jongsoo Park
2019-02-14 | fix bug in group conv + avx512 (#75) | Jongsoo Park
2019-02-14 | JIT kernel should only handle a small portion of NCB for the last block: mult... | Jianyu Huang
2019-02-14 | Fix PackBMatrix<T, accT>::printPackedMatrix issues | Jianyu Huang
2019-02-13 | optimize gconv for b symmetric quantization (#70) | Jongsoo Park
2019-02-13 | no need to subtract col offset if a_zp is 0 (#69) | Jongsoo Park
2019-02-13 | isZeroPointZero_ -> isAZeroPointZero_ (#71) | Jongsoo Park
2019-02-13 | group conv optimized for 16 channels per group (#68) | Jongsoo Park
2019-02-02 | gconv optimized for 8 channels per group (#65) | Jongsoo Park
2019-02-02 | minor optimization in handling zero points for row offset (#63) | Jongsoo Park
2019-02-01 | make G slowest moving dim of packed weight of gconv (#62) | Jongsoo Park
2019-02-01 | more careful about movaps (#60) | Jongsoo Park
2019-02-01 | specialized requantization for gconv (#61) | Jongsoo Park
2019-01-31 | optimize requantization remainder (#64) | Jongsoo Park
2019-01-31 | Add threading for FBGEMM FP16 | Jianyu Huang
2019-01-15 | mac build fix (#58) | Daya S Khudia
2019-01-14 | Groupwise direct convolution when number of channels per group is small | Daya S Khudia
2019-01-12 | 3x3x3 depthwise convolution with per channel quantization (#15775) | Jongsoo Park
2019-01-03 | fix shared lib build | Daya S Khudia
2019-01-02 | Fix a bug in FbgemmFP16 (#52) | Feiteng
2018-12-21 | Update with clang format (#51) | Jianyu Huang
2018-12-11 | instantiate more kernels for PackAmatrix (#47) | Jongsoo Park
2018-12-06 | Fix duplicate symbols for thread local member variables (#43) | James Reed
2018-12-06 | Add missing <algorithm> include (#42) | James Reed
2018-12-06 | Final cleanup for avx2 isolation and consistent file names (#40) | Daya S Khudia
2018-12-06 | avx2 intrinsic separation from OutputProcessing-inl.h (#38) | Daya S Khudia
2018-12-06 | Separate out avx2 code from dense x sparse matrix multiplication (#39) | Daya S Khudia