author     Jianyu Huang <jianyuhuang@fb.com>  2019-03-08 21:34:32 +0300
committer  Facebook Github Bot <facebook-github-bot@users.noreply.github.com>  2019-03-08 21:37:10 +0300
commit     50b43162fd1742122d01f2704945c78f13e0d73e (patch)
tree       d5fee7d82429cd63aa1f8bee3628e822bd010436
parent     844dacc267391cd2a725d81c2495636f0765771b (diff)
No need for PackA when m==1 (#83)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/83
When m = 1, PackA is actually not necessary: the PackA operation for FP16 is simply a matrix transposition, and transposing a 1 x k row vector leaves the memory layout unchanged. So we can skip the transposition and just pass the pointer of the original A matrix buffer as the packed A buffer.
Reviewed By: zhengwy888
Differential Revision: D14299246
fbshipit-source-id: 78a62c5ff3a396b59afb15462efe38461cb71e15
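To make the reasoning concrete, here is a minimal standalone C++ sketch. The packTranspose helper below is a hypothetical stand-in, not FBGEMM's actual PackA; it merely models packing as a plain row-major-to-column-major transpose. It shows that "transposing" a 1 x k row vector is a byte-for-byte identity, which is why the patch can alias the original A buffer instead of copying into the scratchpad:

#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the FP16 PackA of this patch: packing a
// row-major m x k block is modeled here as a transpose into scratch.
void packTranspose(int m, int k, const float* A, float* scratch) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < k; ++j)
      scratch[j * m + i] = A[i * k + j];
}

int main() {
  const int k = 8;
  std::vector<float> A(k);
  for (int j = 0; j < k; ++j) A[j] = static_cast<float>(j);

  // Pack a 1 x k row vector: the "transposed" k x 1 result has exactly
  // the same memory layout, so the pack is the identity.
  std::vector<float> scratch(k);
  packTranspose(/*m=*/1, k, A.data(), scratch.data());
  assert(std::memcmp(A.data(), scratch.data(), k * sizeof(float)) == 0);

  // Hence, when m == 1 the kernel can consume the original A pointer
  // directly (gp.A = const_cast<float*>(&A[k_ind]) in the patch below)
  // and skip PackA entirely.
  return 0;
}

The const_cast in the patch follows from the same observation: presumably the kernel only reads through gp.A, so pointing it at the caller's const A buffer is safe, and the scratchpad copy is avoided for the vector-matrix (m == 1) case.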
 src/FbgemmFP16.cc | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/FbgemmFP16.cc b/src/FbgemmFP16.cc
index 868bc1b..2c0eea3 100644
--- a/src/FbgemmFP16.cc
+++ b/src/FbgemmFP16.cc
@@ -244,11 +244,19 @@ FBGEMM_API void cblas_gemm_compute(
       auto m_start = m1, m_end = m1 + kernel_nrows * nkernel_nrows;
       for (auto m2 = m_start; m2 < m_end; m2 += kernel_nrows) {
         assert(kernel_nrows * kb < scratchpad->size());
-        PackA(kernel_nrows, kb, &A[m2 * k + k_ind], k, scratchpad->data());
+        if (m != 1) {
+          PackA(kernel_nrows, kb, &A[m2 * k + k_ind], k, scratchpad->data());
+          gp.A = scratchpad->data();
+        } else {
+          // When m == 1, it is actually vector matrix multiplication. We
+          // don't need to do the transposition for packA here. Instead, we
+          // can just pass the pointer of the original A matrix buffer to the
+          // packed A buffer.
+          gp.A = const_cast<float*>(&A[k_ind]);
+        }
 
         int nbcol = n / Bp.blockColSize();
         gp.k = kb;
-        gp.A = scratchpad->data();
         gp.B = &(Bp(k_ind, 0));
         gp.beta = &beta_;
         gp.accum = accum;