[BOLT] Do not merge cold and hot chains of basic blocks

There is a post-processing in ext-tsp block reordering that merges some blocks into chains. This allows to maintain the original block order in the absense of profile data and can be beneficial for code size (when fallthroughs are merged). In the earlier version we could merge hot and cold (with zero execution count) chains, that later were split by SplitFunction.cpp (when split-all-cold=1). The diff eliminates the redundant merging. It is unlikely the change will affect the performance of a binary in a measurable way, as it is mostly operates with cold basic blocks. However, after the diff the impact of split-all-cold is almost negligible and we can avoid the extra function splitting. Measuring on the clang binary (negative is good, positive is a regression): **clang12** benchmark1: `0.0253` benchmark2: `-0.1843` benchmark3: `0.3234` benchmark4: `0.0333` **clang10** benchmark1 `-0.2517` benchmark2 `-0.3703` benchmark3 `-0.1186` benchmark4 `-0.3822` **clang7** benchmark1 `0.2526` benchmark2 `0.0500` benchmark3 `0.3024` benchmark4 `-0.0489` **Overall**: `-0.0671 ± 0.1172` (insignificant) Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D129397
author: spupyrev <spupyrev@fb.com> 2022-07-08 20:14:26 +0300
committer: spupyrev <spupyrev@fb.com> 2022-07-11 19:31:52 +0300
commit: 7228371054746fd37a729b7f7f72f4689b68e890 (patch)
tree: 2b2fb647fbc72cb328fa8f45d0fd5173954f210f /bolt
parent: 76029cc53e838e6d86b13b0c39152f474fb09263 (diff)
1 files changed, 8 insertions, 3 deletions
diff --git a/bolt/lib/Passes/ExtTSPReorderAlgorithm.cpp b/bolt/lib/Passes/ExtTSPReorderAlgorithm.cpp
index 7281d8290f26..5435b2313bbc 100644
--- a/bolt/lib/Passes/ExtTSPReorderAlgorithm.cpp
+++ b/bolt/lib/Passes/ExtTSPReorderAlgorithm.cpp
@@ -642,20 +642,25 @@ private:
     }
   }
 
-  /// Merge cold blocks to reduce code size
+  /// Merge remaining blocks into chains w/o taking jump counts into
+  /// consideration. This allows to maintain the original block order in the
+  /// absense of profile data
   void mergeColdChains() {
     for (BinaryBasicBlock *SrcBB : BF.layout()) {
       // Iterating in reverse order to make sure original fallthrough jumps are
-      // merged first
+      // merged first; this might be beneficial for code size.
       for (auto Itr = SrcBB->succ_rbegin(); Itr != SrcBB->succ_rend(); ++Itr) {
         BinaryBasicBlock *DstBB = *Itr;
         size_t SrcIndex = SrcBB->getLayoutIndex();
         size_t DstIndex = DstBB->getLayoutIndex();
         Chain *SrcChain = AllBlocks[SrcIndex].CurChain;
         Chain *DstChain = AllBlocks[DstIndex].CurChain;
+        bool IsColdSrc = SrcChain->executionCount() == 0;
+        bool IsColdDst = DstChain->executionCount() == 0;
         if (SrcChain != DstChain && !DstChain->isEntryPoint() &&
             SrcChain->blocks().back()->Index == SrcIndex &&
-            DstChain->blocks().front()->Index == DstIndex)
+            DstChain->blocks().front()->Index == DstIndex &&
+            IsColdSrc == IsColdDst)
           mergeChains(SrcChain, DstChain, 0, MergeTypeTy::X_Y);
       }
     }
author	spupyrev <spupyrev@fb.com>	2022-07-08 20:14:26 +0300
committer	spupyrev <spupyrev@fb.com>	2022-07-11 19:31:52 +0300
commit	7228371054746fd37a729b7f7f72f4689b68e890 (patch)
tree	2b2fb647fbc72cb328fa8f45d0fd5173954f210f /bolt
parent	76029cc53e838e6d86b13b0c39152f474fb09263 (diff)