Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorElijah Newren <newren@gmail.com>2021-05-20 09:09:41 +0300
committerJunio C Hamano <gitster@pobox.com>2021-05-20 09:40:39 +0300
commit25e65b6dd52c987056f1cac00fe6073fbf8ea237 (patch)
treea2b9698df576ab6412985ed08d1ee6fd63b696c6 /t/t6429-merge-sequence-rename-caching.sh
parentcbdca289fbb011e7397fecfebeeac3f887ef22d1 (diff)
merge-ort, diffcore-rename: employ cached renames when possible
When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t6429-merge-sequence-rename-caching.sh')
-rwxr-xr-xt/t6429-merge-sequence-rename-caching.sh48
1 files changed, 28 insertions, 20 deletions
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
index f47d8924ee..035edc40b1 100755
--- a/t/t6429-merge-sequence-rename-caching.sh
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -101,10 +101,10 @@ test_expect_success 'caching renames does not preclude finding new ones' '
# dramatic change in size of the file, but remembering the rename and
# reusing it is reasonable too.
#
-# Rename detection (diffcore_rename_extended()) will run twice here; it is
-# not needed on the topic side of history for either of the two commits
-# being merged, but it is needed on the upstream side of history for each
-# commit being picked.
+# We do test here that we expect rename detection to only be run once total
+# (the topic side of history doesn't need renames, and with caching we
+# should be able to only run rename detection on the upstream side one
+# time.)
test_expect_success 'cherry-pick both a commit and its immediate revert' '
test_create_repo pick-commit-and-its-immediate-revert &&
(
@@ -140,11 +140,11 @@ test_expect_success 'cherry-pick both a commit and its immediate revert' '
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
- test_might_fail test-tool fast-rebase --onto HEAD upstream~1 topic &&
+ test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
- test_line_count = 2 calls
+ test_line_count = 1 calls
)
'
@@ -304,9 +304,11 @@ test_expect_success 'rename same file identically, then add file to old dir' '
# Here we are just concerned that cached renames might prevent us from seeing
# the rename conflict, and we want to ensure that we do get a conflict.
#
-# While at it, also test that we do rename detection three times. We have to
-# detect renames on the upstream side of history once for each merge, plus
-# Topic_2 has renames.
+# While at it, though, we do test that we only try to detect renames 2
+# times and not three. (The first merge needs to detect renames on the
+# upstream side. Traditionally, the second merge would need to detect
+# renames on both sides of history, but our caching of upstream renames
+# should avoid the need to re-detect upstream renames.)
#
test_expect_success 'cached dir rename does not prevent noticing later conflict' '
test_create_repo dir-rename-cache-not-occluding-later-conflict &&
@@ -357,7 +359,7 @@ test_expect_success 'cached dir rename does not prevent noticing later conflict'
grep CONFLICT..rename/rename output &&
grep region_enter.*diffcore_rename trace.output >calls &&
- test_line_count = 3 calls
+ test_line_count = 2 calls
)
'
@@ -412,10 +414,17 @@ test_setup_upstream_rename () {
# commit to mess up its location either. We want to make sure that
# olddir/newfile doesn't exist in the result and that newdir/newfile does.
#
-# We also expect rename detection to occur three times. Although it is
-# typically needed two times per commit, there are no deleted files on the
-# topic side of history, so we only need to detect renames on the upstream
-# side for each of the 3 commits we need to pick.
+# We also test that we only do rename detection twice. We never need
+# rename detection on the topic side of history, but we do need it twice on
+# the upstream side of history. For the first topic commit, we only need
+# the
+# relevant-rename -> renamed
+# rename, because olddir is unmodified by Topic_1. For Topic_2, however,
+# the new file being added to olddir means files that were previously
+# irrelevant for rename detection are now relevant, forcing us to repeat
+# rename detection for the paths we don't already have cached. Topic_3 also
+# tweaks olddir/newfile, but the renames in olddir/ will have been cached
+# from the second rename detection run.
#
test_expect_success 'dir rename unneeded, then add new file to old dir' '
test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
@@ -450,7 +459,7 @@ test_expect_success 'dir rename unneeded, then add new file to old dir' '
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
- test_line_count = 3 calls &&
+ test_line_count = 2 calls &&
git ls-files >tracked &&
test_line_count = 5 tracked &&
@@ -516,7 +525,7 @@ test_expect_success 'dir rename unneeded, then rename existing file into old dir
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
- test_line_count = 4 calls &&
+ test_line_count = 3 calls &&
test_path_is_missing somefile &&
test_path_is_missing olddir/newfile &&
@@ -648,9 +657,8 @@ test_expect_success 'caching renames only on upstream side, part 1' '
# for the wrong side of history.
#
#
-# This testcase should only need three calls to diffcore_rename_extended(),
-# because there are no renames on the topic side of history for picking
-# Topic_2.
+# This testcase should only need two calls to diffcore_rename_extended(),
+# both for the first merge, one for each side of history.
#
test_expect_success 'caching renames only on upstream side, part 2' '
test_setup_topic_rename cache-renames-only-upstream-rename-file &&
@@ -677,7 +685,7 @@ test_expect_success 'caching renames only on upstream side, part 2' '
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
- test_line_count = 3 calls &&
+ test_line_count = 2 calls &&
git ls-files >tracked &&
test_line_count = 4 tracked &&