gitlab.com/gitlab-org/gitaly.git
author Patrick Steinhardt <psteinhardt@gitlab.com> 2021-08-05 09:31:34 +0300
committer Patrick Steinhardt <psteinhardt@gitlab.com> 2021-08-05 09:31:34 +0300
commit caf2cfabc79b21e8b4c5e0245eb4fbf5e7f7a493
tree 16c5718aaa9812afee41c7e61ca227cc3f7fccd8 /_support
parent c2ad49fbbf325f45bd31e307451cbf2982a4f647
git: Speed up fetches in repos with many refs
Recently, one of our repos has accumulated so many references that
`FetchSourceBranch()` always times out. The root cause is that Git has
to load all commits of the source repo in order to negotiate which refs
it has in common with the remote repository when we execute
git-fetch(1). This currently always hits the object database, up to the
point where git-fetch(1) is heavily dominated by decompressing and
parsing objects from disk.

To fix this, I'm currently upstreaming a patch to git.git which
optimizes the lookup to make use of the commit-graph: if a commit is
part of the commit-graph, then we don't need to hit the object database
but can instead parse it via this graph, which is a lot more efficient.
Benchmarks in the repo which is causing the problems for us have shown
that this brings down the time to fetch from 44 seconds to 19 seconds,
which is sufficient to unblock `FetchSourceBranch()` again.

While our process states that we don't apply any patches to Git before
they have hit "next", the problem is significant enough to make an
exception here given that community contributors cannot create merge
requests against gitlab-org/gitlab anymore. Furthermore, the patch
itself is simple enough and has only received positive feedback from
maintainers so far [1].

Apply the patch to Git to speed up git-fetch(1) in such repos with many
refs and unblock community contributors again.

[1]: https://public-inbox.org/git/08519b8ab6f395cffbcd5e530bfba6aaf64241a2.1628085347.git.ps@pks.im/

Changelog: performance
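The fast path described in the commit message can be pictured with a small toy model. This is only a sketch under stated assumptions, not Git's implementation: every `toy_*` name below is hypothetical, the "commit-graph" is reduced to a sorted in-memory array of 4-byte IDs, and the "object database" read is a stub that only counts how often the slow path is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these types or functions exist in git.git. */
#define TOY_OID_LEN 4

struct toy_commit {
	unsigned char oid[TOY_OID_LEN];
	uint32_t generation;
};

/* Toy "commit-graph": commit records sorted by object ID. */
struct toy_graph {
	const struct toy_commit *entries;
	size_t nr;
};

struct toy_stats {
	size_t graph_hits; /* commits served from the toy graph */
	size_t odb_reads;  /* commits that needed the slow path */
};

/* Binary search over the sorted entries; returns NULL when not found. */
static const struct toy_commit *toy_graph_lookup(const struct toy_graph *g,
						 const unsigned char *oid)
{
	size_t lo = 0, hi = g->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, g->entries[mid].oid, TOY_OID_LEN);

		if (!cmp)
			return &g->entries[mid];
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}

/* Stub for the slow path: a real reader would decompress and parse here. */
static int toy_odb_parse(const unsigned char *oid, struct toy_commit *out,
			 struct toy_stats *stats)
{
	stats->odb_reads++;
	memcpy(out->oid, oid, TOY_OID_LEN);
	out->generation = 0; /* not known without the graph */
	return 0;
}

/* Try the graph first; fall back to the slow path only on a miss. */
static int toy_parse_commit(const struct toy_graph *g, const unsigned char *oid,
			    struct toy_commit *out, struct toy_stats *stats)
{
	const struct toy_commit *hit;

	assert(oid && out && stats);
	hit = g ? toy_graph_lookup(g, oid) : NULL;
	if (hit) {
		*out = *hit;
		stats->graph_hits++;
		return 0;
	}
	return toy_odb_parse(oid, out, stats);
}
```

Calling `toy_parse_commit()` once per ref tip and comparing `graph_hits` with `odb_reads` afterwards shows how many lookups avoided the slow path. Passing `NULL` as the graph models a repository without a commit-graph, where every lookup falls through to the stub.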
Diffstat (limited to '_support')
-rw-r--r-- _support/git-patches/0001-fetch-pack-speed-up-loading-of-refs-via-commit-graph.patch | 65
1 files changed, 65 insertions, 0 deletions
diff --git a/_support/git-patches/0001-fetch-pack-speed-up-loading-of-refs-via-commit-graph.patch b/_support/git-patches/0001-fetch-pack-speed-up-loading-of-refs-via-commit-graph.patch
new file mode 100644
index 000000000..23d0d6a33
--- /dev/null
+++ b/_support/git-patches/0001-fetch-pack-speed-up-loading-of-refs-via-commit-graph.patch
@@ -0,0 +1,65 @@
+From 08519b8ab6f395cffbcd5e530bfba6aaf64241a2 Mon Sep 17 00:00:00 2001
+Message-Id: <08519b8ab6f395cffbcd5e530bfba6aaf64241a2.1628144240.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 4 Aug 2021 15:04:25 +0200
+Subject: [PATCH] fetch-pack: speed up loading of refs via commit graph
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When doing reference negotiation, git-fetch-pack(1) is loading all refs
+from disk in order to determine which commits it has in common with the
+remote repository. This can be quite expensive in repositories with many
+references though: in a real-world repository with around 2.2 million
+refs, fetching a single commit by its ID takes around 44 seconds.
+
+Dominating the loading time is decompression and parsing of the objects
+which are referenced by commits. Given the fact that we only care about
+commits (or tags which can be peeled to one) in this context, there is
+thus an easy performance win by switching the parsing logic to make use
+of the commit graph in case we have one available. Like this, we avoid
+hitting the object database to parse these commits but instead only load
+them from the commit-graph. This results in a significant performance
+boost when executing git-fetch in said repository with 2.2 million refs:
+
+    Benchmark #1: HEAD~: git fetch $remote $commit
+      Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
+      Range (min … max):   43.565 s … 44.577 s    10 runs
+
+    Benchmark #2: HEAD: git fetch $remote $commit
+      Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
+      Range (min … max):   18.629 s … 20.454 s    10 runs
+
+    Summary
+      'HEAD: git fetch $remote $commit' ran
+        2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+---
+ fetch-pack.c | 10 ++++++++--
+ 1 file changed, 8 insertions(+), 2 deletions(-)
+
+diff --git a/fetch-pack.c b/fetch-pack.c
+index b0c7be717c..0bf7ed7e47 100644
+--- a/fetch-pack.c
++++ b/fetch-pack.c
+@@ -137,8 +137,14 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+ 			break;
+ 		}
+ 	}
+-	if (type == OBJ_COMMIT)
+-		return (struct commit *) parse_object(the_repository, oid);
++
++	if (type == OBJ_COMMIT) {
++		struct commit *commit = lookup_commit(the_repository, oid);
++		if (!commit || repo_parse_commit(the_repository, commit))
++			return NULL;
++		return commit;
++	}
++
+ 	return NULL;
+ }
+
+--
+2.32.0
+