
gitlab.com/gitlab-org/gitaly.git
author Patrick Steinhardt <psteinhardt@gitlab.com> 2021-09-09 15:07:32 +0300
committer Patrick Steinhardt <psteinhardt@gitlab.com> 2021-09-09 15:53:04 +0300
commit 6ebcf23f1aa77d21c98789a9823a6fde362a09c4
tree 2eddd7995d09d614e75304d466038a070f551aac
parent e7f5668a0d77fc4c1ba0d329ea8c8e758e73ff4f
Makefile: Apply Git patches to speed up fetches
For quite some time we have been aware that mirror-fetches into repositories with many refs are exceedingly slow. This is especially problematic for our replication strategy: replication jobs take so long that replication targets are likely to be out of date immediately after they have received a replication job, because the primary node has received additional mutator requests while the replication target was still fetching changes.

To address this problem, we have upstreamed a patch series into git.git which speeds up fetches. Most importantly, this patch series optimizes the way git-fetch(1) enumerates refs by making better use of the commit-graph. As a result, mirror-fetches in the benchmark repository gitlab-org/gitlab have sped up from 56s to 25s. While this speedup alone is unlikely to fix our replication issue, it is an important step towards improving the situation.

Changelog: performance
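The commit-graph lookup added by patch 0008 (and, without the tag special case, by 0011) follows an opportunistic fast-path pattern: try the commit-graph first, and only fall back to reading the object from the object database when that fails. Refs under `refs/tags/` skip the graph because they often point to annotated tags, which the graph never contains. The following is a minimal, self-contained sketch of that control flow, not Git's code: `struct fake_object`, `lookup_in_graph()`, `lookup_via_odb()`, and the two counters are hypothetical stand-ins for `lookup_commit_in_graph()`, `lookup_commit_reference_gently()`, and the cost of unpacking object headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an object; not Git's struct object. */
struct fake_object {
	int in_commit_graph; /* 1 if this is a commit stored in the commit-graph */
	int is_commit;       /* 1 if the object resolves to a commit at all */
};

/* Counters that make the cost of each path observable (hypothetical). */
static int graph_lookups;
static int header_unpacks;

/* Fast path: consult only the (simulated) commit-graph. */
static const struct fake_object *lookup_in_graph(const struct fake_object *obj)
{
	graph_lookups++;
	return obj->in_commit_graph ? obj : NULL;
}

/* Slow path: models unpacking the object header from the object database. */
static const struct fake_object *lookup_via_odb(const struct fake_object *obj)
{
	header_unpacks++;
	return obj->is_commit ? obj : NULL;
}

static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Resolve a ref to a commit: try the graph first unless the ref is a tag,
 * then fall back to the slow lookup if the graph did not know the object.
 */
static const struct fake_object *resolve_ref(const char *refname,
					     const struct fake_object *obj)
{
	const struct fake_object *commit = NULL;

	assert(refname != NULL && obj != NULL);
	if (!starts_with(refname, "refs/tags/"))
		commit = lookup_in_graph(obj);
	if (!commit)
		commit = lookup_via_odb(obj);
	return commit;
}
```

Under this model, a branch whose commit is in the graph costs one graph probe and no header unpack, a tag skips the graph entirely, and a commit missing from the graph pays for both lookups. Exempting `refs/tags/` trades a possible fast hit on lightweight tags for never wasting a graph probe on annotated tags.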
-rw-r--r--  Makefile  19
-rw-r--r--  _support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch  107
-rw-r--r--  _support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch  95
-rw-r--r--  _support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch  67
-rw-r--r--  _support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch  260
-rw-r--r--  _support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch  61
-rw-r--r--  _support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch  60
-rw-r--r--  _support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch  98
-rw-r--r--  _support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch  78
9 files changed, 844 insertions, 1 deletions
diff --git a/Makefile b/Makefile
index c88e6a088..044949f5e 100644
--- a/Makefile
+++ b/Makefile
@@ -125,13 +125,30 @@ ifeq ($(origin GIT_PATCHES),undefined)
GIT_PATCHES += 0005-commit-graph-split-out-function-to-search-commit-pos.patch
GIT_PATCHES += 0006-revision-avoid-hitting-packfiles-when-commits-are-in.patch
+ # Due to a bug, fetches with `--quiet` were slower than those without
+ # because Git formatted each reference into the output buffer even though
+ # it wasn't used. This has been merged into next via 2440a8a2aa (Merge
+ # branch 'ps/fetch-omit-formatting-under-quiet' into next, 2021-09-01)
+ GIT_PATCHES += 0007-fetch-skip-formatting-updated-refs-with-quiet.patch
+
+ # This patch set speeds up fetches, most importantly by making better use
+ # of the commit graph. They have been merged into next via 99f865125d
+ # (Merge branch 'ps/fetch-optim' into next, 2021-09-08).
+ GIT_PATCHES += 0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch
+ GIT_PATCHES += 0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch
+ GIT_PATCHES += 0010-connected-refactor-iterator-to-return-next-object-ID.patch
+ GIT_PATCHES += 0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch
+ GIT_PATCHES += 0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch
+ GIT_PATCHES += 0013-fetch-merge-fetching-and-consuming-refs.patch
+ GIT_PATCHES += 0014-fetch-avoid-second-connectivity-check-if-we-already-.patch
+
# This extra version has two intentions: first, it allows us to detect
# capabilities of the command at runtime. Second, it helps admins to
# discover which version is currently in use. As such, this version must be
# incremented whenever a new patch is added above. When no patches exist,
# then this should be undefined. Otherwise, it must be set to at least
# `gl1` given that `0` is the "default" GitLab patch level.
- GIT_EXTRA_VERSION := gl1
+ GIT_EXTRA_VERSION := gl2
endif
ifeq ($(origin GIT_BUILD_OPTIONS),undefined)
diff --git a/_support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch b/_support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch
new file mode 100644
index 000000000..04c7f382a
--- /dev/null
+++ b/_support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch
@@ -0,0 +1,107 @@
+From f6bb64df82ddd050894ca8a2a0bfbd1997602500 Mon Sep 17 00:00:00 2001
+Message-Id: <f6bb64df82ddd050894ca8a2a0bfbd1997602500.1631166264.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Mon, 30 Aug 2021 12:54:26 +0200
+Subject: [PATCH] fetch: skip formatting updated refs with `--quiet`
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When fetching, Git will by default print a list of all updated refs in a
+nicely formatted table. In order to come up with this table, Git needs
+to iterate refs twice: first to determine the maximum column width, and
+a second time to actually format these changed refs.
+
+While this table will not be printed in case the user passes `--quiet`,
+we still go out of our way and do all these steps. In fact, we even do
+more work compared to not passing `--quiet`: without the flag, we will
+skip all references in the column width computation which have not been
+updated, but if it is set we will now compute widths for all refs.
+
+Fix this issue by completely skipping both preparation of the format and
+formatting data for display in case the user passes `--quiet`, improving
+performance especially with many refs. The following benchmark shows a
+nice speedup for a quiet mirror-fetch in a repository with 2.3M refs:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 26.929 s ± 0.145 s [User: 24.194 s, System: 4.656 s]
+ Range (min … max): 26.692 s … 27.068 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 25.189 s ± 0.094 s [User: 22.556 s, System: 4.606 s]
+ Range (min … max): 25.070 s … 25.314 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.07 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+While at it, this patch also fixes `adjust_refcol_width()` such that it
+skips unchanged refs in case the user passed `--quiet`, where verbosity
+will be negative. While this function won't be called anymore if so,
+this brings the comment in line with actual code. Furthermore, needless
+`verbosity >= 0` checks are now removed in `store_updated_refs()`: we
+never print to the `note` buffer anymore in case `verbosity < 0`, so we
+won't end up in that code block anyway.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 17 ++++++++++++-----
+ 1 file changed, 12 insertions(+), 5 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index 25740c13df..334bc7efa6 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -712,7 +712,7 @@ static void adjust_refcol_width(const struct ref *ref)
+ int max, rlen, llen, len;
+
+ /* uptodate lines are only shown on high verbosity level */
+- if (!verbosity && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
++ if (verbosity <= 0 && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
+ return;
+
+ max = term_columns();
+@@ -748,6 +748,9 @@ static void prepare_format_display(struct ref *ref_map)
+ struct ref *rm;
+ const char *format = "full";
+
++ if (verbosity < 0)
++ return;
++
+ git_config_get_string_tmp("fetch.output", &format);
+ if (!strcasecmp(format, "full"))
+ compact_format = 0;
+@@ -827,7 +830,12 @@ static void format_display(struct strbuf *display, char code,
+ const char *remote, const char *local,
+ int summary_width)
+ {
+- int width = (summary_width + strlen(summary) - gettext_width(summary));
++ int width;
++
++ if (verbosity < 0)
++ return;
++
++ width = (summary_width + strlen(summary) - gettext_width(summary));
+
+ strbuf_addf(display, "%c %-*s ", code, width, summary);
+ if (!compact_format)
+@@ -1202,13 +1210,12 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ "FETCH_HEAD", summary_width);
+ }
+ if (note.len) {
+- if (verbosity >= 0 && !shown_url) {
++ if (!shown_url) {
+ fprintf(stderr, _("From %.*s\n"),
+ url_len, url);
+ shown_url = 1;
+ }
+- if (verbosity >= 0)
+- fprintf(stderr, " %s\n", note.buf);
++ fprintf(stderr, " %s\n", note.buf);
+ }
+ }
+ }
+--
+2.33.0
+
diff --git a/_support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch b/_support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch
new file mode 100644
index 000000000..2b8505d6e
--- /dev/null
+++ b/_support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch
@@ -0,0 +1,95 @@
+From fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692 Mon Sep 17 00:00:00 2001
+Message-Id: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:41 +0200
+Subject: [PATCH 08/14] fetch: speed up lookup of want refs via commit-graph
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When updating our local refs based on the refs fetched from the remote,
+we need to iterate through all requested refs and load their respective
+commits such that we can determine whether they need to be appended to
+FETCH_HEAD or not. In cases where we're fetching from a remote with
+exceedingly many refs, resolving these refs can be quite expensive given
+that we repeatedly need to unpack object headers for each of the
+referenced objects.
+
+Speed this up by opportunistically trying to resolve object IDs via the
+commit graph. We only do so for any refs which are not in "refs/tags":
+more likely than not, these are going to be a commit anyway, and this
+lets us avoid having to unpack object headers completely in case the
+object is a commit that is part of the commit-graph. This significantly
+speeds up mirror-fetches in a real-world repository with
+2.3M refs:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 56.482 s ± 0.384 s [User: 53.340 s, System: 5.365 s]
+ Range (min … max): 56.050 s … 57.045 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 33.727 s ± 0.170 s [User: 30.252 s, System: 5.194 s]
+ Range (min … max): 33.452 s … 33.871 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.67 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 24 ++++++++++++++++++------
+ 1 file changed, 18 insertions(+), 6 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index e064687dbd..91d1301613 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1074,7 +1074,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ int connectivity_checked, struct ref *ref_map)
+ {
+ struct fetch_head fetch_head;
+- struct commit *commit;
+ int url_len, i, rc = 0;
+ struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+ struct ref_transaction *transaction = NULL;
+@@ -1122,6 +1121,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ want_status <= FETCH_HEAD_IGNORE;
+ want_status++) {
+ for (rm = ref_map; rm; rm = rm->next) {
++ struct commit *commit = NULL;
+ struct ref *ref = NULL;
+
+ if (rm->status == REF_STATUS_REJECT_SHALLOW) {
+@@ -1131,11 +1131,23 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ continue;
+ }
+
+- commit = lookup_commit_reference_gently(the_repository,
+- &rm->old_oid,
+- 1);
+- if (!commit)
+- rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
++ /*
++ * References in "refs/tags/" are often going to point
++ * to annotated tags, which are not part of the
++ * commit-graph. We thus only try to look up refs in
++ * the graph which are not in that namespace to not
++ * regress performance in repositories with many
++ * annotated tags.
++ */
++ if (!starts_with(rm->name, "refs/tags/"))
++ commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
++ if (!commit) {
++ commit = lookup_commit_reference_gently(the_repository,
++ &rm->old_oid,
++ 1);
++ if (!commit)
++ rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
++ }
+
+ if (rm->fetch_head_status != want_status)
+ continue;
+--
+2.33.0
+
diff --git a/_support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch b/_support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch
new file mode 100644
index 000000000..2d35aaebd
--- /dev/null
+++ b/_support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch
@@ -0,0 +1,67 @@
+From 47c61004c7cfbb8662b13fac813b45e3fd214665 Mon Sep 17 00:00:00 2001
+Message-Id: <47c61004c7cfbb8662b13fac813b45e3fd214665.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:45 +0200
+Subject: [PATCH 09/14] fetch: avoid unpacking headers in object existence
+ check
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When updating local refs after the fetch has transferred all objects, we
+do an object existence test as a safety guard to avoid updating a ref to
+an object which we don't have. We do so via `oid_object_info()`: if it
+returns an error, then we know the object does not exist.
+
+One side effect of `oid_object_info()` is that it parses the object's
+type, and to do so it must unpack the object header. This is completely
+pointless: we don't care for the type, but only want to assert that the
+object exists.
+
+Refactor the code to use `repo_has_object_file()`, which both makes the
+code's intent clearer and is also faster because it does not unpack
+object headers. In a real-world repo with 2.3M refs, this results in a
+small speedup when doing a mirror-fetch:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 33.686 s ± 0.176 s [User: 30.119 s, System: 5.262 s]
+ Range (min … max): 33.512 s … 33.944 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 31.247 s ± 0.195 s [User: 28.135 s, System: 5.066 s]
+ Range (min … max): 30.948 s … 31.472 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.08 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 4 +---
+ 1 file changed, 1 insertion(+), 3 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index 91d1301613..01513e6aea 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -846,13 +846,11 @@ static int update_local_ref(struct ref *ref,
+ int summary_width)
+ {
+ struct commit *current = NULL, *updated;
+- enum object_type type;
+ struct branch *current_branch = branch_get(NULL);
+ const char *pretty_ref = prettify_refname(ref->name);
+ int fast_forward = 0;
+
+- type = oid_object_info(the_repository, &ref->new_oid, NULL);
+- if (type < 0)
++ if (!repo_has_object_file(the_repository, &ref->new_oid))
+ die(_("object %s not found"), oid_to_hex(&ref->new_oid));
+
+ if (oideq(&ref->old_oid, &ref->new_oid)) {
+--
+2.33.0
+
diff --git a/_support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch b/_support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch
new file mode 100644
index 000000000..872b92784
--- /dev/null
+++ b/_support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch
@@ -0,0 +1,260 @@
+From 9fec7b213045135655354e864d15894175428d5a Mon Sep 17 00:00:00 2001
+Message-Id: <9fec7b213045135655354e864d15894175428d5a.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:50 +0200
+Subject: [PATCH 10/14] connected: refactor iterator to return next object ID
+ directly
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The object ID iterator used by the connectivity checks returns the next
+object ID via an out-parameter and then uses a return code to indicate
+whether an item was found. This is a bit roundabout: instead of a
+separate error code, we can just return the next object ID directly and
+use `NULL` pointers as indicator that the iterator got no items left.
+Furthermore, this avoids a copy of the object ID.
+
+Refactor the iterator and all its implementations to return object IDs
+directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:
+
+ Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
+ Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s]
+ Range (min … max): 29.934 s … 30.406 s 10 runs
+
+ Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
+ Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s]
+ Range (min … max): 29.696 s … 29.996 s 10 runs
+
+ Summary
+ '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
+ 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'
+
+While this 1% speedup could be labelled as statistically insignificant,
+the speedup is consistent on my machine. Furthermore, this is an end to
+end test, so it is expected that the improvement in the connectivity
+check itself is more significant.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/clone.c | 8 +++-----
+ builtin/fetch.c | 7 +++----
+ builtin/receive-pack.c | 17 +++++++----------
+ connected.c | 15 ++++++++-------
+ connected.h | 2 +-
+ fetch-pack.c | 7 +++----
+ 6 files changed, 25 insertions(+), 31 deletions(-)
+
+diff --git a/builtin/clone.c b/builtin/clone.c
+index 66fe66679c..4a1056fcc2 100644
+--- a/builtin/clone.c
++++ b/builtin/clone.c
+@@ -657,7 +657,7 @@ static void write_followtags(const struct ref *refs, const char *msg)
+ }
+ }
+
+-static int iterate_ref_map(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_ref_map(void *cb_data)
+ {
+ struct ref **rm = cb_data;
+ struct ref *ref = *rm;
+@@ -668,13 +668,11 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
+ */
+ while (ref && !ref->peer_ref)
+ ref = ref->next;
+- /* Returning -1 notes "end of list" to the caller. */
+ if (!ref)
+- return -1;
++ return NULL;
+
+- oidcpy(oid, &ref->old_oid);
+ *rm = ref->next;
+- return 0;
++ return &ref->old_oid;
+ }
+
+ static void update_remote_refs(const struct ref *refs,
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index 01513e6aea..cdf0d0d671 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -962,7 +962,7 @@ static int update_local_ref(struct ref *ref,
+ }
+ }
+
+-static int iterate_ref_map(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_ref_map(void *cb_data)
+ {
+ struct ref **rm = cb_data;
+ struct ref *ref = *rm;
+@@ -970,10 +970,9 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
+ while (ref && ref->status == REF_STATUS_REJECT_SHALLOW)
+ ref = ref->next;
+ if (!ref)
+- return -1; /* end of the list */
++ return NULL;
+ *rm = ref->next;
+- oidcpy(oid, &ref->old_oid);
+- return 0;
++ return &ref->old_oid;
+ }
+
+ struct fetch_head {
+diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
+index 2d1f97e1ca..041e915454 100644
+--- a/builtin/receive-pack.c
++++ b/builtin/receive-pack.c
+@@ -1306,7 +1306,7 @@ static void refuse_unconfigured_deny_delete_current(void)
+ rp_error("%s", _(refuse_unconfigured_deny_delete_current_msg));
+ }
+
+-static int command_singleton_iterator(void *cb_data, struct object_id *oid);
++static const struct object_id *command_singleton_iterator(void *cb_data);
+ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
+ {
+ struct shallow_lock shallow_lock = SHALLOW_LOCK_INIT;
+@@ -1731,16 +1731,15 @@ static void check_aliased_updates(struct command *commands)
+ string_list_clear(&ref_list, 0);
+ }
+
+-static int command_singleton_iterator(void *cb_data, struct object_id *oid)
++static const struct object_id *command_singleton_iterator(void *cb_data)
+ {
+ struct command **cmd_list = cb_data;
+ struct command *cmd = *cmd_list;
+
+ if (!cmd || is_null_oid(&cmd->new_oid))
+- return -1; /* end of list */
++ return NULL;
+ *cmd_list = NULL; /* this returns only one */
+- oidcpy(oid, &cmd->new_oid);
+- return 0;
++ return &cmd->new_oid;
+ }
+
+ static void set_connectivity_errors(struct command *commands,
+@@ -1770,7 +1769,7 @@ struct iterate_data {
+ struct shallow_info *si;
+ };
+
+-static int iterate_receive_command_list(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_receive_command_list(void *cb_data)
+ {
+ struct iterate_data *data = cb_data;
+ struct command **cmd_list = &data->cmds;
+@@ -1781,13 +1780,11 @@ static int iterate_receive_command_list(void *cb_data, struct object_id *oid)
+ /* to be checked in update_shallow_ref() */
+ continue;
+ if (!is_null_oid(&cmd->new_oid) && !cmd->skip_update) {
+- oidcpy(oid, &cmd->new_oid);
+ *cmd_list = cmd->next;
+- return 0;
++ return &cmd->new_oid;
+ }
+ }
+- *cmd_list = NULL;
+- return -1; /* end of list */
++ return NULL;
+ }
+
+ static void reject_updates_to_hidden(struct command *commands)
+diff --git a/connected.c b/connected.c
+index b18299fdf0..35bd4a2638 100644
+--- a/connected.c
++++ b/connected.c
+@@ -24,7 +24,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ struct child_process rev_list = CHILD_PROCESS_INIT;
+ FILE *rev_list_in;
+ struct check_connected_options defaults = CHECK_CONNECTED_INIT;
+- struct object_id oid;
++ const struct object_id *oid;
+ int err = 0;
+ struct packed_git *new_pack = NULL;
+ struct transport *transport;
+@@ -34,7 +34,8 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ opt = &defaults;
+ transport = opt->transport;
+
+- if (fn(cb_data, &oid)) {
++ oid = fn(cb_data);
++ if (!oid) {
+ if (opt->err_fd)
+ close(opt->err_fd);
+ return err;
+@@ -73,7 +74,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ for (p = get_all_packs(the_repository); p; p = p->next) {
+ if (!p->pack_promisor)
+ continue;
+- if (find_pack_entry_one(oid.hash, p))
++ if (find_pack_entry_one(oid->hash, p))
+ goto promisor_pack_found;
+ }
+ /*
+@@ -83,7 +84,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ goto no_promisor_pack_found;
+ promisor_pack_found:
+ ;
+- } while (!fn(cb_data, &oid));
++ } while ((oid = fn(cb_data)) != NULL);
+ return 0;
+ }
+
+@@ -132,12 +133,12 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ * are sure the ref is good and not sending it to
+ * rev-list for verification.
+ */
+- if (new_pack && find_pack_entry_one(oid.hash, new_pack))
++ if (new_pack && find_pack_entry_one(oid->hash, new_pack))
+ continue;
+
+- if (fprintf(rev_list_in, "%s\n", oid_to_hex(&oid)) < 0)
++ if (fprintf(rev_list_in, "%s\n", oid_to_hex(oid)) < 0)
+ break;
+- } while (!fn(cb_data, &oid));
++ } while ((oid = fn(cb_data)) != NULL);
+
+ if (ferror(rev_list_in) || fflush(rev_list_in)) {
+ if (errno != EPIPE && errno != EINVAL)
+diff --git a/connected.h b/connected.h
+index 8d5a6b3ad6..6e59c92aa3 100644
+--- a/connected.h
++++ b/connected.h
+@@ -9,7 +9,7 @@ struct transport;
+ * When called after returning the name for the last object, return -1
+ * to signal EOF, otherwise return 0.
+ */
+-typedef int (*oid_iterate_fn)(void *, struct object_id *oid);
++typedef const struct object_id *(*oid_iterate_fn)(void *);
+
+ /*
+ * Named-arguments struct for check_connected. All arguments are
+diff --git a/fetch-pack.c b/fetch-pack.c
+index 0bf7ed7e47..e6ec79f81a 100644
+--- a/fetch-pack.c
++++ b/fetch-pack.c
+@@ -1912,16 +1912,15 @@ static void update_shallow(struct fetch_pack_args *args,
+ oid_array_clear(&ref);
+ }
+
+-static int iterate_ref_map(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_ref_map(void *cb_data)
+ {
+ struct ref **rm = cb_data;
+ struct ref *ref = *rm;
+
+ if (!ref)
+- return -1; /* end of the list */
++ return NULL;
+ *rm = ref->next;
+- oidcpy(oid, &ref->old_oid);
+- return 0;
++ return &ref->old_oid;
+ }
+
+ struct ref *fetch_pack(struct fetch_pack_args *args,
+--
+2.33.0
+
diff --git a/_support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch b/_support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch
new file mode 100644
index 000000000..f0b9af0c1
--- /dev/null
+++ b/_support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch
@@ -0,0 +1,61 @@
+From 62b5a35a33ad6a4537e2ae75a49036e4173fcc87 Mon Sep 17 00:00:00 2001
+Message-Id: <62b5a35a33ad6a4537e2ae75a49036e4173fcc87.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:54 +0200
+Subject: [PATCH 11/14] fetch-pack: optimize loading of refs via commit graph
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+In order to negotiate a packfile, we need to dereference refs to see
+which commits we have in common with the remote. To do so, we first look
+up the object's type -- if it's a tag, we peel until we hit a non-tag
+object. If we hit a commit eventually, then we return that commit.
+
+In case the object ID points to a commit directly, we can avoid the
+initial lookup of the object type by opportunistically looking up the
+commit via the commit-graph, if available, which gives us a slight speed
+bump of about 2% in a huge repository with about 2.3M refs:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 31.634 s ± 0.258 s [User: 28.400 s, System: 5.090 s]
+ Range (min … max): 31.280 s … 31.896 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 31.129 s ± 0.543 s [User: 27.976 s, System: 5.056 s]
+ Range (min … max): 30.172 s … 31.479 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.02 ± 0.02 times faster than 'HEAD~: git-fetch'
+
+In case this fails, we fall back to the old code which peels the
+objects to a commit.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ fetch-pack.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/fetch-pack.c b/fetch-pack.c
+index e6ec79f81a..a9604f35a3 100644
+--- a/fetch-pack.c
++++ b/fetch-pack.c
+@@ -119,6 +119,11 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+ {
+ enum object_type type;
+ struct object_info info = { .typep = &type };
++ struct commit *commit;
++
++ commit = lookup_commit_in_graph(the_repository, oid);
++ if (commit)
++ return commit;
+
+ while (1) {
+ if (oid_object_info_extended(the_repository, oid, &info,
+--
+2.33.0
+
diff --git a/_support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch b/_support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch
new file mode 100644
index 000000000..e2ef70836
--- /dev/null
+++ b/_support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch
@@ -0,0 +1,60 @@
+From 284b2ce8fcb100e7194b9cca6d9b99bca7da39b6 Mon Sep 17 00:00:00 2001
+Message-Id: <284b2ce8fcb100e7194b9cca6d9b99bca7da39b6.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:58 +0200
+Subject: [PATCH 12/14] fetch: refactor fetch refs to be more extendable
+
+Refactor `fetch_refs()` code to make it more extendable by explicitly
+handling error cases. The refactored code should behave the same.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 24 +++++++++++++++++-------
+ 1 file changed, 17 insertions(+), 7 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index cdf0d0d671..ef6f9b3a33 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1293,18 +1293,28 @@ static int check_exist_and_connected(struct ref *ref_map)
+
+ static int fetch_refs(struct transport *transport, struct ref *ref_map)
+ {
+- int ret = check_exist_and_connected(ref_map);
++ int ret;
++
++ /*
++ * We don't need to perform a fetch in case we can already satisfy all
++ * refs.
++ */
++ ret = check_exist_and_connected(ref_map);
+ if (ret) {
+ trace2_region_enter("fetch", "fetch_refs", the_repository);
+ ret = transport_fetch_refs(transport, ref_map);
+ trace2_region_leave("fetch", "fetch_refs", the_repository);
++ if (ret)
++ goto out;
+ }
+- if (!ret)
+- /*
+- * Keep the new pack's ".keep" file around to allow the caller
+- * time to update refs to reference the new objects.
+- */
+- return 0;
++
++ /*
++ * Keep the new pack's ".keep" file around to allow the caller
++ * time to update refs to reference the new objects.
++ */
++ return ret;
++
++out:
+ transport_unlock_pack(transport);
+ return ret;
+ }
+--
+2.33.0
+
diff --git a/_support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch b/_support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch
new file mode 100644
index 000000000..ba792717c
--- /dev/null
+++ b/_support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch
@@ -0,0 +1,98 @@
+From 1c7d1ab6f4a79e44406f304ec01b0a143dae9abb Mon Sep 17 00:00:00 2001
+Message-Id: <1c7d1ab6f4a79e44406f304ec01b0a143dae9abb.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:10:02 +0200
+Subject: [PATCH 13/14] fetch: merge fetching and consuming refs
+
+The functions `fetch_refs()` and `consume_refs()` must always be called
+together such that we first obtain all missing objects and then update
+our local refs to match the remote refs. In a subsequent patch, we'll
+further require that `fetch_refs()` must always be called before
+`consume_refs()` such that it can correctly assert that we have all
+objects after the fetch given that we're about to move the connectivity
+check.
+
+Make this requirement explicit by merging both functions into a single
+`fetch_and_consume_refs()` function.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 30 +++++++++---------------------
+ 1 file changed, 9 insertions(+), 21 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index ef6f9b3a33..a1e17edd8b 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1291,8 +1291,9 @@ static int check_exist_and_connected(struct ref *ref_map)
+ return check_connected(iterate_ref_map, &rm, &opt);
+ }
+
+-static int fetch_refs(struct transport *transport, struct ref *ref_map)
++static int fetch_and_consume_refs(struct transport *transport, struct ref *ref_map)
+ {
++ int connectivity_checked;
+ int ret;
+
+ /*
+@@ -1308,30 +1309,18 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
+ goto out;
+ }
+
+- /*
+- * Keep the new pack's ".keep" file around to allow the caller
+- * time to update refs to reference the new objects.
+- */
+- return ret;
+-
+-out:
+- transport_unlock_pack(transport);
+- return ret;
+-}
+-
+-/* Update local refs based on the ref values fetched from a remote */
+-static int consume_refs(struct transport *transport, struct ref *ref_map)
+-{
+- int connectivity_checked = transport->smart_options
++ connectivity_checked = transport->smart_options
+ ? transport->smart_options->connectivity_checked : 0;
+- int ret;
++
+ trace2_region_enter("fetch", "consume_refs", the_repository);
+ ret = store_updated_refs(transport->url,
+ transport->remote->name,
+ connectivity_checked,
+ ref_map);
+- transport_unlock_pack(transport);
+ trace2_region_leave("fetch", "consume_refs", the_repository);
++
++out:
++ transport_unlock_pack(transport);
+ return ret;
+ }
+
+@@ -1520,8 +1509,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
+ transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
+ transport_set_option(transport, TRANS_OPT_DEPTH, "0");
+ transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
+- if (!fetch_refs(transport, ref_map))
+- consume_refs(transport, ref_map);
++ fetch_and_consume_refs(transport, ref_map);
+
+ if (gsecondary) {
+ transport_disconnect(gsecondary);
+@@ -1612,7 +1600,7 @@ static int do_fetch(struct transport *transport,
+ transport->url);
+ }
+ }
+- if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) {
++ if (fetch_and_consume_refs(transport, ref_map)) {
+ free_refs(ref_map);
+ retcode = 1;
+ goto cleanup;
+--
+2.33.0
+
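The merged control flow can be modelled with a small, self-contained C sketch. Every name below (`struct fake_transport`, `fake_fetch_refs()`, `fake_store_updated_refs()`, `fake_unlock_pack()`, `fetch_and_consume()`) is a hypothetical stand-in, not a git API. The sketch only illustrates the shape of `fetch_and_consume_refs()` after this patch: refs are stored only after a successful fetch, and the pack lock is released exactly once through the shared `out:` label on both the success and the error path.

```c
#include <assert.h>

/* Hypothetical stand-in for git's struct transport, reduced to the state
 * this sketch tracks. The real type in git carries much more. */
struct fake_transport {
	int fetch_fails;  /* simulate the fetch step failing */
	int pack_locked;  /* 1 while the fetched pack's .keep lock exists */
	int refs_updated; /* set once local refs have been stored */
};

/* Hypothetical fetch step: fetching leaves the new pack locked. */
static int fake_fetch_refs(struct fake_transport *t)
{
	t->pack_locked = 1;
	return t->fetch_fails ? -1 : 0;
}

/* Hypothetical ref-update step, standing in for store_updated_refs(). */
static int fake_store_updated_refs(struct fake_transport *t)
{
	t->refs_updated = 1;
	return 0;
}

/* Hypothetical stand-in for transport_unlock_pack(). */
static void fake_unlock_pack(struct fake_transport *t)
{
	t->pack_locked = 0;
}

/* Same shape as fetch_and_consume_refs(): fetch first, update refs only
 * on success, and release the pack lock on every path via "out". */
static int fetch_and_consume(struct fake_transport *t)
{
	int ret = fake_fetch_refs(t);
	if (ret)
		goto out;

	ret = fake_store_updated_refs(t);

out:
	fake_unlock_pack(t);
	return ret;
}
```

Because both steps now live in one function, a caller such as `do_fetch()` can no longer update refs without having fetched first, which is the ordering the follow-up patch relies on.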
diff --git a/_support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch b/_support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch
new file mode 100644
index 000000000..89a16dfa0
--- /dev/null
+++ b/_support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch
@@ -0,0 +1,78 @@
+From caff8b73402d4b5edb2c6c755506c5a90351b69a Mon Sep 17 00:00:00 2001
+Message-Id: <caff8b73402d4b5edb2c6c755506c5a90351b69a.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:10:06 +0200
+Subject: [PATCH 14/14] fetch: avoid second connectivity check if we already
+ have all objects
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When fetching refs, we are doing two connectivity checks:
+
+ - The first one is done such that we can skip fetching refs in the
+ case where we already have all objects referenced by the updated
+ set of refs.
+
+ - The second one verifies that we have all objects after we have
+ fetched objects.
+
+We always execute both connectivity checks, but this is wasteful if
+the first connectivity check already notices that we have all objects
+locally available.
+
+Skip the second connectivity check if we already had all objects
+available. This gives us a nice speedup when doing a mirror-fetch in a
+repository with about 2.3M refs where the fetching repo already has all
+objects:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 30.025 s ± 0.081 s [User: 27.070 s, System: 4.933 s]
+ Range (min … max): 29.900 s … 30.111 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 25.574 s ± 0.177 s [User: 22.855 s, System: 4.683 s]
+ Range (min … max): 25.399 s … 25.765 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.17 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 7 +++----
+ 1 file changed, 3 insertions(+), 4 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index a1e17edd8b..e2c952ec67 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1293,7 +1293,7 @@ static int check_exist_and_connected(struct ref *ref_map)
+
+ static int fetch_and_consume_refs(struct transport *transport, struct ref *ref_map)
+ {
+- int connectivity_checked;
++ int connectivity_checked = 1;
+ int ret;
+
+ /*
+@@ -1307,11 +1307,10 @@ static int fetch_and_consume_refs(struct transport *transport, struct ref *ref_m
+ trace2_region_leave("fetch", "fetch_refs", the_repository);
+ if (ret)
+ goto out;
++ connectivity_checked = transport->smart_options ?
++ transport->smart_options->connectivity_checked : 0;
+ }
+
+- connectivity_checked = transport->smart_options
+- ? transport->smart_options->connectivity_checked : 0;
+-
+ trace2_region_enter("fetch", "consume_refs", the_repository);
+ ret = store_updated_refs(transport->url,
+ transport->remote->name,
+--
+2.33.0
+
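The decision introduced by patch 14 can be sketched the same way. Here `have_all_objects` stands in for a successful `check_exist_and_connected()` (which returns 0 when every object is already present), and `struct fake_smart_options` is a hypothetical reduction of `transport->smart_options`; neither `connectivity_already_checked()` nor the struct is a real git API. `connectivity_checked` now defaults to 1 and is only overwritten from the transport when a fetch actually happened.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for transport->smart_options. */
struct fake_smart_options {
	int connectivity_checked;
};

/* Returns whether the ref update may skip its own connectivity check.
 * have_all_objects models check_exist_and_connected() succeeding, in
 * which case no fetch is performed at all. */
static int connectivity_already_checked(int have_all_objects,
					const struct fake_smart_options *opts)
{
	/* Default: the first check already proved connectivity. */
	int connectivity_checked = 1;

	if (!have_all_objects) {
		/* A real fetch happened; trust only what the transport reports,
		 * and assume nothing if it has no smart options. */
		connectivity_checked = opts ? opts->connectivity_checked : 0;
	}
	return connectivity_checked;
}
```

In the no-fetch case the ref update therefore receives `connectivity_checked = 1` and can skip the second connectivity check, which is the work the 2.3M-ref benchmark above no longer pays for.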