gitlab.com/gitlab-org/gitaly.git
author    Christian Couder <chriscool@tuxfamily.org>  2021-09-13 15:55:57 +0300
committer Christian Couder <chriscool@tuxfamily.org>  2021-09-13 15:55:57 +0300
commit    ff2f210d613cff6d208afd9409860c3394c8fb0e
tree      7f01fe8cdaff6f5e08af719e6d060df3489b6ff8
parent    b1a3ae70efd6b3c45d596060b6ef9b11fa723249
parent    6ebcf23f1aa77d21c98789a9823a6fde362a09c4
Merge branch 'pks-git-fetch-speedups' into 'master'

Makefile: Apply Git patches to speed up fetches

Closes git#95

See merge request gitlab-org/gitaly!3848
-rw-r--r-- Makefile | 23
-rw-r--r-- _support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch | 107
-rw-r--r-- _support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch | 95
-rw-r--r-- _support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch | 67
-rw-r--r-- _support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch | 260
-rw-r--r-- _support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch | 61
-rw-r--r-- _support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch | 60
-rw-r--r-- _support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch | 98
-rw-r--r-- _support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch | 78
9 files changed, 848 insertions, 1 deletion
diff --git a/Makefile b/Makefile
index db5f4b46d..044949f5e 100644
--- a/Makefile
+++ b/Makefile
@@ -114,6 +114,10 @@ ifeq ($(origin GIT_PATCHES),undefined)
# Before adding custom patches, please read doc/PROCESS.md#Patching-git
# first to make sure your patches meet our acceptance criteria. Patches
# must be put into `_support/git-patches`.
+
+ # The following set of patches speeds up connectivity checks and thus
+ # pushes into Gitaly. They have been merged into next via a5619d4f8d (Merge
+ # branch 'ps/connectivity-optim', 2021-09-03)
GIT_PATCHES += 0001-fetch-pack-speed-up-loading-of-refs-via-commit-graph.patch
GIT_PATCHES += 0002-revision-separate-walk-and-unsorted-flags.patch
GIT_PATCHES += 0003-connected-do-not-sort-input-revisions.patch
@@ -121,13 +125,30 @@ ifeq ($(origin GIT_PATCHES),undefined)
GIT_PATCHES += 0005-commit-graph-split-out-function-to-search-commit-pos.patch
GIT_PATCHES += 0006-revision-avoid-hitting-packfiles-when-commits-are-in.patch
+ # Due to a bug, fetches with `--quiet` were slower than those without
+ # because Git formatted each reference into the output buffer even though
+ # it wasn't used. This has been merged into next via 2440a8a2aa (Merge
+ # branch 'ps/fetch-omit-formatting-under-quiet' into next, 2021-09-01)
+ GIT_PATCHES += 0007-fetch-skip-formatting-updated-refs-with-quiet.patch
+
+ # This patch set speeds up fetches, most importantly by making better use
+ # of the commit graph. They have been merged into next via 99f865125d
+ # (Merge branch 'ps/fetch-optim' into next, 2021-09-08).
+ GIT_PATCHES += 0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch
+ GIT_PATCHES += 0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch
+ GIT_PATCHES += 0010-connected-refactor-iterator-to-return-next-object-ID.patch
+ GIT_PATCHES += 0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch
+ GIT_PATCHES += 0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch
+ GIT_PATCHES += 0013-fetch-merge-fetching-and-consuming-refs.patch
+ GIT_PATCHES += 0014-fetch-avoid-second-connectivity-check-if-we-already-.patch
+
# This extra version has two intentions: first, it allows us to detect
# capabilities of the command at runtime. Second, it helps admins to
# discover which version is currently in use. As such, this version must be
# incremented whenever a new patch is added above. When no patches exist,
# then this should be undefined. Otherwise, it must be set to at least
# `gl1` given that `0` is the "default" GitLab patch level.
- GIT_EXTRA_VERSION := gl1
+ GIT_EXTRA_VERSION := gl2
endif
ifeq ($(origin GIT_BUILD_OPTIONS),undefined)
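The Makefile comment says that GIT_EXTRA_VERSION lets callers detect capabilities at runtime and must be bumped whenever a patch is added. That is why this merge moves it from gl1 to gl2. The following is a minimal C sketch of such a check. It assumes the built binary reports the suffix as `.gl<N>` appended to the upstream version (for example `git version 2.33.0.gl2`). That format and the helper names `gitlab_patch_level()` and `supports_fetch_speedups()` are hypothetical. Gitaly performs its real detection in its own Go code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the GitLab patch level encoded in a version
 * string such as "git version 2.33.0.gl2". ASSUMPTION: GIT_EXTRA_VERSION is
 * appended as ".gl<N>". Returns 0, the "default" patch level, if absent.
 */
static int gitlab_patch_level(const char *version)
{
	const char *p;
	char *end;
	long level;

	assert(version != NULL);
	p = strstr(version, ".gl");
	if (!p)
		return 0;
	level = strtol(p + 3, &end, 10);
	/* Require at least one digit after ".gl" and a non-negative value. */
	if (end == p + 3 || level < 0)
		return 0;
	return (int)level;
}

/* Hypothetical capability gate: patches 0007-0014 arrive together with gl2. */
static int supports_fetch_speedups(const char *version)
{
	return gitlab_patch_level(version) >= 2;
}
```

Keying features off a monotonically increasing patch level is what makes the "must be incremented whenever a new patch is added" rule matter. If the rule is broken, a caller cannot tell an old build from a new one.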
diff --git a/_support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch b/_support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch
new file mode 100644
index 000000000..04c7f382a
--- /dev/null
+++ b/_support/git-patches/0007-fetch-skip-formatting-updated-refs-with-quiet.patch
@@ -0,0 +1,107 @@
+From f6bb64df82ddd050894ca8a2a0bfbd1997602500 Mon Sep 17 00:00:00 2001
+Message-Id: <f6bb64df82ddd050894ca8a2a0bfbd1997602500.1631166264.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Mon, 30 Aug 2021 12:54:26 +0200
+Subject: [PATCH] fetch: skip formatting updated refs with `--quiet`
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When fetching, Git will by default print a list of all updated refs in a
+nicely formatted table. In order to come up with this table, Git needs
+to iterate refs twice: first to determine the maximum column width, and
+a second time to actually format these changed refs.
+
+While this table will not be printed in case the user passes `--quiet`,
+we still go out of our way and do all these steps. In fact, we even do
+more work compared to not passing `--quiet`: without the flag, we will
+skip all references in the column width computation which have not been
+updated, but if it is set we will now compute widths for all refs.
+
+Fix this issue by completely skipping both preparation of the format and
+formatting data for display in case the user passes `--quiet`, improving
+performance especially with many refs. The following benchmark shows a
+nice speedup for a quiet mirror-fetch in a repository with 2.3M refs:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 26.929 s ± 0.145 s [User: 24.194 s, System: 4.656 s]
+ Range (min … max): 26.692 s … 27.068 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 25.189 s ± 0.094 s [User: 22.556 s, System: 4.606 s]
+ Range (min … max): 25.070 s … 25.314 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.07 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+While at it, this patch also fixes `adjust_refcol_width()` such that it
+skips unchanged refs in case the user passed `--quiet`, where verbosity
+will be negative. While this function won't be called anymore if so,
+this brings the comment in line with actual code. Furthermore, needless
+`verbosity >= 0` checks are now removed in `store_updated_refs()`: we
+never print to the `note` buffer anymore in case `verbosity < 0`, so we
+won't end up in that code block anyway.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 17 ++++++++++++-----
+ 1 file changed, 12 insertions(+), 5 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index 25740c13df..334bc7efa6 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -712,7 +712,7 @@ static void adjust_refcol_width(const struct ref *ref)
+ int max, rlen, llen, len;
+
+ /* uptodate lines are only shown on high verbosity level */
+- if (!verbosity && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
++ if (verbosity <= 0 && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
+ return;
+
+ max = term_columns();
+@@ -748,6 +748,9 @@ static void prepare_format_display(struct ref *ref_map)
+ struct ref *rm;
+ const char *format = "full";
+
++ if (verbosity < 0)
++ return;
++
+ git_config_get_string_tmp("fetch.output", &format);
+ if (!strcasecmp(format, "full"))
+ compact_format = 0;
+@@ -827,7 +830,12 @@ static void format_display(struct strbuf *display, char code,
+ const char *remote, const char *local,
+ int summary_width)
+ {
+- int width = (summary_width + strlen(summary) - gettext_width(summary));
++ int width;
++
++ if (verbosity < 0)
++ return;
++
++ width = (summary_width + strlen(summary) - gettext_width(summary));
+
+ strbuf_addf(display, "%c %-*s ", code, width, summary);
+ if (!compact_format)
+@@ -1202,13 +1210,12 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ "FETCH_HEAD", summary_width);
+ }
+ if (note.len) {
+- if (verbosity >= 0 && !shown_url) {
++ if (!shown_url) {
+ fprintf(stderr, _("From %.*s\n"),
+ url_len, url);
+ shown_url = 1;
+ }
+- if (verbosity >= 0)
+- fprintf(stderr, " %s\n", note.buf);
++ fprintf(stderr, " %s\n", note.buf);
+ }
+ }
+ }
+--
+2.33.0
+
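Patch 0007 describes a two-pass table. The first pass over the refs finds the widest name, the second formats each row, and with `--quiet` (verbosity below zero) both passes are now skipped. The sketch below models that control flow with a hypothetical `struct toy_ref` and a simplified row format. It is not Git's `prepare_format_display()`/`format_display()` code. It only illustrates why returning early saves two full passes over millions of refs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Git's struct ref. */
struct toy_ref {
	const char *name;
	int updated; /* nonzero if this fetch changed the ref */
};

/*
 * Pass 1: find the widest ref name. Unchanged refs are only displayed at
 * high verbosity, so they are skipped unless verbosity > 0 (the same
 * `verbosity <= 0` test that 0007 puts into adjust_refcol_width()).
 */
static int refcol_width(const struct toy_ref *refs, size_t n, int verbosity)
{
	int width = 0;

	for (size_t i = 0; i < n; i++) {
		int len;

		if (verbosity <= 0 && !refs[i].updated)
			continue;
		len = (int)strlen(refs[i].name);
		if (len > width)
			width = len;
	}
	return width;
}

/*
 * Pass 2: format one row per displayed ref into buf. Under --quiet
 * (verbosity < 0) neither pass runs. Returns the number of rows written.
 */
static int format_table(char *buf, size_t bufsz, const struct toy_ref *refs,
			size_t n, int verbosity)
{
	int width, rows = 0;
	size_t used = 0;

	assert(bufsz > 0);
	buf[0] = '\0';
	if (verbosity < 0)
		return 0;

	width = refcol_width(refs, n, verbosity);
	for (size_t i = 0; i < n; i++) {
		int w;

		if (verbosity <= 0 && !refs[i].updated)
			continue;
		w = snprintf(buf + used, bufsz - used, "%-*s |\n",
			     width, refs[i].name);
		if (w < 0 || (size_t)w >= bufsz - used)
			break; /* out of space: stop counting rows */
		used += (size_t)w;
		rows++;
	}
	return rows;
}
```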
diff --git a/_support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch b/_support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch
new file mode 100644
index 000000000..2b8505d6e
--- /dev/null
+++ b/_support/git-patches/0008-fetch-speed-up-lookup-of-want-refs-via-commit-graph.patch
@@ -0,0 +1,95 @@
+From fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692 Mon Sep 17 00:00:00 2001
+Message-Id: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:41 +0200
+Subject: [PATCH 08/14] fetch: speed up lookup of want refs via commit-graph
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When updating our local refs based on the refs fetched from the remote,
+we need to iterate through all requested refs and load their respective
+commits such that we can determine whether they need to be appended to
+FETCH_HEAD or not. In cases where we're fetching from a remote with
+exceedingly many refs, resolving these refs can be quite expensive given
+that we repeatedly need to unpack object headers for each of the
+referenced objects.
+
+Speed this up by opportunistically trying to resolve object IDs via the
+commit graph. We only do so for any refs which are not in "refs/tags":
+more likely than not, these are going to be a commit anyway, and this
+lets us avoid having to unpack object headers completely in case the
+object is a commit that is part of the commit-graph. This significantly
+speeds up mirror-fetches in a real-world repository with
+2.3M refs:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 56.482 s ± 0.384 s [User: 53.340 s, System: 5.365 s]
+ Range (min … max): 56.050 s … 57.045 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 33.727 s ± 0.170 s [User: 30.252 s, System: 5.194 s]
+ Range (min … max): 33.452 s … 33.871 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.67 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 24 ++++++++++++++++++------
+ 1 file changed, 18 insertions(+), 6 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index e064687dbd..91d1301613 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1074,7 +1074,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ int connectivity_checked, struct ref *ref_map)
+ {
+ struct fetch_head fetch_head;
+- struct commit *commit;
+ int url_len, i, rc = 0;
+ struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+ struct ref_transaction *transaction = NULL;
+@@ -1122,6 +1121,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ want_status <= FETCH_HEAD_IGNORE;
+ want_status++) {
+ for (rm = ref_map; rm; rm = rm->next) {
++ struct commit *commit = NULL;
+ struct ref *ref = NULL;
+
+ if (rm->status == REF_STATUS_REJECT_SHALLOW) {
+@@ -1131,11 +1131,23 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
+ continue;
+ }
+
+- commit = lookup_commit_reference_gently(the_repository,
+- &rm->old_oid,
+- 1);
+- if (!commit)
+- rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
++ /*
++ * References in "refs/tags/" are often going to point
++ * to annotated tags, which are not part of the
++ * commit-graph. We thus only try to look up refs in
++ * the graph which are not in that namespace to not
++ * regress performance in repositories with many
++ * annotated tags.
++ */
++ if (!starts_with(rm->name, "refs/tags/"))
++ commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
++ if (!commit) {
++ commit = lookup_commit_reference_gently(the_repository,
++ &rm->old_oid,
++ 1);
++ if (!commit)
++ rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
++ }
+
+ if (rm->fetch_head_status != want_status)
+ continue;
+--
+2.33.0
+
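Patch 0008 tries the commit-graph first for wanted refs outside `refs/tags/` and falls back to the generic lookup otherwise. The toy model below shows the resulting fast and slow paths. It uses a hypothetical in-memory object table, and a counter stands in for header unpacking. Unlike Git's `lookup_commit_reference_gently()`, the fallback here does not peel annotated tags. It simply returns NULL for anything that is not a commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_type { TOY_COMMIT, TOY_TAG };

/* Hypothetical object table entry; not Git's real data structures. */
struct toy_object {
	int id;
	enum toy_type type;
	int in_graph; /* nonzero if the commit-graph knows this commit */
};

static const struct toy_object objects[] = {
	{ 1, TOY_COMMIT, 1 },
	{ 2, TOY_COMMIT, 0 }, /* commit written after the graph was built */
	{ 3, TOY_TAG, 0 },    /* annotated tag: never in the commit-graph */
};

static int header_unpacks; /* counts trips through the expensive path */

static const struct toy_object *find(int id)
{
	for (size_t i = 0; i < sizeof(objects) / sizeof(objects[0]); i++)
		if (objects[i].id == id)
			return &objects[i];
	return NULL;
}

/* Cheap path: succeeds only for commits present in the commit-graph. */
static const struct toy_object *lookup_in_graph(int id)
{
	const struct toy_object *o = find(id);
	return (o && o->in_graph) ? o : NULL;
}

/* Expensive path: must unpack the object header to learn the type. */
static const struct toy_object *lookup_gently(int id)
{
	const struct toy_object *o = find(id);
	header_unpacks++;
	return (o && o->type == TOY_COMMIT) ? o : NULL;
}

static const struct toy_object *resolve_want(const char *refname, int id)
{
	const struct toy_object *c = NULL;

	/* Tags often point at annotated tags, so don't bother with the graph. */
	if (strncmp(refname, "refs/tags/", strlen("refs/tags/")) != 0)
		c = lookup_in_graph(id);
	if (!c)
		c = lookup_gently(id);
	return c;
}
```

The fallback keeps correctness intact: a graph miss costs one extra cheap probe. In 0008's benchmark, most refs hit the graph and skip header unpacking entirely.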
diff --git a/_support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch b/_support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch
new file mode 100644
index 000000000..2d35aaebd
--- /dev/null
+++ b/_support/git-patches/0009-fetch-avoid-unpacking-headers-in-object-existence-ch.patch
@@ -0,0 +1,67 @@
+From 47c61004c7cfbb8662b13fac813b45e3fd214665 Mon Sep 17 00:00:00 2001
+Message-Id: <47c61004c7cfbb8662b13fac813b45e3fd214665.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:45 +0200
+Subject: [PATCH 09/14] fetch: avoid unpacking headers in object existence
+ check
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When updating local refs after the fetch has transferred all objects, we
+do an object existence test as a safety guard to avoid updating a ref to
+an object which we don't have. We do so via `oid_object_info()`: if it
+returns an error, then we know the object does not exist.
+
+One side effect of `oid_object_info()` is that it parses the object's
+type, and to do so it must unpack the object header. This is completely
+pointless: we don't care for the type, but only want to assert that the
+object exists.
+
+Refactor the code to use `repo_has_object_file()`, which both makes the
+code's intent clearer and is also faster because it does not unpack
+object headers. In a real-world repo with 2.3M refs, this results in a
+small speedup when doing a mirror-fetch:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 33.686 s ± 0.176 s [User: 30.119 s, System: 5.262 s]
+ Range (min … max): 33.512 s … 33.944 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 31.247 s ± 0.195 s [User: 28.135 s, System: 5.066 s]
+ Range (min … max): 30.948 s … 31.472 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.08 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 4 +---
+ 1 file changed, 1 insertion(+), 3 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index 91d1301613..01513e6aea 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -846,13 +846,11 @@ static int update_local_ref(struct ref *ref,
+ int summary_width)
+ {
+ struct commit *current = NULL, *updated;
+- enum object_type type;
+ struct branch *current_branch = branch_get(NULL);
+ const char *pretty_ref = prettify_refname(ref->name);
+ int fast_forward = 0;
+
+- type = oid_object_info(the_repository, &ref->new_oid, NULL);
+- if (type < 0)
++ if (!repo_has_object_file(the_repository, &ref->new_oid))
+ die(_("object %s not found"), oid_to_hex(&ref->new_oid));
+
+ if (oideq(&ref->old_oid, &ref->new_oid)) {
+--
+2.33.0
+
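Patch 0009 replaces `oid_object_info()` with `repo_has_object_file()`. Per the commit message, the former must unpack the object header to report a type, while the latter only has to confirm that the object exists. The sketch below models that difference. It assumes a hypothetical pack made of a sorted ID index plus a one-byte "header" per object. Nothing here mirrors Git's actual pack format.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pack: sorted object IDs plus a per-object "header" byte. */
struct toy_pack {
	const unsigned *index;        /* sorted object IDs */
	const unsigned char *header;  /* header[i] describes index[i] */
	size_t nr;
};

static int headers_parsed; /* counts simulated header unpacks */

static int cmp_id(const void *a, const void *b)
{
	unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
	return (x > y) - (x < y);
}

static const unsigned *find_entry(const struct toy_pack *p, unsigned id)
{
	return bsearch(&id, p->index, p->nr, sizeof(*p->index), cmp_id);
}

/* Models oid_object_info(): locate the entry, then parse its header. */
static int toy_object_type(const struct toy_pack *p, unsigned id)
{
	const unsigned *e = find_entry(p, id);

	if (!e)
		return -1;
	headers_parsed++;
	return p->header[e - p->index];
}

/* Models repo_has_object_file(): the index lookup alone answers. */
static int toy_has_object(const struct toy_pack *p, unsigned id)
{
	return find_entry(p, id) != NULL;
}
```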
diff --git a/_support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch b/_support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch
new file mode 100644
index 000000000..872b92784
--- /dev/null
+++ b/_support/git-patches/0010-connected-refactor-iterator-to-return-next-object-ID.patch
@@ -0,0 +1,260 @@
+From 9fec7b213045135655354e864d15894175428d5a Mon Sep 17 00:00:00 2001
+Message-Id: <9fec7b213045135655354e864d15894175428d5a.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:50 +0200
+Subject: [PATCH 10/14] connected: refactor iterator to return next object ID
+ directly
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The object ID iterator used by the connectivity checks returns the next
+object ID via an out-parameter and then uses a return code to indicate
+whether an item was found. This is a bit roundabout: instead of a
+separate error code, we can just return the next object ID directly and
+use `NULL` pointers as indicator that the iterator got no items left.
+Furthermore, this avoids a copy of the object ID.
+
+Refactor the iterator and all its implementations to return object IDs
+directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:
+
+ Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
+ Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s]
+ Range (min … max): 29.934 s … 30.406 s 10 runs
+
+ Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
+ Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s]
+ Range (min … max): 29.696 s … 29.996 s 10 runs
+
+ Summary
+ '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
+ 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'
+
+While this 1% speedup could be labelled as statistically insignificant,
+the speedup is consistent on my machine. Furthermore, this is an end to
+end test, so it is expected that the improvement in the connectivity
+check itself is more significant.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/clone.c | 8 +++-----
+ builtin/fetch.c | 7 +++----
+ builtin/receive-pack.c | 17 +++++++----------
+ connected.c | 15 ++++++++-------
+ connected.h | 2 +-
+ fetch-pack.c | 7 +++----
+ 6 files changed, 25 insertions(+), 31 deletions(-)
+
+diff --git a/builtin/clone.c b/builtin/clone.c
+index 66fe66679c..4a1056fcc2 100644
+--- a/builtin/clone.c
++++ b/builtin/clone.c
+@@ -657,7 +657,7 @@ static void write_followtags(const struct ref *refs, const char *msg)
+ }
+ }
+
+-static int iterate_ref_map(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_ref_map(void *cb_data)
+ {
+ struct ref **rm = cb_data;
+ struct ref *ref = *rm;
+@@ -668,13 +668,11 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
+ */
+ while (ref && !ref->peer_ref)
+ ref = ref->next;
+- /* Returning -1 notes "end of list" to the caller. */
+ if (!ref)
+- return -1;
++ return NULL;
+
+- oidcpy(oid, &ref->old_oid);
+ *rm = ref->next;
+- return 0;
++ return &ref->old_oid;
+ }
+
+ static void update_remote_refs(const struct ref *refs,
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index 01513e6aea..cdf0d0d671 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -962,7 +962,7 @@ static int update_local_ref(struct ref *ref,
+ }
+ }
+
+-static int iterate_ref_map(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_ref_map(void *cb_data)
+ {
+ struct ref **rm = cb_data;
+ struct ref *ref = *rm;
+@@ -970,10 +970,9 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
+ while (ref && ref->status == REF_STATUS_REJECT_SHALLOW)
+ ref = ref->next;
+ if (!ref)
+- return -1; /* end of the list */
++ return NULL;
+ *rm = ref->next;
+- oidcpy(oid, &ref->old_oid);
+- return 0;
++ return &ref->old_oid;
+ }
+
+ struct fetch_head {
+diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
+index 2d1f97e1ca..041e915454 100644
+--- a/builtin/receive-pack.c
++++ b/builtin/receive-pack.c
+@@ -1306,7 +1306,7 @@ static void refuse_unconfigured_deny_delete_current(void)
+ rp_error("%s", _(refuse_unconfigured_deny_delete_current_msg));
+ }
+
+-static int command_singleton_iterator(void *cb_data, struct object_id *oid);
++static const struct object_id *command_singleton_iterator(void *cb_data);
+ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
+ {
+ struct shallow_lock shallow_lock = SHALLOW_LOCK_INIT;
+@@ -1731,16 +1731,15 @@ static void check_aliased_updates(struct command *commands)
+ string_list_clear(&ref_list, 0);
+ }
+
+-static int command_singleton_iterator(void *cb_data, struct object_id *oid)
++static const struct object_id *command_singleton_iterator(void *cb_data)
+ {
+ struct command **cmd_list = cb_data;
+ struct command *cmd = *cmd_list;
+
+ if (!cmd || is_null_oid(&cmd->new_oid))
+- return -1; /* end of list */
++ return NULL;
+ *cmd_list = NULL; /* this returns only one */
+- oidcpy(oid, &cmd->new_oid);
+- return 0;
++ return &cmd->new_oid;
+ }
+
+ static void set_connectivity_errors(struct command *commands,
+@@ -1770,7 +1769,7 @@ struct iterate_data {
+ struct shallow_info *si;
+ };
+
+-static int iterate_receive_command_list(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_receive_command_list(void *cb_data)
+ {
+ struct iterate_data *data = cb_data;
+ struct command **cmd_list = &data->cmds;
+@@ -1781,13 +1780,11 @@ static int iterate_receive_command_list(void *cb_data, struct object_id *oid)
+ /* to be checked in update_shallow_ref() */
+ continue;
+ if (!is_null_oid(&cmd->new_oid) && !cmd->skip_update) {
+- oidcpy(oid, &cmd->new_oid);
+ *cmd_list = cmd->next;
+- return 0;
++ return &cmd->new_oid;
+ }
+ }
+- *cmd_list = NULL;
+- return -1; /* end of list */
++ return NULL;
+ }
+
+ static void reject_updates_to_hidden(struct command *commands)
+diff --git a/connected.c b/connected.c
+index b18299fdf0..35bd4a2638 100644
+--- a/connected.c
++++ b/connected.c
+@@ -24,7 +24,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ struct child_process rev_list = CHILD_PROCESS_INIT;
+ FILE *rev_list_in;
+ struct check_connected_options defaults = CHECK_CONNECTED_INIT;
+- struct object_id oid;
++ const struct object_id *oid;
+ int err = 0;
+ struct packed_git *new_pack = NULL;
+ struct transport *transport;
+@@ -34,7 +34,8 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ opt = &defaults;
+ transport = opt->transport;
+
+- if (fn(cb_data, &oid)) {
++ oid = fn(cb_data);
++ if (!oid) {
+ if (opt->err_fd)
+ close(opt->err_fd);
+ return err;
+@@ -73,7 +74,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ for (p = get_all_packs(the_repository); p; p = p->next) {
+ if (!p->pack_promisor)
+ continue;
+- if (find_pack_entry_one(oid.hash, p))
++ if (find_pack_entry_one(oid->hash, p))
+ goto promisor_pack_found;
+ }
+ /*
+@@ -83,7 +84,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ goto no_promisor_pack_found;
+ promisor_pack_found:
+ ;
+- } while (!fn(cb_data, &oid));
++ } while ((oid = fn(cb_data)) != NULL);
+ return 0;
+ }
+
+@@ -132,12 +133,12 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
+ * are sure the ref is good and not sending it to
+ * rev-list for verification.
+ */
+- if (new_pack && find_pack_entry_one(oid.hash, new_pack))
++ if (new_pack && find_pack_entry_one(oid->hash, new_pack))
+ continue;
+
+- if (fprintf(rev_list_in, "%s\n", oid_to_hex(&oid)) < 0)
++ if (fprintf(rev_list_in, "%s\n", oid_to_hex(oid)) < 0)
+ break;
+- } while (!fn(cb_data, &oid));
++ } while ((oid = fn(cb_data)) != NULL);
+
+ if (ferror(rev_list_in) || fflush(rev_list_in)) {
+ if (errno != EPIPE && errno != EINVAL)
+diff --git a/connected.h b/connected.h
+index 8d5a6b3ad6..6e59c92aa3 100644
+--- a/connected.h
++++ b/connected.h
+@@ -9,7 +9,7 @@ struct transport;
+ * When called after returning the name for the last object, return -1
+ * to signal EOF, otherwise return 0.
+ */
+-typedef int (*oid_iterate_fn)(void *, struct object_id *oid);
++typedef const struct object_id *(*oid_iterate_fn)(void *);
+
+ /*
+ * Named-arguments struct for check_connected. All arguments are
+diff --git a/fetch-pack.c b/fetch-pack.c
+index 0bf7ed7e47..e6ec79f81a 100644
+--- a/fetch-pack.c
++++ b/fetch-pack.c
+@@ -1912,16 +1912,15 @@ static void update_shallow(struct fetch_pack_args *args,
+ oid_array_clear(&ref);
+ }
+
+-static int iterate_ref_map(void *cb_data, struct object_id *oid)
++static const struct object_id *iterate_ref_map(void *cb_data)
+ {
+ struct ref **rm = cb_data;
+ struct ref *ref = *rm;
+
+ if (!ref)
+- return -1; /* end of the list */
++ return NULL;
+ *rm = ref->next;
+- oidcpy(oid, &ref->old_oid);
+- return 0;
++ return &ref->old_oid;
+ }
+
+ struct ref *fetch_pack(struct fetch_pack_args *args,
+--
+2.33.0
+
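Patch 0010 changes `oid_iterate_fn`. The old contract filled an out-parameter and returned 0 or -1; the new one returns a pointer, or NULL at the end. The sketch below reproduces the new shape with hypothetical `toy_oid`/`toy_ref` types in place of Git's `struct object_id`/`struct ref`. It includes the fetch-first, `do`/`while` consumer loop that `check_connected()` uses after the patch. Returning `&ref->old_oid` is what removes the per-item `oidcpy()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct object_id and struct ref. */
struct toy_oid { unsigned char hash[20]; };

struct toy_ref {
	struct toy_oid old_oid;
	struct toy_ref *next;
};

/* New-style iterator: returns the next ID, or NULL when exhausted. */
typedef const struct toy_oid *(*toy_oid_iterate_fn)(void *);

static const struct toy_oid *iterate_ref_map(void *cb_data)
{
	struct toy_ref **rm = cb_data;
	struct toy_ref *ref = *rm;

	if (!ref)
		return NULL;
	*rm = ref->next;
	return &ref->old_oid; /* no copy: point into the ref itself */
}

/* Consumer in the shape of check_connected(): prime, then loop to NULL. */
static unsigned sum_first_bytes(toy_oid_iterate_fn fn, void *cb_data)
{
	const struct toy_oid *oid = fn(cb_data);
	unsigned sum = 0;

	if (!oid)
		return 0;
	do {
		sum += oid->hash[0];
	} while ((oid = fn(cb_data)) != NULL);
	return sum;
}
```

Because the returned pointer aliases the caller's list, it stays valid only as long as that list does. This holds in the connectivity check, which consumes each ID before advancing.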
diff --git a/_support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch b/_support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch
new file mode 100644
index 000000000..f0b9af0c1
--- /dev/null
+++ b/_support/git-patches/0011-fetch-pack-optimize-loading-of-refs-via-commit-graph.patch
@@ -0,0 +1,61 @@
+From 62b5a35a33ad6a4537e2ae75a49036e4173fcc87 Mon Sep 17 00:00:00 2001
+Message-Id: <62b5a35a33ad6a4537e2ae75a49036e4173fcc87.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:54 +0200
+Subject: [PATCH 11/14] fetch-pack: optimize loading of refs via commit graph
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+In order to negotiate a packfile, we need to dereference refs to see
+which commits we have in common with the remote. To do so, we first look
+up the object's type -- if it's a tag, we peel until we hit a non-tag
+object. If we hit a commit eventually, then we return that commit.
+
+In case the object ID points to a commit directly, we can avoid the
+initial lookup of the object type by opportunistically looking up the
+commit via the commit-graph, if available, which gives us a slight speed
+bump of about 2% in a huge repository with about 2.3M refs:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 31.634 s ± 0.258 s [User: 28.400 s, System: 5.090 s]
+ Range (min … max): 31.280 s … 31.896 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 31.129 s ± 0.543 s [User: 27.976 s, System: 5.056 s]
+ Range (min … max): 30.172 s … 31.479 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.02 ± 0.02 times faster than 'HEAD~: git-fetch'
+
+In case this fails, we fall back to the old code which peels the
+objects to a commit.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ fetch-pack.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/fetch-pack.c b/fetch-pack.c
+index e6ec79f81a..a9604f35a3 100644
+--- a/fetch-pack.c
++++ b/fetch-pack.c
+@@ -119,6 +119,11 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+ {
+ enum object_type type;
+ struct object_info info = { .typep = &type };
++ struct commit *commit;
++
++ commit = lookup_commit_in_graph(the_repository, oid);
++ if (commit)
++ return commit;
+
+ while (1) {
+ if (oid_object_info_extended(the_repository, oid, &info,
+--
+2.33.0
+
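Patch 0011 gives `deref_without_lazy_fetch()` a commit-graph fast path ahead of its existing type-lookup-and-peel loop. The following is a minimal model of that control flow. It assumes a hypothetical object table where tags record their target and only commits can appear in the graph. `type_lookups` counts how often the slower path inspects an object.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_TAG };

/* Hypothetical object table; real Git reads packfiles and loose objects. */
struct toy_object {
	int id;
	enum toy_type type;
	int target;   /* for tags: ID of the tagged object */
	int in_graph; /* only commits can be in the commit-graph */
};

static const struct toy_object db[] = {
	{ 10, TOY_COMMIT, 0, 1 },
	{ 11, TOY_COMMIT, 0, 0 },
	{ 20, TOY_TAG, 21, 0 },
	{ 21, TOY_TAG, 11, 0 }, /* a tag of a tag */
	{ 30, TOY_TREE, 0, 0 },
};

static int type_lookups;

static const struct toy_object *find(int id)
{
	for (size_t i = 0; i < sizeof(db) / sizeof(db[0]); i++)
		if (db[i].id == id)
			return &db[i];
	return NULL;
}

/* Returns the ID of the commit the object peels to, or -1 if none. */
static int deref_to_commit(int id)
{
	const struct toy_object *o = find(id);

	/* Fast path: a graph hit is a commit, so no type lookup is needed. */
	if (o && o->in_graph)
		return o->id;

	/* Slow path: look up the type and peel tags until a non-tag. */
	while (1) {
		o = find(id);
		type_lookups++;
		if (!o)
			return -1;
		if (o->type != TOY_TAG)
			break;
		id = o->target;
	}
	return o->type == TOY_COMMIT ? o->id : -1;
}
```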
diff --git a/_support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch b/_support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch
new file mode 100644
index 000000000..e2ef70836
--- /dev/null
+++ b/_support/git-patches/0012-fetch-refactor-fetch-refs-to-be-more-extendable.patch
@@ -0,0 +1,60 @@
+From 284b2ce8fcb100e7194b9cca6d9b99bca7da39b6 Mon Sep 17 00:00:00 2001
+Message-Id: <284b2ce8fcb100e7194b9cca6d9b99bca7da39b6.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:09:58 +0200
+Subject: [PATCH 12/14] fetch: refactor fetch refs to be more extendable
+
+Refactor `fetch_refs()` code to make it more extendable by explicitly
+handling error cases. The refactored code should behave the same.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 24 +++++++++++++++++-------
+ 1 file changed, 17 insertions(+), 7 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index cdf0d0d671..ef6f9b3a33 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1293,18 +1293,28 @@ static int check_exist_and_connected(struct ref *ref_map)
+
+ static int fetch_refs(struct transport *transport, struct ref *ref_map)
+ {
+- int ret = check_exist_and_connected(ref_map);
++ int ret;
++
++ /*
++ * We don't need to perform a fetch in case we can already satisfy all
++ * refs.
++ */
++ ret = check_exist_and_connected(ref_map);
+ if (ret) {
+ trace2_region_enter("fetch", "fetch_refs", the_repository);
+ ret = transport_fetch_refs(transport, ref_map);
+ trace2_region_leave("fetch", "fetch_refs", the_repository);
++ if (ret)
++ goto out;
+ }
+- if (!ret)
+- /*
+- * Keep the new pack's ".keep" file around to allow the caller
+- * time to update refs to reference the new objects.
+- */
+- return 0;
++
++ /*
++ * Keep the new pack's ".keep" file around to allow the caller
++ * time to update refs to reference the new objects.
++ */
++ return ret;
++
++out:
+ transport_unlock_pack(transport);
+ return ret;
+ }
+--
+2.33.0
+
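The refactored `fetch_refs()` in patch 0012 keeps the new pack's `.keep` lock on success and releases it only on the error path via `goto out`. The sketch below mirrors that flow. Hypothetical `toy_*` functions stand in for `check_exist_and_connected()`, `transport_fetch_refs()` and `transport_unlock_pack()`, and a flag replaces the real lock file.

```c
#include <assert.h>

/* Hypothetical transport state: tracks only the pack lock. */
struct toy_transport {
	int pack_locked;
	int fetch_result; /* value the simulated fetch returns */
};

/* Nonzero means some objects are missing, so a real fetch is needed. */
static int toy_check_exist_and_connected(int have_all)
{
	return have_all ? 0 : -1;
}

static int toy_transport_fetch_refs(struct toy_transport *t)
{
	t->pack_locked = 1; /* a fetched pack is protected by a .keep lock */
	return t->fetch_result;
}

static void toy_transport_unlock_pack(struct toy_transport *t)
{
	t->pack_locked = 0;
}

/* Same control flow as the refactored fetch_refs(). */
static int toy_fetch_refs(struct toy_transport *t, int have_all)
{
	int ret;

	/* No fetch is needed if all refs can already be satisfied. */
	ret = toy_check_exist_and_connected(have_all);
	if (ret) {
		ret = toy_transport_fetch_refs(t);
		if (ret)
			goto out;
	}

	/* Success: keep the lock so the caller can update refs first. */
	return ret;

out:
	toy_transport_unlock_pack(t);
	return ret;
}
```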
diff --git a/_support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch b/_support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch
new file mode 100644
index 000000000..ba792717c
--- /dev/null
+++ b/_support/git-patches/0013-fetch-merge-fetching-and-consuming-refs.patch
@@ -0,0 +1,98 @@
+From 1c7d1ab6f4a79e44406f304ec01b0a143dae9abb Mon Sep 17 00:00:00 2001
+Message-Id: <1c7d1ab6f4a79e44406f304ec01b0a143dae9abb.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:10:02 +0200
+Subject: [PATCH 13/14] fetch: merge fetching and consuming refs
+
+The functions `fetch_refs()` and `consume_refs()` must always be called
+together such that we first obtain all missing objects and then update
+our local refs to match the remote refs. In a subsequent patch, we'll
+further require that `fetch_refs()` must always be called before
+`consume_refs()` such that it can correctly assert that we have all
+objects after the fetch given that we're about to move the connectivity
+check.
+
+Make this requirement explicit by merging both functions into a single
+`fetch_and_consume_refs()` function.
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 30 +++++++++---------------------
+ 1 file changed, 9 insertions(+), 21 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index ef6f9b3a33..a1e17edd8b 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1291,8 +1291,9 @@ static int check_exist_and_connected(struct ref *ref_map)
+ return check_connected(iterate_ref_map, &rm, &opt);
+ }
+
+-static int fetch_refs(struct transport *transport, struct ref *ref_map)
++static int fetch_and_consume_refs(struct transport *transport, struct ref *ref_map)
+ {
++ int connectivity_checked;
+ int ret;
+
+ /*
+@@ -1308,30 +1309,18 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
+ goto out;
+ }
+
+- /*
+- * Keep the new pack's ".keep" file around to allow the caller
+- * time to update refs to reference the new objects.
+- */
+- return ret;
+-
+-out:
+- transport_unlock_pack(transport);
+- return ret;
+-}
+-
+-/* Update local refs based on the ref values fetched from a remote */
+-static int consume_refs(struct transport *transport, struct ref *ref_map)
+-{
+- int connectivity_checked = transport->smart_options
++ connectivity_checked = transport->smart_options
+ ? transport->smart_options->connectivity_checked : 0;
+- int ret;
++
+ trace2_region_enter("fetch", "consume_refs", the_repository);
+ ret = store_updated_refs(transport->url,
+ transport->remote->name,
+ connectivity_checked,
+ ref_map);
+- transport_unlock_pack(transport);
+ trace2_region_leave("fetch", "consume_refs", the_repository);
++
++out:
++ transport_unlock_pack(transport);
+ return ret;
+ }
+
+@@ -1520,8 +1509,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
+ transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
+ transport_set_option(transport, TRANS_OPT_DEPTH, "0");
+ transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
+- if (!fetch_refs(transport, ref_map))
+- consume_refs(transport, ref_map);
++ fetch_and_consume_refs(transport, ref_map);
+
+ if (gsecondary) {
+ transport_disconnect(gsecondary);
+@@ -1612,7 +1600,7 @@ static int do_fetch(struct transport *transport,
+ transport->url);
+ }
+ }
+- if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) {
++ if (fetch_and_consume_refs(transport, ref_map)) {
+ free_refs(ref_map);
+ retcode = 1;
+ goto cleanup;
+--
+2.33.0
+
diff --git a/_support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch b/_support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch
new file mode 100644
index 000000000..89a16dfa0
--- /dev/null
+++ b/_support/git-patches/0014-fetch-avoid-second-connectivity-check-if-we-already-.patch
@@ -0,0 +1,78 @@
+From caff8b73402d4b5edb2c6c755506c5a90351b69a Mon Sep 17 00:00:00 2001
+Message-Id: <caff8b73402d4b5edb2c6c755506c5a90351b69a.1631166322.git.ps@pks.im>
+In-Reply-To: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+References: <fe7df03a9a2fa434ebce38b2cd5e6da42f8b2692.1631166322.git.ps@pks.im>
+From: Patrick Steinhardt <ps@pks.im>
+Date: Wed, 1 Sep 2021 15:10:06 +0200
+Subject: [PATCH 14/14] fetch: avoid second connectivity check if we already
+ have all objects
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When fetching refs, we are doing two connectivity checks:
+
+ - The first one is done such that we can skip fetching refs in the
+ case where we already have all objects referenced by the updated
+ set of refs.
+
+ - The second one verifies that we have all objects after we have
+ fetched objects.
+
+We always execute both connectivity checks, but this is wasteful in case
+the first connectivity check already notices that we have all objects
+locally available.
+
+Skip the second connectivity check in case we already had all objects
+available. This gives us a nice speedup when doing a mirror-fetch in a
+repository with about 2.3M refs where the fetching repo already has all
+objects:
+
+ Benchmark #1: HEAD~: git-fetch
+ Time (mean ± σ): 30.025 s ± 0.081 s [User: 27.070 s, System: 4.933 s]
+ Range (min … max): 29.900 s … 30.111 s 5 runs
+
+ Benchmark #2: HEAD: git-fetch
+ Time (mean ± σ): 25.574 s ± 0.177 s [User: 22.855 s, System: 4.683 s]
+ Range (min … max): 25.399 s … 25.765 s 5 runs
+
+ Summary
+ 'HEAD: git-fetch' ran
+ 1.17 ± 0.01 times faster than 'HEAD~: git-fetch'
+
+Signed-off-by: Patrick Steinhardt <ps@pks.im>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin/fetch.c | 7 +++----
+ 1 file changed, 3 insertions(+), 4 deletions(-)
+
+diff --git a/builtin/fetch.c b/builtin/fetch.c
+index a1e17edd8b..e2c952ec67 100644
+--- a/builtin/fetch.c
++++ b/builtin/fetch.c
+@@ -1293,7 +1293,7 @@ static int check_exist_and_connected(struct ref *ref_map)
+
+ static int fetch_and_consume_refs(struct transport *transport, struct ref *ref_map)
+ {
+- int connectivity_checked;
++ int connectivity_checked = 1;
+ int ret;
+
+ /*
+@@ -1307,11 +1307,10 @@ static int fetch_and_consume_refs(struct transport *transport, struct ref *ref_m
+ trace2_region_leave("fetch", "fetch_refs", the_repository);
+ if (ret)
+ goto out;
++ connectivity_checked = transport->smart_options ?
++ transport->smart_options->connectivity_checked : 0;
+ }
+
+- connectivity_checked = transport->smart_options
+- ? transport->smart_options->connectivity_checked : 0;
+-
+ trace2_region_enter("fetch", "consume_refs", the_repository);
+ ret = store_updated_refs(transport->url,
+ transport->remote->name,
+--
+2.33.0
+