Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorTaylor Blau <me@ttaylorr.com>2022-10-24 21:43:12 +0300
committerJunio C Hamano <gitster@pobox.com>2022-10-24 23:39:42 +0300
commit91badeba32d31d4dcc695a8888e5b697b4c3d90c (patch)
tree36d2edf54c99a2a237e68dab6d12e77ef8d4ce57 /t/t7700-repack.sh
parentc12cda479eb103bc364e299a2e5654a7165df3cd (diff)
builtin/repack.c: implement `--expire-to` for storing pruned objects
When pruning objects with `--cruft`, `git repack` offers some flexibility when selecting the set of which objects are pruned via the `--cruft-expiration` option. This is useful for expiring objects which are older than the grace period, making races where to-be-pruned objects become reachable and then ancestors of freshly pushed objects, leaving the repository in a corrupt state after pruning substantially less likely [1]. But in practice, such races are impossible to avoid entirely, no matter how long the grace period is. To prevent this race, it is often advisable to temporarily put a repository into a read-only state. But in practice, this is not always practical, and so some middle ground would be nice. This patch introduces a new option, `--expire-to`, which teaches `git repack` to write an additional cruft pack containing just the objects which were pruned from the repository. The caller can specify a directory outside of the current repository as the destination for this second cruft pack. This makes it possible to prune objects from a repository, while still holding onto a supplemental copy of them outside of the original repository. Having this copy on-disk makes it substantially easier to recover objects when the aforementioned race is encountered. `--expire-to` is implemented in a somewhat convoluted manner, which is to take advantage of the fact that the first time `write_cruft_pack()` is called, it adds the name of the cruft pack to the `names` string list. That means the second time we call `write_cruft_pack()`, objects in the previously-written cruft pack will be excluded. As long as the caller ensures that no objects are expired during the second pass, this is sufficient to generate a cruft pack containing all objects which don't appear in any of the new packs written by `git repack`, including the cruft pack. In other words, all of the objects which are about to be pruned from the repository. It is important to note that the destination in `--expire-to` does not necessarily need to be a Git repository (though it can be) Notably, the expired packs do not contain all ancestors of expired objects. So if the source repository contains something like: <unreachable> / C1 --- C2 \ refs/heads/master where C2 is unreachable, but has a parent (C1) which is reachable, and C2 would be pruned, then the expiry pack will contain only C2, not C1. [1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/ Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t7700-repack.sh')
-rwxr-xr-xt/t7700-repack.sh121
1 files changed, 121 insertions, 0 deletions
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index ca45c4cd2c..17ee6fc2cc 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -482,4 +482,125 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
test_server_info_missing
'
+test_expect_success '--expire-to stores pruned objects (now)' '
+ git init expire-to-now &&
+ (
+ cd expire-to-now &&
+
+ git branch -M main &&
+
+ test_commit base &&
+
+ git checkout -b cruft &&
+ test_commit --no-tag cruft &&
+
+ git rev-list --objects --no-object-names main..cruft >moved.raw &&
+ sort moved.raw >moved.want &&
+
+ git rev-list --all --objects --no-object-names >expect.raw &&
+ sort expect.raw >expect &&
+
+ git checkout main &&
+ git branch -D cruft &&
+ git reflog expire --all --expire=all &&
+
+ git init --bare expired.git &&
+ git repack -d \
+ --cruft --cruft-expiration="now" \
+ --expire-to="expired.git/objects/pack/pack" &&
+
+ expired="$(ls expired.git/objects/pack/pack-*.idx)" &&
+ test_path_is_file "${expired%.idx}.mtimes" &&
+
+ # Since the `--cruft-expiration` is "now", the effective
+ # behavior is to move _all_ unreachable objects out to
+ # the location in `--expire-to`.
+ git show-index <$expired >expired.raw &&
+ cut -d" " -f2 expired.raw | sort >expired.objects &&
+ git rev-list --all --objects --no-object-names \
+ >remaining.objects &&
+
+ # ...in other words, the combined contents of this
+ # repository and expired.git should be the same as the
+ # set of objects we started with.
+ cat expired.objects remaining.objects | sort >actual &&
+ test_cmp expect actual &&
+
+ # The "moved" objects (i.e., those in expired.git)
+ # should be the same as the cruft objects which were
+ # expired in the previous step.
+ test_cmp moved.want expired.objects
+ )
+'
+
+test_expect_success '--expire-to stores pruned objects (5.minutes.ago)' '
+ git init expire-to-5.minutes.ago &&
+ (
+ cd expire-to-5.minutes.ago &&
+
+ git branch -M main &&
+
+ test_commit base &&
+
+ # Create two classes of unreachable objects, one which
+ # is older than 5 minutes (stale), and another which is
+ # newer (recent).
+ for kind in stale recent
+ do
+ git checkout -b $kind main &&
+ test_commit --no-tag $kind || return 1
+ done &&
+
+ git rev-list --objects --no-object-names main..stale >in &&
+ stale="$(git pack-objects $objdir/pack/pack <in)" &&
+ mtime="$(test-tool chmtime --get =-600 $objdir/pack/pack-$stale.pack)" &&
+
+ # expect holds the set of objects we expect to find in
+ # this repository after repacking
+ git rev-list --objects --no-object-names recent >expect.raw &&
+ sort expect.raw >expect &&
+
+ # moved.want holds the set of objects we expect to find
+ # in expired.git
+ git rev-list --objects --no-object-names main..stale >out &&
+ sort out >moved.want &&
+
+ git checkout main &&
+ git branch -D stale recent &&
+ git reflog expire --all --expire=all &&
+ git prune-packed &&
+
+ git init --bare expired.git &&
+ git repack -d \
+ --cruft --cruft-expiration=5.minutes.ago \
+ --expire-to="expired.git/objects/pack/pack" &&
+
+ # Some of the remaining objects in this repository are
+ # unreachable, so use `cat-file --batch-all-objects`
+ # instead of `rev-list` to get their names
+ git cat-file --batch-all-objects --batch-check="%(objectname)" \
+ >remaining.objects &&
+ sort remaining.objects >actual &&
+ test_cmp expect actual &&
+
+ (
+ cd expired.git &&
+
+ expired="$(ls objects/pack/pack-*.mtimes)" &&
+ test-tool pack-mtimes $(basename $expired) >out &&
+ cut -d" " -f1 out | sort >../moved.got &&
+
+ # Ensure that there are as many objects with the
+ # expected mtime as were moved to expired.git.
+ #
+ # In other words, ensure that the recorded
+ # mtimes of any moved objects was written
+ # correctly.
+ grep " $mtime$" out >matching &&
+ test_line_count = $(wc -l <../moved.want) matching
+ ) &&
+ test_cmp moved.want moved.got
+ )
+'
+
test_done