From 05b9013b7181e0c842517ce76aeab25a56670dc0 Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 18 Apr 2023 16:40:38 -0400
Subject: builtin/gc.c: ignore cruft packs with `--keep-largest-pack`

When cruft packs were implemented, we never adjusted the code for
`git gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore
cruft packs. The option and the configuration variable share a common
implementation, but including cruft packs is wrong in both cases:

  - Running `git gc --keep-largest-pack` in a repository where the
    largest pack is the cruft pack itself will make it impossible for
    `git gc` to prune objects, since the cruft pack itself is kept.

  - The same is true for `gc.bigPackThreshold`, if the size of the cruft
    pack exceeds the limit set by the caller.

In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.

There are some complexities to doing so, mainly around pruning objects
that are in an existing cruft pack that is above the threshold (which
would either need to be rewritten, or else delay pruning). Rewriting a
substantially similar cruft pack isn't ideal, but it is significantly
better than the status quo.

If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GCs, it should probably
be pruned anyway.

In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.

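The pack-selection rule described in this commit message can be modeled in a few lines. This is a simplified Python sketch, not Git's C code in `builtin/gc.c`. The `Pack` class and the `base_packs` and `packs_to_consolidate` helpers are hypothetical names.

The sketch mirrors the behavior the message and the updated documentation describe:

- With a threshold (`gc.bigPackThreshold`), large non-cruft packs are kept.
- Without a threshold (`--keep-largest-pack`), only the largest non-cruft pack is kept.
- Cruft packs and `.keep` packs are never folded into the new pack.

Treating a pack whose size exactly equals the threshold as "kept" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pack:
    """Hypothetical stand-in for a packfile in .git/objects/pack."""
    name: str
    size: int            # size in bytes
    cruft: bool = False  # True for a cruft pack
    keep: bool = False   # True if the pack is marked with a .keep file


def base_packs(packs, limit=0):
    """Names of packs left as-is.

    limit > 0 models gc.bigPackThreshold: keep every non-cruft pack whose
    size is >= limit (the >= is an assumption). limit == 0 is this
    sketch's encoding of --keep-largest-pack: keep only the largest
    non-cruft pack.
    """
    candidates = [p for p in packs if not p.cruft]  # cruft packs never qualify
    if limit:
        return [p.name for p in candidates if p.size >= limit]
    if not candidates:
        return []
    return [max(candidates, key=lambda p: p.size).name]


def packs_to_consolidate(packs, kept):
    """Packs folded into the single new pack: everything except the kept
    base packs, packs marked with a .keep file, and cruft packs."""
    kept = set(kept)
    return sorted(p.name for p in packs
                  if p.name not in kept and not p.keep and not p.cruft)


packs = [
    Pack("pack-a", 500),
    Pack("pack-b", 200),
    Pack("pack-c", 900, cruft=True),
    Pack("pack-k", 50, keep=True),
]
print(base_packs(packs))                               # ['pack-a']
print(base_packs(packs, limit=150))                    # ['pack-a', 'pack-b']
print(packs_to_consolidate(packs, base_packs(packs)))  # ['pack-b']
```

If the `not p.cruft` filter were dropped from `base_packs`, the 900-byte cruft pack would be chosen as the "largest pack". It would then be kept, which is exactly the situation the message describes: its unreachable objects could never be pruned.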
Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 Documentation/git-gc.txt | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a65c9aa62d..fef382a70f 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -77,9 +77,10 @@ be performed as well.
 	instance running on this repository.
 
 --keep-largest-pack::
-	All packs except the largest pack and those marked with a
-	`.keep` files are consolidated into a single pack. When this
-	option is used, `gc.bigPackThreshold` is ignored.
+	All packs except the largest non-cruft pack, any packs marked
+	with a `.keep` file, and any cruft pack(s) are consolidated into
+	a single pack. When this option is used, `gc.bigPackThreshold`
+	is ignored.
 
 AGGRESSIVE
 ----------
-- 
cgit v1.2.3

From e3e24de1bf1fd443978015fe06bb523dc85a3086 Mon Sep 17 00:00:00 2001
From: Taylor Blau
Date: Tue, 18 Apr 2023 16:40:57 -0400
Subject: builtin/gc.c: make `gc.cruftPacks` enabled by default

Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt in to writing cruft packs when
collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose
object explosions", where Git will write out many individual loose
objects when there is a large number of unreachable objects that have
not yet aged past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects.
This removes the need to write unreachable objects out as loose objects
while they age out of the repository, and avoids the problem of loose
object explosions.

Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because pairs of similar unreachable objects serve
as delta bases for one another.

In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository, generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as the one
above. Users who run `git gc` either manually or via `git maintenance`
can benefit from having cruft packs.

As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false").

Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 Documentation/git-gc.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index fef382a70f..90806fd26a 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -54,9 +54,10 @@ other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
 
---cruft::
+--[no-]cruft::
 	When expiring unreachable objects, pack them separately into a
-	cruft pack instead of storing them as loose objects.
+	cruft pack instead of storing them as loose objects. `--cruft`
+	is on by default.
 
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
-- 
cgit v1.2.3
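The second commit message explains that a cruft pack's `*.mtimes` file stores one mtime per unreachable object. It replaces the filesystem mtime of each loose object file. The small model below shows the consequence: expiry becomes a lookup in a single table.

This is a hedged Python sketch, not Git's on-disk `.mtimes` format or its pruning code. The `expire_cruft` function and the object ids are hypothetical. The sketch also ignores any additional rules Git applies when writing a real cruft pack.

```python
def expire_cruft(mtimes, cutoff):
    """Split a cruft pack's per-object mtime table at `cutoff`.

    mtimes: dict of object id -> mtime in seconds since the epoch; a
            hypothetical in-memory stand-in for a *.mtimes table.
    cutoff: prune cutoff time; objects strictly older than it are dropped.
    Returns (retained, pruned): the table a replacement cruft pack would
    carry forward, and the sorted ids that would be pruned.
    """
    retained = {oid: t for oid, t in mtimes.items() if t >= cutoff}
    pruned = sorted(oid for oid, t in mtimes.items() if t < cutoff)
    return retained, pruned


table = {"obj1": 100, "obj2": 250, "obj3": 400}
retained, pruned = expire_cruft(table, cutoff=200)
print(retained)  # {'obj2': 250, 'obj3': 400}
print(pruned)    # ['obj1']
```

All three ages come from one table rather than from three separate loose files. Only `obj1`, which is older than the cutoff, is dropped. The recent objects stay recorded together instead of each being written out as its own loose object.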