Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-29builtin/pack-objects.c: support `--max-pack-size` with `--cruft`Taylor Blau
When pack-objects learned the `--cruft` option back in b757353676 (builtin/pack-objects.c: --cruft without expiration, 2022-05-20), we explicitly forbade `--cruft` with `--max-pack-size`. At the time, there was no specific rationale given in the patch for not supporting the `--max-pack-size` option with `--cruft`. (As best I can remember, it's because we were trying to push users towards only ever having a single cruft pack, but I cannot be sure). However, `--max-pack-size` is flexible enough that it already works with `--cruft` and can shard unreachable objects across multiple cruft packs, creating separate ".mtimes" files as appropriate. In fact, the `--max-pack-size` option worked with `--cruft` as far back as b757353676! This is because we overwrite the `written_list`, and pass down the appropriate length, i.e. the number of objects written in each pack shard. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-13gc: introduce `gc.recentObjectsHook`Taylor Blau
This patch introduces a new multi-valued configuration option, `gc.recentObjectsHook` as a means to mark certain objects as recent (and thus exempt from garbage collection), regardless of their age. When performing a garbage collection operation on a repository with unreachable objects, Git makes its decision on what to do with those object(s) based on how recent the objects are or not. Generally speaking, unreachable-but-recent objects stay in the repository, and older objects are discarded. However, we have no convenient way to keep certain precious, unreachable objects around in the repository, even if they have aged out and would be pruned. Our options today consist of: - Point references at the reachability tips of any objects you consider precious, which may be undesirable or infeasible if there are many such objects. - Track them via the reflog, which may be undesirable since the reflog's lifetime is limited to that of the reference it's tracking (and callers may want to keep those unreachable objects around for longer). - Extend the grace period, which may keep around other objects that the caller *does* want to discard. - Manually modify the mtimes of objects you want to keep. If those objects are already loose, this is easy enough to do (you can just enumerate and `touch -m` each one). But if they are packed, you will either end up modifying the mtimes of *all* objects in that pack, or be forced to write out a loose copy of that object, both of which may be undesirable. Even worse, if they are in a cruft pack, that requires modifying its `*.mtimes` file by hand, since there is no exposed plumbing for this. - Force the caller to construct the pack of objects they want to keep themselves, and then mark the pack as kept by adding a ".keep" file. This works, but is burdensome for the caller, and having extra packs is awkward as you roll forward your cruft pack. This patch introduces a new option to the above list via the `gc.recentObjectsHook` configuration, which allows the caller to specify a program (or set of programs) whose output is treated as a set of objects to treat as recent, regardless of their true age. The implementation is straightforward. Git enumerates recent objects via `add_unseen_recent_objects_to_traversal()`, which enumerates loose and packed objects, and eventually calls add_recent_object() on any objects for which `want_recent_object()`'s conditions are met. This patch modifies the recency condition from simply "is the mtime of this object more recent than the cutoff?" to "[...] or, is this object mentioned by at least one `gc.recentObjectsHook`?". Depending on whether or not we are generating a cruft pack, this allows the caller to do one of two things: - If generating a cruft pack, the caller is able to retain additional objects via the cruft pack, even if they would have otherwise been pruned due to their age. - If not generating a cruft pack, the caller is likewise able to retain additional objects as loose. A potential alternative here is to introduce a new mode to alter the contents of the reachable pack instead of the cruft one. One could imagine a new option to `pack-objects`, say `--extra-reachable-tips` that does the same thing as above, adding the visited set of objects along the traversal to the pack. But this has the unfortunate side-effect of altering the reachability closure of that pack. If parts of the unreachable object graph mentioned by one or more of the "extra reachable tips" programs is not closed, then the resulting pack won't be either. This makes it impossible in the general case to write out reachability bitmaps for that pack, since closure is a requirement there. Instead, keep these unreachable objects in the cruft pack (or set of unreachable, loose objects) instead, to ensure that we can continue to have a pack containing just reachable objects, which is always safe to write a bitmap over. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-25t5329: notice a failure within a loopJunio C Hamano
We try to write "|| return 1" (or "|| exit 1" in a subshell) at the end of a sequence of &&-chained command in a loop of our tests, so that a failure of any step during the earlier iteration of the loop can properly be caught. There is one loop in this test script that is used to compute the expected result, that will be later compared with an actual output produced by the "test-tool pack-mtimes" command. This particular loop, however, is placed on the upstream side of a pipe, whose non-zero exit code does not get noticed. Emit a line that will never be produced by the "test-tool pack-mtimes" to cause the later comparison to fail. As we use test_cmp to compare this "expected output" file with the "actual output", the "error message" we are emitting into the expected output stream will stand out and shown to the tester. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-22t: detect and signal failure within loopEric Sunshine
Failures within `for` and `while` loops can go unnoticed if not detected and signaled manually since the loop itself does not abort when a contained command fails, nor will a failure necessarily be detected when the loop finishes since the loop returns the exit code of the last command it ran on the final iteration, which may not be the command which failed. Therefore, detect and signal failures manually within loops using the idiom `|| return 1` (or `|| exit 1` within subshells). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-16t5329: test 'git gc --cruft' without '--prune=now'Derrick Stolee
Replace a 'git repack --cruft -d' with the wrapper 'git gc --cruft' to exercise some logic in builtin/gc.c that adds the '--cruft' option to the underlying 'git repack' command. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27sha1-file.c: don't freshen cruft packsTaylor Blau
We don't bother to freshen objects stored in a cruft pack individually by updating the `.mtimes` file. This is because we can't portably `mmap` and write into the middle of a file (i.e., to update the mtime of just one object). Instead, we would have to rewrite the entire `.mtimes` file which may incur some wasted effort especially if there a lot of cruft objects and they are freshened infrequently. Instead, force the freshening code to avoid an optimizing write by writing out the object loose and letting it pick up a current mtime. This works because we prefer the mtime of the loose copy of an object when both a loose and packed one exist (whether or not the packed copy comes from a cruft pack or not). This could certainly do with a test and/or be included earlier in this series/PR, but I want to wait until after I have a chance to clean up the overly-repetitive nature of the cruft pack tests in general. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27builtin/gc.c: conditionally avoid pruning objects via looseTaylor Blau
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding unreachable objects as loose ones, and instead create a cruft pack and `.mtimes` file. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27builtin/repack.c: add cruft packs to MIDX during geometric repackTaylor Blau
When using cruft packs, the following race can occur when a geometric repack that writes a MIDX bitmap takes place afterwords: - First, create an unreachable object and do an all-into-one cruft repack which stores that object in the repository's cruft pack. - Then make that object reachable. - Finally, do a geometric repack and write a MIDX bitmap. Assuming that we are sufficiently unlucky as to select a commit from the MIDX which reaches that object for bitmapping, then the `git multi-pack-index` process will complain that that object is missing. The reason is because we don't include cruft packs in the MIDX when doing a geometric repack. Since the "make that object reachable" doesn't necessarily mean that we'll create a new copy of that object in one of the packs that will get rolled up as part of a geometric repack, it's possible that the MIDX won't see any copies of that now-reachable object. Of course, it's desirable to avoid including cruft packs in the MIDX because it causes the MIDX to store a bunch of objects which are likely to get thrown away. But excluding that pack does open us up to the above race. This patch demonstrates the bug, and resolves it by including cruft packs in the MIDX even when doing a geometric repack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27builtin/repack.c: allow configuring cruft pack generationTaylor Blau
In servers which set the pack.window configuration to a large value, we can wind up spending quite a lot of time finding new bases when breaking delta chains between reachable and unreachable objects while generating a cruft pack. Introduce a handful of `repack.cruft*` configuration variables to control the parameters used by pack-objects when generating a cruft pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27builtin/repack.c: support generating a cruft packTaylor Blau
Expose a way to split the contents of a repository into a main and cruft pack when doing an all-into-one repack with `git repack --cruft -d`, and a complementary configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27builtin/pack-objects.c: --cruft with expirationTaylor Blau
In a previous patch, pack-objects learned how to generate a cruft pack so long as no objects are dropped. This patch teaches pack-objects to handle the case where a non-never `--cruft-expiration` value is passed. This case is slightly more complicated than before, because we want pack-objects to save unreachable objects which would have been pruned when there is another recent (i.e., non-prunable) unreachable object which reaches the other. We'll call these objects "unreachable but reachable-from-recent". Here is how pack-objects handles `--cruft-expiration`: - Instead of adding all objects outside of the kept pack(s) into the packing list, only handle the ones whose mtime is within the grace period. - Construct a reachability traversal whose tips are the unreachable-but-recent objects. - Then, walk along that traversal, stopping if we reach an object in the kept pack. At each step along the traversal, we add the object we are visiting to the packing list. In the majority of these cases, any object we visit in this traversal will already be in our packing list. But we will sometimes encounter reachable-from-recent cruft objects, which we want to retain even if they aged out of the grace period. The most subtle point of this process is that we actually don't need to bother to update the rescued object's mtime. Even though we will write an .mtimes file with a value that is older than the expiration window, it will continue to survive cruft repacks so long as any objects which reach it haven't aged out. That is, a future repack will also exclude that object from the initial packing list, only to discover it later on when doing the reachability traversal. Finally, stopping early once an object is found in a kept pack is safe to do because the kept packs ordinarily represent which packs will survive after repacking. Assuming that it _isn't_ safe to halt a traversal early would mean that there is some ancestor object which is missing, which implies repository corruption (i.e., the complete set of reachable objects isn't present). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-27builtin/pack-objects.c: --cruft without expirationTaylor Blau
Teach `pack-objects` how to generate a cruft pack when no objects are dropped (i.e., `--cruft-expiration=never`). Later patches will teach `pack-objects` how to generate a cruft pack that prunes objects. When generating a cruft pack which does not prune objects, we want to collect all unreachable objects into a single pack (noting and updating their mtimes as we accumulate them). Ordinary use will pass the result of a `git repack -A` as a kept pack, so when this patch says "kept pack", readers should think "reachable objects". Generating a non-expiring cruft packs works as follows: - Callers provide a list of every pack they know about, and indicate which packs are about to be removed. - All packs which are going to be removed (we'll call these the redundant ones) are marked as kept in-core. Any packs the caller did not mention (but are known to the `pack-objects` process) are also marked as kept in-core. Packs not mentioned by the caller are assumed to be unknown to them, i.e., they entered the repository after the caller decided which packs should be kept and which should be discarded. Since we do not want to include objects in these "unknown" packs (because we don't know which of their objects are or aren't reachable), these are also marked as kept in-core. - Then, we enumerate all objects in the repository, and add them to our packing list if they do not appear in an in-core kept pack. This results in a new cruft pack which contains all known objects that aren't included in the kept packs. When the kept pack is the result of `git repack -A`, the resulting pack contains all unreachable objects. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>