Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-07-18Merge branch 'tb/repack-cleanup'Junio C Hamano
The recent change to "git repack" made it react less nicely when a leftover .idx file that no longer has the corresponding .pack file in the repository, which has been corrected. * tb/repack-cleanup: builtin/repack.c: avoid dir traversal in `collect_pack_filenames()` builtin/repack.c: only repack `.pack`s that exist
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-11builtin/repack.c: avoid dir traversal in `collect_pack_filenames()`Taylor Blau
When repacking, the function `collect_pack_filenames()` is responsible for collecting the set of existing packs in the repository, and partitioning them into "kept" (if the pack has a ".keep" file or was given via `--keep-pack`) and "nonkept" (otherwise) lists. This function comes from the original C port of git-repack.sh from back in a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15), where it first appears as `get_non_kept_pack_filenames()`. At the time, the implementation was a fairly direct translation from the relevant portion of git-repack.sh, which looped over the results of find "$PACKDIR" -type f -name '*.pack' either ignoring the pack as kept, or adding it to the list of existing packs. So the choice to directly translate this function in terms of `readdir()` in a1bbc6c0176 made sense. At the time, it was possible to refine the C version in terms of packed_git structs, but was never done. However, manually enumerating a repository's packs via `readdir()` is confusing and error-prone. It leads to frustrating inconsistencies between which packs Git considers to be part of a repository (i.e., could be found in the list of packs from `get_all_packs()`), and which packs `collect_pack_filenames()` considers to meet the same criteria. This bit us in 73320e49ad (builtin/repack.c: only collect fully-formed packs, 2023-06-07), and again in the previous commit. Prevent these issues from biting us in the future by implementing the `collect_pack_filenames()` function by looping over an array of pointers to `packed_git` structs, ensuring that we use the same criteria to determine the set of available packs. One gotcha here is that we have to ignore non-local packs, since the original version of `collect_pack_filenames()` only looks at the local pack directory to collect existing packs. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11builtin/repack.c: only repack `.pack`s that existDerrick Stolee
In 73320e49add (builtin/repack.c: only collect fully-formed packs, 2023-06-07), we switched the check for which packs to collect by starting at the .idx files and looking for matching .pack files. This avoids trying to repack pack-files that have not had their pack-indexes installed yet. However, it does cause maintenance to halt if we find the (problematic, but not insurmountable) case of a .idx file without a corresponding .pack file. In an environment where packfile maintenance is a critical function, such a hard stop is costly and requires human intervention to resolve (by deleting the .idx file). This was not the case before. We successfully repacked through this scenario until the recent change to scan for .idx files. Further, if we are actually in a case where objects are missing, we detect this at a different point during the reachability walk. In other cases, Git prepares its list of packfiles by scanning .idx files and then only adds it to the packfile list if the corresponding .pack file exists. It even does so without a warning! (See add_packed_git() in packfile.c for details.) This case is much less likely to occur than the failures seen before 73320e49add. Packfiles are "installed" by writing the .pack file before the .idx and that process can be interrupted. Packfiles _should_ be deleted by deleting the .idx first, followed by the .pack file, but unlink_pack_path() does not do this: it deletes the .pack _first_, allowing a window where this process could be interrupted. We leave the consideration of changing this order as a separate concern. Knowing that this condition is possible from interrupted Git processes and not other tools lends some weight that Git should be more flexible around this scenario. Add a check to see if the .pack file exists before adding it to the list for repacking. This will stop a number of maintenance failures seen in production but fixed by deleting the .idx files. This brings us closer to the case before 73320e49add in that 'git repack' will not fail when there is an orphaned .idx file, at least, not due to the way we scan for packfiles. In the case that the .pack file was erroneously deleted without copies of its objects in other installed packfiles, then 'git repack' will fail due to the reachable object walk. This does resolve the case where automated repacks will no longer be halted on this case. The tests in t7700 show both these successful scenarios and the case of failing if the .pack was truly required. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-06Merge branch 'gc/config-context'Junio C Hamano
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-30Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-29config: add ctx arg to config_fn_tGlen Choo
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12builtin/repack.c: only collect fully-formed packsTaylor Blau
To partition the set of packs based on which ones are "kept" (either they have a .keep file, or were otherwise marked via the `--keep-pack` option) and "non-kept" ones (anything else), `git repack` uses its `collect_pack_filenames()` function. Ordinarily, we would rely on a convenience function such as `get_all_packs()` to enumerate and partition the set of packs. But `collect_pack_filenames()` uses `readdir()` directly to read the contents of the "$GIT_DIR/objects/pack" directory, and adds each entry ending in ".pack" to the appropriate list (either kept, or non-kept as above). This is subtly racy, since `collect_pack_filenames()` may see a pack that is not fully staged (i.e., it is missing its ".idx" file). Ordinarily, this doesn't cause a problem. But it can cause issues when generating a cruft pack. This is because `git repack` feeds (among other things) the list of existing kept packs down to `git pack-objects --cruft` to indicate that any kept packs will not be removed from the repository (so that the cruft pack machinery can avoid packing objects that appear in those packs as cruft). But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`. So if a ".pack" file exists (necessary to get that pack to appear to `collect_pack_filenames()`), but doesn't have a corresponding ".idx" file (necessary to get that pack to appear via `get_all_packs()`), we'll complain with: fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack' Fix the above by teaching `collect_pack_filenames()` to only collect packs with their corresponding `*.idx` files in place, indicating that those packs have been fully staged. There are a couple of things worth noting: - Since each entry in the `extra_keep` list (which contains the `--keep-pack` names) has a `*.pack` suffix, we'll have to swap the suffix from ".pack" to ".idx", and compare that instead. - Since we use the the `fname_kept_list` to figure out which packs to delete (with `git repack -d`), we would have previously deleted a `*.pack` with no index (since the existince of a ".pack" file is necessary and sufficient to include that pack in the list of existing non-kept packs). Now we will leave it alone (since that pack won't appear in the list). This is far more correct behavior, since we don't want to race with a pack being staged. Deleting a partially staged pack is unlikely, however, since the window of time between staging a pack and moving its .idx file into place is miniscule. Note that this window does *not* include the time it takes to receive and index the pack, since the incoming data goes into "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack" and is thus ignored by collect_pack_filenames(). In the future, this function should probably be rewritten as a callback to `for_each_file_in_pack_dir()`, but this is the simplest change we could do in the short-term. Reported-by: Michael Haggerty <mhagger@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-29Merge branch 'tb/enable-cruft-packs-by-default'Junio C Hamano
When "gc" needs to retain unreachable objects, packing them into cruft packs (instead of exploding them into loose object files) has been offered as a more efficient option for some time. Now the use of cruft packs has been made the default and no longer considered an experimental feature. * tb/enable-cruft-packs-by-default: repository.h: drop unused `gc_cruft_packs` builtin/gc.c: make `gc.cruftPacks` enabled by default t/t9300-fast-import.sh: prepare for `gc --cruft` by default t/t6500-gc.sh: add additional test cases t/t6500-gc.sh: refactor cruft pack tests t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default t/t5304-prune.sh: prepare for `gc --cruft` by default builtin/gc.c: ignore cruft packs with `--keep-largest-pack` builtin/repack.c: fix incorrect reference to '-C' pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-25Merge branch 'ps/fix-geom-repack-with-alternates'Junio C Hamano
Geometric repacking ("git repack --geometric=<n>") in a repository that borrows from an alternate object database had various corner case bugs, which have been corrected. * ps/fix-geom-repack-with-alternates: repack: disable writing bitmaps when doing a local repack repack: honor `-l` when calculating pack geometry t/helper: allow chmtime to print verbosely without modifying mtime pack-objects: extend test coverage of `--stdin-packs` with alternates pack-objects: fix error when same packfile is included and excluded pack-objects: fix error when packing same pack twice pack-objects: split out `--stdin-packs` tests into separate file repack: fix generating multi-pack-index with only non-local packs repack: fix trying to use preferred pack in alternates midx: fix segfault with no packs and invalid preferred pack
2023-04-24server-info.h: move declarations for server-info.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-19builtin/repack.c: fix incorrect reference to '-C'Taylor Blau
When cruft packs were originally being developed, `-C` was designated as the short-form for `--cruft` (as in `git repack -C`). This was dropped due to confusion with Git's top-level `-C` option before submitting to the list. But the reference to it in `--cruft-expiration`'s help text was never updated. Fix that dangling reference in this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: disable writing bitmaps when doing a local repackPatrick Steinhardt
In order to write a bitmap, we need to have full coverage of all objects that are about to be packed. In the traditional non-multi-pack-index world this meant we need to do a full repack of all objects into a single packfile. But in the new multi-pack-index world we can get away with writing bitmaps when we have multiple packfiles as long as the multi-pack-index covers all objects. This is not always the case though. When asked to perform a repack of local objects, only, then we cannot guarantee to have full coverage of all objects regardless of whether we do a full repack or a repack with a multi-pack-index. The end result is that writing the bitmap will fail in both worlds: $ git multi-pack-index write --stdin-packs --bitmap <packfiles warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing) error: could not write multi-pack bitmap Now there are two different ways to fix this. The first one would be to amend git-multi-pack-index(1) to disable writing bitmaps when we notice that we don't have full object coverage. - We don't have enough information in git-multi-pack-index(1) in order to tell whether the local repository _should_ have full coverage. Because even when connected to an alternate object directory, it may be the case that we still have all objects around in the main object database. - git-multi-pack-index(1) is quite a low-level tool. Automatically disabling functionality that it was asked to provide does not feel like the right thing to do. We can easily fix it at a higher level in git-repack(1) though. When asked to only include local objects via `-l` and when connected to an alternate object directory then we will override the user's ask and disable writing bitmaps with a warning. This is similar to what we do in git-pack-objects(1), where we also disable writing bitmaps in case we omit an object from the pack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: honor `-l` when calculating pack geometryPatrick Steinhardt
When the user passes `-l` to git-repack(1), then they essentially ask us to only repack objects part of the local object database while ignoring any packfiles part of an alternate object database. And we in fact honor this bit when doing a geometric repack as the resulting packfile will only ever contain local objects. What we're missing though is that we don't take locality of packfiles into account when computing whether the geometric sequence is intact or not. So even though we would only ever roll up local packfiles anyway, we could end up trying to repack because of non-local packfiles. This does not make much sense, and in the worst case it can cause us to try and do the geometric repack over and over again because we're never able to restore the geometric sequence. Fix this bug by honoring whether the user has passed `-l`. If so, we skip adding any non-local packfiles to the pack geometry. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: fix generating multi-pack-index with only non-local packsPatrick Steinhardt
When writing the multi-pack-index with geometric repacking we will add all packfiles to the index that are part of the geometric sequence. This can potentially also include packfiles borrowed from an alternate object directory. But given that a multi-pack-index can only ever include packs that are part of the main object database this does not make much sense whatsoever. In the edge case where all packfiles are contained in the alternate object database and the local repository has none itself this bug can cause us to invoke git-multi-pack-index(1) with only non-local packfiles that it ultimately cannot find. This causes it to return an error and thus causes the geometric repack to fail. Fix the code to skip non-local packfiles. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: fix trying to use preferred pack in alternatesPatrick Steinhardt
When doing a geometric repack with multi-pack-indices, then we ask git-multi-pack-index(1) to use the largest packfile as the preferred pack. It can happen though that the largest packfile is not part of the main object database, but instead part of an alternate object database. The result is that git-multi-pack-index(1) will not be able to find the preferred pack and print a warning. It then falls back to use the first packfile that the multi-pack-index shall reference. Fix this bug by only considering packfiles as preferred pack that are local. This is the right thing to do given that a multi-pack-index should never reference packfiles borrowed from an alternate. While at it, rename the function `get_largest_active_packfile()` to `get_preferred_pack()` to better document its intent. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "promisor-remote.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-18Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24for_each_object: mark unused callback parametersJeff King
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-07repack: fix leaks on error with "goto cleanup"Ævar Arnfjörð Bjarmason
In cmd_repack() when we hit an error, replace "return ret" with "goto cleanup" to ensure we free the necessary data structures. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-14Merge branch 'ab/various-leak-fixes'Junio C Hamano
Various leak fixes. * ab/various-leak-fixes: built-ins: use free() not UNLEAK() if trivial, rm dead code revert: fix parse_options_concat() leak cherry-pick: free "struct replay_opts" members rebase: don't leak on "--abort" connected.c: free the "struct packed_git" sequencer.c: fix "opts->strategy" leak in read_strategy_opts() ls-files: fix a --with-tree memory leak revision API: call graph_clear() in release_revisions() unpack-file: fix ancient leak in create_temp_file() built-ins & libs & helpers: add/move destructors, fix leaks dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" read-cache.c: clear and free "sparse_checkout_patterns" commit: discard partial cache before (re-)reading it {reset,merge}: call discard_index() before returning tests: mark tests as passing with SANITIZE=leak
2022-11-21built-ins & libs & helpers: add/move destructors, fix leaksÆvar Arnfjörð Bjarmason
Fix various leaks in built-ins, libraries and a test helper here we were missing a call to strbuf_release(), string_list_clear() etc, or were calling them after a potential "return". Comments on individual changes: - builtin/checkout.c: Fix a memory leak that was introduced in [1]. A sibling leak introduced in [2] was recently fixed in [3]. As with [3] we should be using the wt_status_state_free_buffers() API introduced in [4]. - builtin/repack.c: Fix a leak that's been here since this use of "strbuf_release()" was added in a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15). We don't use the variable for anything except this loop, so we can instead free it right afterwards. - builtin/rev-parse: Fix a leak that's been here since this code was added in 21d47835386 (Add a parseopt mode to git-rev-parse to bring parse-options to shell scripts., 2007-11-04). - builtin/stash.c: Fix a couple of leaks that have been here since this code was added in d4788af875c (stash: convert create to builtin, 2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we allocated earlier in the function, let's release all of them. - ref-filter.c: Fix a leak in 482c1191869 (gpg-interface: improve interface for parsing tags, 2021-02-11), we don't use the "payload" variable that we ask parse_signature() to populate for us, so let's free it. - t/helper/test-fake-ssh.c: Fix a leak that's been here since this code was added in 3064d5a38c7 (mingw: fix t5601-clone.sh, 2016-01-27). Let's free the "struct strbuf" as soon as we don't need it anymore. 1. c45f0f525de (switch: reject if some operation is in progress, 2019-03-29) 2. 2708ce62d21 (branch: sort detached HEAD based on a flag, 2021-01-07) 3. abcac2e19fa (ref-filter.c: fix a leak in get_head_description, 2022-09-25) 4. 962dd7ebc3e (wt-status: introduce wt_status_state_free_buffers(), 2020-09-27). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-19Merge branch 'tb/repack-expire-to'Taylor Blau
"git repack" learns to send cruft objects out of the way into packfiles outside the repository. * tb/repack-expire-to: builtin/repack.c: implement `--expire-to` for storing pruned objects builtin/repack.c: write cruft packs to arbitrary locations builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack` builtin/repack.c: pass "out" to `prepare_pack_objects`
2022-10-31Merge branch 'jk/repack-tempfile-cleanup'Taylor Blau
The way "git repack" creared temporary files when it received a signal was prone to deadlocking, which has been corrected. * jk/repack-tempfile-cleanup: t7700: annotate cruft-pack failure with ok=sigpipe repack: drop remove_temporary_files() repack: use tempfiles for signal cleanup repack: expand error message for missing pack files repack: populate extension bits incrementally repack: convert "names" util bitfield to array
2022-10-28Merge branch 'tb/remove-unused-pack-bitmap'Junio C Hamano
When creating a multi-pack bitmap, remove per-pack bitmap files unconditionally as they will never be consulted. * tb/remove-unused-pack-bitmap: builtin/repack.c: remove redundant pack-based bitmaps
2022-10-24builtin/repack.c: implement `--expire-to` for storing pruned objectsTaylor Blau
When pruning objects with `--cruft`, `git repack` offers some flexibility when selecting the set of which objects are pruned via the `--cruft-expiration` option. This is useful for expiring objects which are older than the grace period, making races where to-be-pruned objects become reachable and then ancestors of freshly pushed objects, leaving the repository in a corrupt state after pruning substantially less likely [1]. But in practice, such races are impossible to avoid entirely, no matter how long the grace period is. To prevent this race, it is often advisable to temporarily put a repository into a read-only state. But in practice, this is not always practical, and so some middle ground would be nice. This patch introduces a new option, `--expire-to`, which teaches `git repack` to write an additional cruft pack containing just the objects which were pruned from the repository. The caller can specify a directory outside of the current repository as the destination for this second cruft pack. This makes it possible to prune objects from a repository, while still holding onto a supplemental copy of them outside of the original repository. Having this copy on-disk makes it substantially easier to recover objects when the aforementioned race is encountered. `--expire-to` is implemented in a somewhat convoluted manner, which is to take advantage of the fact that the first time `write_cruft_pack()` is called, it adds the name of the cruft pack to the `names` string list. That means the second time we call `write_cruft_pack()`, objects in the previously-written cruft pack will be excluded. As long as the caller ensures that no objects are expired during the second pass, this is sufficient to generate a cruft pack containing all objects which don't appear in any of the new packs written by `git repack`, including the cruft pack. In other words, all of the objects which are about to be pruned from the repository. It is important to note that the destination in `--expire-to` does not necessarily need to be a Git repository (though it can be) Notably, the expired packs do not contain all ancestors of expired objects. So if the source repository contains something like: <unreachable> / C1 --- C2 \ refs/heads/master where C2 is unreachable, but has a parent (C1) which is reachable, and C2 would be pruned, then the expiry pack will contain only C2, not C1. [1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/ Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24builtin/repack.c: write cruft packs to arbitrary locationsTaylor Blau
In the following commit, a new write_cruft_pack() caller will be added which wants to write a cruft pack to an arbitrary location. Prepare for this by adding a parameter which controls the destination of the cruft pack. For now, provide "packtmp" so that this commit does not change any behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`Taylor Blau
`builtin/repack.c`'s `write_cruft_pack()` is used to generate the cruft pack when `--cruft` is supplied. It uses a static variable "cruft_expiration" which is filled in by option parsing. A future patch will add an `--expire-to` option which allows `git repack` to write a cruft pack containing the pruned objects out to a separate repository. In order to implement this functionality, some callers will have to pass a value for `cruft_expiration` different than the one filled out by option parsing. Prepare for this by teaching `write_cruft_pack` to take a "cruft_expiration" parameter, instead of reading a single static variable. The (sole) existing caller of `write_cruft_pack()` will pass the value for "cruft_expiration" filled in by option parsing, retaining existing behavior. This means that we can make the variable local to `cmd_repack()`, and eliminate the static declaration. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24builtin/repack.c: pass "out" to `prepare_pack_objects`Taylor Blau
`builtin/repack.c`'s `prepare_pack_objects()` is used to prepare a set of arguments to a `pack-objects` process which will generate a desired pack. A future patch will add an `--expire-to` option which allows `git repack` to write a cruft pack containing the pruned objects out to a separate repository. Prepare for this by teaching that function to write packs to an arbitrary location specified by the caller. All existing callers of `prepare_pack_objects()` will pass `packtmp` for `out`, retaining the existing behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22repack: drop remove_temporary_files()Jeff King
After we've successfully finished the repack, we call remove_temporary_files(), which looks for and removes any files matching ".tmp-$$-pack-*", where $$ is the pid of the current process. But this is pointless. If we make it this far in the process, we've already renamed these tempfiles into place, and there is nothing left to delete. Nor is there a point in trying to call it to clean up when we _aren't_ successful. It's not safe for using in a signal handler, and the previous commit already handed that job over to the tempfile API. It might seem like it would be useful to clean up stray .tmp files left by other invocations of git-repack. But it won't clean those files; it only matches ones with its pid, and leaves the rest. Fortunately, those are cleaned up naturally by successive calls to git-repack; we'll consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc, will roll up their contents and eventually delete them. The one case that could matter is if pack-objects generates an extension we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current code will quietly delete such a file, while after this patch we'd leave it in place. In practice this doesn't happen, and would be indicative of a bug. Leaving the file as cruft is arguably a better behavior, as it means somebody is more likely to eventually notice and fix the bug. If we really wanted to be paranoid, we could scan for and warn about such files, but that seems like overkill. There's nothing to test with regard to the removal of this function. It was doing nothing, so the behavior should be the same. However, we can verify (and protect) our assumption that "repack -ad" will eventually remove stray files by adding a test for that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22repack: use tempfiles for signal cleanupJeff King
When git-repack exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of the current process). The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler. It uses opendir(), which isn't on the POSIX async-signal-safe list. The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves. We can fix this by just cleaning up the files directly, without walking the directory. We already know the complete list of .tmp-* files that were generated, because we recorded them via populate_pack_exts(). When we find files there, we can use register_tempfile() to record the filenames. If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested. Note that this is slightly racier than the existing scheme. We don't record the filenames until pack-objects tells us the hash over stdout. So during the period between it generating the file and reporting the hash, we'd fail to clean up. However, that period is very small. During most of the pack generation process pack-objects is using its own internal tempfiles. It's only at the very end that it moves them into the names git-repack expects, and then it immediately reports the name to us. Given that cleanup like this is best effort (after all, we may get SIGKILL), this level of race is acceptable. When we register the tempfiles, we'll record them locally and use the result to call rename_tempfile(), rather than renaming by hand. This isn't strictly necessary, as once we've renamed the files they're gone, and the tempfile API's cleanup unlink() would simply become a pointless noop. But managing the lifetimes of the tempfile objects is the cleanest thing to do, and the tempfile pointers naturally fill the same role as the old booleans. This patch also fixes another small problem. We only hook signals, and don't set up an atexit handler. So if we see an error that causes us to die(), we'll leave the .tmp-* files in place. But since the tempfile API handles this for us, this is now fixed for free. The new test covers this by stimulating a failure of pack-objects when generating a cruft pack. Before this patch, the .tmp-* file for the main pack would have been left, but now we correctly clean it up. Two small subtleties on the implementation: - in the renaming loop, we can stop re-constructing fname_old; we only use it when we have a tempfile to rename, so we can just ask the tempfile for its path (which, barring bugs, should be identical) - when renaming fails, our error message mentions fname_old. But since a failed rename_tempfile() invalidates the tempfile struct, we'll lose access to that string. Instead, let's mention the destination filename, which is what most other callers do. Reported-by: Jan Pokorný <poki@fnusa.cz> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22repack: expand error message for missing pack filesJeff King
If pack-objects tells us it generated pack $hash, we expect to find .tmp-$$-pack-$hash.pack, .idx, .rev, and so on. Some of these files are optional, but others are not. For the required ones, we'll bail with an error if any of them is missing. The error message is just "missing required file", which is a bit vague. We should be more clear that it is not the user's fault, but rather that the sub-pgoram we called is not operating as expected. In practice, nobody should ever see this message, as it would generally only be caused by a bug in Git. It probably doesn't make sense to convert this to a BUG(), though, as there are other (unlikely) possibilities, such as somebody else racily deleting the files, filesystem errors causing stat() to fail, and so on. A nice side effect here is that we stop relying on fname_old in this code path, which will let us deal with it only in the first part of the conditional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22repack: populate extension bits incrementallyJeff King
After generating the main pack and then any additional cruft packs, we iterate over the "names" list (which contains hashes of packs generated by pack-objects), and call populate_pack_exts() for each. There's one small problem with this. In repack_promisor_objects(), we may add entries to "names" and call populate_pack_exts() for them. Calling it again is mostly just wasteful, as we'll stat() the filename with each possible extension, get the same result, and just overwrite our bits. So we could drop the call there, and leave the final loop to populate all of the bits. But instead, this patch does the reverse: drops the final loop, and teaches the other two sites to populate the bits as they add entries. This makes the code easier to reason about, as you never have to worry about when the util field is valid; it is always valid for each entry. It also serves my ulterior purpose: recording the generated filenames as soon as possible will make it easier for a future patch to use them for cleaning up from a failed operation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22repack: convert "names" util bitfield to arrayJeff King
We keep a string_list "names" containing the hashes of packs generated on our behalf by pack-objects. The util field of each item is treated as a bitfield that tells us which extensions (.pack, .idx, .rev, etc) are present for each name. Let's switch this to allocating a real array. That will give us room in a future patch to store more data than just a single bit per extension. And it makes the code a little easier to read, as we avoid casting back and forth between uintptr_t and a void pointer. Since the only thing we're storing is an array, we could just allocate it directly. But instead I've put it into a named struct here. That further increases readability around the casts, and in particular helps differentiate us from other string_lists in the same file which use their util field differently. E.g., the existing_*_packs lists still do bit-twiddling, but their bits have different meaning than the ones in "names". This makes it hard to grep around the code to see how the util fields are used; now you can look for "generated_pack_data". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-18repack: don't remove .keep packs with `--pack-kept-objects`Taylor Blau
`git repack` supports a `--pack-kept-objects` flag which more or less translates to whether or not we pass `--honor-pack-keep` down to `git pack-objects` when assembling a new pack. This behavior has existed since ee34a2bead (repack: add `repack.packKeptObjects` config var, 2014-03-03). In that commit, the documentation was extended to say: [...] Note that we still do not delete `.keep` packs after `pack-objects` finishes. Unfortunately, this is not the case when `--pack-kept-objects` is combined with a `--geometric` repack. When doing a geometric repack, we include `.keep` packs when enumerating available packs only when `pack_kept_objects` is set. So this all works fine when `--no-pack-kept-objects` (or similar) is given. Kept packs are excluded from the geometric roll-up, so when we go to delete redundant packs (with `-d`), no `.keep` packs appear "below the split" in our geometric progression. But when `--pack-kept-objects` is given, things can go awry. Namely, when a kept pack is included in the list of packs tracked by the `pack_geometry` struct *and* part of the pack roll-up, we will delete the `.keep` pack when we shouldn't. Note that this *doesn't* result in object corruption, since the `.keep` pack's objects are still present in the new pack. But the `.keep` pack itself is removed, which violates our promise from back in ee34a2bead. But there's more. Because `repack` computes the geometric roll-up independently from selecting which packs belong in a MIDX (with `--write-midx`), this can lead to odd behavior. Consider when a `.keep` pack appears below the geometric split (ie., its objects will be part of the new pack we generate). We'll write a MIDX containing the new pack along with the existing `.keep` pack. But because the `.keep` pack appears below the geometric split line, we'll (incorrectly) try to remove it. While this doesn't corrupt the repository, it does cause us to remove the MIDX we just wrote, since removing that pack would invalidate the new MIDX. Funny enough, this behavior became far less noticeable after e4d0c11c04 (repack: respect kept objects with '--write-midx -b', 2021-12-20), which made `pack_kept_objects` be enabled by default only when we were writing a non-MIDX bitmap. But e4d0c11c04 didn't resolve this bug, it just made it harder to notice unless callers explicitly passed `--pack-kept-objects`. The solution is to avoid trying to remove `.keep` packs during `--geometric` repacks, even when they appear below the geometric split line, which is the approach this patch implements. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-18builtin/repack.c: remove redundant pack-based bitmapsTaylor Blau
When we write a MIDX bitmap after repacking, it is possible that the repository would be left in a state with both pack- and multi-pack reachability bitmaps. This can occur, for instance, if a pack that was kept (either by having a .keep file, or during a geometric repack in which it is not rolled up) has a bitmap file, and the repack wrote a multi-pack index and bitmap. When loading a reachability bitmap for the repository, the multi-pack one is always preferred, so the pack-based one is redundant. Let's remove it unconditionally, even if '-d' isn't passed, since there is no practical reason to keep both around. The patch below does just that. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06cocci: generalize "unused" rule to cover more than "strbuf"Ævar Arnfjörð Bjarmason
Generalize the newly added "unused.cocci" rule to find more than just "struct strbuf", let's have it find the same unused patterns for "struct string_list", as well as other code that uses similar-looking *_{release,clear,free}() and {release,clear,free}_*() functions. We're intentionally loose in accepting e.g. a "strbuf_init(&sb)" followed by a "string_list_clear(&sb, 0)". It's assumed that the compiler will catch any such invalid code, i.e. that our constructors/destructors don't take a "void *". See [1] for example of code that would be covered by the "get_worktrees()" part of this rule. We'd still need work that the series is based on (we were passing "worktrees" to a function), but could now do the change in [1] automatically. 1. https://lore.kernel.org/git/Yq6eJFUPPTv%2Fzc0o@coredump.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>