git.kernel.org/pub/scm/git/git.git
Age | Commit message | Author
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano
Code clean-up around the use of the_repository.

* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-06Merge branch 'ps/ahead-behind-truncation-fix'Junio C Hamano
Fix unnecessary truncation of generation numbers used in-core.

* ps/ahead-behind-truncation-fix:
  commit-graph: fix truncated generation numbers
2023-04-06Merge branch 'ds/ahead-behind'Junio C Hamano
"git for-each-ref" learns '%(ahead-behind:<base>)', which computes, in bulk, the distances between a single reference point in the history and a bunch of commits.

* ds/ahead-behind:
  commit-reach: add tips_reachable_from_bases()
  for-each-ref: add ahead-behind format atom
  commit-reach: implement ahead_behind() logic
  commit-graph: introduce `ensure_generations_valid()`
  commit-graph: return generation from memory
  commit-graph: simplify compute_generation_numbers()
  commit-graph: refactor compute_topological_levels()
  for-each-ref: explicitly test no matches
  for-each-ref: add --stdin option
2023-03-28libs: use "struct repository *" argument, not "the_repository"Ævar Arnfjörð Bjarmason
As can easily be seen from grepping in our sources, we had these uses of "the_repository" in various library code in cases where the function in question was already getting a "struct repository *" argument. Let's use that argument instead.

Out of these changes only the changes to "cache-tree.c", "commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly applied before the migration away from the "repo_*()" wrapper macros in the preceding commits.

The rest aren't new, as we'd previously implicitly refer to "the_repository", but it's now more obvious that we were doing the wrong thing all along, and should have used the parameter instead.

The change from "get_index_format_default(the_repository)" to the "r" variable in "read-cache.c" should arguably have been part of [1], or of the subsequent cleanup in [2]. Let's do it here: as can be seen from the initial code in [3], it's not important that we use "the_repository" there, and we would prefer to always use the current repository.

This change excludes the "the_repository" use in "upload-pack.c"'s upload_pack_advertise(), as the in-flight [4] makes that change.

1. ee1f0c242ef (read-cache: add index.skipHash config option, 2023-01-06)
2. 6269f8eaad0 (treewide: always have a valid "index_state.repo" member, 2023-01-17)
3. 7211b9e7534 (repo-settings: consolidate some config settings, 2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "packfile.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "packfile.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "commit.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27commit-graph: fix truncated generation numbersPatrick Steinhardt
In 80c928d947 (commit-graph: simplify compute_generation_numbers(), 2023-03-20), the code to compute generation numbers was simplified to use the same infrastructure as is used to compute topological levels. This refactoring introduced a bug where the generation numbers are truncated when they exceed UINT32_MAX because we explicitly cast the computed generation number to `uint32_t`. This is not required though: both the computed value and the field of `struct commit_graph_data` are of the same type `timestamp_t` already, so casting to `uint32_t` will cause truncation.

This cast can cause us to miscompute generation data overflows:

1. Given a commit with no parents and committer date `UINT32_MAX + 1`.
2. We compute its generation number as `UINT32_MAX + 1`, but truncate it to `1`.
3. We calculate the generation offset via `$generation - $date`, which is thus `1 - (UINT32_MAX + 1)`. The computation underflows and we thus end up with an offset that is bigger than the maximum allowed offset.

As a result, we'd be writing generation data overflow information into the commit-graph that is bogus and ultimately not even required.

Fix this bug by removing the needless cast.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
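The truncation and the resulting wrap-around can be reproduced in isolation. The following is a minimal sketch, not the code from commit-graph.c: `timestamp_t` is modelled here as `uint64_t`, `SKETCH_OFFSET_MAX` is an illustrative limit picked for the example, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: timestamp_t modelled as a 64-bit unsigned integer. */
typedef uint64_t timestamp_t;

/* Hypothetical limit for a generation offset; illustrative only. */
#define SKETCH_OFFSET_MAX ((UINT64_C(1) << 31) - 1)

/* Buggy variant: the cast keeps only the low 32 bits. */
static timestamp_t generation_truncated(timestamp_t computed)
{
	return (uint32_t)computed;
}

/* Fixed variant: the value already has the right type, so no cast. */
static timestamp_t generation_full(timestamp_t computed)
{
	return computed;
}

/*
 * Offset as in the message: generation minus committer date.
 * Unsigned subtraction wraps around instead of going negative.
 */
static timestamp_t generation_offset(timestamp_t generation, timestamp_t date)
{
	return generation - date;
}

/* Nonzero when the offset exceeds the (illustrative) limit. */
static int offset_overflows(timestamp_t offset)
{
	return offset > SKETCH_OFFSET_MAX;
}
```

With a committer date just above the 32-bit range, the truncated generation is smaller than the date, so the subtraction wraps to a huge value and is flagged as an overflow. The full-width variant yields an offset of zero for the same input.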
2023-03-20commit-graph: introduce `ensure_generations_valid()`Taylor Blau
Use the just-introduced compute_reachable_generation_numbers_1() to implement a function which dynamically computes topological levels (or corrected commit dates) for out-of-graph commits. This will be useful for the ahead-behind algorithm we are about to introduce, which needs accurate topological levels on _all_ commits reachable from the tips in order to avoid over-counting. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20commit-graph: return generation from memoryDerrick Stolee
The commit_graph_generation() method used to report a value of GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance for the given commit but the graph_pos indicated the commit was not in the commit-graph file.

However, an upcoming change will introduce the ability to set generation values in-memory without writing the commit-graph file. Thus, we can no longer trust 'graph_pos' to indicate whether or not the generation member can be trusted.

Instead, trust the 'generation' member if the commit has a value in the slab _and_ the 'generation' member is non-zero. Otherwise, treat it as GENERATION_NUMBER_INFINITY.

This only makes a difference for a very old case for the commit-graph: the very first Git release to write commit-graph files wrote zeroes in the topological level positions. If we are parsing a commit-graph with all zeroes, those commits will now appear to have GENERATION_NUMBER_INFINITY (as if they were not parsed from the commit-graph).

I attempted several variations to work around the need for providing an uninitialized 'generation' member, but this was the best one I found. It does require a change to a verification test in t5318 because it reports a different error than the one about non-zero generation numbers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
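The rule described above (trust a non-zero in-memory value, otherwise report infinity) can be sketched as follows. The struct layout, the function name and the value of `SKETCH_GENERATION_INFINITY` are assumptions made for illustration and are not taken from Git's sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for GENERATION_NUMBER_INFINITY; the value is illustrative. */
#define SKETCH_GENERATION_INFINITY UINT64_MAX

/* Hypothetical, trimmed-down slab entry. */
struct sketch_graph_data {
	uint32_t graph_pos;
	uint64_t generation;
};

/*
 * data is NULL when the slab has no entry for the commit. Note that
 * graph_pos is deliberately not consulted: only the presence of an
 * entry and a non-zero generation decide the result.
 */
static uint64_t sketch_commit_generation(const struct sketch_graph_data *data)
{
	if (data && data->generation)
		return data->generation;
	return SKETCH_GENERATION_INFINITY;
}
```

A slab entry whose generation is zero, as written by the earliest commit-graph files, is therefore treated the same as a missing entry.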
2023-03-20commit-graph: simplify compute_generation_numbers()Derrick Stolee
The previous change introduced the generic algorithm compute_reachable_generation_numbers() and used it as the core functionality of compute_topological_levels(). Now, use it as the core functionality of compute_generation_numbers().

The main difference here is that we use generation version 2, which is used to toggle the logic in compute_generation_from_max() for computing the corrected commit date based on the corrected commit dates of the parent commits (and the commit date of the current commit). It also uses different methods for (get|set)_generation in the vtable in order to store and access the value in the correct places.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20commit-graph: refactor compute_topological_levels()Derrick Stolee
This patch extracts the common code used to compute topological levels and corrected committer dates into a common routine, compute_reachable_generation_numbers(). For ease of reading, it only modifies compute_topological_levels() to use this new routine, leaving compute_generation_numbers() to be modified in the next change.

This new routine dispatches to call the necessary functions to get and set the generation number for a given commit through a vtable (the compute_generation_info struct).

Computing the generation number itself is done in compute_generation_from_max(), which dispatches its implementation based on the generation version requested, and issues a BUG() for unrecognized generation versions. This does not use a vtable because the logic depends only on the generation number version, not where the data is being loaded from or being stored to. This is a subtle point that will make more sense in a future change that modifies the in-memory generation values instead of just preparing values for writing to a commit-graph file.

This change looks like it adds a lot of new code. However, two upcoming changes will be quite small due to the work being done in this change.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
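The split between a vtable for storage and a version switch for the arithmetic can be sketched as below. This is a simplified illustration under stated assumptions: every name is hypothetical, each commit has at most one parent, and commits are processed from an array already ordered parents-first, whereas the real routine walks the history itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical commit with a single parent and two storage slots. */
struct sketch_commit {
	uint64_t date;
	struct sketch_commit *parent;
	uint64_t level;          /* topological level (version 1) */
	uint64_t corrected_date; /* corrected commit date (version 2) */
};

/* The vtable: where to load and store the value. */
struct sketch_generation_info {
	int version;
	uint64_t (*get_generation)(const struct sketch_commit *c);
	void (*set_generation)(struct sketch_commit *c, uint64_t generation);
};

static uint64_t get_level(const struct sketch_commit *c) { return c->level; }
static void set_level(struct sketch_commit *c, uint64_t g) { c->level = g; }
static uint64_t get_corrected(const struct sketch_commit *c) { return c->corrected_date; }
static void set_corrected(struct sketch_commit *c, uint64_t g) { c->corrected_date = g; }

/* The arithmetic depends only on the version, so a plain switch suffices. */
static uint64_t sketch_generation_from_max(const struct sketch_commit *c,
					   uint64_t max_gen, int version)
{
	switch (version) {
	case 1: /* one more than the highest parent level */
		return max_gen + 1;
	case 2: /* the later of the commit date and parent value + 1 */
		return c->date > max_gen ? c->date : max_gen + 1;
	default: /* stand-in for BUG() */
		abort();
	}
}

/* commits[] must list every parent before its children. */
static void sketch_compute_generations(struct sketch_commit **commits, size_t nr,
				       const struct sketch_generation_info *info)
{
	for (size_t i = 0; i < nr; i++) {
		struct sketch_commit *c = commits[i];
		uint64_t max_gen = c->parent ? info->get_generation(c->parent) : 0;

		info->set_generation(c, sketch_generation_from_max(c, max_gen, info->version));
	}
}
```

Running the same loop with two different `sketch_generation_info` values fills in levels and corrected dates without duplicating the traversal, which is the point of the refactoring described above.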
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-09use DUP_ARRAYRené Scharfe
Add a semantic patch that replaces ALLOC_ARRAY+COPY_ARRAY with DUP_ARRAY to reduce code duplication, and apply its results.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22Merge branch 'ml/commit-graph-expire-dir-leak-fix'Junio C Hamano
A result from opendir() was leaking in the commit-graph expiration codepath, which has been plugged.

* ml/commit-graph-expire-dir-leak-fix:
  commit-graph: Fix missing closedir in expire_commit_graphs
2022-09-19commit-graph: Fix missing closedir in expire_commit_graphsMiaoqian Lin
The function calls opendir() but is missing the corresponding closedir() before it returns. Add the missing closedir() to fix it.

Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-14Merge branch 'ab/unused-annotation'Junio C Hamano
Undoes 'jk/unused-annotation' topic and redoes it to work around Coccinelle rules misfiring false positives in unrelated codepaths.

* ab/unused-annotation:
  git-compat-util.h: use "deprecated" for UNUSED variables
  git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14Merge branch 'jk/unused-annotation'Junio C Hamano
Annotate function parameters that are not used (but cannot be removed for structural reasons), to prepare us to later compile with -Wunused warning turned on.

* jk/unused-annotation:
  is_path_owned_by_current_uid(): mark "report" parameter as unused
  run-command: mark unused async callback parameters
  mark unused read_tree_recursive() callback parameters
  hashmap: mark unused callback parameters
  config: mark unused callback parameters
  streaming: mark unused virtual method parameters
  transport: mark bundle transport_options as unused
  refs: mark unused virtual method parameters
  refs: mark unused reflog callback parameters
  refs: mark unused each_ref_fn parameters
  git-compat-util: add UNUSED macro
2022-09-13Merge branch 'jk/rev-list-verify-objects-fix'Junio C Hamano
"git rev-list --verify-objects" ought to inspect the contents of objects and notice corrupted ones, but it didn't when the commit graph is in use, which has been corrected.

* jk/rev-list-verify-objects-fix:
  rev-list: disable commit graph with --verify-objects
  lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
2022-09-07lookup_commit_in_graph(): use prepare_commit_graph() to check for graphJeff King
We exit early from lookup_commit_in_graph() if the commit_graph pointer is NULL, under the assumption that we don't have a graph to look at. But the graph pointer is lazy-loaded; if no other code happens to have called prepare_commit_graph(), we'll incorrectly assume that one isn't available at all.

This has a pretty small performance impact in practice, because the fallback will generally be to call parse_object() instead. That ends up in parse_commit_buffer(), which loads the graph data itself. So the first commit we see won't use the graph, but subsequent ones will.

Since using the graph is just an optimization there's generally no user-visible difference, but if you instrument rev-list like so:

  diff --git a/revision.c b/revision.c
  index ee702e498a..63c488ffb6 100644
  --- a/revision.c
  +++ b/revision.c
  @@ -381,6 +381,9 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
   	 * parsing commit data from disk.
   	 */
   	commit = lookup_commit_in_graph(revs->repo, oid);
  +	warning("%s %s in commit graph",
  +		commit ? "found" : "did not find",
  +		name);
   	if (commit)
   		object = &commit->object;
   	else

and run (in git.git):

  git commit-graph write --reachable
  git rev-list origin/master origin/next >/dev/null

you'll see that we fail to find the first one:

  warning: did not find origin/master in commit graph
  warning: found origin/next in commit graph

After this patch, you'll see that we find both:

  warning: found origin/master in commit graph
  warning: found origin/next in commit graph

Even though the performance implication is small here, there are two important reasons to do this:

  - it's downright confusing if you are hunting a bug triggered by the use of the commit graph. It may or may not trigger depending on the number and ordering of tips you ask for.

  - prepare_commit_graph() has other policy logic, too. In particular, if we've loaded a commit graph and then disabled the graph via disable_commit_graph(), that should take precedence.

    I'm not sure if this can trigger bad behavior in practice. The only caller there is upload-pack's deepen_by_rev_list(), which should be avoiding the commit graph for its traversal tips, but probably wasn't before this patch. Whether you could come up with a case where that mattered is unclear. Still, this is obviously the right thing to be doing.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
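The difference between checking a lazily loaded pointer directly and going through a "prepare" step can be sketched as below. This is a toy model under stated assumptions: the struct, its fields and the function names are hypothetical, and the "graph" is just a string standing in for loaded data.

```c
#include <stddef.h>

/* Hypothetical object store with a lazily loaded graph. */
struct sketch_store {
	int attempted;     /* has a load been tried yet? */
	int disabled;      /* policy switch: graph must not be used */
	int graph_on_disk; /* whether there is a graph file to load */
	const char *graph; /* NULL until lazily loaded */
};

/* Loads the graph on first use and applies the policy check. */
static int sketch_prepare_graph(struct sketch_store *s)
{
	if (s->disabled)
		return 0;
	if (!s->attempted) {
		s->attempted = 1;
		if (s->graph_on_disk)
			s->graph = "graph data";
	}
	return s->graph != NULL;
}

/* Old behavior: only looks at the pointer, so it misses an unloaded graph. */
static int sketch_can_lookup_unprepared(const struct sketch_store *s)
{
	return s->graph != NULL;
}

/* New behavior: triggers the lazy load and honors the disable switch. */
static int sketch_can_lookup_prepared(struct sketch_store *s)
{
	return sketch_prepare_graph(s);
}
```

In this model the pointer check fails on the very first lookup even though a graph exists, and keeps succeeding after the graph has been disabled. The prepared variant gets both cases right, matching the two reasons given in the message.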
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs.

Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is actually used" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19); a subsequent commit will further tweak the macro to implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-11Merge branch 'tb/commit-graph-genv2-upgrade-fix' into maintJunio C Hamano
There was a bug in the codepath to upgrade generation information in commit-graph from v1 to v2 format, which has been corrected.
source: <cover.1657667404.git.me@ttaylorr.com>

* tb/commit-graph-genv2-upgrade-fix:
  commit-graph: fix corrupt upgrade from generation v1 to v2
  commit-graph: introduce `repo_find_commit_pos_in_graph()`
  t5318: demonstrate commit-graph generation v2 corruption
2022-08-11Merge branch 'hx/lookup-commit-in-graph-fix' into maintJunio C Hamano
A corner case bug where lazily fetching objects from a promisor remote resulted in infinite recursion has been corrected.
source: <cover.1656593279.git.hanxin.hx@bytedance.com>

* hx/lookup-commit-in-graph-fix:
  t5330: remove run_with_limited_processses()
  commit-graph.c: no lazy fetch in lookup_commit_in_graph()
2022-08-03Merge branch 'tb/commit-graph-genv2-upgrade-fix'Junio C Hamano
There was a bug in the codepath to upgrade generation information in commit-graph from v1 to v2 format, which has been corrected.

* tb/commit-graph-genv2-upgrade-fix:
  commit-graph: fix corrupt upgrade from generation v1 to v2
  commit-graph: introduce `repo_find_commit_pos_in_graph()`
  t5318: demonstrate commit-graph generation v2 corruption
2022-07-27Merge branch 'js/commit-graph-parsing-without-repo-settings'Junio C Hamano
API tweak to make it easier to run fuzz testing on commit-graph parser.

* js/commit-graph-parsing-without-repo-settings:
  commit-graph: pass repo_settings instead of repository
2022-07-20Merge branch 'hx/lookup-commit-in-graph-fix'Junio C Hamano
A corner case bug where lazily fetching objects from a promisor remote resulted in infinite recursion has been corrected.

* hx/lookup-commit-in-graph-fix:
  t5330: remove run_with_limited_processses()
  commit-graph.c: no lazy fetch in lookup_commit_in_graph()
2022-07-16commit-graph: introduce `repo_find_commit_pos_in_graph()`Taylor Blau
Low-level callers in systems that are adjacent to the commit-graph (like the changed-path Bloom filter code) could benefit from being able to call a function like `parse_commit_in_graph()` without modifying the corresponding commit slab data. This is useful in contexts where that slab data is being used to prepare for an upcoming commit-graph write, where Git must be careful to avoid clobbering any of that data during a read operation. Introduce a low-level variant of `parse_commit_in_graph()` which returns the graph position of a given commit only, without modifying any of the slab data. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15commit-graph: pass repo_settings instead of repositoryTaylor Blau
The parse_commit_graph() function takes a 'struct repository *' pointer, but it only ever accesses config settings (either directly or through the .settings field of the repo struct). Move all relevant config settings into the repo_settings struct, and update parse_commit_graph() and its existing callers so that it takes 'struct repo_settings *' instead.

Callers of parse_commit_graph() will now need to call prepare_repo_settings() themselves, or initialize a 'struct repo_settings' directly.

Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more places, 2020-09-09), parsing a commit-graph was a pure function depending only on the contents of the commit-graph itself. Commit ab14d0676c introduced a dependency on a `struct repository` pointer, and later commits such as b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config settings, which were accessed through the `settings` field of the repository pointer. This field was initialized via a call to `prepare_repo_settings()`.

Additionally, this fixes an issue in fuzz-commit-graph: In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git repos), prepare_repo_settings was changed to issue a BUG() if it is called by a process whose CWD is not a Git repository.

The combination of commits mentioned above broke fuzz-commit-graph, which attempts to parse arbitrary fuzzing-engine-provided bytes as a commit graph file. Prior to this change, parse_commit_graph() called prepare_repo_settings(), but since we run the fuzz tests without a valid repository, we are hitting the BUG() from 44c7e62 for every test case.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-01commit-graph.c: no lazy fetch in lookup_commit_in_graph()Han Xin
The commit-graph is used to opportunistically optimize accesses to certain pieces of information on commit objects, and lookup_commit_in_graph() tries to say "no" when the requested commit does not locally exist by returning NULL, in which case the caller can ask for (which may result in on-demand fetching from a promisor remote) and parse the commit object itself.

However, it uses a wrong helper, repo_has_object_file(), to do so. This helper not only checks if an object is immediately available in the local object store, but also tries to fetch from a promisor remote. But the fetch machinery calls lookup_commit_in_graph(), thus causing an infinite loop.

We should make lookup_commit_in_graph() expect that a commit given to it can be legitimately missing from the local object store, by using the has_object_file() helper instead.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-04Merge branch 'tb/cruft-packs'Junio C Hamano
A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-05-27chunk-format.h: extract oid_version()Taylor Blau
There are three definitions of an identical function which converts `the_hash_algo` into either 1 (for SHA-1) or 2 (for SHA-256). There is a copy of this function for writing both the commit-graph and multi-pack-index file, and another inline definition used to write the .rev header. Consolidate these into a single definition in chunk-format.h. It's not clear that this is the best header to define this function in, but it should do for now. (Worth noting, the .rev caller expects a 4-byte unsigned, but the other two callers work with a single unsigned byte. The consolidated version uses the latter type, and lets the compiler widen it when required). Another caller will be added in a subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
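A consolidated helper of the kind described above might look like the following sketch. The enum and function names are hypothetical stand-ins (the real function takes Git's hash algorithm descriptor), and `abort()` stands in for Git's error handling.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Git's hash algorithm identifiers. */
enum sketch_hash_algo {
	SKETCH_HASH_SHA1,
	SKETCH_HASH_SHA256
};

/* Returns the on-disk version byte: 1 for SHA-1, 2 for SHA-256. */
static unsigned char sketch_oid_version(enum sketch_hash_algo algo)
{
	switch (algo) {
	case SKETCH_HASH_SHA1:
		return 1;
	case SKETCH_HASH_SHA256:
		return 2;
	}
	abort(); /* unknown algorithm; stand-in for a fatal error */
}

/* A caller that needs a 4-byte field relies on implicit widening. */
static uint32_t sketch_rev_header_version(enum sketch_hash_algo algo)
{
	return sketch_oid_version(algo);
}
```

Returning the narrow type keeps the single-byte callers free of casts, while the 4-byte caller gets the same value through ordinary integer promotion.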
2022-05-24Merge branch 'ab/valgrind-fixes'Junio C Hamano
A bit of test framework fixes with a few fixes to issues found by valgrind.

* ab/valgrind-fixes:
  commit-graph.c: don't assume that stat() succeeds
  object-file: fix a unpack_loose_header() regression in 3b6a8db3b03
  log test: skip a failing mkstemp() test under valgrind
  tests: using custom GIT_EXEC_PATH breaks --valgrind tests
2022-05-21Merge branch 'ep/maint-equals-null-cocci'Junio C Hamano
Introduce and apply a coccinelle rule to discourage an explicit comparison between a pointer and NULL, and apply the clean-up to the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-13commit-graph.c: don't assume that stat() succeedsÆvar Arnfjörð Bjarmason
Fix code added in 8d84097f965 (commit-graph: expire commit-graph files, 2019-06-18) to check the return value of the stat() system call. Not doing so caused us to use uninitialized memory in the "Bloom generation is limited by --max-new-filters" test in t4216-log-bloom.sh:

  + rm -f trace.event
  + pwd
  + GIT_TRACE2_EVENT=[...]/t/trash directory.t4216-log-bloom/limits/trace.event git commit-graph write --reachable --split=replace --changed-paths --max-new-filters=2
  ==24835== Syscall param utimensat(times[0].tv_sec) points to uninitialised byte(s)
  ==24835==    at 0x499E65A: __utimensat64_helper (utimensat.c:34)
  ==24835==    by 0x4999142: utime (utime.c:36)
  ==24835==    by 0x552BE0: mark_commit_graphs (commit-graph.c:2213)
  ==24835==    by 0x550822: write_commit_graph (commit-graph.c:2424)
  ==24835==    by 0x54E3A0: write_commit_graph_reachable (commit-graph.c:1681)
  ==24835==    by 0x4374BB: graph_write (commit-graph.c:269)
  ==24835==    by 0x436F7D: cmd_commit_graph (commit-graph.c:326)
  ==24835==    by 0x407B9A: run_builtin (git.c:465)
  ==24835==    by 0x406651: handle_builtin (git.c:719)
  ==24835==    by 0x407575: run_argv (git.c:786)
  ==24835==    by 0x406410: cmd_main (git.c:917)
  ==24835==    by 0x511F09: main (common-main.c:56)
  ==24835==  Address 0x1ffeffde70 is on thread 1's stack
  ==24835==  in frame #1, created by utime (utime.c:25)
  ==24835==  Uninitialised value was created by a stack allocation
  ==24835==    at 0x552B50: mark_commit_graphs (commit-graph.c:2201)
  ==24835==
  [...]
  error: last command exited with $?=126
  not ok 137 - Bloom generation is limited by --max-new-filters

This would happen as we stat'd the non-existing ".git/objects/info/commit-graph" file. Let's fix mark_commit_graphs() to check the stat()'s return value, and while we're at it fix another case added in the same commit to do the same.

The caller in expire_commit_graphs() would have been less likely to run into this, as it's operating on files it just got from readdir(), but it could still happen due to a race with e.g. a concurrent "rm -rf" of the commit-graph files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
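The general pattern of checking stat() before reading the result can be sketched as below. The helper name is hypothetical and this is not the code from mark_commit_graphs(); it only shows that the `struct stat` contents are used solely on the success path.

```c
#include <sys/stat.h>
#include <time.h>

/*
 * Stores the modification time of path in *mtime and returns 0, or
 * returns -1 and leaves *mtime untouched when stat() fails.
 */
static int sketch_file_mtime(const char *path, time_t *mtime)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1; /* st holds indeterminate data here; do not read it */
	*mtime = st.st_mtime;
	return 0;
}
```

A caller can then skip any follow-up call that depends on the timestamp (such as updating file times) when the helper reports failure, instead of passing along stack garbage.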
2022-05-02Merge branch 'ep/maint-equals-null-cocci' for maint-2.35Junio C Hamano
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-20commit-graph: close file before returning NULLKleber Tarcísio
There are two reasons that we could return NULL early within load_commit_graph_chain():

 1. The file does not exist, so the file pointer is NULL.
 2. The file exists, but is too small to contain a single hash.

These were grouped together when the function was first written in 5c84b3396 (commit-graph: load commit-graph chains, 2019-06-18) in order to simplify how the 'chain_name' string is freed. However, the current code leaves a narrow window where the file pointer is not closed when the file exists, but is rejected for being too small.

Split out these cases separately to ensure we close the file in this case.

Signed-off-by: Kleber Tarcísio <klebertarcisio@yahoo.com.br>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
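Handling the two early-return cases separately can be sketched as follows. The function name and `SKETCH_MIN_SIZE` are hypothetical, and the size check uses standard C stream functions rather than whatever the real function uses.

```c
#include <stdio.h>

/* Hypothetical minimum: one 40-character hex hash plus a newline. */
#define SKETCH_MIN_SIZE 41

/* Returns an open stream positioned at the start, or NULL. */
static FILE *sketch_open_chain(const char *path)
{
	FILE *fp = fopen(path, "r");
	long size;

	/* Case 1: the file does not exist, so there is nothing to close. */
	if (!fp)
		return NULL;

	/* Case 2: the file exists but is too small; close it first. */
	if (fseek(fp, 0, SEEK_END) < 0 || (size = ftell(fp)) < SKETCH_MIN_SIZE) {
		fclose(fp);
		return NULL;
	}

	rewind(fp);
	return fp;
}
```

Both failure paths return NULL to the caller, but only the second one owns a stream that has to be released before returning.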
2022-03-26Merge branch 'ns/core-fsyncmethod'Junio C Hamano
Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod.

* ns/core-fsyncmethod:
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-17Merge branch 'ab/string-list-count-in-size-t'Junio C Hamano
Count string_list items in size_t, not "unsigned int".

* ab/string-list-count-in-size-t:
  string-list API: change "nr" and "alloc" to "size_t"
  gettext API users: don't explicitly cast ngettext()'s "n"
2022-03-17Merge branch 'ds/commit-graph-gen-v2-fixes'Junio C Hamano
Fixes to the way generation number v2 in the commit-graph files are (not) handled.

* ds/commit-graph-gen-v2-fixes:
  commit-graph: declare bankruptcy on GDAT chunks
  commit-graph: fix generation number v2 overflow values
  commit-graph: start parsing generation v2 (again)
  commit-graph: fix ordering bug in generation numbers
  t5318: extract helpers to lib-commit-graph.sh
  test-read-graph: include extra post-parse info
2022-03-11core.fsync: introduce granular fsync control infrastructureNeeraj Singh
This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-07string-list API: change "nr" and "alloc" to "size_t"Ævar Arnfjörð Bjarmason
Change the "nr" and "alloc" members of "struct string_list" to use "size_t" instead of "unsigned int". On some platforms the size of an "unsigned int" will be smaller than a "size_t", e.g. a 32 bit unsigned vs. 64 bit unsigned.

As "struct string_list" is a generic API we use in a lot of places this might cause overflows. As one example: code in "refs.c" keeps track of the number of refs with a "size_t", and auxiliary code in builtin/remote.c in get_ref_states() appends those to a "struct string_list".

While we're at it split the "nr" and "alloc" in string-list.h across two lines, which is the case for most such struct member declarations (e.g. in "strbuf.h" and "strvec.h").

Changing e.g. "int i" to "size_t i" in run_and_feed_hook() isn't strictly necessary, and there are a lot more cases where we'll use a local "int", "unsigned int" etc. variable derived from the "nr" in the "struct string_list". But in that case as well as add_wrapped_shortlog_msg() in builtin/shortlog.c we need to adjust the printf format referring to "nr" anyway, so let's also change the other variables referring to it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-07 | commit-graph: declare bankruptcy on GDAT chunks | Derrick Stolee
The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks store corrected commit date offsets, used for generation number v2. Recent changes have demonstrated that previous versions of Git were incorrectly parsing data from these chunks, but might have also been writing them incorrectly. I asserted [1] that the previous fixes were sufficient because the known reasons for incorrectly writing generation number v2 data relied on parsing the information incorrectly out of a commit-graph file, but the previous versions of Git were not reading the generation number v2 data. However, Patrick demonstrated [2] a case where, in split commit-graphs across an alternate boundary (and possibly some other special conditions), a commit-graph generated by a previous version of Git could contain incorrect generation number v2 data, which results in errors like the following: commit-graph generation for commit <oid> is 1623273624 < 1623273710 [1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@github.com/ [2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/ Clearly, there is something else going on. The situation is not completely understood, but the errors do not reproduce if the commit-graphs are all generated by a Git version including these recent fixes. If we cannot trust the existing data in the GDAT and GDOV chunks, then we can alter the format to change the chunk IDs for these chunks. This causes the new version of Git to silently ignore the older chunks (disabling generation number v2 in the process) while writing new commit-graph files with correct data in the GDA2 and GDO2 chunks. Update commit-graph-format.txt including a historical note about these deprecated chunks. Reported-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
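Why renaming a chunk ID makes a reader skip the old data can be sketched with a toy table-of-contents lookup. This is an illustration under assumptions: the `sketch_` names and the in-memory table are hypothetical, and only the four-character chunk names come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four ASCII characters into a 32-bit chunk ID. */
#define SKETCH_ID(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct sketch_chunk {
	uint32_t id;
	size_t size;
};

/* Linear search of a (hypothetical) table of contents. */
static const struct sketch_chunk *
sketch_find_chunk(const struct sketch_chunk *toc, size_t nr, uint32_t id)
{
	size_t i;

	assert(toc || nr == 0);
	for (i = 0; i < nr; i++)
		if (toc[i].id == id)
			return &toc[i];
	return NULL;
}

/*
 * A reader that only asks for "GDA2" never sees a "GDAT" chunk, so
 * files written by older versions fall back to generation number v1.
 */
static int sketch_generation_version(const struct sketch_chunk *toc, size_t nr)
{
	return sketch_find_chunk(toc, nr, SKETCH_ID('G', 'D', 'A', '2')) ? 2 : 1;
}
```

The untrusted chunk does not have to be validated or rewritten; it is simply never looked up.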
2022-03-05 | lockfile API users: simplify and don't leak "path" | Ævar Arnfjörð Bjarmason
Fix a memory leak in code added in 6c622f9f0bb (commit-graph: write commit-graph chains, 2019-06-18). We needed to free the "lock_name" if we encountered errors, and the "graph_name" after we'd run unlink() on it. For the case of write_commit_graph_file() refactoring the code to free the "lock_name" after we were done using the "struct lock_file lk" would have made the control flow more complex. Luckily we can free the "lock_name" right after the hold_lock_file_for_update() call; if it makes use of "path" at all it'll have copied its contents to a "struct strbuf" of its own. While I'm at it let's fix code added in fb10ca5b543 (sparse-checkout: write using lockfile, 2019-11-21) in write_patterns_and_update() to avoid the same complexity that I thought I needed when I wrote the initial fix for write_commit_graph_file(). We can free the "sparse_filename" right after calling hold_lock_file_for_update(); we don't need to wait until we're exiting the function to do so. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
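The ownership pattern described here (free the path as soon as the lock has taken its own copy) can be sketched with a small stand-in. None of this is the lockfile API: `sketch_lock`, `sketch_hold_lock()` and the other helpers are hypothetical and only mimic the "callee copies the path" behaviour the message relies on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock that keeps its own copy of the path. */
struct sketch_lock {
	char *own_path;
};

static int sketch_hold_lock(struct sketch_lock *lk, const char *path)
{
	size_t len;

	assert(lk && path);
	len = strlen(path);
	lk->own_path = malloc(len + sizeof(".lock"));
	if (!lk->own_path)
		return -1;
	memcpy(lk->own_path, path, len);
	memcpy(lk->own_path + len, ".lock", sizeof(".lock"));
	return 0;
}

static void sketch_release(struct sketch_lock *lk)
{
	free(lk->own_path);
	lk->own_path = NULL;
}

static char *sketch_make_path(const char *dir, const char *name)
{
	size_t dlen = strlen(dir), nlen = strlen(name);
	char *path = malloc(dlen + 1 + nlen + 1);

	if (!path)
		return NULL;
	memcpy(path, dir, dlen);
	path[dlen] = '/';
	memcpy(path + dlen + 1, name, nlen + 1);
	return path;
}

/* The caller's path is freed right after taking the lock, on every path. */
static int sketch_lock_for_write(struct sketch_lock *lk,
				 const char *dir, const char *name)
{
	char *path = sketch_make_path(dir, name);
	int ret;

	if (!path)
		return -1;
	ret = sketch_hold_lock(lk, path);
	free(path); /* safe: the lock no longer refers to this buffer */
	return ret;
}
```

Because the free happens immediately, later error branches in the caller have nothing left to clean up for the path, which is the simplification the message is after.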
2022-03-05 | commit-graph: stop fill_oids_from_packs() progress on error and free() | Ævar Arnfjörð Bjarmason
Fix a bug in fill_oids_from_packs(): we should always call stop_progress(), but did not do so if we returned an error here. This also plugs a memory leak in those cases by releasing the two "struct strbuf" variables the function uses. While I'm at it stop hardcoding "-1" here and just use the return value of error() instead, which happens to be "-1". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
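A common way to guarantee the "always stop progress, always release buffers" behaviour is a single cleanup label. The following is a sketch under assumptions, not the function from the commit: the progress counter, `sketch_error()` and the empty-name failure condition are all invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int sketch_progress_active;

static void sketch_start_progress(void) { sketch_progress_active++; }
static void sketch_stop_progress(void) { sketch_progress_active--; }

/* Mirrors the convention of reporting a message and returning -1. */
static int sketch_error(const char *msg)
{
	fprintf(stderr, "error: %s\n", msg);
	return -1;
}

static int sketch_fill_from_packs(const char *const *packs, size_t nr)
{
	char *scratch;
	size_t i;
	int ret = 0;

	assert(packs || nr == 0);
	scratch = malloc(64);
	if (!scratch)
		return sketch_error("out of memory");

	sketch_start_progress();
	for (i = 0; i < nr; i++) {
		if (!packs[i][0]) {
			/* Propagate the helper's return value, no literal -1. */
			ret = sketch_error("empty pack name");
			goto cleanup;
		}
		snprintf(scratch, 64, "%s", packs[i]);
	}

cleanup:
	/* Reached on success and on error alike. */
	sketch_stop_progress();
	free(scratch);
	return ret;
}
```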
2022-03-05 | commit-graph: fix memory leak in misused string_list API | Ævar Arnfjörð Bjarmason
When this code was migrated to the string_list API in d88b14b3fd6 (commit-graph: use string-list API for input, 2018-06-27) it was made to use both STRING_LIST_INIT_NODUP and a strbuf_detach() pattern. Those should not be used together if string_list_clear() is expected to free the memory; instead we need to either use STRING_LIST_INIT_DUP with a string_list_append_nodup(), or a STRING_LIST_INIT_NODUP and manually fiddle with the "strdup_strings" member before calling string_list_clear(). Let's do the former. Since "strdup_strings = 1" is set now other code might be broken by relying on "pack_indexes" not to duplicate its strings, but that doesn't happen. When we pass this down to write_commit_graph() that code uses the "struct string_list" without modifying it. Let's add a "const" to the variable to have the compiler enforce that assumption. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
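The ownership rule behind this fix can be sketched with a toy list: a list that is marked as owning its strings may still be handed an already-allocated string without copying it, and clearing the list then frees it. The `sketch_` types and functions below are hypothetical stand-ins, not the string-list API itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_list {
	char **items;
	size_t nr;
	size_t alloc;
	int owns_strings; /* plays the role of "strdup_strings" */
};

static size_t sketch_freed; /* counts strings released by sketch_clear() */

/* Stands in for a detached buffer: a heap string owned by the caller. */
static char *sketch_detach_copy(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Takes the pointer as-is; ownership moves to the list if it owns strings. */
static int sketch_append_nodup(struct sketch_list *l, char *s)
{
	assert(l && s);
	if (l->nr == l->alloc) {
		size_t n = l->alloc ? 2 * l->alloc : 4;
		char **p = realloc(l->items, n * sizeof(*p));

		if (!p)
			return -1;
		l->items = p;
		l->alloc = n;
	}
	l->items[l->nr++] = s;
	return 0;
}

static void sketch_clear(struct sketch_list *l)
{
	size_t i;

	if (l->owns_strings) {
		for (i = 0; i < l->nr; i++) {
			free(l->items[i]);
			sketch_freed++;
		}
	}
	free(l->items);
	l->items = NULL;
	l->nr = l->alloc = 0;
}
```

With `owns_strings` set to 0 the same sequence would leave both detached strings allocated after `sketch_clear()`, which is the leak the message describes.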
2022-03-01 | commit-graph: fix generation number v2 overflow values | Derrick Stolee
The Generation Data Chunk was implemented and tested in e8b63005c (commit-graph: implement generation data chunk, 2021-01-16), but the test was carefully constructed to work on systems with 32-bit dates. Since the corrected commit date offsets still required more than 31 bits, this triggered writing the generation_data_overflow chunk. However, upon closer look, the write_graph_chunk_generation_data_overflow() method writes the offsets to the chunk (as dictated by the format) but fill_commit_graph_info() treats the value in the chunk as if it is the full corrected commit date (not an offset). For some reason, this does not cause an issue when using the FUTURE_DATE specified in t5318-commit-graph.sh, but it does show up as a failure in 'git commit-graph verify' if we increase that FUTURE_DATE to be above four billion. Fix this error and create a 64-bit timestamp version of the test so we can test these larger values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
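The distinction at the heart of this fix (the overflow chunk holds an offset, not the full corrected date) can be sketched as follows. The names, the flat in-memory overflow table, and the use of the top bit as an "index into the overflow table" marker are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed marker: top bit set means "low bits index the overflow table". */
#define SKETCH_OVERFLOW_BIT (UINT32_C(1) << 31)

static uint64_t sketch_corrected_date(uint64_t commit_date,
				      uint32_t chunk_value,
				      const uint64_t *overflow_table)
{
	uint64_t offset;

	if (chunk_value & SKETCH_OVERFLOW_BIT) {
		assert(overflow_table);
		/* The table entry is still an offset from the commit date. */
		offset = overflow_table[chunk_value ^ SKETCH_OVERFLOW_BIT];
	} else {
		offset = chunk_value;
	}
	/* The bug class described above: returning "offset" alone here. */
	return commit_date + offset;
}
```

Both branches end in the same addition, so small and overflowing offsets are interpreted consistently.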
2022-03-01 | commit-graph: start parsing generation v2 (again) | Derrick Stolee
The 'read_generation_data' member of 'struct commit_graph' was introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire chain does, 2021-01-16). The intention was to avoid using corrected commit dates if not all layers of a commit-graph had that data stored. The logic in validate_mixed_generation_chain() at that point incorrectly initialized read_generation_data to 1 if and only if the tip commit-graph contained the Corrected Commit Date chunk. This was "fixed" in 448a39e65 (commit-graph: validate layers for generation data, 2021-02-02) to validate that read_generation_data was either non-zero for all layers, or it would set read_generation_data to zero for all layers. The problem here is that read_generation_data is not initialized to be non-zero anywhere! This change initializes read_generation_data immediately after the chunk is parsed, so each layer will have its value present as soon as possible. The read_generation_data member is used in fill_commit_graph_info() to determine if we should use the corrected commit date or the topological levels stored in the Commit Data chunk. Due to this bug, all previous versions of Git were defaulting to topological levels in all cases! This can be measured with some performance tests. Using the Linux kernel as a testbed, I generated a complete commit-graph containing corrected commit dates and tested the 'new' version against the previous, 'old' version. 
First, rev-list with --topo-order demonstrates a 26% improvement using corrected commit dates:

    hyperfine \
      -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \
      -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \
      --warmup=10

    Benchmark 1: old
      Time (mean ± σ):      57.1 ms ±   3.1 ms
      Range (min … max):    52.9 ms …  62.0 ms    55 runs

    Benchmark 2: new
      Time (mean ± σ):      45.5 ms ±   3.3 ms
      Range (min … max):    39.9 ms …  51.7 ms    59 runs

    Summary
      'new' ran
        1.26 ± 0.11 times faster than 'old'

These performance improvements are due to the algorithmic improvements given by walking fewer commits due to the higher cutoffs from corrected commit dates.

However, this comes at a cost. The additional I/O cost of parsing the corrected commit dates is visible in case of merge-base commands that do not reduce the overall number of walked commits.

    hyperfine \
      -n "old" "$OLD_GIT merge-base v4.8 v4.9" \
      -n "new" "$NEW_GIT merge-base v4.8 v4.9" \
      --warmup=10

    Benchmark 1: old
      Time (mean ± σ):     110.4 ms ±   6.4 ms
      Range (min … max):    96.0 ms … 118.3 ms    25 runs

    Benchmark 2: new
      Time (mean ± σ):     150.7 ms ±   1.1 ms
      Range (min … max):   149.3 ms … 153.4 ms    19 runs

    Summary
      'old' ran
        1.36 ± 0.08 times faster than 'new'

Performance issues like this are what motivated 702110aac (commit-graph: use config to specify generation type, 2021-02-25). In the future, we could fix this performance problem by inserting the corrected commit date offsets into the Commit Data chunk instead of having that data in an extra chunk.

Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
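The initialization fix in this commit can be sketched as two steps over a chain of layers: set the flag when each layer is parsed, then clear it everywhere unless every layer has it. The struct and function names are hypothetical; only the "all layers or none" rule comes from the message.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_layer {
	int has_generation_chunk;  /* was the chunk found while parsing? */
	int read_generation_data;  /* may readers use corrected dates? */
	struct sketch_layer *base; /* next older layer, or NULL */
};

/* Set the flag as soon as the chunk is known to be present. */
static void sketch_parse_layer(struct sketch_layer *g)
{
	assert(g);
	g->read_generation_data = g->has_generation_chunk ? 1 : 0;
}

/* Either every layer keeps the flag, or it is cleared on all of them. */
static void sketch_validate_chain(struct sketch_layer *tip)
{
	struct sketch_layer *g;
	int all = 1;

	for (g = tip; g; g = g->base)
		if (!g->read_generation_data)
			all = 0;
	if (all)
		return;
	for (g = tip; g; g = g->base)
		g->read_generation_data = 0;
}
```

If the parse step never sets the flag, the validation step finds it zero on every layer and leaves it zero, so corrected commit dates are never used regardless of the file contents.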
2022-03-01 | commit-graph: fix ordering bug in generation numbers | Derrick Stolee
When computing the generation numbers for a commit-graph, we compute the corrected commit dates and then check if their offsets from the actual dates are too large to fit in the 32-bit Generation Data chunk. However, there is a problem with this approach: if we have parsed the generation data from the previous commit-graph, then we continue the loop because the corrected commit date is already computed. This causes an under-count in the number of overflow values. It is incorrect to add an increment to num_generation_data_overflows next to this 'continue' statement, because we might start double-counting commits that are computed because of the depth-first search walk from a commit with an earlier OID. Instead, iterate over the full commit list at the end, checking the offsets to see how many grow beyond the maximum value. Create a new t5328-commit-graph-64-bit-time.sh test script to handle special cases of testing 64-bit timestamps. This helps demonstrate this bug in more cases. It still won't hit all potential cases until the next change, which re-enables reading generation numbers. Use the skip_all trick from 0a2bfccb9c8 (t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04) to make the output clean when run on a 32-bit system. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
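The "count in one final pass" approach can be sketched as below. The struct, the function name, and the 31-bit maximum offset are assumptions of this sketch; the point is that each commit is visited exactly once, independent of the order in which its corrected date was computed or whether it was loaded from an earlier file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed largest offset that fits without using the overflow chunk. */
#define SKETCH_OFFSET_MAX ((UINT64_C(1) << 31) - 1)

struct sketch_commit {
	uint64_t commit_date;
	uint64_t corrected_date; /* already computed or loaded */
};

/* One pass over the full list: no under-counting, no double-counting. */
static size_t sketch_count_overflows(const struct sketch_commit *commits,
				     size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		/* A corrected date is never earlier than the commit date. */
		assert(commits[i].corrected_date >= commits[i].commit_date);
		if (commits[i].corrected_date - commits[i].commit_date >
		    SKETCH_OFFSET_MAX)
			count++;
	}
	return count;
}
```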
2021-12-06 | commit-graph: return if there is no git directory | Lessley Dennington
Return early if the git directory does not exist. This will protect against test failures in the upcoming change that makes prepare_repo_settings call BUG if no git directory exists. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>