Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
path: root/midx.c
AgeCommit message (Collapse)Author
2023-03-18Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24for_each_object: mark unused callback parametersJeff King
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-28Merge branch 'tb/midx-bitmap-selection-fix'Junio C Hamano
A bugfix with tracing support in midx codepath * tb/midx-bitmap-selection-fix: pack-bitmap-write.c: instrument number of reused bitmaps midx.c: instrument MIDX and bitmap generation with trace2 regions midx.c: consider annotated tags during bitmap selection midx.c: fix whitespace typo
2022-10-13midx.c: instrument MIDX and bitmap generation with trace2 regionsTaylor Blau
When debugging MIDX and MIDX-bitmap related issues, it is useful to figure out where Git is spending its time. GitHub has been using the below trace2 regions to instrument various components of generating a MIDX itself, as well time spent preparing to build a MIDX bitmap. These are limited to instrumenting the following functions: - midx.c::find_commits_for_midx_bitmap() - midx.c::midx_pack_order() - midx.c::prepare_midx_packing_data() - midx.c::write_midx_bitmap() - midx.c::write_midx_internal() - midx.c::write_midx_reverse_index() to start and end with a trace2_region_enter() and trace2_region_leave(), respectively. The category for all of these is "midx", which matches the existing convention. The region description matches the name of the function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-13midx.c: consider annotated tags during bitmap selectionTaylor Blau
When generating a multi-pack bitmap without a `--refs-snapshot` (e.g., by running `git multi-pack-index write --bitmap` directly), we determine the set of bitmap-able commits by enumerating each reference, and adding the referrent as the tip of a reachability traversal when it appears somewhere in the MIDX. (Any commit we encounter during the reachability traversal then becomes a candidate for bitmap selection). But we incorrectly avoid peeling the object at the tip of each reference. So if we see some reference that points at an annotated tag (which in turn points through zero or more additional annotated tags at a commit), that we will not add it as a tip for the reachability traversal. This means that if some commit C is only referenced through one or more annotated tag(s), then C won't become a bitmap candidate. Correct this by peeling the reference tips as we enumerate them to ensure that we consider commits which are the targets of annotated tags, in addition to commits which are referenced directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-13midx.c: fix whitespace typoTaylor Blau
This was unintentionally introduced via 893b563505 (midx: inline nth_midxed_pack_entry(), 2021-09-11) where "struct repository *r" became "struct repository * r". The latter does not adhere to our usual style conventions, so fix that up to look more like our usual declarations. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21midx.c: avoid cruft packs with non-zero `repack --batch-size`Taylor Blau
Apply similar treatment with respect to cruft packs as in a few commits ago to `repack` with a non-zero `--batch-size`. Since the case of a non-zero `--batch-size` is handled separately (in `fill_included_packs_batch()` instead of `fill_included_packs_all()`), a separate fix must be applied for this case. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21midx.c: remove unnecessary loop conditionTaylor Blau
The fill_included_packs_batch() routine is responsible for aggregating objects in packs with a non-zero value for the `--batch-size` option of the `git multi-pack-index repack` sub-command. Since this routine is explicitly called only when `--batch-size` is non-zero, there is no point in checking that this is the case in our loop condition. Remove the unnecessary part of this condition to avoid confusion. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21midx.c: replace `xcalloc()` with `CALLOC_ARRAY()`Taylor Blau
Replace a direct invocation of Git's `xcalloc()` wrapper with the `CALLOC_ARRAY()` macro instead. The latter is preferred since it is more conventional in Git's codebase, but also because it automatically picks the correct value for the record size. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21midx.c: avoid cruft packs with `repack --batch-size=0`Taylor Blau
The `repack` sub-command of the `git multi-pack-index` builtin creates a new pack aggregating smaller packs contained in the MIDX up to some given `--batch-size`. When `--batch-size=0`, this instructs the MIDX builtin to repack everything contained in the MIDX into a single pack. In similar spirit as a previous commit, it is undesirable to repack the contents of a cruft pack in this step. Teach `repack` to ignore any cruft pack(s) when `--batch-size=0` for the same reason(s). (The case of a non-zero `--batch-size` will be handled in a subsequent commit). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21midx.c: prevent `expire` from removing the cruft packTaylor Blau
The `expire` sub-command unlinks any packs that are (a) contained in the MIDX, but (b) have no objects referenced by the MIDX. This sub-command ignores `.keep` packs, which remain on-disk even if they have no objects referenced by the MIDX. Cruft packs, however, aren't given the same treatment: if none of the objects contained in the cruft pack are selected from the cruft pack by the MIDX, then the cruft pack is eligible to be expired. This is less than desireable, since the cruft pack has important metadata about the individual object mtimes, which is useful to determine how quickly an object should age out of the repository when pruning. Ordinarily, we wouldn't expect the contents of a cruft pack to duplicated across non-cruft packs (and we'd expect to see the MIDX select all cruft objects from other sources even less often). But nonetheless, it is still possible to trick the `expire` sub-command into removing the `.mtimes` file in this circumstance. Teach the `expire` sub-command to ignore cruft packs in the same manner as it does `.keep` packs, in order to keep their metadata around, even when they are unreferenced by the MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-06Merge branch 'ac/bitmap-lookup-table'Junio C Hamano
The pack bitmap file gained a bitmap-lookup table to speed up locating the necessary bitmap for a given commit. * ac/bitmap-lookup-table: pack-bitmap-write: drop unused pack_idx_entry parameters bitmap-lookup-table: add performance tests for lookup table pack-bitmap: prepare to read lookup table extension pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests pack-bitmap-write.c: write lookup table extension bitmap: move `get commit positions` code to `bitmap_writer_finish` Documentation/technical: describe bitmap lookup table extension
2022-09-06Merge branch 'tb/midx-with-changing-preferred-pack-fix'Junio C Hamano
Multi-pack index got corrupted when preferred pack changed from one pack to another in a certain way, which has been corrected. * tb/midx-with-changing-preferred-pack-fix: midx.c: avoid adding preferred objects twice midx.c: include preferred pack correctly with existing MIDX midx.c: extract `midx_fanout_add_pack_fanout()` midx.c: extract `midx_fanout_add_midx_fanout()` midx.c: extract `struct midx_fanout` t/lib-bitmap.sh: avoid silencing stderr t5326: demonstrate potential bitmap corruption
2022-08-26pack-bitmap-write: learn pack.writeBitmapLookupTable and add testsAbhradeep Chakraborty
Teach Git to provide a way for users to enable/disable bitmap lookup table extension by providing a config option named 'writeBitmapLookupTable'. Default is false. Also add test to verify writting of lookup table. Mentored-by: Taylor Blau <me@ttaylorr.com> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Co-Authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-22midx.c: avoid adding preferred objects twiceTaylor Blau
The last commit changes the behavior of midx.c's `get_sorted_objects()` function to handle the case of writing a MIDX bitmap while reusing an existing MIDX and changing the identity of the preferred pack separately. As part of this change, all objects from the (new) preferred pack are added to the fanout table in a separate pass. Since these copies of the objects all have their preferred bits set, any duplicates will be resolved in their favor. Importantly, this includes any copies of those same objects that come from the existing MIDX. We know at the time of adding them that they'll be redundant if their source pack is the (new) preferred one, so we can avoid adding them to the list in this case. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-22midx.c: include preferred pack correctly with existing MIDXTaylor Blau
This patch resolves an issue where the object order used to generate a MIDX bitmap would violate an invariant that all of the preferred pack's objects are represented by that pack in the MIDX. The problem arises when reusing an existing MIDX while generating a new one, and occurs specifically when the identity of the preferred pack changes from one MIDX to another, along with a few other conditions: - the new preferred pack must also be present in the existing MIDX - the new preferred pack must *not* have been the preferred pack in the existing MIDX - most importantly, there must be at least one object present in the physical preferred pack (ie., it shows up in that pack's index) but was selected from a *different* pack when the previous MIDX was generated When the above conditions are all met, we end up (incorrectly) discarding copies of some objects in the pack selected as the preferred pack. This is because `get_sorted_entries()` adds objects to its list by doing the following at each fanout level: - first, adding all objects from that fanout level from an existing MIDX - then, adding all objects from that fanout level in each pack *not* included in the existing MIDX So if some object was not selected from the to-be-preferred pack when writing the previous MIDX, then we will never consider it as a candidate when generating the new MIDX. This means that it's possible for the preferred pack to not include all of its objects in the MIDX's pseudo-pack object order, which is an invariant violation of that order. Resolve this by adding all objects from the preferred pack separately when it appears in the existing MIDX (if one was present). This will duplicate objects from that pack that *did* appear in the MIDX, but this is fine, since get_sorted_entries() already handles duplicates. (A future optimization in this area could avoid adding copies of objects that we know already existing in the MIDX.) Note that we no longer need to compute the preferred-ness of objects added from the MIDX, since we only want to select the preferred objects from a single source. (We could still mark these preferred bits, but doing so is redundant and unnecessary). This resolves the bug demonstrated by t5326.174 ("preferred pack change with existing MIDX bitmap"). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-22midx.c: extract `midx_fanout_add_pack_fanout()`Taylor Blau
Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from a given pack (identified by its index into the `struct pack_info` array maintained by the MIDX writing routine). Unlike the previous extraction (for `midx_fanout_add_midx_fanout()`), this function will be called twice, once for all new packs, and again for the preferred pack (if it appears in an existing MIDX). The latter change is to resolve the bug described a few patches ago, and will be made in the subsequent commit. Similar to the previous refactoring, this function also enhances the readability of its caller in `get_sorted_entries()`. Its functionality is unchanged in this commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-22midx.c: extract `midx_fanout_add_midx_fanout()`Taylor Blau
Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from an existing MIDX. This function will only be called once, so extracting it is purely cosmetic to improve the readability of `get_sorted_entries()` (its sole caller) below. The functionality is unchanged in this commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-22midx.c: extract `struct midx_fanout`Taylor Blau
To build up a list of objects (along with their packs, and the offsets within those packs that each object appears at), the MIDX code implements `get_sorted_entries()` which builds up a list of candidates, sorts them, and then removes duplicate entries. To do this, it keeps an array of `pack_midx_entry` structures that it builds up once for each fanout level (ie., for all possible values of the first byte of each object's ID). This array is a function-local variable of `get_sorted_entries()`. Since it uses the ALLOC_GROW() macro, having the `alloc_fanout` variable also be local to that function, and only modified within that function is convenient. However, subsequent changes will extract the two ways this array is filled (from a pack at some fanout value, and from an existing MIDX at some fanout value) into separate functions. Instead of passing around pointers to the entries array, along with `nr_fanout` and `alloc_fanout`, encapsulate these three into a structure instead. Then pass around a pointer to this structure instead. This patch does not yet extract the above two functions, but sets us up to begin doing so in the following commit. For now, the implementation of get_sorted_entries() is only modified to replace `entries_by_fanout` with `fanout.entries`, `nr_fanout` with `fanout.nr`, and so on. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-27write_midx_bitmap(): drop unused refs_snapshot parameterJeff King
The refactoring in 90b2bb710d (midx: extract bitmap write setup, 2022-07-19) hoisted our call to find_commits_for_midx_bitmap() into the caller, which means we no longer need to see the refs_snapshot at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19midx: reduce memory pressure while writing bitmapsDerrick Stolee
We noticed that some 'git multi-pack-index write --bitmap' processes were running with very high memory. It turns out that a lot of this memory is required to store a list of every object in the written multi-pack-index, with a second copy that has additional information used for the bitmap writing logic. Using 'valgrind --tool=massif' before this change, the following chart shows how memory load increased and was maintained throughout the process: GB 4.102^ :: | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: : | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : 0 +---------------------------------------------------------------> It turns out that the 'struct write_midx_context' data is persisting through the life of the process, including the 'entries' array. This array is used last inside find_commits_for_midx_bitmap() within write_midx_bitmap(). If we free (and nullify) the array at that point, we can free a decent chunk of memory before the bitmap logic adds more to the memory footprint. Here is the massif memory load chart after this change: GB 3.111^ # | # :::::::::::@::::::::::::::@ | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@ | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ 0 +---------------------------------------------------------------> The previous change introduced a refactoring of write_midx_bitmap() to make it more clear how much of the 'struct write_midx_context' instance is needed at different parts of the process. In addition, the following defensive programming measures were put in place: 1. Using FREE_AND_NULL() we will at least get a segfault from reading a NULL pointer instead of a use-after-free. 2. 'entries_nr' is also set to zero to make any loop that would iterate over the entries be trivial. 3. Add significant comments in write_midx_internal() to add warnings for future authors who might accidentally add references to this cleared memory. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19midx: extract bitmap write setupDerrick Stolee
The write_midx_bitmap() method is a long method that does a lot of steps. It requires the write_midx_context struct for use in prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but after that only needs the pack_order array. This is a messy, but completely non-functional refactoring. The code is only being moved around to reduce visibility of the write_midx_context during the longest part of computing reachability bitmaps. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-08Merge branch 'ab/plug-leak-in-revisions'Junio C Hamano
Plug the memory leaks from the trickiest API of all, the revision walker. * ab/plug-leak-in-revisions: (27 commits) revisions API: add a TODO for diff_free(&revs->diffopt) revisions API: have release_revisions() release "topo_walk_info" revisions API: have release_revisions() release "date_mode" revisions API: call diff_free(&revs->pruning) in revisions_release() revisions API: release "reflog_info" in release revisions() revisions API: clear "boundary_commits" in release_revisions() revisions API: have release_revisions() release "prune_data" revisions API: have release_revisions() release "grep_filter" revisions API: have release_revisions() release "filter" revisions API: have release_revisions() release "cmdline" revisions API: have release_revisions() release "mailmap" revisions API: have release_revisions() release "commits" revisions API users: use release_revisions() for "prune_data" users revisions API users: use release_revisions() with UNLEAK() revisions API users: use release_revisions() in builtin/log.c revisions API users: use release_revisions() in http-push.c revisions API users: add "goto cleanup" for release_revisions() stash: always have the owner of "stash_info" free it revisions API users: use release_revisions() needing REV_INFO_INIT revision.[ch]: document and move code declared around "init" ...
2022-06-04Merge branch 'tb/cruft-packs'Junio C Hamano
A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt
2022-05-27chunk-format.h: extract oid_version()Taylor Blau
There are three definitions of an identical function which converts `the_hash_algo` into either 1 (for SHA-1) or 2 (for SHA-256). There is a copy of this function for writing both the commit-graph and multi-pack-index file, and another inline definition used to write the .rev header. Consolidate these into a single definition in chunk-format.h. It's not clear that this is the best header to define this function in, but it should do for now. (Worth noting, the .rev caller expects a 4-byte unsigned, but the other two callers work with a single unsigned byte. The consolidated version uses the latter type, and lets the compiler widen it when required). Another caller will be added in a subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-04Merge branch 'ds/midx-normalize-pathname-before-comparison'Junio C Hamano
The path taken by "git multi-pack-index" command from the end user was compared with path internally prepared by the tool withut first normalizing, which lead to duplicated paths not being noticed, which has been corrected. * ds/midx-normalize-pathname-before-comparison: cache: use const char * for get_object_directory() multi-pack-index: use --object-dir real path midx: use real paths in lookup_multi_pack_index()
2022-04-25midx: use real paths in lookup_multi_pack_index()Derrick Stolee
This helper looks for a parsed multi-pack-index whose object directory matches the given object_dir. Before going into the loop over the parsed multi-pack-indexes, it calls find_odb() to ensure that the given object_dir is actually a known object directory. However, find_odb() uses real-path manipulations to compare the input to the alternate directories. This same real-path comparison is not used in the loop, leading to potential issues with the strcmp(). Update the method to use the real-path values instead. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-14revisions API users: add straightforward release_revisions()Ævar Arnfjörð Bjarmason
Add a release_revisions() to various users of "struct rev_list" in those straightforward cases where we only need to add the release_revisions() call to the end of a block, and don't need to e.g. refactor anything to use a "goto cleanup" pattern. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-11core.fsync: introduce granular fsync control infrastructureNeeraj Singh
This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-19Merge branch 'tb/midx-no-bitmap-for-no-objects'Junio C Hamano
When there is no object to write .bitmap file for, "git multi-pack-index" triggered an error, instead of just skipping, which has been corrected. * tb/midx-no-bitmap-for-no-objects: midx: prevent writing a .bitmap without any objects
2022-02-10midx: prevent writing a .bitmap without any objectsTaylor Blau
When trying to write a MIDX, we already prevent the case where there weren't any packs present, and thus we would have written an empty MIDX. But there is another "empty" case, which is more interesting, and we don't yet handle. If we try to write a MIDX which has at least one pack, but those packs together don't contain any objects, we will encounter a BUG() when trying to use the bitmap corresponding to that MIDX, like so: $ git rev-parse HEAD | git pack-objects --revs --use-bitmap-index --stdout >/dev/null BUG: pack-revindex.c:394: pack_pos_to_midx: out-of-bounds object at 0 (note that in the above reproduction, both `--use-bitmap-index` and `--stdout` are important, since without the former we won't even both to load the .bitmap, and without the latter we wont attempt pack reuse). The problem occurs when we try to discover the identity of the preferred pack to determine which range if any of existing packs we can reuse verbatim. This path is: `reuse_packfile_objects()` -> `reuse_partial_packfile_from_bitmap()` -> `midx_preferred_pack()`. #4 0x000055555575401f in pack_pos_to_midx (m=0x555555997160, pos=0) at pack-revindex.c:394 #5 0x00005555557502c8 in midx_preferred_pack (bitmap_git=0x55555599c280) at pack-bitmap.c:1431 #6 0x000055555575036c in reuse_partial_packfile_from_bitmap (bitmap_git=0x55555599c280, packfile_out=0x5555559666b0 <reuse_packfile>, entries=0x5555559666b8 <reuse_packfile_objects>, reuse_out=0x5555559666c0 <reuse_packfile_bitmap>) at pack-bitmap.c:1452 #7 0x00005555556041f6 in get_object_list_from_bitmap (revs=0x7fffffffcbf0) at builtin/pack-objects.c:3658 #8 0x000055555560465c in get_object_list (ac=2, av=0x555555997050) at builtin/pack-objects.c:3765 #9 0x0000555555605e4e in cmd_pack_objects (argc=0, argv=0x7fffffffe920, prefix=0x0) at builtin/pack-objects.c:4154 Since neither the .bitmap or MIDX stores the identity of the preferred pack, we infer it by trying to load the first object in pseudo-pack order, and then asking the MIDX which pack was chosen to represent that object. But this fails our bounds check, since there are zero objects in the MIDX to begin with, which results in the BUG(). We could catch this more carefully in `midx_preferred_pack()`, but signaling the absence of a preferred pack out to all of its callers is somewhat awkward. Instead, let's avoid writing a MIDX .bitmap without any objects altogether. We catch this case in `write_midx_internal()`, and emit a warning if the caller indicated they wanted to write a bitmap before clearing out the relevant flags. If we somehow got to write_midx_bitmap(), then we will call BUG(), but this should now be an unreachable path. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27midx: read `RIDX` chunk when presentTaylor Blau
When a MIDX contains the new `RIDX` chunk, ensure that the reverse index is read from it instead of the on-disk .rev file. Since we need to encode the object order in the MIDX itself for correctness reasons, there is no point in storing the same data again outside of the MIDX. So, this patch stops writing separate .rev files, and reads it out of the MIDX itself. This is possible to do with relatively little new code, since the format of the RIDX chunk is identical to the data in the .rev file. In other words, we can implement this by pointing the `revindex_data` field at the reverse index chunk of the MIDX instead of the .rev file without any other changes. Note that we have two knobs that are adjusted for the new tests: GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls whether the MIDX .rev is written at all, and the latter controls whether we read the MIDX's RIDX chunk. Both are necessary to ensure that the test added at the beginning of this series continues to work. This is because we always need to write the RIDX chunk in the MIDX in order to change its checksum, but we want to make sure reading the existing .rev file still works (since the RIDX chunk takes precedence by default). Arguably this isn't a very interesting mode to test, because the precedence rules mean that we'll always read the RIDX chunk over the .rev file. But it makes it impossible for a user to induce corruption in their repository by adjusting the test knobs (since if we had an either/or knob they could stop writing the RIDX chunk, allowing them to tweak the MIDX's object order without changing its checksum). Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27midx.c: make changing the preferred pack safeTaylor Blau
The previous patch demonstrates a bug where a MIDX's auxiliary object order can become out of sync with a MIDX bitmap. This is because of two confounding factors: - First, the object order is stored in a file which is named according to the multi-pack index's checksum, and the MIDX does not store the object order. This means that the object order can change without altering the checksum. - But the .rev file is moved into place with finalize_object_file(), which link(2)'s the file into place instead of renaming it. For us, that means that a modified .rev file will not be moved into place if MIDX's checksum was unchanged. This fix is to force the MIDX's checksum to change when the preferred pack changes but the set of packs contained in the MIDX does not. In other words, when the object order changes, the MIDX's checksum needs to change with it (regardless of whether the MIDX is tracking the same or different packs). This prevents a race whereby changing the object order (but not the packs themselves) enables a reader to see the new .rev file with the old MIDX, or similarly seeing the new bitmap with the old object order. But why can't we just stop hardlinking the .rev into place instead adding additional data to the MIDX? Suppose that's what we did. Then when we go to generate the new bitmap, we'll load the old MIDX bitmap, along with the MIDX that it references. That's fine, since the new MIDX isn't moved into place until after the new bitmap is generated. But the new object order *has* been moved into place. So we'll read the old bitmaps in the new order when generating the new bitmap file, meaning that without this secondary change, bitmap generation itself would become a victim of the race described here. This can all be prevented by forcing the MIDX's checksum to change when the object order does. By embedding the entire object order into the MIDX, we do just that. That is, the MIDX's checksum will change in response to any perturbation of the underlying object order. In t5326, this will cause the MIDX's checksum to update (even without changing the set of packs in the MIDX), preventing the stale read problem. Note that this makes it safe to continue to link(2) the MIDX .rev file into place, since it is now impossible to have a .rev file that is out-of-sync with the MIDX whose checksum it references. (But we will do away with MIDX .rev files later in this series anyway, so this is somewhat of a moot point). In theory, it is possible to store a "fingerprint" of the full object order here, so long as that fingerprint changes at least as often as the full object order does. Some possibilities here include storing the identity of the preferred pack, along with the mtimes of the non-preferred packs in a consistent order. But storing a limited part of the information makes it difficult to reason about whether or not there are gaps between the two that would cause us to get bitten by this bug again. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-30Merge branch 'tb/plug-pack-bitmap-leaks'Junio C Hamano
Leakfix. * tb/plug-pack-bitmap-leaks: pack-bitmap.c: more aggressively free in free_bitmap_index() pack-bitmap.c: don't leak type-level bitmaps midx.c: write MIDX filenames to strbuf builtin/multi-pack-index.c: don't leak concatenated options builtin/repack.c: avoid leaking child arguments builtin/pack-objects.c: don't leak memory via arguments t/helper/test-read-midx.c: free MIDX within read_midx_file() midx.c: don't leak MIDX from verify_midx_file midx.c: clean up chunkfile after reading the MIDX
2021-10-29midx.c: write MIDX filenames to strbufTaylor Blau
To ask for the name of a MIDX and its corresponding .rev file, callers invoke get_midx_filename() and get_midx_rev_filename(), respectively. These both invoke xstrfmt(), allocating a chunk of memory which must be freed later on. This makes callers in pack-bitmap.c somewhat awkward. Specifically, midx_bitmap_filename(), which is implemented like: return xstrfmt("%s-%s.bitmap", get_midx_filename(midx->object_dir), hash_to_hex(get_midx_checksum(midx))); this leaks the second argument to xstrfmt(), which itself was allocated with xstrfmt(). This caller could assign both the result of get_midx_filename() and the outer xstrfmt() to a temporary variable, remembering to free() the former before returning. But that involves a wasteful copy. Instead, get_midx_filename() and get_midx_rev_filename() take a strbuf as an output parameter. This way midx_bitmap_filename() can manipulate and pass around a temporary buffer which it detaches back to its caller. That allows us to implement the function without copying or open-coding get_midx_filename() in a way that doesn't leak. Update the other callers of get_midx_filename() and get_midx_rev_filename() accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28midx.c: don't leak MIDX from verify_midx_fileTaylor Blau
The function midx.c:verify_midx_file() allocates a MIDX struct by calling load_multi_pack_index(). But when cleaning up, it calls free() without freeing any resources associated with the MIDX. Call the more appropriate close_midx() which does free those resources, which causes t5319.3 to pass when Git is compiled with SANITIZE=leak. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-21midx.c: clean up chunkfile after reading the MIDXTaylor Blau
In order to read the contents of a MIDX, we initialize a chunkfile structure which can read the table of contents and assign pointers into different sections of the file for us. We do call free(), since the chunkfile struct is heap allocated, but not the more appropriate free_chunkfile(), which also frees memory that the structure itself owns. Call that instead to avoid leaking memory in this function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-19Merge branch 'tb/repack-write-midx'Junio C Hamano
"git repack" has been taught to generate multi-pack reachability bitmaps. * tb/repack-write-midx: test-read-midx: fix leak of bitmap_index struct builtin/repack.c: pass `--refs-snapshot` when writing bitmaps builtin/repack.c: make largest pack preferred builtin/repack.c: support writing a MIDX while repacking builtin/repack.c: extract showing progress to a variable builtin/repack.c: rename variables that deal with non-kept packs builtin/repack.c: keep track of existing packs unconditionally midx: preliminary support for `--refs-snapshot` builtin/multi-pack-index.c: support `--stdin-packs` mode midx: expose `write_midx_file_only()` publicly
2021-10-15midx.c: guard against commit_lock_file() failuresTaylor Blau
When writing a MIDX, we atomically move the new MIDX into place via commit_lock_file(), but do not check to see if that call was successful. Make sure that we do check in order to prevent us from incorrectly reporting that we wrote a new MIDX if we actually encountered an error. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15midx.c: lookup MIDX by object directory during repackTaylor Blau
Apply similar treatment as in the last commit to the MIDX `repack` operation. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15midx.c: lookup MIDX by object directory during expireTaylor Blau
Before a new MIDX can be written, expire_midx_packs() first loads the existing MIDX, figures out which packs can be expired, and then writes a new MIDX based on that information. In order to load the existing MIDX, it uses load_multi_pack_index(), which mmaps the multi-pack-index file, but does not store the resulting `struct multi_pack_index *` in the object store. write_midx_internal() also needs to open the existing MIDX, and it does so by iterating the results of get_multi_pack_index(), so that it reuses the same pointer held by the object store. But before it can move the new MIDX into place, it close_object_store() to munmap() the multi-pack-index file to accommodate platforms like Windows which don't allow overwriting files which are memory mapped. That's where things get weird. Since expire_midx_packs has its own *separate* memory mapped copy of the MIDX, the MIDX file is still memory mapped! Interestingly, this doesn't seem to cause a problem in our tests. (I believe that this has much more to do with my own lack of familiarity with Windows than it does a lack of coverage in our tests). In any case, we can side-step the whole issue by teaching expire_midx_packs() to use the `struct multi_pack_index` pointer it found via the object store instead of maintain its own copy. That way, when write_midx_internal() calls `close_object_store()`, we know that there are no memory mapped copies of the MIDX laying around. A couple of other small notes about this patch: - As far as I can tell, passing `local == 1` to the call to load_multi_pack_index() was an error, since object_dir could be an alternate. But it doesn't matter, since even though we write `m->local = 1`, we never read that field back later on. - Setting `m = NULL` after write_midx_internal() was likely to prevent a double-free back from when that function took a `struct multi_pack_index *` that it called close_midx() on itself. We can rely on write_midx_internal() to call that for us now. Finally, this enforces the same "the value of --object-dir must be the local object store, or an alternate" rule from f57a739691 (midx: avoid opening multiple MIDXs when writing, 2021-09-01) to the `expire` sub-command, too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15midx.c: extract MIDX lookup by object_dirTaylor Blau
The first thing that write_midx_internal() does is load the MIDX corresponding to the given object directory, if one is present. Prepare for other functions in midx.c to do the same thing by extracting that operation out to a small helper function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15Merge branch 'tb/repack-write-midx' into tb/fix-midx-rename-while-mappedJunio C Hamano
* tb/repack-write-midx: test-read-midx: fix leak of bitmap_index struct builtin/repack.c: pass `--refs-snapshot` when writing bitmaps builtin/repack.c: make largest pack preferred builtin/repack.c: support writing a MIDX while repacking builtin/repack.c: extract showing progress to a variable builtin/repack.c: rename variables that deal with non-kept packs builtin/repack.c: keep track of existing packs unconditionally midx: preliminary support for `--refs-snapshot` builtin/multi-pack-index.c: support `--stdin-packs` mode midx: expose `write_midx_file_only()` publicly
2021-10-11Merge branch 'tb/midx-write-propagate-namehash'Junio C Hamano
"git multi-pack-index write --bitmap" learns to propagate the hashcache from original bitmap to resulting bitmap. * tb/midx-write-propagate-namehash: t5326: test propagating hashcache values p5326: generate pack bitmaps before writing the MIDX bitmap p5326: don't set core.multiPackIndex unnecessarily p5326: create missing 'perf-tag' tag midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps pack-bitmap.c: propagate namehash values from existing bitmaps t/helper/test-bitmap.c: add 'dump-hashes' mode
2021-09-29midx: preliminary support for `--refs-snapshot`Taylor Blau
To figure out which commits we can write a bitmap for, the multi-pack index/bitmap code does a reachability traversal, marking any commit which can be found in the MIDX as eligible to receive a bitmap. This approach will cause a problem when multi-pack bitmaps are able to be generated from `git repack`, since the reference tips can change during the repack. Even though we ignore commits that don't exist in the MIDX (when doing a scan of the ref tips), it's possible that a commit in the MIDX reaches something that isn't. This can happen when a multi-pack index contains some pack which refers to loose objects (e.g., if a pack was pushed after starting the repack but before generating the MIDX which depends on an object which is stored as loose in the repository, and by definition isn't included in the multi-pack index). By taking a snapshot of the references before we start repacking, we can close that race window. In the above scenario (where we have a packed object pointing at a loose one), we'll either (a) take a snapshot of the references before seeing the packed one, or (b) take it after, at which point we can guarantee that the loose object will be packed and included in the MIDX. This patch does just that. It writes a temporary "reference snapshot", which is a list of OIDs that are at the ref tips before writing a multi-pack bitmap. References that are "preferred" (i.e,. are a suffix of at least one value of the 'pack.preferBitmapTips' configuration) are marked with a special '+'. The format is simple: one line per commit at each tip, with an optional '+' at the beginning (for preferred references, as described above). When provided, the reference snapshot is used to drive bitmap selection instead of the MIDX code doing its own traversal. When it isn't provided, the usual traversal takes place instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29midx: expose `write_midx_file_only()` publiclyTaylor Blau
Expose a variant of the write_midx_file() function which ignores packs that aren't included in an explicit "allow" list. This will be used in an upcoming patch to power a new `--stdin-packs` mode of `git multi-pack-index write` for callers that only want to include certain packs in a MIDX (and ignore any packs which may have happened to enter the repository independently, e.g., from pushes). Those patches will provide test coverage for this new function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23Merge branch 'rs/packfile-bad-object-list-in-oidset'Junio C Hamano
Replace a handcrafted data structure used to keep track of bad objects in the packfile API by an oidset. * rs/packfile-bad-object-list-in-oidset: packfile: use oidset for bad objects packfile: convert has_packed_and_bad() to object_id packfile: convert mark_bad_packed_object() to object_id midx: inline nth_midxed_pack_entry() oidset: make oidset_size() an inline function
2021-09-15midx.c: respect 'pack.writeBitmapHashcache' when writing bitmapsTaylor Blau
In the previous commit, the bitmap writing code learned to propagate values from an existing hash-cache extension into the bitmap that it is writing. Now that that functionality exists, let's expose it by teaching the 'git multi-pack-index' builtin to respect the `pack.writeBitmapHashCache` option so that the hash-cache may be written at all. Two minor points worth noting here: - The 'git multi-pack-index write' sub-command didn't previously read any configuration (instead this is handled in the base command). A separate handler is added here to respect this write-specific config option. - I briefly considered adding a 'bitmap_flags' field to the static options struct, but decided against it since it would require plumbing through a new parameter to the write_midx_file() function. Instead, a new MIDX-specific flag is added, which is translated to the corresponding bitmap one. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>