git.kernel.org/pub/scm/git/git.git
Age | Commit message | Author
2023-12-15 | midx: implement `BTMP` chunk | Taylor Blau

When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse.

This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the number of objects that are reused verbatim (and consequently decreases the amount of time it takes to generate many packs).

But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse. As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly expensive to maintain as the pack grows larger and larger.

It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse is disjoint with respect to the subset of their objects being sent).

If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to any of their objects we may output, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position.

Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases).

This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Bitmapped Packs" (`BTMP`) chunk of the multi-pack index. The `BTMP` chunk's contents are described in detail here. Importantly, the `BTMP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent.

For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk.

[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
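
To make the idea of mapping bitmap regions to packs concrete, here is a minimal sketch of such a lookup. It assumes each pack's region can be described by a starting bit position and a bit count; the log above does not spell out the chunk's layout, so `struct bitmap_region`, its field names, and `pack_for_bit()` are hypothetical and are not taken from Git's source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one region; not Git's actual struct. */
struct bitmap_region {
	uint32_t first_bit; /* first bit position belonging to the pack */
	uint32_t nr_bits;   /* number of consecutive bits for the pack */
};

/*
 * Return the index of the pack whose region contains `bit`, or -1 if no
 * region covers it. Regions are assumed not to overlap.
 */
static int pack_for_bit(const struct bitmap_region *regions, size_t nr,
			uint32_t bit)
{
	for (size_t i = 0; i < nr; i++) {
		/* Subtract rather than add so the check cannot overflow. */
		if (bit >= regions[i].first_bit &&
		    bit - regions[i].first_bit < regions[i].nr_bits)
			return (int)i;
	}
	return -1;
}
```

With a single reuse pack whose region starts at bit 0, this reduces to the special case the message describes; with several packs, each region may start anywhere in the bitset.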
2023-11-08 | Merge branch 'tb/format-pack-doc-update' | Junio C Hamano

Doc update.

* tb/format-pack-doc-update:
  Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  Documentation/gitformat-pack.txt: fix typo
2023-11-01 | Documentation/gitformat-pack.txt: fix incorrect MIDX documentation | Taylor Blau

Back in 32f3c541e3 (multi-pack-index: write pack names in chunk, 2018-07-12) the MIDX's "Packfile Names" (or "PNAM", for short) chunk was described as containing an array of string entries. e0d1bcf825 notes that this is the only chunk in the MIDX format's specification that is not guaranteed to be 4-byte aligned, and so should be placed last.

This isn't quite accurate: the entries within the PNAM chunk are not guaranteed to be 4-byte aligned since they are arbitrary strings, but the chunk itself is 4-byte aligned since the ending is padded with NUL bytes.

That padding has always been there since 32f3c541e3 via midx.c::write_midx_pack_names(), which ended with:

    i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT);
    if (i < MIDX_CHUNK_ALIGNMENT) {
        unsigned char padding[MIDX_CHUNK_ALIGNMENT];
        memset(padding, 0, sizeof(padding));
        hashwrite(f, padding, i);
        written += i;
    }

In fact, 32f3c541e3's log message itself describes the chunk in its first paragraph with:

    Since filenames are not well structured, add padding to keep good
    alignment in later chunks.

So these have always been externally aligned. Correct the corresponding part of our documentation to reflect that.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
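
The padding arithmetic quoted above can be tried in isolation. The following is a minimal sketch, not Git's code: `pnam_padding()` and `pnam_chunk_size()` are hypothetical helpers, and `MIDX_CHUNK_ALIGNMENT` is defined here as 4 on the assumption that it matches the 4-byte alignment the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed value, matching the 4-byte alignment described in the message. */
#define MIDX_CHUNK_ALIGNMENT 4

/*
 * Number of NUL bytes to append after `written` bytes so that the total
 * is a multiple of MIDX_CHUNK_ALIGNMENT (0 if it already is).
 */
static size_t pnam_padding(size_t written)
{
	size_t i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT);
	return i < MIDX_CHUNK_ALIGNMENT ? i : 0;
}

/* Total size of a chunk of NUL-terminated names, including end padding. */
static size_t pnam_chunk_size(const char *const *names, size_t nr)
{
	size_t written = 0;

	assert(names != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		written += strlen(names[i]) + 1; /* name plus its NUL */
	return written + pnam_padding(written);
}
```

Individual names may start at any offset, but the chunk as a whole always ends on a 4-byte boundary, which is the distinction the commit draws between internal and external alignment.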
2023-11-01 | Documentation/gitformat-pack.txt: fix typo | Taylor Blau

e0d1bcf825 (multi-pack-index: add format details, 2018-07-12) describes the MIDX's "PNAM" chunk as having entries which are "null-terminated strings".

This is a typo, as strings are terminated with a NUL character, which is a distinct concept from "NULL" or "null", which we typically reserve for the void pointer to address 0.

Correct the documentation accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09 | documentation: add some commas where they are helpful | Elijah Newren

Diff best viewed with --color-diff.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2023-10-09 | documentation: add missing article | Elijah Newren

Diff best viewed with --color-diff.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2023-10-09 | documentation: fix verb tense | Elijah Newren

Diff best viewed with --color-diff.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2023-10-09 | documentation: fix typos | Elijah Newren

Diff best viewed with --color-diff.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29 | Documentation/gitformat-pack.txt: drop mixed version section | Taylor Blau

This section was added in 3d89a8c118 (Documentation/technical: add cruft-packs.txt, 2022-05-20) to highlight a potential pitfall when deploying cruft packs in an environment where multiple versions of Git are GC-ing the same repository.

Now that it has been more than a year since 3d89a8c118 was written, let's drop this section as it is no longer relevant.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29 | Documentation/gitformat-pack.txt: remove multi-cruft packs alternative | Taylor Blau

This text, originally from 3d89a8c118 (Documentation/technical: add cruft-packs.txt, 2022-05-20), lists multiple cruft packs as a potential alternative to the design of cruft packs.

We have always supported multiple cruft packs (i.e. we use the most recent mtime for a given object among all cruft packs which contain it, etc.), but haven't encouraged their use.

We still aren't encouraging users to go out and generate multiple cruft packs, but let's take a step in that direction by dropping language that suggests we aren't capable of working with multiple cruft packs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-19 | builtin/gc.c: make `gc.cruftPacks` enabled by default | Taylor Blau

Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects via loose, 2022-05-20), `git gc` learned the `--cruft` option and `gc.cruftPacks` configuration to opt in to writing cruft packs when collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036da1a (Merge branch 'tb/cruft-packs', 2022-06-03). They address the problem of "loose object explosions", where Git will write out many individual loose objects when there is a large number of unreachable objects that have not yet aged past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via their loose object file's mtime, cruft packs collect all unreachable objects into a single pack with a corresponding `*.mtimes` file that acts as a table to store the mtimes of all unreachable objects. This prevents the need to store unreachable objects as loose as they age out of the repository, and avoids the problem of loose object explosions.

Beyond avoiding loose object explosions, cruft packs also act as a more efficient mechanism to store unreachable objects as they age out of a repository. This is because pairs of similar unreachable objects serve as delta bases for one another.

In 5b92477f89, the feature was introduced as experimental. Since then, GitHub has been running these patches in every repository, generating hundreds of millions of cruft packs along the way. The feature is battle-tested, and avoids many pathological cases such as above. Users who run `git gc` either manually or via `git maintenance` can benefit from having cruft packs.

As such, enable cruft pack generation to take place by default (by making `gc.cruftPacks` have the default of "true" rather than "false").

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-05 | docs: move cruft pack docs to gitformat-pack | Ævar Arnfjörð Bjarmason

Integrate the cruft packs documentation initially added in 3d89a8c1180 (Documentation/technical: add cruft-packs.txt, 2022-05-20) to the newly created "gitformat-pack" documentation.

Like the "bitmap-format" added before it in 0d4455a3ab0 (documentation: add documentation for the bitmap format, 2013-11-14), the "cruft-packs" were documented in their own file.

As the diff move detection will show, there is no change to "Documentation/technical/cruft-packs.txt" here except to move it, and to "indent" the existing sections by adding an extra "=" to them.

We could similarly convert the "bitmap-format.txt", but let's leave it for now due to a conflict with the in-flight ac/bitmap-lookup-table series.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-05 | docs: move pack format docs to man section 5 | Ævar Arnfjörð Bjarmason

Continue the move of existing Documentation/technical/* protocol and file-format documentation into our main documentation space by moving the various documentation pertaining to the *.pack format and related files, and updating things that refer to it to link to the new location.

By moving these we can properly link from the newly created gitformat-commit-graph to a gitformat-chunk-format page.

Integrating "Documentation/technical/bitmap-format.txt" and "Documentation/technical/cruft-packs.txt" might logically be part of this change, but as those cover parts of the wider "pack format" (including associated files) that's documented outside of "Documentation/technical/pack-format.txt", let's leave those for now; subsequent commit(s) will address those.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>