Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-05-10Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-05-02Merge branch 'tb/ban-strtok'Junio C Hamano
Mark strtok() and strtok_r() to be banned. * tb/ban-strtok: banned.h: mark `strtok()` and `strtok_r()` as banned t/helper/test-json-writer.c: avoid using `strtok()` t/helper/test-oidmap.c: avoid using `strtok()` t/helper/test-hashmap.c: avoid using `strtok()` string-list: introduce `string_list_setlen()` string-list: multi-delimiter `string_list_split_in_place()`
2023-04-29Merge branch 'tb/enable-cruft-packs-by-default'Junio C Hamano
When "gc" needs to retain unreachable objects, packing them into cruft packs (instead of exploding them into loose object files) has been offered as a more efficient option for some time. Now the use of cruft packs has been made the default and no longer considered an experimental feature. * tb/enable-cruft-packs-by-default: repository.h: drop unused `gc_cruft_packs` builtin/gc.c: make `gc.cruftPacks` enabled by default t/t9300-fast-import.sh: prepare for `gc --cruft` by default t/t6500-gc.sh: add additional test cases t/t6500-gc.sh: refactor cruft pack tests t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default t/t5304-prune.sh: prepare for `gc --cruft` by default builtin/gc.c: ignore cruft packs with `--keep-largest-pack` builtin/repack.c: fix incorrect reference to '-C' pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-28Merge branch 'ds/fsck-pack-revindex'Junio C Hamano
"git fsck" learned to validate the on-disk pack reverse index files. * ds/fsck-pack-revindex: fsck: validate .rev file header fsck: check rev-index position values fsck: check rev-index checksums fsck: create scaffolding for rev-index checks
2023-04-28Merge branch 'tb/pack-revindex-on-disk'Junio C Hamano
The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-25Merge branch 'ps/fix-geom-repack-with-alternates'Junio C Hamano
Geometric repacking ("git repack --geometric=<n>") in a repository that borrows from an alternate object database had various corner case bugs, which have been corrected. * ps/fix-geom-repack-with-alternates: repack: disable writing bitmaps when doing a local repack repack: honor `-l` when calculating pack geometry t/helper: allow chmtime to print verbosely without modifying mtime pack-objects: extend test coverage of `--stdin-packs` with alternates pack-objects: fix error when same packfile is included and excluded pack-objects: fix error when packing same pack twice pack-objects: split out `--stdin-packs` tests into separate file repack: fix generating multi-pack-index with only non-local packs repack: fix trying to use preferred pack in alternates midx: fix segfault with no packs and invalid preferred pack
2023-04-25Merge branch 'jk/protocol-cap-parse-fix'Junio C Hamano
The code to parse capability list for v0 on-wire protocol fell into an infinite loop when a capability appears multiple times, which has been corrected. * jk/protocol-cap-parse-fix: v0 protocol: use size_t for capability length/offset t5512: test "ls-remote --heads --symref" filtering with v0 and v2 t5512: allow any protocol version for filtered symref test t5512: add v2 support for "ls-remote --symref" test v0 protocol: fix sha1/sha256 confusion for capabilities^{} t5512: stop referring to "v1" protocol v0 protocol: fix infinite loop when parsing multi-valued capabilities
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-25string-list: multi-delimiter `string_list_split_in_place()`Taylor Blau
Enhance `string_list_split_in_place()` to accept multiple characters as delimiters instead of a single character. Instead of using `strchr(2)` to locate the first occurrence of the given delimiter character, `string_list_split_in_place_multi()` uses `strcspn(2)` to move past the initial segment of characters comprised of any characters in the delimiting set. When only a single delimiting character is provided, `strpbrk(2)` (which is implemented with `strcspn(2)`) has equivalent performance to `strchr(2)`. Modern `strcspn(2)` implementations treat an empty delimiter or the singleton delimiter as a special case and fall back to calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)` this way. This change is one step to removing `strtok(2)` from the tree. Note that `string_list_split_in_place()` is not a strict replacement for `strtok()`, since it will happily turn sequential delimiter characters into empty entries in the resulting string_list. For example: string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1) would yield a string list of: ["foo", "", "", "bar", "", "", "baz"] Callers that wish to emulate the behavior of strtok(2) more directly should call `string_list_remove_empty_items()` after splitting. To avoid regressions for the new multi-character delimter cases, update t0063 in this patch as well. [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35 [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11 Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24commit.h: reduce unnecessary includesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24hash-ll.h: split out of hash.h to remove dependency on repository.hElijah Newren
hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24pkt-line.h: move declarations for pkt-line.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24copy.h: move declarations for copy.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24server-info.h: move declarations for server-info.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24symlinks.h: move declarations for symlinks.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-22Merge branch 'ow/ref-filter-omit-empty'Junio C Hamano
"git branch --format=..." and "git format-patch --format=..." learns "--omit-empty" to hide refs that whose formatting result becomes an empty string from the output. * ow/ref-filter-omit-empty: branch, for-each-ref, tag: add option to omit empty lines
2023-04-22Merge branch 'rn/sparse-describe'Junio C Hamano
"git describe --dirty" learns to work better with sparse-index. * rn/sparse-describe: describe: enable sparse index for describe
2023-04-21Merge branch 'gc/better-error-when-local-clone-fails-with-symlink'Junio C Hamano
"git clone --local" stops copying from an original repository that has symbolic links inside its $GIT_DIR; an error message when that happens has been updated. * gc/better-error-when-local-clone-fails-with-symlink: clone: error specifically with --local and symlinked objects
2023-04-21Merge branch 'rs/get-tar-commit-id-use-defined-const'Junio C Hamano
Code clean-up to replace a hardcoded constant with a CPP macro. * rs/get-tar-commit-id-use-defined-const: get-tar-commit-id: use TYPEFLAG_GLOBAL_HEADER instead of magic value
2023-04-19builtin/gc.c: make `gc.cruftPacks` enabled by defaultTaylor Blau
Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects via loose, 2022-05-20), `git gc` learned the `--cruft` option and `gc.cruftPacks` configuration to opt-in to writing cruft packs when collecting or pruning unreachable objects. Cruft packs were introduced with the merge in a50036da1a (Merge branch 'tb/cruft-packs', 2022-06-03). They address the problem of "loose object explosions", where Git will write out many individual loose objects when there is a large number of unreachable objects that have not yet aged past `--prune=<date>`. Instead of keeping track of those unreachable yet recent objects via their loose object file's mtime, cruft packs collect all unreachable objects into a single pack with a corresponding `*.mtimes` file that acts as a table to store the mtimes of all unreachable objects. This prevents the need to store unreachable objects as loose as they age out of the repository, and avoids the problem of loose object explosions. Beyond avoiding loose object explosions, cruft packs also act as a more efficient mechanism to store unreachable objects as they age out of a repository. This is because pairs of similar unreachable objects serve as delta bases for one another. In 5b92477f89, the feature was introduced as experimental. Since then, GitHub has been running these patches in every repository generating hundreds of millions of cruft packs along the way. The feature is battle-tested, and avoids many pathological cases such as above. Users who either run `git gc` manually, or via `git maintenance` can benefit from having cruft packs. As such, enable cruft pack generation to take place by default (by making `gc.cruftPacks` have the default of "true" rather than "false). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-19builtin/gc.c: ignore cruft packs with `--keep-largest-pack`Taylor Blau
When cruft packs were implemented, we never adjusted the code for `git gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft packs. This option and configuration option share a common implementation, but including cruft packs is wrong in both cases: - Running `git gc --keep-largest-pack` in a repository where the largest pack is the cruft pack itself will make it impossible for `git gc` to prune objects, since the cruft pack itself is kept. - The same is true for `gc.bigPackThreshold`, if the size of the cruft pack exceeds the limit set by the caller. In the future, it is possible that `gc.bigPackThreshold` could be used to write a separate cruft pack containing any new unreachable objects that entered the repository since the last time a cruft pack was written. There are some complexities to doing so, mainly around handling pruning objects that are in an existing cruft pack that is above the threshold (which would either need to be rewritten, or else delay pruning). Rewriting a substantially similar cruft pack isn't ideal, but it is significantly better than the status-quo. If users have large cruft packs that they don't want to rewrite, they can mark them as `*.keep` packs. But in general, if a repository has a cruft pack that is so large it is slowing down GC's, it should probably be pruned anyway. In the meantime, ignore cruft packs in the common implementation for both of these options, and add a pair of tests to prevent any future regressions here. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-19builtin/repack.c: fix incorrect reference to '-C'Taylor Blau
When cruft packs were originally being developed, `-C` was designated as the short-form for `--cruft` (as in `git repack -C`). This was dropped due to confusion with Git's top-level `-C` option before submitting to the list. But the reference to it in `--cruft-expiration`'s help text was never updated. Fix that dangling reference in this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18Merge branch 'pw/rebase-cleanup-merge-strategy-option-handling'Junio C Hamano
Clean-up of the code path that deals with merge strategy option handling in "git rebase". * pw/rebase-cleanup-merge-strategy-option-handling: rebase: remove a couple of redundant strategy tests rebase -m: fix serialization of strategy options rebase -m: cleanup --strategy-option handling sequencer: use struct strvec to store merge strategy options rebase: stop reading and writing unnecessary strategy state
2023-04-18Merge branch 'cm/branch-delete-error-message-update'Junio C Hamano
"git branch -d origin/master" would say "no such branch", but it is likely a missed "-r" if refs/remotes/origin/master exists. The command has been taught to give such a hint in its error message. * cm/branch-delete-error-message-update: branch: improve error log on branch not found by checking remotes refs
2023-04-18Merge branch 'tk/mergetool-gui-default-config'Junio C Hamano
"git mergetool" and "git difftool" learns a new configuration guiDefault to optionally favor configured guitool over non-gui-tool automatically when $DISPLAY is set. * tk/mergetool-gui-default-config: mergetool: new config guiDefault supports auto-toggling gui by DISPLAY
2023-04-18Merge branch 'sl/sparse-write-tree'Junio C Hamano
"git write-tree" learns to work better with sparse-index. * sl/sparse-write-tree: write-tree: integrate with sparse index
2023-04-18fsck: validate .rev file headerDerrick Stolee
While parsing a .rev file, we check the header information to be sure it makes sense. This happens before doing any additional validation such as a checksum or value check. In order to differentiate between a bad header and a non-existent file, we need to update the API for loading a reverse index. Make load_pack_revindex_from_disk() non-static and specify that a positive value means "the file does not exist" while other errors during parsing are negative values. Since an invalid header prevents setting up the structures we would use for further validations, we can stop at that point. The place where we can distinguish between a missing file and a corrupt file is inside load_revindex_from_disk(), which is used both by pack rev-indexes and multi-pack-index rev-indexes. Some tests in t5326 demonstrate that it is critical to take some conditions to allow positive error signals. Add tests that check the three header values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18fsck: create scaffolding for rev-index checksDerrick Stolee
The 'fsck' builtin checks many of Git's on-disk data structures, but does not currently validate the pack rev-index files (a .rev file to pair with a .pack and .idx file). Before doing a more-involved check process, create the scaffolding within builtin/fsck.c to have a new error type and add that error type when the API method verify_pack_revindex() returns an error. That method does nothing currently, but we will add checks to it in later changes. For now, check that 'git fsck' succeeds without any errors in the normal case. Future checks will be paired with tests that corrupt the .rev file appropriately. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-15v0 protocol: use size_t for capability length/offsetJeff King
When parsing server capabilities, we use "int" to store lengths and offsets. At first glance this seems like a spot where our parser may be confused by integer overflow if somebody sent us a malicious response. In practice these strings are all bounded by the 64k limit of a pkt-line, so using "int" is OK. However, it makes the code simpler to audit if they just use size_t everywhere. Note that because we take these parameters as pointers, this also forces many callers to update their declared types. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: disable writing bitmaps when doing a local repackPatrick Steinhardt
In order to write a bitmap, we need to have full coverage of all objects that are about to be packed. In the traditional non-multi-pack-index world this meant we need to do a full repack of all objects into a single packfile. But in the new multi-pack-index world we can get away with writing bitmaps when we have multiple packfiles as long as the multi-pack-index covers all objects. This is not always the case though. When asked to perform a repack of local objects, only, then we cannot guarantee to have full coverage of all objects regardless of whether we do a full repack or a repack with a multi-pack-index. The end result is that writing the bitmap will fail in both worlds: $ git multi-pack-index write --stdin-packs --bitmap <packfiles warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing) error: could not write multi-pack bitmap Now there are two different ways to fix this. The first one would be to amend git-multi-pack-index(1) to disable writing bitmaps when we notice that we don't have full object coverage. - We don't have enough information in git-multi-pack-index(1) in order to tell whether the local repository _should_ have full coverage. Because even when connected to an alternate object directory, it may be the case that we still have all objects around in the main object database. - git-multi-pack-index(1) is quite a low-level tool. Automatically disabling functionality that it was asked to provide does not feel like the right thing to do. We can easily fix it at a higher level in git-repack(1) though. When asked to only include local objects via `-l` and when connected to an alternate object directory then we will override the user's ask and disable writing bitmaps with a warning. This is similar to what we do in git-pack-objects(1), where we also disable writing bitmaps in case we omit an object from the pack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: honor `-l` when calculating pack geometryPatrick Steinhardt
When the user passes `-l` to git-repack(1), then they essentially ask us to only repack objects part of the local object database while ignoring any packfiles part of an alternate object database. And we in fact honor this bit when doing a geometric repack as the resulting packfile will only ever contain local objects. What we're missing though is that we don't take locality of packfiles into account when computing whether the geometric sequence is intact or not. So even though we would only ever roll up local packfiles anyway, we could end up trying to repack because of non-local packfiles. This does not make much sense, and in the worst case it can cause us to try and do the geometric repack over and over again because we're never able to restore the geometric sequence. Fix this bug by honoring whether the user has passed `-l`. If so, we skip adding any non-local packfiles to the pack geometry. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14pack-objects: fix error when same packfile is included and excludedPatrick Steinhardt
When passing the same packfile both as included and excluded via the `--stdin-packs` option, then we will return an error because the excluded packfile cannot be found. This is because we will only set the `util` pointer for the included packfile list if it was found, so that we later die when we notice that it's in fact not set for the excluded packfile list. Fix this bug by always setting the `util` pointer for both the included and excluded list entries. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14pack-objects: fix error when packing same pack twicePatrick Steinhardt
When passed the same packfile twice via `--stdin-packs` we return an error that the packfile supposedly was not found. This is because when reading packs into the list of included or excluded packfiles, we will happily re-add packfiles even if they are part of the lists already. And while the list can now contain duplicates, we will only set the `util` pointer of the first list entry to the `packed_git` structure. We notice that at a later point when checking that all list entries have their `util` pointer set and die with an error. While this is kind of a nonsensical request, this scenario can be hit when doing geometric repacks. When a repository is connected to an alternate object directory and both have the exact same packfile then both would get added to the geometric sequence. And when we then decide to perform the repack, we will invoke git-pack-objects(1) with the same packfile twice. Fix this bug by removing any duplicates from both the included and excluded packs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: fix generating multi-pack-index with only non-local packsPatrick Steinhardt
When writing the multi-pack-index with geometric repacking we will add all packfiles to the index that are part of the geometric sequence. This can potentially also include packfiles borrowed from an alternate object directory. But given that a multi-pack-index can only ever include packs that are part of the main object database this does not make much sense whatsoever. In the edge case where all packfiles are contained in the alternate object database and the local repository has none itself this bug can cause us to invoke git-multi-pack-index(1) with only non-local packfiles that it ultimately cannot find. This causes it to return an error and thus causes the geometric repack to fail. Fix the code to skip non-local packfiles. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: fix trying to use preferred pack in alternatesPatrick Steinhardt
When doing a geometric repack with multi-pack-indices, then we ask git-multi-pack-index(1) to use the largest packfile as the preferred pack. It can happen though that the largest packfile is not part of the main object database, but instead part of an alternate object database. The result is that git-multi-pack-index(1) will not be able to find the preferred pack and print a warning. It then falls back to use the first packfile that the multi-pack-index shall reference. Fix this bug by only considering packfiles as preferred pack that are local. This is the right thing to do given that a multi-pack-index should never reference packfiles borrowed from an alternate. While at it, rename the function `get_largest_active_packfile()` to `get_preferred_pack()` to better document its intent. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13branch, for-each-ref, tag: add option to omit empty linesØystein Walle
If the given format string expands to the empty string, a newline is still printed. This makes using the output linewise more tedious. For example, git update-ref --stdin does not accept empty lines. Add options to "git branch", "git for-each-ref", and "git tag" to not print these empty lines. The default behavior remains the same. Signed-off-by: Øystein Walle <oystwa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13t: invert `GIT_TEST_WRITE_REV_INDEX`Taylor Blau
Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we added a test knob to conditionally enable writing a ".rev" file when indexing a pack. At the time, this was used to ensure that the test suite worked even when ".rev" files were written, which served as a stress-test for the on-disk reverse index implementation. Now that reading from on-disk ".rev" files is enabled by default, the test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning. We could get rid of the option entirely, but there would be no convenient way to test Git when ".rev" files *aren't* in place. Instead of getting rid of the option, invert its meaning to instead disable writing ".rev" files, thereby running the test suite in a mode where the reverse index is generated from scratch. This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some spelling of "true", we are still running and exercising Git's behavior when forced to generate reverse indexes from scratch. Do so by setting it in the linux-TEST-vars CI run to ensure that we are maintaining good coverage of this now-legacy code. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13config: enable `pack.writeReverseIndex` by defaultTaylor Blau
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of in-memory. A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file). Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index index. They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular: - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core. - a ~60% reduction in the peak memory usage to serve fetches on that same repository. - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter. Reverse indexes are also beneficial to end-users as well as forges. For example, the time it takes to generate a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms] Range (min … max): 521.0 ms … 577.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms] Range (min … max): 226.0 ms … 259.6 ms 13 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' The same is true of writing a pack containing the objects for the 30 most-recent commits: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms] Range (min … max): 839.3 ms … 886.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms] Range (min … max): 567.5 ms … 599.3 ms 10 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms] Range (min … max): 290.3 ms … 331.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms] Range (min … max): 1.6 ms … 4.6 ms 1155 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' In the more than two years since e37d0b8730 was merged, Git's implementation of on-disk reverse indexes has been thoroughly tested, both from users enabling `pack.writeReverseIndexes`, and from GitHub's deployment of the feature. The latter has been running without incident for more than two years. This patch changes Git's behavior to write on-disk reverse indexes by default when indexing a pack, which should make the above operations faster for everybody's Git installation after a repack. (The previous commit explains some potential drawbacks of using on-disk reverse indexes in certain limited circumstances, that essentially boil down to a trade-off between time to generate, and time to access. For those limited cases, the `pack.readReverseIndex` escape hatch can be used). [1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11Merge branch 'jc/clone-object-format-from-void'Junio C Hamano
"git clone" from an empty repository learned to propagate the choice of the hash algorithm from the source repository to the newly created repository. * jc/clone-object-format-from-void: clone: propagate object-format when cloning from void
2023-04-11Merge branch 'ws/sparse-check-rules'Junio C Hamano
"git sparse-checkout" command learns a debugging aid for the sparse rule definitions. * ws/sparse-check-rules: builtin/sparse-checkout: add check-rules command builtin/sparse-checkout: remove NEED_WORK_TREE flag
2023-04-11treewide: remove double forward declaration of read_in_fullElijah Newren
cache.h's nature of a dumping ground of includes prevented it from being included in some compat/ files, forcing us into a workaround of having a double forward declaration of the read_in_full() function (see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to fix a warning", 2007-11-17)). Now that we have moved functions like read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered with unrelated and scary #defines, get rid of the extra forward declaration and just have compat/pread.c include wrapper.h. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11pager.h: move declarations for pager.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11editor: move editor-related functions and declarations into common fileElijah Newren
cache.h and strbuf.[ch] had editor-related functions. Move these into editor.[ch]. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11git-zlib: move declarations for git-zlib functions from cache.hElijah Newren
Move functions from cache.h for zlib.c into a new header file. Since adding a "zlib.h" would cause issues with the real zlib, rename zlib.c to git-zlib.c while we are at it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on mem-pool.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on oid-array.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on pack-revindex.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on convert.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>