Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
path: root/path.c
AgeCommit message (Collapse)Author
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-31Merge branch 'ps/undecided-is-not-necessarily-sha1'Junio C Hamano
Before discovering the repository details, We used to assume SHA-1 as the "default" hash function, which has been corrected. Hopefully this will smoke out codepaths that rely on such an unwarranted assumptions. * ps/undecided-is-not-necessarily-sha1: repository: stop setting SHA1 as the default object hash oss-fuzz/commit-graph: set up hash algorithm builtin/shortlog: don't set up revisions without repo builtin/diff: explicitly set hash algo when there is no repo builtin/bundle: abort "verify" early when there is no repository builtin/blame: don't access potentially unitialized `the_hash_algo` builtin/rev-parse: allow shortening to more than 40 hex characters remote-curl: fix parsing of detached SHA256 heads attr: fix BUG() when parsing attrs outside of repo attr: don't recompute default attribute source parse-options-cb: only abbreviate hashes when hash algo is known path: move `validate_headref()` to its only user path: harden validation of HEAD with non-standard hashes
2024-05-07path: move `validate_headref()` to its only userPatrick Steinhardt
While `validate_headref()` is only called from `is_git_directory()` in "setup.c", it is currently implemented in "path.c". Move it over such that it becomes clear that it is only really used during setup in order to discover repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07path: harden validation of HEAD with non-standard hashesPatrick Steinhardt
The `validate_headref()` function takes a path to a supposed "HEAD" file and checks whether its format is something that we understand. It is used as part of our repository discovery to check whether a specific directory is a Git directory or not. Part of the validation is a check for a detached HEAD that contains a plain object ID. To do this validation we use `get_oid_hex()`, which relies on `the_hash_algo`. At this point in time the hash algo cannot yet be initialized though because we didn't yet read the Git config. Consequently, it will always be the SHA1 hash algorithm. In practice this works alright because `get_oid_hex()` only ends up checking whether the prefix of the buffer is a valid object ID. And because SHA1 is shorter than SHA256, the function will successfully parse SHA256 object IDs, as well. It is somewhat fragile though and not really the intent to only check for SHA1. With this in mind, harden the code to use `get_oid_hex_any()` to check whether the "HEAD" file parses as any known hash. One might be hard pressed to tighten the check even further and fully validate the file contents, not only the prefix. In practice though that wouldn't make a lot of sense as it could be that the repository uses a hash function that produces longer hashes than SHA256, but which the current version of Git doesn't understand yet. We'd still want to detect the repository as proper Git repository in that case, and we will fail eventually with a proper error message that the hash isn't understood when trying to set up the repository format. It follows that we could just leave the current code intact, as in practice the code change doesn't have any user visible impact. But it also prepares us for `the_hash_algo` being unset when there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-29Sync with 2.44.1Johannes Schindelin
* maint-2.44: (41 commits) Git 2.44.1 Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel ...
2024-04-19Sync with 2.43.4Johannes Schindelin
* maint-2.43: (40 commits) Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories ...
2024-04-19Sync with 2.41.1Johannes Schindelin
* maint-2.41: (38 commits) Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs ...
2024-04-19Sync with 2.40.2Johannes Schindelin
* maint-2.40: (39 commits) Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default ...
2024-04-17fetch/clone: detect dubious ownership of local repositoriesJohannes Schindelin
When cloning from somebody else's repositories, it is possible that, say, the `upload-pack` command is overridden in the repository that is about to be cloned, which would then be run in the user's context who started the clone. To remind the user that this is a potentially unsafe operation, let's extend the ownership checks we have already established for regular gitdir discovery to extend also to local repositories that are about to be cloned. This protection extends also to file:// URLs. The fixes in this commit address CVE-2024-32004. Note: This commit does not touch the `fetch`/`clone` code directly, but instead the function used implicitly by both: `enter_repo()`. This function is also used by `git receive-pack` (i.e. pushes), by `git upload-archive`, by `git daemon` and by `git http-backend`. In setups that want to serve repositories owned by different users than the account running the service, this will require `safe.*` settings to be configured accordingly. Also note: there are tiny time windows where a time-of-check-time-of-use ("TOCTOU") race is possible. The real solution to those would be to work with `fstat()` and `openat()`. However, the latter function is not available on Windows (and would have to be emulated with rather expensive low-level `NtCreateFile()` calls), and the changes would be quite extensive, for my taste too extensive for the little gain given that embargoed releases need to pay extra attention to avoid introducing inadvertent bugs. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-16Merge branch 'rs/apply-lift-path-length-limit'Junio C Hamano
"git apply" has been updated to lift the hardcoded pathname length limit, which in turn allowed a mksnpath() function that is no longer used. * rs/apply-lift-path-length-limit: path: remove mksnpath() apply: avoid fixed-size buffer in create_one_file()
2024-04-05path: remove mksnpath()René Scharfe
Remove the function mksnpath(), which has become unused. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07refs: introduce reftable backendPatrick Steinhardt
Due to scalability issues, Shawn Pearce has originally proposed a new "reftable" format more than six years ago [1]. Initially, this new format was implemented in JGit with promising results. Around two years ago, we have then added the "reftable" library to the Git codebase via a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have landed all the low-level code to read and write reftables. Notably missing though was the integration of this low-level code into the Git code base in the form of a new ref backend that ties all of this together. This gap is now finally closed by introducing a new "reftable" backend into the Git codebase. This new backend promises to bring some notable improvements to Git repositories: - It becomes possible to do truly atomic writes where either all refs are committed to disk or none are. This was not possible with the "files" backend because ref updates were split across multiple loose files. - The disk space required to store many refs is reduced, both compared to loose refs and packed-refs. This is enabled both by the reftable format being a binary format, which is more compact, and by prefix compression. - We can ignore filesystem-specific behaviour as ref names are not encoded via paths anymore. This means there is no need to handle case sensitivity on Windows systems or Unicode precomposition on macOS. - There is no need to rewrite the complete refdb anymore every time a ref is being deleted like it was the case for packed-refs. This means that ref deletions are now constant time instead of scaling linearly with the number of refs. - We can ignore file/directory conflicts so that it becomes possible to store both "refs/heads/foo" and "refs/heads/foo/bar". - Due to this property we can retain reflogs for deleted refs. We have previously been deleting reflogs together with their refs to avoid file/directory conflicts, which is not necessary anymore. - We can properly enumerate all refs. With the "files" backend it is not easily possible to distinguish between refs and non-refs because they may live side by side in the gitdir. Not all of these improvements are realized with the current "reftable" backend implementation. At this point, the new backend is supposed to be a drop-in replacement for the "files" backend that is used by basically all Git repositories nowadays. It strives for 1:1 compatibility, which means that a user can expect the same behaviour regardless of whether they use the "reftable" backend or the "files" backend for most of the part. Most notably, this means we artificially limit the capabilities of the "reftable" backend to match the limits of the "files" backend. It is not possible to create refs that would end up with file/directory conflicts, we do not retain reflogs, we perform stricter-than-necessary checks. This is done intentionally due to two main reasons: - It makes it significantly easier to land the "reftable" backend as tests behave the same. It would be tough to argue for each and every single test that doesn't pass with the "reftable" backend. - It ensures compatibility between repositories that use the "files" backend and repositories that use the "reftable" backend. Like this, hosters can migrate their repositories to use the "reftable" backend without causing issues for clients that use the "files" backend in their clones. It is expected that these artificial limitations may eventually go away in the long term. Performance-wise things very much depend on the actual workload. The following benchmarks compare the "files" and "reftable" backends in the current version: - Creating N refs in separate transactions shows that the "files" backend is ~50% faster. This is not surprising given that creating a ref only requires us to create a single loose ref. The "reftable" backend will also perform auto compaction on updates. In real-world workloads we would likely also want to perform pack loose refs, which would likely change the picture. Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1) Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms] Range (min … max): 1.8 ms … 4.3 ms 133 runs Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1) Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms] Range (min … max): 2.4 ms … 2.9 ms 132 runs Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000) Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s] Range (min … max): 1.969 s … 1.980 s 3 runs Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000) Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s] Range (min … max): 2.597 s … 2.622 s 3 runs Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000) Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s] Range (min … max): 198.189 s … 198.670 s 3 runs Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000) Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s] Range (min … max): 290.223 s … 298.761 s 3 runs - Creating N refs in a single transaction shows that the "files" backend is significantly slower once we start to write many refs. The "reftable" backend only needs to update two files, whereas the "files" backend needs to write one file per ref. Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms] Range (min … max): 1.8 ms … 2.6 ms 151 runs Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1) Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms] Range (min … max): 2.4 ms … 3.4 ms 148 runs Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000) Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms] Range (min … max): 148.5 ms … 167.8 ms 15 runs Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000) Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms] Range (min … max): 56.3 ms … 72.9 ms 40 runs Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000) Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s] Range (min … max): 152.165 s … 153.542 s 3 runs Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000) Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s] Range (min … max): 51.769 s … 52.012 s 3 runs - Deleting a ref in a fully-packed repository shows that the "files" backend scales with the number of refs. The "reftable" backend has constant-time deletions. Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1) Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms] Range (min … max): 1.6 ms … 2.1 ms 316 runs Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms] Range (min … max): 1.7 ms … 2.1 ms 294 runs Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000) Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms] Range (min … max): 1.9 ms … 2.5 ms 287 runs Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms] Range (min … max): 1.8 ms … 2.1 ms 217 runs Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000) Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms] Range (min … max): 224.6 ms … 245.2 ms 6 runs Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms] Range (min … max): 2.0 ms … 2.1 ms 3 runs - Listing all refs shows no significant advantage for either of the backends. The "files" backend is a bit faster, but not by a significant margin. When repositories are not packed the "reftable" backend outperforms the "files" backend because the "reftable" backend performs auto-compaction. Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1729 runs Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.8 ms 1816 runs Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true) Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms] Range (min … max): 4.1 ms … 4.6 ms 645 runs Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms] Range (min … max): 4.2 ms … 5.9 ms 643 runs Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true) Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s] Range (min … max): 2.511 s … 2.627 s 10 runs Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true) Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s] Range (min … max): 2.692 s … 2.752 s 10 runs Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.9 ms 1834 runs Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.0 ms 1840 runs Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false) Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms] Range (min … max): 13.3 ms … 14.5 ms 208 runs Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms] Range (min … max): 4.3 ms … 6.2 ms 624 runs Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false) Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s] Range (min … max): 11.965 s … 12.370 s 10 runs Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false) Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s] Range (min … max): 2.769 s … 2.836 s 10 runs - Printing a single ref shows no real difference between the "files" and "reftable" backends. Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms] Range (min … max): 1.4 ms … 1.8 ms 1779 runs Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.5 ms 1753 runs Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 1.9 ms 1840 runs Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1831 runs Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1848 runs Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1762 runs So overall, performance depends on the usecases. Except for many sequential writes the "reftable" backend is roughly on par or significantly faster than the "files" backend though. Given that the "files" backend has received 18 years of optimizations by now this can be seen as a win. Furthermore, we can expect that the "reftable" backend will grow faster over time when attention turns more towards optimizations. The complete test suite passes, except for those tests explicitly marked to require the REFFILES prerequisite. Some tests in t0610 are marked as failing because they depend on still-in-flight bug fixes. Tests can be run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT environment variable to "reftable". There is a single known conceptual incompatibility with the dumb HTTP transport. As "info/refs" SHOULD NOT contain the HEAD reference, and because the "HEAD" file is not valid anymore, it is impossible for the remote client to figure out the default branch without changing the protocol. This shortcoming needs to be handled in a subsequent patch series. As the reftable library has already been introduced a while ago, this commit message will not go into the details of how exactly the on-disk format works. Please refer to our preexisting technical documentation at Documentation/technical/reftable for this. [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/ Original-idea-by: Shawn Pearce <spearce@spearce.org> Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-19refs: convert MERGE_AUTOSTASH to become a normal pseudo-refPatrick Steinhardt
Similar to the preceding conversion of the AUTO_MERGE pseudo-ref, let's convert the MERGE_AUTOSTASH ref to become a normal pseudo-ref as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-19refs: convert AUTO_MERGE to become a normal pseudo-refPatrick Steinhardt
In 70c70de616 (refs: complete list of special refs, 2023-12-14) we have inrtoduced a new `is_special_ref()` function that classifies some refs as being special. The rule is that special refs are exclusively read and written via the filesystem directly, whereas normal refs exclucsively go via the refs API. The intent of that commit was to record the status quo so that we know to route reads of such special refs consistently. Eventually, the list should be reduced to its bare minimum of refs which really are special, namely FETCH_HEAD and MERGE_HEAD. Follow up on this promise and convert the AUTO_MERGE ref to become a normal pseudo-ref by using the refs API to both read and write it instead of accessing the filesystem directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-06Merge branch 'cw/strbuf-cleanup'Junio C Hamano
Move functions that are not about pure string manipulation out of strbuf.[ch] * cw/strbuf-cleanup: strbuf: remove global variable path: move related function to path object-name: move related functions to object-name credential-store: move related functions to credential-store file abspath: move related functions to abspath strbuf: clarify dependency strbuf: clarify API boundary
2023-07-05treewide: remove unnecessary includes for wrapper.hCalvin Wan
Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12path: move related function to pathCalvin Wan
Move path-related function from strbuf.[ch] to path.[ch] so that strbuf is focused on string manipulation routines with minimal dependencies. repository.h is no longer a necessary dependency after moving this function out. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove double forward declaration of read_in_fullElijah Newren
cache.h's nature of a dumping ground of includes prevented it from being included in some compat/ files, forcing us into a workaround of having a double forward declaration of the read_in_full() function (see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to fix a warning", 2007-11-17)). Now that we have moved functions like read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered with unrelated and scary #defines, get rid of the extra forward declaration and just have compat/pread.c include wrapper.h. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove cache.h inclusion due to setup.h changesElijah Newren
By moving several declarations to setup.h, the previous patch made it possible to remove the include of cache.h in several source files. Do so. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21abspath.h: move absolute path functions from cache.hElijah Newren
This is another step towards letting us remove the include of cache.h in strbuf.c. It does mean that we also need to add includes of abspath.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-18Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24mark "pointless" data pointers in callbacksJeff King
Both the object_array_filter() and trie_find() functions use callback functions that let the caller specify which elements match. These callbacks take a void pointer in case the caller wants to pass in extra data. But in each case, the single user of these functions just passes NULL, and the callback ignores the extra pointer. We could just remove these unused parameters from the callback interface entirely. But it's good practice to provide such a pointer, as it guides future callers of the function in the right direction (rather than tempting them to access global data). Plus it's consistent with other generic callback interfaces. So let's instead annotate the unused parameters, in order to silence the compiler's -Wunused-parameter warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-29adjust_shared_perm(): leave g+s alone when the group does not matterJunio C Hamano
Julien Moutinho reports that in an environment where directory does not have BSD group semantics and requires the g+s to be set (aka FORCE_DIR_SET_GID), but the system forbids chmod() to touch the g+s bit, adjust_shared_perm() fails even when the repository is for private use with perm = 0600, because we unconditionally try to set the g+s bit. When we grant extra access based on group membership (i.e. the directory has either g+r or g+w bit set), which group the directory and its contents are owned by matters. But otherwise (e.g. perm is set to 0600, in Julien's case), flipping g+s bit is not necessary. Reported-by: Julien Moutinho <julm+git@sourcephile.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-04Merge branch 'ds/bundle-uri'Junio C Hamano
Preliminary code refactoring around transport and bundle code. * ds/bundle-uri: bundle.h: make "fd" version of read_bundle_header() public remote: allow relative_url() to return an absolute url remote: move relative_url() http: make http_get_file() external fetch-pack: move --keep=* option filling to a function fetch-pack: add a deref_without_lazy_fetch_extended() dir API: add a generalized path_match_flags() function connect.c: refactor sending of agent & object-format
2022-05-17dir API: add a generalized path_match_flags() functionÆvar Arnfjörð Bjarmason
Add a path_match_flags() function and have the two sets of starts_with_dot_{,dot_}slash() functions added in 63e95beb085 (submodule: port resolve_relative_url from shell to C, 2016-04-15) and a2b26ffb1a8 (fsck: convert gitmodules url to URL passed to curl, 2020-04-18) be thin wrappers for it. As the latter of those notes the fsck version was copied from the initial builtin/submodule--helper.c version. Since the code added in a2b26ffb1a8 was doing really doing the same as win32_is_dir_sep() added in 1cadad6f658 (git clone <url> C:\cygwin\home\USER\repo' is working (again), 2018-12-15) let's move the latter to git-compat-util.h is a is_xplatform_dir_sep(). We can then call either it or the platform-specific is_dir_sep() from this new function. Let's likewise change code in various other places that was hardcoding checks for "'/' || '\\'" with the new is_xplatform_dir_sep(). As can be seen in those callers some of them still concern themselves with ':' (Mac OS classic?), but let's leave the question of whether that should be consolidated for some other time. As we expect to make wider use of the "native" case in the future, define and use two starts_with_dot_{,dot_}slash_native() convenience wrappers. This makes the diff in builtin/submodule--helper.c much smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-24Sync with 2.33.2Johannes Schindelin
* maint-2.33: Git 2.33.2 Git 2.32.1 Git 2.31.2 GIT-VERSION-GEN: bump to v2.33.1 Git 2.30.3 setup_git_directory(): add an owner check for the top-level directory Add a function to determine whether a path is owned by the current user
2022-03-24Sync with 2.31.2Johannes Schindelin
* maint-2.31: Git 2.31.2 Git 2.30.3 setup_git_directory(): add an owner check for the top-level directory Add a function to determine whether a path is owned by the current user
2022-03-24Fix `GIT_CEILING_DIRECTORIES` with `C:\` and the likesJohannes Schindelin
When determining the length of the longest ancestor of a given path with respect to to e.g. `GIT_CEILING_DIRECTORIES`, we special-case the root directory by returning 0 (i.e. we pretend that the path `/` does not end in a slash by virtually stripping it). That is the correct behavior because when normalizing paths, the root directory is special: all other directory paths have their trailing slash stripped, but not the root directory's path (because it would become the empty string, which is not a legal path). However, this special-casing of the root directory in `longest_ancestor_length()` completely forgets about Windows-style root directories, e.g. `C:\`. These _also_ get normalized with a trailing slash (because `C:` would actually refer to the current directory on that drive, not necessarily to its root directory). In fc56c7b34b (mingw: accomodate t0060-path-utils for MSYS2, 2016-01-27), we almost got it right. We noticed that `longest_ancestor_length()` expects a slash _after_ the matched prefix, and if the prefix already ends in a slash, the normalized path won't ever match and -1 is returned. But then that commit went astray: The correct fix is not to adjust the _tests_ to expect an incorrect -1 when that function is fed a prefix that ends in a slash, but instead to treat such a prefix as if the trailing slash had been removed. Likewise, that function needs to handle the case where it is fed a path that ends in a slash (not only a prefix that ends in a slash): if it matches the prefix (plus trailing slash), we still need to verify that the path does not end there, otherwise the prefix is not actually an ancestor of the path but identical to it (and we need to return -1 in that case). With these two adjustments, we no longer need to play games in t0060 where we only add `$rootoff` if the passed prefix is different from the MSYS2 pseudo root, instead we also add it for the MSYS2 pseudo root itself. We do have to be careful to skip that logic entirely for Windows paths, though, because they do are not subject to that MSYS2 pseudo root treatment. This patch fixes the scenario where a user has set `GIT_CEILING_DIRECTORIES=C:\`, which would be ignored otherwise. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2021-09-21Merge branch 'lh/systemd-timers'Junio C Hamano
"git maintenance" scheduler learned to use systemd timers as a possible backend. * lh/systemd-timers: maintenance: add support for systemd timers on Linux maintenance: `git maintenance run` learned `--scheduler=<scheduler>` cache.h: Introduce a generic "xdg_config_home_for(…)" function
2021-09-07cache.h: Introduce a generic "xdg_config_home_for(…)" functionLénaïc Huard
Current implementation of `xdg_config_home(filename)` returns `$XDG_CONFIG_HOME/git/$filename`, with the `git` subdirectory inserted between the `XDG_CONFIG_HOME` environment variable and the parameter. This patch introduces a `xdg_config_home_for(subdir, filename)` function which is more generic. It only concatenates "$XDG_CONFIG_HOME", or "$HOME/.config" if the former isn’t defined, with the parameters, without adding `git` in between. `xdg_config_home(filename)` is now implemented by calling `xdg_config_home_for("git", filename)` but this new generic function can be used to compute the configuration directory of other programs. Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26interpolate_path(): allow specifying paths relative to the runtime prefixJohannes Schindelin
Ever since Git learned to detect its install location at runtime, there was the slightly awkward problem that it was impossible to specify paths relative to said location. For example, if a version of Git was shipped with custom SSL certificates to use, there was no portable way to specify `http.sslCAInfo`. In Git for Windows, the problem was "solved" for years by interpreting paths starting with a slash as relative to the runtime prefix. However, this is not correct: such paths _are_ legal on Windows, and they are interpreted as absolute paths in the same drive as the current directory. After a lengthy discussion, and an even lengthier time to mull over the problem and its best solution, and then more discussions, we eventually decided to introduce support for the magic sequence `%(prefix)/`. If a path starts with this, the remainder is interpreted as relative to the detected (runtime) prefix. If built without runtime prefix support, Git will simply interpolate the compiled-in prefix. If a user _wants_ to specify a path starting with the magic sequence, they can prefix the magic sequence with `./` and voilà, the path won't be expanded. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26Use a better name for the function interpolating pathsJohannes Schindelin
It is not immediately clear what `expand_user_path()` means, so let's rename it to `interpolate_path()`. This also opens the path for interpolating more than just a home directory. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26expand_user_path(): clarify the role of the `real_home` parameterJohannes Schindelin
The `real_home` parameter only has an effect when expanding paths starting with `~/`, not when expanding paths starting with `~<user>/`. Let's make that clear. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26expand_user_path(): remove stale part of the commentJohannes Schindelin
In 395de250d9d (Expand ~ and ~user in core.excludesfile, commit.template, 2009-11-17), the `user_path()` function was refactored into the `expand_user_path()`. During that refactoring, the `buf` parameter was lost, but the code comment above said function still talks about it. Let's remove that stale part of the comment. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04t0060: test ntfs/hfs-obscured dotfilesJeff King
We have tests that cover various filesystem-specific spellings of ".gitmodules", because we need to reliably identify that path for some security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules: add tests, 2018-05-12), with the actual code coming from e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02). Those latter two commits also added similar matching functions for .gitattributes and .gitignore. These ended up not being used in the final series, and are currently dead code. But in preparation for them being used in some fsck checks, let's make sure they actually work by throwing a few basic tests at them. Likewise, let's cover .mailmap (which does need matching code added). I didn't bother with the whole battery of tests that we cover for .gitmodules. These functions are all based on the same generic matcher, so it's sufficient to test most of the corner cases just once. Note that the ntfs magic prefix names in the tests come from the algorithm described in e7cb0b4455 (and are different for each file). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-20merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflictElijah Newren
There are a variety of questions users might ask while resolving conflicts: * What changes have been made since the previous (first) parent? * What changes are staged? * What is still unstaged? (or what is still conflicted?) * What changes did I make to resolve conflicts so far? The first three of these have simple answers: * git diff HEAD * git diff --cached * git diff There was no way to answer the final question previously. Adding one is trivial in merge-ort, since it works by creating a tree representing what should be written to the working copy complete with conflict markers. Simply write that tree to .git/AUTO_MERGE, allowing users to answer the fourth question with * git diff AUTO_MERGE I avoided using a name like "MERGE_AUTO", because that would be merge-specific (much like MERGE_HEAD, REBASE_HEAD, REVERT_HEAD, CHERRY_PICK_HEAD) and I wanted a name that didn't change depending on which type of operation the merge was part of. Ensure that paths which clean out other temporary operation-specific files (e.g. CHERRY_PICK_HEAD, MERGE_MSG, rebase-merge/ state directory) also clean out this AUTO_MERGE file. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21sequencer: treat REVERT_HEAD as a pseudo refHan-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21sequencer: treat CHERRY_PICK_HEAD as a pseudo refHan-Wen Nienhuys
Check for existence and delete CHERRY_PICK_HEAD through ref functions. This will help cherry-pick work with alternate ref storage backends. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30Merge branch 'dl/merge-autostash'Junio C Hamano
"git merge" learns the "--autostash" option. * dl/merge-autostash: (22 commits) pull: pass --autostash to merge t5520: make test_pull_autostash() accept expect_parent_num merge: teach --autostash option sequencer: implement apply_autostash_oid() sequencer: implement save_autostash() sequencer: unlink autostash in apply_autostash() sequencer: extract perform_autostash() from rebase rebase: generify create_autostash() rebase: extract create_autostash() reset: extract reset_head() from rebase rebase: generify reset_head() rebase: use apply_autostash() from sequencer.c sequencer: rename stash_sha1 to stash_oid sequencer: make apply_autostash() accept a path rebase: use read_oneliner() sequencer: make read_oneliner() extern sequencer: configurably warn on non-existent files sequencer: make read_oneliner() accept flags sequencer: make file exists check more efficient sequencer: stop leaking buf ...
2020-04-10merge: teach --autostash optionDenton Liu
In rebase, one can pass the `--autostash` option to cause the worktree to be automatically stashed before continuing with the rebase. This option is missing in merge, however. Implement the `--autostash` option and corresponding `merge.autoStash` option in merge which stashes before merging and then pops after. This option is useful when a developer has some local changes on a topic branch but they realize that their work depends on another branch. Previously, they had to run something like git fetch ... git stash push git merge FETCH_HEAD git stash pop but now, that is reduced to git fetch ... git merge --autostash FETCH_HEAD When an autostash is generated, it is automatically reapplied to the worktree only in three explicit situations: 1. An incomplete merge is commit using `git commit`. 2. A merge completes successfully. 3. A merge is aborted using `git merge --abort`. In all other situations where the merge state is removed using remove_merge_branch_state() such as aborting a merge via `git reset --hard`, the autostash is saved into the stash reflog instead keeping the worktree clean. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Suggested-by: Alban Gruin <alban.gruin@gmail.com> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27Merge branch 'bc/sha-256-part-1-of-4'Junio C Hamano
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-10real_path: remove unsafe APIAlexandr Miloslavskiy
Returning a shared buffer invites very subtle bugs due to reentrancy or multi-threading, as demonstrated by the previous patch. There was an unfinished effort to abolish this [1]. Let's finally rid of `real_path()`, using `strbuf_realpath()` instead. This patch uses a local `strbuf` for most places where `real_path()` was previously called. However, two places return the value of `real_path()` to the caller. For them, a `static` local `strbuf` was added, effectively pushing the problem one level higher: read_gitfile_gently() get_superproject_working_tree() [1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/ Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-07set_git_dir: fix crash when used with real_path()Alexandr Miloslavskiy
`real_path()` returns result from a shared buffer, inviting subtle reentrance bugs. One of these bugs occur when invoked this way: set_git_dir(real_path(git_dir)) In this case, `real_path()` has reentrance: real_path read_gitfile_gently repo_set_gitdir setup_git_env set_git_dir_1 set_git_dir Later, `set_git_dir()` uses its now-dead parameter: !is_absolute_path(path) Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`. Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>