git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2023-12-20	Merge branch 'jc/revision-parse-int'	Junio C Hamano
	The command line parser for the "log" family of commands was too loose when parsing certain numbers, e.g., silently ignoring the extra 'q' in "git log -n 1q" without complaining, which has been tightened up. * jc/revision-parse-int: revision: parse integer arguments to --max-count, --skip, etc., more carefully
2023-12-20	Merge branch 'ps/ref-tests-update-more'	Junio C Hamano
	Tests update. * ps/ref-tests-update-more: t6301: write invalid object ID via `test-tool ref-store` t5551: stop writing packed-refs directly t5401: speed up creation of many branches t4013: simplify magic parsing and drop "failure" t3310: stop checking for reference existence via `test -f` t1417: make `reflog --updateref` tests backend agnostic t1410: use test-tool to create empty reflog t1401: stop treating FETCH_HEAD as real reference t1400: split up generic reflog tests from the reffile-specific ones t0410: mark tests to require the reffiles backend
2023-12-20	Merge branch 'en/complete-sparse-checkout'	Junio C Hamano
	Command line completion (in contrib/) learned to complete path arguments to the "add/set" subcommands of "git sparse-checkout" better. * en/complete-sparse-checkout: completion: avoid user confusion in non-cone mode completion: avoid misleading completions in cone mode completion: fix logic for determining whether cone mode is active completion: squelch stray errors in sparse-checkout completion
2023-12-19	Merge branch 'jh/trace2-redact-auth'	Junio C Hamano
	trace2 streams used to record the URLs that potentially embed authentication material, which has been corrected. * jh/trace2-redact-auth: t0212: test URL redacting in EVENT format t0211: test URL redacting in PERF format trace2: redact passwords from https:// URLs by default trace2: fix signature of trace2_def_param() macro
2023-12-19	Merge branch 'ad/merge-file-diff-algo'	Junio C Hamano
	"git merge-file" learned to take the "--diff-algorithm" option to use algorithm different from the default "myers" diff. * ad/merge-file-diff-algo: merge-file: add --diff-algorithm option
2023-12-19	Merge branch 'rs/column-leakfix'	Junio C Hamano
	Leakfix. * rs/column-leakfix: column: release strbuf and string_list after use
2023-12-19	Merge branch 'rs/i18n-cannot-be-used-together'	Junio C Hamano
	Clean-up code that handles combinations of incompatible options. * rs/i18n-cannot-be-used-together: i18n: factorize even more 'incompatible options' messages
2023-12-19	Merge branch 'js/update-urls-in-doc-and-comment'	Junio C Hamano
	Stale URLs have been updated to their current counterparts (or archive.org) and HTTP links are replaced with working HTTPS links. * js/update-urls-in-doc-and-comment: doc: refer to internet archive doc: update links for andre-simon.de doc: switch links to https doc: update links to current pages
2023-12-19	Merge branch 'ps/commit-graph-less-paranoid'	Junio C Hamano
	Earlier we stopped relying on commit-graph that (still) records information about commits that are lost from the object store, which has negative performance implications. The default has been flipped to disable this pessimization. * ps/commit-graph-less-paranoid: commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
2023-12-19	Merge branch 'cc/git-replay'	Junio C Hamano
	Introduce "git replay", a tool meant on the server side without working tree to recreate a history. * cc/git-replay: replay: stop assuming replayed branches do not diverge replay: add --contained to rebase contained branches replay: add --advance or 'cherry-pick' mode replay: use standard revision ranges replay: make it a minimal server side command replay: remove HEAD related sanity check replay: remove progress and info output replay: add an important FIXME comment about gpg signing replay: change rev walking options replay: introduce pick_regular_commit() replay: die() instead of failing assert() replay: start using parse_options API replay: introduce new builtin t6429: remove switching aspects of fast-rebase
2023-12-19	Merge branch 'ps/ref-deletion-updates'	Junio C Hamano
	Simplify API implementation to delete references by eliminating duplication. * ps/ref-deletion-updates: refs: remove `delete_refs` callback from backends refs: deduplicate code to delete references refs/files: use transactions to delete references t5510: ensure that the packed-refs file needs locking
2023-12-19	pkt-line: do not chomp newlines for sideband messages	Jiang Xin
	When calling "packet_read_with_status()" to parse pkt-line encoded packets, we can turn on the flag "PACKET_READ_CHOMP_NEWLINE" to chomp newline character for each packet for better line matching. But when receiving data and progress information using sideband, we should turn off the flag "PACKET_READ_CHOMP_NEWLINE" to prevent mangling newline characters from data and progress information. When both the server and the client support "sideband-all" capability, we have a dilemma that newline characters in negotiation packets should be removed, but the newline characters in the progress information should be left intact. Add new flag "PACKET_READ_USE_SIDEBAND" for "packet_read_with_status()" to prevent mangling newline characters in sideband messages. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-19	pkt-line: memorize sideband fragment in reader	Jiang Xin
	When we turn on the "use_sideband" field of the packet_reader, "packet_reader_read()" will call the function "demultiplex_sideband()" to parse and consume sideband messages. Sideband fragment which does not end with "\r" or "\n" will be saved in the sixth parameter "scratch" and it can be reused and be concatenated when parsing another sideband message. In "packet_reader_read()" function, the local variable "scratch" can only be reused by subsequent sideband messages. But if there is a payload message between two sideband fragments, the first fragment which is saved in the local variable "scratch" will be lost. To solve this problem, we can add a new field "scratch" in packet_reader to memorize the sideband fragment across different calls of "packet_reader_read()". Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-19	test-pkt-line: add option parser for unpack-sideband	Jiang Xin
	We can use the test helper program "test-tool pkt-line" to test pkt-line related functions. E.g.: * Use "test-tool pkt-line send-split-sideband" to generate sideband messages. * Pipe these generated sideband messages to command "test-tool pkt-line unpack-sideband" to test packet_reader_read() function. In order to make a complete test of the packet_reader_read() function, add option parser for command "test-tool pkt-line unpack-sideband". * To remove newlines in sideband messages, we can use: $ test-tool pkt-line unpack-sideband --chomp-newline * To preserve newlines in sideband messages, we can use: $ test-tool pkt-line unpack-sideband --no-chomp-newline * To parse sideband messages using "demultiplex_sideband()" inside the function "packet_reader_read()", we can use: $ test-tool pkt-line unpack-sideband --reader-use-sideband We also add new example sideband packets in send_split_sideband() and add several new test cases in t0070. Among these test cases, we pipe output of the "send-split-sideband" subcommand to the "unpack-sideband" subcommand. We found two issues: 1. The two splitted sideband messages "Hello," and " world!\n" should be concatenated together. But when we turn on use_sideband field of reader to parse sideband messages, the first part of the splitted message ("Hello,") is lost. 2. The newline characters in sideband 2 (progress info) and sideband 3 (error message) should be preserved, but they are both trimmed. Will fix the above two issues in subsequent commits. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-18	test-lib-functions.sh: fix test_grep fail message wording	Shreyansh Paliwal
	In the recent commit 2e87fca189 (test framework: further deprecate test_i18ngrep, 2023-10-31), the test_i18ngrep function was deprecated, and all the callers were updated to call the test_grep function instead. But test_grep inherited an error message that still refers to test_i18ngrep by mistake. Correct it so that a broken call to the test_grep will identify itself as such. Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-18	fetch: no redundant error message for atomic fetch	Jiang Xin
	If an error occurs during an atomic fetch, a redundant error message will appear at the end of do_fetch(). It was introduced in b3a804663c (fetch: make `--atomic` flag cover backfilling of tags, 2022-02-17). Because a failure message is displayed before setting retcode in the function do_fetch(), calling error() on the err message at the end of this function may result in redundant or empty error message to be displayed. We can remove the redundant error() function, because we know that the function ref_transaction_abort() never fails. While we can find a common pattern for calling ref_transaction_abort() by running command "git grep -A1 ref_transaction_abort", e.g.: if (ref_transaction_abort(transaction, &error)) error("abort: %s", error.buf); Following this pattern, we can tolerate the return value of the function ref_transaction_abort() being changed in the future. We also delay the output of the err message to the end of do_fetch() to reduce redundant code. With these changes, the test case "fetch porcelain output (atomic)" in t5574 will also be fixed. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-18	t5574: test porcelain output of atomic fetch	Jiang Xin
	The test case "fetch porcelain output" checks output of the fetch command. The error output must be empty with the follow assertion: test_must_be_empty stderr But this assertion fails if using atomic fetch. Refactor this test case to use different fetch options by splitting it into three test cases. 1. "setup for fetch porcelain output". 2. "fetch porcelain output": for non-atomic fetch. 3. "fetch porcelain output (atomic)": for atomic fetch. Add new command "test_commit ..." in the first test case, so that if we run these test cases individually (--run=4-6), "git rev-parse HEAD~" command will work properly. Run the above test cases, we can find that one test case has a known breakage, as shown below: ok 4 - setup for fetch porcelain output ok 5 - fetch porcelain output # TODO known breakage vanished not ok 6 - fetch porcelain output (atomic) # TODO known breakage The failed test case has an error message with only the error prompt but no message body, as follows: 'stderr' is not empty, it contains: error: In a later commit, we will fix this issue. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	tests: adjust whitespace in chainlint expectations	Patrick Steinhardt
	The "check-chainlint" target runs automatically when running tests and performs self-checks to verify that the chainlinter itself produces the expected output. Originally, the chainlinter was implemented via sed, but the infrastructure has been rewritten in fb41727b7e (t: retire unused chainlint.sed, 2022-09-01) to use a Perl script instead. The rewrite caused some slight whitespace changes in the output that are ultimately not of much importance. In order to be able to assert that the actual chainlinter errors match our expectations we thus have to ignore whitespace characters when diffing them. As the `-w` flag is not in POSIX we try to use `git diff -w --no-index` before we fall back to `diff -w -u`. To accomodate for cases where the host system has no Git installation we use the locally-compiled version of Git. This can result in problems though when the Git project's repository is using extensions that the locally-compiled version of Git doesn't understand. It will refuse to run and thus cause the checks to fail. Instead of improving the detection logic, fix our ".expect" files so that we do not need any post-processing at all anymore. This allows us to drop the `-w` flag when diffing so that we can always use diff(1) now. Note that we keep some of the post-processing of `chainlint.pl` output intact to strip leading line numbers generated by the script. Having these would cause a rippling effect whenever we add a new test that sorts into the middle of existing tests and would require us to renumerate all subsequent lines, which seems rather pointless. Signed-off-by: Patrick Steinhardt <ps@pks.im> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	t/perf: add performance tests for multi-pack reuse	Taylor Blau
	To ensure that we don't regress either the size or runtime performance of multi-pack reuse, add a performance test to measure both of these. The test partitions the objects in GIT_TEST_PERF_LARGE_REPO into 1, 10, and 100 packs, and then tries to perform a "clone" at each stage with both single- and multi-pack reuse enabled. Note that the `repack_into_n_chunks()` function in this new test script differs from the existing `repack_into_n()`. The former partitions the repository into N equal-sized chunks, while the latter produces N packs of five commits each (plus their objects), and then another pack with the remainder. On git.git, I can produce the following results on my machine: Test this tree -------------------------------------------------------------------------------- 5332.3: clone for 1-pack scenario (single-pack reuse) 1.57(2.99+0.15) 5332.4: clone size for 1-pack scenario (single-pack reuse) 231.8M 5332.5: clone for 1-pack scenario (multi-pack reuse) 1.79(2.96+0.21) 5332.6: clone size for 1-pack scenario (multi-pack reuse) 231.7M 5332.9: clone for 10-pack scenario (single-pack reuse) 3.89(16.75+0.35) 5332.10: clone size for 10-pack scenario (single-pack reuse) 209.9M 5332.11: clone for 10-pack scenario (multi-pack reuse) 1.56(2.99+0.17) 5332.12: clone size for 10-pack scenario (multi-pack reuse) 224.4M 5332.15: clone for 100-pack scenario (single-pack reuse) 8.24(54.31+0.59) 5332.16: clone size for 100-pack scenario (single-pack reuse) 278.3M 5332.17: clone for 100-pack scenario (multi-pack reuse) 2.13(2.44+0.33) 5332.18: clone size for 100-pack scenario (multi-pack reuse) 357.9M Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	pack-bitmap: enable reuse from all bitmapped packs	Taylor Blau
	Now that both the pack-bitmap and pack-objects code are prepared to handle marking and using objects from multiple bitmapped packs for verbatim reuse, allow marking objects from all bitmapped packs as eligible for reuse. Within the `reuse_partial_packfile_from_bitmap()` function, we no longer only mark the pack whose first object is at bit position zero for reuse, and instead mark any pack contained in the MIDX as a reuse candidate. Provide a handful of test cases in a new script (t5332) exercising interesting behavior for multi-pack reuse to ensure that we performed all of the previous steps correctly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	t/test-lib-functions.sh: implement `test_trace2_data` helper	Taylor Blau
	Introduce a helper function which looks for a specific (category, key, value) tuple in the output of a trace2 event stream. We will use this function in a future patch to ensure that the expected number of objects are reused from an expected number of packs. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	midx: implement `midx_preferred_pack()`	Taylor Blau
	When performing a binary search over the objects in a MIDX's bitmap (i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack ordering using a combination of (a) the preferred pack, (b) the pack's lexical position in the MIDX based on pack names, and (c) the object offset within the pack. In order to perform this binary search, the reader must know the identity of the preferred pack. This could be stored in the MIDX, but isn't for historical reasons, mostly because it can easily be inferred at read-time by looking at the object in the first bit position and finding out which pack it was selected from in the MIDX, like so: nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); In midx_to_pack_pos() which performs this binary search, we look up the identity of the preferred pack before each search. This is relatively quick, since it involves two table-driven lookups (one in the MIDX's revindex for `pack_pos_to_midx()`, and another in the MIDX's object table for `nth_midxed_pack_int_id()`). But since the preferred pack does not change after the MIDX is written, it is safe to cache this value on the MIDX itself. Write a helper to do just that, and rewrite all of the existing call-sites that care about the identity of the preferred pack in terms of this new helper. This will prepare us for a subsequent patch where we will need to binary search through the MIDX's pseudo-pack order multiple times. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	midx: implement `BTMP` chunk	Taylor Blau
	When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse. This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the amount of objects that are reused verbatim (and consequently, decrease the amount of time it takes to generate many packs). But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse. As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly more expensive to maintain as the pack grows larger and larger. It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse are disjoint with respect to the subset of their objects being sent). If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to any of their objects we may output, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position. Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases). This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Bitampped Packs" (`BTMP`) chunk of the multi-pack index. The `BTMP` chunk's contents are described in detail here. Importantly, the `BTMP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent. For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk. [^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	pack-bitmap: plug leak in find_objects()	Taylor Blau
	The `find_objects()` function creates an object_list for any tips of the reachability query which do not have corresponding bitmaps. The object_list is not used outside of `find_objects()`, but we never free it with `object_list_free()`, resulting in a leak. Let's plug that leak by calling `object_list_free()`, which results in t6113 becoming leak-free. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15	t5100: make rfc822 comment test more careful	Jeff King
	When processing "From" headers in an email, mailinfo "unquotes" quoted strings and rfc822 parenthesized comments. For quoted strings, we actually remove the double-quotes, so: From: "A U Thor" <someone@example.com> become: Author: A U Thor Email: someone@example.com But for comments, we leave the outer parentheses in place, so: From: A U (this is a comment) Thor <someone@example.com> becomes: Author: A U (this is a comment) Thor Email: someone@example.com So what is the comment "unquoting" actually doing? In our code, being in a comment section has exactly two effects: 1. We'll unquote backslash-escaped characters inside a comment section. 2. We _won't_ unquote double-quoted strings inside a comment section. Our test for comments in t5100 checks this: From: "A U Thor" <somebody@example.com> (this is $really$ a comment (honestly)) So it is covering (1), but not (2). Let's add in a quoted string to cover this. Moreover, because the comment appears at the end of the From header, there's nothing to confirm that we correctly found the end of the comment section (and not just the end-of-string). Let's instead move it to the beginning of the header, which means we can confirm that the existing quoted string is detected (which will only happen if we know we've left the comment block). As expected, the test continues to pass, but this will give us more confidence as we refactor the code in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	bisect: consistently write BISECT_EXPECTED_REV via the refdb	Patrick Steinhardt
	We're inconsistently writing BISECT_EXPECTED_REV both via the filesystem and via the refdb, which violates the newly established rules for how special refs must be treated. This works alright in practice with the reffiles reference backend, but will cause bugs once we gain additional backends. Fix this issue and consistently write BISECT_EXPECTED_REV via the refdb so that it is no longer a special ref. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	refs: propagate errno when reading special refs fails	Patrick Steinhardt
	Some refs in Git are more special than others due to reasons explained in the next commit. These refs are read via `refs_read_special_head()`, but this function doesn't behave the same as when we try to read a normal ref. Most importantly, we do not propagate `failure_errno` in the case where the reference does not exist, which is behaviour that we rely on in many parts of Git. Fix this bug by propagating errno when `strbuf_read_file()` fails. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-13	checkout: forbid "-B <branch>" from touching a branch used elsewhere	Junio C Hamano
	"git checkout -B <branch> [<start-point>]", being a "forced" version of "-b", switches to the <branch>, after optionally resetting its tip to the <start-point>, even if the <branch> is in use in another worktree, which is somewhat unexpected. Protect the <branch> using the same logic that forbids "git checkout <branch>" from touching a branch that is in use elsewhere. This is a breaking change that may deserve backward compatibliity warning in the Release Notes. The "--ignore-other-worktrees" option can be used as an escape hatch if the finger memory of existing users depend on the current behaviour of "-B". Reported-by: Willem Verstraeten <willem.verstraeten@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-13	t6300: avoid hard-coding object sizes	René Scharfe
	f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24) hard-coded the expected object sizes. Coincidentally the size of commit and tag is the same with zlib at the default compression level. 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22) encoded the sizes as a single value, which coincidentally also works with sha256. Different compression libraries like zlib-ng may arrive at different values. Get them from the file system instead of hard-coding them to make switching the compression library (or changing the compression level) easier. Reported-by: Ondrej Pohorelsky <opohorel@redhat.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-13	mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()	Jeff King
	When processing a header like a "From" line, mailinfo uses unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized comments. It takes a NUL-terminated string on input, and loops over the "in" pointer until it sees the NUL. When it finds the start of an interesting block, it delegates to helper functions which also increment "in", and return the updated pointer. But there's a bug here: the helpers find the NUL with a post-increment in the loop condition, like: while ((c = *in++) != 0) So when they do see a NUL (rather than the correct termination of the quote or comment section), they return "in" as one _past_ the NUL terminator. And thus the outer loop in unquote_quoted_pair() does not realize we hit the NUL, and keeps reading past the end of the buffer. We should instead make sure to return "in" positioned at the NUL, so that the caller knows to stop their loop, too. A hacky way to do this is to return "in - 1" after leaving the inner loop. But a slightly cleaner solution is to avoid incrementing "in" until we are sure it contained a non-NUL byte (i.e., doing it inside the loop body). The two tests here show off the problem. Since we check the output, they'll _usually_ report a failure in a normal build, but it depends on what garbage bytes are found after the heap buffer. Building with SANITIZE=address reliably notices the problem. The outcome (both the exit code and the exact bytes) are just what Git happens to produce for these cases today, and shouldn't be taken as an endorsement. It might be reasonable to abort on an unterminated string, for example. The priority for this patch is fixing the out-of-bounds memory access. Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12	builtin/clone: create the refdb with the correct object format	Patrick Steinhardt
	We're currently creating the reference database with a potentially incorrect object format when the remote repository's object format is different from the local default object format. This works just fine for now because the files backend never records the object format anywhere. But this logic will fail with any new reference backend that encodes this information in some form either on-disk or in-memory. The preceding commits have reshuffled code in git-clone(1) so that there is no code path that will access the reference database before we have detected the remote's object format. With these refactorings we can now defer initialization of the reference database until after we have learned the remote's object format and thus initialize it with the correct format from the get-go. These refactorings are required to make git-clone(1) work with the upcoming reftable backend when cloning repositories with the SHA256 object format. This change breaks a test in "t5550-http-fetch-dumb.sh" when cloning an empty repository with `GIT_TEST_DEFAULT_HASH=sha256`. The test expects the resulting hash format of the empty cloned repository to match the default hash, but now we always end up with a sha1 repository. The problem is that for dumb HTTP fetches, we have no easy way to figure out the remote's hash function except for deriving it based on the hash length of refs in `info/refs`. But as the remote repository is empty we cannot rely on this detection mechanism. Before the change in this commit we already initialized the repository with the default hash function and then left it as-is. With this patch we always use the hash function detected via the remote, where we fall back to "sha1" in case we cannot detect it. Neither the old nor the new behaviour are correct as we second-guess the remote hash function in both cases. But given that this is a rather unlikely edge case (we use the dumb HTTP protocol, the remote repository uses SHA256 and the remote repository is empty), let's simply adapt the test to assert the new behaviour. If we want to properly address this edge case in the future we will have to extend the dumb HTTP protocol so that we can properly detect the hash function for empty repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12	builtin/clone: fix bundle URIs with mismatching object formats	Patrick Steinhardt
	We create the reference database in git-clone(1) quite early before connecting to the remote repository. Given that we do not yet know about the object format that the remote repository uses at that point in time the consequence is that the refdb may be initialized with the wrong object format. This is not a problem in the context of the files backend as we do not encode the object format anywhere, and furthermore the only reference that we write between initializing the refdb and learning about the object format is the "HEAD" symref. It will become a problem though once we land the reftable backend, which indeed does require to know about the proper object format at the time of creation. We thus need to rearrange the logic in git-clone(1) so that we only initialize the refdb once we have learned about the actual object format. As a first step, move listing of remote references to happen earlier, which also allow us to set up the hash algorithm of the repository earlier now. While we aim to execute this logic as late as possible until after most of the setup has happened already, detection of the object format and thus later the setup of the reference database must happen before any other logic that may spawn Git commands or otherwise these Git commands may not recognize the repository as such. The first Git step where we expect the repository to be fully initalized is when we fetch bundles via bundle URIs. Funny enough, the comments there also state that "the_repository must match the cloned repo", which is indeed not necessarily the case for the hash algorithm right now. So in practice it is the right thing to detect the remote's object format before downloading bundle URIs anyway, and not doing so causes clones with bundle URIs to fail when the local default object format does not match the remote repository's format. Unfortunately though, this creates a new issue: downloading bundles may take a long time, so if we list refs beforehand they might've grown stale meanwhile. It is not clear how to solve this issue except for a second reference listing though after we have downloaded the bundles, which may be an expensive thing to do. Arguably though, it's preferable to have a staleness issue compared to being unable to clone a repository altogether. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11	show-ref: use die_for_incompatible_opt3()	René Scharfe
	Use the standard message for reporting the use of multiple mutually exclusive options by calling die_for_incompatible_opt3() instead of rolling our own. This has the benefits of showing only the actually given options, reducing the number of strings to translate and making the UI slightly more consistent. Adjust the test to no longer insist on a specific order of the reported options, as this implementation detail does not affect the usefulness of the error message. Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-10	Merge branch 'tz/send-email-negatable-options'	Junio C Hamano
	Newer versions of Getopt::Long started giving warnings against our (ab)use of it in "git send-email". Bump the minimum version requirement for Perl to 5.8.1 (from September 2002) to allow simplifying our implementation. * tz/send-email-negatable-options: send-email: avoid duplicate specification warnings perl: bump the required Perl version to 5.8.1 from 5.8.0
2023-12-10	Merge branch 'ak/rebase-autosquash'	Junio C Hamano
	"git rebase --autosquash" is now enabled for non-interactive rebase, but it is still incompatible with the apply backend. * ak/rebase-autosquash: rebase: rewrite --(no-)autosquash documentation rebase: support --autosquash without -i rebase: fully ignore rebase.autoSquash without -i
2023-12-10	Merge branch 'vd/for-each-ref-unsorted-optimization'	Junio C Hamano
	"git for-each-ref --no-sort" still sorted the refs alphabetically which paid non-trivial cost. It has been redefined to show output in an unspecified order, to allow certain optimizations to take advantage of. * vd/for-each-ref-unsorted-optimization: t/perf: add perf tests for for-each-ref ref-filter.c: use peeled tag for '*' format fields for-each-ref: clean up documentation of --format ref-filter.c: filter & format refs in the same callback ref-filter.c: refactor to create common helper functions ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' ref-filter.h: add functions for filter/format & format-only ref-filter.h: move contains caches into filter ref-filter.h: add max_count and omit_empty to ref_format ref-filter.c: really don't sort when using --no-sort
2023-12-10	Merge branch 'ps/ban-a-or-o-operator-with-test'	Junio C Hamano
	Test and shell scripts clean-up. * ps/ban-a-or-o-operator-with-test: Makefile: stop using `test -o` when unlinking duplicate executables contrib/subtree: convert subtree type check to use case statement contrib/subtree: stop using `-o` to test for number of args global: convert trivial usages of `test <expr> -a/-o <expr>`
2023-12-10	Merge branch 'ss/format-patch-use-encode-headers-for-cover-letter'	Junio C Hamano
	"git format-patch --encode-email-headers" ignored the option when preparing the cover letter, which has been corrected. * ss/format-patch-use-encode-headers-for-cover-letter: format-patch: fix ignored encode_email_headers for cover letter
2023-12-10	Merge branch 'ps/ref-tests-update'	Junio C Hamano
	Update ref-related tests. * ps/ref-tests-update: t: mark several tests that assume the files backend with REFFILES t7900: assert the absence of refs via git-for-each-ref(1) t7300: assert exact states of repo t4207: delete replace references via git-update-ref(1) t1450: convert tests to remove worktrees via git-worktree(1) t: convert tests to not access reflog via the filesystem t: convert tests to not access symrefs via the filesystem t: convert tests to not write references via the filesystem t: allow skipping expected object ID in `ref-store update-ref`
2023-12-10	Merge branch 'jw/git-add-attr-pathspec'	Junio C Hamano
	"git add" and "git stash" learned to support the ":(attr:...)" magic pathspec. * jw/git-add-attr-pathspec: attr: enable attr pathspec magic for git-add and git-stash
2023-12-10	Merge branch 'jk/chunk-bounds-more'	Junio C Hamano
	Code clean-up for jk/chunk-bounds topic. * jk/chunk-bounds-more: commit-graph: mark chunk error messages for translation commit-graph: drop verify_commit_graph_lite() commit-graph: check order while reading fanout chunk commit-graph: use fanout value for graph size commit-graph: abort as soon as we see a bogus chunk commit-graph: clarify missing-chunk error messages commit-graph: drop redundant call to "lite" verification midx: check consistency of fanout table commit-graph: handle overflow in chunk_size checks
2023-12-10	Merge branch 'ps/ci-gitlab'	Junio C Hamano
	Add support for GitLab CI. * ps/ci-gitlab: ci: add support for GitLab CI ci: install test dependencies for linux-musl ci: squelch warnings when testing with unusable Git repo ci: unify setup of some environment variables ci: split out logic to set up failed test artifacts ci: group installation of Docker dependencies ci: make grouping setup more generic ci: reorder definitions for grouping functions
2023-12-10	Merge branch 'js/doc-unit-tests-with-cmake'	Junio C Hamano
	Update the base topic to work with CMake builds. * js/doc-unit-tests-with-cmake: cmake: handle also unit tests cmake: use test names instead of full paths cmake: fix typo in variable name artifacts-tar: when including `.dll` files, don't forget the unit-tests unit-tests: do show relative file paths unit-tests: do not mistake `.pdb` files for being executable cmake: also build unit tests
2023-12-10	Merge branch 'js/doc-unit-tests'	Junio C Hamano
	Process to add some form of low-level unit tests has started. * js/doc-unit-tests: ci: run unit tests in CI unit tests: add TAP unit test framework unit tests: add a project plan document
2023-12-09	revision: parse integer arguments to --max-count, --skip, etc., more carefully	Junio C Hamano
	The "rev-list" and other commands in the "log" family, being the oldest part of the system, use their own custom argument parsers, and integer values of some options are parsed with atoi(), which allows a non-digit after the number (e.g., "1q") to be silently ignored. As a natural consequence, an argument that does not begin with a digit (e.g., "q") silently becomes zero, too. Switch to use strtol_i() and parse_timestamp() appropriately to catch bogus input. Note that one may naïvely expect that --max-count, --skip, etc., to only take non-negative values, but we must allow them to also take negative values, as an escape hatch to countermand a limit set by an earlier option on the command line; the underlying variables are initialized to (-1) and "--max-count=-1", for example, is a legitimate way to reinitialize the limit. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09	bisect: always clean on reset	Jeff King
	Usually "bisect reset" cleans up any refs/bisect/ refs, along with meta-files like .git/BISECT_LOG. But it only does so after deciding that a bisection is active, which it does by reading BISECT_START. This is usually fine, but it's possible to get into a confusing state if the BISECT_START file is gone, but other cruft is left (this might be due to a bug, or a system crash, etc). And since "bisect reset" refuses to do anything in this state, the user has no easy way to clean up the leftover cruft. While another "bisect start" would clear the state, in the interim it can be annoying, as other tools (like our bash prompt code) think we are bisecting, and for-each-ref output may be polluted with refs/bisect/ entries. Further adding to the confusion is that running "bisect reset $some_ref" skips the BISECT_START check. So it never realizes that there's no bisection active and does the cleanup anyway! So let's just make sure we always do the cleanup, whether we looked at BISECT_START or not. If the user doesn't give us a commit to reset to, we'll still say "We are not bisecting" and skip the call to "git checkout". Reported-by: Janik Haag <janik@aq0.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09	parse-options: decouple "--end-of-options" and "--"	Jeff King
	When we added generic end-of-options support in 51b4594b40 (parse-options: allow --end-of-options as a synonym for "--", 2019-08-06), we made them true synonyms. They both stop option parsing, and they are both returned in the resulting argv if the KEEP_DASHDASH flag is used. The hope was that this would work for all callers: - most generic callers would not pass KEEP_DASHDASH, and so would just do the right thing (stop parsing there) without needing to know anything more. - callers with KEEP_DASHDASH were generally going to rely on setup_revisions(), which knew to handle --end-of-options specially But that turned out miss quite a few cases that pass KEEP_DASHDASH but do their own manual parsing. For example, "git reset", "git checkout", and so on want pass KEEP_DASHDASH so they can support: git reset $revs -- $paths but of course aren't going to actually do a traversal, so they don't call setup_revisions(). And those cases currently get confused by --end-of-options being left in place, like: $ git reset --end-of-options HEAD fatal: option '--end-of-options' must come before non-option arguments We could teach each of these callers to handle the leftover option explicitly. But let's try to be a bit more clever and see if we can solve it centrally in parse-options.c. The bogus assumption here is that KEEP_DASHDASH tells us the caller wants to see --end-of-options in the result. But really, the callers which need to know that --end-of-options was reached are those that may potentially parse more options from argv. In other words, those that pass the KEEP_UNKNOWN_OPT flag. If such a caller is aware of --end-of-options (e.g., because they call setup_revisions() with the result), then this will continue to do the right thing, treating anything after --end-of-options as a non-option. And if the caller is not aware of --end-of-options, they are better off keeping it intact, because either: 1. They are just passing the options along to somebody else anyway, in which case that somebody would need to know about the --end-of-options marker. 2. They are going to parse the remainder themselves, at which point choking on --end-of-options is much better than having it silently removed. The point is to avoid option injection from untrusted command line arguments, and bailing is better than quietly treating the untrusted argument as an option. This fixes bugs with --end-of-options across several commands, but I've focused on two in particular here: - t7102 confirms that "git reset --end-of-options --foo" now works. This checks two things. One, that we no longer barf on "--end-of-options" itself (which previously we did, even if the rev was something vanilla like "HEAD" instead of "--foo"). And two, that we correctly treat "--foo" as a revision rather than an option. This fix applies to any other cases which pass KEEP_DASHDASH but not KEEP_UNKNOWN_OPT, like "git checkout", "git check-attr", "git grep", etc, which would previously choke on "--end-of-options". - t9350 shows the opposite case: fast-export passed KEEP_UNKNOWN_OPT but not KEEP_DASHDASH, but then passed the result on to setup_revisions(). So it never saw --end-of-options, and would erroneously parse "fast-export --end-of-options --foo" as having a "--foo" option. This is now fixed. Note that this does shut the door for callers which want to know if we hit end-of-options, but don't otherwise need to keep unknown opts. The obvious thing here is feeding it to the DWIM verify_filename() machinery. And indeed, this is a problem even for commands which do understand --end-of-options already. For example, without this patch, you get: $ git log --end-of-options --foo fatal: option '--foo' must come before non-option arguments because we refuse to accept "--foo" as a filename (because it starts with a dash) even though we could know that we saw end-of-options. The verify_filename() function simply doesn't accept this extra information. So that is the status quo, and this patch doubles down further on that. Commands like "git reset" have the same problem, but they won't even know that parse-options saw --end-of-options! So even if we fixed verify_filename(), they wouldn't have anything to pass to it. But in practice I don't think this is a big deal. If you are being careful enough to use --end-of-options, then you should also be using "--" to disambiguate and avoid the DWIM behavior in the first place. In other words, doing: git log --end-of-options --this-is-a-rev -- --this-is-a-path works correctly, and will continue to do so. And likewise, with this patch now: git reset --end-of-options --this-is-a-rev -- --this-is-a-path will work, as well. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09	worktree: standardize incompatibility messages	René Scharfe
	Use the standard parameterized message for reporting incompatible options for worktree add. This reduces the number of strings to translate and makes the UI slightly more consistent. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09	revision, rev-parse: factorize incompatibility messages about - -exclude-hidden	René Scharfe
	Use the standard parameterized message for reporting incompatible options to report options that are not accepted in combination with --exclude-hidden. This reduces the number of strings to translate and makes the UI a bit more consistent. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-03	completion: avoid user confusion in non-cone mode	Elijah Newren
	It is tempting to think of "files and directories" of the current directory as valid inputs to the add and set subcommands of git sparse-checkout. However, in non-cone mode, they often aren't and using them as potential completions leads to many forms of confusion: Issue #1. It provides the wrong files and directories. For git sparse-checkout add we always want to add files and directories not currently in our sparse checkout, which means we want file and directories not currently present in the current working tree. Providing the files and directories currently present is thus always wrong. For git sparse-checkout set we have a similar problem except in the subset of cases where we are trying to narrow our checkout to a strict subset of what we already have. That is not a very common scenario, especially since it often does not even happen to be true for the first use of the command; for years we required users to create a sparse-checkout via git sparse-checkout init git sparse-checkout set <args...> (or use a clone option that did the init step for you at clone time). The init command creates a minimal sparse-checkout with just the top-level directory present, meaning the set command has to be used to expand the checkout. Thus, only in a special and perhaps unusual cases would any of the suggestions from normal file and directory completion be appropriate. Issue #2: Suggesting patterns that lead to warnings is unfriendly. If the user specifies any regular file and omits the leading '/', then the sparse-checkout command will warn the user that their command is problematic and suggest they use a leading slash instead. Issue #3: Completion gets confused by leading '/', and provides wrong paths. Users often want to anchor their patterns to the toplevel of the repository, especially when listing individual files. There are a number of reasons for this, but notably even sparse-checkout encourages them to do so (as noted above). However, if users do so (via adding a leading '/' to their pattern), then bash completion will interpret the leading slash not as a request for a path at the toplevel of the repository, but as a request for a path at the root of the filesytem. That means at best that completion cannot help with such paths, and if it does find any completions, they are almost guaranteed to be wrong. Issue #4: Suggesting invalid patterns from subdirectories is unfriendly. There is no per-directory equivalent to .gitignore with sparse-checkouts. There is only a single worktree-global $GIT_DIR/info/sparse-checkout file. As such, paths to files must be specified relative to the toplevel of a repository. Providing suggestions of paths that are relative to the current working directory, as bash completion defaults to, is wrong when the current working directory is not the worktree toplevel directory. Issue #5: Paths with special characters will be interpreted incorrectly The entries in the sparse-checkout file are patterns, not paths. While most paths also qualify as patterns (though even in such cases it would be better for users to not use them directly but prefix them with a leading '/'), there are a variety of special characters that would need special escaping beyond the normal shell escaping: '*', '?', '\', '[', ']', and any leading '#' or '!'. If completion suggests any such paths, users will likely expect them to be treated as an exact path rather than as a pattern that might match some number of files other than 1. However, despite the first four issues, we can note that _if_ users are using tab completion, then they are probably trying to specify a path in the index. As such, we transform their argument into a top-level-rooted pattern that matches such a file. For example, if they type: git sparse-checkout add Make<TAB> we could "complete" to git sparse-checkout add /Makefile or, if they ran from the Documentation/technical/ subdirectory: git sparse-checkout add m<TAB> we could "complete" it to: git sparse-checkout add /Documentation/technical/multi-pack-index.txt Note in both cases I use "complete" in quotes, because we actually add characters both before and after the argument in question, so we are kind of abusing "bash completions" to be "bash completions AND beginnings". The fifth issue is a bit stickier, especially when you consider that we not only need to deal with escaping issues because of special meanings of patterns in sparse-checkout & gitignore files, but also that we need to consider escaping issues due to ls-files needing to sometimes quote or escape characters, and because the shell needs to escape some characters. The multiple interacting forms of escaping could get ugly; this patch makes no attempt to do so and simply documents that we decided to not deal with those corner cases for now but at least get the common cases right. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>