Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-10commit-graph: check bounds when accessing BDAT chunkJeff King
When loading Bloom filters from a commit-graph file, we use the offset values in the BIDX chunk to index into the memory mapped for the BDAT chunk. But since we don't record how big the BDAT chunk is, we just trust that the BIDX offsets won't cause us to read outside of the chunk memory. A corrupted or malicious commit-graph file will cause us to segfault (in practice this isn't a very interesting attack, since commit-graph files are local-only, and the worst case is an out-of-bounds read). We can't fix this by checking the chunk size during parsing, since the data in the BDAT chunk doesn't have a fixed size (that's why we need the BIDX in the first place). So we'll fix it in two parts: 1. Record the BDAT chunk size during parsing, and then later check that the BIDX offsets we look up are within bounds. 2. Because the offsets are relative to the end of the BDAT header, we must also make sure that the BDAT chunk is at least as large as the expected header size. Otherwise, we overflow when trying to move past the header, even for an offset of "0". We can check this early, during the parsing stage. The error messages are rather verbose, but since this is not something you'd expect to see outside of severe bugs or corruption, it makes sense to err on the side of too many details. Sadly we can't mention the filename during the chunk-parsing stage, as we haven't set g->filename at this point, nor passed it down through the stack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: bounds-check generation overflow chunkJeff King
If the generation entry in a commit-graph doesn't fit, we instead insert an offset into a generation overflow chunk. But since we don't record the size of the chunk, we may read outside the chunk if the offset we find on disk is malicious or corrupted. We can't check the size of the chunk up-front; it will vary based on how many entries need overflow. So instead, we'll do a bounds-check before accessing the chunk memory. Unfortunately there is no error-return from this function, so we'll just have to die(), which is what it does for other forms of corruption. As with other cases, we can drop the st_mult() call, since we know our bounds-checked value will fit within a size_t. Before this patch, the test here actually "works" because we read garbage data from the next chunk. And since that garbage data happens not to provide a generation number which changes the output, it appears to work. We could construct a case that actually segfaults or produces wrong output, but it would be a bit tricky. For our purposes its sufficient to check that we've detected the bounds error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: check size of generations chunkJeff King
We neither check nor record the size of the generations chunk we parse from a commit-graph file. This should have one uint32_t for each commit in the file; if it is smaller (due to corruption, etc), we may read outside the mapped memory. The included test segfaults without this patch, as it shrinks the size considerably (and the chunk is near the end of the file, so we read off the end of the array rather than accidentally reading another chunk). We can fix this by checking the size up front (like we do for other fixed-size chunks, like CDAT). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: bounds-check base graphs chunkJeff King
When we are loading a commit-graph chain, we check that each slice of the chain points to the appropriate set of base graphs via its BASE chunk. But since we don't record the size of the chunk, we may access out-of-bounds memory if the file is corrupted. Since we know the number of entries we expect to find (based on the position within the commit-graph-chain file), we can just check the size up front. In theory this would also let us drop the st_mult() call a few lines later when we actually access the memory, since we know that the computed offset will fit in a size_t. But because the operands "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to cast to size_t first. Leaving the st_mult() does that cast, and makes it more obvious that we don't have an overflow problem. Note that the test does not actually segfault before this patch, since it just reads garbage from the chunk after BASE (and indeed, it even rejects the file because that garbage does not have the expected hash value). You could construct a file with BASE at the end that did segfault, but corrupting the existing one is easy, and we can check stderr for the expected message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: detect out-of-bounds extra-edges pointersJeff King
If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t *". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char *"). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id. This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: check size of commit data chunkJeff King
We expect a commit-graph file to have a fixed-size data record for each commit in the file (and we know the number of commits to expct from the size of the lookup table). If we encounter a file where this is too small, we'll look past the end of the chunk (and possibly even off the mapped memory). We can fix this by checking the size up front when we record the pointer. The included test doesn't segfault, since it ends up reading bytes from another chunk. But it produces nonsense results, since the values it reads are garbage. Our test notices this by comparing the output to a non-corrupted run of the same command (and of course we also check that the expected error is printed to stderr). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: check size of revindex chunkJeff King
When we load a revindex from disk, we check the size of the file compared to the number of objects we expect it to have. But when we use a RIDX chunk stored directly in the midx, we just access the memory directly. This can lead to out-of-bounds memory access for a corrupted or malicious multi-pack-index file. We can catch this by recording the RIDX chunk size, and then checking it against the expected size when we "load" the revindex. Note that this check is much simpler than the one that load_revindex_from_disk() does, because we just have the data array with no header (so we do not need to account for the header size, and nor do we need to bother validating the header values). The test confirms both that we catch this case, and that we continue the process (the revindex is required to use the midx bitmaps, but we fallback to a non-bitmap traversal). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: bounds-check large offset chunkJeff King
When we see a large offset bit in the regular midx offset table, we use the entry as an index into a separate large offset table (just like a pack idx does). But we don't bounds-check the access to that large offset table (nor even record its size when we parse the chunk!). The equivalent code for a regular pack idx is in check_pack_index_ptr(). But things are a bit simpler here because of the chunked format: we can just check our array index directly. As a bonus, we can get rid of the st_mult() here. If our array bounds-check is successful, then we know that the result will fit in a size_t (and the bounds check uses a division to avoid overflow entirely). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: check size of object offset chunkJeff King
The object offset chunk has one fixed-size entry for each object in the midx. But since we don't check its size, we may access out-of-bounds memory if we see a corrupt or malicious midx file. Sine the entries are fixed-size, the total length can be known up-front, and we can just check it while parsing the chunk (this is similar to what we do when opening pack idx files, which contain a similar offset table). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: enforce chunk alignment on readingJeff King
The midx reader assumes chunks are aligned to a 4-byte boundary: we treat the fanout chunk as an array of uint32_t, indexing it to feed the results to ntohl(). Without aligning the chunks, we may violate the CPU's alignment constraints. Though many platforms allow this, some do not. And certanily UBSan will complain, since it is undefined behavior. Even though most chunks are naturally 4-byte-aligned (because they are storing uint32_t or larger types), PNAM is not. It stores NUL-terminated pack names, so you can have a valid chunk with any length. The writing side handles this by 4-byte-aligning the chunk, introducing a few extra NULs as necessary. But since we don't check this on the reading side, we may end up with a misaligned fanout and trigger the undefined behavior. We have two options here: 1. Swap out ntohl(fanout[i]) for get_be32(fanout+i) everywhere. The latter handles alignment itself. It's possible that it's slightly slower (though in practice I'm not sure how true that is, especially for these code paths which then go on to do a binary search). 2. Enforce the alignment when reading the chunks. This is easy to do, since the table-of-contents reader can check it in one spot. I went with the second option here, just because it places less burden on maintenance going forward (it is OK to continue using ntohl), and we know it can't have any performance impact on the actual reads. The commit-graph code uses the same chunk API. It's usually also 4-byte aligned, but some chunks are not (like Bloom filter BDAT chunks). So we'll pass "1" here to allow any alignment. It doesn't suffer from the same problem as midx with its fanout because the fanout chunk is always the first (and the rest of the format dictates that the first chunk will start aligned). The new test shows the effect on a midx with a misaligned PNAM chunk. Note that the midx-reading code treats chunk-toc errors as soft, falling back to the non-midx path rather than calling die(), as we do for other parsing errors. Arguably we should make all of these behave the same, but that's out of scope for this patch. For now the test just expects the fallback behavior. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: check size of pack names chunkJeff King
We parse the pack-name chunk as a series of NUL-terminated strings. But since we don't look at the chunk size, there's nothing to guarantee that we don't parse off the end of the chunk (or even off the end of the mapped file). We can record the length, and then as we parse make sure that we never walk past it. The new test exercises the case, though note that it does not actually segfault before this patch. It hits a NUL byte somewhere in one of the other chunks, and comes up with a garbage pack name. You could construct one that reads out-of-bounds (e.g., a PNAM chunk at the end of file), but this case is simple and sufficient to check that we detect the problem. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: check consistency of fanout tableJeff King
We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: check size of oid lookup chunkJeff King
When reading an on-disk multi-pack-index, we take the number of objects in the midx from the final value of the fanout table. But we just blindly assume that the chunk containing the actual oid entries is the correct size. This can lead to us reading out-of-bounds memory if the lookup chunk is too small (or if the fanout is corrupted; when they don't agree we cannot tell which one is wrong). Note that we bump the assignment of m->num_objects into the fanout parser callback, so that it's set when we parse the lookup table (otherwise we'd have to manually record the lookup table size and check it later). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10commit-graph: check size of oid fanout chunkJeff King
We load the oid fanout chunk with pair_chunk(), which means we never see the size of the chunk. We just assume the on-disk file uses the appropriate size, and if it's too small we'll access random memory. It's easy to check this up-front; the fanout always consists of 256 uint32's, since it is a fanout of the first byte of the hash pointing into the oid index. These parameters can't be changed without introducing a new chunk type. This matches the similar check in the midx OIDF chunk (but note that rather than checking for the error immediately, the graph code just leaves parts of the struct NULL and checks for required fields later). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10midx: stop ignoring malformed oid fanout chunkJeff King
When we load the oid-fanout chunk, our callback checks that its size is reasonable and returns an error if not. However, the caller only checks our return value against CHUNK_NOT_FOUND, so we end up ignoring the error completely! Using a too-small fanout table means we end up accessing random memory for the fanout and segfault. We can fix this by checking for any non-zero return value, rather than just CHUNK_NOT_FOUND, and adjusting our error message to cover both cases. We could handle each error code individually, but there's not much point for such a rare case. The extra message produced in the callback makes it clear what is going on. The same pattern is used in the adjacent code. Those cases are actually OK for now because they do not use a custom callback, so the only error they can get is CHUNK_NOT_FOUND. But let's convert them, as this is an accident waiting to happen (especially as we convert some of them away from pair_chunk). The error messages are more verbose, but it should be rare for a user to see these anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10t: add library for munging chunk-format filesJeff King
When testing corruption of files using the chunk format (like commit-graphs and midx files), it's helpful to be able to modify bytes in specific chunks. This requires being able both to read the table-of-contents (to find the chunk to modify) but also to adjust it (to account for size changes in the offsets of subsequent chunks). We have some tests already which corrupt chunk files, but they have some downsides: 1. They are very brittle, as they manually compute the expected size of a particular instance of the file (e.g., see the definitions starting with NUM_OBJECTS in t5319). 2. Because they rely on manual offsets and don't read the table-of-contents, they're limited to overwriting bytes. But there are many interesting corruptions that involve changing the sizes of chunks (especially smaller-than-expected ones). This patch adds a perl script which makes such corruptions easy. We'll use it in subsequent patches. Note that we could get by with just a big "perl -e" inside the helper function. I chose to put it in a separate script for two reasons. One, so we don't have to worry about the extra layer of shell quoting. And two, the script is kind of big, and running the tests with "-x" would repeatedly dump it into the log output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10chunk-format: note that pair_chunk() is unsafeJeff King
The pair_chunk() function is provided as an easy helper for parsing chunks that just want a pointer to a set of bytes. But every caller has a hidden bug: because we return only the pointer without the matching chunk size, the callers have no clue how many bytes they are allowed to look at. And as a result, they may read off the end of the mmap'd data when the on-disk file does not match their expectations. Since chunk files are typically used for local-repository data like commit-graph files and midx's, the security implications here are pretty mild. The worst that can happen is that you hand somebody a corrupted repository tarball, and running Git on it does an out-of-bounds read and crashes. So it's worth being more defensive, but we don't need to drop everything and fix every caller immediately. I noticed the problem because the pair_chunk_fn() callback does not look at its chunk_size argument, and wanted to annotate it to silence -Wunused-parameter. We could do that now, but we'd lose the hint that this code should be audited and fixed. So instead, let's set ourselves up for going down that path: 1. Provide a pair_chunk() function that does return the size, which prepares us for fixing these cases. 2. Rename the existing function to pair_chunk_unsafe(). That gives us an easy way to grep for cases which still need to be fixed, and the name should cause anybody adding new calls to think twice before using it. There are no callers of the "safe" version yet, but we'll add some in subsequent patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-04The fifteenth batchJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-04Merge branch 'bb/unicode-width-table-15'Junio C Hamano
The display width table for unicode characters has been updated for Unicode 15.1 * bb/unicode-width-table-15: unicode: update the width tables to Unicode 15.1
2023-10-04Merge branch 'xz/commit-title-soft-limit-doc'Junio C Hamano
Doc tweak. * xz/commit-title-soft-limit-doc: doc: correct the 50 characters soft limit
2023-10-04Merge branch 'jk/commit-graph-verify-fix'Junio C Hamano
Various fixes to "git commit-graph verify". * jk/commit-graph-verify-fix: commit-graph: report incomplete chains during verification commit-graph: tighten chain size check commit-graph: detect read errors when verifying graph chain t5324: harmonize sha1/sha256 graph chain corruption commit-graph: check mixed generation validation when loading chain file commit-graph: factor out chain opening function
2023-10-04Merge branch 'ks/ref-filter-mailmap'Junio C Hamano
"git for-each-ref" and friends learn to apply mailmap to authorname and other fields. * ks/ref-filter-mailmap: ref-filter: add mailmap support t/t6300: introduce test_bad_atom t/t6300: cleanup test_atom
2023-10-04Merge branch 'ps/revision-cmdline-stdin-not'Junio C Hamano
"git rev-list --stdin" learned to take non-revisions (like "--not") recently from the standard input, but the way such a "--not" was handled was quite confusing, which has been rethought. This is potentially a change that breaks backward compatibility. * ps/revision-cmdline-stdin-not: revision: make pseudo-opt flags read via stdin behave consistently
2023-10-02The fourteenth batchJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02Merge branch 'js/doc-status-with-submodules-mark-up-fix'Junio C Hamano
Docfix. * js/doc-status-with-submodules-mark-up-fix: Documentation/git-status: add missing line breaks
2023-10-02Merge branch 'jc/unresolve-removal'Junio C Hamano
"checkout --merge -- path" and "update-index --unresolve path" did not resurrect conflicted state that was resolved to remove path, but now they do. * jc/unresolve-removal: checkout: allow "checkout -m path" to unmerge removed paths checkout/restore: add basic tests for --merge checkout/restore: refuse unmerging paths unless checking out of the index update-index: remove stale fallback code for "--unresolve" update-index: use unmerge_index_entry() to support removal resolve-undo: allow resurrecting conflicted state that resolved to deletion update-index: do not read HEAD and MERGE_HEAD unconditionally
2023-09-29The thirteenth batchJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-29Merge branch 'ob/am-msgfix'Junio C Hamano
The parameters to generate an error message have been corrected. * ob/am-msgfix: am: fix error message in parse_opt_show_current_patch()
2023-09-29Merge branch 'jk/test-pass-ubsan-options-to-http-test'Junio C Hamano
UBSAN options were not propagated through the test framework to git run via the httpd, unlike ASAN options, which has been corrected. * jk/test-pass-ubsan-options-to-http-test: test-lib: set UBSAN_OPTIONS to match ASan
2023-09-29Merge branch 'jc/alias-completion'Junio C Hamano
The command line completion script (in contrib/) can be told to complete aliases by including ": git <cmd> ;" in the alias to tell it that the alias should be completed similar to how "git <cmd>" is completed. The parsing code for the alias as been loosened to allow ';' without an extra space before it. * jc/alias-completion: completion: loosen and document the requirement around completing alias
2023-09-29Merge branch 'hy/doc-show-is-like-log-not-diff-tree'Junio C Hamano
Doc update. * hy/doc-show-is-like-log-not-diff-tree: show doc: redirect user to git log manual instead of git diff-tree
2023-09-29Merge branch 'kh/range-diff-notes'Junio C Hamano
"git range-diff --notes=foo" compared "log --notes=foo --notes" of the two ranges, instead of using just the specified notes tree. * kh/range-diff-notes: range-diff: treat notes like `log`
2023-09-29Merge branch 'ds/stat-name-width-configuration'Junio C Hamano
"git diff" learned diff.statNameWidth configuration variable, to give the default width for the name part in the "--stat" output. * ds/stat-name-width-configuration: diff --stat: add config option to limit filename width
2023-09-29Merge branch 'jk/fsmonitor-unused-parameter'Junio C Hamano
Unused parameters in fsmonitor related code paths have been marked as such. * jk/fsmonitor-unused-parameter: run-command: mark unused parameters in start_bg_wait callbacks fsmonitor: mark unused hashmap callback parameters fsmonitor/darwin: mark unused parameters in system callback fsmonitor: mark unused parameters in stub functions fsmonitor/win32: mark unused parameter in fsm_os__incompatible() fsmonitor: mark some maybe-unused parameters fsmonitor/win32: drop unused parameters fsmonitor: prefer repo_git_path() to git_pathdup()
2023-09-29Merge branch 'ml/git-gui-exec-path-fix'Junio C Hamano
Fix recent regression in Git-GUI that fails to run hook scripts at all. * ml/git-gui-exec-path-fix: git-gui - use git-hook, honor core.hooksPath git-gui - re-enable use of hook scripts
2023-09-28doc: correct the 50 characters soft limit谢致邦 (XIE Zhibang)
The soft limit of the first line of the commit message should be "no more than 50 characters" or "50 characters or less", but not "less than 50 character". Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-28commit-graph: report incomplete chains during verificationJeff King
The load_commit_graph_chain_fd_st() function will stop loading chains when it sees an error. But if it has loaded any graph slice at all, it will return it. This is a good thing for normal use (we use what data we can, and this is just an optimization). But it's a bad thing for "commit-graph verify", which should be careful about finding any irregularities. We do complain to stderr with a warning(), but the verify command still exits with a successful return code. The new tests here cover corruption of both the base and tip slices of the chain. The corruption of the base file already works (it is the first file we look at, so when we see the error we return NULL). The "tip" case is what is fixed by this patch (it complains to stderr but still returns the base slice). Likewise the existing tests for corruption of the commit-graph-chain file itself need to be updated. We already exited non-zero correctly for the "base" case, but the "tip" case can now do so, too. Note that this also causes us to adjust a test later in the file that similarly corrupts a tip (though confusingly the test script calls this "base"). It checks stderr but erroneously expects the whole "verify" command to exit with a successful code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-28commit-graph: tighten chain size checkJeff King
When we open a commit-graph-chain file, if it's smaller than a single entry, we just quietly treat that as ENOENT. That make some sense if the file is truly zero bytes, but it means that "commit-graph verify" will quietly ignore a file that contains garbage if that garbage happens to be short. Instead, let's only simulate ENOENT when the file is truly empty, and otherwise return EINVAL. The normal graph-loading routines don't care, but "commit-graph verify" will notice and complain about the difference. It's not entirely clear to me that the 0-is-ENOENT case actually happens in real life, so we could perhaps just eliminate this special-case altogether. But this is how we've always behaved, so I'm preserving it in the name of backwards compatibility (though again, it really only matters for "verify", as the regular routines are happy to load what they can). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-28commit-graph: detect read errors when verifying graph chainJeff King
Because it's OK to not have a graph file at all, the graph_verify() function needs to tell the difference between a missing file and a real error. So when loading a traditional graph file, we call open_commit_graph() separately from load_commit_graph_chain_fd_st(), and don't complain if the first one fails with ENOENT. When the function learned about chain files in 3da4b609bb (commit-graph: verify chains with --shallow mode, 2019-06-18), we couldn't be as careful, since the only way to load a chain was with read_commit_graph_one(), which did both the open/load as a single unit. So we'll miss errors in chain files we load, thinking instead that there was just no chain file at all. Note that we do still report some of these problems to stderr, as the loading function calls error() and warning(). But we'd exit with a successful exit code, which is wrong. We can fix that by using the recently split open/load functions for chains. That lets us treat the chain file just like a single file with respect to error handling here. An existing test (from 3da4b609bb) shows off the problem; we were expecting "commit-graph verify" to report success, but that makes no sense. We did not even verify the contents of the graph data, because we couldn't load it! I don't think this was an intentional exception, but rather just the test covering what happened to occur. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-28t5324: harmonize sha1/sha256 graph chain corruptionJeff King
In t5324.20, we corrupt a hex character 60 bytes into the graph chain file. Since the file consists of two hash identifiers, one per line, the corruption differs between sha1 and sha256. In a sha1 repository, the corruption is on the second line, and in a sha256 repository, it is on the first. We should of course detect the problem with either line. But as the next few patches will show (and fix), that is not the case (in fact, we currently do not exit non-zero for either line!). And while at the end of our series we'll catch all errors, our intermediate states will have differing behavior between the two hashes. Let's make sure we test corruption of both the first and second lines, and do so consistently with either hash by choosing offsets which are always in the first hash (30 bytes) or in the second (70). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-28commit-graph: check mixed generation validation when loading chain fileJeff King
In read_commit_graph_one(), we call validate_mixed_generation_chain() after loading the graph. Even though we don't check the return value, this has the side effect of clearing the read_generation_data flag, which is important when working with mixed generation numbers. But doing this in load_commit_graph_chain_fd_st() makes more sense: 1. We are calling it even when we did not load a chain at all, which is pointless (you cannot have mixed generations in a single file). 2. For now, all callers load the graph via read_commit_graph_one(). But the point of factoring out the open/load in the previous commit was to let "commit-graph verify" call them separately. So it needs to trigger this function as part of the load. Without this patch, the mixed-generation tests in t5324 would start failing on "git commit-graph verify" calls, once we switch to using a separate open/load call there. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-28commit-graph: factor out chain opening functionJeff King
The load_commit_graph_chain() function opens the chain file and all of the slices of graph that it points to. If there is no chain file (which is a totally normal condition), we return NULL. But if we run into errors with the chain file or loading the actual graph data, we also return NULL, and the caller cannot tell the difference. The caller can check for ENOENT for the unremarkable "no such file" case. But I'm hesitant to assume that the rest of the function would never accidentally set errno to ENOENT itself, since it is opening the slice files (and that would mean the caller fails to notice a real error). So let's break this into two functions: one to open the file, and one to actually load it. This matches the interface we provide for the non-chain graph file, which will also come in handy in a moment when we fix some bugs in the "git commit-graph verify" code. Some notes: - I've kept the "1 is good, 0 is bad" return convention (and the weird "fd" out-parameter) used by the matching open_commit_graph() function and other parts of the commit-graph code. This is unlike most of the rest of Git (which would just return the fd, with -1 for error), but it makes sense to stay consistent with the adjacent bits of the API here. - The existing chain loading function will quietly return if the file is too small to hold a single entry. I've retained that behavior (and explicitly set ENOENT in the opener function) for now, under the notion that it's probably valid (though I'd imagine unusual) to have an empty chain file. There are two small behavior changes here, but I think both are strictly positive: 1. The original blindly did a stat() before checking if fopen() succeeded, meaning we were making a pointless extra stat call. 2. We now use fstat() to check the file size. The previous code using a regular stat() on the pathname meant we could technically race with somebody updating the chain file, and end up with a size that does not match what we just opened with fopen(). I doubt anybody ever hit this in practice, but it may have caused an out-of-bounds read. We'll retain the load_commit_graph_chain() function which does both the open and reading steps (most existing callers do not care about seeing errors anyway, since loading commit-graphs is optimistic). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-26unicode: update the width tables to Unicode 15.1Beat Bolli
Unicode 15.1 has been announced on 2023-09-12 [0], so update the character width tables to the new version. [0] http://blog.unicode.org/2023/09/announcing-unicode-standard-version-151.html Signed-off-by: Beat Bolli <dev+git@drbeat.li> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-26ref-filter: add mailmap supportKousik Sanagavarapu
Add mailmap support to ref-filter formats which are similar in pretty. This support is such that the following pretty placeholders are equivalent to the new ref-filter atoms: %aN = authorname:mailmap %cN = committername:mailmap %aE = authoremail:mailmap %aL = authoremail:mailmap,localpart %cE = committeremail:mailmap %cL = committeremail:mailmap,localpart Additionally, mailmap can also be used with ":trim" option for email by doing something like "authoremail:mailmap,trim". The above also applies for the "tagger" atom, that is, "taggername:mailmap", "taggeremail:mailmap", "taggeremail:mailmap,trim" and "taggername:mailmap,localpart". The functionality of ":trim" and ":localpart" remains the same. That is, ":trim" gives the email, but without the angle brackets and ":localpart" gives the part of the email before the '@' character (if such a character is not found then we directly grab everything between the angle brackets). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-26t/t6300: introduce test_bad_atomKousik Sanagavarapu
Introduce a new function "test_bad_atom", which is similar to "test_atom()" but should be used to check whether the correct error message is shown on stderr. Like "test_atom", the new function takes three arguments. The three arguments specify the ref, the format and the expected error message respectively, with an optional fourth argument for tweaking "test_expect_*" (which is by default "success"). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-26t/t6300: cleanup test_atomKousik Sanagavarapu
Previously, when the executable part of "test_expect_{success,failure}" (inside "test_atom") got "eval"ed, it would have been syntactically incorrect if the second argument ($2, which is the format) to "test_atom" were enclosed in single quotes because the $variables would get interpolated even before the arguments to "test_expect_{success,failure}" are formed. So fix this and also some style issues along the way. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-25revision: make pseudo-opt flags read via stdin behave consistentlyPatrick Steinhardt
When reading revisions from stdin via git-rev-list(1)'s `--stdin` option then these revisions never honor flags like `--not` which have been passed on the command line. Thus, an invocation like e.g. `git rev-list --all --not --stdin` will not treat all revisions read from stdin as uninteresting. While this behaviour may be surprising to a user, it's been this way ever since it has been introduced via 42cabc341c4 (Teach rev-list an option to read revs from the standard input., 2006-09-05). With that said, in c40f0b7877 (revision: handle pseudo-opts in `--stdin` mode, 2023-06-15) we have introduced a new mode to read pseudo opts from standard input where this behaviour is a lot more confusing. If you pass `--not` via stdin, it will: - Influence subsequent revisions or pseudo-options passed on the command line. - Influence pseudo-options passed via standard input. - _Not_ influence normal revisions passed via standard input. This behaviour is extremely inconsistent and bound to cause confusion. While it would be nice to retroactively change the behaviour for how `--not` and `--stdin` behave together, chances are quite high that this would break existing scripts that expect the current behaviour that has been around for many years by now. This is thus not really a viable option to explore to fix the inconsistency. Instead, we change the behaviour of how pseudo-opts read via standard input influence the flags such that the effect is fully localized. With this change, when reading `--not` via standard input, it will: - _Not_ influence subsequent revisions or pseudo-options passed on the command line, which is a change in behaviour. - Influence pseudo-options passed via standard input. - Influence normal revisions passed via standard input, which is a change in behaviour. Thus, all flags read via standard input are fully self-contained to that standard input, only. While this is a breaking change as well, the behaviour has only been recently introduced with Git v2.42.0. Furthermore, the current behaviour can be regarded as a simple bug. With that in mind it feels like the right thing to retroactively change it and make the behaviour sane. Signed-off-by: Patrick Steinhardt <ps@pks.im> Reported-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-23The twelfth batchJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-23Merge branch 'tb/send-email-extract-valid-address-error-message-fix'Junio C Hamano
An error message given by "git send-email" when given a malformed address did not give correct information, which has been corrected. * tb/send-email-extract-valid-address-error-message-fix: git-send-email.perl: avoid printing undef when validating addresses
2023-09-23Merge branch 'ch/clean-docfix'Junio C Hamano
Typofix. * ch/clean-docfix: git-clean doc: fix "without do cleaning" typo