fsck: require an actual buffer for non-blobs

The fsck_object() function takes in a buffer, but also a "struct object". The rules for using these vary between types: - for a commit, we'll use the provided buffer; if it's NULL, we'll fall back to get_commit_buffer(), which loads from either an in-memory cache or from disk. If the latter fails, we'd die(), which is non-ideal for fsck. - for a tag, a NULL buffer will fall back to loading the object from disk (and failure would lead to an fsck error) - for a tree, we _never_ look at the provided buffer, and always use tree->buffer - for a blob, we usually don't look at the buffer at all, unless it has been marked as a .gitmodule file. In that case we check the buffer given to us, or assume a NULL buffer is a very large blob (and complain about it) This is much more complex than it needs to be. It turns out that nobody ever feeds a NULL buffer that isn't a blob: - git-fsck calls fsck_object() only from fsck_obj(). That in turn is called by one of: - fsck_obj_buffer(), which is a callback to verify_pack(), which unpacks everything except large blobs into a buffer (see pack-check.c, lines 131-141). - fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer (builtin/fsck.c, lines 639-640) And in either case, we'll have just called parse_object_buffer() anyway, which would segfault on a NULL buffer for commits or tags (not for trees, but it would install a NULL tree->buffer which would later cause a segfault) - git-index-pack asserts that the buffer is non-NULL unless the object is a blob (see builtin/index-pack.c, line 832) - git-unpack-objects always writes a non-NULL buffer into its obj_buffer hash, which is then fed to fsck_object(). (There is actually a funny thing here where it does not store blob buffers at all, nor does it call fsck on them; it does check any needed blobs via fsck_finish() though). Let's make the rules simpler, which reduces the amount of code and gives us more flexibility in refactoring the fsck code. The new rules are: - only blobs are allowed to pass a NULL buffer - we always use the provided buffer, never pulling information from the object struct We don't have to adjust any callers, because they were already adhering to these. Note that we do drop a few fsck identifiers for missing tags, but that was all dead code (because nobody passed a NULL tag buffer). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Jeff King <peff@peff.net> 2019-10-18 07:54:12 +0300
committer: Junio C Hamano <gitster@pobox.com> 2019-10-28 08:05:17 +0300
commit: 23a173a761c9ed9a1e90167386e8b908728f27c0 (patch)
tree: 9295c1f246487f152f0b6fa5a0779fb0fc631279 /fsck.h
parent: 2175a0c601af269c7aa335bc7faf27e36173ca08 (diff)
1 files changed, 5 insertions, 1 deletions
diff --git a/fsck.h b/fsck.h
index b95595ae5f..e479461075 100644
--- a/fsck.h
+++ b/fsck.h
@@ -52,7 +52,11 @@ struct fsck_options {
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
-/* If NULL is passed for data, we assume the object is local and read it. */
+
+/*
+ * Blob objects my pass a NULL data pointer, which indicates they are too large
+ * to fit in memory. All other types must pass a real buffer.
+ */
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
author	Jeff King <peff@peff.net>	2019-10-18 07:54:12 +0300
committer	Junio C Hamano <gitster@pobox.com>	2019-10-28 08:05:17 +0300
commit	23a173a761c9ed9a1e90167386e8b908728f27c0 (patch)
tree	9295c1f246487f152f0b6fa5a0779fb0fc631279 /fsck.h
parent	2175a0c601af269c7aa335bc7faf27e36173ca08 (diff)