From 31deb28f5e0c85e8bd556ba135e5f0e0926bad7a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 1 Oct 2021 11:16:52 +0200 Subject: fsck: don't hard die on invalid object types MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Change the error fsck emits on invalid object types, such as: $ git hash-object --stdin -w -t garbage --literally From the very ungraceful error of: $ git fsck fatal: invalid object type $ To: $ git fsck error: : object is of unknown type 'garbage': [ other fsck output ] We'll still exit with non-zero, but now we'll finish the rest of the traversal. The tests that's being added here asserts that we'll still complain about other fsck issues (e.g. an unrelated dangling blob). To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag from read_loose_object() through to parse_loose_header(). Since the read_loose_object() function is only used in builtin/fsck.c we can simply change it to accept a "struct object_info" (which contains the OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13) for the introduction of read_loose_object(). Since we'll need a "struct strbuf" to hold the "type_name" let's pass it to the for_each_loose_file_in_objdir() callback to avoid allocating a new one for each loose object in the iteration. It also makes the memory management simpler than sticking it in fsck_loose() itself, as we'll only need to strbuf_reset() it, with no need to do a strbuf_release() before each "return". Before this commit we'd never check the "type" if read_loose_object() failed, but now we do. We therefore need to initialize it to OBJ_NONE to be able to tell the difference between e.g. its unpack_loose_header() having failed, and us getting past that and into parse_loose_header(). Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- object-store.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'object-store.h') diff --git a/object-store.h b/object-store.h index ec32c23dcb..3eb597a82a 100644 --- a/object-store.h +++ b/object-store.h @@ -236,6 +236,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime); /* * Open the loose object at path, check its hash, and return the contents, + * use the "oi" argument to assert things about the object, or e.g. populate its * type, and size. If the object is a blob, then "contents" may return NULL, * to allow streaming of large blobs. * @@ -243,9 +244,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime); */ int read_loose_object(const char *path, const struct object_id *expected_oid, - enum object_type *type, - unsigned long *size, - void **contents); + void **contents, + struct object_info *oi); /* Retry packed storage after checking packed and loose storage */ #define HAS_OBJECT_RECHECK_PACKED 1 -- cgit v1.2.3 From 96e41f58fe1a5aeadf2bf1c1850c53a1c1144bbc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 1 Oct 2021 11:16:53 +0200 Subject: fsck: report invalid object type-path combinations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Improve the error that's emitted in cases where we find a loose object we parse, but which isn't at the location we expect it to be. Before this change we'd prefix the error with a not-a-OID derived from the path at which the object was found, due to an emergent behavior in how we'd end up with an "OID" in these codepaths. Now we'll instead say what object we hashed, and what path it was found at. Before this patch series e.g.: $ git hash-object --stdin -w -t blob >objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ git fsck error: garbage at end of loose object 'e69d[...]' error: unable to unpack contents of ./objects/e6/9d[...] error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...] There is currently some weird messaging in the edge case when the two are combined, i.e. because we're not explicitly passing along an error state about this specific scenario from check_stream_oid() via read_loose_object() we'll end up printing the null OID if an object is of an unknown type *and* it can't be unpacked by zlib, e.g.: $ git hash-object --stdin -w -t garbage --literally >objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ /usr/bin/git fsck fatal: invalid object type $ ~/g/git/git fsck error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f' error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f [...] I think it's OK to leave that for future improvements, which would involve enum-ifying more error state as we've done with "enum unpack_loose_header_result" in preceding commits. In these increasingly more obscure cases the worst that can happen is that we'll get slightly nonsensical or inapplicable error messages. There's other such potential edge cases, all of which might produce some confusing messaging, but still be handled correctly as far as passing along errors goes. E.g. if check_object_signature() returns and oideq(real_oid, null_oid()) is true, which could happen if it returns -1 due to the read_istream() call having failed. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- object-store.h | 1 + 1 file changed, 1 insertion(+) (limited to 'object-store.h') diff --git a/object-store.h b/object-store.h index 3eb597a82a..6b9ffcffb2 100644 --- a/object-store.h +++ b/object-store.h @@ -244,6 +244,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime); */ int read_loose_object(const char *path, const struct object_id *expected_oid, + struct object_id *real_oid, void **contents, struct object_info *oi); -- cgit v1.2.3