Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2022-09-22 13:11:43 +0300
committerJunio C Hamano <gitster@pobox.com>2022-09-22 21:30:06 +0300
commitfbce4fa9ae4f010e509be288d8b113610e78781d (patch)
tree07ff62c1c626d3004233358f3b7d77f522cf5ed5 /builtin/fsck.c
parenta0feb8611d4c0b2b5d954efe4e98207f62223436 (diff)
fsck: free tree buffers after walking unreachable objects
After calling fsck_walk(), a tree object struct may be left in the parsed state, with the full tree contents available via tree->buffer. It's the responsibility of the caller to free these when it's done with the object to avoid having many trees allocated at once. In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which does call free_tree_buffer(). Likewise for "--connectivity-only", we see most objects via traverse_one_object(), which makes a similar call. The exception is in mark_unreachable_referents(). When using both "--connectivity-only" and "--dangling" (the latter of which is the default), we walk all of the unreachable objects, and there we forget to free. Most cases would not notice this, because they don't have a lot of unreachable objects, but you can make a pathological case like this: git clone --bare /path/to/linux.git repo.git cd repo.git rm packed-refs ;# now everything is unreachable! git fsck --connectivity-only That ends up with peak heap usage ~18GB, which is (not coincidentally) close to the size of all uncompressed trees in the repository. After this patch, the peak heap is only ~2GB. A few things to note: - it might seem like fsck_walk(), if it is parsing the trees, should be responsible for freeing them. But the situation is quite tricky. In the non-connectivity mode, after we call fsck_walk() we then proceed with fsck_object() which actually does the type-specific sanity checks on the object contents. We do pass our own separate buffer to fsck_object(), but there's a catch: our earlier call to parse_object_buffer() may have attached that buffer to the object struct! So by freeing it, we leave the rest of the code with a dangling pointer. Likewise, the call to fsck_walk() in index-pack is subtle. It attaches a buffer to the tree object that must not be freed! And so rather than calling free_tree_buffer(), it actually detaches it by setting tree->buffer to NULL. These cases would _probably_ be fixable by having fsck_walk() free the tree buffer only when it was the one who allocated it via parse_tree(). But that would still leave the callers responsible for freeing other cases, so they wouldn't be simplified. While the current semantics for fsck_walk() make it easy to accidentally leak in new callers, at least they are simple to explain, and it's not a function that's likely to get a lot of new call-sites. And in any case, it's probably sensible to fix the leak first with this simple patch, and try any more complicated refactoring separately. - a careful reader may notice that fsck_obj() also frees commit buffers, but neither the call in traverse_one_object() nor the one touched in this patch does so. And indeed, this is another problem for --connectivity-only (and accounts for most of the 2GB heap after this patch), but it's one we'll fix in a separate commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'builtin/fsck.c')
-rw-r--r--builtin/fsck.c2
1 files changed, 2 insertions, 0 deletions
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 6c73092f10..2119758ef4 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -228,6 +228,8 @@ static void mark_unreachable_referents(const struct object_id *oid)
options.walk = mark_used;
fsck_walk(obj, NULL, &options);
+ if (obj->type == OBJ_TREE)
+ free_tree_buffer((struct tree *)obj);
}
static int mark_loose_unreachable_referents(const struct object_id *oid,