diff options
author | Jeff King <peff@peff.net> | 2022-09-22 13:11:43 +0300 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2022-09-22 21:30:06 +0300 |
commit | fbce4fa9ae4f010e509be288d8b113610e78781d (patch) | |
tree | 07ff62c1c626d3004233358f3b7d77f522cf5ed5 /builtin/fsck.c | |
parent | a0feb8611d4c0b2b5d954efe4e98207f62223436 (diff) |
fsck: free tree buffers after walking unreachable objects
After calling fsck_walk(), a tree object struct may be left in the
parsed state, with the full tree contents available via tree->buffer.
It's the responsibility of the caller to free these when it's done with
the object to avoid having many trees allocated at once.
In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which
does call free_tree_buffer(). Likewise for "--connectivity-only", we see
most objects via traverse_one_object(), which makes a similar call.
The exception is in mark_unreachable_referents(). When using both
"--connectivity-only" and "--dangling" (the latter of which is the
default), we walk all of the unreachable objects, and there we forget to
free. Most cases would not notice this, because they don't have a lot of
unreachable objects, but you can make a pathological case like this:
git clone --bare /path/to/linux.git repo.git
cd repo.git
rm packed-refs ;# now everything is unreachable!
git fsck --connectivity-only
That ends up with peak heap usage ~18GB, which is (not coincidentally)
close to the size of all uncompressed trees in the repository. After
this patch, the peak heap is only ~2GB.
A few things to note:
- it might seem like fsck_walk(), if it is parsing the trees, should
be responsible for freeing them. But the situation is quite tricky.
In the non-connectivity mode, after we call fsck_walk() we then
proceed with fsck_object() which actually does the type-specific
sanity checks on the object contents. We do pass our own separate
buffer to fsck_object(), but there's a catch: our earlier call to
parse_object_buffer() may have attached that buffer to the object
struct! So by freeing it, we leave the rest of the code with a
dangling pointer.
Likewise, the call to fsck_walk() in index-pack is subtle. It
attaches a buffer to the tree object that must not be freed! And
so rather than calling free_tree_buffer(), it actually detaches it
by setting tree->buffer to NULL.
These cases would _probably_ be fixable by having fsck_walk() free
the tree buffer only when it was the one who allocated it via
parse_tree(). But that would still leave the callers responsible for
freeing other cases, so they wouldn't be simplified. While the
current semantics for fsck_walk() make it easy to accidentally leak
in new callers, at least they are simple to explain, and it's not a
function that's likely to get a lot of new call-sites.
And in any case, it's probably sensible to fix the leak first with
this simple patch, and try any more complicated refactoring
separately.
- a careful reader may notice that fsck_obj() also frees commit
buffers, but neither the call in traverse_one_object() nor the one
touched in this patch does so. And indeed, this is another problem
for --connectivity-only (and accounts for most of the 2GB heap after
this patch), but it's one we'll fix in a separate commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'builtin/fsck.c')
-rw-r--r-- | builtin/fsck.c | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/builtin/fsck.c b/builtin/fsck.c index 6c73092f10..2119758ef4 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -228,6 +228,8 @@ static void mark_unreachable_referents(const struct object_id *oid) options.walk = mark_used; fsck_walk(obj, NULL, &options); + if (obj->type == OBJ_TREE) + free_tree_buffer((struct tree *)obj); } static int mark_loose_unreachable_referents(const struct object_id *oid, |