diff options
author | Derrick Stolee <derrickstolee@github.com> | 2022-07-19 18:26:06 +0300 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2022-07-19 18:38:17 +0300 |
commit | 068fa54c0034be0b953d798628b06815f9cfaff0 (patch) | |
tree | 0c150de0ca27ce1bba559829e887f942e221e17e /midx.c | |
parent | 90b2bb710da47f12967a6f6a32640c197ca82685 (diff) |
midx: reduce memory pressure while writing bitmaps
We noticed that some 'git multi-pack-index write --bitmap' processes
were running with very high memory. It turns out that a lot of this
memory is required to store a list of every object in the written
multi-pack-index, with a second copy that has additional information
used for the bitmap writing logic.
Using 'valgrind --tool=massif' before this change, the following chart
shows how memory load increased and was maintained throughout the
process:
GB
4.102^ ::
| @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
| :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
0 +--------------------------------------------------------------->
It turns out that the 'struct write_midx_context' data is persisting
through the life of the process, including the 'entries' array. This
array is used last inside find_commits_for_midx_bitmap() within
write_midx_bitmap(). If we free (and nullify) the array at that point,
we can free a decent chunk of memory before the bitmap logic adds more
to the memory footprint.
Here is the massif memory load chart after this change:
GB
3.111^ #
| # :::::::::::@::::::::::::::@
| # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
| @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
0 +--------------------------------------------------------------->
The previous change introduced a refactoring of write_midx_bitmap() to
make it more clear how much of the 'struct write_midx_context' instance
is needed at different parts of the process. In addition, the following
defensive programming measures were put in place:
1. Using FREE_AND_NULL() we will at least get a segfault from reading a
NULL pointer instead of a use-after-free.
2. 'entries_nr' is also set to zero to make any loop that would iterate
over the entries be trivial.
3. Add significant comments in write_midx_internal() to add warnings
for future authors who might accidentally add references to this
cleared memory.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'midx.c')
-rw-r--r-- | midx.c | 13 |
1 files changed, 13 insertions, 0 deletions
@@ -1451,6 +1451,15 @@ static int write_midx_internal(const char *object_dir, commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx); + /* + * The previous steps translated the information from + * 'entries' into information suitable for constructing + * bitmaps. We no longer need that array, so clear it to + * reduce memory pressure. + */ + FREE_AND_NULL(ctx.entries); + ctx.entries_nr = 0; + if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata, commits, commits_nr, ctx.pack_order, refs_snapshot, flags) < 0) { @@ -1459,6 +1468,10 @@ static int write_midx_internal(const char *object_dir, goto cleanup; } } + /* + * NOTE: Do not use ctx.entries beyond this point, since it might + * have been freed in the previous if block. + */ if (ctx.m) close_object_store(the_repository->objects); |