Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2024-01-13 03:09:56 +0300
committerJunio C Hamano <gitster@pobox.com>2024-01-13 03:09:56 +0300
commit0fea6b73f1771469cc423288fcd754d374865400 (patch)
treeead5e63ecb0371557bf920fdd0e150d87219877b /Documentation
parent0ebbaa07d0c19b2ade6fed9660e10dcf834fc922 (diff)
parentba47d88795e12193e7b0fffc5130757a5517a5da (diff)
Merge branch 'tb/multi-pack-verbatim-reuse'
Streaming spans of packfile data used to be done only from a single, primary, pack in a repository with multiple packfiles. It has been extended to allow reuse from other packfiles, too. * tb/multi-pack-verbatim-reuse: (26 commits) t/perf: add performance tests for multi-pack reuse pack-bitmap: enable reuse from all bitmapped packs pack-objects: allow setting `pack.allowPackReuse` to "single" t/test-lib-functions.sh: implement `test_trace2_data` helper pack-objects: add tracing for various packfile metrics pack-bitmap: prepare to mark objects from multiple packs for reuse pack-revindex: implement `midx_pair_to_pack_pos()` pack-revindex: factor out `midx_key_to_pack_pos()` helper midx: implement `midx_preferred_pack()` git-compat-util.h: implement checked size_t to uint32_t conversion pack-objects: include number of packs reused in output pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse pack-objects: prepare `write_reused_pack()` for multi-pack reuse pack-objects: pass `bitmapped_pack`'s to pack-reuse functions pack-objects: keep track of `pack_start` for each reuse pack pack-objects: parameterize pack-reuse routines over a single pack pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature ewah: implement `bitmap_is_empty()` pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/config/pack.txt16
-rw-r--r--Documentation/gitformat-pack.txt76
2 files changed, 87 insertions, 5 deletions
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index f50df9dbce..9c630863e6 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -28,11 +28,17 @@ all existing objects. You can force recompression by passing the -F option
to linkgit:git-repack[1].
pack.allowPackReuse::
- When true, and when reachability bitmaps are enabled,
- pack-objects will try to send parts of the bitmapped packfile
- verbatim. This can reduce memory and CPU usage to serve fetches,
- but might result in sending a slightly larger pack. Defaults to
- true.
+ When true or "single", and when reachability bitmaps are
+ enabled, pack-objects will try to send parts of the bitmapped
+ packfile verbatim. When "multi", and when a multi-pack
+ reachability bitmap is available, pack-objects will try to send
+ parts of all packs in the MIDX.
++
+ If only a single pack bitmap is available, and
+ `pack.allowPackReuse` is set to "multi", reuse parts of just the
+ bitmapped packfile. This can reduce memory and CPU usage to
+ serve fetches, but might result in sending a slightly larger
+ pack. Defaults to true.
pack.island::
An extended regular expression configuring a set of delta
diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index 9fcb29a9c8..d6ae229be5 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -396,6 +396,15 @@ CHUNK DATA:
is padded at the end with between 0 and 3 NUL bytes to make the
chunk size a multiple of 4 bytes.
+ Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
+ Stores a table of two 4-byte unsigned integers in network order.
+ Each table entry corresponds to a single pack (in the order that
+ they appear above in the `PNAM` chunk). The values for each table
+ entry are as follows:
+ - The first bit position (in pseudo-pack order, see below) to
+ contain an object from that pack.
+ - The number of bits whose objects are selected from that pack.
+
OID Fanout (ID: {'O', 'I', 'D', 'F'})
The ith entry, F[i], stores the number of OIDs with first
byte at most i. Thus F[255] stores the total
@@ -509,6 +518,73 @@ packs arranged in MIDX order (with the preferred pack coming first).
The MIDX's reverse index is stored in the optional 'RIDX' chunk within
the MIDX itself.
+=== `BTMP` chunk
+
+The Bitmapped Packfiles (`BTMP`) chunk encodes additional information
+about the objects in the multi-pack index's reachability bitmap. Recall
+that objects from the MIDX are arranged in "pseudo-pack" order (see
+above) for reachability bitmaps.
+
+From the example above, suppose we have packs "a", "b", and "c", with
+10, 15, and 20 objects, respectively. In pseudo-pack order, those would
+be arranged as follows:
+
+ |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+When working with single-pack bitmaps (or, equivalently, multi-pack
+reachability bitmaps with a preferred pack), linkgit:git-pack-objects[1]
+performs ``verbatim'' reuse, attempting to reuse chunks of the bitmapped
+or preferred packfile instead of adding objects to the packing list.
+
+When a chunk of bytes is reused from an existing pack, any objects
+contained therein do not need to be added to the packing list, saving
+memory and CPU time. But a chunk from an existing packfile can only be
+reused when the following conditions are met:
+
+ - The chunk contains only objects which were requested by the caller
+ (i.e. does not contain any objects which the caller didn't ask for
+ explicitly or implicitly).
+
+ - All objects stored in non-thin packs as offset- or reference-deltas
+ also include their base object in the resulting pack.
+
+The `BTMP` chunk encodes the necessary information in order to implement
+multi-pack reuse over a set of packfiles as described above.
+Specifically, the `BTMP` chunk encodes three pieces of information (all
+32-bit unsigned integers in network byte-order) for each packfile `p`
+that is stored in the MIDX, as follows:
+
+`bitmap_pos`:: The first bit position (in pseudo-pack order) in the
+ multi-pack index's reachability bitmap occupied by an object from `p`.
+
+`bitmap_nr`:: The number of bit positions (including the one at
+ `bitmap_pos`) that encode objects from that pack `p`.
+
+For example, the `BTMP` chunk corresponding to the above example (with
+packs ``a'', ``b'', and ``c'') would look like:
+
+[cols="1,2,2"]
+|===
+| |`bitmap_pos` |`bitmap_nr`
+
+|packfile ``a''
+|`0`
+|`10`
+
+|packfile ``b''
+|`10`
+|`15`
+
+|packfile ``c''
+|`25`
+|`20`
+|===
+
+With this information in place, we can treat each packfile as
+individually reusable in the same fashion as verbatim pack reuse is
+performed on individual packs prior to the implementation of the `BTMP`
+chunk.
+
== cruft packs
The cruft packs feature offer an alternative to Git's traditional mechanism of