Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-09-26pack-bitmap: remove trace2 region from hot pathDerrick Stolee
The trace2 region around the call to lazy_bitmap_for_commit() in bitmap_for_commit() was added in 28cd730680d (pack-bitmap: prepare to read lookup table extension, 2022-08-14). While adding trace2 regions is typically helpful for tracking performance, this method is called possibly thousands of times as a commit walk explores commit history looking for a matching bitmap. When trace2 output is enabled, this region is emitted many times and performance is throttled by that output. For now, remove these regions entirely. This is a critical path, and it would be valuable to measure that the time spent in bitmap_for_commit() does not increase when using the commit lookup table. The best way to do that would be to use a mechanism that sums the time spent in a region and reports a single value at the end of the process. This technique was introduced but not merged by [1] so maybe this example presents some justification to revisit that approach. [1] https://lore.kernel.org/git/pull.1099.v2.git.1640720202.gitgitgadget@gmail.com/ To help with the 'git blame' output in this region, add a comment that warns against adding a trace2 region. Delete a test from t5310 that used that trace output to check that this lookup optimization was activated. To create this kind of test again in the future, the stopwatch traces mentioned earlier could be used as a signal that we activated this code path. Helpedy-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26pack-bitmap: prepare to read lookup table extensionAbhradeep Chakraborty
Earlier change teaches Git to write bitmap lookup table. But Git does not know how to parse them. Teach Git to parse the existing bitmap lookup table. The older versions of Git are not affected by it. Those versions ignore the lookup table. Mentored-by: Taylor Blau <me@ttaylorr.com> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26pack-bitmap-write: learn pack.writeBitmapLookupTable and add testsAbhradeep Chakraborty
Teach Git to provide a way for users to enable/disable bitmap lookup table extension by providing a config option named 'writeBitmapLookupTable'. Default is false. Also add test to verify writting of lookup table. Mentored-by: Taylor Blau <me@ttaylorr.com> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Co-Authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27pack-bitmap.c: gracefully fallback after opening pack/MIDXTaylor Blau
When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs. By design, these functions are supposed to be called over every pack and MIDX, since only one of them should have a valid bitmap. Ordinarily we return '0' from these two functions in order to indicate that we successfully loaded a bitmap To signal that we couldn't load a bitmap corresponding to the MIDX/pack (either because one doesn't exist, or because there was an error with loading it), we can return '-1'. In either case, the callers each enumerate all MIDXs/packs to ensure that at most one bitmap per-kind is present. But when we fail to load a bitmap that does exist (for example, loading a MIDX bitmap without finding a corresponding reverse index), we'll return -1 but leave the 'midx' field non-NULL. So when we fallback to loading a pack bitmap, we'll complain that the bitmap we're trying to populate already is "opened", even though it isn't. Rectify this by setting the '->pack' and '->midx' field back to NULL as appropriate. Two tests are added: one to ensure that the MIDX-to-pack bitmap fallback works, and another to ensure we still complain when there are multiple pack bitmaps in a repository. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-04Merge branch 'es/test-chain-lint'Junio C Hamano
Broken &&-chains in the test scripts have been corrected. * es/test-chain-lint: t6000-t9999: detect and signal failure within loop t5000-t5999: detect and signal failure within loop t4000-t4999: detect and signal failure within loop t0000-t3999: detect and signal failure within loop tests: simplify by dropping unnecessary `for` loops tests: apply modern idiom for exiting loop upon failure tests: apply modern idiom for signaling test failure tests: fix broken &&-chains in `{...}` groups tests: fix broken &&-chains in `$(...)` command substitutions tests: fix broken &&-chains in compound statements tests: use test_write_lines() to generate line-oriented output tests: simplify construction of large blocks of text t9107: use shell parameter expansion to avoid breaking &&-chain t6300: make `%(raw:size) --shell` test more robust t5516: drop unnecessary subshell and command invocation t4202: clarify intent by creating expected content less cleverly t1020: avoid aborting entire test script when one test fails t1010: fix unnoticed failure on Windows t/lib-pager: use sane_unset() to avoid breaking &&-chain
2021-12-15Merge branch 'js/test-initial-branch-override-cleanup'Junio C Hamano
Many tests that used to need GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME mechanism to force "git" to use 'master' as the default name for the initial branch no longer need it; the use of the mechanism from them have been removed. * js/test-initial-branch-override-cleanup: tests: set GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME only when needed
2021-12-13t5000-t5999: detect and signal failure within loopEric Sunshine
Failures within `for` and `while` loops can go unnoticed if not detected and signaled manually since the loop itself does not abort when a contained command fails, nor will a failure necessarily be detected when the loop finishes since the loop returns the exit code of the last command it ran on the final iteration, which may not be the command which failed. Therefore, detect and signal failures manually within loops using the idiom `|| return 1` (or `|| exit 1` within subshells). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-05tests: set GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME only when neededJohannes Schindelin
A couple of test scripts have actually been adapted to accommodate for a configurable default branch name, but they still overrode it via the `GIT_TEST_*` variable. Let's drop that override where possible. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-29t/t*: remove custom GIT_TRACE2_EVENT_NESTINGDerrick Stolee
The previous change modified GIT_TRACE2_EVENT_NESTING by default within test-lib.sh. These custom assignments throughout the test suite are no longer necessary. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-01Merge branch 'ab/test-lib'Junio C Hamano
Test (cosmetic) fix. * ab/test-lib: t5310: drop lib-bundle.sh include
2021-10-29t5310: drop lib-bundle.sh includeJeff King
Commit ddfe900612 (test-lib-functions: move function to lib-bitmap.sh, 2021-02-09) meant to include lib-bitmap.sh in t5310, but also includes lib-bundle.sh. Yet we don't use any of its functions, nor have anything to do with bundles. This is probably just a typo/copy-paste error, as lib-bundle.sh was added (correctly) to other scripts in the same series. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAPJeff King
Generating a MIDX bitmap confuses many of the tests in t5310, which expect to control whether and how bitmaps are written. Since the relevant MIDX-bitmap tests here are covered already in t5326, let's just disable the flag for the whole t5310 script. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t5310: move some tests to lib-bitmap.shTaylor Blau
We'll soon be adding a test script that will cover many of the same bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many of the applicable tests as we can so we don't have to rewrite them. There should be no functional change to t5310; we still run the same operations in the same order. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-21Merge branch 'jk/pack-objects-bitmap-progress-fix'Junio C Hamano
When "git pack-objects" makes a literal copy of a part of existing packfile using the reachability bitmaps, its update to the progress meter was broken. * jk/pack-objects-bitmap-progress-fix: pack-objects: update "nr_seen" progress based on pack-reused count
2021-04-12pack-objects: update "nr_seen" progress based on pack-reused countJeff King
When serving a clone or fetch with bitmaps, after deciding which objects need to be sent our "pack reuse" mechanism kicks in: we try to send more-or-less verbatim a bunch of objects from the beginning of the bitmapped packfile without even adding them to the to_pack.objects array. After deciding which objects will be in the "reused" portion, we update nr_result to account for those, and then trigger display_progress() to show the user (who is undoubtedly dazzled that we managed to enumerate so many objects so quickly). But then something confusing happens: the "Enumerating objects" progress meter jumps _backwards_, counting up from zero the number of objects we actually add into to_pack.objects. This worked correctly once upon a time, but was broken in 5af050437a (pack-objects: show some progress when counting kept objects, 2018-04-15), when the latter half of that progress meter switched to using a separate nr_seen counter, rather than nr_result. Nobody noticed for two reasons: - prior to the pack-reuse fixes from a14aebeac3 (Merge branch 'jk/packfile-reuse-cleanup', 2020-02-14), the reuse code almost never kicked in anyway - the output looks _kind of_ correct. The "backwards" moment is hard to catch, because we overwrite the old progress number with the new one, and the larger number is displayed only for a second. So unless you look at that exact second, you just see the much smaller value, counting up to the number of non-reused objects (though of course if you catch it in stderr, or look at GIT_TRACE_PACKET from a server with bitmaps, you can see both values). This smaller output isn't wrong per se, but isn't counting what we ever intended to. We should give the user the whole number of objects we considered (which, as per 5af050437a's original purpose, is already _not_ a count of what goes into to_pack.objects). The follow-on "Counting objects" meter shows the actual number of objects we feed into that array. We can easily fix this by bumping (and showing) nr_seen for the pack-reused objects. When the included test is run without this patch, the second pack-objects invocation produces "Enumerating objects: 1" to show the one loose object, even though the resulting pack has hundreds of objects in it. With it, we jump to "Enumerating objects: 674" after deciding on reuse, and then "675" when we add in the loose object. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-01builtin/pack-objects.c: respect 'pack.preferBitmapTips'Taylor Blau
When writing a new pack with a bitmap, it is sometimes convenient to indicate some reference prefixes which should receive priority when selecting which commits to receive bitmaps. A truly motivated caller could accomplish this by setting 'pack.islandCore', (since all commits in the core island are similarly marked as preferred) but this requires callers to opt into using delta islands, which they may or may not want to do. Introduce a new multi-valued configuration, 'pack.preferBitmapTips' to allow callers to specify a list of reference prefixes. All references which have a prefix contained in 'pack.preferBitmapTips' will mark their tips as "preferred" in the same way as commits are marked as preferred for selection by 'pack.islandCore'. The choice of the verb "prefer" is intentional: marking the NEEDS_BITMAP flag on an object does *not* guarantee that that object will receive a bitmap. It merely guarantees that that commit will receive a bitmap over any *other* commit in the same window by bitmap_writer_select_commits(). The test this patch adds reflects this quirk, too. It only tests that a commit (which didn't receive bitmaps by default) is selected for bitmaps after changing the value of 'pack.preferBitmapTips' to include it. Other commits may lose their bitmaps as a byproduct of how the selection process works (bitmap_writer_select_commits() ignores the remainder of a window after seeing a commit with the NEEDS_BITMAP flag). This configuration will aide in selecting important references for multi-pack bitmaps, since they do not respect the same pack.islandCore configuration. (They could, but doing so may be confusing, since it is packs--not bitmaps--which are influenced by the delta-islands configuration). In a fork network repository (one which lists all forks of a given repository as remotes), for example, it is useful to set pack.preferBitmapTips to 'refs/remotes/<root>/heads' and 'refs/remotes/<root>/tags', where '<root>' is an opaque identifier referring to the repository which is at the base of the fork chain. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11test-lib-functions: move function to lib-bitmap.shÆvar Arnfjörð Bjarmason
Move a function added to test-lib-functions.sh in ea047a8eb4f (t5310: factor out bitmap traversal comparison, 2020-02-14) into a new lib-bitmap.sh. The test-lib-functions.sh file should be for functions that are widely used across the test suite, if something's only used by a few tests it makes more sense to have it in a lib-*.sh file. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-26Merge branch 'js/default-branch-name-tests-final-stretch'Junio C Hamano
Prepare tests not to be affected by the name of the default branch "git init" creates. * js/default-branch-name-tests-final-stretch: (28 commits) tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed t99*: adjust the references to the default branch name "main" tests(git-p4): transition to the default branch name `main` t9[5-7]*: adjust the references to the default branch name "main" t9[0-4]*: adjust the references to the default branch name "main" t8*: adjust the references to the default branch name "main" t7[5-9]*: adjust the references to the default branch name "main" t7[0-4]*: adjust the references to the default branch name "main" t6[4-9]*: adjust the references to the default branch name "main" t64*: preemptively adjust alignment to prepare for `master` -> `main` t6[0-3]*: adjust the references to the default branch name "main" t5[6-9]*: adjust the references to the default branch name "main" t55[4-9]*: adjust the references to the default branch name "main" t55[23]*: adjust the references to the default branch name "main" t551*: adjust the references to the default branch name "main" t550*: adjust the references to the default branch name "main" t5503: prepare aligned comment for replacing `master` with `main` t5[0-4]*: adjust the references to the default branch name "main" t5323: prepare centered comment for `master` -> `main` t4*: adjust the references to the default branch name "main" ...
2021-01-07Merge branch 'tb/pack-bitmap'Junio C Hamano
Various improvements to the codepath that writes out pack bitmaps. * tb/pack-bitmap: (24 commits) pack-bitmap-write: better reuse bitmaps pack-bitmap-write: relax unique revwalk condition pack-bitmap-write: use existing bitmaps pack-bitmap: factor out 'add_commit_to_bitmap()' pack-bitmap: factor out 'bitmap_for_commit()' pack-bitmap-write: ignore BITMAP_FLAG_REUSE pack-bitmap-write: build fewer intermediate bitmaps pack-bitmap.c: check reads more aggressively when loading pack-bitmap-write: rename children to reverse_edges t5310: add branch-based checks commit: implement commit_list_contains() bitmap: implement bitmap_is_subset() pack-bitmap-write: fill bitmap with commit history pack-bitmap-write: pass ownership of intermediate bitmaps pack-bitmap-write: reimplement bitmap writing ewah: add bitmap_dup() function ewah: implement bitmap_or() ewah: make bitmap growth less aggressive ewah: factor out bitmap growth rev-list: die when --test-bitmap detects a mismatch ...
2020-12-09pack-bitmap-write: relax unique revwalk conditionDerrick Stolee
The previous commits improved the bitmap computation process for very long, linear histories with many refs by removing quadratic growth in how many objects were walked. The strategy of computing "intermediate commits" using bitmasks for which refs can reach those commits partitioned the poset of reachable objects so each part could be walked exactly once. This was effective for linear histories. However, there was a (significant) drawback: wide histories with many refs had an explosion of memory costs to compute the commit bitmasks during the exploration that discovers these intermediate commits. Since these wide histories are unlikely to repeat walking objects, the benefit of walking objects multiple times was not expensive before. But now, the commit walk *before computing bitmaps* is incredibly expensive. In an effort to discover a happy medium, this change reduces the walk for intermediate commits to only the first-parent history. This focuses the walk on how the histories converge, which still has significant reduction in repeat object walks. It is still possible to create quadratic behavior in this version, but it is probably less likely in realistic data shapes. Here is some data taken on a fresh clone of the kernel: | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- original | 64.044 | 83.241 | 2.088 | 2.194 | last patch | 45.049 | 37.624 | 2.267 | 2.334 | this patch | 88.478 | 53.218 | 2.157 | 2.224 | Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-09pack-bitmap-write: build fewer intermediate bitmapsDerrick Stolee
The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph. We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process. The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit. The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C." As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C." MAXIMAL COMMITS --------------- We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not. The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example. Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks. Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks: * 100: X - (Y union Z) * 010: Y - (X union Z) * 001: Z - (X union Y) * 110: (X intersect Y) - Z * 101: (X intersect Z) - Y * 011: (Y intersect Z) - X * 111: X intersect Y intersect Z This can be visualized with the following Hasse diagram: 100 010 001 | \ / \ / | | \/ \/ | | /\ /\ | | / \ / \ | 110 101 011 \___ | ___/ \ | / 111 Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history. With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking? As we walk, we visit a commit C. Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this. Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values: * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P. * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for P. If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P. We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits. If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information." In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C. The final thing we need to do is to update the reverse edges for P. These reverse edges respresent "which closest maximal commits contributed bits to my commit_mask?" Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P. If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list. Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here. After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph. Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C. COMPUTING REACHABILITY BITMAPS ------------------------------ Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference. In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101). The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits? MULTIPLE MAXIMAL COMMITS ------------------------ Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order. Here is an example of four commits forming a butterfly: I J |\ /| | \/ | | /\ | |/ \| M N \ / |/ Q Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11. Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N. While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps. In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies. The solution has already been mentioned: each of M and N are assigned their own bits to the bitmask and hence they become uniquely maximal for their bitmasks. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows: I:10 J:01 |\ /| | \ _____/ | | /\____ | |/ \ | M:111 N:1101 \ / Q:1111 Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }. PERFORMANCE MEASUREMENTS ------------------------ Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort. To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers. Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits. Repository: git/git multi: 2.628 sec reverse: 2.344 sec maximal: 2.047 sec Repository: torvalds/linux multi: 64.7 sec reverse: 205.3 sec maximal: 44.7 sec So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we come out ahead of "multi" in all cases. Likewise, peak heap has gone back to something reasonable: Repository: torvalds/linux multi: 2.087 GB reverse: 3.141 GB maximal: 2.288 GB While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-09t5310: add branch-based checksDerrick Stolee
The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-09t5310: drop size of truncated ewah bitmapJeff King
We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file. Signed-off-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-09pack-bitmap: bounds-check size of cache extensionJeff King
A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-20tests: mark tests relying on the current default for `init.defaultBranch`Johannes Schindelin
In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16t5310-pack-bitmaps: skip JGit tests with SHA256SZEDER Gábor
In 't5310-pack-bitmaps.sh' two tests make sure that our pack bitmaps are compatible with JGit's bitmaps. Alas, not even the most recent JGit version (5.9.0.202009080501-r) supports SHA256 yet, so when this test script is run with GIT_TEST_DEFAULT_HASH=sha256 on a setup with JGit installed in PATH, then these two tests fail. Protect these two tests with the SHA1 prereq in order to skip them when testing with SHA256. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14pack-objects: support filters with bitmapsJeff King
Just as rev-list recently learned to combine filters and bitmaps, let's do the same for pack-objects. The infrastructure is all there; we just need to pass along our filter options, and the pack-bitmap code will decide to use bitmaps or not. This unsurprisingly makes things faster for partial clones of large repositories (here we're cloning linux.git): Test HEAD^ HEAD ------------------------------------------------------------------------------ 5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14rev-list: allow commit-only bitmap traversalsJeff King
Ever since we added reachability bitmap support, we've been able to use it with rev-list to get the full list of objects, like: git rev-list --objects --use-bitmap-index --all But you can't do so without --objects, since we weren't ready to just show the commits. However, the internals of the bitmap code are mostly ready for this: they avoid opening up trees when walking to fill in the bitmaps. We just need to actually pass in the rev_info to traverse_bitmap_commit_list() so it knows which types to bother triggering our callback for. For completeness, the perf test now covers both the existing --objects case, as well as the new commits-only behavior (the objects one got way faster when we introduced bitmaps, but obviously isn't improved now). Here are numbers for linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8% 5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0% That run was cheating a little, as I didn't have any commit-graph in the repository, and we'd built it by default these days when running git-gc. Here are numbers with a commit-graph: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1% 5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1% Still an improvement, but a lot less impressive. We could have the perf script remove any commit-graph to show the out-sized effect, but it probably makes sense to leave it in what would be a more typical setup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14t5310: factor out bitmap traversal comparisonJeff King
We check the results of "rev-list --use-bitmap-index" by comparing it to the same traversal without the bitmap option. However, this is a little tricky to do because of some output differences (see the included comment for details). Let's pull this out into a helper function, since we'll be adding some similar tests. While we're at it, let's also try to confirm that the bitmap output did indeed use bitmaps. Since the code internally falls back to the non-bitmap path in some cases, the tests are at risk of becoming trivial noops. This is a bit fragile, as not all outputs will differ (e.g., looking at only the commits from a fully-bitmapped pack will end up exactly the same as the normal traversal order, since it also matches the pack order). So we'll provide an escape hatch by which tests can disable this check (which should only be used after manually confirming that bitmaps kicked in). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14rev-list: allow bitmaps when counting objectsJeff King
The prior commit taught "--count --objects" to work without bitmaps. We should be able to get the same answer much more quickly with bitmaps. Note that we punt on the max_count case here. This perhaps _could_ be made to work if we find all of the boundary commits and treat them as UNINTERESTING, subtracting them (and their reachable objects) from the set we return. That implies an actual commit traversal, but we'd still be faster due to avoiding opening up any trees. Given the complexity and the fact that anyone is unlikely to want this, it makes sense to just fall back to the non-bitmap case for now. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-02t5310: increase the number of bitmapped commitsJeff King
The bitmap index we compute in t5310 has only 20 commits in it. This gives poor coverage of bitmap_writer_select_commits(), which simply writes a bitmap for everything when there are fewer than 100 commits. Let's bump the number of commits in the test to cover the more complex code paths (this does drop coverage of the individual lines of the trivial path, but the complex path does everything it does and more). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-02test-lib: introduce test_commit_bulkJeff King
Some tests need to create a string of commits. Doing this with test_commit is very heavy-weight, as it needs at least one process per commit (and in fact, uses several). For bulk creation, we can do much better by using fast-import, but it's often a pain to generate the input. Let's provide a helper to do so. We'll use t5310 as a guinea pig, as it has three 10-commit loops. Here are hyperfine results before and after: [before] Benchmark #1: ./t5310-pack-bitmaps.sh --root=/var/ram/git-tests Time (mean ± σ): 2.846 s ± 0.305 s [User: 3.042 s, System: 0.919 s] Range (min … max): 2.250 s … 3.210 s 10 runs [after] Benchmark #1: ./t5310-pack-bitmaps.sh --root=/var/ram/git-tests Time (mean ± σ): 2.210 s ± 0.174 s [User: 2.570 s, System: 0.604 s] Range (min … max): 1.999 s … 2.590 s 10 runs So we're over 20% faster, while making the callers slightly shorter. We added a lot more lines in test-lib-function.sh, of course, and the helper is way more featureful than we need here. But my hope is that it will be flexible enough to use in more places. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-18pack-objects: default to writing bitmap hash-cacheJeff King
Enabling pack.writebitmaphashcache should always be a performance win. It costs only 4 bytes per object on disk, and the timings in ae4f07fbcc (pack-bitmap: implement optional name_hash cache, 2013-12-21) show it improving fetch and partial-bitmap clone times by 40-50%. The only reason we didn't enable it by default at the time is that early versions of JGit's bitmap reader complained about the presence of optional header bits it didn't understand. But that was changed in JGit's d2fa3987a (Use bitcheck to check for presence of OPT_FULL option, 2013-10-30), which made it into JGit v3.5.0 in late 2014. So let's turn this option on by default. It's backwards-compatible with all versions of Git, and if you are also using JGit on the same repository, you'd only run into problems using a version that's almost 5 years old. We'll drop the manual setting from all of our test scripts, including perf tests. This isn't strictly necessary, but it has two advantages: 1. If the hash-cache ever stops being enabled by default, our perf regression tests will notice. 2. We can use the modified perf tests to show off the behavior of an otherwise unconfigured repo, as shown below. These are the results of a few of a perf tests against linux.git that showed interesting results. You can see the expected speedup in 5310.4, which was noted in ae4f07fbcc. Curiously, 5310.8 did not improve (and actually got slower), despite seeing the opposite in ae4f07fbcc. I don't have an explanation for that. The tests from p5311 did not exist back then, but do show improvements (a smaller pack due to better deltas, which we found in less time). Test HEAD^ HEAD ------------------------------------------------------------------------------------- 5310.4: simulated fetch 7.39(22.70+0.25) 5.64(11.43+0.22) -23.7% 5310.8: clone (partial bitmap) 18.45(24.83+1.19) 19.94(28.40+1.36) +8.1% 5311.31: server (128 days) 0.41(1.13+0.05) 0.34(0.72+0.02) -17.1% 5311.32: size (128 days) 7.4M 7.0M -4.8% 5311.33: client (128 days) 1.33(1.49+0.06) 1.29(1.37+0.12) -3.0% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-18t5310: correctly remove bitmaps for jgit testJeff King
We use "jgit gc" to generate a pack bitmap file, and then make sure our implementation can read it. To prepare the repo before running jgit, we try to "rm -f" any existing bitmap files. But we got the path wrong; we're in a bare repo, so looking in ".git/" finds nothing. Our "rm" doesn't complain because of the "-f", and when we run "rev-list" there are two bitmap files (ours and jgit's). Our reader implementation will ignore one of the bitmap files, but it's likely non-deterministic which one we will use. We'd prefer the one with the more recent timestamp (just because of the way the packed_git list is sorted), but in most test runs they'd have identical timestamps. So this was probably actually testing something useful about 50% of the time, and other half just testing that we could read our own bitmaps (which is covered elsewhere). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-22multi-pack-index: define GIT_TEST_MULTI_PACK_INDEXDerrick Stolee
The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make sense to include it by default in those tests. Instead, add a new GIT_TEST_MULTI_PACK_INDEX environment variable that enables core.multiPackIndex and writes a multi-pack-index after each 'git repack' command. This adds extra test coverage when needed. There are a few spots in the test suite that need to react to this change: * t5319-multi-pack-index.sh: there is a test that checks that 'git repack' deletes the multi-pack-index. Disable the environment variable to ensure this still happens. * t5310-pack-bitmaps.sh: One test moves a pack-file from the object directory to an alternate. This breaks the multi-pack-index, so delete the multi-pack-index at this point, if it exists. * t9300-fast-import.sh: One test verifies the number of files in the .git/objects/pack directory is exactly 8. Exclude the multi-pack-index from this count so it is still 8 in all cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'jk/pack-objects-with-bitmap-fix'Junio C Hamano
Hotfix of the base topic. * jk/pack-objects-with-bitmap-fix: pack-bitmap: drop "loaded" flag traverse_bitmap_commit_list(): don't free result t5310: test delta reuse with bitmaps bitmap_has_sha1_in_uninteresting(): drop BUG check
2018-09-04t5310: test delta reuse with bitmapsJeff King
Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for thin "have" objects, 2018-08-21) taught pack-objects a new optimization trick. Since this wasn't meant to change user-visible behavior, but only produce smaller packs more quickly, testing focused on t/perf/p5311. However, since people don't run perf tests very often, we should make sure that the feature is exercised in the regular test suite. This patch does so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-27tests: fix and add lint for non-portable head -c NÆvar Arnfjörð Bjarmason
The "head -c BYTES" option is non-portable (not in POSIX[1]). Change such invocations to use the test_copy_bytes wrapper added in 48860819e8 ("t9300: factor out portable "head -c" replacement", 2016-06-30). This fixes a test added in 9d2e330b17 ("ewah_read_mmap: bounds-check mmap reads", 2018-06-14), which has been breaking t5310-pack-bitmaps.sh on OpenBSD since 2.18.0. The OpenBSD ports already have a similar workaround after their upgrade to 2.18.0[2]. I have not tested this on IRIX, but according to 4de0bbd898 ("t9300: use perl "head -c" clone in place of "dd bs=1 count=16000" kluge", 2010-12-13) this invocation would have broken things there too. Also, change a valgrind-specific codepath in test-lib.sh to use this wrapper. Given where valgrind runs I don't think this would ever become a portability issue in practice, but it's easier to just use the wrapper than introduce some exception for the "make test-lint" check being added here. 1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html 2. https://github.com/openbsd/ports/commit/08d5d82eaefe5cf2f125ecc0c6a57df9cf91350c#diff-f7d3c4fabeed1691620d608f1534f5e5 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20Merge branch 'sg/t5310-empty-input-fix'Junio C Hamano
Test fix. * sg/t5310-empty-input-fix: t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
2018-08-14t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' testSZEDER Gábor
The test 'pack-objects to file can use bitmap' added in 645c432d61 (pack-objects: use reachability bitmap index when generating non-stdout pack, 2016-09-10) is silently buggy and doesn't check what it's supposed to. In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function does what its name implies by running: git show-index <"$1" | cut -d' ' -f2 The test in question invokes this function like this: list_packed_objects <packa-$packasha1.idx >packa.objects && list_packed_objects <packb-$packbsha1.idx >packb.objects && test_cmp packa.objects packb.objects Note how these two callsites don't specify the name of the pack index file as the function's parameter, but redirect the function's standard input from it. This triggers an error message from the shell, as it has no filename to redirect from in the function, but this error is ignored, because it happens upstream of a pipe. Consequently, both invocations produce empty 'pack{a,b}.objects' files, and the subsequent 'test_cmp' happily finds those two empty files identical. Fix these two 'list_packed_objects' invocations by specifying the pack index files as parameters. Furthermore, eliminate the pipe in that function by replacing it with an &&-chained pair of commands using an intermediate file, so a failure of 'git show-index' or the shell redirection will fail the test. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-30tests: make use of the test_must_be_empty functionÆvar Arnfjörð Bjarmason
Change various tests that use an idiom of the form: >expect && test_cmp expect actual To instead use: test_must_be_empty actual The test_must_be_empty() wrapper was introduced in ca8d148daf ("test: test_must_be_empty helper", 2013-06-09). Many of these tests have been added after that time. This was mostly found with, and manually pruned from: git grep '^\s+>.*expect.* &&$' t Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18Merge branch 'jk/ewah-bounds-check'Junio C Hamano
The code to read compressed bitmap was not careful to avoid reading past the end of the file, which has been corrected. * jk/ewah-bounds-check: ewah: adjust callers of ewah_read_mmap() ewah_read_mmap: bounds-check mmap reads
2018-06-18ewah_read_mmap: bounds-check mmap readsJeff King
The on-disk ewah format tells us how big the ewah data is, and we blindly read that much from the buffer without considering whether the mmap'd data is long enough, which can lead to out-of-bound reads. Let's make sure we have data available before reading it, both for the ewah header/footer as well as for the bit data itself. In particular: - keep our ptr/len pair in sync as we move through the buffer, and check it before each read - check the size for integer overflow (this should be impossible on 64-bit, as the size is given as a 32-bit count of 8-byte words, but is possible on a 32-bit system) - return the number of bytes read as an ssize_t instead of an int, again to prevent integer overflow - compute the return value using a pointer difference; this should yield the same result as the existing code, but makes it more obvious that we got our computations right The included test is far from comprehensive, as it just picks a static point at which to truncate the generated bitmap. But in practice this will hit in the middle of an ewah and make sure we're at least exercising this code. Reported-by: Luat Nguyen <root@l4w.io> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-23Merge branch 'sg/t5310-jgit-bitmap-test'Junio C Hamano
Test update. * sg/t5310-jgit-bitmap-test: t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX
2018-05-11t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEXSZEDER Gábor
The two JGit tests 'we can read jgit bitmaps' and 'jgit can read our bitmaps' in 't5310-pack-bitmaps.sh' fail when run with GIT_TEST_SPLIT_INDEX=YesPlease. Both tests create a clone of the test repository to check bitmap interoperability with JGit. With split indexes enabled the index in the clone repositories contains the 'link' extension, which JGit doesn't support and, consequently, an exception aborts it: <...> org.eclipse.jgit.api.errors.JGitInternalException: DIRC extension 'link' not supported by this version. at org.eclipse.jgit.dircache.DirCache.readFrom(DirCache.java:562) <...> Since testing bitmaps doesn't need a worktree in the first place, let's just create bare clones for the two JGit tests, so the cloned won't have an index, and these two tests can be executed even with split index enabled. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-27t/helper: merge test-genrandom into test-toolNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09t5310: fix "; do" styleJeff King
Our usual shell style is to put the "do" of a loop on its own line, like: while $cond do something done instead of: while $cond; do something done We have a bit of both in our code base, but the former is what's in CodingGuidelines (and outnumbers the latter in t/ by about 6:1). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09pack-objects: disable pack reuse for object-selection optionsJeff King
If certain options like --honor-pack-keep, --local, or --incremental are used with pack-objects, then we need to feed each potential object to want_object_in_pack() to see if it should be filtered out. But when the bitmap reuse_packfile optimization is in effect, we do not call that function at all, and in fact skip adding the objects to the to_pack list entirely. This means we have a bug: for certain requests we will silently ignore those options and include objects in that pack that should not be there. The problem has been present since the inception of the pack-reuse code in 6b8fda2db (pack-objects: use bitmaps when packing objects, 2013-12-21), but it was unlikely to come up in practice. These options are generally used for on-disk packing, not transfer packs (which go to stdout), but we've never allowed pack reuse for non-stdout packs (until 645c432d6, we did not even use bitmaps, which the reuse optimization relies on; after that, we explicitly turned it off when not packing to stdout). We can fix this by just disabling the reuse_packfile optimization when the options are in use. In theory we could teach the pack-reuse code to satisfy these checks, but it's not worth the complexity. The purpose of the optimization is to keep the amount of per-object work we do to a minimum. But these options inherently require us to search for other copies of each object, drowning out any benefit of the pack-reuse optimization. But note that the optimizations from 56dfeb626 (pack-objects: compute local/ignore_pack_keep early, 2016-07-29) happen before pack-reuse, meaning that specifying "--honor-pack-keep" in a repository with no .keep files can still follow the fast path. There are tests in t5310 that check these options with bitmaps and --stdout, but they didn't catch the bug, and it's hard to adapt them to do so. One problem is that they don't use --delta-base-offset; without that option, we always disable the reuse optimization entirely. It would be fine to add it in (it actually makes the test more realistic), but that still isn't quite enough. The other problem is that the reuse code is very picky; it only kicks in when it can reuse most of a pack, starting from the first byte. So we'd have to start from a fully repacked and bitmapped state to trigger it. But the tests for these options use a much more subtle state; they want to be sure that the want_object_in_pack() code is allowing some objects but not others. Doing a full repack runs counter to that. So this patch adds new tests at the end of the script which create the fully-packed state and make sure that each option is not fooled by reusable pack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01Merge branch 'dt/disable-bitmap-in-auto-gc' into maintJunio C Hamano
It is natural that "git gc --auto" may not attempt to pack everything into a single pack, and there is no point in warning when the user has configured the system to use the pack bitmap, leading to disabling further "gc". * dt/disable-bitmap-in-auto-gc: repack: die on incremental + write-bitmap-index auto gc: don't write bitmaps for incremental repacks
2016-12-30repack: die on incremental + write-bitmap-indexDavid Turner
The bitmap index only works for single packs, so requesting an incremental repack with bitmap indexes makes no sense. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>