gitlab.com/gitlab-org/gitaly.git
author    Patrick Steinhardt <psteinhardt@gitlab.com>  2021-06-15 09:09:01 +0300
committer Patrick Steinhardt <psteinhardt@gitlab.com>  2021-06-21 08:49:55 +0300
commit    d2870b204c6801317a6e4c4fba09968fa6fd283d
tree      2cbc0fe2a27c028fbd788e4dfa83847f7b1e9f89
parent    7a33a3366d9b6b67dadec40e64b15f57e45bce04

blob: Speed up LFS pointer search via object type filters

The `ListLFSPointers()` RPC returns all LFS pointers referenced by a set
of revisions. This filtering is quite expensive: we first need to
enumerate all reachable objects, then for each object we need to see
whether it's a blob and whether its size indicates that it can be an LFS
pointer, and finally we need to check the blobs' contents and test
whether it really is an LFS pointer.

To optimize this a bit, we do set up a blob size limit of 200 bytes,
which is the maximum size an LFS pointer can have. While this severely
brings down the number of candidate blobs, one issue we have is that
git-rev-list(1) will still unconditionally list all the other object
types. Effectively, we're thus needlessly retrieving object info of all
tags, commits and trees only to notice that they aren't blobs in the
first place. It goes without saying that this is a huge waste of time.

To tackle this problem, we have upstreamed two new options for
git-rev-list(1):

    - By default, git-rev-list(1) will always unconditionally print
      objects which have directly been received either via the command
      line or via stdin. A new option `--filter-provided-objects` has
      been added which changes this behaviour and also causes provided
      revisions to be filtered.

    - A new object type filter `--filter=object:type=<type>` has been
      added which will cause git-rev-list(1) to only list objects whose
      type matches the given type.

Used in combination, this brings down the number of potential LFS
pointer candidates by a significant factor. Executed on linux.git:

    $ git rev-list --objects --filter=blob:limit=200 --all | wc -l
    7146677

    $ git rev-list --objects --filter=blob:limit=200 --all \
        --filter=object:type=blob --filter-provided-objects | wc -l
    15217

For this particular repo, we have a factor of 470 less objects to check
for whether they are an LFS pointer or not. Naturally, this is an
artificial demonstration only because we don't typically search LFS
objects with `--all`. But we can expect that this translates to speedups
at a smaller scale by not having to do pointless work.

So let's use this by setting up the new `withObjectTypeFilter()` option
in case we're running a Git version which supports it. No new feature
flag is introduced given that we only implement it on the new pipeline
code, which is already guarded by a featureflag anyway.

Changelog: performance

-rw-r--r-- internal/gitaly/service/blob/lfs_pointers.go 12
1 files changed, 11 insertions, 1 deletions
diff --git a/internal/gitaly/service/blob/lfs_pointers.go b/internal/gitaly/service/blob/lfs_pointers.go
index f1a5c625b..200377d77 100644
--- a/internal/gitaly/service/blob/lfs_pointers.go
+++ b/internal/gitaly/service/blob/lfs_pointers.go
@@ -73,7 +73,17 @@ func (s *server) ListLFSPointers(in *gitalypb.ListLFSPointersRequest, stream git
 		return helper.ErrInternal(fmt.Errorf("creating catfile process: %w", err))
 	}
 
-	revlistChan := revlist(ctx, repo, in.GetRevisions(), withBlobLimit(lfsPointerMaxSize))
+	gitVersion, err := git.CurrentVersion(ctx, s.gitCmdFactory)
+	if err != nil {
+		return helper.ErrInternalf("cannot determine Git version: %v", err)
+	}
+
+	revlistOptions := []revlistOption{withBlobLimit(lfsPointerMaxSize)}
+	if gitVersion.SupportsObjectTypeFilter() {
+		revlistOptions = append(revlistOptions, withObjectTypeFilter(objectTypeBlob))
+	}
+
+	revlistChan := revlist(ctx, repo, in.GetRevisions(), revlistOptions...)
 	catfileInfoChan := catfileInfo(ctx, catfileProcess, revlistChan)
 	catfileInfoChan = catfileInfoFilter(ctx, catfileInfoChan, func(r catfileInfoResult) bool {
 		return r.objectInfo.Type == "blob" && r.objectInfo.Size <= lfsPointerMaxSize
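
The hunk builds a slice of `revlistOption` values and passes it variadically to `revlist()`. The commit does not show how those options are implemented, so the following is only a minimal sketch of the functional-options pattern they appear to follow. The `revlistConfig` struct, the option bodies, and the helpers `selectOptions` and `describe` are assumptions made for illustration, and the literal values `200` and `"blob"` stand in for `lfsPointerMaxSize` and `objectTypeBlob`.

```go
package main

import (
	"fmt"
	"strings"
)

// revlistConfig is an assumed shape for the settings the options mutate.
type revlistConfig struct {
	blobLimit  int
	objectType string
}

// revlistOption follows the functional-options pattern: each option is a
// function that modifies the configuration.
type revlistOption func(*revlistConfig)

// withBlobLimit and withObjectTypeFilter reuse the names from the diff, but
// their bodies are hypothetical.
func withBlobLimit(limit int) revlistOption {
	return func(cfg *revlistConfig) { cfg.blobLimit = limit }
}

func withObjectTypeFilter(objectType string) revlistOption {
	return func(cfg *revlistConfig) { cfg.objectType = objectType }
}

// selectOptions mirrors the logic of the hunk: the blob limit is always set,
// the type filter only when the running Git version supports it.
func selectOptions(supportsTypeFilter bool) []revlistOption {
	opts := []revlistOption{withBlobLimit(200)}
	if supportsTypeFilter {
		opts = append(opts, withObjectTypeFilter("blob"))
	}
	return opts
}

// describe applies the options in order and renders the resulting settings.
func describe(opts ...revlistOption) string {
	cfg := revlistConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}
	parts := []string{fmt.Sprintf("blobLimit=%d", cfg.blobLimit)}
	if cfg.objectType != "" {
		parts = append(parts, "objectType="+cfg.objectType)
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(describe(selectOptions(false)...))
	fmt.Println(describe(selectOptions(true)...))
}
```

Collecting the options in a slice first keeps the call to `revlist()` unchanged regardless of how many version-dependent options are appended later.
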