author | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-02-22 17:41:12 +0300 |
---|---|---|
committer | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-02-22 17:44:24 +0300 |
commit | 3c73c0df32d7aa933aa66d52f7ac19751e1a3e62 (patch) | |
tree | c8760b9171b5bb9c8ce8012377217c67e6c72122 | |
parent | 3f29772a7241e66cc89608778b2411f2a3733a17 (diff) |
ruby: lfs: Limit blob size when searching for new LFS pointers
Given that LFS is bolted on top of git, git itself doesn't provide any
way to nicely find all LFS pointers. We thus have to resort to
heuristics to identify whether a blob is an LFS pointer or not. The
most important heuristic here is that we limit by blob size: we know
that the maximum size for LFS pointers is 200 bytes, so we do not want to
load and scan any larger objects. This is a significant optimization,
given that it allows us to skip loading any biggish objects into memory.
While we do use this optimization when searching all referenced blobs,
we do not when only searching for new pointers in a limited revision
graph. Fix this by using the same filter for both operations, which
should significantly speed up getting new pointers.
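
The size heuristic described above can be sketched in plain Ruby. This is an illustration only, not the Gitaly implementation: `CandidateBlob`, `lfs_pointer?`, and `select_lfs_pointers` are hypothetical names, and the local `LFS_POINTER_MAX_SIZE` is an assumed stand-in for `Gitlab::Git::Blob::LFS_POINTER_MAX_SIZE`, set to the 200 bytes mentioned in the message. The size check runs before any content inspection, mirroring how `--filter=blob:limit=<n>` lets git drop blobs of at least `n` bytes before they are ever loaded.

```ruby
# Assumed stand-in for Gitlab::Git::Blob::LFS_POINTER_MAX_SIZE (200 bytes per the commit message).
LFS_POINTER_MAX_SIZE = 200

# First line of an LFS pointer file, per the Git LFS pointer format.
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"

# Hypothetical value object: an object ID plus the raw blob content.
CandidateBlob = Struct.new(:oid, :data)

# Returns true if the blob is small enough to be an LFS pointer and
# starts with the LFS version line. The size check comes first so that
# large blobs are rejected without scanning their content. Like git's
# blob:limit filter, blobs of at least the limit are excluded.
def lfs_pointer?(blob)
  return false if blob.data.bytesize >= LFS_POINTER_MAX_SIZE

  blob.data.start_with?(LFS_VERSION_LINE)
end

# Keeps only the blobs that pass the heuristic.
def select_lfs_pointers(blobs)
  blobs.select { |blob| lfs_pointer?(blob) }
end
```

In the actual change the size cut-off is applied by git itself through the rev-list filter, so oversized blobs never reach Ruby at all; the sketch only shows the ordering of the checks.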
-rw-r--r-- | ruby/lib/gitlab/git/lfs_changes.rb | 7 |
1 files changed, 6 insertions, 1 deletions
diff --git a/ruby/lib/gitlab/git/lfs_changes.rb b/ruby/lib/gitlab/git/lfs_changes.rb
index 194d5f584..3e1a55465 100644
--- a/ruby/lib/gitlab/git/lfs_changes.rb
+++ b/ruby/lib/gitlab/git/lfs_changes.rb
@@ -17,7 +17,12 @@ module Gitlab
       private
 
       def git_new_pointers(object_limit, not_in)
-        rev_list.new_objects(**rev_list_params(not_in: not_in)) do |object_ids|
+        params = {
+          options: ["--filter=blob:limit=#{Gitlab::Git::Blob::LFS_POINTER_MAX_SIZE}"],
+          not_in: not_in
+        }
+
+        rev_list.new_objects(**rev_list_params(params)) do |object_ids|
           object_ids = object_ids.take(object_limit) if object_limit
 
           Gitlab::Git::Blob.batch_lfs_pointers(@repository, object_ids)