blob: Buffer output of git-catfile to speed up reading LFS pointers

To read the LFS pointer candidates returned by git-rev-list(1), we use
git-cat-file(1) with the `--batch` flag. By default, git-cat-file(1)
flushes its output after each object so that a process can read from
and write to it interactively. We do not need interactivity here: we
only consume the stream of objects until we either hit the object limit
set by the user or get an EOF. Flushing after each object is therefore
unnecessary and slows down processing.

Fix the issue by adding the `--buffer` flag to git-cat-file(1). This
causes it to use normal stdio buffering, which is more efficient than
manually flushing after each object.

This brings a small speedup when reading objects directly:

    # before
    BenchmarkFindLFSPointers/limitless-16    1             19982855076 ns/op
    BenchmarkFindLFSPointers/limit-16        1000000000    0.167 ns/op
    BenchmarkReadLFSPointers/limitless-16    1             18622537182 ns/op
    BenchmarkReadLFSPointers/limit-16        1000000000    0.120 ns/op

    # after
    BenchmarkFindLFSPointers/limitless-16    1             19988832079 ns/op
    BenchmarkFindLFSPointers/limit-16        1000000000    0.177 ns/op
    BenchmarkReadLFSPointers/limitless-16    1             17376539083 ns/op
    BenchmarkReadLFSPointers/limit-16        1000000000    0.103 ns/op

So we've got a speedup of about 7% when reading a large number of LFS
pointer candidates (18.6s to 17.4s), and of roughly 14% for limited
reads (0.120 to 0.103 ns/op).

Note that there is no change for `FindLFSPointers` though. This is
because by default, there is no buffering when a process writes into
the stdin of another process directly anyway. So disabling flushing
semantics doesn't really change anything in this context. As a result,
we shouldn't see any improvement for either `GetNewLFSPointers` or
`GetAllLFSPointers`, but there should be one for `GetLFSPointers`.
author: Patrick Steinhardt <psteinhardt@gitlab.com> 2021-03-11 09:57:21 +0300
committer: Patrick Steinhardt <psteinhardt@gitlab.com> 2021-03-11 10:15:58 +0300
commit: c09fbd455c4efaae13316fc1d25832f733500fa1 (patch)
tree: bfae095510a808eb1b82b310515f2c4405fcae29
parent: fcdef3b1e8c4b431700a9c2e47db830d32a8ad6f (diff)
2 files changed, 6 insertions, 0 deletions
diff --git a/changelogs/unreleased/pks-blob-lfs-catfile-buffering.yml b/changelogs/unreleased/pks-blob-lfs-catfile-buffering.yml
new file mode 100644
index 000000000..888fd9431
--- /dev/null
+++ b/changelogs/unreleased/pks-blob-lfs-catfile-buffering.yml
@@ -0,0 +1,5 @@
+---
+title: 'blob: Buffer output of git-catfile to speed up reading LFS pointers'
+merge_request: 3241
+author:
+type: performance
diff --git a/internal/gitaly/service/blob/lfs_pointers.go b/internal/gitaly/service/blob/lfs_pointers.go
index 92c1948af..092d11724 100644
--- a/internal/gitaly/service/blob/lfs_pointers.go
+++ b/internal/gitaly/service/blob/lfs_pointers.go
@@ -356,6 +356,7 @@ func readLFSPointers(
 		Name: "cat-file",
 		Flags: []git.Option{
 			git.Flag{Name: "--batch"},
+			git.Flag{Name: "--buffer"},
 		},
 	}, git.WithStdin(objectIDReader))
 	if err != nil {