PruneUnreachableObjects
When pruning objects we need to rewrite our commit-graphs so that they
don't reference any commits that have just been pruned. While this is
not typically an issue, git-fsck(1) will report any such inconsistencies
in the commit-graph. Furthermore, Git may return an error when being
asked to parse such a missing commit directly.
Fix `PruneUnreachableObjects()` to do so.
Changelog: fixed
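Below is a minimal sketch of the decision this fix implies. It assumes the commit-graph is a split chain written by `git commit-graph write --reachable --split`, with Bloom filters enabled via `--changed-paths`. The helper name `commitGraphWriteArgs` is hypothetical and is not Gitaly's API. The idea is that once objects have been pruned, the whole chain is replaced rather than extended with a new layer, so no remaining layer can still reference a pruned commit.

```go
package main

// commitGraphWriteArgs is a hypothetical helper that builds arguments for
// git-commit-graph(1). An incremental write (--split) only appends a new
// layer, so older layers may keep referencing commits that were just pruned.
// After a prune, the whole chain is therefore replaced (--split=replace).
func commitGraphWriteArgs(prunedObjects bool) []string {
	args := []string{"commit-graph", "write", "--reachable", "--changed-paths"}
	if prunedObjects {
		// Rewrite every layer from scratch so pruned commits are dropped.
		return append(args, "--split=replace")
	}
	// Otherwise, cheaply add a new layer on top of the existing chain.
	return append(args, "--split")
}
```

Callers would pass whether the preceding prune step actually removed anything. Replacing the chain costs more than an incremental write, so the sketch only does it when it is needed.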
|
|
PruneUnreachableObjects
Add a test to demonstrate a shortcoming we have with commit-graphs that
reference pruned commits: even though we're pruning commits, we don't
update the commit-graphs accordingly to drop references to any such
pruned commits.
|
|
When pruning objects we need to rewrite our commit-graphs so that they
don't reference any commits that have just been pruned. While this is
not typically an issue, git-fsck(1) will report any such inconsistencies
in the commit-graph. Furthermore, Git may return an error when being
asked to parse such a missing commit directly.
Fix `GarbageCollect()` to do so.
Changelog: fixed
|
|
Add a test to demonstrate a shortcoming we have with commit-graphs that
reference pruned commits: even though we're pruning commits, we don't
update the commit-graphs accordingly to drop references to any such
pruned commits.
|
|
Refactor the test that verifies behaviour of the GarbageCollect RPC when
there are lockfiles around to match modern best practices.
|
|
When pruning objects we need to rewrite our commit-graphs so that they
don't reference any commits that have just been pruned. While this is
not typically an issue, git-fsck(1) will report any such inconsistencies
in the commit-graph. Furthermore, Git may return an error when being
asked to parse such a missing commit directly.
Fix `OptimizeRepository()` to do so when it has pruned objects.
Changelog: fixed
|
|
Move writing of commit-graphs to happen after pruning of objects. This
is done so that we can start to use the information whether we pruned
any objects or not to decide whether to rewrite the commit-graph chain.
By moving the logic after packing refs we also have the benefit that we
have fewer loose refs to count in our heuristics.
|
|
Add a test to demonstrate a shortcoming we have with commit-graphs that
reference pruned commits: even though we're pruning commits, we don't
update the commit-graphs accordingly to drop references to any such
pruned commits.
|
|
Now that we have information about whether we did a full rewrite of the
commit-graph chain or only an incremental update we can provide better
metrics in `OptimizeRepository()` to report what we have been doing.
|
|
Make the behaviour of whether we rewrite the commit-graph or not
controllable by callers and adjust callers accordingly. This new
configuration will be used to rewrite the commit-graph in more cases
than only when bitmaps are missing, as is done right now.
|
|
Now that writing commit-graphs has been disentangled from repacking
objects we can iterate on the heuristics we use to determine whether we
need to write commit-graphs:
- We stop writing commit-graphs in case there are no references.
This wouldn't make any sense anyway given that we only write
commit-graphs for reachable commits, of which there aren't any
when there are no references.
- We stop repacking objects based on whether the repository has
bloom filters or not. This logic was previously mixed into the
heuristics for `RepackObjects()`, but that has only been a historic
artifact because that function handled both repacks and rewrites
of commit-graphs. Instead, we only rewrite commit-graphs based on
that information.
Ultimately, the end result should be that we repack objects less often,
but keep rewriting commit-graphs in the same circumstances as we did
before.
Add tests to verify this new behaviour and adjust existing tests for
repacking of objects.
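The heuristics listed above can be sketched as a pure function. The `repoInfo` fields and the function name are hypothetical. Falling back to an incremental write in all remaining cases is an assumption made for illustration, and the real heuristics may differ.

```go
package main

// repoInfo is a hypothetical summary of the repository state the heuristics inspect.
type repoInfo struct {
	LooseRefs, PackedRefs int
	HasCommitGraph        bool
	HasBloomFilters       bool
	PrunedObjects         bool
}

// commitGraphDecision says whether to write a commit-graph and, if so,
// whether to replace the existing chain instead of appending a layer.
type commitGraphDecision struct {
	Write        bool
	ReplaceChain bool
}

func needsWriteCommitGraph(info repoInfo) commitGraphDecision {
	// Commit-graphs only cover reachable commits; without references nothing
	// is reachable, so writing one would be pointless.
	if info.LooseRefs+info.PackedRefs == 0 {
		return commitGraphDecision{}
	}
	// A missing commit-graph, or one without Bloom filters, is rewritten in
	// full. This no longer influences whether objects get repacked.
	if !info.HasCommitGraph || !info.HasBloomFilters {
		return commitGraphDecision{Write: true, ReplaceChain: true}
	}
	// Existing layers may still reference commits that were just pruned.
	if info.PrunedObjects {
		return commitGraphDecision{Write: true, ReplaceChain: true}
	}
	// Assumption: otherwise an incremental update is sufficient.
	return commitGraphDecision{Write: true}
}
```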
|
|
Writing commit-graphs and repacking objects is currently both done in
`RepackObjects()`. Back when the code was introduced this was done to
keep required code changes at bay by continuing to always rewrite the
commit-graph when we repack objects. We're about to iterate on the
strategy we use to write commit-graphs though, which requires us to
start doing so independently of whether we did or didn't repack objects.
Pull out the logic to write commit-graphs from `RepackObjects()` and
adjust all callsites of the latter to manually write commit-graphs. This
has merit on its own given that we can now properly report metrics for
writing the commit-graph on its own.
|
|
We're about to add a second callsite that requires us to count loose and
packed references in a repository as `packRefsIfNeeded()` already knows
to do. Extract the logic into its own function to make it reusable.
|
|
The errors returned by our Repack RPCs are not always wrapped and are
lacking breadcrumbs. Fix both issues by using `helper.ErrInternalf()`.
|
|
helper: Fix error messages for wrapped gRPC errors
See merge request gitlab-org/gitaly!4692
|
|
testserver: Capture Praefect logs if dialing fails
See merge request gitlab-org/gitaly!4691
|
|
golangci-lint: Increase timeout to 10 minutes
See merge request gitlab-org/gitaly!4707
|
|
updateref: Skip known-flaky test with Git v2.33.0
See merge request gitlab-org/gitaly!4699
|
|
We have recently started to see that golangci-lint is timing out more
frequently. Let's bump its timeout to 10 minutes to stop this from
happening.
|
|
Update golangci-lint to v1.46.2. There aren't any relevant changes to us
in this release, but it keeps the next person from checking whether
there are.
|
|
go: Update module github.com/containerd/cgroups to v1
See merge request gitlab-org/gitaly!4697
|
|
In 710409d89c8f31a7b711612b1860b8b2771965c4 we removed the 15 second
timeout in `waitHealthy()` to avoid spurious failures in slow CI
environments. Since this change we have seen occasional instances of
multi-minute waits for Praefect, causing the test process to panic on
timeout.
To better understand what's going wrong, let's increase Praefect's log
verbosity to `info` and print stderr (where all logging is written) when
the context deadline is reached in `waitHealthy()`. If we're lucky, the
output will contain an error, but even just knowing which events were
logged should give us clues as to where Praefect got stuck.
|
|
testcfg: Fix workaround to build Go binaries in unowned directories
See merge request gitlab-org/gitaly!4694
|
|
Update template to have Enablement section labels
See merge request gitlab-org/gitaly!4667
|
|
|
|
go: Update module github.com/stretchr/testify to v1.8.0
See merge request gitlab-org/gitaly!4686
|
|
cgroups: Adjust metric names & disable metrics with config
See merge request gitlab-org/gitaly!4619
|
|
|
|
On many cloud providers, cadvisor runs and keeps track of cgroups, so
having Gitaly expose its own metrics is redundant in such setups. Allow
users to configure whether or not they would like Gitaly to collect its
own metrics about cgroups.
Changelog: added
|
|
linguist: Implement Stats in pure Go
See merge request gitlab-org/gitaly!4580
|
|
git: Fix commit-graph corruption caused by corrected committer dates
Closes #4327 and gitlab#365903
See merge request gitlab-org/gitaly!4677
|
|
testhelper: Replace `testhelper.ModifyEnvironment()`
See merge request gitlab-org/gitaly!4685
|
|
Git v2.33.0 had a bug in git-update-ref(1) where it didn't know to flush
its output correctly. As a result we cannot be sure that references have
been locked when calling `updateref.Prepare()` because we don't get a
confirmation from Git. We have since upstreamed a fix for this bug which
works as expected, but one of our tests is still frequently failing
because of exactly that bug when running with Git v2.33.0.
Skip this test to reduce flakiness of our pipelines. We know it's an
issue, we have a fix for it, and we want to upgrade the minimum required
Git version anyway. So there is not much of a point to continue hitting
this flake.
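Skipping by Git version needs a version comparison. The following sketch is a hypothetical minimal parser and is not Gitaly's own version type. A test would call `t.Skip` when `flushesUpdateRefOutput` returns false. Only v2.33.0 is treated as affected, matching the test described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gitVersion is a hypothetical minimal parsed Git version.
type gitVersion struct{ major, minor, patch int }

// parseGitVersion parses strings such as "2.33.0". Components after the third
// (e.g. ".gl1") are ignored; suffixes like "-rc0" are rejected.
func parseGitVersion(s string) (gitVersion, error) {
	parts := strings.Split(s, ".")
	if len(parts) < 3 {
		return gitVersion{}, fmt.Errorf("invalid version %q", s)
	}
	var nums [3]int
	for i := 0; i < 3; i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return gitVersion{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		nums[i] = n
	}
	return gitVersion{nums[0], nums[1], nums[2]}, nil
}

// flushesUpdateRefOutput reports whether the git-update-ref(1) flushing bug
// is absent; only v2.33.0 is treated as affected here.
func flushesUpdateRefOutput(v gitVersion) bool {
	return v != gitVersion{2, 33, 0}
}
```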
|
|
|
|
In staging systems we have observed corruption in commit-graphs with the
following error message:
fatal: commit-graph requires overflow generation data but has none
This bug is caused by the rollout of Git v2.36.0, which has fixed a set
of bugs with reading corrected committer dates in commit-graphs [1].
Unfortunately, these fixes surface a corruption in commit-graphs that
can happen when upgrading a commit-graph written by Git v2.35.0 with a
Git version of v2.36.0 or later with `--changed-paths` enabled [2].
Disable use of corrected committer dates for now. Due to the bug that
existed in Git v2.35.0 and earlier we haven't ever read them anyway, so
this is not a performance regression for us. Instead, we'll continue to
use topological generation numbers to still speed up certain queries.
We should reenable them when the bug has been fixed upstream.
[1]: http://public-inbox.org/git/pull.1163.git.1645735117.gitgitgadget@gmail.com/
[2]: https://public-inbox.org/git/DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org/
Changelog: fixed
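One way to disable corrected committer dates is sketched below. It assumes Git's `commitGraph.generationVersion` setting, where version 1 means corrected commit dates are neither written nor read, so Git falls back to topological generation numbers. The helper names are hypothetical, and the commit itself does not say which mechanism Gitaly uses.

```go
package main

// globalConfigArgs is a hypothetical helper returning per-command Git config.
// Assumption: commitGraph.generationVersion=1 disables corrected committer
// dates while keeping topological generation numbers.
func globalConfigArgs() []string {
	return []string{"-c", "commitGraph.generationVersion=1"}
}

// gitCommand prepends the config options to a Git subcommand; `-c` options
// must come before the subcommand name.
func gitCommand(subcommand string, args ...string) []string {
	cmd := append(globalConfigArgs(), subcommand)
	return append(cmd, args...)
}
```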
|
|
[ci skip]
|
|
Add an option to skip flat paths for tree entries
See merge request gitlab-org/gitaly!4693
|
|
To make sure we're not breaking things when we'll switch to go-enry for
the language detection, compare the known languages of the linguist gem
with the Go package.
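The comparison amounts to a set difference. This sketch assumes both language lists have already been extracted, one from the linguist gem and one from go-enry. How they are obtained is left out, and the function name is hypothetical.

```go
package main

import "sort"

// missingLanguages returns the languages in want that are absent from have,
// sorted for stable output (e.g. want = linguist gem, have = go-enry).
func missingLanguages(want, have []string) []string {
	known := make(map[string]struct{}, len(have))
	for _, l := range have {
		known[l] = struct{}{}
	}
	var missing []string
	for _, l := range want {
		if _, ok := known[l]; !ok {
			missing = append(missing, l)
		}
	}
	sort.Strings(missing)
	return missing
}
```

Running it in both directions reports languages that only one of the two implementations knows about.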
|
|
The go-enry package is based on github-linguist v7.20.0. To be able to
compare the set of languages they both know, we update the Gem to match
the version the Go package got its language data from.
|
|
The main reason we wrote the Go implementation for linguist is getting
rid of Ruby. But we don't want this implementation to be slower. So to
compare the performance, this change adds a benchmark of both
implementations.
These are the results of running it on my computer:
$ go test -run=^$ -bench=. -benchtime=4x ./internal/gitaly/linguist
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitaly/v15/internal/gitaly/linguist
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInstance_Stats/go_language_stats=false/from_scratch-8 4 56371415917 ns/op
BenchmarkInstance_Stats/go_language_stats=false/incremental-8 4 25310716778 ns/op
BenchmarkInstance_Stats/go_language_stats=true/from_scratch-8 4 3149285756 ns/op
BenchmarkInstance_Stats/go_language_stats=true/incremental-8 4 1402539266 ns/op
Getting the stats from scratch drops from roughly 56s to 3.1s, which is
impressive. Getting stats incrementally drops from about 25s to 1.4s,
which is also pretty nice.
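The ns/op figures from the benchmark output above convert into seconds and speedups as follows. That is roughly a 17.9× speedup from scratch and 18.0× incrementally. The helper and variable names are only for illustration.

```go
package main

import "time"

// speedup divides the old per-op duration by the new one.
func speedup(before, after time.Duration) float64 {
	return float64(before) / float64(after)
}

// Per-op timings copied from the benchmark output above.
var (
	rubyFromScratch = 56371415917 * time.Nanosecond // ~56.4s
	goFromScratch   = 3149285756 * time.Nanosecond  // ~3.1s
	rubyIncremental = 25310716778 * time.Nanosecond // ~25.3s
	goIncremental   = 1402539266 * time.Nanosecond  // ~1.4s
)
```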
On the topic of cache size, these are the file sizes for both
implementations:
* Ruby: 473993 bytes
* Golang: 436335 bytes
These are comparable in size and it's a relief to see it's not
significantly larger in the new implementation.
|
|
This change adds an alternative implementation of linguist.Stats using
go-enry as a pure Go solution. The code is behind the feature flag
`go_language_stats`, which is disabled by default.
Issue: https://gitlab.com/gitlab-org/gitaly/-/issues/2571
Changelog: performance
|
|
We're about to introduce an implementation of getting the language
statistics in Go. For this we need a way to collect and store the
results in a cache. This cache will be used to incrementally calculate the
stats between commits.
This change introduces the linguist.languageStats struct which will deal
with all this.
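The following sketch shows one possible shape for such a cache. It is hypothetical and not the actual `linguist.languageStats`. It remembers each file's language and size, so the stats for a later commit only require applying the changed files. Persisting the cache to disk is left out.

```go
package main

// fileStat records one file's contribution to the statistics.
type fileStat struct {
	Language string
	Size     uint64
}

// languageStats is a hypothetical incremental cache: per-language totals plus
// per-file contributions, so changed files can be subtracted and re-added.
type languageStats struct {
	Totals map[string]uint64   // language -> bytes
	Files  map[string]fileStat // path -> contribution
}

func newLanguageStats() *languageStats {
	return &languageStats{Totals: map[string]uint64{}, Files: map[string]fileStat{}}
}

// drop removes a file's previous contribution, e.g. when it was deleted.
func (s *languageStats) drop(path string) {
	old, ok := s.Files[path]
	if !ok {
		return
	}
	s.Totals[old.Language] -= old.Size
	if s.Totals[old.Language] == 0 {
		delete(s.Totals, old.Language)
	}
	delete(s.Files, path)
}

// add records a file's contribution, replacing any earlier one for the path.
func (s *languageStats) add(path, lang string, size uint64) {
	s.drop(path)
	s.Files[path] = fileStat{Language: lang, Size: size}
	s.Totals[lang] += size
}
```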
|
|
In some cases you might only have access to a localrepo.Repo instance,
and not directly to the locator. To make it more convenient to find the
TempDir for the storage the repo is on, add a helper method to
Repo that will use its storage.Locator to determine the temporary dir
for the storage.
It also ensures this temp dir exists.
|
|
This change adds a function to get a RevisionIterator from the output of
git-ls-tree. This iterator will loop through all files that are
reachable from the given revision.
|
|
This change adds a function to get a RevisionIterator from the output of
git-diff-tree. This iterator will loop through all the new objects that
have been introduced between the two given revisions.
|
|
We're about to add an alternative implementation for the Stats method,
written in Go, and for that we need a few different things. This change
prepares for that.
|
|
It was hard to debug mismatches between the expected and the actual
languages and their attributes. With this change we use
`testhelper.ProtoEqual` to compare the expected with the actual language
statistics, which provides a better error message when there is a
mismatch.
|
|
We're about to make some changes in the handling of the CommitLanguages
RPC, and because I noticed this RPC did not have any documentation, I've
decided to add it.
|
|
Populating flat paths for large trees may be expensive.
Let's add an option to skip it so that callers can avoid this expensive
operation when it's not necessary. The option is off by default to
preserve the existing behaviour.
Changelog: added
|
|
Since Go 1.18, Go embeds VCS information into the binaries it builds,
which it derives from the repository by executing some Git commands.
This doesn't work, though, when the repository is not owned by the user
building the binaries: due to CVE-2022-24765, Git refuses to operate in
any repository it doesn't own.
We have tried to fix this in 61331af03 (testcfg: Fix building binaries
as unprivileged user with Go 1.18+, 2022-07-07) by setting `GIT_CONFIG_`
environment variables to inject the `safe.directory` config entry, which
can be used to override this safety mechanism. This doesn't work though,
as documented by git-config(1):
This config setting is only respected when specified in a system or
global config, not when it is specified in a repository config, via
the command line option -c safe.directory=<path>, or in environment
variables.
Work around this limitation by writing a temporary, system-level config
file that contains this key and setting `GIT_CONFIG_SYSTEM` to point to
that file.
|