Age | Commit message (Collapse) | Author |
|
With 72497fc37 (ci: Add jobs which exercise Gitaly in FIPS mode,
2022-06-14), we have added a set of jobs which exercise Gitaly in FIPS
mode. These jobs require a special runner that has booted into FIPS mode
itself. Gitaly has been manually assigned such runners, but they are not
generally available for any of Gitaly's forks. Consequentially, trying
to run these jobs in any of our forks will eventually cause them to time
out because no runner could be acquired. And because our first job rule
for the FIPS jobs will automatically run when changes are merged to the
default branch, this causes pipelines to fail in our forks.
Fix this issue by only automatically creating these jobs when run in the
"official" repository. In case we're not running in that repository
we'll instead just create these jobs with a manual trigger.
|
|
go: Update Postgres dependencies
See merge request gitlab-org/gitaly!4647
|
|
ssh: Extend tests to emulate client-side failure conditions with sidechannel
See merge request gitlab-org/gitaly!4615
|
|
repository: Always use long-running LFS filter process in GetArchive
Closes #4282
See merge request gitlab-org/gitaly!4661
|
|
operations: Always enable structured errors in UserDeleteBranch
Closes #4286
See merge request gitlab-org/gitaly!4660
|
|
The command package is not only used to spawn Git commands, but may also
spawn any other arbitrary command. It's thus quite misleading that some
of the error messages are mentioninng "GitCommand"s as context.
Reword these errors to be command-agnostic. While at it, let's also use
the `%w` formatter to chain errors together.
|
|
The read monitor is implemented via a Unix pipe, which means that its
file descriptors need to always be closed by us or otherwise we have the
potential for a resource leak. This is done by spawning a Goroutine that
waits for the context to be cancelled. Unfortunately, when the context
is cancelled when spawning a command that is about to use these file
descriptors then we now run into a race: `exec.Cmd.Start()` may just be
about to duplicate those file descriptors to pass them to the child
process while we're in the process of closing them.
Fix this race by cleaning them up inside of the command's finalizer
instead. Like this we ensure that the descriptors are only closed after
we have already reaped the child process.
Changelog: fixed
|
|
We have cases where we need to clean up resources as soon as a command
has finished. Add the ability to set up a finalizer that runs exactly
once after the command has been reaped to make it easier to implement
this in a race-free manner.
|
|
When spawning commands, callers can afterwards add these commands to a
Cgroup and then call `SetCgroupPath()` to add the path to log messages.
But because we spawn a Goroutine that creates these logs as soon as the
context is done, this creates a race between setting the Cgroup path and
context cancellation.
Fix this race by handling addition to the Cgroups manager in the command
package itself. Like this, we can set up the Cgroup path before starting
the Goroutine and thus fix the race.
Changelog: fixed
|
|
When spawning commands, callers can afterwards call `SetMetricsCmd()`
and `SetMetricsSubCmd()` to have the command be properly represented in
Prometheus metrics via its command and subcommand. But because we spawn
a Goroutine that logs these metrics as soon as the context is done, this
creates a race between setting those names and context cancellation.
Fix this race by converting these functions into options that can be
passed to `New()` so that they're set up correctly at construction time.
Changelog: fixed
|
|
The `StdinSentinel` value can be passed to `WithStdin()` in order to
tell the command package to set up the command for `Write()`ing to it.
Replace this with a new option `WithSetupStdin()` so that we can avoid
the use of sentinel values.
|
|
Now that the command package accepts options, it becomes easier for us
to pass standard streams to `New()` by just assembling them into an
array of `command.Option`s, as well.
Refactor the code to do so.
|
|
We're about to extend construction of commands so that all members of
the resulting structure will be set at construction time in order to fix
racy access patterns. For this we will need to pass more parameters to
`New()` so that it has all information available at construction time.
Right now the function isn't all that extensive though given that it
only accepts a fixed set of parameters.
Refactor `New()` to instead accept a variable number of `Option`s. As a
first step, convert all parameters currently passed to it to instead use
these new options.
|
|
Modernize tests of the command package to follow our current code style.
|
|
Reorder the code so that all functions belonging to the `Command` type
are grouped together.
|
|
Add several tests that exercise how SSHUploadPackWithSidechannel behaves
in case the client-side fails in unexpected ways.
|
|
In order to be able to monitor what git-upload-pack(1) is communicating
with the user we create a pipe so that we can intercept the data. And
while make sure to close the reading-side of the pipe in all cases, we
in fact don't close the writing side when spawning git-upload-pack(1)
fails.
Fix this potential resource leak by immediately deferring the close of
the writing side.
|
|
localrepo: Speed up calculating size for repo with excluded alternates
See merge request gitlab-org/gitaly!4657
|
|
go: Update module github.com/google/uuid to v1.3.0
See merge request gitlab-org/gitaly!4648
|
|
go: Update module github.com/pelletier/go-toml to v1.9.5
See merge request gitlab-org/gitaly!4650
|
|
In 0603c758d (repository: Use long-running filter process for converting
LFS pointers, 2022-05-31), we have introduced support for the long-lived
process protocol so that we only require a single `gitaly-lfs-smudge`
to convert multiple LFS pointers instead of one per pointer. This has
been wired up for `GetArchive()` to speed up this RPC in large repos
with many pointers.
This change was rolled out on June 13th without any issues having been
observed so far. So let's remove the feature flag so that we always use
the long-running filter process protocol.
|
|
The `PreReceiveError` field in the `UserDeleteBranchResponse` is never
set anymore in favor of the new structured errors. Mark the field as
deprecated.
|
|
With 30d4fd427 (operations: Use structured errors in UserDeleteBranch,
2022-06-07), we have introduced structured errors in UserDeletBranch.
The intent is to both fix silent errors in case we failed to delete the
branch, and to fix replication in case no transactional votes were cast
in the error case.
The change has been rolled out on June 14th and is part of v15.1. So
let's remove the feature flag guarding it.
Changelog: fixed
|
|
Optimize ChompBytes() function
See merge request gitlab-org/gitaly!4659
|
|
The github.com/pelletier/go-toml package is updated to 1.9.5, and this
requires us to update NOTICE.
|
|
No need to allocate an intermediate string. Trim suffix first, then turn the result into a string.
|
|
When calculating a repository's size, we optionally allow the caller to
exclude the size of any object pools the repository is connected to.
This causes us to add `--not --alternate-refs` to the git-rev-list(1)
command, which will thus exclude all objects from disk usage calculation
that are reachable by the alternate.
As it turns out though, we're hitting a performance edge case: we ask
git-rev-list(1) to use bitmaps to calculate the size, but in the case of
a pooled repository only the object pool itself will have a bitmap. This
means that by definition, the bitmap can only contain objects that we
wish to exclude from the disk calculations anyway. All objects that are
not reachable by the pool are thus known to not be contained in any
bitmap. Because of this using bitmaps is extremely inefficient as shown
by the following benchmark, which is performed in `gitlab-org/gitlab`:
Benchmark 1: git rev-list --all --objects --disk-usage
Time (mean ± σ): 13.290 s ± 0.085 s [User: 13.023 s, System: 0.255 s]
Range (min … max): 13.160 s … 13.355 s 5 runs
Benchmark 2: git rev-list --all --objects --disk-usage --use-bitmap-index
Time (mean ± σ): 3.588 s ± 0.016 s [User: 3.326 s, System: 0.259 s]
Range (min … max): 3.576 s … 3.616 s 5 runs
Benchmark 3: git rev-list --not --alternate-refs --not --all --objects --disk-usage
Time (mean ± σ): 6.828 s ± 0.056 s [User: 6.601 s, System: 0.363 s]
Range (min … max): 6.761 s … 6.897 s 5 runs
Benchmark 4: git rev-list --not --alternate-refs --not --all --objects --disk-usage --use-bitmap-index
Time (mean ± σ): 68.105 s ± 0.383 s [User: 67.471 s, System: 0.744 s]
Range (min … max): 67.663 s … 68.509 s 5 runs
Summary
'git rev-list --all --objects --disk-usage --use-bitmap-index' ran
1.90 ± 0.02 times faster than 'git rev-list --not --alternate-refs --not --all --objects --disk-usage'
3.70 ± 0.03 times faster than 'git rev-list --all --objects --disk-usage'
18.98 ± 0.14 times faster than 'git rev-list --not --alternate-refs --not --all --objects --disk-usage --use-bitmap-index'
As you can see in benchmark #1 and #2, bitmaps speed up disk usage
calculations when not using alternate references. But the use of bitmaps
severely degrades performance by almost a factor of 10 as soon as we use
them in combination with `--alternate-refs` as shown in #4. On the other
hand, when we disable the use of bitmaps with alternate refs we are only
about twice as slow as compared to not iterating over alternate refs.
Interestingly, we never hit this issue in production until recently.
This is because of a configuration issue we have had in production: we
unconditionally set `core.alternateRefsCommand=exit 0 #`, which causes
us to skip over any alternate refs even when explicitly asking for them
via `--alternate-refs`. This is definitely unintentional as it causes us
to not honor the case where the client asks for shared objects to be
excluded from the size calculations. With a recent change though we
fixed this issue and started to correctly iterate over alterante refs
again, but that resulted in a 20-fold increase in latency for the
`RepositorySize()` RPC. So we're currently living in a world where
`RepostiorySize()` is either broken, or where it has significant issues
with performance.
Mitigate the performance hit by not using bitmaps when the client asks
tor alternate references to be excluded only in case the repository has
an object pool. As shown by the benchmark, this should result in a 10x
speedup compared to using bitmaps for repositories with many refs.
Changelog: fixed
|
|
Convert the tests for calculating the repository size for a repository
with alternates to be table-driven instead of using a standalone test.
This will our increase test coverage when we start verifying command
line parameters passed to git-rev-list(1).
|
|
gitaly-hooks: Update README.md
See merge request gitlab-org/gitaly!4616
|
|
Replace deprecated WithInsecure with WithTransportCredentials
See merge request gitlab-org/gitaly!4658
|
|
Clarify the story of how `gitaly-hooks` is invoked through being
symlinked from `post-receive`, `pre-receive`, `update`,
`reference-transaction`.
Add section on `reference-transaction` hook.
Update some outdated code references.
|
|
gRPC has deprecated the WithInsecure option in favor of the more
general WithTransportCredentials combined with insecure.NewCredentials.
This commit removes our usage of the deprecated option so we don't get
lint failures for using deprecated functionality when upgrading our
gRPC dependency.
|
|
FindChangedPaths: Add old_mode and new_mode fields
See merge request gitlab-org/gitaly!4592
|
|
Ensure there is a receive hooks payload before using it
See merge request gitlab-org/gitaly!4655
|
|
Convert the tests for calculating the repository size for a repository
while excluding a subset of references to be table-driven instead of
using a standalone test. This will our increase test coverage when we
start verifying command line parameters passed to git-rev-list(1).
|
|
Move repository creation into the test helper's setup function so that
we have full control over how repositories are initialized for each
respective test. This will allow us to add tests for repositories with
alternates into the table-driven tests.
|
|
Tests should ideally not be using worktrees, but instead use our
`gittest` helpers to write Git objects into bare repositories. Convert
our tests for `localrepo.Size()` to do so.
|
|
While we know that git-fetch(1) is having issues when using alternate
refs, we currently only disable it when using `FetchInternal()`. Let's
globally disable usage of alternate refs so that we don't ever hit the
performance edge case when fetching into repositories which have an
object pool connected to them.
This fixes a performance regression in the FetchRemote RPC in case the
external configuration doesn't ask us to ignore alternate refs. In a
recent incident, latency of the RPC regressed from ~1s to ~90s per
invocation for forks of `gitlab-org/gitlab` when the external Git
configuration was removed.
Changelog: fixed
|
|
[ci skip]
|
|
Changelog: fixed
|
|
operations: Convert UserCherryPick to return structured errors
See merge request gitlab-org/gitaly!4585
|
|
smarthttp: Refactor PostReceivePack tests and fix flakiness caused by keep-alive
Closes #4291
See merge request gitlab-org/gitaly!4643
|
|
On some machines we see that the PostUploadPack tests are flaky because
the responses we receive sometimes contain `0005\01` prefixes. As it
turns out, these are keep-alive packets that Git starts to send over the
sideband every 5 seconds after it has received the full packfile from
the client. So these are legit and can totally be expected on machines
that are sufficiently slow.
Implement a new helper function `requireSideband()` that parses the
response into separate sideband lines while stripping out any keep-alive
packets to fix this flakiness.
|
|
Group together test helper functions at the bottom of our
PostReceivePack tests to make them easier to find.
|
|
Adjust test names for PostReceivePack tests to match our best practices.
|
|
This is a set of random small refactors for the PostReceivePack tests to
align them better with our current best practices with regards to tests.
|
|
The tests which exercise our request validation for PostReceivePack only
assert error codes. This is quite fragile though as it could easily be
that the error code matches what we'd expect, while the actual error is
still caused by something different.
Refactor the test to instead verify that the actual error matches.
|
|
Rename `doPush()` to `performPush()` to clarify the intent of this
helper function. While at it, change its return type from `[]byte` to
`string`: all callers cast the result to a string anyway.
|
|
The `newTestPush()` helper is not really matching our current best
practices with regards to how we write tests. It's hard to understand
what this function does by just taking a look at its name.
Refactor the code to better match our best practices and rename the
helper to `createPushRequest()`.
|
|
Convert tests to use gittest helpers to write pktlines instead of using
our own, hand-coded implementation.
|