Age | Commit message (Collapse) | Author |
|
A recent change to how we build git resulted in 100s of MB of size
increase in the omnibus packages and cloud native docker images.
This change should bring the sizes back down.
|
|
limithandler: Return error from limit handler
See merge request gitlab-org/gitaly!4492
|
|
catfile: Make process cancellation synchronous to fix test race
See merge request gitlab-org/gitaly!4500
|
|
doc: Document process to upgrade the Git version
Closes #4191
See merge request gitlab-org/gitaly!4509
|
|
Start using protolint
See merge request gitlab-org/gitaly!4465
|
|
In c83f00059 (add concurrency queue limit 2022-01-24), we modified the
limit handler to return immediately with an error if certain limit
conditions are reached.
In the wrappedStream however, RecvMsg() merely logs the error. This will
block indefinitely, because when the limiter returns with an error, it
will no longer try to get a concurrency slot. Thus this function will
wait until the context is cancelled.
Fix this by returning an error when we get an error from the limiter.
Changelog: fixed
|
|
replicator: Drop logic to create and handle maintenance-style replication jobs
See merge request gitlab-org/gitaly!4494
|
|
config: Remove internal socket path configuration
Closes #4139
See merge request gitlab-org/gitaly!4504
|
|
Remove feature flag "command_stats_metrics", always enable feature
See merge request gitlab-org/gitaly!4484
|
|
When creating a new cached catfile process we need to decorrelate it
from the parent context. For one this means that we need to remove the
correlation ID and any tracing spans from the context. And second it
means we need to suppress cancellation of the parent so that our cached
process doesn't get killed when the RPC context is cancelled.
The way we do this right now is to use `context.Background()`: this just
creates a completely new background. Funny enough, we still try to strip
information from this new context, which doesn't make any sense given
that it's empty already. But this hints at an actual bug: the intent is
not to get a completely new context, but by using `context.Background()`
we also lose information such as feature flags here.
Fix this bug by not creating a completely new context. Instead, we just
suppress context cancellation via `helper.SuppressCancellation()` and
then continue to decorrelate it like we always used to do. Only that now
it actually does something.
Changelog: fixed
|
|
The catfile cache is using Goroutines to clean up processes. This is not
deterministic and thus frequently causes our tests to flake when we lose
one of the races caused by this. Furthermore, the asynchronicity makes
the code quite hard to understand.
Refactor the code to use synchronous cancellation. Instead of waiting
for the context to get cancelled, it is now the caller that is
responsible for killing the process. The caller transparently either
gets a cancellation function that
- kills the process directly in case it is an uncachable process.
- or returns the process to the cache in case it is a cachable
process.
This significantly simplifies the code and also makes it deterministic,
which fixes the flaky tests we have observed. Furthermore, this also
allows us to get rid of one more Goroutine per cached catfile process.
Note that this allows us to get rid of the signalling logic we had in
place previous to this change: there is no need to wait for processes to
return to the cache in our tests anymore because this is now guaranteed
after the cancellation function was called.
|
|
Both the `CatfileReader` and `CatfileInfoReader` have the same logic to
tear down the process when the context gets cancelled. This logic can
easily be unified into the cache: it already got a Goroutine that waits
for the context to get cancelled to decrement the process counter, so it
is a perfect fit to also host the other logic.
Refactor the code to do this. While this reduces complexity, it also
cuts the number of Goroutines in half.
|
|
The catfile cache still mentions "batch" in a lot of places, even though
what it's actually handling is processes that may or may not be a batch
process.
Adjust the variable names to instead say "process".
|
|
command: Add metric for spawning tokens
See merge request gitlab-org/gitaly!4487
|
|
With the ability to do feature-flag-based rollouts of the Git version
our process to upgrade Git versions has become a bit more complex.
Document it so that it's easy to follow.
|
|
git: Globally disable generation of server info for git-repack(1)
Closes #4123
See merge request gitlab-org/gitaly!4505
|
|
Previously this plugin was used for linting, but it actually generates
its own source files.
Simplifies the lint-proto job since CI already runs `make proto` there
is no need to duplicate work.
|
|
This makes the naming of all proto files consistent.
|
|
|
|
From https://github.com/yoheimuta/protolint this linter is configurable
and much more strict than protoc.
|
|
Makefile: Small improvements to how Git is built
See merge request gitlab-org/gitaly!4506
|
|
ci: Remove DOCKER_DRIVER=overlay2 from Gitaly CI as it is already the default
Closes #4196
See merge request gitlab-org/gitaly!4503
|
|
Since the inception of our `make git` target we have always passed the
`NO_INSTALL_HARDLINKS=YesPlease` build option. This option causes the
Git build system to create copies of the Git binaries instead of hard
links.
It is not clear why we should need to set this option though as it only
increases the size of the installed distribution by about 25MB without
much merit. So let's stop passing the build option so that Git's build
system can decide for itself how it should install the binaries.
|
|
Building either our full Git distribution or the bundled Git versions
always uses a `.version` file to verify whether we need to rebuild the
target or not: only if the commit, build options or patches have been
modified since the last time the `.version` file was generated will we
actually rebuild Git. This is an important optimization and works quite
well.
One case where it doesn't work well right now though is when building
different versions of distributed Git. Depending on which version the
user wants to build via `GIT_VERSION`, we use different directories to
build Git and thus also use different `.version` files. This has a catch
though: because the `.version` file will not be updated again when both
Git versions have already been built successfully, we now think that we
don't need to rebuild Git because its `Makefile` is still newer than the
`.version` file when we switch between versions.
Fix this bug by always using the same directory to build distributed
Git, regardless of its version. While this means more rebuilds, it's
simple and correct.
|
|
In 0d588e403 (git: Disable generation of server info in git-repack(1),
2022-03-11), we have started to disable generation of server info in
git-repack(1) by passing the `-n` flag to all invocations. This was the
best we could do because Git didn't provide any way to do this on a
global level. We have since upstreamed a new `repack.updateServerInfo`
config key though which allows us to do so.
The new config key was released with Git v2.36.0, so let's start
injecting it into git-repack(1) processes. Note that we cannot yet get
rid of the `-n` flag because the minimum required Git version is still
at v2.33.1, which doesn't understand the new config. But by injecting
the config via environment variables, Git will now also honor it in case
git-repack(1) was spawned via any intermediate command.
Further note that we cannot really test this either because we don't yet
have Git v2.36.0 as part of our CI infrastructure.
|
|
Signed-off-by: Takuya Noguchi <takninnovationresearch@gmail.com>
|
|
The replicator used to host logic to replicate maintenance-style RPCs,
and thus it also had some tests which exercised this logic. Now that
replication of those RPCs has been removed in favor of a best-effort
routing strategy, we're essentially not testing the replicator anymore
though, but the coordinator.
Move the test into the coordinator's test suite. While at it, refactor
it to be table-driven so that it's more readable and convert the setup
to use `runPraefectServer()`.
|
|
With commit 7a8b33aa7 (gitaly/config: Introduce runtime directory
configuration, 2022-03-17), we have introduced a new global runtime
directory that is supposed to hold all files and directories required by
Gitaly to operate correctly. This new directory supersedes the previous
internal runtime directory, which shouldn't be used anymore.
We have since converted the GDK, CNG and Omnibus accordingly and put up
a deprecation notice for v15.0. Let's thus remove the old configuration
option so that we now always use the runtime directory instead.
Changelog: removed
|
|
The function `GitalyInternalSocketPath()` exposed by Gitaly's
configuration is needlessly verbose with its `Gitaly` prefix. Rename it
to `InternalSocketPath()` instead.
|
|
Makefile: Drop bundled Git v2.33.1.gl3
See merge request gitlab-org/gitaly!4495
|
|
Instrument verification worker for observability
Closes #4097
See merge request gitlab-org/gitaly!4473
|
|
smarthttp: Add finalizing vote in PostReceivePack
Closes #4110
See merge request gitlab-org/gitaly!4488
|
|
operations: Remove feature flag for transactional voting in UserSquash
Closes #4119
See merge request gitlab-org/gitaly!4496
|
|
When git commands take a long time, it's hard to know whether it's from
the actual process or that time is spent waiting for a spawn token. This
adds a metric that measures and emits the time spent waiting for a spawn
token.
Changelog: added
|
|
Ignore verification columns for read-only cache updates
Closes #4159
See merge request gitlab-org/gitaly!4468
|
|
When all updates are rejected, there will be no votes cast. This leads
to an unnecessary replication job. In this case, we want to cast a last
but empty vote. This is the same as how SSHReceivePack works.
Changelog: changed
|
|
sidechannel: lower yamux close timeout
See merge request gitlab-org/gitaly!4491
|
|
This is a preperatory commit so that PostReceivePack can do transaction
voting in the RPC handler. This adds a transaction manager to the server
struct to be used by PostReceivePack.
|
|
With af4ea3258 (operations: Fix missing votes on squashed commits,
2022-03-18) we have introduced the use of quarantine directories in the
`UserSquash()` RPC to enable transactional voting. This change in
behaviour has been rolled out to production, and the change was enabled
by default in e8413304d (operations: Always use structured errors for
UserSquash, 2022-03-21).
We haven't seen any issues with this change. Remove the feature flag to
always enable it.
Changelog: removed
|
|
We have finished the migration to bundled Git v2.35.1.gl1 in v14.10. Due
to concerns with zero-downtime upgrades we couldn't yet remove the old
version though. But now that we have waited for a release we can finally
remove the old version.
Remove the infrastructure to build and install bundled Git v2.33.1.gl3.
Changelog: removed
|
|
The test repository helper in our rspecs is used to create temporary
repositories for testing purposes. The Git version it uses for this
purpose depends on a set of environment variables, but in case it runs
with bundled Git it needs to some more bootstrapping of an execution
environment in order to be able to properly clone the repository.
The bundled Git version is currently hard-coded to `gitaly-git`, which
we're about to phase out in favor of `gitaly-git-v2.35.1.gl1`. With the
removal of the old version the helper is thus about to break.
Fix this by switching over to the newer bundled Git version. While we
could do fancy things like injecting the version or auto-detecting it,
it doesn't really feel worth it over simply hard-coding it. We're
ultimately going to retire the sidecar over the next few releases, so we
shouldn't spend more time than necessary on maintaining it.
|
|
Starting with Gitaly v14.10, we use a best-effort strategy to route
maintenance-style RPCs. With this strategy, we always route such RPC
calls to all healthy nodes which have the given target repository,
regardless of whether they're currently consistent or not. Under this
new schema we also don't create any replication jobs anymore: we deem
maintenance to not be crucial enough to warrant all the infrastructure
here, and instead we can just wait for the next maintenance call to
optimize the repository.
Even though we don't create any replication jobs anymore, we still
handle them right now. And this is a strict requirement for now for
zero-downtime upgrades: a concurrently running Gitaly node may still be
generating maintenance-style replication events, and consequentially we
must still know how to handle them or otherwise we'll start generating
errors.
Despite that, we can still remove all the logic supporting these events
and just make them a no-op. Like this, we make sure to drain the queue
while keeping intact the new best-effort semantics. And furthermore, it
allows us to get rid of a significant chunk of code.
Do so and remove the infrastructure to handle maintenance-style
replication jobs. The no-op can then be removed in the next release
together with a SQL migration which drains all remaining events.
|
|
We have recently introduced special routing for maintenance-style RPCs,
which now uses a best-effort strategy. As a consequence, we don't create
any replication jobs for maintenance RPCs anymore, but the code which
handles the creation still exists.
Remove the unused code.
|
|
repository: RestoreCustomHooks to do transaction voting
Closes #4081
See merge request gitlab-org/gitaly!4481
|
|
[ci skip]
|
|
Lower the yamux StreamCloseTimeout setting from 5 minutes to 1 second
for sidechannel connections.
The motivation for changing this is to make it easier to test client
side cancelation in Workhorse tests.
|
|
Fix typo in documentation of UserCommitFilesRequestHeader
See merge request gitlab-org/gitaly!4422
|
|
Remove Maintenance routing feature flag
See merge request gitlab-org/gitaly!4486
|
|
[ci skip]
|
|
The verification worker is already collecting the metrics on how
many jobs it's processing. Praefect is still lacking a metric for
the overall verification queue depth which would be very helpful
in determining how much work is there left for the verification to
be completed. This commit adds a collector for the verification
queue depth. The metric is collected for each storage separately.
The metric differentiates between replicas that are unverified and
replicas that have been verified but the verification has expired.
|