Age | Commit message (Collapse) | Author |
|
This commit adds a prometheus counter that records which connection
type was used for a request. This is just to verify the performance
testing of the multiplexed connections actually used the multiplexed
connections and will be removed once the testing is done.
|
|
This commit configures defaults for yamux that match our fit our
use case better:
1. AcceptBacklog is set to 1, as we initiate at most one stream from
bot sides of the connection. With that in mind, we should never have
more than one stream waiting to be accepted on both sides. If we do,
something is wrong and it's fine to error out.
2. Yamux keep alives are disabled as we are already using gRPC to send
keep alives. gRPC should be used for this so keep alives get sent to
non-multiplexed connections as well.
3. Maximum stream receive window size was increased from the default of
256Kib to 16Mib. We only have two streams per connection which
carry all of the gRPC traffic. As such, we can give a more generous
receive buffer to ensure the multiplexing does not bottleneck the
connection. The buffer size is quite generous to ensure we have a good
throughput even if the latency is higher between the Praefect's and the
Gitalys. Given each Praefect holds a single connection to each Gitaly and
we have two mux streams on each connection, the memory impact of of each
connection is increased by a maximumx of 32Mib. This should be fine given
even our 50k reference architechture only recommends 3 Gitaly nodes.
That said, given that Gitaly Cluster should be deployed in a low latency
environment, we probably could get by with a smaller buffer. We probably
want to tweak the value further in the future or allow it to be configured.
For now, this value should cover most cases and not have too big of a memory
impact.
|
|
This commit integrates connection multiplexing in Gitaly and Praefect.
Praefect dials both regular and multiplexed connections to Gitaly.
The `gitaly_connection_multiplexing` feature flag is used to toggle
which connection Praefect uses to send the requests to the Gitaly.
The multiplexed connections are not health checked separately as they
are not tied to the leader election logic. Connection multiplexing is
only used if `sql` election strategy is configured, as the implementation
is only done there. Replicaton queue operations are do not go overm multiplexed
connections as it is not possible to feature flag them. This should be
sufficient for performance testing, proper integration follows with the voting
logic changes if the performance is acceptable.
|
|
This commit adds DialWithMux to client package which dials to Gitaly
using a multiplexed connection. This is an experimental function that
will be removed in the future and is not considered for backwards
compatibility. This function is only going to be used for performance
testing the multiplexed connection.
|
|
The client packages' Dial already sets the WithInsecure dial option
if TLS is not used. As such, it si not necessary to set the WithInsecure
option again in newRepositoryClient. Doing so will fail the tests if
transport credentials are configured as will be the case with multiplexed
connections.
|
|
This commit plugs in the connection multiplexing logic into Gitaly.
The aim is to test the peformance of the multiplexed connection in
production. As such, the Registry is not injected anywhere else in
Gitaly as it is not going to be used yet for voting.
|
|
Remove config.Config from wiki package
See merge request gitlab-org/gitaly!3314
|
|
Fix race condition failure
Closes #3529
See merge request gitlab-org/gitaly!3291
|
|
As we are ready to get rid of the global config.Config
variable this change does exact it for the files in the
wiki package. It also includes refactoring: addition of
the setup functions, renames and marking functions as
test-helpers.
All ruby sidecar dependent tests are assembled into a single one
as a sub-test. It uses a single instance of the ruby service
created internally. The global rubyServer variable and
initialisation code are removed as they are not unused anymore.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2699
|
|
blob: Remove feature flags for LFS pointer RPC ports
See merge request gitlab-org/gitaly!3309
|
|
Upgrade git version to v2.31.1
See merge request gitlab-org/gitaly!3306
|
|
Prevent running reconcile with repository specific primaries
See merge request gitlab-org/gitaly!3303
|
|
Add backchannel package for bidirectional gRPC invocations
See merge request gitlab-org/gitaly!3270
|
|
blob: Convert LFS pointer search by revision to use pseudo-revs in args
See merge request gitlab-org/gitaly!3304
|
|
maintenance: Introduce rate limiting for `OptimizeRepo`
See merge request gitlab-org/gitaly!3296
|
|
Historically, we couldn't pass pseudo-revisions as normal positional
arguments when searching for LFS pointers via git-rev-list(1) because
their leading dash is disallowed by our git DSL. Given that this
restriction has now been lifted in the preceding commit, we can get rid
of some awkward constructs we had in our code to work around the
limitation.
Convert `findLFSPointersByRevisions()` to not accept any options
anymore, but instead pass the pseudo-revisions directly as positional
arguments to simplify the code.
|
|
The git-rev-list(1) command is used to compute a list of objects based
on a set of revisions passed in by the user. Next to passing normal
revisions, it also allows pseudo-revisions like "--not" or "--all" which
have some special logic and may influence normal revisions which follow
after the given pseudo-revision. But because they start with a leading
dash and we refuse positional arguments with dashes, we haven't been
able to use their full potential yet.
With the new per-command validation of positional arguments, this
shortcoming can now easily be fixed: while we keep normal validation of
positional arguments in place, we do override it for "--all" and "--not"
such that they're allowed in positional arguments when git-rev-list(1)
executes. They're harmless, and being able to mix them with normal
revisions will allow better interfaces for us in the future.
|
|
Currently, all git commands use the same logic to validate global,
positional and post-separator arguments. This is inflexible, because
different commands use different syntax and also have different
behaviour. Until now, this hasn't bitten us yet and has served its
purpose of disallowing injection of command line options. But for some
commands, the existing sets of rules bar us from using some features.
Introduce a new optional function for command descriptions which allow
us to override validation of positional arguments per command. This can
be used in arbitrary ways: either to strengthen validation and assert
that certain kinds of arguments cannot ever be passed, or to weaken
validation in case we know that a set of arguments is safe to be
accepted as positional argument.
A user of this is going to be introduced in the following commit.
|
|
Assembly of command args is currently hosted next to the `Cmd` interface
and its implementations. While this isn't wrong per se, it also relies
on the command descriptions such that it knows how options can be
passed.
This commit thus moves over assembly of command arguments to be a
receiver function for the command descriptions. This allows us to easily
extend command descriptions to implement custom per-command validations
of arguments in a subsequent commit.
|
|
The `gitCommands` array hosts descriptions about all the git commands
we're running in Gitaly's codebase. Its naming is somewhat awkward
though because it clashes with the `Cmd` interface and its `SubCmd`
respectively `SubSubCmd` implementations. By only taking a look at their
names, it's not obvious what the scope of each of these is and how they
interact with each other.
Rename the `gitCommands` array and its `gitCommand` structure to
`commandDescriptions` respectively `commandDescription`. Because this is
what it is: it describes git commands, how we need to invoke them and
how they behave.
|
|
Over the years, we have grown quite a lot of options which can be passed
to git commands. This has made the actual implementation of git commands
quite hard to read given that the code file where it's hosted does so
many different things now.
Split out command options into a separate file to fix this and help
readability.
|
|
Now that the LFS RPCs have been ported from Ruby to Go, the blob service
server has no dependency on the Ruby server anymore. Remove the
dependency both from the server itself as well as from our testcode.
|
|
The Go port of the LFS pointer requests have been enabled in production
for some time now without any issues. Given that they've been default
enabled already, this commit now gets rid of those flags completely.
|
|
When Gitaly needs to cast a vote for a reference transaction, it
currently relies on Praefect to provide address information and tokens
needed to dial to the Praefect for voting. This can be problematic as
Praefect's listening socket may not be reachable by the Gitaly. Configuring
TLS provides additional challenges, as Gitaly would need to have the certificates
to identify the Praefect.
Praefect already establishes a connection to Gitaly, so we can avoid these problems
by piggybacking Gitaly's votes through that same connection. This piggybacking could
be implemented as persistent bidi stream that Praefect calls Gitaly with. However, this
leaves a lot of the complexities managed by gRPC up to us, such as request routing,
concurrency handling, cancellations and request packing.
In order to leverage gRPC for the above, we can instead multiplex the network
connection established by Praefect in order to run multiple gRPC/HTTP2 sessions in it.
This allows for dialing from Gitaly to a gRPC server running on Praefect's end of the
established connection. This allows us to rely on simply using gRPC in the usual fashion
to communicate with the transaction service running in Praefect.
The implementation has components in the client and the server.
The client implements a function, a ServerFactory, that returns a server that should
serve on the client's end of the connection. It implements a ClientHanshake which multiplexes
the connection and starts the backchannel server on it. The gRPC ClientConn itself looks like
normal ClientConn to the rest of the code.
The server implements most of the functionality in gRPC's ServerHandshake. When a connection
is established, the server peeks the stream to see if the client has indicated it supports
multiplexing. Client indicates this by sending magic bytes to the server after establishing
the connection. If the client is not multiplexing aware, the server handles the connection
as it does usually. If the client is multiplexing aware, the server establishes a multiplexing
session, dials the client's backchannel server and starts serving the connection as usual.
The ServerHandshake injects an identifier which can be accessed through the RPC handler's
context. The ID can be used to retrieve the peer's backchannel connection when it is needed,
namely when the votes need to be cast. Connections are uniquely identified by an incrementing
counter, so no ID can ever refer to the wrong peer.
|
|
With the current implementation of `OptimizeRepo`, we're trying to do as
much work as possible in the timeframe dictated by the maintenance
schedule. This has proven to be problematic even in times of reduced
load on deployments because we were essentially DoS'ing ourselves with
so many requests that it caused alerts to trigger. On staging systems,
we see more than 200 calls per second for the `OptimizeRepository` RPC.
Even if those jobs don't need to do any heavy repacking, it does cause
heavy read-load on Gitaly nodes just to determine that nothing needs to
be done.
As a first iteration towards betterment, this commit introduces rate
limiting to the maintenance job. Instead of going as quickly as we can,
we limit requests per second to 1.
Of course this means that we're now able to optimize less repositories
than we previously did. But this is still much better than driving a DoS
against ourselves and waking up SREs on weekends. Depending on the
typical load we'll see with this rate limiting, we may be able to enable
the maintenance task 24/7.
|
|
Now that we have a separate implementation of the random filesystem walk
via the `randomWalker`, this commit converts the `OptimizeRepository`
implementation to use it. This allows us to convert the current
recursive implementation into a simple loop, which is easier to reason
about and also easier to extend in subsequent commits.
No functional changes are expected from this commit.
|
|
In order to get a random distribution of optimized repositories, the
optimization maintenance job does a random walk through all storage
repositories. This random walk is implemented ad-hoc via a recursive
function which is deeply married with the optimization calls. This makes
it hard to extend, reason about and test.
To improve the situation, this commit thus introduces a generic random
filesystem walker implementation which has the same semantics as the
current implementation: given a directory hierarchy, the walker will go
down this hierarchy depth-first and, for each directory it encounters,
process its entries in a randomized order.
The new walker is not yet used.
|
|
While we semantically have two different test cases for the
`OptimizeRepo` maintenance implementation, they are not separated from
each other. This make it hard to reason about those tests as standalone
entities and requires duplicate setup.
Improve the code by using table-driven tests.
|
|
The `OptimizeReposRandomly()` function is currently creating an ad-hoc
source of randomness based on the current Unix timestamp. With this
architecture, the call is hard to test as we cannot make it behave
deterministically.
Refactor the code to instead accept the source of randomness via a
parameter. This slightly semantics given that we never reinitialize the
source anymore as we previously did when `OptimizeReposRandomly` was
called. This is perfectly fine though: we do not care whether we
reinitialize or not -- it should be random enough either way.
|
|
|
|
Git has released the new bugfix release v2.31.1. Upgrade our Makefile
to use that new version by default.
|
|
Git has released the new bugfix release v2.31.1. Upgrade our CI to use
that new version wherever we were using v2.31.0 previously.
|
|
Call for the WriterLevel on the existing logger entry is not concurrently safe
that is why we should create a new instance using the same Logger as a base and
copy all fields that are already tracked by the old logger entry.
|
|
Removal of config.Config from ref package
See merge request gitlab-org/gitaly!3298
|
|
`praefect reconcile` command is unnecessary with repository specific
primaries and doesn't work together with them. There are few reasons
for this:
1. When repository specific primaries are enabled, Praefect expects
to have database records for every repository. As such, the automatic
reconciler can already perform all of the reconciling the manual command
would do. This makes the manual command unnecessary.
2. The manual command works on the level of a storage. Running the command
would very likely result in jobs from secondaries to the primaries. While
this is not harmful as Praefect has checks against downgrading, we should
not produce such jobs.
3. Reconcile automatically figures out the primary of the virtual storage and
uses it as the reference storage. This is not a good approach as the primary
might not even be up to date. More so, there is no single primary when
repository specific primaries are enabled.
Due to the above points, the reconcile subcommand is not needed when repository
specific primaries are enabled. While we should keep it around for backwards
compatibility as long as virtual storage scoped primaries can be enabled, we should
prevent it from being used with repository specific primaries.
|
|
Remove go_user_commit_files feature flag
See merge request gitlab-org/gitaly!3281
|
|
Go port of UserCommitFiles has been enabled in production for over a
month and was defaulted to on in v13.10.0. This commit removes the
feature flag in preparation for removing the Ruby implementation.
|
|
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2699
|
|
As we are ready to get rid of the global config.Config
variable this change does exact it for the files in the
ref package. It also includes some small refactoring
like adding setup functions, etc.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2699
|
|
Make pack-objects cache configurable via TOML
See merge request gitlab-org/gitaly!3264
|
|
The way we wire up gRPC services, it is awkward if a gRPC service
server constructor can return an error. This commit refactors
streamcache.New and streamcache.newFilestore to never return an error.
The one thing that could return an error now happens lazily.
Because there is now more synchronization in filestore, this change
includes a refactor where we change to a single mutex for the whole
struct.
|
|
UserUpdateSubmodule: default to Go implementation
See merge request gitlab-org/gitaly!3265
|
|
gitaly-git2go: Ignore nsecs and timezone in tests
Closes #3534
See merge request gitlab-org/gitaly!3295
|
|
git(1) stores commits with a one second precision, and it stores the
timezone offset, but not it's name.
This change sets the nanosecond offset for `DefaultAuthor` to zero, and
the timezone name to an empty string. This makes it possible to use
`require.Equal` to compare a given `git2go.Signature` with a signature
deserialized from an actual commit.
Fixes https://gitlab.com/gitlab-org/gitaly/-/issues/3534.
|
|
Makefile: Switch to git-checkout(1) to allow oldish host git
See merge request gitlab-org/gitaly!3290
|
|
gitaly-git2go: Set correct Commit Date and Author in CherryPick command
Closes #3516
See merge request gitlab-org/gitaly!3274
|
|
We're using git-switch(1) to switch between different revisions when
building either the git or libgit2 targets. git-switch(1) may not be
available on some systems given that it's only been introduced with git
v2.23.0, so building those targets on such systems is broken.
Fix the issue by using the venerable git-checkout(1) command instead.
For our usecase, they're doing the same anyway.
|
|
Removal of the feature flag: distributed_reads
Closes #3383
See merge request gitlab-org/gitaly!3271
|
|
Makefile: Fix libgit2/git builds with old git versions
See merge request gitlab-org/gitaly!3288
|
|
featureflag: Remove per-RPC transactional feature flags
Closes #3310
See merge request gitlab-org/gitaly!3284
|