Age | Commit message (Collapse) | Author |
|
limithandler: Refactor package to not use global state
See merge request gitlab-org/gitaly!4197
|
|
gittest: Accept Gitaly configuration instead of binary paths
See merge request gitlab-org/gitaly!4203
|
|
The PgBouncer-based test job is really flaky, where the symptom of the
flakiness is that we fail to close all connections to the database on
test cleanup. Seemingly, PgBouncer will in some situations keep around
the connection without us being able to close them in the 20 second
timeframe we allow the conection termination to take.
While we could try to increase the timeout even further, this is not a
good way to guarantee that terminations do indeed connect. Actually, we
probably cannot require PgBouncer to really terminate connections only
because we kill them in the Postgres backend itself: PgBouncer may
reconnect automatically or anything else, so we should treat it as a
black box which just does some automagic things we cannot control by
issuing commands against Postgres.
Luckily, PgBouncer has an admin console which allows us to take control
over what it does directly. Instead of trying to terminate connections
via Postgres, we can thus ask PgBouncer to terminate connections by
issuing a `KILL` command, which will immediately terminate all client
and server connections.
Convert the test cleanup logic to do so. Given that we have test setups
where the same database configuration (and thus database) is reused, we
also need to issue a `RESUME` command such that subsequent test cases
will be able to reconnect to the database.
|
|
The functions `testdb.NewDB()` and `testdb.GetDBConfig()` both stutter.
Rename them to `testdb.New()` and `testdb.GetConfig()`, respectively.
|
|
The glsql package hosts both the implementation used to connect to a
"normal" production database as well as the logic to connect to a test
database. Mixing production and test code like this is bad practice
though.
Move test-related functions into the "testdb" package to fix this. This
change requires the glsql tests to use a separate package to break a
cyclic dependency between the "glsql" and "testdb" package. And
furthermore, the linter kicks in after this move and (legitly) complains
about missing error checks and closes for rows.
|
|
The helper function to set up healthy nodes in the test database is
hosted in a file "db.go". This has been fine until now given that it was
the only function in this package anyway. But we're about to move the
logic to set up a test database and connect to it into this package,
too, at which point it becomes a bit awkward that it's hosted in such a
generic file.
Rename the file to "health.go" to prepare for the move.
|
|
The `MockQuerier` structure is not used anywhere. Delete it.
|
|
The helper function which writes a Git hook which checks for object
existence accepts a Git binary path as parameter. Convert it to instead
accept a Gitaly configuration.
|
|
The `CreateCommitInAlternateObjectDirectory()` helper function has been
replaced by a new option for `WriteCommit()` and is now unused. Drop it.
|
|
Rewrite tests which write commits into alternate ODBs to use
`gittest.WriteCommit()`. Note that the tests are converted to use bare
Git repositories because the helper doesn't need a working tree. This is
closer to how repositories would look like in production.
|
|
Rewrite tests which write commits into alternate ODBs to use
`gittest.WriteCommit()`. Note that the tests are converted to use bare
Git repositories because the helper doesn't need a working tree. This is
closer to how repositories would look like in production.
|
|
We have several tests which write commit objects into the alternate
object database. While some hand-code this, most use the helper function
`CreateCommitInAlternateObjectDirectory()`. This helper requires manual
assemlby of the command and is thus unnecessarily verbose to be used.
Add a new option to the `WriteCommit()` helper which allows callers to
specify an alternate ODB such that it can be used instead of the old
helper.
|
|
The object existence helpers exposed by the gittest package have various
problems:
- They stutter due to their "GitObject" prefix.
- They accept the Git binary path instead of a configuration.
- The object ID is an untyped string.
Refactor them to better match our current coding best practices. They're
renamed to `RequireObject{,Not}Exists()`, accept the Gitaly config as
parameter and the object ID is now of type `git.ObjectID`.
|
|
Run tests in parallel, which is an easy thing to do now that the
limithandler package doesn't rely on global state anymore.
|
|
The limithandler package uses global variables to both track Prometheus
metrics and to handle its configuration. Especially the latter is really
fragile, where we need to set up state of the limithandler in various
places by calling global functions.
Fix this by making the `LimiterMiddleware` self-contained, where it
hosts all configuration as well as the Prometheus metrics.
|
|
The limit handler is currently created ad-hoc by Gitaly's server code.
We're about to change the way it hosts Prometheus metrics though such
that we can get rid of a set of global variables, and this will require
us to make sure that there's only a single instance of the limiter.
Inject the limit handler as a dependency when creating the server to
prepare for this.
|
|
The limithandler supports use of different keys to limit concurrency by
via a keyer function. The main function used by production code is
hosted in an unrelated package, mostly because it's where the limiter is
currently initialized.
Move it into the limithandler package to make it available at a central
place.
|
|
The monitor logic is hosted in both the concurrency limiting code and in
the metrics code. Move it into a separate file to make it more
self-contained.
|
|
The functions in the limithandler's middleware implementation are
seemingly ordered at random. Reorder them to group structures and their
receiver functions together. Furthermore, order constructors right after
their struct.
|
|
The limithandler package implements both the concurrency-limiting
primitives as well as the middleware used by gRPC. The files are
confusingly named though, so it's not clear which file hosts which
logic.
Rename the "limithandler" file to "middleware" to clarify.
|
|
praefect: add missing primaries check
See merge request gitlab-org/gitaly!4176
|
|
Add a praefect startup check to get the number of repositories that have
a missing primary. This check fails if there are any repositories
are deemed unavailable.
Changelog: added
|
|
Export the function that queries for unavailable repositories in
preparation for a praefect startup check to utilizie it.
|
|
Our testhelper context has recently gained the ability to verify that
feature flags are exercised as expected. This is an important safety
guard which can help us to avoid releasing untested and thus potentially
buggy code.
Disallow usage of "normal" contexts which don't have this ability such
that we do not introduce their usage by accident.
|
|
We're about to disallow use of "normal" contexts created via the
`context` package given that they do not perform sanity checks for
feature flags. Instead, contexts should be created via the testhelper
package.
Convert tests to use the testhelper context.
|
|
The keyer tests are all dependent on each other, and furthermore they're
written in terms of one long test. This makes it hard to see the
different parts we're trying to test, and furthermore it makes it harder
than necessary to refactor those tests.
Split up these tests into several subtests and have each of them use a
separate cache such that there can be no interdependencies.
|
|
In some contexts, we want to create test contexts which are not
cancellable to exercise certain behaviour in case it is missing. It's
still useful to inject feature flags in these cases though, so one
shouldn't just reach for `context.Background()` in such a case.
Introduce a new function `ContextWithoutCancel()` to address this
usecase.
|
|
There are no context options anymore which would set up a separate
way to cancel the context given that deadlines and similar things are
forbidden nowadays. As a result, the `func()` return value of context
options isn't used at all anymore.
Simplify context creation by dropping the cancel function from context
options.
|
|
The test for `FeatureSet`s is needlessly complex, doing a lot of voodoo
with converting contexts. Simplify the test to instead just assert that
incoming and outgoing metadata are exactly the same such that we can
then proceed to only whether feature flags are set in one of both tests.
|
|
When setting feature flags on a context, then the expectation is that it
will only be set for the child context which is returned from the
function. This doesn't work as expected though: the metadata we retrieve
from the current context is not a copy, so setting any new values on it
will modify the parent context, as well.
Fix this bug by copying the context before setting the key. Somehow,
this bug only seemed to impact incoming contexts: I wasn't able to
demonstrate that this is broken for outgoing contexts, even though they
should in theory suffer from the exact same problem. This commit fixes
the issue for both incoming and outgoing contexts regardless.
Changelog: fixed
|
|
When appending feature flags to a context, the expectation is that the
child context will have the feature flag set, but the parent context
will remain unmodified. This expectation is broken for incoming contexts
though, where setting a feature flag will set it in both contexts.
Add a test to demonstrate that this is broken.
|
|
ssh: Log error on git command failure
Closes #3865
See merge request gitlab-org/gitaly!4173
|
|
bootstrap: Rewrite tests to not use timeouts
See merge request gitlab-org/gitaly!4188
|
|
If the `git upload-pack` process forked by `SSHUploadPack` or
`git upload-archive` from `SSHUploadArchive` exits with a non-zero
return code, the result of executing `stream.Send` to respond to the
client will be returned, causing the request to be seen as successful
by Gitaly.
Clients will correctly receive the non-zero return code, but a status of
`OK` will be logged and the `grpc_server_handled_total` metric will
increment for `grpc_code=OK`.
This commit corrects `SSHUploadPack` and `SSHArchivePack` to return
an error when git fails.
Changelog: fixed
|
|
praefect: replicate immediately after track-repository
Closes #3892 and #3909
See merge request gitlab-org/gitaly!4183
|
|
track-repository is a subcommand that tracks a repository in the
praefect database. This is generally used by an admin when a repository
is on disk but not tracked in the praefect db. Once tracked, replication
to the other nodes takes about 5 minutes and requires the Praefect
process to be running. This MR adds a flag that, if set, will immediately
process a relication job and wait for it to finish.
Changelog: added
|
|
repository: Always enable locking RenameRepository RPC
Closes #3950
See merge request gitlab-org/gitaly!4194
|
|
The bootstrapping tests still need to use timeouts because of the grace
period, which is the duration we give the server to exit before we force
the upgrade. Convert the interface to instead accept tickers such that
we can get rid of using timeouts in our tests.
|
|
Our tests for the bootstrap package are quite complex and really hard to
understand. Furthermore, they're using timeouts everywhere, which slows
down execution of the tests, futher obscures what we actually want to
test, and has the potential for flakiness.
Rewrite the tests to not rely on the HTTP server anymore. This server
was used to simulate slow requests which caused us to exceed the grace
period in some cases, but doing so via an HTTP server is roundabout: we
can just block test execution via channels in a much easier way.
Instead, we now use a set of channels which allow us control when the
upgrader is doing various things, like when the upgrade process starts
or when the parent process supposedly exits. This allows us to directly
exercise what we want to in a much more direct way.
Furthermore, tests are refactored to not use real listeners anymore.
We do not care about whether the listeners work as expected: lifetime
wouldn't be controlled by the bootstrapper anyway, but instead by the
shutdown function that's provided. Using a mock listener thus allows us
to test the effects we want to see on the listeners directly.
|
|
praefect: Support new locking semantics in RemoveRepository handler
See merge request gitlab-org/gitaly!4187
|
|
We have been pushing our RPCs which manage repository locations to
always use locking semantics on the source repository and, if exists, on
the target repository. Like this, we cannot ever have two RPCs which try
to create and/or remove a repository location at the same point in time.
This allows us to properly use transactional voting for these RPCs.
In 0118edab0 (repository: Implement locking for RenameRepository,
2021-12-06), we have introduced locking support for the RenameRepository
RPC. This change has been successfull rolled out to production without
any issues on December 10th, so the change seems to work as intended.
Remove the feature flag and always enable locking semantics for
RenameRepository.
Changelog: changed
|
|
featureflag: Enable atomic repository creation by default
See merge request gitlab-org/gitaly!4181
|
|
With f87bc1e98 (Merge branch 'pks-create-repository-atomic' into
'master', 2021-11-24), we have implemented support for atomic repository
creation. With this mode, creation of repositories will use locking and
meaningful transactional voting to create repos in a race-free manner.
The feature flag guarding the new behaviour has been rolled out on
December 6th, and no issues have surfaced since then. But the change is
rather high-risk given that it changes semantics of some of the RPCs to
return errors in case the target repository exists already, which wasn't
the case previously. To be on the safe side, we thus want to keep this
feature flag for at least one release such that we have an escape hatch
in case there are issues with this change which we haven't yet seen.
Default-enable the flag.
Changelog: changed
|
|
Convert all callsites which perform transactional voting to specify the
phase the vote is being cast in. This information is not yet used by
Praefect yet, but serves as the baseline to make more informed
replication decisions in Praefect's at some point.
Ideally, all sites would either use "prepared" or "committed" as their
phase. But we have some callsites which don't use proper two-phase
voting, which thus cast their votes with "unknown" phase. These will
need to get converted eventually.
|
|
Convert tests to use the new `transaction.TrackingManager`, which simply
records all votes which have been cast.
|
|
We'll need to create votes from hex-encoded strings. Add a helper
function which does this for us.
|
|
A lot of tests need to create a mock transaction manager to track the
transactional votes and assert that we've got the votes we expected.
This is quite repetitive and ultimately requires us to change a lot of
function signatures if the `transaction.Manager` interface changes.
Improve the situation by introducing a new `TrackingManager` which
records all votes.
|
|
Passing in an interface makes for a more flexible signature and allows
for easier testing with other code.
|
|
Tests should always use the testhelper package to create contexts such
that we can benefit from the feature flag sanity checks. Convert backup
tests to do so.
|
|
Use better parallelization for tests by adding `t.Parallel()` calls as
required. Like this, tests drop from 28 seconds of execution time to
only 8 seconds.
|