Age | Commit message | Author |
|
|
PerRepositoryElector's tests are missing a case that verifies a healthy
primary is not re-elected. This commit adds such a test case.
|
|
Praefect's PerRepositoryElector runs elections globally when Praefect
launches and when a Gitaly node's health status changes. This approach was
originally taken to match global elections done by the sqlElector as well.
While the sqlElector runs elections after every health check, by default
every 3s, the event driven approach was implemented for the PerRepositoryElector
as it has to perform a lot more work every election run compared to the
sqlElector. The sqlElector has a single primary for each virtual storage
whereas the PerRepositoryElector has a primary record for every repository.
While both electors check every repository's generations to pick the best new
primary, only the PerRepositoryElector potentially has to write a large number
of records as well. We can do a lot better though:
1. If the primary is unavailable only temporarily, there's a high chance that
the repository is not even accessed during the outage. If so, there's no need
to eagerly failover as no one would even see the failure.
2. Most of the operations on the repositories are reads. Reads can be served from
any up to date replica without needing to have a primary. Only once an RPC that
requires the primary arrives we care about having a healthy primary.
Given the above, this commit implements a lazy approach to failovers. This removes
the background election loop entirely and elects a primary if needed when an RPC
requires a primary. This happens transparently when getting the primary from the
database. This brings multiple benefits:
1. Performance improves as we don't have to perform failovers for repositories which
are not written to during the primary's outage. This reduces the time to perform
failovers as we are working on records of a single repository as opposed to all
of the repositories.
2. Failover code is responsive without having to feed it more and more events. This
becomes more relevant as we implement rebalancing features. When moving a repository
with a single replica, we may have to demote the primary temporarily and we want it
to be re-elected as soon as a request needs it and it's possible. The previous approach
would require us to hook more code into the events whereas this lazy approach just
works.
3. It's easier to reason about synchronous code rather than asynchronous elections.
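
The lazy approach described above can be sketched roughly as follows. This is a minimal in-memory illustration, not Praefect's actual code: the `store` and `replica` types, the `GetPrimary` method and the node names are all hypothetical stand-ins for the database-backed logic, and the sketch ignores concurrency.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a hypothetical record of one copy of a repository.
type replica struct {
	node       string
	generation int
}

// store is a hypothetical in-memory stand-in for the database tables.
type store struct {
	healthy   map[string]bool      // node -> health status
	primaries map[string]string    // repository -> current primary
	replicas  map[string][]replica // repository -> its replicas
}

var errNoHealthyReplica = errors.New("no healthy replica to elect")

// GetPrimary returns the repository's primary. An election runs only if
// the current primary is missing or unhealthy, and only for this repository.
func (s *store) GetPrimary(repo string) (string, error) {
	if p, ok := s.primaries[repo]; ok && s.healthy[p] {
		return p, nil // a healthy primary is never re-elected
	}

	// Pick the healthy replica with the highest generation.
	best, found := replica{}, false
	for _, r := range s.replicas[repo] {
		if s.healthy[r.node] && (!found || r.generation > best.generation) {
			best, found = r, true
		}
	}
	if !found {
		return "", errNoHealthyReplica
	}

	s.primaries[repo] = best.node
	return best.node, nil
}

func main() {
	s := &store{
		healthy:   map[string]bool{"gitaly-1": false, "gitaly-2": true, "gitaly-3": true},
		primaries: map[string]string{"repo-a": "gitaly-1"},
		replicas: map[string][]replica{
			"repo-a": {{"gitaly-1", 5}, {"gitaly-2", 5}, {"gitaly-3", 4}},
		},
	}
	primary, err := s.GetPrimary("repo-a")
	fmt.Println(primary, err) // gitaly-2 <nil>
}
```

Because the election is keyed by a single repository, a read that never asks for the primary never triggers a failover.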
Changelog: performance
|
|
PerRepositoryElector's tests currently take in a function that can be
used to assert the primary. It turns out we don't need this as we're
always just asserting the primary is one of a given set of nodes or that
there is no primary. Replace the function field with just a list of
acceptable primaries. This allows us later to use the information of which
primaries are expected, which would currently be unavailable as the information
is contained in the closure.
|
|
wiki: Remove DeletePage RPC
See merge request gitlab-org/gitaly!3453
|
|
Update security merge request template
See merge request gitlab-org/gitaly!3535
|
|
Adjusts security merge request template to use the new changelog
workflow.
Related to gitlab-com/gl-infra/delivery#1767
|
|
UserUpdateSubmodule: Remove dead Ruby code
Closes #3381
See merge request gitlab-org/gitaly!3527
|
|
Update tooling for the new changelog workflow
See merge request gitlab-org/gitaly!3507
|
|
[ci skip]
|
|
This change was first introduced in merge request
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3508.
Changelog: added
|
|
This change was first introduced in merge request
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3483.
Changelog: performance
|
|
This change was first introduced in merge request
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3502.
Changelog: fixed
|
|
This change was first introduced in merge request
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3516.
Changelog: changed
|
|
Similar to https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62012,
this updates Gitaly's CI tooling for the new changelog workflow.
|
|
Fix running Gitaly tests during interactive rebases
See merge request gitlab-org/gitaly!3524
|
|
[ci skip]
|
|
Since a827d8b77 (operations: Remove update submodule feature flag, 2021-05-06)
the Ruby code was no longer in operation. Now, a release later, the
Ruby code can be removed.
|
|
Consider primary modified only if a subtransaction was committed
See merge request gitlab-org/gitaly!3494
|
|
Cancel a vote associated with a node that stops waiting for a quorum
See merge request gitlab-org/gitaly!3523
|
|
When a Gitaly has cast its vote for a transaction, it waits until
the transaction reaches a quorum. If the Gitaly stops waiting for the
quorum, it will not commit the changes even if the vote is successful.
As such, we should not consider its vote in the quorum anymore as it's
not going to persist the changes. This commit cancels a Gitaly's vote
if it stops waiting for a quorum.
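
As a rough illustration of why the vote has to be cancelled, consider the simplified tally below. The `transaction` type and its methods are hypothetical and not Praefect's actual implementation; locking is omitted for brevity.

```go
package main

import "fmt"

// transaction is a hypothetical, simplified vote tally.
type transaction struct {
	threshold int               // votes needed for a quorum
	votes     map[string]string // node -> hash it voted for
}

// vote records the hash a node voted for.
func (t *transaction) vote(node, hash string) { t.votes[node] = hash }

// cancelVote drops the vote of a node that stopped waiting for the quorum,
// as that node is not going to persist the changes.
func (t *transaction) cancelVote(node string) { delete(t.votes, node) }

// hasQuorum reports whether any single hash has reached the threshold.
func (t *transaction) hasQuorum() bool {
	counts := map[string]int{}
	for _, hash := range t.votes {
		counts[hash]++
		if counts[hash] >= t.threshold {
			return true
		}
	}
	return false
}

func main() {
	tx := &transaction{threshold: 2, votes: map[string]string{}}
	tx.vote("gitaly-1", "abc")
	tx.vote("gitaly-2", "abc")
	fmt.Println(tx.hasQuorum()) // true

	// gitaly-2 stops waiting, so its vote no longer counts.
	tx.cancelVote("gitaly-2")
	fmt.Println(tx.hasQuorum()) // false
}
```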
Changelog: changed
|
|
In order to catch cases early where we do not sanitize Git environment
variables passed to us by the outside, this commit adds a canary to our
Makefile by setting `GIT_DIR=/dev/null` for our tests. If any test does
not sanitize envvars, then our spawned Git commands would pick up this
envvar, assume it as their repository path and (hopefully) end up
failing in a controlled way.
This does indeed detect one case where we spawn Git commands but don't
yet sanitize the environment, which is also getting fixed by this
commit.
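
A minimal sketch of what sanitizing the environment before spawning Git could look like. The `sanitizeEnv` helper and the exact list of filtered variables are assumptions made for illustration; only `GIT_DIR` is named by this commit, and Gitaly's real filtering may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// repoEnvVars lists Git variables that redirect a command to another
// repository. The selection is an assumption made for this sketch.
var repoEnvVars = []string{"GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE"}

// sanitizeEnv returns env without the variables listed in repoEnvVars.
func sanitizeEnv(env []string) []string {
	var out []string
	for _, kv := range env {
		name := strings.SplitN(kv, "=", 2)[0]
		drop := false
		for _, blocked := range repoEnvVars {
			if name == blocked {
				drop = true
				break
			}
		}
		if !drop {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	env := []string{"PATH=/usr/bin", "GIT_DIR=/dev/null", "HOME=/root"}
	fmt.Println(sanitizeEnv(env)) // [PATH=/usr/bin HOME=/root]
}
```

With the canary in place, a spawned command that skipped such a filter would inherit `GIT_DIR=/dev/null` and fail instead of silently operating on the wrong repository.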
|
|
With the same reasoning as for the parent commit, unset Git-specific
environment variables in our testhelper.
|
|
Add option to run backups in parallel
See merge request gitlab-org/gitaly!3509
|
|
During an interactive rebase, Git sets up various environment variables
which point to the repository in which the rebase is being executed.
Other Git commands will pick up these envvars and will as a result
operate in that same repository. This is dangerous in our case, given
that we're spawning Git commands in our Makefile which we certainly want
to run in a different directory.
Fix the issue by unexporting these environment variables.
|
|
Test concurrent election runs with PerRepositoryElector
See merge request gitlab-org/gitaly!3512
|
|
To compute the Gitaly version, we're using plain "git" invocation. Given
that we're modifying PATH to point to a placeholder "git" executable,
this may not work.
Fix the issue by using "${GIT}" instead.
|
|
The Wiki service has functional duplication with the OperationService
RPCs. This means that if the clients are refactored, the wiki RPCs can
be removed. This change removes the DeletePage RPC as the client that
was using it no longer exists: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/57106.
The proto is removed too, as the client is unable to call the RPC and
doesn't use it anymore.
|
|
Respect failover disabled config option with per_repository elector
See merge request gitlab-org/gitaly!3491
|
|
ci: Fix code navigation job
See merge request gitlab-org/gitaly!3520
|
|
Disjoint request finalizer timeout from the RPC
Closes #3624
See merge request gitlab-org/gitaly!3515
|
|
featureflag: Remove LogCommandStats feature flag
See merge request gitlab-org/gitaly!3517
|
|
The isDone field in the subtransaction indicates whether the subtransaction
is already finished or not. It's always set when closing the doneCh, which
is used to signal to voters waiting for a subtransaction that it's finished.
Given the above, isDone really just duplicates the state of whether the doneCh
has been closed or not. This commit replaces isDone with a method that checks
doneCh's state to remove the duplication.
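
The replacement method can be sketched with a non-blocking `select` on the channel. The struct below is reduced to the one field that matters here and is not the full subtransaction type.

```go
package main

import "fmt"

// subtransaction is reduced to the channel that signals completion.
type subtransaction struct {
	doneCh chan struct{}
}

// isDone reports whether doneCh has been closed. A receive from a closed
// channel returns immediately, so the default branch is only taken while
// the subtransaction is still running.
func (s *subtransaction) isDone() bool {
	select {
	case <-s.doneCh:
		return true
	default:
		return false
	}
}

func main() {
	s := &subtransaction{doneCh: make(chan struct{})}
	fmt.Println(s.isDone()) // false
	close(s.doneCh)
	fmt.Println(s.isDone()) // true
}
```

This only works because nothing is ever sent on `doneCh`; the channel is used purely as a close signal.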
|
|
Our code navigation CI job is currently broken, where we always get an
Internal Server Error when uploading the artifact created by lsif-go. As it
turns out, this breakage is caused by an upstream change which has
caused a change in the output format.
Fix the error by pinning the lsif-go version to v1.3.1, which is the
most recent version which still works without issue.
|
|
remote: Vote when adding and removing remotes
See merge request gitlab-org/gitaly!3508
|
|
Request finalizer currently uses the same context as the RPC. If the
RPC times out, the request finalizer's context is also canceled. This
prevents the request finalizer from updating the database state when
an RPC has timed out. The RPC could have performed some disk modifications
even if it timed out so we have to update the database state. This
commit gives the request finalizer a 30 second timeout independent of the
RPC that triggered the finalizer. This should be long enough to run the
database operations. The database state could still be left unupdated
due to various reasons like Praefect crashing or reaching its graceful
shutdown limit.
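
One way to decouple the finalizer's context from the RPC is sketched below. The `suppressedContext` wrapper, `finalizerContext` and `demo` are hypothetical names for this illustration and not necessarily how Praefect implements it; the wrapper keeps the parent's values but ignores its cancellation and deadline.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// suppressedContext keeps the parent's values but ignores its
// cancellation and deadline.
type suppressedContext struct{ context.Context }

func (suppressedContext) Deadline() (time.Time, bool) { return time.Time{}, false }
func (suppressedContext) Done() <-chan struct{}       { return nil }
func (suppressedContext) Err() error                  { return nil }

// finalizerContext derives a context that is bounded only by its own
// timeout, independent of whether the RPC's context has been canceled.
func finalizerContext(rpcCtx context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(suppressedContext{rpcCtx}, timeout)
}

// demo simulates an RPC whose context is already canceled and returns the
// error state of the RPC context and of the finalizer context.
func demo() (rpcErr, finalizerErr error) {
	rpcCtx, cancelRPC := context.WithCancel(context.Background())
	cancelRPC() // the RPC timed out or was canceled

	ctx, cancel := finalizerContext(rpcCtx, 30*time.Second)
	defer cancel()

	return rpcCtx.Err(), ctx.Err()
}

func main() {
	rpcErr, finalizerErr := demo()
	fmt.Println(rpcErr, finalizerErr) // context canceled <nil>
}
```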
Changelog: fixed
|
|
The command package has a function for suppressing the cancellation
of the parent context. This commit moves the functionality to the `helper`
package as it's not strictly only command related. This replaces the existing
tests with more concise ones that do not use time.Sleep.
|
|
wiki: Remove GetPageVersions RPC
See merge request gitlab-org/gitaly!3490
|
|
The LogCommandStats feature flag has been default-enabled since
21b8bb933 (featureflag: Default enable LogCommandStats, 2021-04-09)
without any issues. Let's remove it altogether.
Changelog: added
|
|
Remove gitaly feature flag gitaly_go_user_revert
See merge request gitlab-org/gitaly!3516
|
|
Vote when reference transaction has been committed
See merge request gitlab-org/gitaly!3514
|
|
Changelog: changed
|
|
The `AddRemote()` and `RemoveRemote()` RPCs are currently the only ones
which don't cast transactional votes in production. As a result, all
secondaries are always considered as outdated whenever we for example
update object pools or mirrors. Given that these RPCs are typically
executed before these long running actions, the net result is that
secondaries will be out of date most of the time.
Fix the issue by adding transactional voting for `AddRemote()` and
`RemoveRemote()`. In both cases, voting is done on the remote
configuration only, and not on the complete configuration. This may be
inaccurate as the remote may be influenced by other configuration like
for example the `http.<url>.` config entries. But given that we do not
synchronize configuration on replication, it may be risky to instead
vote on the complete configuration as it may have diverged across nodes
and would never recover even after a replication job got processed.
Given that these RPCs are about to be phased out soonish anyway, we just
ignore this issue for now. At least for a subset of fetches, this change
will likely not yet enable transactional behaviour. Instead, we may see
a shift to missing votes in `SetConfig()`, which is used to set up
credentials. We can fix this issue in a follow-up though, especially so
because at that time we'll also need to have replication of the config
to recover from failed votes.
Changelog: added
|
|
Neither the `AddRemote()` nor the `RemoveRemote()` RPCs currently handle
transactional voting. This is about to change in subsequent commits. But
given that it's got some risk involved we want to do it behind a feature
flag, which we introduce now.
|
|
The remote service setup functions do not accept testserver options,
which makes it impossible to override a subset of dependencies. Refactor
them to accept options.
|
|
We're about to start using the transaction manager in the remote
service. Inject it as a preparatory step.
|
|
repository: Enable replication of and voting on gitconfig
See merge request gitlab-org/gitaly!3511
|