Age | Commit message (Collapse) | Author |
|
While trace2 emits a lot of messages, this change disregards almost all
of them except Git shelling out to a child process. In that case we
capture only the most important details like the SID, arguments, and
exit code.
The trace2.CopyHandler has been renamed, and tests have been updated to
validate the messages in the logs.
Changelog: added
|
|
Git allows a caller to receive metrics and events through `GIT_TRACE` and
relatives; this has been around for a while. If needed, one could enable
these before this change with `GIT_TRACE=1 ./gitaly` on boot. While this
works for debugging, it logs a lot of events and thus is not enabled on
production systems.
This change introduces a mechanism to enable the second iteration of the
tracing from Git. Again, Git exposes a lot of events and metrics, though
now there are improvements like setting a custom sink, exposing JSON
instead of unstructured text, and better context among the lines emitted.
For now Gitaly creates a new open file descriptor when a Git command is
spawned and copies the input from it to an io.Writer. This is not yet
useful, but it does set up the plumbing to capture the events. Further, it
would now be trivial to copy all events to stderr during testing if one
wanted to.
Later changes will introduce mechanisms to analyse the stream and
expose its data in a structured form that is more useful in the GitLab
architecture, for example through Prometheus, tracing, and/or logging.
|
|
Remove on-by-default gitaly_go_user_update_branch feature flag
See merge request gitlab-org/gitaly!3475
|
|
Remove the gitaly_go_user_update_branch feature flag that's been
on by default since my 866b08492 (Turn UserUpdateBranch in Go on by
default, 2021-03-23) merged in [1]. This brings us one step[2] closer
to removing this entirely.
As noted in [3], the underlying Ruby code is not being removed in this
commit; we'll have to wait until this change has been out for a while
to avoid the race condition described in that documentation.
1. https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3286
2. https://gitlab.com/gitlab-org/gitaly/-/issues/3472
3. https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#two-phase-ruby-to-go-rollouts
Changelog: changed
|
|
Ensure trailers use the right casing
See merge request gitlab-org/gitaly!3558
|
|
[ci skip]
|
|
logging: Drop topLevelGroup field
Closes #3639
See merge request gitlab-org/gitaly!3556
|
|
|
|
Changelog trailers are processed case-sensitively by the API. This
updates Danger so it errors when using incorrect casing, such as
`changelog` instead of `Changelog`.
See https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62915 for more
information.
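The check itself lives in Danger (Ruby), but the rule is simple enough to sketch. The snippet below is only an illustration of the casing rule, assuming a trailer is a line starting with the key followed by a colon; `miscasedChangelogTrailers` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anyCaseTrailer matches a changelog trailer key in any casing.
var anyCaseTrailer = regexp.MustCompile(`(?i)^changelog:`)

// miscasedChangelogTrailers returns the lines of a commit message whose
// changelog trailer key is not spelled exactly "Changelog".
func miscasedChangelogTrailers(message string) []string {
	var bad []string
	for _, line := range strings.Split(message, "\n") {
		if anyCaseTrailer.MatchString(line) && !strings.HasPrefix(line, "Changelog:") {
			bad = append(bad, line)
		}
	}
	return bad
}

func main() {
	message := "logging: Drop topLevelGroup field\n\nchangelog: changed"
	for _, line := range miscasedChangelogTrailers(message) {
		fmt.Printf("incorrect trailer casing: %q\n", line)
	}
}
```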
|
|
Create module v14 gitaly version
See merge request gitlab-org/gitaly!3525
|
|
Update default & secondary Go versions
See merge request gitlab-org/gitaly!3552
|
|
Changelog: changed
|
|
[ci skip]
|
|
[ci skip]
|
|
The topLevelGroup used to be important for logging as GitLab used to
store repositories in a directory structure mimicking the URL path. For
example `gitlab-org/gitaly` used to be stored at
`gitlab-org/gitaly.git`. Since GitLab will no longer use the legacy
storage format as of 14.0, this field will always read `@hashed`.
If a field is always a constant, why log it? This change stops logging
it.
Changelog: changed
|
|
[ci skip]
|
|
Prevent usage of other election strategies than per_repository
Closes #3574
See merge request gitlab-org/gitaly!3544
|
|
nodes: Mention gitaly in 'ErrPrimaryNotHealthy'
See merge request gitlab-org/gitaly!3551
|
|
transactions: Fail early if the threshold cannot be reached anymore
See merge request gitlab-org/gitaly!3530
|
|
Do not failover to outdated replicas
Closes #3631
See merge request gitlab-org/gitaly!3542
|
|
Gitaly error messages will be shown to end-users in the web UI.
Without knowing that the error message for 'ErrPrimaryNotHealthy' is
from Praefect, it's difficult for users to understand what 'primary' is
referring to.
Let's mention that Gitaly is the source of the failure to make this less
confusing.
Changelog: changed
|
|
To have fewer places where the version of the gitaly module
needs to be changed, the module path is defined on the fly
for the 'notice' task.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3177
|
|
The new "v14" version of the Gitaly module is named to match
the next GitLab release. The module versioning is needed in
order to pull gitaly as a dependency in other projects. The
change updates all imports to include the v14 version. The go.mod
file was modified as well after running `go mod tidy`, and
the changes in dependency licenses are reflected in the NOTICE
file.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3177
|
|
The path re-writer is a Go script that re-writes imports
in the Go source code files, proto files and the go.mod file.
The script accepts the path to the project dir where the go.mod file
is located, the current module version and the desired module version.
Upgrading a module requires re-generating the gRPC stubs
from the proto files, which is why the code of the path re-writer
script is used in a new 'upgrade-module' task which covers
that need.
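The core of such a re-writer can be sketched as a guarded string replacement. This is only an illustration under the assumption that the module path is `gitlab.com/gitlab-org/gitaly` and that plain text replacement is good enough; `rewriteModulePath` is a hypothetical helper, and the real script may well work differently (for example by parsing the Go AST).

```go
package main

import (
	"fmt"
	"regexp"
)

// rewriteModulePath replaces references to module at the current version
// with references at the next version. An empty current version means
// the module path carries no version suffix yet.
func rewriteModulePath(src, module, current, next string) string {
	from := module
	if current != "" {
		from = module + "/" + current
	}
	to := module + "/" + next

	// Only match the module path when it is followed by a quote, a
	// slash, whitespace or the end of the text, so that sibling modules
	// sharing the same prefix are left alone.
	pattern := regexp.MustCompile(regexp.QuoteMeta(from) + `(["/\s]|$)`)
	return pattern.ReplaceAllString(src, to+"${1}")
}

func main() {
	const module = "gitlab.com/gitlab-org/gitaly"
	src := "import \"gitlab.com/gitlab-org/gitaly/internal/git\"\n"
	fmt.Print(rewriteModulePath(src, module, "", "v14"))
}
```

The same function covers go.mod (`module <path>`) and the `go_package` option in proto files, since both contain the module path as plain text.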
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3177
|
|
This commit extracts the logic for getting the valid primaries into a
view for easier reuse. This logic will be reused when implementing lazy
failovers in `praefect dataloss` and the read-only repository metric.
Both of them currently check whether the primary stored in the database
is outdated to determine whether the repository is in read-only mode or
not. This won't be sufficient with lazy failovers anymore. The repository
would only get a new primary if it is accessed while the current primary
is unhealthy and there are valid primary candidates available. As such,
both of these tools will need to check whether there are valid primaries
to fail over to before denoting the repository as read-only.
The repository_generations view is also extracted and used in the valid_primaries
view. Many queries currently check the storage_repositories records to figure
out what is the latest generation for a given repository. The view deduplicates
this logic.
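The real logic is a database view, but what it computes can be illustrated in memory. The sketch below assumes each storage_repositories record carries a repository, a storage and a generation number; the `storageRecord` type and both helpers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// storageRecord stands in for one storage_repositories row.
// The field names are hypothetical.
type storageRecord struct {
	Repository string
	Storage    string
	Generation int
}

// latestGenerations returns the highest generation recorded for each
// repository across all of its storages.
func latestGenerations(records []storageRecord) map[string]int {
	latest := make(map[string]int)
	for _, rec := range records {
		if current, ok := latest[rec.Repository]; !ok || rec.Generation > current {
			latest[rec.Repository] = rec.Generation
		}
	}
	return latest
}

// upToDateStorages lists, per repository, the storages that hold the
// latest generation, sorted by name.
func upToDateStorages(records []storageRecord) map[string][]string {
	latest := latestGenerations(records)
	result := make(map[string][]string)
	for _, rec := range records {
		if rec.Generation == latest[rec.Repository] {
			result[rec.Repository] = append(result[rec.Repository], rec.Storage)
		}
	}
	for _, storages := range result {
		sort.Strings(storages)
	}
	return result
}

func main() {
	records := []storageRecord{
		{"repo-a", "gitaly-1", 2},
		{"repo-a", "gitaly-2", 1},
		{"repo-a", "gitaly-3", 2},
	}
	fmt.Println(latestGenerations(records))
	fmt.Println(upToDateStorages(records))
}
```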
|
|
When the primary was scoped to a virtual storage, it made sense to
failover to another node immediately. This could enable us to accept
writes again for some repositories while some repositories would still
remain unwritable if the replica on the new primary was outdated. For
the repositories which became writable, this is a clear win. For the
repositories that had an outdated copy on the new primary, this slows
down the process of accepting new writes. If the old primary comes
back again, we'd now have to wait for a replication job to be applied.
With repository-specific primaries we don't have to do this anymore. We
can check per repository whether or not there is a fully up to date, healthy
replica available to act as the new primary. This minimizes unnecessary
primary changes and speeds up recovery when the previous primary eventually
comes back as we can just directly keep using it as the primary.
Additionally, the repository-specific primary elector was previously never
using unassigned replicas as primaries. We don't generally want to do this
as the unassigned replicas are considered extra copies that should be removed.
If the unassigned replica is the only up to date replica, using it as the primary
minimizes the duration when the repository can't accept writes. This is
improved upon here by considering up to date, healthy, unassigned replicas
as valid primaries if there are no up to date, healthy, assigned replicas.
This allows us to temporarily use unassigned replicas as primaries if there
are no assigned replicas to act as the primary. The logic also implies that
as soon as there is an assigned replica that could act as the primary, the
unassigned ones will immediately be demoted and the assigned replica promoted.
While this is not a very common scenario yet, it will be more common when the
assignments are shuffled around. This allows the unassigned replica to act as
the primary until the repository has been moved to the new storage node and is
ready to act as the primary. This behavior allows us to rebalance the storages
later without any interruptions to the write availability.
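The candidate rules above can be sketched in memory as follows. This is a simplified illustration only: the `replica` type and both helpers are hypothetical, the actual election runs as a database query, and picking the first candidate is an assumption made here for determinism.

```go
package main

import "fmt"

// replica describes one copy of a repository. The type is hypothetical.
type replica struct {
	Storage  string
	Healthy  bool
	UpToDate bool
	Assigned bool
}

// validPrimaries returns the healthy, up-to-date assigned replicas. Only
// if there are none does it fall back to healthy, up-to-date unassigned
// replicas.
func validPrimaries(replicas []replica) []string {
	var assigned, unassigned []string
	for _, r := range replicas {
		if !r.Healthy || !r.UpToDate {
			continue
		}
		if r.Assigned {
			assigned = append(assigned, r.Storage)
		} else {
			unassigned = append(unassigned, r.Storage)
		}
	}
	if len(assigned) > 0 {
		return assigned
	}
	return unassigned
}

// electPrimary keeps the current primary if it is still a valid candidate
// or if no candidate exists at all; otherwise it picks the first valid
// candidate.
func electPrimary(current string, replicas []replica) string {
	candidates := validPrimaries(replicas)
	if len(candidates) == 0 {
		return current
	}
	for _, candidate := range candidates {
		if candidate == current {
			return current
		}
	}
	return candidates[0]
}

func main() {
	replicas := []replica{
		{Storage: "gitaly-1", Healthy: false, UpToDate: true, Assigned: true},
		{Storage: "gitaly-2", Healthy: true, UpToDate: false, Assigned: true},
		{Storage: "gitaly-3", Healthy: true, UpToDate: true, Assigned: false},
	}
	fmt.Println(electPrimary("gitaly-1", replicas))
}
```

Because an unassigned replica is only a candidate while no assigned one qualifies, it is demoted as soon as an assigned replica becomes healthy and up to date.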
Changelog: changed
|
|
Extract health consensus logic into a view and query for it directly in the primary elector
See merge request gitlab-org/gitaly!3540
|
|
ResolveConflicts: Remove Ruby implementation
Closes #3289
See merge request gitlab-org/gitaly!3546
|
|
This change removes dead code, as this RPC has been rolled out without
a feature flag and thus can only call the Go implementation. Now, a
release later, the Ruby code can be removed.
|
|
Praefect's legacy election strategies have been deprecated in 13.12
and are scheduled for removal in 14.0. This commit makes
`per_repository` the default election strategy and ignores the
`failover.election_strategy` configuration key.
As there are still some tests in Gitaly's repo and in GitLab's repo
which are not configured to use the database, there's an additional
configuration key added which allows for still configuring the other
election strategies.
Changelog: removed
|
|
coordinator: Fix Goroutine leaks
See merge request gitlab-org/gitaly!3522
|
|
Align UpdateRemoteMirror with Ruby implementation of the RPC
See merge request gitlab-org/gitaly!3506
|
|
Fix flaky test PropagateReplicationJob
Closes #3622
See merge request gitlab-org/gitaly!3533
|
|
UpdateRemoteMirror pushes to a remote mirror repository. The pushes are
done in batches of 10 refspecs. The Ruby implementation of the RPC ensures
the default branch of the repository is always pushed in the first batch
to ensure anything that relies on the default branch works. The Go port
of the RPC is currently not doing so. This commit fixes the situation by
ensuring the default branch is always part of the first push.
|
|
UpdateRemoteMirror's Ruby implementation returns at most 100
divergent refs in the response. This commit changes the Go implementation
to match.
|
|
The PushBatchSize constant is exported unnecessarily as it's only used
by UpdateRemoteMirror. This commit unexports it.
|
|
Update gitlab-labkit to 0.17.1 to update pg_query to 2.0.3
Closes #3446
See merge request gitlab-org/gitaly!3395
|
|
Update nokogiri gem from 1.11.1 to 1.11.5
See merge request gitlab-org/gitaly!3534
|
|
ruby: Remove UserCherryPick implementation
Closes #3281
See merge request gitlab-org/gitaly!3541
|
|
[ci skip]
|
|
In [1] the feature flag for the Go implementation is removed, making the
Go implementation the only possible code path for handling the
UserCherryPick RPC.
This was done in version 13.12. In this change, included in 14.0, the
Ruby implementation for this RPC is removed, since it can no longer be
called.
1. 4ad92cb47 (operations: Drop GoUserCherryPick feature flag,
2021-04-26)
Fixes: https://gitlab.com/gitlab-org/gitaly/-/issues/3281
|
|
PerRepositoryElector currently gets the health consensus from the
HealthManager. This is an unnecessary loop as the primary elector
could query for the consensus directly from the database. This was
originally implemented like this to ease testing and avoid duplicating
the query logic. With the query logic extracted into a view, let's
query the view directly.
|
|
HealthManager currently contains the logic for determining which Gitaly
nodes are considered healthy by the Praefect nodes and which Praefect
nodes are part of the quorum. While in itself the logic works fine, the
consensus is returned from the database and passed in-memory to the
components that require the consensus, namely the primary elector. The
primary elector then runs the elections in a separate database transaction.
In practice, this works ok. In theory, it is possible that the Praefect
nodes perform elections using an outdated view of healthy nodes, which
could result in the primary node flickering unnecessarily.
This commit takes the first steps towards reusing the consensus logic in the
queries by extracting the logic into a view. Using the view, we can
directly get the health consensus in the primary elector without first
bringing it into memory.
This view will also be needed when implementing the lazy failover logic
in `praefect dataloss` and the read-only repository metric. Currently the
repository is considered to be read-only if the primary stored in the
database is outdated. With lazy failovers, the recorded primary being
outdated doesn't mean the repository is currently in read-only mode as
the repository could failover immediately if there's a request to it and
a viable primary exists. To support this use case without duplicating our
query logic, we need to extract the concept of a valid primary into a view.
The `healthy_storages` view is going to be a part of that view.
The HealthManager now has to perform two queries on health checks. Combining
the updates with querying the consensus is no longer feasible, as the CTE
modifications are not visible in the tables during the same query. To work around
that limitation, the health checks are first updated and then queried immediately
after. This should work fine as the important thing is to notice changes in the
healths of the Gitaly nodes and trigger the election run. This works fine even
when updating the health checks and querying the consensus is done in different
transactions. PerRepositoryElector being the only consumer of the health
consensus at the moment, we can remove the second step completely once the lazy
failovers are implemented.
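To make the notion of a health consensus concrete, here is a small in-memory sketch. It assumes a simple majority rule: a storage counts as healthy when more than half of the reporting Praefect nodes see it as healthy. That rule, and the `healthyStorages` helper, are assumptions for illustration; the view's actual definition may differ (for example by also filtering on how recent each health check is).

```go
package main

import (
	"fmt"
	"sort"
)

// healthyStorages takes the health reports of the active Praefect nodes
// (Praefect name -> storage name -> healthy) and returns the storages a
// majority of those Praefect nodes consider healthy, sorted by name.
func healthyStorages(reports map[string]map[string]bool) []string {
	quorum := len(reports)/2 + 1

	votes := make(map[string]int)
	for _, storages := range reports {
		for storage, healthy := range storages {
			if healthy {
				votes[storage]++
			}
		}
	}

	var healthy []string
	for storage, count := range votes {
		if count >= quorum {
			healthy = append(healthy, storage)
		}
	}
	sort.Strings(healthy)
	return healthy
}

func main() {
	reports := map[string]map[string]bool{
		"praefect-1": {"gitaly-1": true, "gitaly-2": true},
		"praefect-2": {"gitaly-1": true, "gitaly-2": false},
		"praefect-3": {"gitaly-1": false, "gitaly-2": false},
	}
	fmt.Println(healthyStorages(reports))
}
```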
|
|
Revert "Makefile: Stop installing binaries into source dir"
Closes gitlab#331758
See merge request gitlab-org/gitaly!3538
|
|
coordinator: Add replication metrics for transactions
See merge request gitlab-org/gitaly!3519
|
|
This reverts commit eb6fd60561cffdbb183e74456268439bad60b21c.
|
|
wiki: Remove DeletePage RPC
See merge request gitlab-org/gitaly!3453
|
|
Update security merge request template
See merge request gitlab-org/gitaly!3535
|
|
Adjusts security merge request template to use the new changelog
workflow.
Related to gitlab-com/gl-infra/delivery#1767
|
|
Changelog: security
Signed-off-by: Takuya Noguchi <takninnovationresearch@gmail.com>
|