|
This commit injects the multiplexing handshaker from Praefect's main
to the dialing locations. This allows us to later plug in a backchannel
server easily. This commit has no changes to the functionality itself.
|
|
This commit wires up a PrimaryGetter and Connections into Praefect's
Info server as they'll be needed when converting the methods to support
variable replication factor and repository specific primaries.
ReplicationFactorSetter is also converted into AssignmentStore so the
methods can access assignments later.
|
|
This commit removes serverAccesor definitions from test mocks as they
are unused. The ability to mock the function is also removed from the
mock to ensure it cannot be used.
|
|
With the server scope being removed, this commit removes a use of a server
scoped RPC in tests and replaces it with a repo scoped one. The actual RPC
doesn't matter for the test case as it is testing RPC authentication failures
which are triggered before the handler is reached.
|
|
Pulls the generation code from a lonely Makefile into our protobuf
generation Makefile target. This helps keep the generation workflow
the same for every proto file and ensures the generation code keeps
working.
|
|
The `testhelper.GetTemporaryGitalySocketFileName()` function currently
returns an error. Given that all callers use `require.NoError()` on that
error, let's convert it to instead receive a `testing.TB` and not return
an error.
|
|
As the strategy of interacting with the repository changes, there is
no longer a need to provide RepositoryStore to Mgr as a dependency, as
it is unused.
This commit removes RepositoryStore from the constructor's parameter
list and from the Mgr struct's fields.
|
|
This commit implements an RPC on Praefect's Info service to allow
setting a repository's replication factor.
|
|
To prepare for removing the MemoryRepositoryStore, this commit
removes its uses in tests. Mostly the tests need something that works,
which is when DisableRepositoryStore is used. When a test is testing
a particular scenario with the RepositoryStore, a mock is provided
instead. Ideally we'd use the Postgres implementation in these cases,
but hooking it in requires some additional work as the test setup
overwrites the home directory, which breaks the discovery of GDK's
Postgres.
|
|
This is the next step in introducing a cached storages provider to
support distributing reads across Gitaly nodes. On each invocation it
queries the passed-in dependency and combines the result with the
existing primary. The resulting list is used by the manager to decide
where a request should be routed for processing. In a follow-up MR it
will be extended with an expiring cache to reduce load on the
database, as querying it on each read operation is not efficient.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3053
|
|
Routing logic is currently not pluggable as it is part of the coordinator
and depends on the NodeManager. In preparation for variable replication
factor and per-repository primaries, this commit extracts the routing
logic into a separate component which allows for plugging in
alternative implementations. There should be no behavior difference, except for eagerly
loading consistent secondaries on repository scoped mutators regardless of
whether transactions are enabled or not. This should be fine though as
transactions are enabled by default.
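The seam this creates can be sketched as an interface the coordinator consults; the type and method names below are illustrative, not Praefect's exact API:

```go
package main

import "fmt"

// RouterNode identifies one Gitaly storage a call can be routed to.
type RouterNode struct {
	Storage string
	Address string
}

// RepositoryMutatorRoute is the routing decision for a repository
// scoped mutator: one primary plus the consistent secondaries.
type RepositoryMutatorRoute struct {
	Primary     RouterNode
	Secondaries []RouterNode
}

// Router is the pluggable seam extracted from the coordinator: instead
// of consulting the NodeManager directly, the coordinator asks a Router
// where to send the call, so an alternative implementation (e.g. one
// backed by per-repository primaries) can be swapped in.
type Router interface {
	RouteRepositoryMutator(virtualStorage, relativePath string) (RepositoryMutatorRoute, error)
}

// staticRouter is a toy implementation for this sketch.
type staticRouter struct{ route RepositoryMutatorRoute }

func (r staticRouter) RouteRepositoryMutator(_, _ string) (RepositoryMutatorRoute, error) {
	return r.route, nil
}

func main() {
	var router Router = staticRouter{route: RepositoryMutatorRoute{
		Primary:     RouterNode{Storage: "gitaly-1", Address: "tcp://gitaly-1:8075"},
		Secondaries: []RouterNode{{Storage: "gitaly-2", Address: "tcp://gitaly-2:8075"}},
	}}
	route, _ := router.RouteRepositoryMutator("default", "@hashed/ab/cd/abcd.git")
	fmt.Println(route.Primary.Storage)
}
```

The coordinator only depends on the interface, which is what makes the follow-up work on variable replication factor possible without touching routing call sites.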
|
|
Since the introduction of Praefect, our code layout started to become
confusing: while Praefect code lives in `internal/praefect`,
Gitaly-specific code is all over the place and not neatly singled out.
This makes it hard at times to tell apart Praefect- and Gitaly-specific
from generic code.
To improve the situation, this commit thus moves most of the server
specific code into a new `internal/gitaly` package. Currently, this is
the `internal/config`, `internal/server`, `internal/service` and
`internal/rubyserver` packages, which are all main components of Gitaly.
The move was realized with the following script:
#!/bin/sh
mkdir -p internal/gitaly
git mv internal/{config,server,service,rubyserver} internal/gitaly/
find . -name '*.go' -exec sed -i \
-e 's|gitlab-org/gitaly/internal/rubyserver|gitlab-org/gitaly/internal/gitaly/rubyserver|' \
-e 's|gitlab-org/gitaly/internal/server|gitlab-org/gitaly/internal/gitaly/server|' \
-e 's|gitlab-org/gitaly/internal/service|gitlab-org/gitaly/internal/gitaly/service|' \
-e 's|gitlab-org/gitaly/internal/config|gitlab-org/gitaly/internal/gitaly/config|' {} \;
In addition to that, some minor adjustments were needed for tests which
used relative paths.
|
|
Hooks up the error tracker in the node manager so it checks whether a
certain backend node has reached a threshold of errors. If it has, the
node will be deemed unhealthy.
|
|
Right now, setup of metrics used in the transaction manager is split
across multiple locations. This makes the process of adding new metrics
more involved than it needs to be and is a source of bugs in case any of
those locations is not updated.
Improve the situation by moving setup of metrics into the transaction
manager. Metrics are exposed by implementing the Collector interface and
registering the transaction manager itself as a metric.
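The shape of this pattern can be sketched as follows. To stay dependency-free, the `metric`/`collector` types here only mimic Prometheus' Desc/Collector shape, and the metric names are made up for illustration:

```go
package main

import "fmt"

// metric and collector mimic the shape of Prometheus' metric and
// Collector types so the sketch runs without client_golang.
type metric struct {
	name  string
	value float64
}

type collector interface {
	Collect(ch chan<- metric)
}

// txManager owns its counters and exposes them by implementing
// collector: registering the manager once exposes every metric it
// defines, so adding a new metric touches only this one place.
type txManager struct {
	registered float64
	committed  float64
}

func (m *txManager) Collect(ch chan<- metric) {
	ch <- metric{name: "praefect_transactions_total", value: m.registered}
	ch <- metric{name: "praefect_transactions_committed_total", value: m.committed}
}

// gather drains one Collect call, as a registry would during scraping.
func gather(c collector) []metric {
	ch := make(chan metric, 16)
	c.Collect(ch)
	close(ch)
	var out []metric
	for m := range ch {
		out = append(out, m)
	}
	return out
}

func main() {
	mgr := &txManager{registered: 5, committed: 4}
	for _, m := range gather(mgr) {
		fmt.Printf("%s %g\n", m.name, m.value)
	}
}
```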
|
|
Retrieve up-to-date storages that can serve read operations for the
repository in order to distribute reads across all healthy storages
of the virtual storage.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2944
|
|
Repository state tracking integration
Closes #2866
See merge request gitlab-org/gitaly!2379
|
|
|
|
Tracking the expected and the actual repository states within a virtual
storage is currently done by searching through the replication queue. This
requires many variables to be taken into account such as timings between
different jobs and the job history of source nodes. To make the tracking
easier, this commit adds two tables to record the latest state of
repositories across the cluster:
1. `repositories` table contains the expected state of a repository
within a virtual storage.
2. `storage_repositories` table contains the state of the repository
on a given storage that is part of a virtual storage.
Cross-referencing `storage_repositories` with `repositories` makes it
straightforward to figure out repositories which are in the expected
state. If a repository on a storage is not in the expected state,
appropriate corrective actions can be scheduled by diffing the expected
record with the record of the stale storage.
Each repository has a generation number which increases monotonically
for each write. The generation number can be used to deduce whether
the repository has the latest changes or not. The generation number
guarantees the repository is at least on the generation stored but it
may also be on a later generation if an update was partially applied.
To prevent the generation number from referring to outdated data,
repository downgrades are rejected. Generation numbers get propagated
via replication jobs which again guarantee the repository will be at
least on the generation included in the job.
After the upgrade, there won't be any repositories in the tables and
there might be replication jobs which do not have a generation number.
To account for this, the downgrade protection is only applied to
repositories which have a stored generation number, ensuring existing
replication jobs during cluster upgrade are still accepted. As an
upgraded primary receives new writes, the repository entries will be
added to the tables and replication jobs with correct generation numbers
scheduled.
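The downgrade protection can be sketched in Go; the function and error names are illustrative, not the actual datastore code:

```go
package main

import (
	"errors"
	"fmt"
)

var errDowngradeAttempted = errors.New("downgrade attempted")

// applyGeneration models the downgrade protection: a nil stored
// generation means the repository predates the tables (or the
// replication job carries no generation), so the write is accepted;
// otherwise the incoming generation must not be lower than the
// recorded one.
func applyGeneration(stored *int, incoming int) (*int, error) {
	if stored != nil && incoming < *stored {
		return stored, errDowngradeAttempted
	}
	return &incoming, nil
}

func main() {
	var gen *int
	gen, _ = applyGeneration(gen, 0) // first write after the upgrade
	gen, _ = applyGeneration(gen, 3) // replication job moves the storage forward
	if _, err := applyGeneration(gen, 2); err != nil {
		fmt.Println(err) // stale replication job is rejected
	}
	fmt.Println(*gen)
}
```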
|
|
The scope of the FindRemoteRepository RPC call is changed to STORAGE.
The storage value of the incoming message can now be changed via the
new 'SetStorage' method.
A human-readable string representation for the 'Scope' type is added
via default string formatting.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2442
|
|
Introduces a server factory for creating gRPC servers.
The Praefect gRPC server is now created by a separate function and can
be reused in tests to check routing.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/1698
|
|
After the removal of the Datastore struct, it makes sense to move the
entities of the models package into the config package, as they serve
only a configuration purpose.
The 'node' configuration is also removed from Config as it is no
longer used.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2613
|
|
Datastore is no longer needed since the introduction of
nodes.NewManager, which is now responsible for managing nodes.
The Queue interface is also removed as unnecessary and replaced with
ReplicationEventQueue.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2613
|
|
The Registry of proto files is free of locks as it is fully
initialized by its constructor before use. Creating a new Registry for
each test makes no sense, so to make the tests more consistent the
global protoregistry.GitalyProtoPreregistered is used instead.
|
|
Reads are randomly distributed across up-to-date Gitaly nodes.
Whether a node is up to date is verified based on the state of the
replication queue. The backoff strategy is to use the primary node if
there are no up-to-date secondaries or an error occurred.
This feature can be enabled with 'distribution_of_reads_enabled'.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2650
|
|
In order to implement transactions, a new reference transaction service
was created that provides multiple RPCs to register, start and cancel
transactions. While it makes sense to expose the start RPC, it doesn't
really for the other two as they will only ever be used by Praefect,
which is also responsible for hosting the service. Furthermore, it's
hard to actually register and cancel transactions in Praefect if these
are exposed as services, only, as using them would require us to connect
to ourselves.
Revise the design of the transaction service by splitting it up into two
parts:
1. The transaction service, which now only exposes the "start" RPC to
Gitaly nodes. The transaction handling logic has been split out of
the service, bringing us to the second part.
2. The transaction manager. Similar to how e.g. the node manager works,
this is where the actual business logic will take place. The
transaction service gets a manager injected and will call out to it
to serve the "start" RPC. The other two calls which were previously
exposed via RPC will now be called directly on this manager by
Praefect.
With this design, it becomes a lot easier to handle sessions on
Praefect's side.
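The split can be sketched as follows; the types and method names are illustrative of the design, not the exact Gitaly API:

```go
package main

import "fmt"

// txManager holds the transaction business logic. Praefect calls
// RegisterTransaction and CancelTransaction on it directly, without
// going through gRPC.
type txManager struct {
	next    uint64
	pending map[uint64]string
}

func newTxManager() *txManager { return &txManager{pending: map[uint64]string{}} }

func (m *txManager) RegisterTransaction(repo string) uint64 {
	m.next++
	m.pending[m.next] = repo
	return m.next
}

func (m *txManager) VoteTransaction(id uint64) error {
	if _, ok := m.pending[id]; !ok {
		return fmt.Errorf("unknown transaction %d", id)
	}
	return nil
}

func (m *txManager) CancelTransaction(id uint64) { delete(m.pending, id) }

// txService is the RPC-facing part: it only exposes the "start"/vote
// call to Gitaly nodes and delegates to the injected manager.
type txService struct{ mgr *txManager }

func (s *txService) VoteTransaction(id uint64) error { return s.mgr.VoteTransaction(id) }

func main() {
	mgr := newTxManager()
	svc := &txService{mgr: mgr}
	id := mgr.RegisterTransaction("@hashed/ab/cd.git") // Praefect-side, no RPC
	fmt.Println(svc.VoteTransaction(id) == nil)        // Gitaly-side, via the service
	mgr.CancelTransaction(id)
	fmt.Println(svc.VoteTransaction(id) == nil)
}
```

Because the service and Praefect share one manager instance, no self-connection is needed to register or cancel a transaction.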
|
|
Removal of `RPCCredentials` as it was exactly the same implementation
as `RPCCredentialsV2`.
All places where `RPCCredentials` was used are fixed.
The `v1` token representation check and validation are also removed.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2498
|
|
This commit adds the following strategy to enable redundant Praefect
nodes to run simultaneously:
1. Every Praefect node periodically (every second) performs a health
check RPC with a Gitaly node.
2. For each node, Praefect updates a row in a new table (`node_status`)
with the following information:
a. The name of the Praefect instance (`praefect_name`)
b. The name of the virtual storage (`shard_name`)
c. The name of the Gitaly storage (`storage_name`)
d. The timestamp of the last time Praefect tried to reach that node
(`last_contact_attempt_at`)
e. The timestamp of the last successful health check (`last_seen_active_at`)
3. Every Praefect node periodically does a `SELECT` from `node_status`
to determine **healthy nodes**. A healthy node is defined by:
a. A node that has a recent successful health check (e.g. one in the last 10 s).
b. A majority of the available Praefect nodes have entries that match the two above.
4. To determine the majority, we use a lightweight service discovery
protocol: a Praefect node is deemed a voting member if the
`praefect_name` has a recent `last_contact_attempt_at` in the
`node_status` table. The name is derived from a combination of the
hostname and listening port/socket.
5. The primary of each shard is listed in the `shard_primaries` table.
If the current primary is in the healthy node list, then no election
needs to be done.
6. Otherwise, if there is no primary or it is unhealthy, any Praefect
node can elect a new primary by choosing a candidate from the healthy
node list and inserting a row into the table.
Closes https://gitlab.com/gitlab-org/gitaly/-/issues/2547
|
|
The implementation of the replication events queue can now be
switched using `postgres_queue_enabled` between in-memory
and Postgres.
`Datastore` changed from interface to struct as there is no
single struct implementation for it anymore.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2166
|
|
The replication storage interface is switched to
`ReplicationEventQueue`.
The `gitaly_replication_queue` table is extended with a `meta` column,
introduced as a container for meta information such as the correlation
ID.
`memoryReplicationEventQueue` now populates the `LockID` field to
produce the same result as the SQL implementation.
`ReplicationEventQueueInterceptor` is introduced for testing purposes,
as well as an interceptor for metrics, etc.
The `slice` package is created to assemble common operations on
different kinds of slices (`Uint64` being the first).
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2166
|
|
Hooks up the node manager to Praefect's coordinator.
|
|
Refactored configs so that both Praefect and Gitaly can share the
logging and Sentry config structs.
|
|