Age | Commit message | Author |
|
|
|
[ci skip]
|
|
[ci skip]
|
|
|
|
[ci skip]
|
|
[ci skip]
|
|
[ci skip]
|
|
[ci skip]
|
|
Remove go.mod excludes directive (pick into 14-10-stable)
See merge request gitlab-org/gitaly!4530
|
|
Remove the exclude directive that prevented us from pulling in a buggy
version of grpc/grpc-go. We don't use this package anymore, so there's
no reason to keep this directive. Also, newer versions of grpc/grpc-go
have fixed the bug that's mentioned as the reason for the exclude.
Changelog: other
|
|
[ci skip]
|
|
[ci skip]
|
|
|
|
[ci skip]
|
|
[ci skip]
|
|
[ci skip]
|
|
[ci skip]
|
|
[ci skip]
|
|
Expose last verification time in 'praefect metadata'
Closes #4092
See merge request gitlab-org/gitaly!4466
|
|
Administrators may want to know when Praefect has last verified a
replica. This commit exposes that information via the 'praefect metadata'
command.
Changelog: changed
|
|
GetRepositoryMetadata fetches a repository's metadata from the
database. This commit expands the query to also fetch the newly added
verified_at column so we can expose it in the 'praefect metadata'
command to the admins.
|
|
Administrators may want to know when a replica has been last verified
by Praefect. GetRepositoryMetadata RPC is called by the 'metadata'
sub-command to retrieve information about a repository and its
replicas from Praefect's database. This commit adds the proto
definitions for exposing the last verification time of replicas to
the metadata sub-command.
Changelog: changed
|
|
Initial implementation of a metadata verifier
See merge request gitlab-org/gitaly!4459
|
|
This commit wires the metadata verifier in Praefect's main so it can
actually be configured for use. It is still disabled by default, as it is
still missing some functionality that should be in place before generally
enabling it, for example tooling like metrics, integration into the
'praefect metadata' tool and a background routine to release stale leases.
Changelog: added
|
|
This commit adds an initial implementation of a metadata verifier
to Praefect.
Praefect stores metadata of the repositories stored on the cluster in
Postgres. These metadata records may become out of sync with the disks
if changes occur on the disks without going through Praefect, for example
due to disk failures or manual modifications. Right now, Praefect only
contains some temporary logic to clean up invalid metadata records when
replication is attempted using a non-existent source repository. This was
mostly put in place to stop reconciliation loops where Praefect keeps
scheduling replication jobs from the non-existent repository that will
never succeed. While this performs some clean up, it's not sufficient to
catch cases where something happens in the background without prompting
replication.
The metadata verifier introduced in this commit aims to catch these issues
by periodically verifying the metadata in the background against the
state on the disks. For now, only the existence of the replica is verified,
not the actual contents by checksumming.
Each replica contains a 'verified_at' timestamp in the database that tells
Praefect when the metadata record was last verified. If it exceeds a configurable
threshold, the replica is considered to be due for reverification. Praefect
then asks the Gitaly hosting the replica whether the replica still exists.
If it doesn't, the invalid metadata record is deleted and the removal is logged.
To avoid multiple Praefects verifying the same replica concurrently, Praefect
acquires the verification lease on the replica in the database prior to
verifying the existence of the repository.
The scheduling is fairly simplistic at the moment with each Praefect acquiring
a batch of work every two seconds. This also serves as a crude way to rate
limit the background verification work to avoid consuming too many
resources while doing it. This should be sufficient for now, although it could
later be improved.
Praefect leaves the repository's record in place even if all of its replicas
have been lost. This ensures no data loss goes unnoticed and that the loss
needs to be acknowledged by removing the repository manually.
Changelog: added
|
|
This commit adds the necessary schema changes for the metadata
background verification. Each replica receives two new columns:
1. 'verified_at' which contains the timestamp of the last successful
verification of the replica. This effectively allows for identifying
replicas that are in need of reverification.
2. 'verification_leased_until' which contains a timestamp until which
a worker has acquired a lease to reverify the repository. This prevents
multiple workers from picking the same repository for reverification at
the same time.
'verification_queue' index is added to index replicas which have not been
acquired by any worker. This allows for efficiently querying replicas that
are in need of reverification later.
Changelog: other
|
|
[ci skip]
|
|
README: Add link to backpressure video
See merge request gitlab-org/gitaly!4475
|
|
Add a link in the presentations section of the README to a video
explaining the different ways to configure request limits (or
backpressure) in Gitaly.
|
|
Expose command stats (rusage) metrics via prometheus
See merge request gitlab-org/gitaly!4464
|
|
Update activesupport to 6.1.4.7
See merge request gitlab-org/gitaly!4471
|
|
For linguist, we prepend `env` as well as `bundle exec`
to the exec invocation.
For better accounting, we can override this, so that
it shows `git-linguist` instead of `env` as the `cmd`.
|
|
|
|
Handle DeleteObjectPool calls in Praefect
Closes #3742 and #4078
See merge request gitlab-org/gitaly!4395
|
|
Remove implicit pool creation on link behavior
See merge request gitlab-org/gitaly!4455
|
|
featureflag: Remove TransactionalSymbolicRefUpdates featureflag
See merge request gitlab-org/gitaly!4467
|
|
Add RateLimiting
Closes #4026
See merge request gitlab-org/gitaly!4427
|
|
Add a rate limiting middleware into the interceptor chain for a Gitaly
server.
Changelog: added
|
|
Now that we are adding a second limit handler, adjust the code to allow
for multiple limit handlers to be passed into a server invocation.
|
|
RateLimiter contains a limiter per rpc/repo pair. We don't want this to
grow monotonically since it will incur a heavy memory burden on the
machine. Instead, introduce a background process that looks through the
limiters and removes the ones that have not been used in the past 10
refill intervals.
|
|
Introduce a simple rate limiter that limits the number of requests a
minute that an RPC can allow.
If the feature flag is enabled, the middleware will drop any request
that bursts the per second limit of the RPC. Otherwise, it will only emit
metrics so we can first have some data on the traffic profile.
|
|
[ci skip]
|
|
This feature flag has been set to default on, and deleted from
production. There have been no observable issues, so it's now safe to
fully remove the feature flag.
Changelog: changed
|
|
commit: Add CheckObjectsExist RPC
Closes #3986
See merge request gitlab-org/gitaly!4450
|
|
When pushing commits to a repository, access checks are run. In order to
use the quarantine directory, we need a way to filter out revisions that
a repository already has in the case that a packfile sends over objects
that already exist on the server. In this case, we don't need to check
the access.
Add an RPC that when given a list of revisions, returns the ones that
already exist in the repository, and the ones that do not exist in the
repository.
Changelog: added
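The partitioning this RPC performs can be illustrated with a small sketch. It does not reflect the actual RPC signature or proto messages: `partitionRevisions` and the `exists` callback are hypothetical, and the callback stands in for whatever lookup the server performs against the repository.

```go
package main

import "fmt"

// partitionRevisions splits revisions into those that already exist in the
// repository and those that do not, preserving the input order.
// exists is a hypothetical lookup supplied by the caller.
func partitionRevisions(revisions []string, exists func(revision string) bool) (found, missing []string) {
	for _, rev := range revisions {
		if exists(rev) {
			found = append(found, rev)
		} else {
			missing = append(missing, rev)
		}
	}
	return found, missing
}

func main() {
	// A map stands in for the repository's object database.
	known := map[string]bool{"refs/heads/main": true, "1234abcd": true}
	found, missing := partitionRevisions(
		[]string{"refs/heads/main", "deadbeef"},
		func(rev string) bool { return known[rev] },
	)
	fmt.Println(found, missing) // prints "[refs/heads/main] [deadbeef]"
}
```

With such a split, access checks would only need to run for the revisions reported as missing.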
|
|
Allow Commit.RawBlame to take a Range parameter
See merge request gitlab-org/gitaly!4433
|
|
Before, concurrency limiting was the only limiting middleware. The name
of the metric had gitaly_rate_limiting in it, which was a bit of a
misnomer since rate was never part of the equation. Now that we are
actually adding a rate limiter to Gitaly, the concurrency metrics will
be mistaken for rate limiting metrics.
Change the name of these by replacing rate_limiting with
concurrency_limiting.
Changelog: changed
|
|
A future commit will add a new middleware that will limit based on the
rate rather than concurrent calls. There is a good amount of logic
currently used by the concurrency limiter that can be reused since a
rate limiter is also operating on incoming requests based on RPC name.
To make it easier to add this new limiter type in the future, refactor
the code by adding some abstractions.
|
|
To prepare for a rate limiting middleware, add a struct to support
configuring a rate limiter. Behind the scenes, we are using
golang.org/x/time/rate package, which implements rate limiting with a
concept of a token bucket. There are two relevant values. Burst refers
to the maximum size of the token bucket. For a request to be handled, a
token is retrieved from the token bucket. Once the bucket is empty, no
more requests can be handled.
The token bucket will be refilled to capacity as defined by "Burst"
according to what is set by "Interval".
Changelog: added
|
|
We currently track rusage metrics in logs on a per-RPC basis. This
allows us to get a very fine-grained view into resource attribution.
However, logs often do not lend themselves to coarse-grained and
long-term analysis. For this reason it is useful to expose metrics via
prometheus.
By aggregating that data as metrics, we aim to partially close an
observability gap that exists for short-lived processes. The existing
`process-exporter` metrics are severely under-reporting the utilization
of short-lived processes, which gitaly spawns many of.
See also:
- https://gitlab.com/gitlab-com/gl-infra/scalability#1655
This patch introduces a set of `gitaly_command_*` metrics which
provide aggregated resource attribution along the following
dimensions:
- `cmd` - the basename of the command being executed.
- `subcmd` - an optional subcommand, e.g. `archive` for `git archive`
- `grpc_service` - the grpc service caller
- `grpc_method` - the grpc method caller
The newly introduced metrics are:
- `gitaly_command_cpu_seconds_total` Sum of CPU time spent
- `gitaly_command_real_seconds_total` Sum of real time spent
- `gitaly_command_minor_page_faults_total` Sum of minor page faults
- `gitaly_command_major_page_faults_total` Sum of major page faults
- `gitaly_command_signals_received_total` Sum of signals received
- `gitaly_command_context_switches_total` Sum of context switches
This feature is being introduced behind a feature flag. However,
since metrics are sticky, once the metric has been defined, it
will be returned by the process until the next restart.
The cardinality of the metrics should be relatively well-bounded
in any case.
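How totals are aggregated along these dimensions can be sketched without the Prometheus client library. The `labels`, `totals` and `commandTotals` types below are hypothetical and only cover two of the listed counters; the real patch registers Prometheus metrics instead.

```go
package main

import (
	"fmt"
	"sync"
)

// labels mirrors the dimensions listed in the commit message.
type labels struct {
	Cmd         string
	Subcmd      string
	GRPCService string
	GRPCMethod  string
}

// totals holds running sums for one label combination.
type totals struct {
	CPUSeconds  float64
	RealSeconds float64
}

// commandTotals is a hypothetical aggregator keyed by label combination.
type commandTotals struct {
	mu       sync.Mutex
	byLabels map[labels]totals
}

func newCommandTotals() *commandTotals {
	return &commandTotals{byLabels: make(map[labels]totals)}
}

// Observe adds the resource usage of one finished command to its totals.
func (c *commandTotals) Observe(l labels, cpuSeconds, realSeconds float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t := c.byLabels[l]
	t.CPUSeconds += cpuSeconds
	t.RealSeconds += realSeconds
	c.byLabels[l] = t
}

// Get returns the totals recorded so far for one label combination.
func (c *commandTotals) Get(l labels) totals {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.byLabels[l]
}

func main() {
	agg := newCommandTotals()
	// The service and method names here are placeholders.
	l := labels{Cmd: "git", Subcmd: "archive", GRPCService: "example.Service", GRPCMethod: "Method"}
	agg.Observe(l, 0.25, 1.0)
	agg.Observe(l, 0.5, 2.0)
	fmt.Printf("%+v\n", agg.Get(l))
}
```

Each distinct label combination creates one series, which is why the cardinality depends on how many commands, subcommands and RPCs exist.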
|