Merge into '14-2-stable'
datastore: Revert use of materialized views (v14.2)
See merge request gitlab-org/gitaly!4119
Revert the introduction of materialized views for `valid_primaries`. As
it turns out, the change causes incompatibilities with Postgres 11,
which is still actively in use. Furthermore, the performance issues we
have seen were not fully fixed by this change, and we do not yet
fully understand their root cause.
Changelog: fixed
Merge into '14-2-stable'
praefect: Backport separate endpoint for datastore collector (v14.2)
See merge request gitlab-org/gitaly!4096
Materialize valid_primaries view (14.2)
See merge request gitlab-org/gitaly!4088
RepositoryStoreCollector gathers metrics on repositories which don't
have a valid primary candidate available. This indicates the repository
is unavailable, as the current primary is not valid and there are no valid
candidates to fail over to. The query is currently extremely inefficient
on some versions of Postgres, as it ends up computing the full valid_primaries
view for each of the rows it checks. This doesn't occur on all versions
of Postgres: 12.6, at least, manages to push the search criteria down
into the view. This commit fixes the situation by materializing the
valid_primaries view prior to querying it. This ensures the full view isn't
computed for each of the rows; instead, Postgres just uses the pre-computed
result.
Changelog: performance
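The rewrite described above can be sketched as a query transformation. The helper below is illustrative only (it is not Gitaly's actual code or schema): it prefixes a query with a CTE that shadows the view and forces Postgres to compute it exactly once. Note that the explicit `AS MATERIALIZED` syntax requires Postgres 12 or newer, which is consistent with the Postgres 11 incompatibility cited in the revert higher up in this log.

```go
// Illustrative sketch: force a view to be computed once by wrapping it
// in a materialized CTE of the same name, which shadows the view in the
// rest of the query. Identifiers are hypothetical, not Gitaly's schema.
package main

import "fmt"

// materializeView prefixes query with a CTE (Postgres 12+ syntax) so the
// view's contents are cached instead of re-expanded per outer row.
func materializeView(view, query string) string {
	return "WITH " + view + " AS MATERIALIZED (SELECT * FROM " + view + ")\n" + query
}

func main() {
	q := materializeView("valid_primaries",
		"SELECT repository_id FROM valid_primaries WHERE storage = 'gitaly-1'")
	fmt.Println(q)
}
```

Because a CTE with the same name takes precedence over the view in the query that follows, the transformation is transparent to the rest of the SQL.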
The dataloss query currently gets the latest generation of a repository
from a view that takes the maximum generation from storage_repositories. This
is unnecessary, as the repositories table already contains the latest generation
and we can take it from there instead. This commit reads it from the repositories
table.
Changelog: performance
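A toy model of the change, with illustrative names rather than Gitaly's actual schema: each per-storage record carries a generation, while the repositories table stores the latest generation in a denormalized column, so reading it avoids the MAX() aggregation entirely.

```go
// Toy model: the old path aggregates MAX(generation) over the
// storage_repositories rows; the new path reads the value the
// repositories table already stores. Names are illustrative.
package main

import "fmt"

type storageRecord struct {
	storage    string
	generation int
}

// latestFromStorageRecords mimics the old view: MAX(generation)
// over storage_repositories.
func latestFromStorageRecords(records []storageRecord) int {
	latest := -1
	for _, r := range records {
		if r.generation > latest {
			latest = r.generation
		}
	}
	return latest
}

func main() {
	records := []storageRecord{{"gitaly-1", 4}, {"gitaly-2", 7}, {"gitaly-3", 6}}
	repositoriesGeneration := 7 // what the repositories table already stores
	fmt.Println(latestFromStorageRecords(records) == repositoriesGeneration)
}
```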
Our current code path triggers the RepositoryStoreCollector to query
the database on startup, even if the Prometheus listener is not
listening. This is because we call DescribeByCollect in the Describe
method. The Prometheus client calls Describe on Register, which ends
up triggering the Collect method and hence runs the queries. Instead, we
can just provide the descriptions separately from the Collect method.
Changelog: fixed
(cherry picked from commit 90cb7fb7b9f8703547fa62719650394478653c62)
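The pattern can be shown with stand-in types that only mimic the shape of the Prometheus client library (no external dependencies; the metric name is illustrative): Describe sends static descriptions, so registering the collector no longer runs the database query, which now happens only on Collect, i.e. on an actual scrape.

```go
// Stand-in sketch of the fix: the Prometheus client calls Describe when
// a collector is registered; DescribeByCollect would implement Describe
// by calling Collect, which for this collector means hitting the
// database. Sending static descriptions avoids that.
package main

import "fmt"

type desc string
type metric string

type repositoryStoreCollector struct {
	queries int // counts simulated database queries, for demonstration
}

// Describe sends a static description instead of delegating to Collect,
// so registration no longer queries the database.
func (c *repositoryStoreCollector) Describe(ch chan<- desc) {
	ch <- "gitaly_praefect_unavailable_repositories"
}

// Collect runs the (simulated) database query and emits the metric.
func (c *repositoryStoreCollector) Collect(ch chan<- metric) {
	c.queries++ // stands in for the expensive SQL query
	ch <- "gitaly_praefect_unavailable_repositories 0"
}

func main() {
	c := &repositoryStoreCollector{}

	// Registration: the client drains Describe.
	dch := make(chan desc, 1)
	c.Describe(dch)
	fmt.Println("queries after Describe:", c.queries)

	// Scrape: only now is the database hit.
	mch := make(chan metric, 1)
	c.Collect(mch)
	fmt.Println("queries after Collect:", c.queries)
}
```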
By default, when metrics are enabled, each Praefect exposes
information about how many read-only repositories there are, which
requires Praefect to query the database. First, this results in the
same metrics being exposed by every Praefect, given that the database is
shared between all of them. And second, it causes one query per
Praefect per scrape. This cost adds up and generates significant
load on the database, especially if there are a lot of repositories in
that database, up to the point where it may overload the database
completely.
Fix this issue by splitting metrics which hit the database into a
separate endpoint "/db_metrics". This allows admins to set up a separate
scraper with a different scraping interval for this metric, and
furthermore gives them the ability to scrape this metric from only one of
the Praefect instances, so the work isn't unnecessarily duplicated.
Given that this is a breaking change which will get backported, we must
make this behaviour opt-in for now. We thus include a new configuration
key "prometheus_use_database_endpoint" which enables the new behaviour,
so that existing installations' metrics won't break on a simple point
release. The intent is to eventually remove this configuration
and enable it for all setups in a major release.
Changelog: added
(cherry picked from commit 7e74b7333ca6f2d1e55e7a17350cccc7c856c847)
Praefect uses Prometheus to export its internal metrics.
It relies on the defaults from the Prometheus library
to gather and register new metrics. Because of this,
new metrics get registered on the DefaultRegisterer,
a global pre-configured registerer. As a result we can't
call the 'run' function multiple times (for testing
purposes), as doing so results in a metric registration
error. To avoid that problem, the 'run' function is
extended with a prometheus.Registerer parameter that is used
to register Praefect's custom metrics. The production code
still uses the same DefaultRegisterer as before, while
the test code creates a new registerer instance for each
'run' invocation, so there are no more duplicate registrations.
(cherry picked from commit 81368d46df2af8b715b92db8552e4f30cbf8118c)
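The dependency-injection pattern looks roughly like this; the registry type below only mimics prometheus.Registerer's duplicate-rejection behaviour, and all names are illustrative rather than Praefect's actual code.

```go
// Sketch: 'run' takes a registerer instead of using a process-global
// one, so tests can pass a fresh registry per invocation while
// production keeps the default.
package main

import (
	"errors"
	"fmt"
)

type registry struct{ names map[string]bool }

func newRegistry() *registry { return &registry{names: map[string]bool{}} }

// register mimics prometheus.Registerer: registering the same metric
// name twice is an error.
func (r *registry) register(name string) error {
	if r.names[name] {
		return errors.New("duplicate metrics collector registration attempted")
	}
	r.names[name] = true
	return nil
}

var defaultRegisterer = newRegistry() // stands in for prometheus.DefaultRegisterer

// run registers the custom metrics on the given registerer.
func run(reg *registry) error {
	return reg.register("praefect_custom_metric")
}

func main() {
	fmt.Println("first run:", run(defaultRegisterer))  // succeeds
	fmt.Println("second run:", run(defaultRegisterer)) // duplicate error
	fmt.Println("fresh registry:", run(newRegistry())) // succeeds again
}
```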
The dataloss query is extremely slow for bigger datasets. The problem
is that for each row the dataloss query returns,
Postgres computes the full result of the valid_primaries view only to
filter it down to the correct record. This results in O(n²) complexity,
which kills performance as soon as the dataset size increases. It's
not clear why the join parameters are not pushed down into the view in
the query.
This commit optimizes the query by materializing the valid_primaries view.
This ensures Postgres computes the full view only once and joins with the
pre-computed result.
Changelog: performance
Backport praefect sub-commands to 14.2
See merge request gitlab-org/gitaly!4072
Adds a track-repository subcommand that allows an admin to add a repository that
exists on one of the Gitaly nodes to the Praefect database. It performs some
safety checks before inserting the records.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3773
Changelog: added
This change is a backport of the functionality implemented
to resolve issue #3792: a new sub-command for the praefect
binary. On run, it connects to all Gitaly storages set in
the configuration file and receives from each the list of
repositories existing on disk. Each repository is then
checked for existence in the Praefect database, and if it
is not there, the location of the repository is printed in
JSON format to the stdout of the process.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3792
Changelog: added
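The core of the sub-command is a set difference between what is on the Gitaly nodes' disks and what the Praefect database tracks, printed as JSON. The sketch below is a rough illustration with made-up field names, not the sub-command's actual output format.

```go
// Rough sketch: diff on-disk repositories against the tracked set and
// print each untracked one as a JSON line on stdout. Field names are
// illustrative.
package main

import (
	"encoding/json"
	"fmt"
)

type repoLocation struct {
	VirtualStorage string `json:"virtual_storage"`
	Storage        string `json:"storage"`
	RelativePath   string `json:"relative_path"`
}

// untracked returns the on-disk repositories the database doesn't know about.
func untracked(onDisk []repoLocation, tracked map[string]bool) []repoLocation {
	var out []repoLocation
	for _, r := range onDisk {
		if !tracked[r.RelativePath] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	onDisk := []repoLocation{
		{"default", "gitaly-1", "@hashed/aa/bb/repo1.git"},
		{"default", "gitaly-1", "@hashed/cc/dd/repo2.git"},
	}
	tracked := map[string]bool{"@hashed/aa/bb/repo1.git": true}

	for _, r := range untracked(onDisk, tracked) {
		b, _ := json.Marshal(r)
		fmt.Println(string(b))
	}
}
```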
Users of Praefect are missing tools to manage the state of the cluster.
One such tool is a repository-removal CLI, which can be used to
remove any repository from the cluster. The removal covers cleanup of
the database and removal from the Gitaly storages. The command tries
to remove as much as possible, first removing the repository from Praefect
and the replication event queue, and then from each Gitaly node configured
in the provided config file.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3771
Changelog: added
Derive virtual storage's filesystem id from its name (14.2)
See merge request gitlab-org/gitaly!3838
Rails tests configure Praefect in front of the tests that exercise
the Rugged direct Git access code. As Praefect now derives the
filesystem IDs from the names of the virtual storages, the filesystem
ID checks fail and thus the tests fail. This is not a problem in practice,
as one wouldn't use Rugged in a real-world setup with Praefect. This
commit works around the failures by returning the filesystem ID from the
Gitaly node if a virtual storage has only one Gitaly node configured.
This matches the setup the tests use and thus makes them pass. The workaround
and the filesystem ID code can be removed in 15.0 once the Rugged patches
and NFS support are dropped.
Gitaly storages contain a UUID filesystem ID that is generated by
Gitaly for each of its storages. The ID is used to determine
which storages can be accessed by Rails directly when Rugged patches
are enabled, and to see whether two different storages point to the same
directory when doing repository moves.
When repository moves are performed, the worker first checks whether the
repository's destination and source storage are the same. If they are, the
move is not performed. The check is done by comparing the filesystem
IDs of the storages. As Praefect currently routes the server info RPC
to a random Gitaly node, the filesystem ID can differ between calls, as each
of the Gitalys has its own ID. This causes the repository-moving worker
to occasionally delete repositories from the virtual storage, as it receives
two different IDs on sequential calls.
The filesystem ID can identify cases where two storages refer to the same
directory on a Gitaly node, as the ID is stored in a file in the storage.
This is not really possible with Praefect: the storages are only identified
by the virtual storage's name. If the name changes, we can't correlate
the IDs between the different names, as Praefect would consider them different
storages. Praefect also supports multiple virtual storages, so it's not possible
to generate a single ID and use it for all of the virtual storages. Given this,
the approach taken here is to derive a stable filesystem ID from the virtual
storage's name. This guarantees that calls to a given virtual storage always return
the same filesystem ID.
Configuring two storages that point to the same filesystem should be considered
an invalid configuration anyway. Historically, there have been cases where that has
been done for plain Gitalys. This is not done for Praefect and wouldn't work, as
Praefect wouldn't find the repositories with an alternative virtual storage name.
With that in mind, we don't have to consider the case where two virtual storages
of different names point to the same backing Gitaly storages.
The use cases for the filesystem ID seem to be limited, and we may be able to
remove it in the future once the Rugged patches are removed.
Changelog: fixed
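One way to derive a stable, UUID-shaped ID from the virtual storage's name is to hash the name and format a fixed slice of the digest as a UUID. This is a sketch of the idea, not necessarily the exact derivation Praefect uses; the point is only that the same name always yields the same ID, regardless of which Gitaly node answers the RPC.

```go
// Sketch: derive a deterministic, UUID-shaped filesystem ID from a
// virtual storage's name so repeated calls always agree.
package main

import (
	"crypto/sha256"
	"fmt"
)

// deriveFilesystemID returns the same ID for the same virtual storage
// name on every call.
func deriveFilesystemID(virtualStorage string) string {
	h := sha256.Sum256([]byte(virtualStorage))
	return fmt.Sprintf("%x-%x-%x-%x-%x", h[0:4], h[4:6], h[6:8], h[8:10], h[10:16])
}

func main() {
	fmt.Println(deriveFilesystemID("default"))
	fmt.Println(deriveFilesystemID("default") == deriveFilesystemID("default"))
}
```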
[14.2] Only activate Git pack-objects hook if cache is enabled
See merge request gitlab-org/gitaly!3815
In https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3301, we
dropped the `upload_pack_gitaly_hooks` feature flag because it was
confusing to have to enable this feature flag on top of the pack objects
cache setting in `config.toml`.
However, we have found that spawning the gitaly-hooks environment can
add significant CPU load due to overhead from transferring data over
gRPC and spawning gitaly-hooks processes.
We now only enable this hook if the pack objects cache is enabled.
Relates to https://gitlab.com/gitlab-org/gitaly/-/issues/3754
Changelog: performance
Downgrade grpc from 1.38.0 to 1.30.2 (14-2-stable)
See merge request gitlab-org/gitaly!3794
The newer shared library contains CPU instructions
that are incompatible with some customer machines.
This means we also have to stop building against
Ruby 3 for now, since the gRPC update was required
for that to work.
Changelog: fixed
Simplify tooling around SQL tests
See merge request gitlab-org/gitaly!3764
Now that our tests by default also include our Postgres tests, running
`make test` always requires a database to be set up. While our normal
test jobs do, the nightly job doesn't, and thus it always fails right now.
Fix this issue by merging the Postgres template into the nightly job.
The test CI job duplicates the Postgres database setup which is already
provided by the Postgres template. Let's deduplicate this by including
that template.
operations: Always enable squashing without worktrees
Closes #3713
See merge request gitlab-org/gitaly!3766
Split up Postgres and PgBouncer templates such that we can reuse the
Postgres template for our jobs without having to include PgBouncer.
The `test-postgres` target does essentially the same thing as the normal
`test-go` target nowadays, except that it doesn't generate a report and
that it only runs one specific test. We can thus simplify it to use the
`test-go` target directly.
linguist: Improve error messages
See merge request gitlab-org/gitaly!3763
Move the helper functions to add and remove worktrees to their only
remaining callsite in UserApplyPatch now that it's not used in
UserSquash anymore.