gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
Diffstat (limited to 'doc/administration/gitaly/index.md')
-rw-r--r-- doc/administration/gitaly/index.md | 295
1 files changed, 250 insertions, 45 deletions
diff --git a/doc/administration/gitaly/index.md b/doc/administration/gitaly/index.md
index 0af248e0573..bca83e903ac 100644
--- a/doc/administration/gitaly/index.md
+++ b/doc/administration/gitaly/index.md
@@ -30,8 +30,8 @@ repository storage is either:
- A Gitaly storage with direct access to repositories using [storage paths](../repository_storage_paths.md),
where each repository is stored on a single Gitaly node. All requests are routed to this node.
-- A virtual storage provided by [Gitaly Cluster](#gitaly-cluster), where each repository can be
- stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster:
+- A [virtual storage](#virtual-storage) provided by [Gitaly Cluster](#gitaly-cluster), where each
+ repository can be stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster:
- Read requests are distributed between multiple Gitaly nodes, which can improve performance.
- Write requests are broadcast to repository replicas.
@@ -39,32 +39,6 @@ WARNING:
Engineering support for NFS for Git repositories is deprecated. Read the
[deprecation notice](#nfs-deprecation-notice).
-## Virtual storage
-
-Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository
-management.
-
-Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations.
-However, this is at the expense of additional storage space needed to store each repository on multiple
-Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is:
-
-- Improved fault tolerance, because each Gitaly node has a copy of every repository.
-- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak
- loads, because read loads are distributed across Gitaly nodes.
-- Manual rebalancing for performance is not required, because read loads are distributed across
- Gitaly nodes.
-- Simpler management, because all Gitaly nodes are identical.
-
-The number of repository replicas can be configured using a
-[replication factor](praefect.md#replication-factor).
-
-It can
-be uneconomical to have the same replication factor for all repositories.
-[Variable replication factor](https://gitlab.com/groups/gitlab-org/-/epics/3372) is planned to
-provide greater flexibility for extremely large GitLab instances.
-
-As with normal Gitaly storages, virtual storages can be sharded.
-
## Gitaly
The following shows GitLab set up to use direct access to Gitaly:
@@ -160,7 +134,7 @@ In this example:
- Repositories are stored on a virtual storage called `storage-1`.
- Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`.
- The three Gitaly nodes share data in three separate hashed storage locations.
-- The [replication factor](praefect.md#replication-factor) is `3`. There are three copies maintained
+- The [replication factor](#replication-factor) is `3`. There are three copies maintained
of each repository.
The availability objectives for Gitaly clusters are:
@@ -170,7 +144,7 @@ The availability objectives for Gitaly clusters are:
Writes are replicated asynchronously. Any writes that have not been replicated
to the newly promoted primary are lost.
- [Strong consistency](praefect.md#strong-consistency) can be used to avoid loss in some
+ [Strong consistency](#strong-consistency) can be used to avoid loss in some
circumstances.
- **Recovery Time Objective (RTO):** Less than 10 seconds.
@@ -178,20 +152,34 @@ The availability objectives for Gitaly clusters are:
second. Failover requires ten consecutive failed health checks on each
Praefect node.
- [Faster outage detection](https://gitlab.com/gitlab-org/gitaly/-/issues/2608)
- is planned to improve this to less than 1 second.
+ Faster outage detection, to improve this speed to less than 1 second,
+ is tracked [in this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2608).
+
+### Virtual storage
+
+Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository
+management.
+
+Virtual storage with Gitaly Cluster can usually replace direct Gitaly storage configurations.
+However, this is at the expense of additional storage space needed to store each repository on multiple
+Gitaly nodes. The benefit of using Gitaly Cluster virtual storage over direct Gitaly storage is:
-Gitaly Cluster supports:
+- Improved fault tolerance, because each Gitaly node has a copy of every repository.
+- Improved resource utilization, reducing the need for over-provisioning for shard-specific peak
+ loads, because read loads are distributed across Gitaly nodes.
+- Manual rebalancing for performance is not required, because read loads are distributed across
+ Gitaly nodes.
+- Simpler management, because all Gitaly nodes are identical.
-- [Strong consistency](praefect.md#strong-consistency) of the secondary replicas.
-- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the primary to the secondary.
-- Reporting of possible data loss if replication queue is non-empty.
-- From GitLab 13.0 to GitLab 14.0, marking repositories as [read-only](praefect.md#read-only-mode)
- if data loss is detected to prevent data inconsistencies.
+The number of repository replicas can be configured using a
+[replication factor](#replication-factor).
+
+It can
+be uneconomical to have the same replication factor for all repositories.
+To provide greater flexibility for extremely large GitLab instances,
+variable replication factor is tracked in [this issue](https://gitlab.com/groups/gitlab-org/-/epics/3372).
-Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
-for improvements including
-[horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013).
+As with normal Gitaly storages, virtual storages can be sharded.
### Moving beyond NFS
@@ -220,7 +208,7 @@ Further reading:
- Blog post: [The road to Gitaly v1.0 (aka, why GitLab doesn't require NFS for storing Git data anymore)](https://about.gitlab.com/blog/2018/09/12/the-road-to-gitaly-1-0/)
- Blog post: [How we spent two weeks hunting an NFS bug in the Linux kernel](https://about.gitlab.com/blog/2018/11/14/how-we-spent-two-weeks-hunting-an-nfs-bug/)
-### Components of Gitaly Cluster
+### Components
Gitaly Cluster consists of multiple components:
@@ -240,10 +228,227 @@ component for running a Gitaly Cluster.
For more information, see [Gitaly High Availability (HA) Design](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/design_ha.md).
+### Features
+
+Gitaly Cluster provides the following features:
+
+- [Distributed reads](#distributed-reads) among Gitaly nodes.
+- [Strong consistency](#strong-consistency) of the secondary replicas.
+- [Replication factor](#replication-factor) of repositories for increased redundancy.
+- [Automatic failover](praefect.md#automatic-failover-and-primary-election-strategies) from the
+ primary Gitaly node to secondary Gitaly nodes.
+- Reporting of possible [data loss](praefect.md#check-for-data-loss) if the replication queue is
+ non-empty.
+
+Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489) for improvements
+including [horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013).
+
+#### Distributed reads
+
+> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled.
+> - [Made generally available and enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/2951) in GitLab 13.3.
+> - [Disabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3178) in GitLab 13.5.
+> - [Enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3334) in GitLab 13.8.
+> - [Feature flag removed](https://gitlab.com/gitlab-org/gitaly/-/issues/3383) in GitLab 13.11.
+
+Gitaly Cluster supports distribution of read operations across Gitaly nodes that are configured for
+the [virtual storage](#virtual-storage).
+
+All RPCs marked with the `ACCESSOR` option are redirected to an up to date and healthy Gitaly node.
+For example, [`GetBlob`](https://gitlab.com/gitlab-org/gitaly/-/blob/v12.10.6/proto/blob.proto#L16).
+
+_Up to date_ in this context means that:
+
+- There are no replication operations scheduled for this Gitaly node.
+- The last replication operation is in _completed_ state.
+
+The primary node is chosen to serve the request if:
+
+- There are no up to date nodes.
+- Any other error occurs during node selection.
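The selection rules above can be sketched in Python. This is a minimal illustration, not Praefect's implementation: the `Node` fields, the `pick_read_node` function, and the random choice among eligible nodes are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    # All fields are hypothetical names chosen for this sketch.
    name: str
    healthy: bool
    pending_replication_jobs: int
    last_replication_state: str


def is_up_to_date(node):
    # "Up to date": nothing scheduled and the last replication completed.
    return (node.pending_replication_jobs == 0
            and node.last_replication_state == "completed")


def pick_read_node(primary, nodes, rng=random):
    """Return a healthy, up-to-date node, falling back to the primary."""
    try:
        candidates = [n for n in nodes if n.healthy and is_up_to_date(n)]
        if not candidates:
            # No up-to-date nodes: the primary serves the request.
            return primary
        # Assumption: pick uniformly at random among eligible nodes.
        return rng.choice(candidates)
    except Exception:
        # Any other error during node selection: use the primary.
        return primary


primary = Node("gitaly-1", True, 0, "completed")
lagging = Node("gitaly-2", True, 3, "completed")
healthy = Node("gitaly-3", True, 0, "completed")
chosen = pick_read_node(primary, [lagging, healthy])
```

In this usage example, `gitaly-2` is skipped because it still has replication jobs scheduled, so the read goes to `gitaly-3`.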
+
+You can [monitor distribution of reads](#monitor-gitaly-cluster) using Prometheus.
+
+#### Strong consistency
+
+> - Introduced in GitLab 13.1 in [alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga), disabled by default.
+> - Entered [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) in GitLab 13.2, disabled by default.
+> - In GitLab 13.3, disabled unless primary-wins voting strategy is disabled.
+> - From GitLab 13.4, enabled by default.
+> - From GitLab 13.5, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency.
+> - From GitLab 13.6, primary-wins voting strategy and `gitaly_reference_transactions_primary_wins` feature flag were removed from the source code.
+
+By default, Gitaly Cluster guarantees eventual consistency by replicating all writes to secondary
+Gitaly nodes after the write to the primary Gitaly node has happened.
+
+Praefect can instead provide strong consistency by creating a transaction and writing changes to all
+Gitaly nodes at once.
+
+If enabled, transactions are only available for a subset of RPCs. For more information, see the
+[strong consistency epic](https://gitlab.com/groups/gitlab-org/-/epics/1189).
+
+For configuration information, see [Configure strong consistency](praefect.md#configure-strong-consistency).
+
+#### Replication factor
+
+Replication factor is the number of copies Gitaly Cluster maintains of a given repository. A higher
+replication factor:
+
+- Offers better redundancy and distribution of read workload.
+- Results in higher storage cost.
+
+By default, Gitaly Cluster replicates repositories to every storage in a
+[virtual storage](#virtual-storage).
+
+For configuration information, see [Configure replication factor](praefect.md#configure-replication-factor).
+
### Configure Gitaly Cluster
For more information on configuring Gitaly Cluster, see [Configure Gitaly Cluster](praefect.md).
+### Migrate to Gitaly Cluster
+
+Whether you are migrating to Gitaly Cluster because of [NFS support deprecation](index.md#nfs-deprecation-notice)
+or to move from single Gitaly nodes, the basic process is:
+
+1. Create the required storage. Refer to
+ [repository storage recommendations](faq.md#what-are-some-repository-storage-recommendations).
+1. Create and configure [Gitaly Cluster](praefect.md).
+1. [Move the repositories](../operations/moving_repositories.md#move-repositories). To migrate to
+ Gitaly Cluster, existing repositories stored outside Gitaly Cluster must be moved. There is no
+ automatic migration but the moves can be scheduled with the GitLab API.
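As a hedged sketch of scheduling one such move, the following Python builds (but does not send) an HTTP request. It assumes the project repository storage moves endpoint (`POST /api/v4/projects/:id/repository_storage_moves` with a `destination_storage_name` field) and a `PRIVATE-TOKEN` header; confirm both against the API documentation for your GitLab version. The host, project ID, and token are placeholders.

```python
import json
import urllib.request


def build_storage_move_request(base_url, project_id, destination_storage, token):
    """Build a request that schedules a repository storage move.

    The endpoint path and body field are assumptions based on the
    project repository storage moves API; verify them before use.
    """
    url = (f"{base_url.rstrip('/')}/api/v4/projects/"
           f"{project_id}/repository_storage_moves")
    body = json.dumps({"destination_storage_name": destination_storage})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


# Placeholder values; nothing is sent until urlopen() is called.
request = build_storage_move_request(
    "https://gitlab.example.com", 42, "storage-1", "<your-token>")
```

Passing `request` to `urllib.request.urlopen` would submit the move. Building the request separately makes it easy to review before scheduling moves for many projects.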
+
+## Monitor Gitaly and Gitaly Cluster
+
+You can use the available logs and [Prometheus metrics](../monitoring/prometheus/index.md) to
+monitor Gitaly and Gitaly Cluster (Praefect).
+
+Metric definitions are available:
+
+- Directly from the Prometheus `/metrics` endpoint configured for Gitaly.
+- Using [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/) on a
+ Grafana instance configured against Prometheus.
+
+### Monitor Gitaly
+
+You can observe the behavior of [queued requests](configure_gitaly.md#limit-rpc-concurrency) using
+the Gitaly logs and Prometheus:
+
+- In the [Gitaly logs](../logs.md#gitaly-logs), look for the string (or structured log field)
+ `acquire_ms`. Messages that contain this field report on the concurrency limiter.
+- In Prometheus, look for the following metrics:
+ - `gitaly_rate_limiting_in_progress`.
+ - `gitaly_rate_limiting_queued`.
+ - `gitaly_rate_limiting_seconds`.
+
+ Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter,
+ not a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency
+ doesn't exceed 1, and the concurrency limiter has no effect.
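The difference between a concurrency limiter and a rate limiter can be illustrated with a small Python sketch. The `ConcurrencyLimiter` class below is hypothetical and is not Gitaly's implementation; it only shows that 1,000 back-to-back requests never wait when they do not overlap.

```python
import threading


class ConcurrencyLimiter:
    """Admit at most max_concurrent in-flight calls; extra callers wait."""

    def __init__(self, max_concurrent):
        self._slots = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self.in_progress = 0   # calls currently running
        self.peak = 0          # highest concurrency observed
        self.queued_total = 0  # calls that had to wait for a slot

    def run(self, fn):
        # Try to take a slot without waiting; count the call as queued if not.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.queued_total += 1
            self._slots.acquire()
        with self._lock:
            self.in_progress += 1
            self.peak = max(self.peak, self.in_progress)
        try:
            return fn()
        finally:
            with self._lock:
                self.in_progress -= 1
            self._slots.release()


limiter = ConcurrencyLimiter(max_concurrent=1)
# 1,000 requests "in a row very quickly": each finishes before the next starts.
for _ in range(1000):
    limiter.run(lambda: None)
```

After the loop, `limiter.peak` is `1` and `limiter.queued_total` is `0`: the limit was never hit, regardless of how fast the requests arrived. A rate limiter would instead have rejected or delayed most of them.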
+
+The following [pack-objects cache](configure_gitaly.md#pack-objects-cache) metrics are available:
+
+- `gitaly_pack_objects_cache_enabled`, a gauge set to `1` when the cache is enabled. Available
+ labels: `dir` and `max_age`.
+- `gitaly_pack_objects_cache_lookups_total`, a counter for cache lookups. Available label: `result`.
+- `gitaly_pack_objects_generated_bytes_total`, a counter for the number of bytes written into the
+ cache.
+- `gitaly_pack_objects_served_bytes_total`, a counter for the number of bytes read from the cache.
+- `gitaly_streamcache_filestore_disk_usage_bytes`, a gauge for the total size of cache files.
+ Available label: `dir`.
+- `gitaly_streamcache_index_entries`, a gauge for the number of entries in the cache. Available
+ label: `dir`.
+
+Some of these metrics start with `gitaly_streamcache` because they are generated by the
+`streamcache` internal library package in Gitaly.
+
+Example:
+
+```plaintext
+gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1
+gitaly_pack_objects_cache_lookups_total{result="hit"} 2
+gitaly_pack_objects_cache_lookups_total{result="miss"} 1
+gitaly_pack_objects_generated_bytes_total 2.618649e+07
+gitaly_pack_objects_served_bytes_total 7.855947e+07
+gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07
+gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
+gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
+```
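To interpret output like this, the following Python sketch parses a few of the sample lines and derives a cache hit ratio and a served-to-generated byte ratio. The parser is a simplified assumption that handles only the line shapes shown in the example, not the full Prometheus exposition format.

```python
import re

SAMPLE = """\
gitaly_pack_objects_cache_lookups_total{result="hit"} 2
gitaly_pack_objects_cache_lookups_total{result="miss"} 1
gitaly_pack_objects_generated_bytes_total 2.618649e+07
gitaly_pack_objects_served_bytes_total 7.855947e+07
"""

# name, optional {labels}, then a numeric value.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Map (metric name, sorted label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if not match:
            continue
        labels = tuple(sorted(LABEL.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


samples = parse(SAMPLE)
lookups = "gitaly_pack_objects_cache_lookups_total"
hits = samples[(lookups, (("result", "hit"),))]
misses = samples[(lookups, (("result", "miss"),))]
hit_ratio = hits / (hits + misses)

generated = samples[("gitaly_pack_objects_generated_bytes_total", ())]
served = samples[("gitaly_pack_objects_served_bytes_total", ())]
reuse_factor = served / generated
```

For the sample above, two of three lookups were hits, and the cache served about three times as many bytes as were generated into it.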
+
+#### Useful queries
+
+The following are useful queries for monitoring Gitaly:
+
+- Use the following Prometheus query to observe the
+ [type of connections](configure_gitaly.md#enable-tls-support) Gitaly is serving in a production
+ environment:
+
+ ```prometheus
+ sum(rate(gitaly_connections_total[5m])) by (type)
+ ```
+
+- Use the following Prometheus query to monitor the
+ [authentication behavior](configure_gitaly.md#observe-type-of-gitaly-connections) of your GitLab
+ installation:
+
+ ```prometheus
+ sum(rate(gitaly_authentications_total[5m])) by (enforced, status)
+ ```
+
+ In a system where authentication is configured correctly and where you have live traffic, you
+ see something like this:
+
+ ```prometheus
+ {enforced="true",status="ok"} 4424.985419441742
+ ```
+
+ There may also be other numbers with rate 0, but you only have to take note of the non-zero numbers.
+
+ The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero
+ numbers, something is wrong in your configuration.
+
+ The `status="ok"` number reflects your current request rate. In the example above, Gitaly is
+ handling about 4,400 requests per second.
+
+- Use the following Prometheus query to observe the [Git protocol versions](../git_protocol.md)
+ being used in a production environment:
+
+ ```prometheus
+ sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service)
+ ```
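The rule for reading the authentication query result (only `enforced="true",status="ok"` should be non-zero) can be expressed as a small check. This Python sketch assumes you have already fetched the per-series rates into a dictionary. The function name and the `denied` status value in the usage example are hypothetical.

```python
def unexpected_auth_series(rates):
    """Return non-zero series other than enforced="true",status="ok".

    rates maps (enforced, status) label pairs to requests per second.
    An empty result matches the correctly configured case.
    """
    expected = ("true", "ok")
    return {labels: rate for labels, rate in rates.items()
            if rate > 0 and labels != expected}


healthy = unexpected_auth_series({
    ("true", "ok"): 4424.985419441742,
    ("false", "ok"): 0.0,
})

# "denied" is a made-up status value used only for illustration.
suspicious = unexpected_auth_series({
    ("true", "ok"): 4424.985419441742,
    ("true", "denied"): 1.5,
})
```

Here `healthy` is empty, while `suspicious` contains the extra non-zero series that points to a configuration problem.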
+
+### Monitor Gitaly Cluster
+
+To monitor Gitaly Cluster (Praefect), you can use these Prometheus metrics:
+
+- `gitaly_praefect_read_distribution`, a counter to track [distribution of reads](#distributed-reads).
+ It has two labels:
+
+ - `virtual_storage`.
+ - `storage`.
+
+ They reflect configuration defined for this instance of Praefect.
+
+- `gitaly_praefect_replication_latency_bucket`, a histogram measuring the amount of time it takes
+ for replication to complete once the replication job starts. Available in GitLab 12.10 and later.
+- `gitaly_praefect_replication_delay_bucket`, a histogram measuring how much time passes between
+ when the replication job is created and when it starts. Available in GitLab 12.10 and later.
+- `gitaly_praefect_node_latency_bucket`, a histogram measuring the latency in Gitaly returning
+ health check information to Praefect. This indicates Praefect connection saturation. Available in
+ GitLab 12.10 and later.
+
+To monitor [strong consistency](#strong-consistency), you can use the following Prometheus metrics:
+
+- `gitaly_praefect_transactions_total`, the number of transactions created and voted on.
+- `gitaly_praefect_subtransactions_per_transaction_total`, the number of times nodes cast a vote for
+ a single transaction. This can happen multiple times if multiple references are getting updated in
+ a single transaction.
+- `gitaly_praefect_voters_per_transaction_total`, the number of Gitaly nodes taking part in a
+ transaction.
+- `gitaly_praefect_transactions_delay_seconds`, the server-side delay introduced by waiting for the
+ transaction to be committed.
+- `gitaly_hook_transaction_voting_delay_seconds`, the client-side delay introduced by waiting for
+ the transaction to be committed.
+
## Do not bypass Gitaly
GitLab doesn't advise directly accessing Gitaly repositories stored on disk with a Git client,
@@ -253,8 +458,8 @@ your assumptions, resulting in performance degradation, instability, and even da
- Gitaly has optimizations such as the [`info/refs` advertisement cache](https://gitlab.com/gitlab-org/gitaly/blob/master/doc/design_diskcache.md),
that rely on Gitaly controlling and monitoring access to repositories by using the official gRPC
interface.
-- [Gitaly Cluster](praefect.md) has optimizations, such as fault tolerance and
- [distributed reads](praefect.md#distributed-reads), that depend on the gRPC interface and database
+- [Gitaly Cluster](#gitaly-cluster) has optimizations, such as fault tolerance and
+ [distributed reads](#distributed-reads), that depend on the gRPC interface and database
to determine repository state.
WARNING:
@@ -367,7 +572,7 @@ Additional information:
GitLab recommends:
- Creating a [Gitaly Cluster](#gitaly-cluster) as soon as possible.
-- [Moving your repositories](praefect.md#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly
+- [Moving your repositories](#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly
Cluster.
We welcome your feedback on this process. You can: