Diffstat (limited to 'doc/administration/reference_architectures/3k_users.md')
-rw-r--r--  doc/administration/reference_architectures/3k_users.md | 228
1 file changed, 205 insertions(+), 23 deletions(-)
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index ef58e69ee27..f4ae01c7442 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -24,22 +23,23 @@ For a full list of reference architectures, see
> - **Supported users (approximate):** 3,000
> - **High Availability:** Yes, although [Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution
> - **Test requests per second (RPS) rates:** API: 60 RPS, Web: 6 RPS, Git (Pull): 6 RPS, Git (Push): 1 RPS
+> - **[Latest 3k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/3k)**
| Service | Nodes | Configuration | GCP | AWS | Azure |
|--------------------------------------------|-------------|-----------------------|-----------------|--------------|----------|
-| External load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Redis(2) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Consul(1) + Sentinel(2) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| External load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Redis<sup>2</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Consul<sup>1</sup> + Sentinel<sup>2</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Gitaly | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Sidekiq | 4 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
| GitLab Rails | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
@@ -47,7 +47,7 @@ For a full list of reference architectures, see
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -63,10 +63,7 @@ together {
collections "**Sidekiq** x4" as sidekiq #ff8dd1
}
-together {
- card "**Prometheus + Grafana**" as monitor #7FFFD4
- collections "**Consul** x3" as consul #e76a9b
-}
+card "**Prometheus + Grafana**" as monitor #7FFFD4
card "Gitaly Cluster" as gitaly_cluster {
collections "**Praefect** x3" as praefect #FF8C00
@@ -86,14 +83,15 @@ card "Database" as database {
postgres_primary .[#4EA7FF]> postgres_secondary
}
-card "redis" as redis {
- collections "**Redis Persistent** x3" as redis_persistent #FF6347
- collections "**Redis Cache** x3" as redis_cache #FF6347
- collections "**Redis Persistent Sentinel** x3" as redis_persistent_sentinel #FF6347
- collections "**Redis Cache Sentinel** x3"as redis_cache_sentinel #FF6347
+card "**Consul + Sentinel**" as consul_sentinel {
+ collections "**Consul** x3" as consul #e76a9b
+ collections "**Redis Sentinel** x3" as sentinel #e6e727
+}
- redis_persistent <.[#FF6347]- redis_persistent_sentinel
- redis_cache <.[#FF6347]- redis_cache_sentinel
+card "Redis" as redis {
+ collections "**Redis** x3" as redis_nodes #FF6347
+
+ redis_nodes <.[#FF6347]- sentinel
}
cloud "**Object Storage**" as object_storage #white
@@ -348,7 +346,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
-Note that it's a separate node from the External Load Balancer and shouldn't have any access externally.
+It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
@@ -885,7 +883,10 @@ in the second step, do not supply the `EXTERNAL_URL` value.
patroni['username'] = '<patroni_api_username>'
patroni['password'] = '<patroni_api_password>'
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ # Replace 10.6.0.0/24 with the network addresses of your other Patroni nodes
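+ # The allowlist restricts which hosts can reach Patroni's REST API,
+ # which the nodes use for health checks and failover coordination.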
+ patroni['allowlist'] = %w(10.6.0.0/24 127.0.0.1/32)
+
+ # Replace 10.6.0.0/24 with your network address
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
@@ -2088,12 +2089,193 @@ but with smaller performance requirements, several modifications can be consider
- Combining select nodes: Some nodes can be combined to reduce complexity at the cost of some performance:
- GitLab Rails and Sidekiq: Sidekiq nodes can be removed and the component instead enabled on the GitLab Rails nodes (see the sketch after this list).
- PostgreSQL and PgBouncer: PgBouncer nodes can be removed and the component instead enabled on the PostgreSQL nodes, with the Internal Load Balancer pointing to them.
+- Reducing the node counts: Some node types do not need consensus and can run with fewer nodes (but more than one for redundancy). This will also lead to reduced performance.
+ - GitLab Rails and Sidekiq: Stateless services don't have a minimum node count. Two are enough for redundancy.
+ - Gitaly and Praefect: A quorum is not strictly necessary. Two Gitaly nodes and two Praefect nodes are enough for redundancy.
- Running select components in reputable Cloud PaaS solutions: Select components of the GitLab setup can instead be run on Cloud Provider PaaS solutions. By doing this, additional dependent components can also be removed:
- PostgreSQL: Can be run on reputable Cloud PaaS solutions such as Google Cloud SQL or AWS RDS. In this setup, the PgBouncer and Consul nodes are no longer required:
- Consul may still be desired if [Prometheus](../monitoring/prometheus/index.md) auto discovery is a requirement, otherwise you would need to [manually add scrape configurations](../monitoring/prometheus/index.md#adding-custom-scrape-configurations) for all nodes.
- As Redis Sentinel runs on the same box as Consul in this architecture, it may need to be run on a separate box if Redis is still being run via Omnibus.
- Redis: Can be run on reputable Cloud PaaS solutions such as Google Memorystore and AWS ElastiCache. In this setup, the Redis Sentinel is no longer required.
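+
+As a rough illustration of the first combination above, the following
+`/etc/gitlab/gitlab.rb` excerpt is a minimal sketch rather than a complete or
+tested configuration; the concurrency value is an assumption to tune for the
+combined node:
+
+```ruby
+# On each GitLab Rails node, run Sidekiq alongside Puma.
+sidekiq['enable'] = true
+
+# Assumed starting point; size to the CPU budget left after Puma's workers.
+sidekiq['max_concurrency'] = 10
+```
+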
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following supporting services are supported: NGINX, Task Runner, Migrations,
+Prometheus, and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can take advantage of cloud native
+workload management while _stateful_ components are deployed in compute VMs
+with Omnibus for increased permanence.
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section assumes this knowledge.
+
+### Cluster topology
+
+The following tables and diagram detail the hybrid environment using the same formats
+as the normal environment above.
+
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|-------------------------|------------------|-----------------------------|
+| Webservice | 2 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | 31.8 vCPU, 24.8 GB memory |
+| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Node configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**.
+ In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+<!-- markdownlint-enable MD029 -->
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| Redis<sup>2</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| Consul<sup>1</sup> + Sentinel<sup>2</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL<sup>1</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Gitaly | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
+3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+<!-- markdownlint-enable MD029 -->
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
+```plantuml
+@startuml 3k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+ card "**External Load Balancer**" as elb #6a9be7
+
+ together {
+ collections "**Webservice** x2" as gitlab #32CD32
+ collections "**Sidekiq** x3" as sidekiq #ff8dd1
+ }
+
+ card "**Prometheus + Grafana**" as monitor #7FFFD4
+ card "**Supporting Services**" as support
+}
+
+card "**Internal Load Balancer**" as ilb #9370DB
+
+card "**Consul + Sentinel**" as consul_sentinel {
+ collections "**Consul** x3" as consul #e76a9b
+ collections "**Redis Sentinel** x3" as sentinel #e6e727
+}
+
+card "Gitaly Cluster" as gitaly_cluster {
+ collections "**Praefect** x3" as praefect #FF8C00
+ collections "**Gitaly** x3" as gitaly #FF8C00
+ card "**Praefect PostgreSQL***\n//Non fault-tolerant//" as praefect_postgres #FF8C00
+
+ praefect -[#FF8C00]-> gitaly
+ praefect -[#FF8C00]> praefect_postgres
+}
+
+card "Database" as database {
+ collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+ card "**PostgreSQL** (Primary)" as postgres_primary #4EA7FF
+ collections "**PostgreSQL** (Secondary) x2" as postgres_secondary #4EA7FF
+
+ pgbouncer -[#4EA7FF]-> postgres_primary
+ postgres_primary .[#4EA7FF]> postgres_secondary
+}
+
+card "Redis" as redis {
+ collections "**Redis** x3" as redis_nodes #FF6347
+
+ redis_nodes <.[#FF6347]- sentinel
+}
+
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]-> monitor
+elb -[hidden]-> support
+
+gitlab -[#32CD32]--> ilb
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]---> redis
+gitlab -[hidden]--> consul
+
+sidekiq -[#ff8dd1]--> ilb
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> redis
+sidekiq -[hidden]--> consul
+
+ilb -[#9370DB]-> gitaly_cluster
+ilb -[#9370DB]-> database
+
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b,norank]--> redis
+
+monitor .[#7FFFD4]> consul
+monitor .[#7FFFD4]-> database
+monitor .[#7FFFD4]-> gitaly_cluster
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4]> ilb
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [3k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/3k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod consumes roughly 4 vCPUs and 5 GB of memory using
+the [recommended topology](#cluster-topology) because four worker processes
+are created by default and each pod has other small processes running.
+
+For 3,000 users we recommend a total Puma worker count of around 16.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 4
+Webservice pods with 4 workers per pod and 2 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
+Webservice pod.
+
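+As a rough sketch (the keys follow the chart's `gitlab.webservice` section, and
+the figures are the assumptions above rather than tested values), these
+recommendations translate to Helm values along these lines:
+
+```yaml
+# Hypothetical values excerpt: 4 Puma workers per pod at roughly
+# 1 vCPU and 1.25 GB of memory per worker.
+gitlab:
+  webservice:
+    workerProcesses: 4
+    resources:
+      requests:
+        cpu: 4
+        memory: 5G
+```
+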
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+8 Sidekiq pods. Expand available resources using the 1 vCPU to 2 GB of memory
+ratio for each additional pod.
+
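+A comparable sketch for Sidekiq, again hypothetical rather than tested,
+assuming the 1 vCPU to 2 GB of memory ratio above:
+
+```yaml
+# Hypothetical values excerpt: roughly 1 vCPU and 2 GB of memory per pod.
+gitlab:
+  sidekiq:
+    resources:
+      requests:
+        cpu: 1
+        memory: 2G
+```
+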
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
Back to setup components <i class="fa fa-angle-double-up" aria-hidden="true"></i>