gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/administration/reference_architectures')
 doc/administration/reference_architectures/10k_users.md | 111
 doc/administration/reference_architectures/1k_users.md  |   1
 doc/administration/reference_architectures/25k_users.md | 111
 doc/administration/reference_architectures/2k_users.md  | 165
 doc/administration/reference_architectures/3k_users.md  | 228
 doc/administration/reference_architectures/50k_users.md | 111
 doc/administration/reference_architectures/5k_users.md  |  85
 doc/administration/reference_architectures/index.md     |   2
 8 files changed, 571 insertions(+), 243 deletions(-)
diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index 1fc3483fbd4..f9397e6dbca 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -14,33 +13,34 @@ full list of reference architectures, see
> - **Supported users (approximate):** 10,000
> - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA)
> - **Test requests per second (RPS) rates:** API: 200 RPS, Web: 20 RPS, Git (Pull): 20 RPS, Git (Push): 4 RPS
-
-| Service | Nodes | Configuration | GCP | AWS | Azure |
-|--------------------------------------------|-------------|-------------------------|------------------|--------------|-----------|
-| External load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
-| Redis Sentinel - Queues / Shared State(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
-| Gitaly | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
-| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails | 3 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
-| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
-| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+> - **[Latest 10k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/10k)**
+
+| Service | Nodes | Configuration | GCP | AWS | Azure |
+|-----------------------------------------------------|-------------|-------------------------|------------------|--------------|-----------|
+| External load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Consul<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Redis - Cache<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis - Queues / Shared State<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis Sentinel - Cache<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
+| Redis Sentinel - Queues / Shared State<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
+| Gitaly | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
+| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails | 3 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
+| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
+| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -141,7 +141,7 @@ is recommended instead of using NFS. Using an object storage service also
doesn't require you to provision and maintain a node.
It's also worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and
-that to achieve full High Availability a third party PostgreSQL database solution will be required.
+that to achieve full High Availability a third-party PostgreSQL database solution will be required.
We hope to offer built-in solutions for these restrictions in the future, but in the meantime a non-HA PostgreSQL server
can be set up via Omnibus GitLab, which the above specs reflect. Refer to the following issues for more information: [`omnibus-gitlab#5919`](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5919) and [`gitaly#3398`](https://gitlab.com/gitlab-org/gitaly/-/issues/3398).
@@ -345,7 +345,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
-Note that it's a separate node from the External Load Balancer and shouldn't have any access externally.
+It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
@@ -569,12 +569,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Sets `max_replication_slots` to double the number of database nodes.
# Patroni uses one extra slot per node when initiating the replication.
- patroni['postgresql']['max_replication_slots'] = 8
+ patroni['postgresql']['max_replication_slots'] = 6
# Set `max_wal_senders` to one more than the number of replication slots in the cluster.
# This is used to prevent replication from using up all of the
# available database connections.
- patroni['postgresql']['max_wal_senders'] = 9
+ patroni['postgresql']['max_wal_senders'] = 7
# Incoming recommended value for max connections is 500. See https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5691.
patroni['postgresql']['max_connections'] = 500
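The arithmetic behind these two settings follows directly from the comments in the hunk: slots are double the database node count, and WAL senders are one more than the slots. A minimal sketch of that rule, assuming the three PostgreSQL nodes this architecture specifies (the patch itself uses the literal values 6 and 7):

```ruby
# /etc/gitlab/gitlab.rb — illustrative sketch only, not part of the patch.
node_count = 3

# Patroni uses one extra replication slot per node when initiating
# replication, hence double the number of database nodes.
patroni['postgresql']['max_replication_slots'] = node_count * 2   # => 6

# One more than the replication slots, so replication cannot use up
# all available database connections.
patroni['postgresql']['max_wal_senders'] = node_count * 2 + 1     # => 7
```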
@@ -601,7 +601,10 @@ in the second step, do not supply the `EXTERNAL_URL` value.
patroni['username'] = '<patroni_api_username>'
patroni['password'] = '<patroni_api_password>'
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ # Replace 10.6.0.0/24 with Network Addresses for your other patroni nodes
+ patroni['allowlist'] = %w(10.6.0.0/24 127.0.0.1/32)
+
+ # Replace 10.6.0.0/24 with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
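The new `patroni['allowlist']` setting takes the same CIDR list format as `postgresql['trust_auth_cidr_addresses']`. If you prefer to scope access more tightly than a whole subnet, individual `/32` entries should also work; the node addresses below are hypothetical, chosen only for illustration:

```ruby
# Hypothetical sketch: per-node /32 entries instead of the 10.6.0.0/24
# subnet shown above. The addresses are assumptions, not from this patch.
patroni['allowlist'] = %w(10.6.0.21/32 10.6.0.22/32 10.6.0.23/32 127.0.0.1/32)
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.21/32 10.6.0.22/32 10.6.0.23/32 127.0.0.1/32)
```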
@@ -2368,7 +2371,7 @@ in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
the following other supporting services are supported: NGINX, Task Runner, Migrations,
-Prometheus and Grafana.
+Prometheus, and Grafana.
Hybrid installations leverage the benefits of both cloud native and traditional
compute deployments. With this, _stateless_ components can benefit from cloud native
@@ -2379,23 +2382,23 @@ NOTE:
This is an **advanced** setup. Running services in Kubernetes is well known
to be complex. **This setup is only recommended** if you have strong working
knowledge and experience in Kubernetes. The rest of this
-section will assume this.
+section assumes this.
### Cluster topology
-The following tables and diagram details the hybrid environment using the same formats
+The following tables and diagram detail the hybrid environment using the same formats
as the normal environment above.
-First starting with the components that run in Kubernetes. The recommendations at this
-time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
and CPU requirements should translate to most other providers. We hope to update this in the
future with further specific cloud provider details.
-| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
-|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
-| Webservice | 4 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 127.5 vCPU, 118 GB memory |
-| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
-| Supporting services such as NGINX or Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|-------------------------|------------------|-----------------------------|
+| Webservice | 4 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 127.5 vCPU, 118 GB memory |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
@@ -2406,27 +2409,27 @@ future with further specific cloud provider details.
Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP |
-|--------------------------------------------|-------|-------------------------|------------------|
-| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| PostgreSQL(1) | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
-| Redis Sentinel - Queues / Shared State(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
-| Gitaly | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` |
-| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Object storage(4) | n/a | n/a | n/a |
+| Service | Nodes | Configuration | GCP |
+|-----------------------------------------------------|-------|-------------------------|------------------|
+| Consul<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL<sup>1</sup> | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Redis - Cache<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis - Queues / Shared State<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis Sentinel - Cache<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Redis Sentinel - Queues / Shared State<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` |
+| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -2520,11 +2523,11 @@ documents how to apply the calculated configuration to the Helm Chart.
#### Webservice
Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
-Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+Each Webservice pod consumes roughly 4 vCPUs and 5 GB of memory using
the [recommended topology](#cluster-topology) because four worker processes
are created by default and each pod has other small processes running.
-For 10k users we recommend a total Puma worker count of around 80.
+For 10,000 users we recommend a total Puma worker count of around 80.
With the [provided recommendations](#cluster-topology) this allows the deployment of up to 20
Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
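As a back-of-the-envelope check of the sizing rules in the hunk above (a sketch of the stated ratios, not an official calculator):

```ruby
# Sanity-check of the 10k Webservice sizing described above.
target_workers  = 80    # recommended total Puma workers for 10,000 users
workers_per_pod = 4     # default Puma worker processes per Webservice pod
pods_per_node   = 5

webservice_pods = target_workers / workers_per_pod            # => 20 pods
nodes_needed    = (webservice_pods.to_f / pods_per_node).ceil # => 4 nodes

# Per-worker ratio: 1 vCPU and 1.25 GB of memory.
vcpus_per_pod     = workers_per_pod * 1.0    # => 4.0 vCPUs per pod
memory_gb_per_pod = workers_per_pod * 1.25   # => 5.0 GB per pod
```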
diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md
index 6ad4e0f05e3..18e34711953 100644
--- a/doc/administration/reference_architectures/1k_users.md
+++ b/doc/administration/reference_architectures/1k_users.md
@@ -19,6 +19,7 @@ many organizations.
> - **High Availability:** No. For a highly-available environment, you can
> follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha).
> - **Test requests per second (RPS) rates:** API: 20 RPS, Web: 2 RPS, Git (Pull): 2 RPS, Git (Push): 1 RPS
+> - **[Latest 1k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/1k)**
| Users | Configuration | GCP | AWS | Azure |
|--------------|-------------------------|----------------|--------------|----------|
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index e45a8f6963c..65d59422da8 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -14,33 +13,34 @@ full list of reference architectures, see
> - **Supported users (approximate):** 25,000
> - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA)
> - **Test requests per second (RPS) rates:** API: 500 RPS, Web: 50 RPS, Git (Pull): 50 RPS, Git (Push): 10 RPS
-
-| Service | Nodes | Configuration | GCP | AWS | Azure |
-|------------------------------------------|-------------|-------------------------|------------------|--------------|-----------|
-| External load balancing node(3) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancing node(3) | 1 | 4 vCPU, 3.6GB memory | `n1-highcpu-4` | `c5.large` | `F2s v2` |
-| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
-| Redis Sentinel - Queues / Shared State(2)| 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
-| Gitaly | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
-| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails | 5 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
-| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
-| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+> - **[Latest 25k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/25k)**
+
+| Service | Nodes | Configuration | GCP | AWS | Azure |
+|---------------------------------------------------|-------------|-------------------------|------------------|--------------|-----------|
+| External load balancing node<sup>3</sup> | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Consul<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancing node<sup>3</sup>           | 1           | 4 vCPU, 3.6 GB memory   | `n1-highcpu-4`   | `c5.large`   | `F2s v2`  |
+| Redis - Cache<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis - Queues / Shared State<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis Sentinel - Cache<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
+| Redis Sentinel - Queues / Shared State<sup>2</sup>| 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
+| Gitaly | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
+| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails | 5 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
+| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
+| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -141,7 +141,7 @@ is recommended instead of using NFS. Using an object storage service also
doesn't require you to provision and maintain a node.
It's also worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and
-that to achieve full High Availability a third party PostgreSQL database solution will be required.
+that to achieve full High Availability a third-party PostgreSQL database solution will be required.
We hope to offer built-in solutions for these restrictions in the future, but in the meantime a non-HA PostgreSQL server
can be set up via Omnibus GitLab, which the above specs reflect. Refer to the following issues for more information: [`omnibus-gitlab#5919`](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5919) and [`gitaly#3398`](https://gitlab.com/gitlab-org/gitaly/-/issues/3398).
@@ -347,7 +347,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
-Note that it's a separate node from the External Load Balancer and shouldn't have any access externally.
+It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
@@ -571,12 +571,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Sets `max_replication_slots` to double the number of database nodes.
# Patroni uses one extra slot per node when initiating the replication.
- patroni['postgresql']['max_replication_slots'] = 8
+ patroni['postgresql']['max_replication_slots'] = 6
# Set `max_wal_senders` to one more than the number of replication slots in the cluster.
# This is used to prevent replication from using up all of the
# available database connections.
- patroni['postgresql']['max_wal_senders'] = 9
+ patroni['postgresql']['max_wal_senders'] = 7
# Incoming recommended value for max connections is 500. See https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5691.
patroni['postgresql']['max_connections'] = 500
@@ -603,7 +603,10 @@ in the second step, do not supply the `EXTERNAL_URL` value.
patroni['username'] = '<patroni_api_username>'
patroni['password'] = '<patroni_api_password>'
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ # Replace 10.6.0.0/24 with Network Addresses for your other patroni nodes
+ patroni['allowlist'] = %w(10.6.0.0/24 127.0.0.1/32)
+
+ # Replace 10.6.0.0/24 with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
@@ -2380,7 +2383,7 @@ in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
the following other supporting services are supported: NGINX, Task Runner, Migrations,
-Prometheus and Grafana.
+Prometheus, and Grafana.
Hybrid installations leverage the benefits of both cloud native and traditional
compute deployments. With this, _stateless_ components can benefit from cloud native
@@ -2391,23 +2394,23 @@ NOTE:
This is an **advanced** setup. Running services in Kubernetes is well known
to be complex. **This setup is only recommended** if you have strong working
knowledge and experience in Kubernetes. The rest of this
-section will assume this.
+section assumes this.
### Cluster topology
-The following tables and diagram details the hybrid environment using the same formats
+The following tables and diagram detail the hybrid environment using the same formats
as the normal environment above.
-First starting with the components that run in Kubernetes. The recommendations at this
-time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
and CPU requirements should translate to most other providers. We hope to update this in the
future with further specific cloud provider details.
-| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
-|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
-| Webservice | 7 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 223 vCPU, 206.5 GB memory |
-| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
-| Supporting services such as NGINX, Prometheus, etc. | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|-------------------------|------------------|-----------------------------|
+| Webservice | 7 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 223 vCPU, 206.5 GB memory |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
@@ -2418,27 +2421,27 @@ future with further specific cloud provider details.
Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP |
-|--------------------------------------------|-------|-------------------------|------------------|
-| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| PostgreSQL(1) | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Internal load balancing node(3) | 1 | 4 vCPU, 3.6GB memory | `n1-highcpu-4` |
-| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
-| Redis Sentinel - Queues / Shared State(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
-| Gitaly | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` |
-| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Object storage(4) | n/a | n/a | n/a |
+| Service | Nodes | Configuration | GCP |
+|-----------------------------------------------------|-------|-------------------------|------------------|
+| Consul<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL<sup>1</sup> | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node<sup>3</sup>            | 1     | 4 vCPU, 3.6 GB memory   | `n1-highcpu-4`   |
+| Redis - Cache<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis - Queues / Shared State<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis Sentinel - Cache<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Redis Sentinel - Queues / Shared State<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` |
+| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -2532,11 +2535,11 @@ documents how to apply the calculated configuration to the Helm Chart.
#### Webservice
Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
-Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+Each Webservice pod consumes roughly 4 vCPUs and 5 GB of memory using
the [recommended topology](#cluster-topology) because four worker processes
are created by default and each pod has other small processes running.
-For 25k users we recommend a total Puma worker count of around 140.
+For 25,000 users we recommend a total Puma worker count of around 140.
With the [provided recommendations](#cluster-topology) this allows the deployment of up to 35
Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index ff3db877553..0af4dbc8a7f 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -15,23 +14,24 @@ For a full list of reference architectures, see
> - **High Availability:** No. For a highly-available environment, you can
> follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha).
> - **Test requests per second (RPS) rates:** API: 40 RPS, Web: 4 RPS, Git (Pull): 4 RPS, Git (Push): 1 RPS
+> - **[Latest 2k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/2k)**
| Service | Nodes | Configuration | GCP | AWS | Azure |
|------------------------------------------|--------|-------------------------|-----------------|--------------|----------|
-| Load balancer(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Redis(2) | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` | `D2s v3` |
+| Load balancer<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Redis<sup>2</sup> | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` | `D2s v3` |
| Gitaly | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
| GitLab Rails | 2 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- markdownlint-disable MD029 -->
-1. Can be optionally run on reputable third party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work, however Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability so can be ignored when using a PostgreSQL PaaS setup. However it is also used optionally by Prometheus for Omnibus auto host discovery.
-2. Can be optionally run as reputable third party external PaaS Redis solutions. Google Memorystore and AWS Elasticache are known to work.
-3. Can be optionally run as reputable third party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
+3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -444,10 +444,7 @@ To configure the Gitaly server, on the server node you want to use for Gitaly:
1. Edit the Gitaly server node's `/etc/gitlab/gitlab.rb` file to configure
storage paths, enable the network listener, and to configure the token:
- <!--
- updates to following example must also be made at
- https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab
- -->
+ <!-- Updates to following example must also be made at https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab -->
```ruby
# /etc/gitlab/gitlab.rb
@@ -557,10 +554,7 @@ To configure Gitaly with TLS:
1. Edit `/etc/gitlab/gitlab.rb` and add:
- <!--
- updates to following example must also be made at
- https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab
- -->
+ <!-- Updates to following example must also be made at https://gitlab.com/gitlab-org/charts/gitlab/blob/master/doc/advanced/external-gitaly/external-omnibus-gitaly.md#configure-omnibus-gitlab -->
```ruby
gitaly['tls_listen_addr'] = "0.0.0.0:9999"
@@ -973,6 +967,143 @@ Read:
- The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
- About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following other supporting services are supported: NGINX, Task Runner, Migrations,
+Prometheus, and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can benefit from cloud native
+workload management benefits while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
+
+The 2,000 reference architecture is not a highly-available setup. To achieve HA, you can follow a modified [3K reference architecture](3k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative).
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section assumes this.
+
+### Cluster topology
+
+The following tables and diagram detail the hybrid environment using the same formats
+as the normal environment above.
+
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|------------------------|-----------------|-----------------------------|
+| Webservice | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | 23.7 vCPU, 16.9 GB memory |
+| Sidekiq | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | 1.9 vCPU, 5.5 GB memory |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Node configuration is shown as it is forced to ensure pod vCPU/memory ratios and to avoid scaling during **performance testing**.
+   In production deployments, there is no need to assign pods to specific nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+<!-- markdownlint-enable MD029 -->
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| PostgreSQL<sup>1</sup> | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| Redis<sup>2</sup> | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Object storage<sup>3</sup> | n/a | n/a | n/a |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
+3. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+<!-- markdownlint-enable MD029 -->
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
+```plantuml
+@startuml 2k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+ card "**External Load Balancer**" as elb #6a9be7
+
+ together {
+ collections "**Webservice** x3" as gitlab #32CD32
+ collections "**Sidekiq** x2" as sidekiq #ff8dd1
+ }
+
+ card "**Prometheus + Grafana**" as monitor #7FFFD4
+ card "**Supporting Services**" as support
+}
+
+card "**Gitaly**" as gitaly #FF8C00
+card "**PostgreSQL**" as postgres #4EA7FF
+card "**Redis**" as redis #FF6347
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]--> monitor
+
+gitlab -[#32CD32]--> gitaly
+gitlab -[#32CD32]--> postgres
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]--> redis
+
+sidekiq -[#ff8dd1]--> gitaly
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> postgres
+sidekiq -[#ff8dd1]---> redis
+
+monitor .[#7FFFD4]u-> gitlab
+monitor .[#7FFFD4]-> gitaly
+monitor .[#7FFFD4]-> postgres
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [2k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/2k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod consumes roughly 2 vCPUs and 2.5 GB of memory using
+the [recommended topology](#cluster-topology) because two worker processes
+are created by default and each pod has other small processes running.
+
+For 2,000 users we recommend a total Puma worker count of around 12.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 6
+Webservice pods with 2 workers per pod and 2 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
+Webservice pod.
+
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+2 Sidekiq pods. Expand available resources using the 1 vCPU to 2 GB memory
+ratio for each additional pod.
+
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
Back to setup components <i class="fa fa-angle-double-up" aria-hidden="true"></i>
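The 2k hybrid numbers in the new section above follow the same pattern as the larger architectures; here is a small sketch of the arithmetic, illustrative only and using the values stated in the section:

```ruby
# Sanity-check of the 2k Webservice and Sidekiq sizing described above.
puma_workers    = 12                              # recommended total for 2,000 users
workers_per_pod = 2
webservice_pods = puma_workers / workers_per_pod  # => 6 pods, at 2 pods per node

# Sidekiq pods scale at 1 vCPU and 2 GB of memory per pod.
extra_sidekiq_pods = 3                            # hypothetical expansion
extra_vcpus        = extra_sidekiq_pods * 1       # => 3 vCPUs
extra_memory_gb    = extra_sidekiq_pods * 2       # => 6 GB
```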
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index ef58e69ee27..f4ae01c7442 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -24,22 +23,23 @@ For a full list of reference architectures, see
> - **Supported users (approximate):** 3,000
> - **High Availability:** Yes, although [Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution
> - **Test requests per second (RPS) rates:** API: 60 RPS, Web: 6 RPS, Git (Pull): 6 RPS, Git (Push): 1 RPS
+> - **[Latest 3k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/3k)**
| Service | Nodes | Configuration | GCP | AWS | Azure |
|--------------------------------------------|-------------|-----------------------|-----------------|--------------|----------|
-| External load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Redis(2) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Consul(1) + Sentinel(2) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| External load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Redis<sup>2</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Consul<sup>1</sup> + Sentinel<sup>2</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Gitaly | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Sidekiq | 4 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
| GitLab Rails | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
@@ -47,7 +47,7 @@ For a full list of reference architectures, see
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -63,10 +63,7 @@ together {
collections "**Sidekiq** x4" as sidekiq #ff8dd1
}
-together {
- card "**Prometheus + Grafana**" as monitor #7FFFD4
- collections "**Consul** x3" as consul #e76a9b
-}
+card "**Prometheus + Grafana**" as monitor #7FFFD4
card "Gitaly Cluster" as gitaly_cluster {
collections "**Praefect** x3" as praefect #FF8C00
@@ -86,14 +83,15 @@ card "Database" as database {
postgres_primary .[#4EA7FF]> postgres_secondary
}
-card "redis" as redis {
- collections "**Redis Persistent** x3" as redis_persistent #FF6347
- collections "**Redis Cache** x3" as redis_cache #FF6347
- collections "**Redis Persistent Sentinel** x3" as redis_persistent_sentinel #FF6347
- collections "**Redis Cache Sentinel** x3"as redis_cache_sentinel #FF6347
+card "**Consul + Sentinel**" as consul_sentinel {
+ collections "**Consul** x3" as consul #e76a9b
+ collections "**Redis Sentinel** x3" as sentinel #e6e727
+}
- redis_persistent <.[#FF6347]- redis_persistent_sentinel
- redis_cache <.[#FF6347]- redis_cache_sentinel
+card "Redis" as redis {
+ collections "**Redis** x3" as redis_nodes #FF6347
+
+ redis_nodes <.[#FF6347]- sentinel
}
cloud "**Object Storage**" as object_storage #white
@@ -348,7 +346,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
-Note that it's a separate node from the External Load Balancer and shouldn't have any access externally.
+It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
@@ -885,7 +883,10 @@ in the second step, do not supply the `EXTERNAL_URL` value.
patroni['username'] = '<patroni_api_username>'
patroni['password'] = '<patroni_api_password>'
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ # Replace 10.6.0.0/24 with Network Addresses for your other patroni nodes
+ patroni['allowlist'] = %w(10.6.0.0/24 127.0.0.1/32)
+
+ # Replace 10.6.0.0/24 with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
@@ -2088,12 +2089,193 @@ but with smaller performance requirements, several modifications can be consider
- Combining select nodes: Some nodes can be combined to reduce complexity at the cost of some performance:
- GitLab Rails and Sidekiq: Sidekiq nodes can be removed and the component instead enabled on the GitLab Rails nodes.
- PostgreSQL and PgBouncer: PgBouncer nodes can be removed and the component instead enabled on PostgreSQL with the Internal Load Balancer pointing to them instead.
+- Reducing the node counts: Some node types do not need consensus and can run with fewer nodes (but more than one for redundancy). This will also lead to reduced performance.
+ - GitLab Rails and Sidekiq: Stateless services don't have a minimum node count. Two are enough for redundancy.
+ - Gitaly and Praefect: A quorum is not strictly necessary. Two Gitaly nodes and two Praefect nodes are enough for redundancy.
- Running select components in reputable Cloud PaaS solutions: Select components of the GitLab setup can instead be run on Cloud Provider PaaS solutions. By doing this, additional dependent components can also be removed:
- PostgreSQL: Can be run on reputable Cloud PaaS solutions such as Google Cloud SQL or AWS RDS. In this setup, the PgBouncer and Consul nodes are no longer required:
- Consul may still be desired if [Prometheus](../monitoring/prometheus/index.md) auto discovery is a requirement, otherwise you would need to [manually add scrape configurations](../monitoring/prometheus/index.md#adding-custom-scrape-configurations) for all nodes.
- As Redis Sentinel runs on the same box as Consul in this architecture, it may need to be run on a separate box if Redis is still being run via Omnibus.
- Redis: Can be run on reputable Cloud PaaS solutions such as Google Memorystore and AWS ElastiCache. In this setup, the Redis Sentinel is no longer required.
+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following other supporting services are supported: NGINX, Task Runner, Migrations,
+Prometheus, and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can benefit from cloud native
+workload management benefits while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section assumes this.
+
+### Cluster topology
+
+The following tables and diagram detail the hybrid environment using the same formats
+as the normal environment above.
+
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|-------------------------|------------------|-----------------------------|
+| Webservice | 2 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | 31.8 vCPU, 24.8 GB memory |
+| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Node configuration is enforced as shown to ensure pod vCPU / memory ratios and to avoid scaling during **performance testing**.
+   In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+<!-- markdownlint-enable MD029 -->
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| Redis<sup>2</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| Consul<sup>1</sup> + Sentinel<sup>2</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL<sup>1</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Gitaly | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a |
+
+<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
+<!-- markdownlint-disable MD029 -->
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
+3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+<!-- markdownlint-enable MD029 -->
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
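+As a hedged illustration of footnote 4 above, the consolidated object storage settings in `/etc/gitlab/gitlab.rb` might resemble this sketch; the provider, region, and bucket names are placeholder assumptions:
+
+```ruby
+# Sketch: point GitLab at a storage PaaS via consolidated object storage.
+gitlab_rails['object_store']['enabled'] = true
+gitlab_rails['object_store']['connection'] = {
+  'provider' => 'AWS',
+  'region' => 'us-east-1',
+  # Assumes instance profiles; swap for access keys if not using AWS IAM.
+  'use_iam_profile' => true
+}
+# One bucket per object type; names below are placeholders.
+gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
+gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
+gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'
+```
+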
+```plantuml
+@startuml 3k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+ card "**External Load Balancer**" as elb #6a9be7
+
+ together {
+ collections "**Webservice** x2" as gitlab #32CD32
+ collections "**Sidekiq** x3" as sidekiq #ff8dd1
+ }
+
+ card "**Prometheus + Grafana**" as monitor #7FFFD4
+ card "**Supporting Services**" as support
+}
+
+card "**Internal Load Balancer**" as ilb #9370DB
+
+card "**Consul + Sentinel**" as consul_sentinel {
+ collections "**Consul** x3" as consul #e76a9b
+ collections "**Redis Sentinel** x3" as sentinel #e6e727
+}
+
+card "Gitaly Cluster" as gitaly_cluster {
+ collections "**Praefect** x3" as praefect #FF8C00
+ collections "**Gitaly** x3" as gitaly #FF8C00
+ card "**Praefect PostgreSQL***\n//Non fault-tolerant//" as praefect_postgres #FF8C00
+
+ praefect -[#FF8C00]-> gitaly
+ praefect -[#FF8C00]> praefect_postgres
+}
+
+card "Database" as database {
+ collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+ card "**PostgreSQL** (Primary)" as postgres_primary #4EA7FF
+ collections "**PostgreSQL** (Secondary) x2" as postgres_secondary #4EA7FF
+
+ pgbouncer -[#4EA7FF]-> postgres_primary
+ postgres_primary .[#4EA7FF]> postgres_secondary
+}
+
+card "Redis" as redis {
+ collections "**Redis** x3" as redis_nodes #FF6347
+
+ redis_nodes <.[#FF6347]- sentinel
+}
+
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]-> monitor
+elb -[hidden]-> support
+
+gitlab -[#32CD32]--> ilb
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]---> redis
+gitlab -[hidden]--> consul
+
+sidekiq -[#ff8dd1]--> ilb
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> redis
+sidekiq -[hidden]--> consul
+
+ilb -[#9370DB]-> gitaly_cluster
+ilb -[#9370DB]-> database
+
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b,norank]--> redis
+
+monitor .[#7FFFD4]> consul
+monitor .[#7FFFD4]-> database
+monitor .[#7FFFD4]-> gitaly_cluster
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4]> ilb
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [3k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/3k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod consumes roughly 4 vCPUs and 5 GB of memory using
+the [recommended topology](#cluster-topology) because four worker processes
+are created by default and each pod has other small processes running.
+
+For 3,000 users we recommend a total Puma worker count of around 16.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 4
+Webservice pods with 4 workers per pod and 2 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
+Webservice pod.
+
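+The worker math above can be expressed as a hedged sketch of Helm values (4 workers per pod, so roughly 4 vCPU and 5 GB requested per pod); key names follow the public `gitlab` chart, but verify them against the linked 3k example values file rather than treating this as the reference configuration:
+
+```yaml
+# Webservice pods sized per the 1 vCPU / 1.25 GB per-worker ratio.
+gitlab:
+  webservice:
+    workerProcesses: 4
+    minReplicas: 4
+    maxReplicas: 4
+    resources:
+      requests:
+        cpu: 4        # 4 workers x 1 vCPU
+        memory: 5G    # 4 workers x 1.25 GB
+```
+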
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+8 Sidekiq pods. Expand available resources using the 1 vCPU to 2 GB memory
+ratio for each additional pod.
+
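+As with Webservice, a hedged sketch of the matching Sidekiq Helm values applies the 1 vCPU to 2 GB per-pod ratio (key names again assume the public `gitlab` chart; the linked example values file is authoritative):
+
+```yaml
+# Sidekiq pods sized at roughly 1 vCPU and 2 GB each.
+gitlab:
+  sidekiq:
+    minReplicas: 8
+    maxReplicas: 8
+    resources:
+      requests:
+        cpu: 900m     # just under 1 vCPU to leave node headroom
+        memory: 2G
+```
+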
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+
<div align="right">
<a type="button" class="btn btn-default" href="#setup-components">
Back to setup components <i class="fa fa-angle-double-up" aria-hidden="true"></i>
diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index 766f94f6c53..b262545b27d 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -14,33 +13,34 @@ full list of reference architectures, see
> - **Supported users (approximate):** 50,000
> - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA)
> - **Test requests per second (RPS) rates:** API: 1000 RPS, Web: 100 RPS, Git (Pull): 100 RPS, Git (Push): 20 RPS
-
-| Service | Nodes | Configuration | GCP | AWS | Azure |
-|------------------------------------------|-------------|-------------------------|------------------|---------------|-----------|
-| External load balancing node(3) | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
-| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancing node(3) | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
-| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
-| Redis Sentinel - Queues / Shared State(2)| 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
-| Gitaly | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | `D64s v3` |
-| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails | 12 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
-| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
-| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+> - **[Latest 50k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/50k)**
+
+| Service | Nodes | Configuration | GCP | AWS | Azure |
+|---------------------------------------------------|-------------|-------------------------|------------------|---------------|-----------|
+| External load balancing node<sup>3</sup> | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
+| Consul<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancing node<sup>3</sup> | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
+| Redis - Cache<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis - Queues / Shared State<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis Sentinel - Cache<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
+| Redis Sentinel - Queues / Shared State<sup>2</sup>| 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `c5.large` | `A1 v2` |
+| Gitaly | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | `D64s v3` |
+| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails | 12 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
+| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
+| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -141,7 +141,7 @@ is recommended instead of using NFS. Using an object storage service also
doesn't require you to provision and maintain a node.
It's also worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and
-that to achieve full High Availability a third party PostgreSQL database solution will be required.
+that to achieve full High Availability a third-party PostgreSQL database solution will be required.
We hope to offer built-in solutions for these restrictions in the future, but in the meantime a non-HA PostgreSQL server
can be set up via Omnibus GitLab, which the above specs reflect. Refer to the following issues for more information: [`omnibus-gitlab#5919`](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5919) & [`gitaly#3398`](https://gitlab.com/gitlab-org/gitaly/-/issues/3398)
@@ -354,7 +354,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
-Note that it's a separate node from the External Load Balancer and shouldn't have any access externally.
+It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
@@ -578,12 +578,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Sets `max_replication_slots` to double the number of database nodes.
# Patroni uses one extra slot per node when initiating the replication.
- patroni['postgresql']['max_replication_slots'] = 8
+ patroni['postgresql']['max_replication_slots'] = 6
# Set `max_wal_senders` to one more than the number of replication slots in the cluster.
# This is used to prevent replication from using up all of the
# available database connections.
- patroni['postgresql']['max_wal_senders'] = 9
+ patroni['postgresql']['max_wal_senders'] = 7
# Incoming recommended value for max connections is 500. See https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5691.
patroni['postgresql']['max_connections'] = 500
@@ -611,7 +611,10 @@ in the second step, do not supply the `EXTERNAL_URL` value.
patroni['username'] = '<patroni_api_username>'
patroni['password'] = '<patroni_api_password>'
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ # Replace 10.6.0.0/24 with Network Addresses for your other patroni nodes
+ patroni['allowlist'] = %w(10.6.0.0/24 127.0.0.1/32)
+
+ # Replace 10.6.0.0/24 with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
@@ -2391,7 +2394,7 @@ in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
the following other supporting services are supported: NGINX, Task Runner, Migrations,
-Prometheus and Grafana.
+Prometheus, and Grafana.
Hybrid installations leverage the benefits of both cloud native and traditional
compute deployments. With this, _stateless_ components can benefit from cloud native
@@ -2402,23 +2405,23 @@ NOTE:
This is an **advanced** setup. Running services in Kubernetes is well known
to be complex. **This setup is only recommended** if you have strong working
knowledge and experience in Kubernetes. The rest of this
-section will assume this.
+section assumes this.
### Cluster topology
-The following tables and diagram details the hybrid environment using the same formats
+The following tables and diagram detail the hybrid environment using the same formats
as the normal environment above.
-First starting with the components that run in Kubernetes. The recommendations at this
-time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
and CPU requirements should translate to most other providers. We hope to update this in the
future with further specific cloud provider details.
-| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
-|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
-| Webservice | 16 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 510 vCPU, 472 GB memory |
-| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
-| Supporting services such as NGINX, Prometheus, etc. | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|-------------------------|------------------|-----------------------------|
+| Webservice | 16 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | 510 vCPU, 472 GB memory |
+| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
@@ -2429,27 +2432,27 @@ future with further specific cloud provider details.
Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP |
-|--------------------------------------------|-------|-------------------------|------------------|
-| Consul(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| PostgreSQL(1) | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Internal load balancing node(3) | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` |
-| Redis - Cache(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| Redis - Queues / Shared State(2) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| Redis Sentinel - Cache(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
-| Redis Sentinel - Queues / Shared State(2) | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
-| Gitaly | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` |
-| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Object storage(4) | n/a | n/a | n/a |
+| Service | Nodes | Configuration | GCP |
+|-----------------------------------------------------|-------|-------------------------|------------------|
+| Consul<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL<sup>1</sup> | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node<sup>3</sup> | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` |
+| Redis - Cache<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis - Queues / Shared State<sup>2</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Redis Sentinel - Cache<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Redis Sentinel - Queues / Shared State<sup>2</sup> | 3 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` |
+| Praefect | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -2543,11 +2546,11 @@ documents how to apply the calculated configuration to the Helm Chart.
#### Webservice
Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
-Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+Each Webservice pod consumes roughly 4 vCPUs and 5 GB of memory using
the [recommended topology](#cluster-topology) because four worker processes
are created by default and each pod has other small processes running.
-For 50k users we recommend a total Puma worker count of around 320.
+For 50,000 users we recommend a total Puma worker count of around 320.
With the [provided recommendations](#cluster-topology) this allows the deployment of up to 80
Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index e57c4545b13..666d18a66fc 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -1,5 +1,4 @@
---
-reading_time: true
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
@@ -21,22 +20,23 @@ costly-to-operate environment by using the
> - **Supported users (approximate):** 5,000
> - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA)
> - **Test requests per second (RPS) rates:** API: 100 RPS, Web: 10 RPS, Git (Pull): 10 RPS, Git (Push): 2 RPS
+> - **[Latest 5k weekly performance testing results](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/5k)**
| Service | Nodes | Configuration | GCP | AWS | Azure |
|--------------------------------------------|-------------|-------------------------|-----------------|--------------|----------|
-| External load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Redis(2) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Consul(1) + Sentinel(2) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL(1) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| External load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Redis<sup>2</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Consul<sup>1</sup> + Sentinel<sup>2</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL<sup>1</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Gitaly | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
| Sidekiq | 4 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
| GitLab Rails | 3 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2`|
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage(4) | n/a | n/a | n/a | n/a | n/a |
+| Object storage<sup>4</sup> | n/a | n/a | n/a | n/a | n/a |
| NFS server (optional, not recommended) | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
@@ -44,7 +44,7 @@ costly-to-operate environment by using the
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -80,9 +80,9 @@ card "Database" as database {
postgres_primary .[#4EA7FF]> postgres_secondary
}
-node "**Consul + Sentinel** x3" as consul_sentinel {
- component Consul as consul #e76a9b
- component Sentinel as sentinel #e6e727
+card "**Consul + Sentinel**" as consul_sentinel {
+ collections "**Consul** x3" as consul #e76a9b
+ collections "**Redis Sentinel** x3" as sentinel #e6e727
}
card "Redis" as redis {
@@ -143,7 +143,7 @@ is recommended instead of using NFS. Using an object storage service also
doesn't require you to provision and maintain a node.
It's also worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and
-that to achieve full High Availability a third party PostgreSQL database solution will be required.
+that to achieve full High Availability a third-party PostgreSQL database solution will be required.
We hope to offer built-in solutions for these restrictions in the future, but in the meantime a non-HA PostgreSQL server
can be set up via Omnibus GitLab, which the above specs reflect. Refer to the following issues for more information: [`omnibus-gitlab#5919`](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5919) & [`gitaly#3398`](https://gitlab.com/gitlab-org/gitaly/-/issues/3398)
@@ -338,7 +338,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
-Note that it's a separate node from the External Load Balancer and shouldn't have any access externally.
+It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
@@ -842,12 +842,12 @@ in the second step, do not supply the `EXTERNAL_URL` value.
# Sets `max_replication_slots` to double the number of database nodes.
# Patroni uses one extra slot per node when initiating the replication.
- patroni['postgresql']['max_replication_slots'] = 8
+ patroni['postgresql']['max_replication_slots'] = 6
# Set `max_wal_senders` to one more than the number of replication slots in the cluster.
# This is used to prevent replication from using up all of the
# available database connections.
- patroni['postgresql']['max_wal_senders'] = 9
+ patroni['postgresql']['max_wal_senders'] = 7
# Incoming recommended value for max connections is 500. See https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5691.
patroni['postgresql']['max_connections'] = 500
@@ -874,7 +874,10 @@ in the second step, do not supply the `EXTERNAL_URL` value.
patroni['username'] = '<patroni_api_username>'
patroni['password'] = '<patroni_api_password>'
- # Replace XXX.XXX.XXX.XXX/YY with Network Address
+ # Replace 10.6.0.0/24 with Network Addresses for your other patroni nodes
+ patroni['allowlist'] = %w(10.6.0.0/24 127.0.0.1/32)
+
+ # Replace 10.6.0.0/24 with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/24 127.0.0.1/32)
# Set the network addresses that the exporters will listen on for monitoring
@@ -2072,7 +2075,7 @@ in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
the following other supporting services are supported: NGINX, Task Runner, Migrations,
-Prometheus and Grafana.
+Prometheus, and Grafana.
Hybrid installations leverage the benefits of both cloud native and traditional
compute deployments. With this, _stateless_ components can benefit from cloud native
@@ -2083,23 +2086,23 @@ NOTE:
This is an **advanced** setup. Running services in Kubernetes is well known
to be complex. **This setup is only recommended** if you have strong working
knowledge and experience in Kubernetes. The rest of this
-section will assume this.
+section assumes this.
### Cluster topology
-The following tables and diagram details the hybrid environment using the same formats
+The following tables and diagram detail the hybrid environment using the same formats
as the normal environment above.
-First starting with the components that run in Kubernetes. The recommendations at this
-time use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
and CPU requirements should translate to most other providers. We hope to update this in the
future with further specific cloud provider details.
-| Service | Nodes(1) | Configuration | GCP | Allocatable CPUs and Memory |
-|-------------------------------------------------------|----------|-------------------------|------------------|-----------------------------|
-| Webservice | 5 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | 79.5 vCPU, 62 GB memory |
-| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory |
-| Supporting services such as NGINX, Prometheus, etc. | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|-------------------------|------------------|-----------------------------|
+| Webservice | 5 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | 79.5 vCPU, 62 GB memory |
+| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
@@ -2112,22 +2115,22 @@ services where applicable):
| Service | Nodes | Configuration | GCP |
|--------------------------------------------|-------|-------------------------|------------------|
-| Redis(2) | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
-| Consul(1) + Sentinel(2) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| PostgreSQL(1) | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
-| PgBouncer(1) | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Internal load balancing node(3) | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Redis<sup>2</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| Consul<sup>1</sup> + Sentinel<sup>2</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| PostgreSQL<sup>1</sup> | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
| Gitaly | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` |
| Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Praefect PostgreSQL(1) | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Object storage(4) | n/a | n/a | n/a |
+| Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
+| Object storage<sup>4</sup> | n/a | n/a | n/a |
<!-- Disable ordered list rule https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md029---ordered-list-item-prefix -->
<!-- markdownlint-disable MD029 -->
1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work; however, Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability, so it can be ignored when using a PostgreSQL PaaS setup. However, it is also used optionally by Prometheus for Omnibus auto host discovery.
2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS ElastiCache are known to work.
3. Can be optionally run on reputable third-party load balancing services (LB PaaS). AWS ELB is known to work.
-4. Should be run on reputable third party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+4. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
<!-- markdownlint-enable MD029 -->
NOTE:
@@ -2150,9 +2153,9 @@ card "Kubernetes via Helm Charts" as kubernetes {
card "**Internal Load Balancer**" as ilb #9370DB
-node "**Consul + Sentinel** x3" as consul_sentinel {
- component Consul as consul #e76a9b
- component Sentinel as sentinel #e6e727
+card "**Consul + Sentinel**" as consul_sentinel {
+ collections "**Consul** x3" as consul #e76a9b
+ collections "**Redis Sentinel** x3" as sentinel #e6e727
}
card "Gitaly Cluster" as gitaly_cluster {
@@ -2221,11 +2224,11 @@ documents how to apply the calculated configuration to the Helm Chart.
#### Webservice
Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
-Each Webservice pod will consume roughly 4 vCPUs and 5 GB of memory using
+Each Webservice pod consumes roughly 4 vCPUs and 5 GB of memory using
the [recommended topology](#cluster-topology) because four worker processes
are created by default and each pod has other small processes running.
-For 5k users we recommend a total Puma worker count of around 40.
+For 5,000 users we recommend a total Puma worker count of around 40.
With the [provided recommendations](#cluster-topology) this allows the deployment of up to 10
Webservice pods with 4 workers per pod and 2 pods per node. Expand available resources using
the ratio of 1 vCPU to 1.25 GB of memory _per worker process_ for each additional
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 22871f6ea8d..74d8bf39d03 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -71,6 +71,8 @@ The following reference architectures are available:
The following Cloud Native Hybrid reference architectures, where select recommended components can be run in Kubernetes, are available:
+- [Up to 2,000 users](2k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
+- [Up to 3,000 users](3k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
- [Up to 5,000 users](5k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
- [Up to 10,000 users](10k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
- [Up to 25,000 users](25k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)