---
type: reference, concepts
---

# Scaling

GitLab supports several scaling options so that your self-managed instance can
scale out to meet your organization's needs when scaling up a single-box GitLab
installation is no longer practical or feasible.

Please consult our [high availability documentation](../availability/index.md)
if your organization requires fault tolerance and redundancy features, such as
automatic database system failover.

## GitLab components and scaling instructions

Here's a list of components directly provided by Omnibus GitLab or installed as
part of a source installation and their configuration instructions for scaling.

| Component | Description | Configuration instructions |
|-----------|-------------|----------------------------|
| [PostgreSQL](../../development/architecture.md#postgresql) | Database | [PostgreSQL configuration](https://docs.gitlab.com/omnibus/settings/database.html) |
| [Redis](../../development/architecture.md#redis)  | Key/value store for fast data lookup and caching | [Redis configuration](../high_availability/redis.md) |
| [GitLab application services](../../development/architecture.md#unicorn) | Unicorn/Puma, Workhorse, GitLab Shell - serves front-end requests (UI, API, Git over HTTP/SSH) | [GitLab app scaling configuration](../high_availability/gitlab.md) |
| [PgBouncer](../../development/architecture.md#pgbouncer) | Database connection pooler | [PgBouncer configuration](../high_availability/pgbouncer.md#running-pgbouncer-as-part-of-a-non-ha-gitlab-installation) **(PREMIUM ONLY)** |
| [Sidekiq](../../development/architecture.md#sidekiq) | Asynchronous/background jobs | [Sidekiq configuration](../high_availability/sidekiq.md) |
| [Gitaly](../../development/architecture.md#gitaly) | Provides access to Git repositories | [Gitaly configuration](../gitaly/index.md#running-gitaly-on-its-own-server) |
| [Prometheus](../../development/architecture.md#prometheus) and [Grafana](../../development/architecture.md#grafana) | GitLab environment monitoring | [Monitoring node for scaling](../high_availability/monitoring_node.md) |

## Third-party services used for scaling

Here's a list of third-party services you may require as part of scaling GitLab.
The services can be provided by numerous applications or vendors, and further
advice is given on how to choose the right one for your organization's needs.

| Component | Description | Configuration instructions |
|-----------|-------------|----------------------------|
| Load balancer(s) | Handles load balancing, typically when you have multiple GitLab application services nodes | [Load balancer configuration](../high_availability/load_balancer.md)      |
| Object storage service | Recommended store for shared data objects | [Cloud Object Storage configuration](../high_availability/object_storage.md) |
| NFS | Shared disk storage service. Can be used as an alternative for Gitaly or Object Storage. Required for GitLab Pages | [NFS configuration](../high_availability/nfs.md) |

## Reference architectures

- 1 to 1,000 users: A single-node [Omnibus](https://docs.gitlab.com/omnibus/) setup with frequent backups. Refer to the [Single-node Omnibus installation](#single-node-installation) section below.
- 1,000 to 50,000+ users: A [scaled-out Omnibus installation with multiple servers](#multi-node-installation-scaled-out-for-availability), with or without high-availability components applied.
  - To decide on the level of availability, refer to our [Availability](../availability/index.md) page.

### Single-node installation

This solution is appropriate for many teams that have a single server at their disposal. With automatic backup of the GitLab repositories, configuration, and the database, this can be an optimal solution if you don't have strict availability requirements.

You can also optionally configure GitLab to use an [external PostgreSQL service](../external_database.md)
or an [external object storage service](../high_availability/object_storage.md) for added
performance and reliability at a relatively low complexity cost.

References:

- [Installation Docs](../../install/README.md)
- [Backup/Restore Docs](https://docs.gitlab.com/omnibus/settings/backups.html#backup-and-restore-omnibus-gitlab-configuration)

### Multi-node installation (scaled out for availability)

This solution is appropriate for teams that are starting to scale out when
scaling up is no longer meeting their needs. In this configuration, additional application nodes will handle frontend traffic, with a load balancer in front to distribute traffic across those nodes. Meanwhile, each application node connects to a shared file server and PostgreSQL and Redis services on the back end.

The additional application servers add limited fault tolerance to your GitLab
instance. As long as one application node is online and capable of handling the
instance's usage load, your team's productivity will not be interrupted. Having
multiple application nodes also enables [zero-downtime updates](https://docs.gitlab.com/omnibus/update/#zero-downtime-updates).

References:

- [Configure your load balancer for GitLab](../high_availability/load_balancer.md)
- [Configure your NFS server to work with GitLab](../high_availability/nfs.md)
- [Configure packaged PostgreSQL server to listen on TCP/IP](https://docs.gitlab.com/omnibus/settings/database.html#configure-packaged-postgresql-server-to-listen-on-tcpip)
- [Setting up a Redis-only server](https://docs.gitlab.com/omnibus/settings/redis.html#setting-up-a-redis-only-server)

In this section we'll detail the Reference Architectures that can support large numbers
of users. These were built, tested and verified by our Quality and Support teams.

Testing was done with our GitLab Performance Tool at specific coded workloads, and the
throughputs used for testing were calculated based on sample customer data. We
test each endpoint type with the following number of requests per second (RPS)
per 1000 users:

- API: 20 RPS
- Web: 2 RPS
- Git: 2 RPS
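
The per-1,000-user rates above scale linearly with the user count, which is how the test RPS rates quoted for each configuration below are obtained (for example, 2,000 users gives API: 40 RPS). As a minimal sketch of that arithmetic, assuming a hypothetical helper named `target_rps` that is not part of GitLab or the GitLab Performance Tool:

```python
# Per-1,000-user request rates quoted in the list above.
RPS_PER_1000_USERS = {"api": 20, "web": 2, "git": 2}


def target_rps(users: int) -> dict:
    """Scale the per-1,000-user rates linearly to the given user count.

    Hypothetical helper for illustration only.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    return {
        endpoint: rate * users / 1000
        for endpoint, rate in RPS_PER_1000_USERS.items()
    }


if __name__ == "__main__":
    # Print the test rates for each reference architecture size.
    for users in (2_000, 5_000, 10_000, 25_000, 50_000):
        print(users, target_rps(users))
```

Treat the result as a starting point for your own load tests rather than a guarantee, since real workloads vary with the factors described in the note below.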

NOTE: **Note:** Depending on your workflow, the recommended reference
architectures below may need to be adapted. Your workload is influenced by
factors such as, but not limited to, how active your users are, how much
automation you use, mirroring, and repository or change size. Additionally, the
memory values shown are those given directly by [GCP machine types](https://cloud.google.com/compute/docs/machine-types).
On other cloud vendors, use the closest like-for-like equivalent on a best-effort basis.

#### 2,000 user configuration

- **Supported users (approximate):** 2,000
- **Test RPS rates:** API: 40 RPS, Web: 4 RPS, Git: 4 RPS
- **Known issues:**  [List of known performance issues](https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=Quality%3Aperformance-issues)

| Service                     | Nodes | Configuration[^8]     | GCP type      | AWS type[^9] |
| ----------------------------|-------|-----------------------|---------------|--------------|
| GitLab Rails[^1]            | 3     | 8 vCPU, 7.2GB Memory  | n1-highcpu-8  | c5.2xlarge   |
| PostgreSQL                  | 3     | 2 vCPU, 7.5GB Memory  | n1-standard-2 | m5.large     |
| PgBouncer                   | 3     | 2 vCPU, 1.8GB Memory  | n1-highcpu-2  | c5.large     |
| Gitaly[^2] [^5] [^7]        | X     | 4 vCPU, 15GB Memory   | n1-standard-4 | m5.xlarge    |
| Redis[^3]                   | 3     | 2 vCPU, 7.5GB Memory  | n1-standard-2 | m5.large     |
| Consul + Sentinel[^3]       | 3     | 2 vCPU, 1.8GB Memory  | n1-highcpu-2  | c5.large     |
| Sidekiq                     | 4     | 2 vCPU, 7.5GB Memory  | n1-standard-2 | m5.large     |
| Cloud Object Storage[^4]    | -     | -                     | -             | -            |
| NFS Server[^5] [^7]         | 1     | 4 vCPU, 3.6GB Memory  | n1-highcpu-4  | c5.xlarge    |
| Monitoring node             | 1     | 2 vCPU, 1.8GB Memory  | n1-highcpu-2  | c5.large     |
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2  | c5.large     |
| Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2  | c5.large     |

#### 5,000 user configuration

- **Supported users (approximate):** 5,000
- **Test RPS rates:** API: 100 RPS, Web: 10 RPS, Git: 10 RPS
- **Known issues:**  [List of known performance issues](https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=Quality%3Aperformance-issues)

| Service                     | Nodes | Configuration[^8]      | GCP type      | AWS type[^9] |
| ----------------------------|-------|------------------------|---------------|--------------|
| GitLab Rails[^1]            | 3     | 16 vCPU, 14.4GB Memory | n1-highcpu-16 | c5.4xlarge   |
| PostgreSQL                  | 3     | 2 vCPU, 7.5GB Memory   | n1-standard-2 | m5.large     |
| PgBouncer                   | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2  | c5.large     |
| Gitaly[^2] [^5] [^7]        | X     | 8 vCPU, 30GB Memory    | n1-standard-8 | m5.2xlarge   |
| Redis[^3]                   | 3     | 2 vCPU, 7.5GB Memory   | n1-standard-2 | m5.large     |
| Consul + Sentinel[^3]       | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2  | c5.large     |
| Sidekiq                     | 4     | 2 vCPU, 7.5GB Memory   | n1-standard-2 | m5.large     |
| Cloud Object Storage[^4]    | -     | -                      | -             | -            |
| NFS Server[^5] [^7]         | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4  | c5.xlarge    |
| Monitoring node             | 1     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2  | c5.large     |
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory  | n1-highcpu-2  | c5.large     |
| Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory  | n1-highcpu-2  | c5.large     |

#### 10,000 user configuration

- **Supported users (approximate):** 10,000
- **Test RPS rates:** API: 200 RPS, Web: 20 RPS, Git: 20 RPS
- **Known issues:**  [List of known performance issues](https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=Quality%3Aperformance-issues)

| Service                     | Nodes | Configuration[^8]      | GCP type       | AWS type[^9] |
| ----------------------------|-------|------------------------|----------------|--------------|
| GitLab Rails[^1]            | 3     | 32 vCPU, 28.8GB Memory | n1-highcpu-32  | c5.9xlarge   |
| PostgreSQL                  | 3     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| PgBouncer                   | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2   | c5.large     |
| Gitaly[^2] [^5] [^7]        | X     | 16 vCPU, 60GB Memory   | n1-standard-16 | m5.4xlarge   |
| Redis[^3] - Cache           | 3     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| Redis[^3] - Queues / Shared State | 3 | 4 vCPU, 15GB Memory  | n1-standard-4  | m5.xlarge    |
| Redis Sentinel[^3] - Cache  | 3     | 1 vCPU, 1.7GB Memory   | g1-small       | t2.small     |
| Redis Sentinel[^3] - Queues / Shared State | 3 | 1 vCPU, 1.7GB Memory | g1-small | t2.small |
| Consul                      | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2   | c5.large     |
| Sidekiq                     | 4     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| Cloud Object Storage[^4]    | -     | -                      | -              | -            |
| NFS Server[^5] [^7]         | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4   | c5.xlarge    |
| Monitoring node             | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4   | c5.xlarge    |
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory  | n1-highcpu-2   | c5.large     |
| Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory  | n1-highcpu-2   | c5.large     |

#### 25,000 user configuration

- **Supported users (approximate):** 25,000
- **Test RPS rates:** API: 500 RPS, Web: 50 RPS, Git: 50 RPS
- **Known issues:**  [List of known performance issues](https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=Quality%3Aperformance-issues)

| Service                     | Nodes | Configuration[^8]      | GCP type       | AWS type[^9] |
| ----------------------------|-------|------------------------|----------------|--------------|
| GitLab Rails[^1]            | 5     | 32 vCPU, 28.8GB Memory | n1-highcpu-32  | c5.9xlarge   |
| PostgreSQL                  | 3     | 8 vCPU, 30GB Memory    | n1-standard-8  | m5.2xlarge   |
| PgBouncer                   | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2   | c5.large     |
| Gitaly[^2] [^5] [^7]        | X     | 32 vCPU, 120GB Memory  | n1-standard-32 | m5.8xlarge   |
| Redis[^3] - Cache           | 3     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| Redis[^3] - Queues / Shared State | 3 | 4 vCPU, 15GB Memory  | n1-standard-4  | m5.xlarge    |
| Redis Sentinel[^3] - Cache  | 3     | 1 vCPU, 1.7GB Memory   | g1-small       | t2.small     |
| Redis Sentinel[^3] - Queues / Shared State | 3 | 1 vCPU, 1.7GB Memory | g1-small | t2.small |
| Consul                      | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2   | c5.large     |
| Sidekiq                     | 4     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| Cloud Object Storage[^4]    | -     | -                      | -              | -            |
| NFS Server[^5] [^7]         | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4   | c5.xlarge    |
| Monitoring node             | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4   | c5.xlarge    |
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory  | n1-highcpu-2   | c5.large     |
| Internal load balancing node[^6] | 1 | 4 vCPU, 3.6GB Memory  | n1-highcpu-4   | c5.xlarge    |

#### 50,000 user configuration

- **Supported users (approximate):** 50,000
- **Test RPS rates:** API: 1000 RPS, Web: 100 RPS, Git: 100 RPS
- **Known issues:**  [List of known performance issues](https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=Quality%3Aperformance-issues)

| Service                     | Nodes | Configuration[^8]      | GCP type       | AWS type[^9] |
| ----------------------------|-------|------------------------|----------------|--------------|
| GitLab Rails[^1]            | 12    | 32 vCPU, 28.8GB Memory | n1-highcpu-32  | c5.9xlarge   |
| PostgreSQL                  | 3     | 16 vCPU, 60GB Memory   | n1-standard-16 | m5.4xlarge   |
| PgBouncer                   | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2   | c5.large     |
| Gitaly[^2] [^5] [^7]        | X     | 64 vCPU, 240GB Memory  | n1-standard-64 | m5.16xlarge  |
| Redis[^3] - Cache           | 3     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| Redis[^3] - Queues / Shared State | 3 | 4 vCPU, 15GB Memory  | n1-standard-4  | m5.xlarge    |
| Redis Sentinel[^3] - Cache  | 3     | 1 vCPU, 1.7GB Memory   | g1-small       | t2.small     |
| Redis Sentinel[^3] - Queues / Shared State | 3 | 1 vCPU, 1.7GB Memory | g1-small | t2.small |
| Consul                      | 3     | 2 vCPU, 1.8GB Memory   | n1-highcpu-2   | c5.large     |
| Sidekiq                     | 4     | 4 vCPU, 15GB Memory    | n1-standard-4  | m5.xlarge    |
| NFS Server[^5] [^7]         | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4   | c5.xlarge    |
| Cloud Object Storage[^4]    | -     | -                      | -              | -            |
| Monitoring node             | 1     | 4 vCPU, 3.6GB Memory   | n1-highcpu-4   | c5.xlarge    |
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory  | n1-highcpu-2   | c5.large     |
| Internal load balancing node[^6] | 1 | 8 vCPU, 7.2GB Memory  | n1-highcpu-8   | c5.2xlarge   |

[^1]: In our architectures we run each GitLab Rails node using the Puma webserver
      and have its number of workers set to 90% of available CPUs along with 4 threads.
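
The Puma sizing rule in footnote 1 can be worked through for the Rails node sizes in the tables above. This is a minimal sketch only: the function name `puma_settings` is hypothetical, and rounding the 90% figure down to a whole worker is an assumption, because the footnote does not state how fractional values are handled.

```python
def puma_settings(vcpus: int, worker_percent: int = 90, threads: int = 4) -> dict:
    """Return a worker count of roughly 90% of the CPUs, plus the thread count.

    Hypothetical helper. Rounding down is an assumption, and at least
    one worker is always returned.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    # Integer arithmetic avoids floating-point rounding surprises.
    workers = max(1, vcpus * worker_percent // 100)
    return {"workers": workers, "threads": threads}


if __name__ == "__main__":
    # Rails node sizes used in the reference architectures.
    for vcpus in (8, 16, 32):
        print(vcpus, puma_settings(vcpus))
```

Under these assumptions, the 8, 16, and 32 vCPU Rails nodes would run 7, 14, and 28 workers respectively, each with 4 threads.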

[^2]: Gitaly node requirements are dependent on customer data, specifically the number of
      projects and their sizes. We recommend 2 nodes as an absolute minimum for HA environments
      and at least 4 nodes should be used when supporting 50,000 or more users.
      We also recommend that each Gitaly node should store no more than 5TB of data
      and have the number of [`gitaly-ruby` workers](../gitaly/index.md#gitaly-ruby)
      set to 20% of available CPUs. Additional nodes should be considered in conjunction
      with a review of expected data size and spread based on the recommendations above.
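
The Gitaly recommendations in footnote 2 combine a per-node data ceiling with minimum node counts. The sketch below shows one way to turn them into a starting estimate. It assumes an HA environment, the function names are hypothetical, and rounding the 20% `gitaly-ruby` figure down (with a floor of one worker) is an assumption rather than documented behavior.

```python
import math

MAX_TB_PER_NODE = 5           # recommended ceiling of repository data per node
MIN_NODES_HA = 2              # absolute minimum for HA environments
MIN_NODES_50K_USERS = 4       # minimum when supporting 50,000 or more users
GITALY_RUBY_CPU_PERCENT = 20  # share of CPUs given to gitaly-ruby workers


def gitaly_node_count(repo_data_tb: float, users: int) -> int:
    """Estimate a starting Gitaly node count (hypothetical helper)."""
    if repo_data_tb < 0 or users < 0:
        raise ValueError("inputs must be non-negative")
    # Enough nodes to keep each one at or below the data ceiling.
    nodes_for_data = math.ceil(repo_data_tb / MAX_TB_PER_NODE)
    minimum = MIN_NODES_50K_USERS if users >= 50_000 else MIN_NODES_HA
    return max(nodes_for_data, minimum)


def gitaly_ruby_workers(vcpus: int) -> int:
    """Return 20% of the CPUs, rounded down (assumption), and at least 1."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return max(1, vcpus * GITALY_RUBY_CPU_PERCENT // 100)


if __name__ == "__main__":
    print(gitaly_node_count(repo_data_tb=12, users=10_000))
    print(gitaly_ruby_workers(vcpus=16))
```

The estimate covers only data size and user count. The spread of projects across nodes still needs the review described in the footnote.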

[^3]: Recommended Redis setup differs depending on the size of the architecture.
      For smaller architectures (up to 5,000 users) we suggest one Redis cluster for all
      classes and that Redis Sentinel is hosted alongside Consul.
      For larger architectures (10,000 users or more) we suggest running a separate
      [Redis Cluster](../high_availability/redis.md#running-multiple-redis-clusters) for the Cache class
      and another for the Queues and Shared State classes respectively. We also recommend
      that you run the Redis Sentinel clusters separately as well for each Redis Cluster.

[^4]: For data objects such as LFS, Uploads, Artifacts, etc., we recommend a [Cloud Object Storage service](../object_storage.md)
      over NFS where possible, due to better performance and availability.

[^5]: NFS can be used as an alternative for both repository data (replacing Gitaly) and
      object storage, but this isn't typically recommended for performance reasons. Note, however, that it is required for
      [GitLab Pages](https://gitlab.com/gitlab-org/gitlab-pages/issues/196).

[^6]: Our architectures have been tested and validated with [HAProxy](https://www.haproxy.org/)
      as the load balancer. Other reputable load balancers with similar feature sets
      should also work, but be aware that they haven't been validated.

[^7]: We strongly recommend that any Gitaly and / or NFS nodes are set up with SSD disks over
      HDD, with a throughput of at least 8,000 IOPS for read operations and 2,000 IOPS for write,
      as these components have heavy I/O. These IOPS values are recommended only as a starting point,
      as over time they may need to be adjusted higher or lower depending on the scale of your
      environment's workload. If you're running the environment on a cloud provider,
      you may need to refer to their documentation on how to configure IOPS correctly.

[^8]: The architectures were built and tested with the [Intel Xeon E5 v3 (Haswell)](https://cloud.google.com/compute/docs/cpu-platforms)
      CPU platform on GCP. On different hardware you may find that adjustments, either lower
      or higher, are required for your CPU or Node counts accordingly. For more information, a
      [Sysbench](https://github.com/akopytov/sysbench) benchmark of the CPU can be found
      [here](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Reference-Architectures/GCP-CPU-Benchmarks).

[^9]: AWS-equivalent configurations are rough suggestions and may change in the
      future. They have not yet been tested and validated.