author | GitLab Bot <gitlab-bot@gitlab.com> | 2023-05-17 19:05:49 +0300 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2023-05-17 19:05:49 +0300 |
commit | 43a25d93ebdabea52f99b05e15b06250cd8f07d7 (patch) | |
tree | dceebdc68925362117480a5d672bcff122fb625b /doc/development/database/load_balancing.md | |
parent | 20c84b99005abd1c82101dfeff264ac50d2df211 (diff) |
Add latest changes from gitlab-org/gitlab@16-0-stable-ee v16.0.0-rc42
Diffstat (limited to 'doc/development/database/load_balancing.md')
-rw-r--r-- | doc/development/database/load_balancing.md | 59 |
1 files changed, 59 insertions, 0 deletions
diff --git a/doc/development/database/load_balancing.md b/doc/development/database/load_balancing.md
new file mode 100644
index 00000000000..f623ad1eab0
--- /dev/null
+++ b/doc/development/database/load_balancing.md
@@ -0,0 +1,59 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Database load balancing
+
+With database load balancing, read-only queries can be distributed across multiple
+PostgreSQL nodes to increase performance.
+
+This documentation provides a technical overview on how database load balancing
+is implemented in GitLab Rails and Sidekiq.
+
+## Nomenclature
+
+1. **Host**: Each database host. It could be a primary or a replica.
+1. **Primary**: Primary PostgreSQL host that is used for write-only and read-and-write operations.
+1. **Replica**: Secondary PostgreSQL hosts that are used for read-only operations.
+1. **Workload**: a Rails request or a Sidekiq job that requires database connections.
+
+## Components
+
+A few Ruby classes are involved in the load balancing process. All of them are
+in the namespace `Gitlab::Database::LoadBalancing`:
+
+1. `Host`
+1. `LoadBalancer`
+1. `ConnectionProxy`
+1. `Session`
+
+Each workload begins with a new instance of `Gitlab::Database::LoadBalancing::Session`.
+The `Session` keeps track of the database operations that have been performed. It then
+determines if the workload requires a connection to either the primary host or a replica host.
+
+When the workload requires a database connection through `ActiveRecord`,
+`ConnectionProxy` first redirects the connection request to `LoadBalancer`.
+`ConnectionProxy` requests either a `read` or `read_write` connection from the `LoadBalancer`
+depending on a few criteria:
+
+1. Whether the query is read-only or requires a write.
+1. Whether the `Session` has recorded a write operation previously.
+1. Whether any special blocks have been used to prefer primary or replica, such as:
+   - `use_primary`
+   - `ignore_writes`
+   - `use_replicas_for_read_queries`
+   - `fallback_to_replicas_for_ambiguous_queries`
+
+`LoadBalancer` then yields the requested connection from the respective database connection pool.
+It yields either:
+
+- A `read_write` connection from the primary's connection pool.
+- A `read` connection from the replicas' connection pools.
+
+When responding to a request for a `read` connection, `LoadBalancer` would
+first attempt to load balance the connection across the replica hosts.
+It looks for the next `online` replica host and yields a connection from the host's connection pool.
+A replica host is considered `online` if it is up-to-date with the primary, based on
+either the replication lag size or time. The thresholds for these requirements are configurable.
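The selection flow that the new document describes (a session that remembers writes, a choice between `read` and `read_write`, and a search for the next `online` replica) can be sketched in plain Ruby. This is only an illustration and not the code in `Gitlab::Database::LoadBalancing`: the classes `SketchSession`, `SketchHost`, and `SketchLoadBalancer`, the `max_lag_seconds` threshold, the round-robin order, and the fallback to the primary when no replica is online are all assumptions made for the example.

```ruby
# Illustrative sketch only. These classes are hypothetical stand-ins and do
# not reproduce the real Gitlab::Database::LoadBalancing classes.

# Tracks whether the current workload has written or asked for the primary.
class SketchSession
  def initialize
    @performed_write = false
    @use_primary = false
  end

  # Record that a write happened, so later reads stay on the primary.
  def write!
    @performed_write = true
  end

  # Loosely mirrors a `use_primary` block: prefer the primary while it runs.
  def use_primary
    previous = @use_primary
    @use_primary = true
    yield
  ensure
    @use_primary = previous
  end

  def primary_required?
    @performed_write || @use_primary
  end
end

# A database host with an assumed replication lag measured in seconds.
SketchHost = Struct.new(:name, :lag_seconds) do
  def online?(max_lag_seconds)
    lag_seconds <= max_lag_seconds
  end
end

class SketchLoadBalancer
  def initialize(primary:, replicas:, max_lag_seconds: 60)
    @primary = primary
    @replicas = replicas
    @max_lag_seconds = max_lag_seconds # assumed configurable threshold
    @next_index = 0
  end

  # Decide between a :read and a :read_write connection.
  def connection_type(session, write_query:)
    (write_query || session.primary_required?) ? :read_write : :read
  end

  # Return the host that would serve the query.
  def host_for(session, write_query:)
    session.write! if write_query
    return @primary if connection_type(session, write_query: write_query) == :read_write

    # Assumption: fall back to the primary when no replica is online.
    next_online_replica || @primary
  end

  private

  # Round-robin over the replicas, skipping any that lag too far behind.
  def next_online_replica
    @replicas.size.times do
      host = @replicas[@next_index]
      @next_index = (@next_index + 1) % @replicas.size
      return host if host.online?(@max_lag_seconds)
    end
    nil
  end
end

balancer = SketchLoadBalancer.new(
  primary: SketchHost.new("primary", 0),
  replicas: [SketchHost.new("replica-1", 5), SketchHost.new("replica-2", 300)]
)
session = SketchSession.new
puts balancer.host_for(session, write_query: false).name # a read before any write
puts balancer.host_for(session, write_query: true).name  # a write
puts balancer.host_for(session, write_query: false).name # a read after the write
```

In this sketch a read issued after a write stays on the primary because the session has recorded the write, which corresponds to the second criterion in the list above. A real implementation would hand out pooled connections rather than host objects.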