author    John Cai <jcai@gitlab.com>  2022-04-20 06:30:22 +0300
committer John Cai <jcai@gitlab.com>  2022-04-20 06:30:22 +0300
commit    051d510a384ffa12f02a14ff292cdbf3e141505a (patch)
tree      dc60cce072ba466a83970714718592d641cad5e7
parent    5591e2b54cff1fbfa38d19a3747c18fb847f9b4a (diff)
docs: Document Gitaly backpressure
Gitaly has a number of knobs for tuning the backpressure it can impose on the services that call it. This commit documents them.
 README.md           |   1 +
 doc/README.md       |   1 +
 doc/backpressure.md | 100 +++
 3 files changed, 102 insertions(+), 0 deletions(-)
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -170,6 +170,7 @@ For more information on how to set it up, see the [LabKit monitoring docs](https
 - [How to configure backpressure in Gitaly](https://youtu.be/wX9CtFdLYxE)
   An overview of the knobs in the Gitaly config to set limits on incoming traffic.
+  There is also [written documentation](doc/backpressure.md).
 - [How Gitaly fits into GitLab (Youtube)](https://www.youtube.com/playlist?list=PL05JrBw4t0KqoFUiX42JG7BAc7pipMBAy) - a series of 1-hour training videos for contributors new to GitLab and Gitaly.
 - [Part 1: the Gitaly client in gitlab-ce, 2019-02-21](https://www.youtube.com/watch?v=j0HNiKCnLTI&list=PL05JrBw4t0KqoFUiX42JG7BAc7pipMBAy)
diff --git a/doc/README.md b/doc/README.md
index 00951449c..e72f5fae3 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -39,6 +39,7 @@ For configuration please read [praefects configuration documentation](doc/config
 - [Serverside Git Usage](serverside_git_usage.md)
 - [Object Pools](object_pools.md)
 - [Sidechannel protocol](sidechannel.md)
+- [Backpressure](backpressure.md)

 #### RFCs
diff --git a/doc/backpressure.md b/doc/backpressure.md
new file mode 100644
index 000000000..8452d4fd9
--- /dev/null
+++ b/doc/backpressure.md
@@ -0,0 +1,100 @@

# Request limiting in Gitaly

In the GitLab ecosystem, Gitaly is the service at the bottom of the stack for
Git data access: when there is a surge of requests to retrieve or change a
piece of Git data, the I/O happens in Gitaly. Because all Git access goes
through Gitaly, such a surge can overwhelm it through system resource
exhaustion.

When traffic exceeds what Gitaly can handle, Gitaly should be able to push
back on the calling client rather than subserviently agreeing to process more
than it can handle.

Several knobs in Gitaly put limits on different kinds of traffic patterns.
## Concurrency queue

The `[[concurrency]]` configuration limits the number of concurrent RPCs in
flight on each Gitaly node, per RPC per repository:

```toml
[[concurrency]]
rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
max_per_repo = 1
```

For example:

- One clone request comes in for repository "A" (a largish repository).
- While this RPC is executing, another request comes in for repository "A".
  Because `max_per_repo` is 1 in this case, the second request blocks until
  the first request is finished.

An in-memory queue of waiting requests can build up in Gitaly. Because this is
a potential vector for unbounded memory growth, two other values in the
`[[concurrency]]` configuration bound the queue:

- `max_queue_wait` is the maximum amount of time a request can wait in the
  concurrency queue. When a request waits longer than this, an error is
  returned to the client.
- `max_queue_size` is the maximum size the concurrency queue can grow to for a
  given RPC on a repository. When a concurrency queue is at its maximum,
  subsequent requests return an error.

For example:

```toml
[[concurrency]]
rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
max_per_repo = 1
max_queue_wait = "1m"
max_queue_size = 5
```

## Rate limiting

To allow Gitaly to put back pressure on its clients, administrators can set a
rate limit per repository for each RPC:

```toml
[[rate_limiting]]
rpc = "/gitaly.RepositoryService/RepackFull"
interval = "1m"
burst = 1
```

The rate limiter is implemented using the concept of a `token bucket`: a
`token bucket` has capacity `burst` and is refilled every `interval`. When a
request comes into Gitaly, a token is taken from the `token bucket` for that
request.
When the `token bucket` is empty, further requests for that RPC on that
repository are rejected until the `token bucket` is refilled. There is one
`token bucket` per RPC per repository.

In the configuration above, the `token bucket` has a capacity of 1 and is
refilled every minute. This means Gitaly accepts only one `RepackFull` request
per repository each minute.

Requests that come in after the `token bucket` has been emptied (and before it
is replenished) are rejected with an error.

## Errors

With both concurrency limiting and rate limiting, Gitaly responds with a
structured gRPC `gitalypb.LimitError` error that carries:

- A `Message` field that describes the error.
- A `BackoffDuration` field that tells the client when it is safe to retry.
  If 0, the client should not retry.

Gitaly clients (`gitlab-shell`, `workhorse`, Rails) must parse this error and
return a sensible error message to the end user, for example:

- Someone cloning over HTTP or SSH.
- The GitLab application.
- Something calling the API.

## Metrics

Metrics provide visibility into how these limits are being applied. See the
[GitLab documentation](https://docs.gitlab.com/ee/administration/gitaly/#monitor-gitaly-and-gitaly-cluster)
for details.