docs: Document Gitaly backpressure (jc-docs-backpressure)

There are a couple of knobs we can turn in Gitaly to apply backpressure: concurrency queues and limits, and rate limiting. This change documents both.
author: John Cai <jcai@gitlab.com> 2022-04-07 22:29:07 +0300
committer: John Cai <jcai@gitlab.com> 2022-04-14 23:12:33 +0300
commit: 020722161c681dc4f5208b3646d413b9b5b2639a (patch)
tree: ce8f1334393dfa0301e4690736833750970b4780
parent: 93153d53f1c77a28ef76ae9c5777ed5477835962 (diff)
3 files changed, 121 insertions, 0 deletions
diff --git a/README.md b/README.md
index 0d1c348ac..3c304af70 100644
--- a/README.md
+++ b/README.md
@@ -170,6 +170,7 @@ For more information on how to set it up, see the [LabKit monitoring docs](https
 - [How to configure backpressure in Gitaly](https://youtu.be/wX9CtFdLYxE)
 
     An overview of the knobs in the Gitaly config to set limits on incoming traffic.
+    There is also [written documentation](doc/backpressure.md).
 
 - [How Gitaly fits into GitLab (Youtube)](https://www.youtube.com/playlist?list=PL05JrBw4t0KqoFUiX42JG7BAc7pipMBAy) - a series of 1-hour training videos for contributors new to GitLab and Gitaly.
   - [Part 1: the Gitaly client in gitlab-ce, 2019-02-21](https://www.youtube.com/watch?v=j0HNiKCnLTI&list=PL05JrBw4t0KqoFUiX42JG7BAc7pipMBAy)
diff --git a/doc/README.md b/doc/README.md
index 00951449c..e72f5fae3 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -39,6 +39,7 @@ For configuration please read [praefects configuration documentation](doc/config
 - [Serverside Git Usage](serverside_git_usage.md)
 - [Object Pools](object_pools.md)
 - [Sidechannel protocol](sidechannel.md)
+- [Backpressure](backpressure.md)
 
 #### RFCs
 
diff --git a/doc/backpressure.md b/doc/backpressure.md
new file mode 100644
index 000000000..d2b1b7c1c
--- /dev/null
+++ b/doc/backpressure.md
@@ -0,0 +1,119 @@
+# Request Limiting in Gitaly
+
+## The Problem
+
+In the GitLab ecosystem, Gitaly is the service that is at the bottom of the
+stack as far as Git data access goes. This means that when there is a surge of
+requests to retrieve or change a piece of Git data, the I/O happens in Gitaly.
+This can lead to Gitaly being overloaded due to system resource exhaustion,
+since all roads lead to Gitaly.
+
+## The Solution
+
+If there is a surge of traffic beyond what Gitaly can handle, Gitaly should
+be able to push back on the client calling it instead of subserviently agreeing
+to bite off much more than it can chew.
+
+There are several different knobs we can turn in Gitaly that put a limit on
+different kinds of traffic patterns.
+
+### Concurrency Queue
+
+Gitaly can limit the number of concurrent RPCs that are in flight per
+Gitaly node/repository/RPC. This is done through the `[[concurrency]]`
+configuration:
+
+```toml
+[[concurrency]]
+rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
+max_per_repo = 1
+```
+
+Let's say that one clone request comes in for repository "A", and "A" is a
+fairly large repository. While this RPC is executing, another request comes in
+for repository "A". Since `max_per_repo` is 1 in this case, the second request
+blocks until the first request is finished.
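The blocking behavior of `max_per_repo` can be sketched in Go, the language Gitaly is written in. This is a minimal illustration, not Gitaly's limiter: the `repoLimiter` type, its methods, and the use of one buffered channel per repository as a semaphore are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// repoLimiter is a hypothetical per-repository concurrency limiter for one
// RPC. Each repository gets its own buffered channel used as a semaphore with
// maxPerRepo slots.
type repoLimiter struct {
	mu         sync.Mutex
	maxPerRepo int
	sems       map[string]chan struct{}
}

func newRepoLimiter(maxPerRepo int) *repoLimiter {
	return &repoLimiter{maxPerRepo: maxPerRepo, sems: make(map[string]chan struct{})}
}

// acquire blocks until a slot is free for repo and returns a func that
// releases the slot.
func (l *repoLimiter) acquire(repo string) func() {
	l.mu.Lock()
	sem, ok := l.sems[repo]
	if !ok {
		sem = make(chan struct{}, l.maxPerRepo)
		l.sems[repo] = sem
	}
	l.mu.Unlock()

	sem <- struct{}{} // blocks while maxPerRepo requests for repo are in flight
	return func() { <-sem }
}

// twoClones runs two clone requests for repository "A" with max_per_repo = 1
// and returns the order in which they finished.
func twoClones() []string {
	l := newRepoLimiter(1)
	var (
		mu    sync.Mutex
		order []string
		wg    sync.WaitGroup
	)
	record := func(s string) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, s)
	}

	releaseFirst := l.acquire("A") // the first clone holds the only slot
	wg.Add(1)
	go func() {
		defer wg.Done()
		release := l.acquire("A") // the second clone blocks here
		defer release()
		record("second")
	}()

	record("first") // the first clone finishes its work...
	releaseFirst()  // ...and frees the slot, unblocking the second clone
	wg.Wait()
	return order
}

func main() {
	fmt.Println(twoClones()) // [first second]
}
```

Requests for a different repository get their own semaphore, so they are not held up by a slow clone of "A".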
+
+In this way, an in-memory queue of requests waiting their turn can build up in
+Gitaly. Since an unbounded queue could exhaust memory, there are two other
+values in the `[[concurrency]]` config that bound it.
+
+```toml
+[[concurrency]]
+rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
+max_per_repo = 1
+max_queue_wait = "1m"
+max_queue_size = 5
+```
+
+`max_queue_wait` is the maximum amount of time a request can wait in the
+concurrency queue. When a request waits longer than this, it returns to the
+client with an error.
+
+`max_queue_size` is the maximum size the concurrency queue can grow to for a
+given repository/RPC. If a concurrency queue is at its maximum, subsequent
+requests are rejected with an error.
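A minimal Go sketch of a queue bounded by both values follows. It models a single repository/RPC pair and assumes `max_queue_size` counts only requests that are waiting for a slot; `boundedLimiter`, `errMaxQueueSize`, and `errMaxQueueTime` are hypothetical names, and the error strings are illustrative rather than Gitaly's actual messages.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Hypothetical sentinel errors for the two ways a queued request is dropped.
var (
	errMaxQueueSize = errors.New("concurrency queue is full")
	errMaxQueueTime = errors.New("waited too long in concurrency queue")
)

// boundedLimiter is a hypothetical limiter for a single repository/RPC pair.
// It allows maxPerRepo requests in flight, lets at most maxQueueSize requests
// wait for a slot, and drops a waiting request after maxQueueWait.
// Waiters are not guaranteed to be served in FIFO order.
type boundedLimiter struct {
	slots        chan struct{}
	mu           sync.Mutex
	queued       int
	maxQueueSize int
	maxQueueWait time.Duration
}

func newBoundedLimiter(maxPerRepo, maxQueueSize int, maxQueueWait time.Duration) *boundedLimiter {
	return &boundedLimiter{
		slots:        make(chan struct{}, maxPerRepo),
		maxQueueSize: maxQueueSize,
		maxQueueWait: maxQueueWait,
	}
}

// acquire returns a release func once a slot is free, or an error if the
// queue is already full or the wait exceeds maxQueueWait.
func (l *boundedLimiter) acquire() (func(), error) {
	release := func() { <-l.slots }

	// Fast path: a slot is free, so the request never enters the queue.
	select {
	case l.slots <- struct{}{}:
		return release, nil
	default:
	}

	// Join the queue unless it is already at max_queue_size.
	l.mu.Lock()
	if l.queued >= l.maxQueueSize {
		l.mu.Unlock()
		return nil, errMaxQueueSize
	}
	l.queued++
	l.mu.Unlock()
	defer func() {
		l.mu.Lock()
		l.queued--
		l.mu.Unlock()
	}()

	// Wait for a slot, but no longer than max_queue_wait.
	timer := time.NewTimer(l.maxQueueWait)
	defer timer.Stop()
	select {
	case l.slots <- struct{}{}:
		return release, nil
	case <-timer.C:
		return nil, errMaxQueueTime
	}
}

func main() {
	// max_per_repo = 1, max_queue_size = 0: a second request is rejected at once.
	full := newBoundedLimiter(1, 0, time.Minute)
	release, _ := full.acquire()
	_, err := full.acquire()
	fmt.Println(err) // concurrency queue is full
	release()

	// max_per_repo = 1, max_queue_size = 5, max_queue_wait = 50ms: a second
	// request waits in the queue and then gives up.
	slow := newBoundedLimiter(1, 5, 50*time.Millisecond)
	release, _ = slow.acquire()
	_, err = slow.acquire()
	fmt.Println(err) // waited too long in concurrency queue
	release()
}
```

Both rejections return to the caller right away instead of keeping a request in memory indefinitely, which is what bounds the queue.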
+
+### Rate Limiting
+
+Another way to allow Gitaly to put backpressure on its clients is through rate
+limiting. Admins can set a rate limit per repository/RPC:
+
+```toml
+[[rate_limiting]]
+rpc = "/gitaly.RepositoryService/RepackFull"
+interval = "1m"
+burst = 1
+```
+
+The rate limiter is implemented using the concept of a `token bucket`. A
+`token bucket` has a capacity of `burst` and is refilled at an interval of
+`interval`. Each request that comes into Gitaly takes one token from the
+`token bucket`. When the `token bucket` is empty, no more requests are accepted
+for that repository/RPC until the `token bucket` is refilled. There is one
+`token bucket` per repository/RPC.
+
+In the above configuration, the `token bucket` has a capacity of 1 and gets
+refilled every minute. This means that Gitaly will only accept 1 `RepackFull`
+request per repository each minute.
+
+Requests that come in after the `token bucket` is empty, and before it is
+replenished, are rejected with an error.
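The token bucket behavior can be sketched in Go as follows. The sketch assumes the bucket regains one token per `interval` (for `burst = 1` this is the same as a full refill every minute) and passes the current time explicitly so the example is deterministic. `tokenBucket`, `rateLimiter`, and `at` are hypothetical names, not Gitaly's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a hypothetical token bucket. It holds at most burst tokens
// and, as an assumption of this sketch, regains one token per interval.
type tokenBucket struct {
	burst    int
	interval time.Duration
	tokens   int
	last     time.Time // time up to which refills have been credited
}

// allow reports whether a request arriving at now may proceed, taking a
// token if it may.
func (b *tokenBucket) allow(now time.Time) bool {
	if n := int(now.Sub(b.last) / b.interval); n > 0 {
		b.tokens += n
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = b.last.Add(time.Duration(n) * b.interval)
	}
	if b.tokens == 0 {
		return false // bucket is empty: reject until it is replenished
	}
	b.tokens--
	return true
}

// rateLimiter keeps one token bucket per repository/RPC pair. It is not safe
// for concurrent use; a real limiter would need a mutex.
type rateLimiter struct {
	burst    int
	interval time.Duration
	buckets  map[string]*tokenBucket
}

func newRateLimiter(burst int, interval time.Duration) *rateLimiter {
	return &rateLimiter{burst: burst, interval: interval, buckets: make(map[string]*tokenBucket)}
}

func (r *rateLimiter) allow(repo, rpc string, now time.Time) bool {
	key := repo + "\x00" + rpc
	b, ok := r.buckets[key]
	if !ok {
		// A new bucket starts full.
		b = &tokenBucket{burst: r.burst, interval: r.interval, tokens: r.burst, last: now}
		r.buckets[key] = b
	}
	return b.allow(now)
}

// at returns a fixed point in time, sec seconds after the Unix epoch.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// demo mirrors the config above: burst = 1, interval = "1m".
func demo() []bool {
	rl := newRateLimiter(1, time.Minute)
	const rpc = "/gitaly.RepositoryService/RepackFull"
	return []bool{
		rl.allow("A", rpc, at(0)),  // first RepackFull for "A" is accepted
		rl.allow("A", rpc, at(30)), // bucket for "A" is empty: rejected
		rl.allow("B", rpc, at(30)), // "B" has its own bucket: accepted
		rl.allow("A", rpc, at(60)), // one minute later "A" is refilled: accepted
	}
}

func main() {
	fmt.Println(demo()) // [true false true true]
}
```

Raising `burst` lets a repository absorb a short spike of requests, while `interval` controls the sustained rate.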
+
+## Errors
+
+With both concurrency limiting and rate limiting, Gitaly responds with a
+structured gRPC error of the type `gitalypb.LimitError`, with a `Message` field
+that describes the error and a `BackoffDuration` field that tells the client
+how long to wait before it is safe to retry. A `BackoffDuration` of 0 means the
+client should not retry.
+
+Gitaly clients (gitlab-shell, Workhorse, Rails) all need to parse this error and
+return sensible error messages to the end user, whether that is something
+cloning via HTTP or SSH, the GitLab application, or something calling
+the API.
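On the client side, honoring the backoff can be sketched in Go as below. `limitError` is a hypothetical stand-in that models only the `Message` and `BackoffDuration` fields described above; the real `gitalypb.LimitError` is a generated protobuf message carried in the gRPC error, so extracting it looks different in practice. `retryAfter`, `callWithRetry`, and `demo` are likewise illustrative names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// limitError is a hypothetical stand-in for gitalypb.LimitError that models
// only the two fields described above.
type limitError struct {
	Message         string
	BackoffDuration time.Duration
}

func (e *limitError) Error() string { return e.Message }

// retryAfter reports how long to wait before retrying and whether a retry is
// allowed at all. A zero BackoffDuration means the client should not retry.
func retryAfter(err error) (time.Duration, bool) {
	var le *limitError
	if !errors.As(err, &le) || le.BackoffDuration == 0 {
		return 0, false
	}
	return le.BackoffDuration, true
}

// callWithRetry invokes call (maxAttempts must be at least 1) and retries
// after the server-provided backoff until it succeeds, a non-retryable error
// occurs, or maxAttempts is reached. sleep is injected so the example does
// not actually wait.
func callWithRetry(call func() error, maxAttempts int, sleep func(time.Duration)) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = call(); err == nil {
			return nil
		}
		delay, ok := retryAfter(err)
		if !ok || attempt == maxAttempts {
			break
		}
		sleep(delay)
	}
	return err
}

// demo simulates a call that is rate limited once and then succeeds.
func demo() (calls int, slept []time.Duration, err error) {
	call := func() error {
		calls++
		if calls == 1 {
			return &limitError{Message: "rate limit exceeded", BackoffDuration: time.Second}
		}
		return nil
	}
	err = callWithRetry(call, 3, func(d time.Duration) { slept = append(slept, d) })
	return calls, slept, err
}

func main() {
	calls, slept, err := demo()
	fmt.Println(calls, slept, err) // 2 [1s] <nil>
}
```

Capping the number of attempts keeps a client from hammering Gitaly indefinitely even when every response asks it to retry.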
+
+## Metrics
+
+There are metrics that provide visibility into how these limits are being
+applied.
+
+**gitaly_requests_dropped_total** - Total number of requests dropped by Gitaly
+due to request limiting. **reason** is a label that indicates why a request was
+dropped:
+
+- **rate** indicates the request was dropped because the rate exceeded the
+  configured limit.
+- **max_size** indicates the request was dropped because the concurrency
+  queue's size was at the configured maximum.
+- **max_time** indicates the request was dropped because the wait time
+  exceeded the configured maximum.
+
+**gitaly_concurrency_limiting_acquiring_seconds** - How long a request has to
+wait due to concurrency limits before being processed.
+
+**gitaly_concurrency_limiting_in_progress** - How many concurrent requests are
+being processed currently.
+
+**gitaly_concurrency_limiting** - How large the concurrency queue is.
+