limithandler: Fix queueing mechanism in the concurrency limiter

The concurrency limiter has two different mechanisms: 1. Callers first get put into a queue that is global across all limiting keys. This queue is depth- and time-limited, which means that callers will get rejected if it is full and will be evicted when they took too long to acquire the concurrency token. 2. The concurrency limit itself that will only allow a set number of callers at the same time. This functionality is per limiting key. While the intent is usually to concurrency-limit either by user or by repository, this is sabotaged by the fact that the queue is key-agnostic and thus global. So even if almost all calls would apply to a single repository only, if the queue is enabled and full then it would become impossible to invoke the function for any other repository now. As a result the queue is rather useless as a concurrency-limiting primitive. But there is another big design flaw: as callers need to be in the queue to obtain the concurrency-limiting token, and the number of callers in the queue is strictly limited, we essentially have a global concurrency limit to obtain the concurrency-limiting tokens. Suppose you have two calls to a function that has a maximum queue depth of 1 and two calls to this function at the same time. Even if the concurrency limit would now theoretically allow for both functions to run at the same time, there is a race window where both callers might try to enter the queue at the same point in time. If this race is lost, then one of both callers will be rejected due to the queue being full while the other one is trying to obtain the concurrency token. This issue in fact surfaces in our tests, where the `TestStreamLimitHandler` test is frequently failing because the race is often lost. This second design flaw cannot easily be fixed while the queue remains global: we need to remain in the queue when trying to acquire the concurrency token, or otherwise it wouldn't really be limiting anything at all. Convert the queue to be per-key so that each resource identified by a key essentially has its own queue. This fixes both issues: - If the queue is full then we only limit access to the specific resource for which it is full. This makes the queueing mechanism useful again. - We can now change the queueing logic to allow as many callers into the queue as the concurrency limit would allow _plus_ the allowed depth of the queue itself. This fixes the race described aboved. Changelog: fixed
author: Patrick Steinhardt <psteinhardt@gitlab.com> 2023-01-05 13:29:47 +0300
committer: Patrick Steinhardt <psteinhardt@gitlab.com> 2023-01-09 11:51:48 +0300
commit: ee7c6acd586d74548138f5b7c8e89328621d2c71 (patch)
tree: 81b61a0e054f39cd07487696289f3ae792104f02 /doc
parent: 22783620ad85d2aa4b5e6f628231d9bd8ca341d8 (diff)
1 files changed, 4 insertions, 4 deletions
diff --git a/doc/backpressure.md b/doc/backpressure.md
index 71ae4fef0..f5110f8d8 100644
--- a/doc/backpressure.md
+++ b/doc/backpressure.md
@@ -38,9 +38,9 @@ configuration can prevent an unbounded in-memory queue of requests:
 - `max_queue_wait` is the maximum amount of time a request can wait in the
   concurrency queue. When a request waits longer than this time, it returns
   an error to the client.
-- `max_queue_size` is the maximum size the concurrency queue can grow for a given
-  RPC. If a concurrency queue is at its maximum, subsequent requests
-  return with an error.
+- `max_queue_size` is the maximum size the concurrency queue can grow for a
+  given RPC. If a concurrency queue is at its maximum, subsequent requests
+  return with an error. The queue size is per repository.
 
 For example:
 
@@ -75,7 +75,7 @@ In the above configuration, the `token bucket` has a capacity of 1 and gets
 refilled every minute. This means that Gitaly only accepts 1 `RepackFull`
 request per repository each minute.
 
-Requests that come in after the `token bucket` is full (and before it is 
+Requests that come in after the `token bucket` is full (and before it is
 replenished) are rejected with an error.
 
 ## Errors
author	Patrick Steinhardt <psteinhardt@gitlab.com>	2023-01-05 13:29:47 +0300
committer	Patrick Steinhardt <psteinhardt@gitlab.com>	2023-01-09 11:51:48 +0300
commit	ee7c6acd586d74548138f5b7c8e89328621d2c71 (patch)
tree	81b61a0e054f39cd07487696289f3ae792104f02 /doc
parent	22783620ad85d2aa4b5e6f628231d9bd8ca341d8 (diff)