# Request limiting in Gitaly

In the GitLab ecosystem, Gitaly is the service at the bottom of the stack for Git data access. This means that when there is a surge of requests to retrieve or change a piece of Git data, the I/O happens in Gitaly. Because all Git access goes through Gitaly, such a surge can overwhelm it through system resource exhaustion.

If there is a surge of traffic beyond what Gitaly can handle, Gitaly should be able to push back on the calling client rather than subserviently agree to process more than it can handle. Gitaly provides several knobs that limit different kinds of traffic patterns.

## Concurrency queue

Limit the number of concurrent RPCs in flight on each Gitaly node for each repository, per RPC, using the `[[concurrency]]` configuration:

```toml
[[concurrency]]
rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
max_per_repo = 1
```

For example:

- One clone request comes in for repository "A" (a largish repository).
- While this RPC is executing, another request comes in for repository "A". Because `max_per_repo` is 1 in this case, the second request blocks until the first request is finished.

An in-memory queue of requests waiting their turn can build up in Gitaly. Because this is a potential vector for a memory leak, two other values in the `[[concurrency]]` configuration keep the in-memory queue bounded:

- `max_queue_wait` is the maximum amount of time a request can wait in the concurrency queue. When a request waits longer than this time, it returns an error to the client.
- `max_queue_size` is the maximum size the concurrency queue can grow to for a given RPC. If a concurrency queue is at its maximum, subsequent requests return with an error. The queue size is per repository.
For example:

```toml
[[concurrency]]
rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
max_per_repo = 1
max_queue_wait = "1m"
max_queue_size = 5
```

## Rate limiting

To allow Gitaly to put back pressure on its clients, administrators can set a rate limit per repository for each RPC:

```toml
[[rate_limiting]]
rpc = "/gitaly.RepositoryService/OptimizeRepository"
interval = "1m"
burst = 1
```

The rate limiter is implemented using the concept of a `token bucket`. A `token bucket` has capacity `burst` and is refilled every `interval`. When a request comes into Gitaly, a token is retrieved from the `token bucket` for that request. When the `token bucket` is empty, no more requests for that RPC are accepted for that repository until the `token bucket` is refilled. There is a `token bucket` per RPC for each repository.

In the above configuration, the `token bucket` has a capacity of 1 and is refilled every minute. This means that Gitaly accepts only one `OptimizeRepository` request per repository each minute. Requests that arrive while the `token bucket` is empty (before it is replenished) are rejected with an error.

## Errors

With concurrency limiting and rate limiting, Gitaly responds with a structured gRPC `gitalypb.LimitError` error with:

- A `Message` field that describes the error.
- A `BackoffDuration` field that provides the client with a time when it is safe to retry. If 0, it means the client should never retry.

Gitaly clients (`gitlab-shell`, `workhorse`, Rails) must parse this error and return sensible error messages to the end user. For example:

- Something trying to clone using HTTP or SSH.
- The GitLab application.
- Something calling the API.

## Metrics

Metrics are available that provide visibility into how these limits are being applied. See the [GitLab Documentation](https://docs.gitlab.com/ee/administration/gitaly/#monitor-gitaly-and-gitaly-cluster) for details.