Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorWill Chandler <wchandler@gitlab.com>2022-08-01 17:35:30 +0300
committerWill Chandler <wchandler@gitlab.com>2022-08-02 20:03:14 +0300
commit6a02746cd6edc7d3e87e8c340ad271e8859b4956 (patch)
tree119f9bc72f77f8c30a4bba26b44c6c1ce337ebe0
parent2667f87af97aaf7aaa5c54bbc7d91472b26dfc33 (diff)
server: Give clients grace-period for keepaliveswc-reduce-min-keepalive
Currently gRPC clients are configured to send keepalives every 20 seconds, and the servers enforce that as a minimum interval. Requiring precisely the same interval has caused a number of problems over time. gGRPC-Core has an issue where keepalives may be sent a few milliseconds ahead of schedule[0], breaking keepalives from Puma and Sidekiq. With the introduction of `SSHUploadPackWithSidechannel` in v15.0, the gRPC connection for Gitlab-Shell and Gitlab-Workhorse is now idle during lengthy clones, triggering keepalives in this scenario. Previously the connection was always busy and keepalives were rare. After upgrading to v15.0 a large customer reported that they have started to receive intermittent ENHANCE_YOUR_CALM errors on SSH clones, which indicates that keepalives are being sent to rapidly from the client. Examining packet captures from their hosts, we found that Gitlab-Shell was sending keepalive packets roughly 20,001ms apart, which is correct. However, the Gitaly server was intermittently receiving these at an interval of 19,998ms, triggering the error. This appears to be due to the variable connection latency, where typically there's 50ms of latency between the hosts, but if there's slightly less latency then the norm the keepalive arrives early from the server's perspective. To resolve this, let's give the clients a grace-period of ten seconds by reducing the minimum keepalive interval to 10 seconds. This way a keepalive that arrives slightly ahead of schedule is not a critical error. Given that a Gitaly server will typically have hundreds of concurrent clients, perhaps thousands on a very busy host, the volume of keepalives should not add significant load and we can afford to be forgiving. [0] https://github.com/grpc/grpc/issues/28830 Changelog: fixed
-rw-r--r--internal/gitaly/server/server.go6
-rw-r--r--internal/praefect/server.go6
2 files changed, 10 insertions, 2 deletions
diff --git a/internal/gitaly/server/server.go b/internal/gitaly/server/server.go
index 2aad1ad74..e62b4b99c 100644
--- a/internal/gitaly/server/server.go
+++ b/internal/gitaly/server/server.go
@@ -156,8 +156,12 @@ func New(
grpc.Creds(lm),
grpc.StreamInterceptor(grpcmw.ChainStreamServer(streamServerInterceptors...)),
grpc.UnaryInterceptor(grpcmw.ChainUnaryServer(unaryServerInterceptors...)),
+ // We deliberately set the server MinTime to significantly less than the client interval of 20
+ // seconds to allow for network jitter. We can afford to be forgiving as the maximum number of
+ // concurrent clients for a Gitaly server is typically in the hundreds and this volume of
+ // keepalives won't add significant load.
grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
- MinTime: 20 * time.Second,
+ MinTime: 10 * time.Second,
PermitWithoutStream: true,
}),
grpc.KeepaliveParams(keepalive.ServerParameters{
diff --git a/internal/praefect/server.go b/internal/praefect/server.go
index f2bc6a380..3192ddfdd 100644
--- a/internal/praefect/server.go
+++ b/internal/praefect/server.go
@@ -128,8 +128,12 @@ func NewGRPCServer(
auth.UnaryServerInterceptor(conf.Auth),
)...,
)),
+ // We deliberately set the server MinTime to significantly less than the client interval of 20
+ // seconds to allow for network jitter. We can afford to be forgiving as the maximum number of
+ // concurrent clients for a Gitaly server is typically in the hundreds and this volume of
+ // keepalives won't add significant load.
grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
- MinTime: 20 * time.Second,
+ MinTime: 10 * time.Second,
PermitWithoutStream: true,
}),
grpc.KeepaliveParams(keepalive.ServerParameters{