diff options
author | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-01-18 12:54:14 +0300 |
---|---|---|
committer | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-01-18 13:20:01 +0300 |
commit | d023af5566b798c227114c14f1012a4beb868a9c (patch) | |
tree | a8edc35e20cbbfb77d54f7ad14fe73e0f4620900 | |
parent | cd606a2951187043282701036135cd72193f88d3 (diff) |
repository: Fix ReplicateRepository returning before RPCs have finished
When calling `ReplicateRepository()`, we try to both synchronize the
gitattributes file as well as the actual repository contents. This is
being done in parallel via the `errgroup` package, which allows us to
run multiple functions in parallel and then retrieve the first error
that was returned by any of them.
One feature that is not obvious is that `errgroup.WithContext()` returns
a cancellable context which gets cancelled as soon as any of the invoked
functions returns an error. We're using this feature right now, which
means that if either of the synchronizing functions fails, the other one
will get cancelled.
While it sounds like a logical thing to do, it does have problems in
combination with the gRPC calls. If the context of a gRPC call gets
cancelled, then the call will get aborted. But while there is no
official documentation proving this, it seems like cancellation happens
asynchronously. So while we're cancelling the remote call, we'll return
even though the remote side hasn't yet been cancelled. Which effectively
means that it may still be running on the remote server after
`ReplicateRepository()` has returned on the local client.
There doesn't seem to be any way to synchronize on the remote
cancellation, so what this commit instead does is to simply not pass the
cancellable context of the `errgroup` to those functions. This has the
downside of not cancelling RPCs even if we know that they'll fail, or
that we don't care for their result. But on the other hand, it fixes
above race where we may modify the repository even after the RPC has
returned. This also fixes a test race in "TestReplicate/BadRepository".
-rw-r--r-- | changelogs/unreleased/pks-replicate-repository-cancellation.yml | 5 | ||||
-rw-r--r-- | internal/gitaly/service/repository/replicate.go | 8 |
2 files changed, 12 insertions, 1 deletions
diff --git a/changelogs/unreleased/pks-replicate-repository-cancellation.yml b/changelogs/unreleased/pks-replicate-repository-cancellation.yml new file mode 100644 index 000000000..be729fd24 --- /dev/null +++ b/changelogs/unreleased/pks-replicate-repository-cancellation.yml @@ -0,0 +1,5 @@ +--- +title: 'repository: Fix ReplicateRepository returning before RPCs have finished' +merge_request: 3011 +author: +type: fixed diff --git a/internal/gitaly/service/repository/replicate.go b/internal/gitaly/service/repository/replicate.go index fd0b09de2..20d3ffa74 100644 --- a/internal/gitaly/service/repository/replicate.go +++ b/internal/gitaly/service/repository/replicate.go @@ -47,7 +47,13 @@ func (s *server) ReplicateRepository(ctx context.Context, in *gitalypb.Replicate } } - g, ctx := errgroup.WithContext(ctx) + // We're not using the context of the errgroup here, as an error + // returned by either of the called functions would cancel the + // respective other function. Given that we're doing RPC calls in + // them, cancellation of the calls would mean that the remote side + // may still modify the repository even though the local side has + // returned already. + g, _ := errgroup.WithContext(ctx) outgoingCtx := helper.IncomingToOutgoing(ctx) syncFuncs := []func(context.Context, *gitalypb.ReplicateRepositoryRequest) error{ |