| author | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-02-03 10:45:03 +0300 |
|---|---|---|
| committer | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-02-19 08:55:57 +0300 |
| commit | fc93593cb6b417e3fda6cf1fa9e34c85ff464c95 (patch) | |
| tree | 728e224f1225b18041b6526f91a12d15ea3e35dc /internal/praefect/replicator.go | |
| parent | e17cb5f018a693a46b71fab5ef1d2459a4df19b0 (diff) | |
praefect: Stop creating replication jobs on cleanup
The `RepositoryService.Cleanup` RPC performs a set of housekeeping tasks
for git repositories. Most importantly, it will remove stale lockfiles,
broken objects and broken references. One can thus consider this RPC to
be in the same class as `RepackFull` and `RepackIncremental` in that
they're doing essentially idempotent optimizations on a repository.
But while these RPCs get special handling in Praefect's coordinator,
`Cleanup` doesn't.
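The classification described above can be sketched as a lookup on the RPC's full method name. This is a hypothetical illustration of the idea, not Praefect's actual coordinator code; only the gRPC method names are taken from the Gitaly proto definitions.

```go
package main

import "fmt"

// maintenanceRPCs lists housekeeping RPCs that perform idempotent
// optimizations and thus deserve special routing in the coordinator.
var maintenanceRPCs = map[string]bool{
	"/gitaly.RepositoryService/RepackFull":        true,
	"/gitaly.RepositoryService/RepackIncremental": true,
	"/gitaly.RepositoryService/Cleanup":           true,
}

// isMaintenance reports whether an RPC is an idempotent housekeeping
// operation that should not bump the repository generation.
func isMaintenance(fullMethod string) bool {
	return maintenanceRPCs[fullMethod]
}

func main() {
	fmt.Println(isMaintenance("/gitaly.RepositoryService/Cleanup"))  // true
	fmt.Println(isMaintenance("/gitaly.RepositoryService/WriteRef")) // false
}
```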
While the lack of special handling for this RPC doesn't impact
correctness, `Cleanup` is a mutator and thus bumps the repository
generation whenever it gets executed. This is essentially useless: in
non-corrupted repositories, no user-visible change occurs. The biggest
downside of the generation bump is that many mutating RPCs trigger a
`Cleanup()` immediately before or after the RPC. As a result, even when
transactions are used, the `Cleanup()` forces us to generate a
replication job, rendering transactions a lot less useful.
Fix this issue by implementing special routing for those cleanup RPCs,
similar to how repacks and garbage collection are handled, allowing us
to get rid of the generation bumping. As a result, we do not schedule
replication jobs anymore.
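The behavior change can be summarized as follows; this is a hypothetical sketch, and the type and function names are illustrative rather than Praefect's real datastore API.

```go
package main

import "fmt"

// decision models the two datastore side effects a mutator RPC can have
// in Praefect. Purely illustrative; not Praefect's actual types.
type decision struct {
	bumpGeneration bool // record a new repository generation
	scheduleRepl   bool // enqueue replication jobs for out-of-date nodes
}

// routeMutator returns the side effects for a mutator RPC. With this
// commit, housekeeping RPCs such as Cleanup no longer bump the
// generation, so no replication job gets scheduled for them.
func routeMutator(isHousekeeping bool) decision {
	if isHousekeeping {
		return decision{bumpGeneration: false, scheduleRepl: false}
	}
	return decision{bumpGeneration: true, scheduleRepl: true}
}

func main() {
	cleanup := routeMutator(true)
	write := routeMutator(false)
	fmt.Println(cleanup.bumpGeneration, cleanup.scheduleRepl) // false false
	fmt.Println(write.bumpGeneration, write.scheduleRepl)     // true true
}
```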
Note that originally, there had been two RPCs which got executed after
another mutator: `Cleanup()` and `ApplyGitattributes()`, which in
combination caused us to bump the generation twice. It turns out we were
implicitly relying on this generation bump for pushes, though: while a
push is being served, read requests inspecting the push must be routed
to the primary node. Otherwise, checks like `/internal/allowed` break
because secondaries have a different quarantine directory for received
objects than the primary. We never hit this issue because the generation
bump would've caused us to always route to the primary. But with both
RPCs being fixed via this commit and 77c0166c3 (repository: Use
transactions when writing gitattributes, 2021-01-28), all replicas are
now considered up to date during the whole push, which in combination
with reads distribution causes failing access checks.
This bug has been fixed by implementing force-routing to primaries in
1102b0b67 (praefect: Implement force-routing to primary for node-manager
router, 2021-02-03) and 4d877d7d5 (praefect: Implement force-routing to
primary for per-repo router, 2021-02-03). But given that it also
requires the GitLab Rails upstream change edab619aa1f (gitaly: Fix
access checks with transactions and quarantine environments, 2021-02-05)
which has only been merged with v13.9, we have to wait until at least
v13.10 to land this fix.
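The force-routing idea referenced above can be sketched like this. It is only an illustration of the concept, assuming a simplified node list; the real routers live in Praefect's router implementations and have entirely different signatures.

```go
package main

import "fmt"

// node is a simplified stand-in for a Gitaly storage node as seen by
// Praefect's router. Hypothetical type for illustration only.
type node struct {
	name    string
	primary bool
}

// routeRead picks a node to serve a read. forcePrimary models the
// behavior added in 1102b0b67 and 4d877d7d5: when a read must observe
// state that only exists on the primary (e.g. a quarantine directory
// created during a push), reads distribution is bypassed.
func routeRead(nodes []node, forcePrimary bool) node {
	if forcePrimary {
		for _, n := range nodes {
			if n.primary {
				return n
			}
		}
	}
	// Otherwise any up-to-date node may serve the read; pick the
	// first one for simplicity.
	return nodes[0]
}

func main() {
	nodes := []node{{"gitaly-2", false}, {"gitaly-1", true}}
	fmt.Println(routeRead(nodes, true).name)  // gitaly-1
	fmt.Println(routeRead(nodes, false).name) // gitaly-2
}
```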
Diffstat (limited to 'internal/praefect/replicator.go')
-rw-r--r-- | internal/praefect/replicator.go | 19 |
1 file changed, 19 insertions(+), 0 deletions(-)
```diff
diff --git a/internal/praefect/replicator.go b/internal/praefect/replicator.go
index 240c59ae3..8e65d6757 100644
--- a/internal/praefect/replicator.go
+++ b/internal/praefect/replicator.go
@@ -34,6 +34,8 @@ type Replicator interface {
 	RepackFull(ctx context.Context, event datastore.ReplicationEvent, target *grpc.ClientConn) error
 	// RepackIncremental will do an incremental repack on the target repository
 	RepackIncremental(ctx context.Context, event datastore.ReplicationEvent, target *grpc.ClientConn) error
+	// Cleanup will do a cleanup on the target repository
+	Cleanup(ctx context.Context, event datastore.ReplicationEvent, target *grpc.ClientConn) error
 }
 
 type defaultReplicator struct {
@@ -256,6 +258,21 @@ func (dr defaultReplicator) RepackIncremental(ctx context.Context, event datasto
 	return err
 }
 
+func (dr defaultReplicator) Cleanup(ctx context.Context, event datastore.ReplicationEvent, targetCC *grpc.ClientConn) error {
+	targetRepo := &gitalypb.Repository{
+		StorageName:  event.Job.TargetNodeStorage,
+		RelativePath: event.Job.RelativePath,
+	}
+
+	repoSvcClient := gitalypb.NewRepositoryServiceClient(targetCC)
+
+	_, err := repoSvcClient.Cleanup(ctx, &gitalypb.CleanupRequest{
+		Repository: targetRepo,
+	})
+
+	return err
+}
+
 func (dr defaultReplicator) RepackFull(ctx context.Context, event datastore.ReplicationEvent, targetCC *grpc.ClientConn) error {
 	targetRepo := &gitalypb.Repository{
 		StorageName:  event.Job.TargetNodeStorage,
@@ -624,6 +641,8 @@ func (r ReplMgr) processReplicationEvent(ctx context.Context, event datastore.Re
 		err = r.replicator.RepackFull(ctx, event, targetCC)
 	case datastore.RepackIncremental:
 		err = r.replicator.RepackIncremental(ctx, event, targetCC)
+	case datastore.Cleanup:
+		err = r.replicator.Cleanup(ctx, event, targetCC)
 	default:
 		err = fmt.Errorf("unknown replication change type encountered: %q", event.Job.Change)
 	}
```