| author | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-02-03 10:45:03 +0300 |
|---|---|---|
| committer | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-02-19 08:55:57 +0300 |
| commit | fc93593cb6b417e3fda6cf1fa9e34c85ff464c95 (patch) | |
| tree | 728e224f1225b18041b6526f91a12d15ea3e35dc /internal/praefect/replicator.go | |
| parent | e17cb5f018a693a46b71fab5ef1d2459a4df19b0 (diff) | |
praefect: Stop creating replication jobs on cleanup
The `RepositoryService.Cleanup` RPC performs a set of housekeeping tasks
for git repositories. Most importantly, it will remove stale lockfiles,
broken objects and broken references. One can thus consider this RPC to
be in the same class as `RepackFull` and `RepackIncremental` in that
they're doing essentially idempotent optimizations on a repository.
But while these RPCs get special handling in Praefect's coordinator,
`Cleanup` doesn't.
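The classification described above can be sketched as a lookup on the RPC's full method name. This is a hypothetical illustration of the idea, not Praefect's actual coordinator code; only the gRPC method names are taken from the Gitaly proto definitions.

```go
package main

import "fmt"

// maintenanceRPCs lists housekeeping RPCs that perform idempotent
// optimizations and thus deserve special routing in the coordinator.
var maintenanceRPCs = map[string]bool{
	"/gitaly.RepositoryService/RepackFull":        true,
	"/gitaly.RepositoryService/RepackIncremental": true,
	"/gitaly.RepositoryService/Cleanup":           true,
}

// isMaintenance reports whether an RPC is an idempotent housekeeping
// operation that should not bump the repository generation.
func isMaintenance(fullMethod string) bool {
	return maintenanceRPCs[fullMethod]
}

func main() {
	fmt.Println(isMaintenance("/gitaly.RepositoryService/Cleanup"))  // true
	fmt.Println(isMaintenance("/gitaly.RepositoryService/WriteRef")) // false
}
```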
While the lack of special handling for this RPC doesn't impact
correctness, `Cleanup` is a mutator and thus bumps the repository
generation whenever it gets executed. This is essentially useless: in
non-corrupted repositories, no user-visible change occurs. The biggest
downside of the generation bump is that many mutating RPCs trigger a
`Cleanup()` immediately before or after the RPC. As a result, even when
transactions are used, the `Cleanup()` forces us to generate a
replication job, rendering transactions a lot less useful.
Fix this issue by implementing special routing for those cleanup RPCs,
similar to how repacks and garbage collection are handled, allowing us
to get rid of the generation bumping. As a result, we do not schedule
replication jobs anymore.
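The behavior change can be summarized as follows; this is a hypothetical sketch, and the type and function names are illustrative rather than Praefect's real datastore API.

```go
package main

import "fmt"

// decision models the two datastore side effects a mutator RPC can have
// in Praefect. Purely illustrative; not Praefect's actual types.
type decision struct {
	bumpGeneration bool // record a new repository generation
	scheduleRepl   bool // enqueue replication jobs for out-of-date nodes
}

// routeMutator returns the side effects for a mutator RPC. With this
// commit, housekeeping RPCs such as Cleanup no longer bump the
// generation, so no replication job gets scheduled for them.
func routeMutator(isHousekeeping bool) decision {
	if isHousekeeping {
		return decision{bumpGeneration: false, scheduleRepl: false}
	}
	return decision{bumpGeneration: true, scheduleRepl: true}
}

func main() {
	cleanup := routeMutator(true)
	write := routeMutator(false)
	fmt.Println(cleanup.bumpGeneration, cleanup.scheduleRepl) // false false
	fmt.Println(write.bumpGeneration, write.scheduleRepl)     // true true
}
```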
Note that originally, there had been two RPCs which got executed after
another mutator: `Cleanup()` and `ApplyGitattributes()`, which in
combination caused us to bump the generation twice. It turns out we were
implicitly relying on this generation bump for pushes, though: while a
push is being served, read requests inspecting the push must be routed
to the primary node. Otherwise, checks like `/internal/allowed` break
because secondaries have a different quarantine directory for received
objects than the primary. We never hit this issue because the generation
bump would've caused us to always route to the primary. But with both
RPCs being fixed via this commit and 77c0166c3 (repository: Use
transactions when writing gitattributes, 2021-01-28), all replicas are
now considered up to date during the whole push, which in combination
with reads distribution causes failing access checks.
This bug has been fixed by implementing force-routing to primaries in
1102b0b67 (praefect: Implement force-routing to primary for node-manager
router, 2021-02-03) and 4d877d7d5 (praefect: Implement force-routing to
primary for per-repo router, 2021-02-03). But given that it also
requires the GitLab Rails upstream change edab619aa1f (gitaly: Fix
access checks with transactions and quarantine environments, 2021-02-05)
which has only been merged with v13.9, we have to wait until at least
v13.10 to land this fix.
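The force-routing idea referenced above can be sketched like this. It is only an illustration of the concept, assuming a simplified node list; the real routers live in Praefect's router implementations and have entirely different signatures.

```go
package main

import "fmt"

// node is a simplified stand-in for a Gitaly storage node as seen by
// Praefect's router. Hypothetical type for illustration only.
type node struct {
	name    string
	primary bool
}

// routeRead picks a node to serve a read. forcePrimary models the
// behavior added in 1102b0b67 and 4d877d7d5: when a read must observe
// state that only exists on the primary (e.g. a quarantine directory
// created during a push), reads distribution is bypassed.
func routeRead(nodes []node, forcePrimary bool) node {
	if forcePrimary {
		for _, n := range nodes {
			if n.primary {
				return n
			}
		}
	}
	// Otherwise any up-to-date node may serve the read; pick the
	// first one for simplicity.
	return nodes[0]
}

func main() {
	nodes := []node{{"gitaly-2", false}, {"gitaly-1", true}}
	fmt.Println(routeRead(nodes, true).name)  // gitaly-1
	fmt.Println(routeRead(nodes, false).name) // gitaly-2
}
```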
Diffstat (limited to 'internal/praefect/replicator.go')
-rw-r--r-- | internal/praefect/replicator.go | 19 |
1 file changed, 19 insertions(+), 0 deletions(-)
```diff
diff --git a/internal/praefect/replicator.go b/internal/praefect/replicator.go
index 240c59ae3..8e65d6757 100644
--- a/internal/praefect/replicator.go
+++ b/internal/praefect/replicator.go
@@ -34,6 +34,8 @@ type Replicator interface {
 	RepackFull(ctx context.Context, event datastore.ReplicationEvent, target *grpc.ClientConn) error
 	// RepackIncremental will do an incremental repack on the target repository
 	RepackIncremental(ctx context.Context, event datastore.ReplicationEvent, target *grpc.ClientConn) error
+	// Cleanup will do a cleanup on the target repository
+	Cleanup(ctx context.Context, event datastore.ReplicationEvent, target *grpc.ClientConn) error
 }
 
 type defaultReplicator struct {
@@ -256,6 +258,21 @@ func (dr defaultReplicator) RepackIncremental(ctx context.Context, event datasto
 	return err
 }
 
+func (dr defaultReplicator) Cleanup(ctx context.Context, event datastore.ReplicationEvent, targetCC *grpc.ClientConn) error {
+	targetRepo := &gitalypb.Repository{
+		StorageName:  event.Job.TargetNodeStorage,
+		RelativePath: event.Job.RelativePath,
+	}
+
+	repoSvcClient := gitalypb.NewRepositoryServiceClient(targetCC)
+
+	_, err := repoSvcClient.Cleanup(ctx, &gitalypb.CleanupRequest{
+		Repository: targetRepo,
+	})
+
+	return err
+}
+
 func (dr defaultReplicator) RepackFull(ctx context.Context, event datastore.ReplicationEvent, targetCC *grpc.ClientConn) error {
 	targetRepo := &gitalypb.Repository{
 		StorageName:  event.Job.TargetNodeStorage,
@@ -624,6 +641,8 @@ func (r ReplMgr) processReplicationEvent(ctx context.Context, event datastore.Re
 		err = r.replicator.RepackFull(ctx, event, targetCC)
 	case datastore.RepackIncremental:
 		err = r.replicator.RepackIncremental(ctx, event, targetCC)
+	case datastore.Cleanup:
+		err = r.replicator.Cleanup(ctx, event, targetCC)
 	default:
 		err = fmt.Errorf("unknown replication change type encountered: %q", event.Job.Change)
 	}
```