Document Praefect replication of forks

author: Paul Okstad <pokstad@gitlab.com> 2020-12-16 02:31:09 +0300
committer: Paul Okstad <pokstad@gitlab.com> 2020-12-16 02:31:09 +0300
commit: fa977d487d43ce790d667405966d2fe1453afd80 (patch)
tree: 6de982ca59b6200ae3a0cbff61aeac402e5fe97c
parent: e7af6836dbbda1b791d2336f6d8986d4f0aac5a1 (diff)
1 files changed, 65 insertions, 14 deletions
diff --git a/doc/design_ha.md b/doc/design_ha.md
index e3e58855c..ff95c5484 100644
--- a/doc/design_ha.md
+++ b/doc/design_ha.md
@@ -86,24 +86,32 @@ sequenceDiagram
 *Note: the above interaction between the praefect and nodes A-B-C is an all-or-nothing transaction. All nodes must complete in success, otherwise a single node failure will cause the entire transaction to fail. This will be improved when replication is introduced.*
 
 ### 3. Replication
-The next phase is to enable replication of data between nodes. This makes transactions more efficient and fault tolerant. This could be done a few ways:
 
-#### Node Orchestrated [👎]
-Node orchestrated puts the intelligence of replication into one of the nodes being modified:
+Praefect relies on replication when a Gitaly RPC doesn't support transactions or
+a repository replica needs to be repaired.
+
+For transaction mutator RPCs, Praefect attempts to make the same change to a
+quroum of a repository replicas in a single transactional write. If a quorom of replicas
+successfully applies the RPC, then replication will only be scheduled for any
+replicas that were unsuccessful. See the section on [strong consistency
+design](#strong-consistency-design) for more details.
 
 ```mermaid
 sequenceDiagram
 	Praefect->>Node A: Modify repo X
-	activate Node A
-	Node A->>Node B: Modify repo X
-	Node A->>Node C: Modify repo X
-	Node A->>Praefect: Modification successful!
+	Praefect->>Node B: Modify repo X
+	Praefect->>Node C: Modify repo X
+	Node A->>Praefect: Success :-)
+	Node B->>Praefect: Success :-)
+	Node C->>Praefect: FAILURE :'(
+	Praefect->>Node C: Replicate From A
+	Node C->>Praefect: Success!
 ```
 
-Orchestration requires designating a leader node for the transaction. This leader node becomes a critical path for all nodes involved. Ideally, we want several simpler (less riskier) operations that can succeed/fail independently of each other. This way, failure and recovery can be handled externally of the nodes.
-
-#### Praefect Orchestrated [👍]
-With the praefect orchestrating replication, we are isolating the critical path to a stateless service. Stateless services are preferred for the critical path since another praefect can pick up the task after a praefect failure.
+When Praefect proxies a non-transactional mutator RPC, it will first route the
+RPC to the current primary Gitaly for the given repository. Once the RPC
+completes, Praefect will schedule replication of these changes from the primary
+to all secondaries.
 
 ```mermaid
 sequenceDiagram
@@ -115,9 +123,52 @@ sequenceDiagram
 	Node C->>Praefect: Success!
 ```
 
-*Note: Once Node-A propagates changes to a peer, Node-A is no longer the critical path for subsequent propagations. If Node-A fails after a second peer is propagated, that second peer can become the new leader and resume replications.*
-
-##### Replication Logic
+#### Replication Process
+
+The actual replication process is still in active development. At the time of
+this writing, the replication process looks like this:
+
+1. Instruct the target Gitaly to replicate from the source Gitaly
+   1. Does the target repository exist?
+      - Yes: continue
+      - No:
+         1. Snapshot the repository from the source Gitaly
+         1. Extract the snapshot to the target Gitaly
+         1. Fetch changes from the source Gitaly
+         1. Sync misc files (e.g. info attributes)
+1. Does the source repository have an object pool?
+   - No: continue
+   - Yes:
+      1. Get source repository object pool information
+      1. Manipulate object pool to work for target repo
+      1. Link target repo to manipulated object pool
+
+##### Replication Process Concerns
+
+The replication process has been tested in production and works well for small
+repositories. For larger repositories, such as `www-gitlab-com` and
+`gitlab-org/gitlab`, it starts to show signs of stress.
+
+The snapshot process is very resource intensive for fork operations. When
+snapshotting a large repo, you end up with n-1 (n == replica count) copies of
+the repository being compressed and extracted to secondary replicas.
+
+Adding to this stress is the constraint of storage limitations for gitlab.com
+users. The GitLab handbook (`www-gitlab-com`) is now larger than the storage
+quota for free users. Until a secondary replica performs housekeeping, it
+will consume the storage quota of the extracted snapshot. If Praefect instead
+used fast forking (https://gitlab.com/gitlab-org/gitlab/-/issues/24523), this
+would not be an issue since forked copies would only use a small amount of
+additional data.
+
+To complicates matter even more, read distribution can contribute to
+inconsistent behavior when attempting to determine how much storage a user has
+consumed. Since stating a repository's disk space is a read-only operation, it
+is load balanced across all up to date replicas of the repository. If any of
+those replicas still has the duplicated fork data, this will lead to a much
+higher disk usage being reported than a replica that has been deduplicated.
+
+#### Replication Logic
 
 Here are the steps during a Gitaly client GRPC call intercepted by Praefect:
author	Paul Okstad <pokstad@gitlab.com>	2020-12-16 02:31:09 +0300
committer	Paul Okstad <pokstad@gitlab.com>	2020-12-16 02:31:09 +0300
commit	fa977d487d43ce790d667405966d2fe1453afd80 (patch)
tree	6de982ca59b6200ae3a0cbff61aeac402e5fe97c
parent	e7af6836dbbda1b791d2336f6d8986d4f0aac5a1 (diff)