Diffstat (limited to 'doc/architecture/blueprints/pods/pods-feature-data-migration.md')
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-data-migration.md | 50
1 file changed, 49 insertions(+), 1 deletion(-)
diff --git a/doc/architecture/blueprints/pods/pods-feature-data-migration.md b/doc/architecture/blueprints/pods/pods-feature-data-migration.md
index fad6bca45fa..fbe97316dcc 100644
--- a/doc/architecture/blueprints/pods/pods-feature-data-migration.md
+++ b/doc/architecture/blueprints/pods/pods-feature-data-migration.md
@@ -26,6 +26,24 @@ we can document the reasons for not choosing this approach.
It is essential for the Pods architecture to provide a way to migrate data out of big Pods
into smaller ones. This document describes various approaches to providing this type of split.
+We also need to handle cases where data already violates the expected
+isolation constraints of Pods (i.e. references cannot span multiple
+organizations). We know that existing features like linked issues allowed users
+to link issues across any projects regardless of their hierarchy. There are many
+similar features. All of this data will need to be migrated in some way before
+it can be split across different Pods. This may mean some data needs to be
+deleted, or the feature changed and modelled slightly differently, before we can
+properly split or migrate the organizations between Pods.
+
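+As a minimal sketch of the kind of check this implies, the snippet below flags
+cross-organization references using hypothetical in-memory rows standing in for
+`issue_links` and an issue-to-organization mapping; the real check would be a
+query against the production schema:
+
+```python
+# Hypothetical, simplified rows; in production this would be a database query.
+# Each issue is mapped to the organization that ultimately owns its project.
+issue_organization = {1: "org-a", 2: "org-a", 3: "org-b"}
+
+# (source_issue_id, target_issue_id) pairs, as in a table like `issue_links`.
+issue_links = [(1, 2), (1, 3)]
+
+# Links whose two ends live in different organizations violate the Pods
+# isolation constraint and must be deleted or remodelled before a split.
+violations = [
+    link for link in issue_links
+    if issue_organization[link[0]] != issue_organization[link[1]]
+]
+
+print(violations)  # => [(1, 3)]
+```
+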
+Having schema deviations across different Pods, which is a necessary
+consequence of running separate databases, will also impact our ability to
+migrate data between Pods. Different schemas make it harder to reliably
+replicate data across Pods, and especially to validate that the data was
+replicated correctly. This might force us to move data between Pods only when
+their schemas are in sync (slowing down deployments and the rebalancing
+process), or to only migrate from newer to older schemas, which would be
+complex.
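+
+As a rough illustration of that constraint (not a proposed implementation), a
+rebalancing job could refuse to move data unless both Pods report the same set
+of applied migrations, for example the contents of each Pod's
+`schema_migrations` table:
+
+```python
+def schemas_in_sync(source_versions: set[str], target_versions: set[str]) -> bool:
+    # Both Pods must have applied exactly the same migrations before we
+    # attempt to replicate or move rows between them.
+    return source_versions == target_versions
+
+# Hypothetical migration versions reported by each Pod.
+pod_0 = {"20230101000001", "20230105000002"}
+pod_100 = {"20230101000001"}  # this Pod's deployment is lagging behind
+
+if not schemas_in_sync(pod_0, pod_100):
+    raise RuntimeError("Refusing to rebalance: Pod schemas have deviated")
+```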
+
## 1. Definition
## 2. Data flow
@@ -48,13 +66,35 @@ physical replication, etc.
1. The data of Pod 0 is live replicated to as many Pods as it needs to be split into.
1. Once consensus is achieved between Pod 0 and N-Pods, the organizations to be migrated away
   are marked as read-only cluster-wide.
-1. The `routes` is updated on for all organizations to be split to indicate an authorative
+1. The `routes` table is updated for all organizations to be split to indicate an authoritative
Pod holding the most recent data, like `gitlab-org` on `pod-100`.
1. The data for `gitlab-org` on Pod 0 and on other non-authoritative N-Pods is dormant
   and will be removed in the future.
1. All accesses to `gitlab-org` on a given Pod are validated against the `pod_id` in `routes`
   to ensure that the given Pod is authoritative to handle the data.
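+
+The validation in the last step could look roughly like the sketch below. The
+in-memory `routes` mapping and the `CURRENT_POD_ID` constant are hypothetical
+stand-ins for whatever the router and application layer actually use:
+
+```python
+# Hypothetical routes data: organization path -> authoritative Pod id.
+routes = {"gitlab-org": "pod-100"}
+
+CURRENT_POD_ID = "pod-0"  # identifier of the Pod serving this request
+
+def validate_access(org_path: str) -> None:
+    # Reject (or redirect) any request for an organization this Pod is no
+    # longer authoritative for, based on the cluster-wide `routes` data.
+    authoritative_pod = routes.get(org_path)
+    if authoritative_pod != CURRENT_POD_ID:
+        raise PermissionError(
+            f"{org_path} is owned by {authoritative_pod}, not {CURRENT_POD_ID}"
+        )
+
+validate_access("gitlab-org")  # raises: this Pod's copy is now dormant
+```
+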
+#### More challenges of this proposal
+
+1. There is no streaming replication capability for Elasticsearch. You could
+   snapshot the whole Elasticsearch index and recreate it, but this takes hours.
+   It could be handled by pausing Elasticsearch indexing on the initial Pod during
+   the migration, as indexing downtime is not a big issue, but this still needs
+   to be coordinated with the migration process.
+1. Syncing snapshots of Redis, Gitaly, CI Postgres, Main Postgres, registry
+   Postgres, and other new data stores in an online system would likely leave
+   gaps unless we accept a long downtime. You need to choose a sync point, and
+   at the sync point you need to stop writes to perform the migration. The more
+   data stores there are to migrate at the same time, the longer the write
+   downtime for the failover. We would also need to find a reliable place in
+   the application to actually block updates to all these systems with a high
+   degree of confidence. In the past we've only been confident by shutting down
+   all Rails services, because any Rails process could write directly to any of
+   these at any time due to async workloads or other surprising code paths.
+1. How to efficiently delete all the orphaned data. Locating all `ci_builds`
+   associated with half the organizations would be very expensive if we have to
+   do joins. We haven't yet determined if we'd want to store an `organization_id`
+   column on every table, but this is the kind of case where it would help (see
+   the sketch after this list).
+
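+To make the cost in the last point concrete, the sketch below contrasts
+resolving each row's organization through a chain of parent lookups with a
+direct filter on a purely hypothetical, denormalised `organization_id` column.
+The rows and table shapes are simplified stand-ins, not the real schema:
+
+```python
+# Hypothetical, simplified rows; each dict stands in for a database row and
+# each dictionary lookup for a join against a very large table.
+projects = {10: {"namespace_id": 20}}
+namespaces = {20: {"organization_id": "org-a"}}
+ci_builds = [{"id": 1, "project_id": 10}, {"id": 2, "project_id": 10}]
+
+migrated_away = {"org-a"}  # organizations now authoritative on another Pod
+
+def organization_via_joins(build):
+    # Today: ci_builds -> projects -> namespaces, two extra lookups per row.
+    namespace_id = projects[build["project_id"]]["namespace_id"]
+    return namespaces[namespace_id]["organization_id"]
+
+orphaned = [b["id"] for b in ci_builds
+            if organization_via_joins(b) in migrated_away]
+print(orphaned)  # => [1, 2]
+
+# With a hypothetical `organization_id` column on `ci_builds` itself, the same
+# rows could be found with a single indexed filter and no joins at all.
+```
+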
### 3.2. Migrate organization from an existing Pod
This is different from a split, as we intend to perform logical and selective replication
@@ -75,6 +115,14 @@ which Pod is authoritative for this organization.
1. It likely will require a full database structure analysis (more robust than project import/export)
to perform selective PostgreSQL logical replication.
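+
+As a sketch of what such a structure analysis could mean in practice, the
+snippet below walks a foreign-key graph outwards from `organizations` to decide
+which tables hold rows that must be selected for the organization being moved.
+The graph here is hypothetical and heavily simplified:
+
+```python
+from collections import deque
+
+# Hypothetical foreign-key graph: parent table -> tables that reference it.
+references = {
+    "organizations": ["namespaces"],
+    "namespaces": ["projects"],
+    "projects": ["issues", "ci_builds"],
+    "issues": ["issue_links"],
+}
+
+def tables_to_replicate(root: str = "organizations") -> list[str]:
+    # Breadth-first walk over the FK graph; every reachable table contains
+    # rows that must be selected for the organization being migrated.
+    seen, queue, order = {root}, deque([root]), [root]
+    while queue:
+        for child in references.get(queue.popleft(), []):
+            if child not in seen:
+                seen.add(child)
+                queue.append(child)
+                order.append(child)
+    return order
+
+print(tables_to_replicate())
+# ['organizations', 'namespaces', 'projects', 'issues', 'ci_builds', 'issue_links']
+```
+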
+#### More challenges of this proposal
+
+1. Logical replication is still not performant enough to keep up with our
+   scale. Even if we could use logical replication, we still don't have an
+   efficient way to filter data related to a single organization without
+   joining all the way to the `organizations` table, which would slow down
+   logical replication dramatically.
+
## 4. Evaluation
## 4.1. Pros