Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/geo.md')
-rw-r--r--doc/development/geo.md60
1 files changed, 41 insertions, 19 deletions
diff --git a/doc/development/geo.md b/doc/development/geo.md
index 884c09cc174..76c75cb1c6a 100644
--- a/doc/development/geo.md
+++ b/doc/development/geo.md
@@ -200,7 +200,7 @@ sequenceDiagram
S->>TDB: Insert to `job_artifact_registry`
```
-- [Sidekiq-cron](https://github.com/ondrejbartas/sidekiq-cron) enqueues a `Geo::Secondary::RegistryConsistencyWorker` job every minute. As long as it is actively doing work (creating and deleting rows), this job immediately reenqueues itself. This job uses an exclusive lease to prevent multiple instances of itself from running simultaneously.
+- [Sidekiq-cron](https://github.com/ondrejbartas/sidekiq-cron) enqueues a `Geo::Secondary::RegistryConsistencyWorker` job every minute. As long as it is actively doing work (creating and deleting rows), this job immediately re-enqueues itself. This job uses an exclusive lease to prevent multiple instances of itself from running simultaneously.
- [Sidekiq](architecture.md#sidekiq) picks up `Geo::Secondary::RegistryConsistencyWorker` job
- Sidekiq queries `ci_job_artifacts` table for up to 10000 rows
- Sidekiq queries `job_artifact_registry` table for up to 10000 rows
@@ -308,7 +308,7 @@ sequenceDiagram
- Sidekiq queries `job_artifact_registry` in the [PostgreSQL Geo Tracking Database](#tracking-database) for the number of rows marked "pending verification" or "failed verification and ready to retry"
- Sidekiq enqueues one or more `Geo::VerificationBatchWorker` jobs, limited by the "maximum verification concurrency" setting
- Sidekiq picks up `Geo::VerificationBatchWorker` job
- - Sidekiq queries `job_artifact_registry` in the PostgreSQL Geo Tracking Databasef for rows marked "pending verification"
+ - Sidekiq queries `job_artifact_registry` in the PostgreSQL Geo Tracking Database for rows marked "pending verification"
- If the previous step yielded less than 10 rows, then Sidekiq queries `job_artifact_registry` for rows marked "failed verification and ready to retry"
- For each row
- Sidekiq marks it "started verification"
@@ -588,23 +588,45 @@ When some write actions are not allowed because the site is a
The database itself will already be read-only in a replicated setup,
so we don't need to take any extra step for that.
-## Steps needed to replicate a new data type
-
-As GitLab evolves, we constantly need to add new resources to the Geo replication system.
-The implementation depends on resource specifics, but there are several things
-that need to be taken care of:
-
-- Event generation on the primary site. Whenever a new resource is changed/updated, we need to
- create a task for the Log Cursor.
-- Event handling. The Log Cursor needs to have a handler for every event type generated by the primary site.
-- Dispatch worker (cron job). Make sure the backfill condition works well.
-- Sync worker.
-- Registry with all possible states.
-- Verification.
-- Cleaner. When sync settings are changed for the secondary site, some resources need to be cleaned up.
-- Geo Node Status. We need to provide API endpoints as well as some presentation in the GitLab Admin Area.
-- Health Check. If we can perform some pre-cheсks and make site unhealthy if something is wrong, we should do that.
- The `rake gitlab:geo:check` command has to be updated too.
+## Ensuring a new feature has Geo support
+
+Geo depends on PostgreSQL replication of the main and CI databases, so if you add a new table or field, it should already work on secondary Geo sites.
+
+However, if you introduce a new kind of data which is stored outside of the main and CI PostgreSQL databases, then you need to ensure that this data is replicated and verified by Geo. This is necessary for customers to be able to rely on their secondary sites for [disaster recovery](../administration/geo/disaster_recovery/index.md).
+
+The following subsections describe how to determine whether work is needed, and if so, how to proceed. If you have any questions, [contact the Geo team](https://about.gitlab.com/handbook/product/categories/#geo-group).
+
+For comparison with your own features, see [Supported Geo data types](../administration/geo/replication/datatypes.md). It has a detailed, up-to-date list of the types of data that Geo replicates and verifies.
+
+### Git repositories
+
+If you add a feature that is backed by Git repositories, then you must add Geo support. See [the repository replicator strategy of the Geo self-service framework](geo/framework.md#repository-replicator-strategy).
+
+### Blobs
+
+If you add a subclass of `CarrierWave::Uploader::Base`, then you are adding what Geo calls a blob. If you specifically subclass [`AttachmentUploader` as generally recommended](uploads/working_with_uploads.md#recommendations), then the data has Geo support with no work needed. This is because `AttachmentUploader` tracks blobs with the `Upload` model using the `uploads` table, and Geo support is already implemented for that model.
+
+If your blobs are tracked in a new table, perhaps because you expect millions of rows at GitLab.com scale, then you must add Geo support. See [the blob replicator strategy of the Geo self-service framework](geo/framework.md#blob-replicator-strategy).
+
+[Geo detects new blobs with a spec](https://gitlab.com/gitlab-org/gitlab/-/blob/eeba0e4d231ae39012a5bbaeac43a72c2bd8affb/ee/spec/uploaders/every_gitlab_uploader_spec.rb) that fails when an `Uploader` does not have a corresponding `Replicator`.
+
+### Features with more than one kind of data
+
+If a new complex feature is backed by multiple kinds of data, for example, a Git repository and a blob, then you can likely consider each kind of data separately.
+
+Taking [Designs](../user/project/issues/design_management.md) as an example, each issue has a Git repository which can have many LFS objects, and each LFS object may have an automatically generated thumbnail.
+
+- LFS objects were already supported by Geo, so no Geo-specific work was needed.
+- The implementation of thumbnails reused the `Upload` model, so no Geo-specific work was needed.
+- Design Git repositories were not inherently supported by Geo, so work was needed.
+
+As another example, [Dependency Proxy](../administration/packages/dependency_proxy.md) is backed by two kinds of blobs, `DependencyProxy::Blob` and `DependencyProxy::Manifest`. We can use [the blob replicator strategy of the Geo self-service framework](geo/framework.md#blob-replicator-strategy) on each type, independent of each other.
+
+### Other kinds of data
+
+If a new feature introduces a new kind of data which is not a Git repository, or a blob, or a combination of the two, then contact the Geo team to discuss how to handle it.
+
+As an example, Container Registry data does not easily fit into the above categories. It is backed by a registry service which owns the data, and GitLab interacts with the registry service's API. So a one off approach is required for Geo support of Container Registry. Still, we are able to reuse much of the glue code of [the Geo self-service framework](geo/framework.md#repository-replicator-strategy).
## History of communication channel