Diffstat (limited to 'doc/administration/geo/disaster_recovery')
 doc/administration/geo/disaster_recovery/index.md            | 27
 doc/administration/geo/disaster_recovery/planned_failover.md | 24
 2 files changed, 26 insertions(+), 25 deletions(-)
diff --git a/doc/administration/geo/disaster_recovery/index.md b/doc/administration/geo/disaster_recovery/index.md
index 7c6f4a32b57..f6f88e9b193 100644
--- a/doc/administration/geo/disaster_recovery/index.md
+++ b/doc/administration/geo/disaster_recovery/index.md
@@ -7,17 +7,14 @@ type: howto
# Disaster Recovery (Geo) **(PREMIUM SELF)**
-Geo replicates your database, your Git repositories, and few other assets.
-We will support and replicate more data in the future, that will enable you to
-failover with minimal effort, in a disaster situation.
-
-See [Geo limitations](../index.md#limitations) for more information.
+Geo replicates your database, your Git repositories, and a few other assets,
+but there are some [limitations](../index.md#limitations).
WARNING:
Disaster recovery for multi-secondary configurations is in **Alpha**.
For the latest updates, check the [Disaster Recovery epic for complete maturity](https://gitlab.com/groups/gitlab-org/-/epics/3574).
Multi-secondary configurations require the complete re-synchronization and re-configuration of all non-promoted secondaries and
-will cause downtime.
+cause downtime.
## Promoting a **secondary** Geo node in single-secondary configurations
@@ -91,13 +88,16 @@ Note the following when promoting a secondary:
before proceeding. If the secondary node
[has been paused](../../geo/index.md#pausing-and-resuming-replication), the promotion
performs a point-in-time recovery to the last known state.
- Data that was created on the primary while the secondary was paused will be lost.
+ Data that was created on the primary while the secondary was paused is lost.
- A new **secondary** should not be added at this time. If you want to add a new
**secondary**, do this after you have completed the entire process of promoting
the **secondary** to the **primary**.
- If you encounter an `ActiveRecord::RecordInvalid: Validation failed: Name has already been taken`
error message during this process, see this
[troubleshooting advice](../replication/troubleshooting.md#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node).
+- If you run into errors when using `--force` or `--skip-preflight-checks` during this
+ process on GitLab versions earlier than 13.5 (invocation sketched after this list), see this
+ [troubleshooting advice](../replication/troubleshooting.md#errors-when-using---skip-preflight-checks-or---force).
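For reference, these flags belong to the Omnibus promotion command of this era. A minimal sketch of the invocation, assuming the `gitlab-ctl promote-to-primary-node` command from the 13.x documentation (verify against your installed version):

```shell
# Promote this secondary, skipping the automated preflight checks.
# Only do this after confirming replication status manually.
sudo gitlab-ctl promote-to-primary-node --skip-preflight-checks
```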
#### Promoting a **secondary** node running on a single machine
@@ -243,6 +243,7 @@ required:
sets the database to read-write. The instructions vary depending on where your database is hosted:
- [Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote)
- [Azure PostgreSQL](https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal#stop-replication)
+ - [Google Cloud SQL](https://cloud.google.com/sql/docs/mysql/replication/manage-replicas#promote-replica)
- For other external PostgreSQL databases, save the following script in your
secondary node, for example `/tmp/geo_promote.sh`, and modify the connection
parameters to match your environment. Then, execute it to promote the replica:
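The hunk ends before the script itself. A minimal sketch of such a promotion script, assuming a self-managed PostgreSQL 11 installation; the binary and data-directory paths are illustrative and must be adapted:

```shell
#!/bin/bash

# PostgreSQL superuser that owns the data directory (adjust to your setup).
PG_SUPERUSER=postgres

# Illustrative paths; run `SHOW data_directory;` in psql to find yours.
PG_CTL_BINARY=/usr/lib/postgresql/11/bin/pg_ctl
PG_DATA_DIRECTORY=/var/lib/postgresql/11/main

# Promote the replica so it starts accepting read-write connections.
sudo -u "$PG_SUPERUSER" "$PG_CTL_BINARY" -D "$PG_DATA_DIRECTORY" promote
```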
@@ -493,7 +494,7 @@ must disable the **primary** site:
WARNING:
If the secondary site [has been paused](../../geo/index.md#pausing-and-resuming-replication), this performs
a point-in-time recovery to the last known state.
-Data that was created on the primary while the secondary was paused will be lost.
+Data that was created on the primary while the secondary was paused is lost.
1. SSH in to the database node in the **secondary** and trigger PostgreSQL to
promote to read-write:
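The command itself falls outside the hunk; on Omnibus installations of this era it is, as an assumption based on the surrounding docs:

```shell
sudo gitlab-ctl promote-db
```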
@@ -509,7 +510,7 @@ Data that was created on the primary while the secondary was paused will be lost
`geo_secondary_role`:
NOTE:
- Depending on your architecture these steps will need to be run on any GitLab node that is external to the **secondary** Kubernetes cluster.
+ Depending on your architecture, these steps must be run on any GitLab node that is external to the **secondary** Kubernetes cluster.
```ruby
## In pre-11.5 documentation, the role was enabled as follows. Remove this line.
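## (The hunk ends here; the line this comment refers to is reconstructed
## below as an assumption from the surrounding instructions.)
geo_secondary_role['enable'] = true
```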
@@ -537,13 +538,13 @@ Data that was created on the primary while the secondary was paused will be lost
1. Update the existing cluster configuration.
- You can retrieve the existing config with Helm:
+ You can retrieve the existing configuration with Helm:
```shell
helm --namespace gitlab get values gitlab-geo > gitlab.yaml
```
- The existing config will contain a section for Geo that should resemble:
+ The existing configuration contains a section for Geo that should resemble:
```yaml
geo:
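  # The rest of this section falls outside the hunk; the keys below are an
  # illustrative reconstruction based on the promotion steps that follow.
  enabled: true
  role: secondary
  psql:
    host: geo-2.db.example.com
    port: 5431
```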
@@ -560,9 +561,9 @@ Data that was created on the primary while the secondary was paused will be lost
To promote the **secondary** cluster to a **primary** cluster, update `role: secondary` to `role: primary`.
- You can remove the entire `psql` section if the cluster will remain as a primary site, this refers to the tracking database and will be ignored whilst the cluster is acting as a primary site.
+ If the cluster remains a primary site, you can remove the entire `psql` section; it refers to the tracking database and is ignored while the cluster is acting as a primary site.
- Update the cluster with the new config:
+ Update the cluster with the new configuration:
```shell
helm upgrade --install --version <current Chart version> gitlab-geo gitlab/gitlab --namespace gitlab -f gitlab.yaml
```
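To confirm the rollout afterwards, standard Helm and kubectl commands work (release and namespace names as above):

```shell
# Check the release status and watch the pods restart with the new role.
helm --namespace gitlab status gitlab-geo
kubectl --namespace gitlab get pods --watch
```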
diff --git a/doc/administration/geo/disaster_recovery/planned_failover.md b/doc/administration/geo/disaster_recovery/planned_failover.md
index bd8467f5437..d50078da172 100644
--- a/doc/administration/geo/disaster_recovery/planned_failover.md
+++ b/doc/administration/geo/disaster_recovery/planned_failover.md
@@ -35,7 +35,7 @@ required scheduled maintenance period significantly.
A common strategy for keeping this period as short as possible for data stored
in files is to use `rsync` to transfer the data. An initial `rsync` can be
performed ahead of the maintenance window; subsequent `rsync`s (including a
-final transfer inside the maintenance window) will then transfer only the
+final transfer inside the maintenance window) then transfer only the
*changes* between the **primary** node and the **secondary** nodes.
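As an illustrative sketch (the paths and host are assumptions, not from the source), the pattern looks like:

```shell
# Initial bulk copy ahead of the maintenance window.
rsync -aHP /var/opt/gitlab/gitlab-rails/uploads/ \
  secondary.example.com:/var/opt/gitlab/gitlab-rails/uploads/

# Re-run inside the window; only changes since the last run are transferred.
rsync -aHP --delete /var/opt/gitlab/gitlab-rails/uploads/ \
  secondary.example.com:/var/opt/gitlab/gitlab-rails/uploads/
```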
Repository-centric strategies for using `rsync` effectively can be found in the
@@ -50,7 +50,7 @@ this command reports `ERROR - Replication is not up-to-date` even if
replication is actually up-to-date. This bug was fixed in GitLab 13.8 and
later.
-Run this command to list out all preflight checks and automatically check if replication and verification are complete before scheduling a planned failover to ensure the process will go smoothly:
+Run this command before scheduling a planned failover to list all preflight checks and automatically confirm that replication and verification are complete, so the process goes smoothly:
```shell
gitlab-ctl promotion-preflight-checks
```
@@ -73,7 +73,7 @@ In GitLab 12.4, you can optionally allow GitLab to manage replication of Object
Database settings are automatically replicated to the **secondary** node, but the
`/etc/gitlab/gitlab.rb` file must be set up manually, and differs between
nodes. If features such as Mattermost, OAuth, or LDAP integration are enabled
-on the **primary** node but not the **secondary** node, they will be lost during failover.
+on the **primary** node but not the **secondary** node, they are lost during failover.
Review the `/etc/gitlab/gitlab.rb` file for both nodes and ensure the **secondary** node
supports everything the **primary** node does **before** scheduling a planned failover.
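One hedged way to perform that review (hostnames are illustrative; any diff tool works):

```shell
# Compare the two configurations from a workstation that can reach both nodes.
diff <(ssh primary.example.com sudo cat /etc/gitlab/gitlab.rb) \
     <(ssh secondary.example.com sudo cat /etc/gitlab/gitlab.rb)
```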
@@ -119,7 +119,7 @@ time to complete
If any objects are failing to replicate, this should be investigated before
scheduling the maintenance window. Following a planned failover, anything that
-failed to replicate will be **lost**.
+failed to replicate is **lost**.
You can use the [Geo status API](../../../api/geo_nodes.md#retrieve-project-sync-or-verification-failures-that-occurred-on-the-current-node) to review failed objects and
the reasons for failure.
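Per the linked API documentation, a request along these lines (the endpoint path is assumed from that page's anchor) returns the failures:

```shell
curl --header "PRIVATE-TOKEN: <your_access_token>" \
  "https://primary.example.com/api/v4/geo_nodes/current/failures"
```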
@@ -136,9 +136,9 @@ This [content was moved to another location](background_verification.md).
On the **primary** node, navigate to **Admin Area > Messages** and add a broadcast
message. You can check under **Admin Area > Geo** to estimate how long it
-will take to finish syncing. An example message would be:
+takes to finish syncing. An example message would be:
-> A scheduled maintenance will take place at XX:XX UTC. We expect it to take
+> A scheduled maintenance takes place at XX:XX UTC. We expect it to take
> less than 1 hour.
## Prevent updates to the **primary** node
@@ -151,7 +151,7 @@ be disabled on the primary site:
1. Disable non-Geo periodic background jobs on the **primary** node by navigating
to **Admin Area > Monitoring > Background Jobs > Cron**, pressing `Disable All`,
and then pressing `Enable` for the `geo_sidekiq_cron_config_worker` cron job.
- This job will re-enable several other cron jobs that are essential for planned
+ This job re-enables several other cron jobs that are essential for planned
failover to complete successfully.
## Finish replicating and verifying all data
@@ -161,7 +161,7 @@ be disabled on the primary site:
1. On the **primary** node, navigate to **Admin Area > Monitoring > Background Jobs > Queues**
and wait for all queues except those with `geo` in the name to drop to 0.
These queues contain work that has been submitted by your users; failing over
- before it is completed will cause the work to be lost.
+ before it is completed causes the work to be lost.
1. On the **primary** node, navigate to **Admin Area > Geo** and wait for the
following conditions to be true of the **secondary** node you are failing over to:
@@ -176,15 +176,15 @@ be disabled on the primary site:
to verify the integrity of CI artifacts, LFS objects, and uploads in file
storage.
-At this point, your **secondary** node will contain an up-to-date copy of everything the
-**primary** node has, meaning nothing will be lost when you fail over.
+At this point, your **secondary** node contains an up-to-date copy of everything the
+**primary** node has, meaning nothing is lost when you fail over.
## Promote the **secondary** node
Finally, follow the [Disaster Recovery docs](index.md) to promote the
-**secondary** node to a **primary** node. This process will cause a brief outage on the **secondary** node, and users may need to log in again.
+**secondary** node to a **primary** node. This process causes a brief outage on the **secondary** node, and users may need to log in again.
-Once it is completed, the maintenance window is over! Your new **primary** node will now
+Once it is completed, the maintenance window is over! Your new **primary** node now
begins to diverge from the old one. If problems do arise at this point, failing
back to the old **primary** node [is possible](bring_primary_back.md), but likely to result
in the loss of any data uploaded to the new **primary** in the meantime.