gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
Diffstat (limited to 'doc/administration/geo')
-rw-r--r-- doc/administration/geo/disaster_recovery/background_verification.md | 37
-rw-r--r-- doc/administration/geo/disaster_recovery/bring_primary_back.md | 2
-rw-r--r-- doc/administration/geo/disaster_recovery/index.md | 8
-rw-r--r-- doc/administration/geo/replication/troubleshooting.md | 330
-rw-r--r-- doc/administration/geo/replication/upgrading_the_geo_sites.md | 2
-rw-r--r-- doc/administration/geo/setup/database.md | 21
-rw-r--r-- doc/administration/geo/setup/index.md | 19
-rw-r--r-- doc/administration/geo/setup/two_single_node_external_services.md | 493
8 files changed, 745 insertions, 167 deletions
diff --git a/doc/administration/geo/disaster_recovery/background_verification.md b/doc/administration/geo/disaster_recovery/background_verification.md
index a31261892bb..8aec77d9d88 100644
--- a/doc/administration/geo/disaster_recovery/background_verification.md
+++ b/doc/administration/geo/disaster_recovery/background_verification.md
@@ -23,28 +23,6 @@ these failures, so you should follow [these instructions](background_verificatio
If verification is lagging significantly behind replication, consider giving
the site more time before scheduling a planned failover.
-## Disabling or enabling the automatic background verification
-
-Run the following commands in a [Rails console](../../operations/rails_console.md) on a **Rails node on the primary** site.
-
-To check if automatic background verification is enabled:
-
-```ruby
-Gitlab::Geo.repository_verification_enabled?
-```
-
-To disable automatic background verification:
-
-```ruby
-Feature.disable('geo_repository_verification')
-```
-
-To enable automatic background verification:
-
-```ruby
-Feature.enable('geo_repository_verification')
-```
-
## Repository verification
On the **primary** site:
@@ -97,21 +75,6 @@ On the **primary** site:
![Re-verification interval](img/reverification-interval.png)
-The automatic background re-verification is enabled by default, but you can
-disable if you need. Run the following commands in a [Rails console](../../operations/rails_console.md) on a **Rails node on the primary** site:
-
-To disable automatic background re-verification:
-
-```ruby
-Feature.disable('geo_repository_reverification')
-```
-
-To enable automatic background re-verification:
-
-```ruby
-Feature.enable('geo_repository_reverification')
-```
-
## Reset verification for projects where verification has failed
Geo actively tries to correct verification failures marking the repository to
diff --git a/doc/administration/geo/disaster_recovery/bring_primary_back.md b/doc/administration/geo/disaster_recovery/bring_primary_back.md
index fe05b52cec9..5f2cbd4d03b 100644
--- a/doc/administration/geo/disaster_recovery/bring_primary_back.md
+++ b/doc/administration/geo/disaster_recovery/bring_primary_back.md
@@ -55,7 +55,7 @@ To bring the former **primary** site up to date:
[block all the writes to this site](planned_failover.md#prevent-updates-to-the-primary-site)
during this procedure.
-1. [Set up database replication](../setup/database.md). In this case, the **secondary** site
+1. [Set up Geo](../setup/index.md). In this case, the **secondary** site
refers to the former **primary** site.
1. If [PgBouncer](../../postgresql/pgbouncer.md) was enabled on the **current secondary** site
(when it was a primary site) disable it by editing `/etc/gitlab/gitlab.rb`
diff --git a/doc/administration/geo/disaster_recovery/index.md b/doc/administration/geo/disaster_recovery/index.md
index 0c160e85570..d6f6211ed4c 100644
--- a/doc/administration/geo/disaster_recovery/index.md
+++ b/doc/administration/geo/disaster_recovery/index.md
@@ -670,7 +670,9 @@ If the secondary site [has been paused](../../geo/index.md#pausing-and-resuming-
a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
-If you are running GitLab 14.5 and later:
+::Tabs
+
+:::TabTitle For GitLab 14.5 and later
1. For each node (such as PostgreSQL or Gitaly) outside of the **secondary** Kubernetes cluster using the Linux
package, SSH into the node and run one of the following commands:
@@ -706,7 +708,7 @@ If you are running GitLab 14.5 and later:
| ---- | ------------- | ------- |
| `ENABLE_SILENT_MODE` | `false` | If `true`, enables [Silent Mode](../../silent_mode/index.md) before promotion (GitLab 16.4 and later) |
-If you are running GitLab 14.4 and earlier:
+:::TabTitle For GitLab 14.4 and earlier
1. SSH in to the database node in the **secondary** site and trigger PostgreSQL to
promote to read-write:
@@ -744,6 +746,8 @@ If you are running GitLab 14.4 and earlier:
kubectl --namespace gitlab exec -ti gitlab-geo-task-runner-XXX -- gitlab-rake geo:set_secondary_as_primary
```
+::EndTabs
+
### Step 3. Promote the **secondary** cluster
1. Update the existing cluster configuration.
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index dd2693f4ba7..3c2d43d196a 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -172,8 +172,7 @@ http://secondary.example.com/
GitLab Version: 14.9.2-ee
Geo Role: Secondary
Health Status: Healthy
- Repositories: succeeded 12345 / total 12345 (100%)
- Verified Repositories: succeeded 12345 / total 12345 (100%)
+ Project Repositories: succeeded 12345 / total 12345 (100%)
Project Wiki Repositories: succeeded 6789 / total 6789 (100%)
Attachments: succeeded 4 / total 4 (100%)
CI job artifacts: succeeded 0 / total 0 (0%)
@@ -191,6 +190,7 @@ http://secondary.example.com/
Terraform State Versions Verified: succeeded 0 / total 10 (0%)
Snippet Repositories Verified: succeeded 99 / total 100 (99%)
Pipeline Artifacts Verified: succeeded 0 / total 10 (0%)
+ Project Repositories Verified: succeeded 12345 / total 12345 (100%)
Project Wiki Repositories Verified: succeeded 6789 / total 6789 (100%)
Sync Settings: Full
Database replication lag: 0 seconds
@@ -199,19 +199,19 @@ http://secondary.example.com/
Last status report was: 1 minute ago
```
-There are up to three statuses for each item. For example, for `Repositories`, you see the following lines:
+There are up to three statuses for each item. For example, for `Project Repositories`, you see the following lines:
```plaintext
- Repositories: succeeded 12345 / total 12345 (100%)
- Verified Repositories: succeeded 12345 / total 12345 (100%)
+ Project Repositories: succeeded 12345 / total 12345 (100%)
+ Project Repositories Verified: succeeded 12345 / total 12345 (100%)
Repositories Checked: failed 5 / succeeded 0 / total 5 (0%)
```
The 3 status items are defined as follows:
-- The `Repositories` output shows how many repositories are synced from the primary to the secondary.
-- The `Verified Repositories` output shows how many repositories on this secondary have a matching repository checksum with the Primary.
-- The `Repositories Checked` output shows how many repositories have passed a local Git repository check (`git fsck`) on the secondary.
+- The `Project Repositories` output shows how many project repositories are synced from the primary to the secondary.
+- The `Project Repositories Verified` output shows how many project repositories on this secondary have a repository checksum that matches the primary.
+- The `Repositories Checked` output shows how many project repositories have passed a local Git repository check (`git fsck`) on the secondary.
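The three status items above can also be checked mechanically. The following is a minimal sketch in plain Ruby, not part of GitLab: it assumes the status output has been captured as text, and the `incomplete_items` helper and the `STATUS_LINE` pattern are hypothetical names introduced here for illustration. It lists every item whose `succeeded` count is lower than its `total`.

```ruby
# Sketch only: parses text shaped like the status output shown above.
# STATUS_LINE and incomplete_items are illustrative names, not GitLab APIs.
STATUS_LINE = %r{\A\s*(?<name>[^:]+):\s+(?:failed (?<failed>\d+) / )?succeeded (?<succeeded>\d+) / total (?<total>\d+)}

def incomplete_items(output)
  output.each_line.filter_map do |line|
    match = STATUS_LINE.match(line)
    next unless match # skip lines such as "GitLab Version: ..."

    succeeded = match[:succeeded].to_i
    total = match[:total].to_i
    next if succeeded == total # fully synced or verified

    {
      name: match[:name].strip,
      succeeded: succeeded,
      total: total,
      failed: match[:failed]&.to_i # nil when the line has no "failed" count
    }
  end
end

sample = <<~OUTPUT
  Project Repositories: succeeded 12345 / total 12345 (100%)
  Snippet Repositories Verified: succeeded 99 / total 100 (99%)
  Repositories Checked: failed 5 / succeeded 0 / total 5 (0%)
OUTPUT

incomplete_items(sample).each do |item|
  puts "#{item[:name]}: #{item[:succeeded]}/#{item[:total]}"
end
```

The sketch only reads text, so it can be run against a saved copy of the output without touching the Geo site.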
To find more details about failed items, check
[the `gitlab-rails/geo.log` file](../../logs/log_parsing.md#find-most-common-geo-sync-errors)
@@ -503,6 +503,46 @@ This check is also required when using a mixture of GitLab deployments. The loca
## Fixing PostgreSQL database replication errors
+The following sections outline troubleshooting steps for fixing replication error messages (indicated by `Database replication working? ... no` in the
+[`geo:check` output](#health-check-rake-task)).
+These instructions mostly assume a single-node Geo Linux package deployment, and might need to be adapted to different environments.
+
+### Removing an inactive replication slot
+
+Replication slots are marked as 'inactive' when the replication client (a secondary site) connected to the slot disconnects.
+Inactive replication slots cause WAL files to be retained, because they are sent to the client when it reconnects and the slot becomes active once more.
+If the secondary site is not able to reconnect, use the following steps to remove its corresponding inactive replication slot:
+
+1. [Start a PostgreSQL console session](https://docs.gitlab.com/omnibus/settings/database.html#connecting-to-the-postgresql-database) on the Geo primary site's database node:
+
+ ```shell
+ sudo gitlab-psql -d gitlabhq_production
+ ```
+
+ NOTE:
+ Using `gitlab-rails dbconsole` does not work, because managing replication slots requires superuser permissions.
+
+1. View the replication slots and remove them if they are inactive:
+
+ ```sql
+ SELECT * FROM pg_replication_slots;
+ ```
+
+ Slots where `active` is `f` are inactive.
+
+ - If this slot should be active because you have a **secondary** site configured to use it,
+ check the [PostgreSQL logs](../../logs/index.md#postgresql-logs) for the **secondary** site
+ to see why replication is not running.
+ - If you are no longer using the slot (for example, you no longer have Geo enabled), or the secondary site is no longer able to reconnect,
+ you should remove it using the PostgreSQL console session:
+
+ ```sql
+ SELECT pg_drop_replication_slot('<name_of_inactive_slot>');
+ ```
+
+1. Follow either the steps [to remove that Geo site](remove_geo_site.md) if it's no longer required,
+ or [re-initiate the replication process](../setup/database.md#step-3-initiate-the-replication-process), which recreates the replication slot correctly.
+
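When several slots are involved, it can help to generate the drop statements for review before running any of them. The following is a minimal sketch in plain Ruby under stated assumptions: the rows are hypothetical sample data shaped like the `slot_name` and `active` columns of `pg_replication_slots`, and `drop_statements` is an illustrative helper, not a GitLab or PostgreSQL API. It only prints SQL and does not connect to a database.

```ruby
# Hypothetical sample rows, shaped like the output of
# `SELECT slot_name, active FROM pg_replication_slots;`.
rows = [
  { slot_name: "geo_secondary_a", active: true },
  { slot_name: "geo_secondary_b", active: false }
]

# Build one pg_drop_replication_slot statement per inactive slot.
def drop_statements(rows)
  rows.reject { |row| row[:active] }.map do |row|
    # Double any single quote so the name stays a valid SQL string literal.
    escaped = row[:slot_name].gsub("'", "''")
    "SELECT pg_drop_replication_slot('#{escaped}');"
  end
end

puts drop_statements(rows)
```

Printing the statements rather than executing them leaves the decision to drop each slot with the administrator, which matches the manual review described in the steps above.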
### Message: `WARNING: oldest xmin is far in the past` and `pg_wal` size growing
If a replication slot is inactive,
@@ -517,18 +557,7 @@ HINT: Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
```
-To fix this:
-
-1. [Connect to the primary database](https://docs.gitlab.com/omnibus/settings/database.html#connecting-to-the-bundled-postgresql-database).
-
-1. Run `SELECT * FROM pg_replication_slots;`.
- Note the `slot_name` that reports `active` as `f` (false).
-
-1. Follow [the steps to remove that Geo site](remove_geo_site.md).
-
-The following sections outline troubleshooting steps for fixing replication
-error messages (indicated by `Database replication working? ... no` in the
-[`geo:check` output](#health-check-rake-task).
+To fix this, you should [remove the inactive replication slot](#removing-an-inactive-replication-slot) and re-initiate the replication.
### Message: `ERROR: replication slots can only be used if max_replication_slots > 0`?
@@ -568,35 +597,9 @@ the default 30 minutes. Adjust as required for your installation.
### Message: "PANIC: could not write to file `pg_xlog/xlogtemp.123`: No space left on device"
Determine if you have any unused replication slots in the **primary** database. This can cause large amounts of
-log data to build up in `pg_xlog`. Removing the unused slots can reduce the amount of space used in the `pg_xlog`.
-
-1. Start a PostgreSQL console session:
-
- ```shell
- sudo gitlab-psql
- ```
+log data to build up in `pg_xlog`.
- NOTE:
- Using `gitlab-rails dbconsole` does not work, because managing replication slots requires superuser permissions.
-
-1. View your replication slots:
-
- ```sql
- SELECT * FROM pg_replication_slots;
- ```
-
-Slots where `active` is `f` are not active.
-
-- When this slot should be active, because you have a **secondary** site configured using that slot,
- sign in on the web interface for the **secondary** site and check the [PostgreSQL logs](../../logs/index.md#postgresql-logs)
- to view why the replication is not running.
-
-- If you are no longer using the slot (for example, you no longer have Geo enabled), you can remove it with in the
- PostgreSQL console session:
-
- ```sql
- SELECT pg_drop_replication_slot('<name_of_extra_slot>');
- ```
+[Removing the inactive slots](#removing-an-inactive-replication-slot) can reduce the amount of space used in the `pg_xlog`.
### Message: "ERROR: canceling statement due to conflict with recovery"
@@ -1016,83 +1019,166 @@ If you notice replication failures in `Admin > Geo > Sites` or the [Sync status
### Manually retry replication or verification
-Project Git repositories and Project Wiki Git repositories have the ability in `Admin > Geo > Replication` to `Resync all`, `Reverify all`, or for a single resource, `Resync` or `Reverify`.
+A Geo data type is a specific class of data that is required by one or more GitLab features to store relevant information and is replicated by Geo to secondary sites.
+
+The following Geo data types exist:
+
+- **Blob types:**
+ - `Ci::JobArtifact`
+ - `Ci::PipelineArtifact`
+ - `Ci::SecureFile`
+ - `LfsObject`
+ - `MergeRequestDiff`
+ - `Packages::PackageFile`
+ - `PagesDeployment`
+ - `Terraform::StateVersion`
+ - `Upload`
+ - `DependencyProxy::Manifest`
+ - `DependencyProxy::Blob`
+- **Repository types:**
+ - `ContainerRepositoryRegistry`
+ - `DesignManagement::Repository`
+ - `ProjectRepository`
+ - `ProjectWikiRepository`
+ - `SnippetRepository`
+ - `GroupWikiRepository`
-Adding this ability to other data types is proposed in issue [364725](https://gitlab.com/gitlab-org/gitlab/-/issues/364725).
+The main kinds of classes are Registry, Model, and Replicator. If you have an instance of one of these classes, you can get the others. The Registry and Model mostly manage PostgreSQL DB state. The Replicator knows how to replicate/verify (or it can call a service to do it):
-The following sections describe how to use internal application commands in the [Rails console](../../../administration/operations/rails_console.md#starting-a-rails-console-session) to cause replication or verification immediately.
+```ruby
+model_record = Packages::PackageFile.last
+model_record.replicator.registry.replicator.model_record # just showing that these methods exist
+```
-WARNING:
-Commands that change data can cause damage if not run correctly or under the right conditions. Always run commands in a test environment first and have a backup instance ready to restore.
+With all this information, you can:
-### Blob types
+- [Manually resync and reverify individual components](#resync-and-reverify-individual-components)
+- [Manually resync and reverify multiple components](#resync-and-reverify-multiple-components)
-- `Ci::JobArtifact`
-- `Ci::PipelineArtifact`
-- `Ci::SecureFile`
-- `LfsObject`
-- `MergeRequestDiff`
-- `Packages::PackageFile`
-- `PagesDeployment`
-- `Terraform::StateVersion`
-- `Upload`
+#### Resync and reverify individual components
-`Packages::PackageFile` is used in the following
-[Rails console](../../../administration/operations/rails_console.md#starting-a-rails-console-session)
-examples, but things generally work the same for the other types.
+[You can force a resync and reverify individual items](https://gitlab.com/gitlab-org/gitlab/-/issues/364727)
+for all component types managed by the [self-service framework](../../../development/geo/framework.md) using the UI.
+On the secondary site, visit **Admin > Geo > Replication**.
+
+However, if this doesn't work, you can perform the same action using the Rails
+console. The following sections describe how to use internal application
+commands in the Rails console to cause replication or verification for
+individual records synchronously or asynchronously.
WARNING:
Commands that change data can cause damage if not run correctly or under the right conditions. Always run commands in a test environment first and have a backup instance ready to restore.
-### Repository types, except for project or project wiki repositories
+[Start a Rails console session](../../../administration/operations/rails_console.md#starting-a-rails-console-session)
+to perform the following basic troubleshooting steps:
-- `SnippetRepository`
-- `GroupWikiRepository`
+- **For Blob types** (using the `Packages::PackageFile` component as an example)
-`SnippetRepository` is used in the examples below, but things generally work the same for the other Repository types.
+ - Find registry records that failed to sync:
-[Start a Rails console session](../../../administration/operations/rails_console.md#starting-a-rails-console-session)
-to enact the following, basic troubleshooting steps.
+ ```ruby
+ Geo::PackageFileRegistry.failed
+ ```
-WARNING:
-Commands that change data can cause damage if not run correctly or under the right conditions. Always run commands in a test environment first and have a backup instance ready to restore.
+ - Find registry records that are missing on the primary site:
-#### The Replicator
+ ```ruby
+ Geo::PackageFileRegistry.where(last_sync_failure: 'The file is missing on the Geo primary site')
+ ```
-The main kinds of classes are Registry, Model, and Replicator. If you have an instance of one of these classes, you can get the others. The Registry and Model mostly manage PostgreSQL DB state. The Replicator knows how to replicate/verify (or it can call a service to do it):
+ - Resync a package file, synchronously, given an ID:
-```ruby
-model_record = Packages::PackageFile.last
-model_record.replicator.registry.replicator.model_record # just showing that these methods exist
-```
+ ```ruby
+ model_record = Packages::PackageFile.find(id)
+ model_record.replicator.send(:download)
+ ```
-#### Replicate a package file, synchronously, given an ID
+ - Resync a package file, synchronously, given a registry ID:
-```ruby
-model_record = Packages::PackageFile.find(id)
-model_record.replicator.send(:download)
-```
+ ```ruby
+ registry = Geo::PackageFileRegistry.find(registry_id)
+ registry.replicator.send(:download)
+ ```
-#### Replicate a package file, synchronously, given a registry ID
+ - Resync a package file, asynchronously, given a registry ID.
+ Since GitLab 16.2, a component can be asynchronously replicated as follows:
-```ruby
-registry = Geo::PackageFileRegistry.find(registry_id)
-registry.replicator.send(:download)
-```
+ ```ruby
+ registry = Geo::PackageFileRegistry.find(registry_id)
+ registry.replicator.enqueue_sync
+ ```
-#### Find registry records of blobs that failed to sync
+ - Reverify a package file, asynchronously, given a registry ID.
+ Since GitLab 16.2, a component can be asynchronously reverified as follows:
-```ruby
-Geo::PackageFileRegistry.failed
-```
+ ```ruby
+ registry = Geo::PackageFileRegistry.find(registry_id)
+ registry.replicator.verify_async
+ ```
-#### Find registry records of blobs that are missing on the primary site
+- **For Repository types** (using the `SnippetRepository` component as an example)
-```ruby
-Geo::PackageFileRegistry.where(last_sync_failure: 'The file is missing on the Geo primary site')
-```
+ - Resync a snippet repository, synchronously, given an ID:
+
+ ```ruby
+ model_record = Geo::SnippetRepositoryRegistry.find(id)
+ model_record.replicator.sync_repository
+ ```
-#### Verify package files on the secondary manually
+ - Resync a snippet repository, synchronously, given a registry ID:
+
+ ```ruby
+ registry = Geo::SnippetRepositoryRegistry.find(registry_id)
+ registry.replicator.sync_repository
+ ```
+
+ - Resync a snippet repository, asynchronously, given a registry ID.
+ Since GitLab 16.2, a component can be asynchronously replicated as follows:
+
+ ```ruby
+ registry = Geo::SnippetRepositoryRegistry.find(registry_id)
+ registry.replicator.enqueue_sync
+ ```
+
+ - Reverify a snippet repository, asynchronously, given a registry ID.
+ Since GitLab 16.2, a component can be asynchronously reverified as follows:
+
+ ```ruby
+ registry = Geo::SnippetRepositoryRegistry.find(registry_id)
+ registry.replicator.verify_async
+ ```
+
+#### Resync and reverify multiple components
+
+NOTE:
+There is an [issue to implement this functionality in the Admin Area UI](https://gitlab.com/gitlab-org/gitlab/-/issues/364729).
+
+WARNING:
+Commands that change data can cause damage if not run correctly or under the right conditions. Always run commands in a test environment first and have a backup instance ready to restore.
+
+The following sections describe how to use internal application commands in the [Rails console](../../../administration/operations/rails_console.md#starting-a-rails-console-session)
+to cause bulk replication or verification.
+
+##### Reverify all components (or any SSF data type which supports verification)
+
+For GitLab 16.4 and earlier:
+
+1. SSH into a GitLab Rails node in the primary Geo site.
+1. Open the [Rails console](../../../administration/operations/rails_console.md#starting-a-rails-console-session).
+1. Mark all uploads as `pending verification`:
+
+ ```ruby
+ Upload.verification_state_table_class.each_batch do |relation|
+ relation.update_all(verification_state: 0)
+ end
+ ```
+
+1. This causes the primary to start checksumming all Uploads.
+1. When a primary successfully checksums a record, then all secondaries recalculate the checksum as well, and they compare the values.
+
+For other SSF data types replace `Upload` in the command above with the desired model class.
+
+##### Verify blob files on the secondary manually
This iterates over all package files on the secondary, looking at the
`verification_checksum` stored in the database (which came from the primary)
@@ -1143,25 +1229,43 @@ status.keys.each {|key| puts "#{key} count: #{status[key].count}"}
status
```
-#### Reverify all uploads (or any SSF data type which is verified)
+### Failed verification of Uploads on the primary Geo site
-1. SSH into a GitLab Rails node in the primary Geo site.
-1. Open [Rails console](../../../administration/operations/rails_console.md#starting-a-rails-console-session).
-1. Mark all uploads as "pending verification":
+If verification of some Uploads fails on the primary Geo site with the `verification_checksum: nil` and `verification_failure: Error during verification: undefined method 'underscore' for NilClass:Class` errors, the cause can be orphaned Uploads. The parent record owning the Upload (the Upload's `model`) has somehow been deleted, but the Upload record still exists. These verification failures are false.
- ```ruby
- Upload.verification_state_table_class.each_batch do |relation|
- relation.update_all(verification_state: 0)
- end
- ```
+You can find these errors in the `geo.log` file on the primary Geo site.
-1. This causes the primary to start checksumming all Uploads.
-1. When a primary successfully checksums a record, then all secondaries recalculate the checksum as well, and they compare the values.
+To confirm that model records are missing, you can run a Rake task on the primary Geo site:
-For other SSF data types replace `Upload` in the command above with the desired model class.
+```shell
+sudo gitlab-rake gitlab:uploads:check
+```
-NOTE:
-There is an [issue to implement this functionality in the Admin Area UI](https://gitlab.com/gitlab-org/gitlab/-/issues/364729).
+You can delete these Upload records on the primary Geo site to get rid of these failures by running the following script from the [Rails console](../../operations/rails_console.md):
+
+```ruby
+# Look for uploads with the verification error
+# or edit with your own affected IDs
+uploads = Geo::UploadState.where(
+  verification_checksum: nil,
+  verification_state: 3,
+  verification_failure: "Error during verification: undefined method 'underscore' for NilClass:Class"
+).pluck(:upload_id)
+
+uploads_found = 0
+uploads.each do |upload_id|
+  u = Upload.find(upload_id)
+rescue => e
+  # `u` is not assigned when the lookup fails, so report the ID instead
+  puts "checking upload #{upload_id} failed with #{e.message}"
+else
+  uploads_found += 1
+  p u ### allow verification before destroy
+  # p u.destroy! ### uncomment to actually destroy
+end
+p "#{uploads_found} uploads were found; they are destroyed only if the destroy line is uncommented."
+```
## HTTP response code errors
diff --git a/doc/administration/geo/replication/upgrading_the_geo_sites.md b/doc/administration/geo/replication/upgrading_the_geo_sites.md
index ce0ad736071..6f02ef29f99 100644
--- a/doc/administration/geo/replication/upgrading_the_geo_sites.md
+++ b/doc/administration/geo/replication/upgrading_the_geo_sites.md
@@ -11,6 +11,8 @@ WARNING:
Read these sections carefully before updating your Geo sites. Not following
version-specific upgrade steps may result in unexpected downtime. If you have
any specific questions, [contact Support](https://about.gitlab.com/support/#contact-support).
+A database major version upgrade requires [re-initializing the PostgreSQL replication](https://docs.gitlab.com/omnibus/settings/database.html#upgrading-a-geo-instance)
+to Geo secondaries. This might result in longer than expected downtime.
Upgrading Geo sites involves performing:
diff --git a/doc/administration/geo/setup/database.md b/doc/administration/geo/setup/database.md
index d94c44a76f2..471bae72c5b 100644
--- a/doc/administration/geo/setup/database.md
+++ b/doc/administration/geo/setup/database.md
@@ -911,13 +911,20 @@ For each node running a Patroni instance on the secondary site:
- If you are configuring a Patroni standby cluster on a site that previously had a working Patroni cluster:
- ```shell
- gitlab-ctl stop patroni
- rm -rf /var/opt/gitlab/postgresql/data
- /opt/gitlab/embedded/bin/patronictl -c /var/opt/gitlab/patroni/patroni.yaml remove postgresql-ha
- gitlab-ctl reconfigure
- gitlab-ctl start patroni
- ```
+ 1. Stop Patroni on all nodes that are managed by Patroni, including cascade replicas:
+
+ ```shell
+ gitlab-ctl stop patroni
+ ```
+
+ 1. Run the following on the leader Patroni node to recreate the standby cluster:
+
+ ```shell
+ rm -rf /var/opt/gitlab/postgresql/data
+ /opt/gitlab/embedded/bin/patronictl -c /var/opt/gitlab/patroni/patroni.yaml remove postgresql-ha
+ gitlab-ctl reconfigure
+ gitlab-ctl start patroni
+ ```
### Migrating a single tracking database node to Patroni
diff --git a/doc/administration/geo/setup/index.md b/doc/administration/geo/setup/index.md
index cb318783128..ea3bb5afc24 100644
--- a/doc/administration/geo/setup/index.md
+++ b/doc/administration/geo/setup/index.md
@@ -28,13 +28,8 @@ a single-node Geo site or a multi-node Geo site.
### Single-node Geo sites
-If both Geo sites are based on the [1K reference architecture](../../reference_architectures/1k_users.md):
-
-1. Set up the database replication based on your choice of PostgreSQL instances (`primary (read-write) <-> secondary (read-only)` topology):
- - [Using Linux package PostgreSQL instances](database.md) .
- - [Using external PostgreSQL instances](external_database.md)
-1. [Configure GitLab](../replication/configuration.md) to set the **primary** and **secondary** sites.
-1. Follow the [Using a Geo Site](../replication/usage.md) guide.
+If both Geo sites are based on the [1K reference architecture](../../reference_architectures/1k_users.md), follow
+[Set up Geo for two single-node sites](two_single_node_sites.md).
Depending on your GitLab deployment, [additional configuration](#additional-configuration) for LDAP, object storage, and the Container Registry might be required.
@@ -45,6 +40,16 @@ If one or more of your sites is using the [2K reference architecture](../../refe
Depending on your GitLab deployment, [additional configuration](#additional-configuration) for LDAP, object storage, and the Container Registry might be required.
+### General steps for reference
+
+1. Set up the database replication based on your choice of PostgreSQL instances (`primary (read-write) <-> secondary (read-only)` topology):
+ - [Using Linux package PostgreSQL instances](database.md) .
+ - [Using external PostgreSQL instances](external_database.md)
+1. [Configure GitLab](../replication/configuration.md) to set the **primary** and **secondary** sites.
+1. Follow the [Using a Geo Site](../replication/usage.md) guide.
+
+Depending on your GitLab deployment, [additional configuration](#additional-configuration) for LDAP, object storage, and the Container Registry might be required.
+
### Additional configuration
Depending on how you use GitLab, the following configuration might be required:
diff --git a/doc/administration/geo/setup/two_single_node_external_services.md b/doc/administration/geo/setup/two_single_node_external_services.md
new file mode 100644
index 00000000000..405a791fedc
--- /dev/null
+++ b/doc/administration/geo/setup/two_single_node_external_services.md
@@ -0,0 +1,493 @@
+---
+stage: Systems
+group: Geo
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Set up Geo for two single-node sites (with external PostgreSQL services) **(PREMIUM SELF)**
+
+The following guide provides concise instructions on how to deploy GitLab Geo for two single-node sites, using two Linux package instances and external PostgreSQL databases like RDS, Azure Database, or Google Cloud SQL.
+
+Prerequisites:
+
+- You have at least two independently working GitLab sites.
+ To create the sites, see the [GitLab reference architectures documentation](../../reference_architectures/index.md).
+ - One GitLab site serves as the **Geo primary site**. You can use different reference architecture sizes for each Geo site. If you already have a working GitLab instance, you can use it as the primary site.
+ - The second GitLab site serves as the **Geo secondary site**. Geo supports multiple secondary sites.
+- The Geo primary site has at least a [GitLab Premium](https://about.gitlab.com/pricing/) license.
+ You need only one license for all sites.
+- Confirm all sites meet the [requirements for running Geo](../index.md#requirements-for-running-geo).
+
+## Set up Geo for Linux package (Omnibus)
+
+Prerequisites:
+
+- You use PostgreSQL 12 or later,
+ which includes the [`pg_basebackup` tool](https://www.postgresql.org/docs/12/app-pgbasebackup.html).
+
+### Configure the primary site
+
+1. SSH into your GitLab primary site and sign in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Add a unique Geo site name to `/etc/gitlab/gitlab.rb`:
+
+ ```ruby
+ ##
+ ## The unique identifier for the Geo site. See
+ ## https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html#common-settings
+ ##
+ gitlab_rails['geo_node_name'] = '<site_name_here>'
+ ```
+
+1. To apply the change, reconfigure the primary site:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+1. Define the site as your primary Geo site:
+
+ ```shell
+ gitlab-ctl set-geo-primary-node
+ ```
+
+ This command uses the `external_url` defined in `/etc/gitlab/gitlab.rb`.
+
+### Configure the external database to be replicated
+
+To set up an external database, you can either:
+
+- Set up [streaming replication](https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION-SLOTS) yourself (for example Amazon RDS, or bare metal not managed by the Linux package).
+- Manually perform the configuration of your Linux package installations as follows.
+
+#### Leverage your cloud provider's tools to replicate the primary database
+
+If your primary site is set up on AWS EC2 and uses RDS,
+you can create a read-only replica in a different region, and AWS manages the
+replication process. Make sure you've set the Network ACL (Access Control List), Subnet, and Security Group according to your needs, so the secondary Rails nodes can access the database.
+
+The following instructions detail how to create a read-only replica for common
+cloud providers:
+
+- Amazon RDS - [Creating a Read Replica](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Create)
+- Azure Database for PostgreSQL - [Create and manage read replicas in Azure Database for PostgreSQL](https://learn.microsoft.com/en-us/azure/postgresql/single-server/how-to-read-replicas-portal)
+- Google Cloud SQL - [Creating read replicas](https://cloud.google.com/sql/docs/postgres/replication/create-replica)
+
+When your read-only replica is set up, you can skip to [configure your secondary site](#configure-the-secondary-site-to-use-the-external-read-replica).
+
+### Configure the secondary site to use the external read-replica
+
+With Linux package installations, the
+[`geo_secondary_role`](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
+has three main functions:
+
+1. Configure the replica database.
+1. Configure the tracking database.
+1. Enable the [Geo Log Cursor](../index.md#geo-log-cursor).
+
+To configure the connection to the external read-replica database:
+
+1. SSH into each **Rails, Sidekiq, and Geo Log Cursor** node on your **secondary** site and sign in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add the following:
+
+ ```ruby
+ ##
+ ## Geo Secondary role
+ ## - configure dependent flags automatically to enable Geo
+ ##
+ roles ['geo_secondary_role']
+
+ # note this is shared between both databases,
+ # make sure you define the same password in both
+ gitlab_rails['db_password'] = '<your_password_here>'
+
+ gitlab_rails['db_username'] = 'gitlab'
+ gitlab_rails['db_host'] = '<database_read_replica_host>'
+
+ # Disable the bundled Omnibus PostgreSQL, since we are
+ # using an external PostgreSQL
+ postgresql['enable'] = false
+ ```
+
+1. Save the file and reconfigure GitLab:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+If you have connectivity issues with your replica database, you can [check TCP connectivity](../../raketasks/maintenance.md) from your server with the following command:
+
+```shell
+gitlab-rake gitlab:tcp_check[<replica FQDN>,5432]
+```
+
+If this step fails, you might be using the wrong IP address, or a firewall might
+be preventing access to the site. Check the IP address, paying close
+attention to the difference between public and private addresses.
+If a firewall is present, ensure the secondary site is allowed to connect to the
+replica database on port 5432.
+
+#### Manually replicate secret GitLab values
+
+GitLab stores a number of secret values in `/etc/gitlab/gitlab-secrets.json`.
+This JSON file must be the same across each of the site nodes.
+You must manually replicate the secret file across all of your secondary sites, although
+[issue 3789](https://gitlab.com/gitlab-org/gitlab/-/issues/3789) proposes to change this behavior.
+
+1. SSH into a Rails node on your primary site, and execute the command below:
+
+ ```shell
+ sudo cat /etc/gitlab/gitlab-secrets.json
+ ```
+
+ This displays the secrets you must replicate, in JSON format.
+
+1. SSH into each node on your secondary Geo site and sign in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Make a backup of any existing secrets:
+
+ ```shell
+ mv /etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab-secrets.json.`date +%F`
+ ```
+
+1. Copy `/etc/gitlab/gitlab-secrets.json` from the primary site Rails node to each secondary site node.
+ You can also copy-and-paste the file contents between nodes:
+
+ ```shell
+ sudo editor /etc/gitlab/gitlab-secrets.json
+
+ # paste the output of the `cat` command you ran on the primary
+ # save and exit
+ ```
+
+1. Ensure the file permissions are correct:
+
+ ```shell
+ chown root:root /etc/gitlab/gitlab-secrets.json
+ chmod 0600 /etc/gitlab/gitlab-secrets.json
+ ```
+
+1. To apply the changes, reconfigure every Rails, Sidekiq, and Gitaly secondary site node:
+
+ ```shell
+ gitlab-ctl reconfigure
+ gitlab-ctl restart
+ ```
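+
+Because the secrets file must be identical on every node, comparing checksums can confirm that the copy succeeded.
+The following is a minimal sketch, assuming `sha256sum` from GNU coreutils is available on each node.
+The `secrets_digest` helper is hypothetical and is not part of GitLab.
+
+```shell
+# Hypothetical helper: print only the SHA-256 digest of the given file.
+secrets_digest() {
+  sha256sum "$1" | cut -d ' ' -f 1
+}
+
+# Run as root on the primary site Rails node and on every secondary site node.
+# The printed digests should be the same on all nodes:
+# secrets_digest /etc/gitlab/gitlab-secrets.json
+```
+
+If a digest differs on any node, copy the file to that node again before you reconfigure it.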
+
+#### Manually replicate the primary site SSH host keys
+
+1. SSH into each node on your secondary site and sign in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Back up any existing SSH host keys:
+
+ ```shell
+ find /etc/ssh -iname 'ssh_host_*' -exec cp {} {}.backup.`date +%F` \;
+ ```
+
+1. Copy OpenSSH host keys from the primary site.
+
+ - If you can access as root one of the primary site nodes serving SSH traffic (usually, the main GitLab Rails application nodes):
+
+ ```shell
+ # Run this from the secondary site, change `<primary_site_fqdn>` for the IP or FQDN of the server
+ scp root@<primary_site_fqdn>:/etc/ssh/ssh_host_*_key* /etc/ssh
+ ```
+
+ - If you only have access through a user with `sudo` privileges:
+
+ ```shell
+ # Run this from the node on your primary site:
+ sudo tar --transform 's/.*\///g' -zcvf ~/geo-host-key.tar.gz /etc/ssh/ssh_host_*_key*
+
+ # Run this on each node on your secondary site:
+ scp <user_with_sudo>@<primary_site_fqdn>:geo-host-key.tar.gz .
+ tar zxvf ~/geo-host-key.tar.gz -C /etc/ssh
+ ```
+
+1. For each secondary site node, ensure the file permissions are correct:
+
+ ```shell
+ chown root:root /etc/ssh/ssh_host_*_key*
+ chmod 0600 /etc/ssh/ssh_host_*_key
+ ```
+
+1. To verify that the key fingerprints match, run the following command on a node on the primary site and on each node on the secondary site:
+
+ ```shell
+ for file in /etc/ssh/ssh_host_*_key; do ssh-keygen -lf $file; done
+ ```
+
+ You should get an output similar to the following:
+
+ ```shell
+ 1024 SHA256:FEZX2jQa2bcsd/fn/uxBzxhKdx4Imc4raXrHwsbtP0M root@serverhostname (DSA)
+ 256 SHA256:uw98R35Uf+fYEQ/UnJD9Br4NXUFPv7JAUln5uHlgSeY root@serverhostname (ECDSA)
+ 256 SHA256:sqOUWcraZQKd89y/QQv/iynPTOGQxcOTIXU/LsoPmnM root@serverhostname (ED25519)
+ 2048 SHA256:qwa+rgir2Oy86QI+PZi/QVR+MSmrdrpsuH7YyKknC+s root@serverhostname (RSA)
+ ```
+
+ The output should be identical on both nodes.
+
+1. Verify you have the correct public keys for the existing private keys:
+
+ ```shell
+ # This will print the fingerprint for private keys:
+ for file in /etc/ssh/ssh_host_*_key; do ssh-keygen -lf $file; done
+
+ # This will print the fingerprint for public keys:
+ for file in /etc/ssh/ssh_host_*_key.pub; do ssh-keygen -lf $file; done
+ ```
+
+ The public key and private key commands should print the same fingerprints.
+
+1. For each secondary site node, reload `sshd`:
+
+ ```shell
+ # Debian or Ubuntu installations
+ sudo service ssh reload
+
+ # CentOS installations
+ sudo service sshd reload
+ ```
+
+1. To verify SSH is still functional, from a new terminal, SSH into your GitLab secondary server.
+ If you can't connect, make sure you have the correct permissions.
+
+#### Set up fast lookup of authorized SSH keys
+
+After the replication process is complete, you need to [configure fast lookup of authorized SSH keys](../../operations/fast_ssh_key_lookup.md).
+
+NOTE:
+Authentication is handled by the primary site. Don't set up custom authentication for the secondary site.
+Any change that requires access to the Admin Area should be made in the primary site, because the
+secondary site is a read-only copy.
+
+#### Add the secondary site
+
+1. SSH into each Rails and Sidekiq node on your secondary site and sign in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` and add a unique name for your site.
+
+ ```ruby
+ ##
+ ## The unique identifier for the Geo site. See
+ ## https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html#common-settings
+ ##
+ gitlab_rails['geo_node_name'] = '<secondary_site_name_here>'
+ ```
+
+ Save the unique name for the next steps.
+
+1. To apply the changes, reconfigure each Rails and Sidekiq node on your secondary site.
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+1. Go to the primary node GitLab instance:
+ 1. On the left sidebar, select **Search or go to**.
+ 1. Select **Admin Area**.
+ 1. Select **Geo > Sites**.
+ 1. Select **Add site**.
+
+ ![Add secondary site](../replication/img/adding_a_secondary_v15_8.png)
+
+ 1. In **Name**, enter the value for `gitlab_rails['geo_node_name']` in
+ `/etc/gitlab/gitlab.rb`. The values must match exactly.
+ 1. In **External URL**, enter the value for `external_url` in `/etc/gitlab/gitlab.rb`.
+ It's okay if one value ends in `/` and the other doesn't. Otherwise, the values must
+ match exactly.
+ 1. Optional. In **Internal URL (optional)**, enter an internal URL for the primary site.
+ 1. Optional. Select which groups or storage shards should be replicated by the
+ secondary site. To replicate all, leave the field blank. See [selective synchronization](../replication/configuration.md#selective-synchronization).
+ 1. Select **Save changes**.
+1. SSH into each Rails and Sidekiq node on your secondary site and restart the services:
+
+ ```shell
+ sudo gitlab-ctl restart
+ ```
+
+1. Check if there are any common issues with your Geo setup by running:
+
+ ```shell
+ sudo gitlab-rake gitlab:geo:check
+ ```
+
+ If any of the checks fail, see the [troubleshooting documentation](../replication/troubleshooting.md).
+
+1. To verify that the secondary site is reachable, SSH into a Rails or Sidekiq server on your primary site and run:
+
+ ```shell
+ sudo gitlab-rake gitlab:geo:check
+ ```
+
+ If any of the checks fail, check the [troubleshooting documentation](../replication/troubleshooting.md).
+
+After the secondary site is added to the Geo administration page and restarted,
+the site automatically starts to replicate missing data from the primary site
+in a process known as backfill.
+
+Meanwhile, the primary site starts to notify each secondary site of any changes, so
+that the secondary site can act on the notifications immediately.
+
+Be sure the secondary site is running and accessible. You can sign in to the
+secondary site with the same credentials as were used with the primary site.
+
+#### Enable Git access over HTTP/HTTPS and SSH
+
+Geo synchronizes repositories over HTTP/HTTPS (enabled by default for new installations),
+and therefore requires this clone method to be enabled.
+If you convert an existing site to Geo, you should check that the clone method is enabled.
+
+On the primary site:
+
+1. On the left sidebar, select **Search or go to**.
+1. Select **Admin Area**.
+1. Select **Settings > General**.
+1. Expand **Visibility and access controls**.
+1. If you use Git over SSH:
+ 1. Ensure **Enabled Git access protocols** is set to **Both SSH and HTTP(S)**.
+ 1. Enable the [fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md) on both the primary and secondary sites.
+1. If you don't use Git over SSH, set **Enabled Git access protocols** to **Only HTTP(S)**.
+
+#### Verify proper functioning of the secondary site
+
+You can sign in to the secondary site with the same credentials you used with
+the primary site.
+
+After you sign in:
+
+1. On the left sidebar, select **Search or go to**.
+1. Select **Admin Area**.
+1. Select **Geo > Sites**.
+1. Verify that the site is correctly identified as a secondary Geo site, and that
+ Geo is enabled.
+
+The initial replication might take some time.
+You can monitor the synchronization process on each Geo site from the primary
+site **Geo Sites** dashboard in your browser.
+
+![Geo dashboard](../replication/img/geo_dashboard_v14_0.png)
+
+## Configure the tracking database
+
+NOTE:
+This step is optional. Follow it only if you also want to host your tracking database externally on another server.
+
+**Secondary** sites use a separate PostgreSQL installation as a tracking
+database to keep track of replication status and automatically recover from
+potential replication issues. The Linux package automatically configures a tracking database
+when `roles ['geo_secondary_role']` is set.
+If you want to run this database external to your Linux package installation, use the following instructions.
+
+### Cloud-managed database services
+
+If you are using a cloud-managed service for the tracking database, you may need
+to grant additional roles to your tracking database user (by default, this is
+`gitlab_geo`):
+
+- Amazon RDS requires the [`rds_superuser`](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.Roles) role.
+- Azure Database for PostgreSQL requires the [`azure_pg_admin`](https://learn.microsoft.com/en-us/azure/postgresql/single-server/how-to-create-users#how-to-create-additional-admin-users-in-azure-database-for-postgresql) role.
+- Google Cloud SQL requires the [`cloudsqlsuperuser`](https://cloud.google.com/sql/docs/postgres/users#default-users) role.
+
+Additional roles are needed for the installation of extensions during installation and upgrades. As an alternative,
+[ensure the extensions are installed manually, and read about the problems that may arise during future GitLab upgrades](../../../install/postgresql_extensions.md).
+
+NOTE:
+If you want to use Amazon RDS as a tracking database, make sure it has access to
+the secondary database. Unfortunately, just assigning the same security group is not enough as
+outbound rules do not apply to RDS PostgreSQL databases. Therefore, you need to explicitly add an inbound
+rule to the read-replica's security group allowing any TCP traffic from
+the tracking database on port 5432.
+
+### Create the tracking database
+
+Create and configure the tracking database in your PostgreSQL instance:
+
+1. Set up PostgreSQL according to the
+ [database requirements document](../../../install/requirements.md#database).
+1. Set up a `gitlab_geo` user with a password of your choice, create the `gitlabhq_geo_production` database, and make the user an owner of the database.
+ You can see an example of this setup in the [self-compiled installation documentation](../../../install/installation.md#7-database).
+1. If you are **not** using a cloud-managed PostgreSQL database, ensure that your secondary
+ site can communicate with your tracking database by manually changing the
+ `pg_hba.conf` that is associated with your tracking database.
+ Remember to restart PostgreSQL afterwards for the changes to take effect:
+
+ ```plaintext
+ ##
+ ## Geo Tracking Database Role
+ ## - pg_hba.conf
+ ##
+ host all all <trusted tracking IP>/32 md5
+ host all all <trusted secondary IP>/32 md5
+ ```
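+
+The user and database from step 2 could be created with SQL along the following lines.
+This is a minimal sketch, assuming a self-managed PostgreSQL instance where the `postgres` operating system user can run `psql`.
+The `tracking_db_sql` helper is hypothetical and is not part of GitLab or PostgreSQL. Cloud-managed services usually need their own administrative user instead.
+
+```shell
+# Hypothetical helper: print the SQL that creates the tracking database user and database.
+# Single quotes in the password are doubled so the SQL string literal stays valid.
+tracking_db_sql() {
+  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
+  printf "CREATE USER gitlab_geo WITH PASSWORD '%s';\n" "$escaped"
+  printf 'CREATE DATABASE gitlabhq_geo_production OWNER gitlab_geo;\n'
+}
+
+# Assumed usage on the PostgreSQL host (not run by this snippet):
+# tracking_db_sql '<your_password_here>' | sudo -u postgres psql
+```
+
+Printing the statements first lets you review them before piping them to `psql`.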
+
+### Configure GitLab
+
+Configure GitLab to use this database. These steps are for Linux package and Docker deployments.
+
+1. SSH into a GitLab **secondary** server and sign in as root:
+
+ ```shell
+ sudo -i
+ ```
+
+1. Edit `/etc/gitlab/gitlab.rb` with the connection parameters and credentials for
+ the machine with the PostgreSQL instance:
+
+ ```ruby
+ geo_secondary['db_username'] = 'gitlab_geo'
+ geo_secondary['db_password'] = '<your_password_here>'
+
+ geo_secondary['db_host'] = '<tracking_database_host>'
+ geo_secondary['db_port'] = <tracking_database_port> # change to the correct port
+ geo_postgresql['enable'] = false # don't use internal managed instance
+ ```
+
+1. Save the file and reconfigure GitLab:
+
+ ```shell
+ gitlab-ctl reconfigure
+ ```
+
+#### Manually set up the database schema (optional)
+
+The reconfigure in the [steps above](#configure-gitlab) handles these steps automatically. These steps are provided in case something went wrong.
+
+1. Create the database schema. This task requires the database user to be a superuser:
+
+ ```shell
+ sudo gitlab-rake db:create:geo
+ ```
+
+1. Applying Rails database migrations (schema and data updates) is also performed by reconfigure. If `geo_secondary['auto_migrate'] = false` is set, or
+ the schema was created manually, this step is required:
+
+ ```shell
+ sudo gitlab-rake db:migrate:geo
+ ```
+
+## Troubleshooting
+
+See [troubleshooting Geo](../replication/troubleshooting.md).