Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication/troubleshooting.md')
-rw-r--r--doc/administration/geo/replication/troubleshooting.md29
1 files changed, 26 insertions, 3 deletions
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 59a67fecfcd..a8736b3ed1d 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -42,7 +42,7 @@ to help identify if something is wrong:
![Geo health check](img/geo_site_health_v14_0.png)
-A site shows as "Unhealthy" if the site's status is more than 10 minutes old. In that case, try running the following in the [Rails console](../../operations/rails_console.md) on the affected site:
+A site shows as "Unhealthy" if the site's status is more than 10 minutes old. In that case, try running the following in the [Rails console](../../operations/rails_console.md) on the affected secondary site:
```ruby
Geo::MetricsUpdateWorker.new.perform
@@ -52,7 +52,7 @@ If it raises an error, then the error is probably also preventing the jobs from
If it successfully updates the status, then something may be wrong with Sidekiq. Is it running? Do the logs show errors? This job is supposed to be enqueued every minute. It takes an exclusive lease in Redis to ensure that only one of these jobs can run at a time. The primary site updates its status directly in the PostgreSQL database. Secondary sites send an HTTP Post request to the primary site with their status data.
-A site also shows as "Unhealthy" if certain health checks fail. You can reveal the failure by running the following in the [Rails console](../../operations/rails_console.md) on the affected site:
+A site also shows as "Unhealthy" if certain health checks fail. You can reveal the failure by running the following in the [Rails console](../../operations/rails_console.md) on the affected secondary site:
```ruby
Gitlab::Geo::HealthCheck.new.perform_checks
@@ -718,6 +718,8 @@ To solve this:
### Very large repositories never successfully synchronize on the **secondary** site
+#### GitLab 10.1 and earlier
+
GitLab places a timeout on all repository clones, including project imports
and Geo synchronization operations. If a fresh `git clone` of a repository
on the **primary** takes more than the default three hours, you may be affected by this.
@@ -740,6 +742,10 @@ add the following line to `/etc/gitlab/gitlab.rb`:
This increases the timeout to four hours (14400 seconds). Choose a time
long enough to accommodate a full clone of your largest repositories.
+#### GitLab 10.2 and later
+
+Geo [replicates Git repositories over HTTPS](../index.md#how-it-works). GitLab does not place a timeout on these requests. If a Git repository is failing to replicate, with a consistent job execution time, then you should check for timeouts applied by external components such as load balancers.
+
### New LFS objects are never replicated
If new LFS objects are never replicated to secondary Geo sites, check the version of
@@ -959,7 +965,7 @@ This behavior affects only the following data types through GitLab 14.6:
| Package Registry | 13.10 |
| CI Pipeline Artifacts | 13.11 |
| Terraform State Versions | 13.12 |
-| Infrastructure Registry | 14.0 |
+| Infrastructure Registry (renamed to Terraform Module Registry in GitLab 15.11) | 14.0 |
| External MR diffs | 14.6 |
| LFS Objects | 14.6 |
| Pages Deployments | 14.6 |
@@ -1373,6 +1379,23 @@ The bug causes all wildcard domains (`.example.com`) to be ignored except for th
You can have only one wildcard domain in the `no_proxy` list.
+### Secondary site shows "Unhealthy" in UI after changing the value of `external_url` for the primary site
+
+If you have updated the value of `external_url` in `/etc/gitlab/gitlab.rb` for the primary site or changed the protocol from `http` to `https`, you may see that secondary sites are shown as `Unhealthy`. You may also find the following error in `geo.log`:
+
+```plaintext
+"class": "Geo::NodeStatusRequestService",
+...
+"message": "Failed to Net::HTTP::Post to primary url: http://primary-site.gitlab.tld/api/v4/geo/status",
+ "error": "Failed to open TCP connection to <PRIMARY_IP_ADDRESS>:80 (Connection refused - connect(2) for \"<PRIMARY_ID_ADDRESS>\" port 80)"
+```
+
+In this case, make sure to update the changed URL on all your sites:
+
+1. On the top bar, select **Main menu > Admin**.
+1. On the left sidebar, select **Admin > Geo > Sites**.
+1. Change the URL and save the change.
+
## Fixing non-PostgreSQL replication failures
If you notice replication failures in `Admin > Geo > Sites` or the [Sync status Rake task](#sync-status-rake-task), you can try to resolve the failures with the following general steps: