Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/geo/replication/database.md')
-rw-r--r--doc/administration/geo/replication/database.md496
1 files changed, 496 insertions, 0 deletions
diff --git a/doc/administration/geo/replication/database.md b/doc/administration/geo/replication/database.md
new file mode 100644
index 00000000000..10e5409124c
--- /dev/null
+++ b/doc/administration/geo/replication/database.md
@@ -0,0 +1,496 @@
+# Geo database replication (GitLab Omnibus)
+
+NOTE: **Note:**
+This is the documentation for the Omnibus GitLab packages. For installations
+from source, follow the
+[Geo database replication (source)](database_source.md) guide.
+
+NOTE: **Note:**
+If your GitLab installation uses external (not managed by Omnibus) PostgreSQL
+instances, the Omnibus roles will not be able to perform all necessary
+configuration steps. In this case, refer to
+[additional instructions](external_database.md).
+
+NOTE: **Note:**
+The stages of the setup process must be completed in the documented order.
+Before attempting the steps in this stage, [complete all prior stages][toc].
+
+This document describes the minimal steps you have to take in order to
+replicate your **primary** GitLab database to a **secondary** node's database. You may
+have to change some values according to your database setup, how big it is, etc.
+
+You are encouraged to first read through all the steps before executing them
+in your testing/production environment.
+
+## PostgreSQL replication
+
+The GitLab **primary** node where the write operations happen will connect to
+the **primary** database server, and **secondary** nodes will
+connect to their own database servers (which are also read-only).
+
+NOTE: **Note:**
+In database documentation, you may see "**primary**" being referenced as "master"
+and "**secondary**" as either "slave" or "standby" server (read-only).
+
+We recommend using [PostgreSQL replication slots][replication-slots-article]
+to ensure that the **primary** node retains all the data necessary for the **secondary** nodes to
+recover. See below for more details.
+
+The following guide assumes that:
+
+- You are using Omnibus and therefore you are using PostgreSQL 9.6 or later
+ which includes the [`pg_basebackup` tool][pgback] and improved
+ [Foreign Data Wrapper][FDW] support.
+- You have a **primary** node already set up (the GitLab server you are
+ replicating from), running Omnibus' PostgreSQL (or equivalent version), and
+ you have a new **secondary** server set up with the same versions of the OS,
+ PostgreSQL, and GitLab on all nodes.
+- The IP of the **primary** server for our examples is `198.51.100.1`, whereas the
+ **secondary** node's IP is `198.51.100.2`. Note that the **primary** and **secondary** servers
+ **must** be able to communicate over these addresses. More on this in the
+ guide below.
+
+CAUTION: **Warning:**
+Geo works with streaming replication. Logical replication is not supported at this time.
+There is an [issue where support is being discussed](https://gitlab.com/gitlab-org/gitlab-ee/issues/7420).
+
+### Step 1. Configure the **primary** server
+
+1. SSH into your GitLab **primary** server and login as root:
+
+ ```sh
+ sudo -i
+ ```
+
+1. Execute the command below to define the node as **primary** node:
+
+ ```sh
+ gitlab-ctl set-geo-primary-node
+ ```
+
+ This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.
+
+1. GitLab 10.4 and up only: Do the following to make sure the `gitlab` database user has a password defined:
+
+ Generate a MD5 hash of the desired password:
+
+ ```sh
+ gitlab-ctl pg-password-md5 gitlab
+ # Enter password: mypassword
+ # Confirm password: mypassword
+ # fca0b89a972d69f00eb3ec98a5838484
+ ```
+
+ Edit `/etc/gitlab/gitlab.rb`:
+
+ ```ruby
+ # Fill with the hash generated by `gitlab-ctl pg-password-md5 gitlab`
+ postgresql['sql_user_password'] = 'fca0b89a972d69f00eb3ec98a5838484'
+
+ # Every node that runs Unicorn or Sidekiq needs to have the database
+ # password specified as below. If you have a high-availability setup, this
+ # must be present in all application nodes.
+ gitlab_rails['db_password'] = 'mypassword'
+ ```
+
+1. Omnibus GitLab already has a [replication user]
+ called `gitlab_replicator`. You must set the password for this user manually.
+ You will be prompted to enter a password:
+
+ ```sh
+ gitlab-ctl set-replication-password
+ ```
+
+ This command will also read the `postgresql['sql_replication_user']` Omnibus
+ setting in case you have changed `gitlab_replicator` username to something
+ else.
+
+ If you are using an external database not managed by Omnibus GitLab, you need
+ to create the replicator user and define a password to it manually.
+ For information on how to create a replication user, refer to the
+ [appropriate step](database_source.md#step-1-configure-the-primary-server)
+ in [Geo database replication (source)](database_source.md).
+
+1. Configure PostgreSQL to listen on network interfaces:
+
+ For security reasons, PostgreSQL does not listen on any network interfaces
+ by default. However, Geo requires the **secondary** node to be able to
+ connect to the **primary** node's database. For this reason, we need the address of
+ each node. Note: For external PostgreSQL instances, see [additional instructions](external_database.md).
+
+ If you are using a cloud provider, you can lookup the addresses for each
+ Geo node through your cloud provider's management console.
+
+ To lookup the address of a Geo node, SSH in to the Geo node and execute:
+
+ ```sh
+ ##
+ ## Private address
+ ##
+ ip route get 255.255.255.255 | awk '{print "Private address:", $NF; exit}'
+
+ ##
+ ## Public address
+ ##
+ echo "External address: $(curl --silent ipinfo.io/ip)"
+ ```
+
+ In most cases, the following addresses will be used to configure GitLab
+ Geo:
+
+ | Configuration | Address |
+ |:----------------------------------------|:------------------------------------------------------|
+ | `postgresql['listen_address']` | **Primary** node's public or VPC private address. |
+ | `postgresql['md5_auth_cidr_addresses']` | **Secondary** node's public or VPC private addresses. |
+
+ If you are using Google Cloud Platform, SoftLayer, or any other vendor that
+ provides a virtual private cloud (VPC) you can use the **secondary** node's private
+ address (corresponds to "internal address" for Google Cloud Platform) for
+ `postgresql['md5_auth_cidr_addresses']` and `postgresql['listen_address']`.
+
+ The `listen_address` option opens PostgreSQL up to network connections
+ with the interface corresponding to the given address. See [the PostgreSQL
+ documentation][pg-docs-runtime-conn] for more details.
+
+ Depending on your network configuration, the suggested addresses may not
+ be correct. If your **primary** node and **secondary** nodes connect over a local
+ area network, or a virtual network connecting availability zones like
+ [Amazon's VPC](https://aws.amazon.com/vpc/) or [Google's VPC](https://cloud.google.com/vpc/)
+ you should use the **secondary** node's private address for `postgresql['md5_auth_cidr_addresses']`.
+
+ Edit `/etc/gitlab/gitlab.rb` and add the following, replacing the IP
+ addresses with addresses appropriate to your network configuration:
+
+ ```ruby
+ ##
+ ## Geo Primary role
+ ## - configure dependent flags automatically to enable Geo
+ ##
+ roles ['geo_primary_role']
+
+ ##
+ ## Primary address
+ ## - replace '198.51.100.1' with the public or VPC address of your Geo primary node
+ ##
+ postgresql['listen_address'] = '198.51.100.1'
+
+ ##
+ # Primary and Secondary addresses
+ # - replace '198.51.100.1' with the public or VPC address of your Geo primary node
+ # - replace '198.51.100.2' with the public or VPC address of your Geo secondary node
+ ##
+ postgresql['md5_auth_cidr_addresses'] = ['198.51.100.1/32','198.51.100.2/32']
+
+ ##
+ ## Replication settings
+ ## - set this to be the number of Geo secondary nodes you have
+ ##
+ postgresql['max_replication_slots'] = 1
+ # postgresql['max_wal_senders'] = 10
+ # postgresql['wal_keep_segments'] = 10
+
+ ##
+ ## Disable automatic database migrations temporarily
+ ## (until PostgreSQL is restarted and listening on the private address).
+ ##
+ gitlab_rails['auto_migrate'] = false
+ ```
+
+1. Optional: If you want to add another **secondary** node, the relevant setting would look like:
+
+ ```ruby
+ postgresql['md5_auth_cidr_addresses'] = ['198.51.100.1/32', '198.51.100.2/32','198.51.100.3/32']
+ ```
+
+ You may also want to edit the `wal_keep_segments` and `max_wal_senders` to
+ match your database replication requirements. Consult the [PostgreSQL -
+ Replication documentation][pg-docs-runtime-replication]
+ for more information.
+
+1. Save the file and reconfigure GitLab for the database listen changes and
+ the replication slot changes to be applied:
+
+ ```sh
+ gitlab-ctl reconfigure
+ ```
+
+ Restart PostgreSQL for its changes to take effect:
+
+ ```sh
+ gitlab-ctl restart postgresql
+ ```
+
+1. Re-enable migrations now that PostgreSQL is restarted and listening on the
+ private address.
+
+ Edit `/etc/gitlab/gitlab.rb` and **change** the configuration to `true`:
+
+ ```ruby
+ gitlab_rails['auto_migrate'] = true
+ ```
+
+ Save the file and reconfigure GitLab:
+
+ ```sh
+ gitlab-ctl reconfigure
+ ```
+
+1. Now that the PostgreSQL server is set up to accept remote connections, run
+ `netstat -plnt | grep 5432` to make sure that PostgreSQL is listening on port
+ `5432` to the **primary** server's private address.
+
+1. A certificate was automatically generated when GitLab was reconfigured. This
+ will be used automatically to protect your PostgreSQL traffic from
+ eavesdroppers, but to protect against active ("man-in-the-middle") attackers,
+ the **secondary** node needs a copy of the certificate. Make a copy of the PostgreSQL
+ `server.crt` file on the **primary** node by running this command:
+
+ ```sh
+ cat ~gitlab-psql/data/server.crt
+ ```
+
+ Copy the output into a clipboard or into a local file. You
+ will need it when setting up the **secondary** node! The certificate is not sensitive
+ data.
+
+### Step 2. Configure the **secondary** server
+
+1. SSH into your GitLab **secondary** server and login as root:
+
+ ```
+ sudo -i
+ ```
+
+1. Stop application server and Sidekiq
+
+ ```
+ gitlab-ctl stop unicorn
+ gitlab-ctl stop sidekiq
+ ```
+
+ NOTE: **Note**:
+ This step is important so we don't try to execute anything before the node is fully configured.
+
+1. [Check TCP connectivity][rake-maintenance] to the **primary** node's PostgreSQL server:
+
+ ```sh
+ gitlab-rake gitlab:tcp_check[198.51.100.1,5432]
+ ```
+
+ NOTE: **Note**:
+ If this step fails, you may be using the wrong IP address, or a firewall may
+ be preventing access to the server. Check the IP address, paying close
+ attention to the difference between public and private addresses and ensure
+ that, if a firewall is present, the **secondary** node is permitted to connect to the
+ **primary** node on port 5432.
+
+1. Create a file `server.crt` in the **secondary** server, with the content you got on the last step of the **primary** node's setup:
+
+ ```
+ editor server.crt
+ ```
+
+1. Set up PostgreSQL TLS verification on the **secondary** node:
+
+ Install the `server.crt` file:
+
+ ```sh
+ install -D -o gitlab-psql -g gitlab-psql -m 0400 -T server.crt ~gitlab-psql/.postgresql/root.crt
+ ```
+
+ PostgreSQL will now only recognize that exact certificate when verifying TLS
+ connections. The certificate can only be replicated by someone with access
+ to the private key, which is **only** present on the **primary** node.
+
+1. Test that the `gitlab-psql` user can connect to the **primary** node's database:
+
+ ```sh
+ sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql --list -U gitlab_replicator -d "dbname=gitlabhq_production sslmode=verify-ca" -W -h 198.51.100.1
+ ```
+
+ When prompted enter the password you set in the first step for the
+ `gitlab_replicator` user. If all worked correctly, you should see
+ the list of **primary** node's databases.
+
+ A failure to connect here indicates that the TLS configuration is incorrect.
+ Ensure that the contents of `~gitlab-psql/data/server.crt` on the **primary** node
+ match the contents of `~gitlab-psql/.postgresql/root.crt` on the **secondary** node.
+
+1. Configure PostgreSQL to enable FDW support:
+
+ This step is similar to how we configured the **primary** instance.
+ We need to enable this, to enable FDW support, even if using a single node.
+
+ Edit `/etc/gitlab/gitlab.rb` and add the following, replacing the IP
+ addresses with addresses appropriate to your network configuration:
+
+ ```ruby
+ ##
+ ## Geo Secondary role
+ ## - configure dependent flags automatically to enable Geo
+ ##
+ roles ['geo_secondary_role']
+
+ ##
+ ## Secondary address
+ ## - replace '198.51.100.2' with the public or VPC address of your Geo secondary node
+ ##
+ postgresql['listen_address'] = '198.51.100.2'
+ postgresql['md5_auth_cidr_addresses'] = ['198.51.100.2/32']
+
+ ##
+ ## Database credentials password (defined previously in primary node)
+ ## - replicate same values here as defined in primary node
+ ##
+ postgresql['sql_user_password'] = 'fca0b89a972d69f00eb3ec98a5838484'
+ gitlab_rails['db_password'] = 'mypassword'
+
+ ##
+ ## Enable FDW support for the Geo Tracking Database (improves performance)
+ ##
+ geo_secondary['db_fdw'] = true
+ ```
+
+ For external PostgreSQL instances, see [additional instructions](external_database.md).
+ If you bring a former **primary** node back online to serve as a **secondary** node, then you also need to remove `roles ['geo_primary_role']` or `geo_primary_role['enable'] = true`.
+
+1. Reconfigure GitLab for the changes to take effect:
+
+ ```sh
+ gitlab-ctl reconfigure
+ ```
+
+1. Restart PostgreSQL for the IP change to take effect and reconfigure again:
+
+ ```sh
+ gitlab-ctl restart postgresql
+ gitlab-ctl reconfigure
+ ```
+
+ This last reconfigure will provision the FDW configuration and enable it.
+
+### Step 3. Initiate the replication process
+
+Below we provide a script that connects the database on the **secondary** node to
+the database on the **primary** node, replicates the database, and creates the
+needed files for streaming replication.
+
+The directories used are the defaults that are set up in Omnibus. If you have
+changed any defaults or are using a source installation, configure it as you
+see fit replacing the directories and paths.
+
+CAUTION: **Warning:**
+Make sure to run this on the **secondary** server as it removes all PostgreSQL's
+data before running `pg_basebackup`.
+
+1. SSH into your GitLab **secondary** server and login as root:
+
+ ```sh
+ sudo -i
+ ```
+
+1. Choose a database-friendly name to use for your **secondary** node to
+ use as the replication slot name. For example, if your domain is
+ `secondary.geo.example.com`, you may use `secondary_example` as the slot
+ name as shown in the commands below.
+
+1. Execute the command below to start a backup/restore and begin the replication
+ CAUTION: **Warning:** Each Geo **secondary** node must have its own unique replication slot name.
+ Using the same slot name between two secondaries will break PostgreSQL replication.
+
+ ```sh
+ gitlab-ctl replicate-geo-database --slot-name=secondary_example --host=198.51.100.1
+ ```
+
+ When prompted, enter the _plaintext_ password you set up for the `gitlab_replicator`
+ user in the first step.
+
+ This command also takes a number of additional options. You can use `--help`
+ to list them all, but here are a couple of tips:
+ - If PostgreSQL is listening on a non-standard port, add `--port=` as well.
+ - If your database is too large to be transferred in 30 minutes, you will need
+ to increase the timeout, e.g., `--backup-timeout=3600` if you expect the
+ initial replication to take under an hour.
+ - Pass `--sslmode=disable` to skip PostgreSQL TLS authentication altogether
+ (e.g., you know the network path is secure, or you are using a site-to-site
+ VPN). This is **not** safe over the public Internet!
+ - You can read more details about each `sslmode` in the
+ [PostgreSQL documentation][pg-docs-ssl];
+ the instructions above are carefully written to ensure protection against
+ both passive eavesdroppers and active "man-in-the-middle" attackers.
+ - Change the `--slot-name` to the name of the replication slot
+ to be used on the **primary** database. The script will attempt to create the
+ replication slot automatically if it does not exist.
+ - If you're repurposing an old server into a Geo **secondary** node, you'll need to
+ add `--force` to the command line.
+ - When not in a production machine you can disable backup step if you
+ really sure this is what you want by adding `--skip-backup`
+
+The replication process is now complete.
+
+## PGBouncer support (optional)
+
+[PGBouncer](http://pgbouncer.github.io/) may be used with GitLab Geo to pool
+PostgreSQL connections. We recommend using PGBouncer if you use GitLab in a
+high-availability configuration with a cluster of nodes supporting a Geo
+**primary** node and another cluster of nodes supporting a Geo **secondary** node. For more
+information, see the [Omnibus HA](https://docs.gitlab.com/ee/administration/high_availability/database.html#configure-using-omnibus-for-high-availability)
+documentation.
+
+For a Geo **secondary** node to work properly with PGBouncer in front of the database,
+it will need a separate read-only user to make [PostgreSQL FDW queries][FDW]
+work:
+
+1. On the **primary** Geo database, enter the PostgreSQL on the console as an
+ admin user. If you are using an Omnibus-managed database, log onto the **primary**
+ node that is running the PostgreSQL database:
+
+ ```sh
+ sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql gitlabhq_production
+ ```
+
+1. Then create the read-only user:
+
+ ```sql
+ -- NOTE: Use the password defined earlier
+ CREATE USER gitlab_geo_fdw WITH password 'mypassword';
+ GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw;
+ GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw;
+ GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw;
+ GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw;
+
+ -- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw"
+ -- automatically.
+ ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw;
+ ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw;
+ ```
+
+1. On the **secondary** nodes, change `/etc/gitlab/gitlab.rb`:
+
+ ```
+ geo_postgresql['fdw_external_user'] = 'gitlab_geo_fdw'
+ ```
+
+1. Save the file and reconfigure GitLab for the changes to be applied:
+
+ ```sh
+ gitlab-ctl reconfigure
+ ```
+
+## MySQL replication
+
+MySQL replication is not supported for Geo.
+
+## Troubleshooting
+
+Read the [troubleshooting document](troubleshooting.md).
+
+[replication-slots-article]: https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75
+[pgback]: http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
+[replication user]:https://wiki.postgresql.org/wiki/Streaming_Replication
+[FDW]: https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
+[toc]: index.md#using-omnibus-gitlab
+[rake-maintenance]: ../../raketasks/maintenance.md
+[pg-docs-ssl]: https://www.postgresql.org/docs/9.6/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION
+[pg-docs-runtime-conn]: https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html
+[pg-docs-runtime-replication]: https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html