diff options
author | GitLab Bot <gitlab-bot@gitlab.com> | 2021-11-18 16:16:36 +0300 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2021-11-18 16:16:36 +0300 |
commit | 311b0269b4eb9839fa63f80c8d7a58f32b8138a0 (patch) | |
tree | 07e7870bca8aed6d61fdcc810731c50d2c40af47 /doc/development/avoiding_downtime_in_migrations.md | |
parent | 27909cef6c4170ed9205afa7426b8d3de47cbb0c (diff) |
Add latest changes from gitlab-org/gitlab@14-5-stable-eev14.5.0-rc42
Diffstat (limited to 'doc/development/avoiding_downtime_in_migrations.md')
-rw-r--r-- | doc/development/avoiding_downtime_in_migrations.md | 176 |
1 files changed, 175 insertions, 1 deletions
diff --git a/doc/development/avoiding_downtime_in_migrations.md b/doc/development/avoiding_downtime_in_migrations.md index 9418eafa487..a5fc1909551 100644 --- a/doc/development/avoiding_downtime_in_migrations.md +++ b/doc/development/avoiding_downtime_in_migrations.md @@ -377,7 +377,181 @@ ensures that no downtime is needed. This operation does not require downtime. -## Data Migrations +## Migrating `integer` primary keys to `bigint` + +To [prevent the overflow risk](https://gitlab.com/groups/gitlab-org/-/epics/4785) for some tables +with `integer` primary key (PK), we have to migrate their PK to `bigint`. The process to do this +without downtime and causing too much load on the database is described below. + +### Initialize the conversion and start migrating existing data (release N) + +To start the process, add a regular migration to create the new `bigint` columns. Use the provided +`initialize_conversion_of_integer_to_bigint` helper. The helper also creates a database trigger +to keep in sync both columns for any new records ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072312_initialize_conversion_of_ci_stages_to_bigint.rb)): + +```ruby +class InitializeConversionOfCiStagesToBigint < ActiveRecord::Migration[6.1] + include Gitlab::Database::MigrationHelpers + + TABLE = :ci_stages + COLUMNS = %i(id) + + def up + initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS) + end + + def down + revert_initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS) + end +end +``` + +Ignore the new `bigint` columns: + +```ruby +module Ci + class Stage < Ci::ApplicationRecord + include IgnorableColumns + ignore_column :id_convert_to_bigint, remove_with: '14.2', remove_after: '2021-08-22' + end +``` + +To migrate existing data, we introduced new type of _batched background migrations_. +Unlike the classic background migrations, built on top of Sidekiq, batched background migrations +don't have to enqueue and schedule all the background jobs at the beginning. +They also have other advantages, like automatic tuning of the batch size, better progress visibility, +and collecting metrics. To start the process, use the provided `backfill_conversion_of_integer_to_bigint` +helper ([example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072346_backfill_ci_stages_for_bigint_conversion.rb)): + +```ruby +class BackfillCiStagesForBigintConversion < ActiveRecord::Migration[6.1] + include Gitlab::Database::MigrationHelpers + + TABLE = :ci_stages + COLUMNS = %i(id) + + def up + backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS) + end + + def down + revert_backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS) + end +end +``` + +### Monitor the background migration + +Check how the migration is performing while it's running. Multiple ways to do this are described below. + +#### High-level status of batched background migrations + +See how to [check the status of batched background migrations](../update/index.md#checking-for-background-migrations-before-upgrading). + +#### Query the database + +We can query the related database tables directly. Requires access to read-only replica. +Example queries: + +```sql +-- Get details for batched background migration for given table +SELECT * FROM batched_background_migrations WHERE table_name = 'namespaces'\gx + +-- Get count of batched background migration jobs by status for given table +SELECT + batched_background_migrations.id, batched_background_migration_jobs.status, COUNT(*) +FROM + batched_background_migrations + JOIN batched_background_migration_jobs ON batched_background_migrations.id = batched_background_migration_jobs.batched_background_migration_id +WHERE + table_name = 'namespaces' +GROUP BY + batched_background_migrations.id, batched_background_migration_jobs.status; + +-- Batched background migration progress for given table (based on estimated total number of tuples) +SELECT + m.table_name, + LEAST(100 * sum(j.batch_size) / pg_class.reltuples, 100) AS percentage_complete +FROM + batched_background_migrations m + JOIN batched_background_migration_jobs j ON j.batched_background_migration_id = m.id + JOIN pg_class ON pg_class.relname = m.table_name +WHERE + j.status = 3 AND m.table_name = 'namespaces' +GROUP BY m.id, pg_class.reltuples; +``` + +#### Sidekiq logs + +We can also use the Sidekiq logs to monitor the worker that executes the batched background +migrations: + +1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address. +1. Change the index pattern to `pubsub-sidekiq-inf-gprd*`. +1. Add filter for `json.queue: cronjob:database_batched_background_migration`. + +#### PostgerSQL slow queries log + +Slow queries log keeps track of low queries that took above 1 second to execute. To see them +for batched background migration: + +1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address. +1. Change the index pattern to `pubsub-postgres-inf-gprd*`. +1. Add filter for `json.endpoint_id.keyword: Database::BatchedBackgroundMigrationWorker`. +1. Optional. To see only updates, add a filter for `json.command_tag.keyword: UPDATE`. +1. Optional. To see only failed statements, add a filter for `json.error_severiry.keyword: ERROR`. +1. Optional. Add a filter by table name. + +#### Grafana dashboards + +To monitor the health of the database, use these additional metrics: + +- [PostgreSQL Tuple Statistics](https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m): if you see high rate of updates for the tables being actively converted, or increasing percentage of dead tuples for this table, it might mean that autovacuum cannot keep up. +- [PostgreSQL Overview](https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1): if you see high system usage or transactions per second (TPS) on the primary database server, it might mean that the migration is causing problems. + +### Prometheus metrics + +Number of [metrics](https://gitlab.com/gitlab-org/gitlab/-/blob/294a92484ce4611f660439aa48eee4dfec2230b5/lib/gitlab/database/background_migration/batched_migration_wrapper.rb#L90-128) +for each batched background migration are published to Prometheus. These metrics can be searched for and +visualized in Thanos ([see an example](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20(rate(batched_migration_job_updated_tuples_total%7Benv%3D%22gprd%22%7D%5B5m%5D))%20by%20(migration_id)%20&g0.tab=0&g0.stacked=0&g0.range_input=3d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.end_input=2021-06-13%2012%3A18%3A24&g0.moment_input=2021-06-13%2012%3A18%3A24)). + +### Swap the columns (release N + 1) + +After the background is completed and the new `bigint` columns are populated for all records, we can +swap the columns. Swapping is done with post-deployment migration. The exact process depends on the +table being converted, but in general it's done in the following steps: + +1. Using the provided `ensure_batched_background_migration_is_finished` helper, make sure the batched +migration has finished ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)). +If the migration has not completed, the subsequent steps fail anyway. By checking in advance we +aim to have more helpful error message. +1. Create indexes using the `bigint` columns that match the existing indexes using the `integer` +column ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L28-34)). +1. Create foreign keys (FK) using the `bigint` columns that match the existing FKs using the +`integer` column. Do this both for FK referencing other tables, and FKs that reference the table +that is being migrated ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L36-43)). +1. Inside a transaction, swap the columns: + 1. Lock the tables involved. To reduce the chance of hitting a deadlock, we recommended to do this in parent to child order ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L47)). + 1. Rename the columns to swap names ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L49-54)) + 1. Reset the trigger function ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L56-57)). + 1. Swap the defaults ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L59-62)). + 1. Swap the PK constraint (if any) ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L64-68)). + 1. Remove old indexes and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L70-72)). + 1. Remove old FKs (if still present) and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L74)). + +See example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088), and [migration](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb). + +### Remove the trigger and old `integer` columns (release N + 2) + +Using post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper, +drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)). + +### Remove ignore rules (release N + 3) + +In the next release after the columns were dropped, remove the ignore rules as we do not need them +anymore ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71161)). + +## Data migrations Data migrations can be tricky. The usual approach to migrate data is to take a 3 step approach: |