Diffstat (limited to 'doc/development/batched_background_migrations.md')
-rw-r--r--doc/development/batched_background_migrations.md322
1 files changed, 7 insertions, 315 deletions
diff --git a/doc/development/batched_background_migrations.md b/doc/development/batched_background_migrations.md
index e7703b5dd2b..f5f3655944b 100644
--- a/doc/development/batched_background_migrations.md
+++ b/doc/development/batched_background_migrations.md
@@ -1,319 +1,11 @@
---
-type: reference, dev
-stage: Enablement
-group: Database
-info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines"
+redirect_to: 'database/batched_background_migrations.md'
+remove_date: '2022-07-26'
---
-# Batched background migrations
+This document was moved to [another location](database/batched_background_migrations.md).
-Batched background migrations should be used to perform data migrations whenever a
-migration exceeds [the time limits](migration_style_guide.md#how-long-a-migration-should-take)
-in our guidelines. For example, you can use batched background
-migrations to migrate data stored in a single JSON column
-to a separate table instead.
-
-## When to use batched background migrations
-
-Use a batched background migration when you migrate _data_ in tables containing
-so many rows that the process would exceed
-[the time limits in our guidelines](migration_style_guide.md#how-long-a-migration-should-take)
-if performed using a regular Rails migration.
-
-- Batched background migrations should be used when migrating data in
- [high-traffic tables](migration_style_guide.md#high-traffic-tables).
-- Batched background migrations may also be used when executing numerous single-row queries
- for every item on a large dataset. Typically, for single-record patterns, runtime is
- largely dependent on the size of the dataset. Split the dataset accordingly,
- and put it into background migrations.
-- Don't use batched background migrations to perform schema migrations.
-
-Background migrations can help when:
-
-- Migrating events from one table to multiple separate tables.
-- Populating one column based on JSON stored in another column.
-- Migrating data that depends on the output of external services. (For example, an API.)
-
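Conceptually, "splitting the dataset" means cutting a large primary-key range into fixed-size sub-ranges, each processed by its own job. A plain-Ruby sketch of that idea (illustrative only; the batched migration framework computes these ranges for you):

```ruby
# Illustrative only: how a large id space is split into fixed-size batches.
# Each [start_id, end_id] pair would become one background job.
def batch_ranges(min_id, max_id, batch_size)
  (min_id..max_id).step(batch_size).map do |start_id|
    [start_id, [start_id + batch_size - 1, max_id].min]
  end
end

batch_ranges(1, 10, 4)
# => [[1, 4], [5, 8], [9, 10]]
```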
-NOTE:
-If the batched background migration is part of an important upgrade, it must be announced
-in the release post. Discuss with your Project Manager if you're unsure if the migration falls
-into this category.
-
-## Isolation
-
-Batched background migrations must be isolated and cannot use application code (for
-example, models defined in `app/models`). Because these migrations can take a long time to
-run, it's possible for new versions to deploy while the migrations are still running.
-
-## Idempotence
-
-Batched background migrations are executed in the context of a Sidekiq process.
-The usual Sidekiq rules apply, especially the rule that jobs should be small
-and idempotent. Make sure that data integrity is guaranteed even if your
-migration job is retried.
-
-See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
-for more details.
-
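A minimal in-memory sketch of what idempotence means for a batch job (plain Ruby, not GitLab code; names and data are illustrative):

```ruby
require 'json'

# Rows that already carry a value are skipped, so re-running the same
# batch after a Sidekiq retry is harmless: the second pass is a no-op.
def migrate_batch(rows)
  rows.each do |row|
    next if row[:url] # already migrated; a retry changes nothing

    row[:url] = JSON.parse(row[:properties])['url']
  end
end

rows = [{ id: 1, url: nil, properties: '{"url":"https://a.example"}' }]
migrate_batch(rows)
migrate_batch(rows) # safe to run again
rows.first[:url]
# => "https://a.example"
```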
-## Batched background migrations for EE-only features
-
-All the background migration classes for EE-only features should be present in GitLab CE.
-For this purpose, create an empty class for GitLab CE, and extend it for GitLab EE
-as explained in the guidelines for
-[implementing Enterprise Edition features](ee_features.md#code-in-libgitlabbackground_migration).
-
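The CE/EE split described above can be sketched in plain Ruby, using `Module#prepend` to stand in for GitLab's EE extension mechanism (class and module names here are hypothetical, not GitLab's real structure):

```ruby
# CE ships an empty, no-op migration class.
module Gitlab
  module BackgroundMigration
    class MigrateWidgets # hypothetical name
      def perform(*)
        # no-op in CE
      end
    end
  end
end

# EE supplies the real implementation and prepends it onto the CE class.
module EE
  module MigrateWidgets
    def perform(*args)
      "processed #{args.size} ids" # stand-in for the real migration work
    end
  end
end

Gitlab::BackgroundMigration::MigrateWidgets.prepend(EE::MigrateWidgets)

Gitlab::BackgroundMigration::MigrateWidgets.new.perform(1, 2, 3)
# => "processed 3 ids"
```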
-Batched background migrations are simple classes that define a `perform` method. A
-Sidekiq worker then executes such a class, passing any arguments to it. All
-migration classes must be defined in the namespace
-`Gitlab::BackgroundMigration`. Place the files in the directory
-`lib/gitlab/background_migration/`.
-
-## Queueing
-
-Queueing a batched background migration should be done in a post-deployment
-migration. Use this `queue_batched_background_migration` example, queueing the
-migration to be executed in batches. Replace the class name and arguments with the values
-from your migration:
-
-```ruby
-queue_batched_background_migration(
-  JOB_CLASS_NAME,
-  TABLE_NAME,
-  COLUMN_NAME,
-  *JOB_ARGUMENTS,
-  job_interval: JOB_INTERVAL
-)
-```
-
-Make sure the newly-created data is either migrated, or
-saved in both the old and new version upon creation. Removals in
-turn can be handled by defining foreign keys with cascading deletes.
-
-### Requeuing batched background migrations
-
-If one of the batched background migrations contains a bug that is fixed in a patch
-release, you must requeue the batched background migration so the migration
-repeats on systems that already performed the initial migration.
-
-When you requeue the batched background migration, turn the original
-queueing into a no-op by emptying the `#up` and `#down` methods of the
-migration that performed the original queueing. Otherwise, the batched
-background migration is queued multiple times on systems that upgrade
-across multiple patch releases at once.
-
-At the start of the second post-deployment migration, delete the
-previously queued batched migration with the provided code:
-
-```ruby
-Gitlab::Database::BackgroundMigration::BatchedMigration
- .for_configuration(MIGRATION_NAME, TABLE_NAME, COLUMN, JOB_ARGUMENTS)
- .delete_all
-```
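For reference, the no-op'd original queueing migration might look like the sketch below. The class name is hypothetical, and a stand-in base class replaces `Gitlab::Database::Migration` so the snippet is self-contained:

```ruby
# Stand-in for Gitlab::Database::Migration, which is not available here.
class StandInMigration; end

# Hypothetical original queueing migration after being no-op'd.
class QueueExampleMigration < StandInMigration
  def up
    # Emptied when requeuing, so the original queueing never runs again
    # on instances that upgrade across several patch releases at once.
  end

  def down
    # Emptied for the same reason.
  end
end

QueueExampleMigration.new.up # does nothing
```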
-
-## Cleaning up
-
-NOTE:
-Cleaning up any remaining background migrations must be done in either a major
-or minor release. You must not do this in a patch release.
-
-Because background migrations can take a long time, you can't immediately clean
-things up after queueing them. For example, you can't drop a column used in the
-migration process, as jobs would fail. You must add a separate _post-deployment_
-migration in a future release that finishes any remaining
-jobs before cleaning things up. (For example, removing a column.)
-
-To migrate the data from column `foo` (containing a big JSON blob) to column `bar`
-(containing a string), you would:
-
-1. Release A:
- 1. Create a migration class that performs the migration for a row with a given ID.
- 1. Update new rows using one of these techniques:
- - Create a new trigger for simple copy operations that don't need application logic.
- - Handle this operation in the model/service as the records are created or updated.
- - Create a new custom background job that updates the records.
- 1. Queue the batched background migration for all existing rows in a post-deployment migration.
-1. Release B:
- 1. Add a post-deployment migration that checks if the batched background migration is completed.
- 1. Deploy code so that the application starts using the new column and stops updating the old one.
- 1. Remove the old column.
-
-A bump to the [import/export version](../user/project/settings/import_export.md) may
-be required if importing a project from a prior version of GitLab requires the
-data to be in the new format.
-
-## Example
-
-The table `integrations` has a field called `properties`, stored in JSON. For all rows,
-extract the `url` key from this JSON object and store it in the `integrations.url`
-column. Millions of integrations exist, and parsing JSON is slow, so you can't
-do this work in a regular migration.
-
-1. Start by defining our migration class:
-
- ```ruby
- class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
-   class Integration < ActiveRecord::Base
-     self.table_name = 'integrations'
-   end
-
-   def perform(start_id, end_id)
-     Integration.where(id: start_id..end_id).each do |integration|
-       json = JSON.load(integration.properties)
-
-       integration.update(url: json['url']) if json['url']
-     rescue JSON::ParserError
-       # If the JSON is invalid we don't want to keep the job around forever,
-       # instead we'll just leave the "url" field to whatever the default value
-       # is.
-       next
-     end
-   end
- end
- ```
-
- NOTE:
- To get a `connection` in the batched background migration, inherit from the
- base class `Gitlab::BackgroundMigration::BaseJob`.
- For example: `class Gitlab::BackgroundMigration::ExtractIntegrationsUrl < Gitlab::BackgroundMigration::BaseJob`
-
-1. Add a new trigger to the database to update newly created and updated integrations,
- similar to this example:
-
- ```ruby
- execute(<<~SQL)
-   CREATE OR REPLACE FUNCTION example() RETURNS trigger
-     LANGUAGE plpgsql
-   AS $$
-   BEGIN
-     NEW."url" := NEW.properties ->> 'url';
-     RETURN NEW;
-   END;
-   $$;
-
-   -- Attach the function to the table (the trigger name is illustrative):
-   CREATE TRIGGER example_trigger
-     BEFORE INSERT OR UPDATE ON integrations
-     FOR EACH ROW EXECUTE FUNCTION example();
- SQL
- ```
-
-1. Create a post-deployment migration that queues the migration for existing data:
-
- ```ruby
- class QueueExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
- disable_ddl_transaction!
-
- MIGRATION = 'ExtractIntegrationsUrl'
- DELAY_INTERVAL = 2.minutes
-
- def up
- queue_batched_background_migration(
- MIGRATION,
- :migrations,
- :id,
- job_interval: DELAY_INTERVAL
- )
- end
-
- def down
- Gitlab::Database::BackgroundMigration::BatchedMigration
- .for_configuration(MIGRATION, :migrations, :id, []).delete_all
- end
- end
- ```
-
- After deployment, our application:
- - Continues using the data as before.
- - Ensures that both existing and new data are migrated.
-
-1. In the next release, remove the trigger. We must also add a new post-deployment migration
- that checks that the batched background migration is completed. For example:
-
- ```ruby
- class FinalizeExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
-   MIGRATION = 'ExtractIntegrationsUrl'
-   disable_ddl_transaction!
-
-   def up
-     ensure_batched_background_migration_is_finished(
-       job_class_name: MIGRATION,
-       table_name: :integrations,
-       column_name: :id,
-       job_arguments: []
-     )
-   end
-
-   def down
-     # no-op
-   end
- end
- ```
-
- If the application does not depend on the data being 100% migrated (for
- instance, the data is advisory and not mission-critical), you can skip this
- final step. This step confirms that the migration is completed and all of the rows were migrated.
-
-After the batched migration is completed, you can safely remove the `integrations.properties` column.
-
-## Testing
-
-Writing tests is required for:
-
-- The batched background migrations' queueing migration.
-- The batched background migration itself.
-- A cleanup migration.
-
-The `:migration` and `schema: :latest` RSpec tags are automatically set for
-background migration specs. Refer to the
-[Testing Rails migrations](testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
-style guide.
-
-Remember that `before` and `after` RSpec hooks
-migrate your database down and up. These hooks can result in other batched background
-migrations being called. Using `spy` test doubles with
-`have_received` is encouraged, instead of using regular test doubles, because
-expectations defined in an `it` block can conflict with what is
-called in RSpec hooks. Refer to [issue #18839](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
-for more details.
-
-## Best practices
-
-1. Know how much data you're dealing with.
-1. Make sure the batched background migration jobs are idempotent.
-1. Confirm the tests you write are not false positives.
-1. If the data being migrated is critical and cannot be lost, the
- clean-up migration must also check the final state of the data before completing.
-1. Discuss the numbers with a database specialist. The migration may put
-   more pressure on the database than you expect. Measure on staging,
-   or ask someone to measure on production.
-1. Know how much time is required to run the batched background migration.
-
-## Additional tips and strategies
-
-### Viewing failure error logs
-
-You can view failures in two ways:
-
-- Via GitLab logs:
- 1. After running a batched background migration, if any jobs fail,
- view the logs in [Kibana](https://log.gprd.gitlab.net/goto/5f06a57f768c6025e1c65aefb4075694).
- View the production Sidekiq log and filter for:
-
- - `json.new_state: failed`
- - `json.job_class_name: <Batched Background Migration job class name>`
- - `json.job_arguments: <Batched Background Migration job class arguments>`
-
- 1. Review the `json.exception_class` and `json.exception_message` values to help
- understand why the jobs failed.
-
- 1. Remember the retry mechanism. Having a failure does not mean the job failed.
- Always check the last status of the job.
-
-- Via database:
-
- 1. Get the batched background migration `CLASS_NAME`.
- 1. Execute the following query in the PostgreSQL console:
-
- ```sql
- SELECT migration.id, migration.job_class_name, transition_logs.exception_class, transition_logs.exception_message
- FROM batched_background_migrations AS migration
- INNER JOIN batched_background_migration_jobs AS jobs
-   ON jobs.batched_background_migration_id = migration.id
- INNER JOIN batched_background_migration_job_transition_logs AS transition_logs
-   ON transition_logs.batched_background_migration_job_id = jobs.id
- WHERE transition_logs.next_status = '2' AND migration.job_class_name = 'CLASS_NAME';
- ```
+<!-- This redirect file can be deleted after <2022-07-26>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->