field | value | date |
---|---|---|
author | GitLab Bot <gitlab-bot@gitlab.com> | 2022-01-07 18:15:57 +0300 |
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2022-01-07 18:15:57 +0300 |
commit | c68ee79c332a9a08abaed7eb48fbc563a584d31d | |
tree | f98369bbc2a0317e0efce3d02054905b82298fb5 /doc/development/background_migrations.md | |
parent | bf57aa76628654e15c2035e21fb29ab39fdea131 | |
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development/background_migrations.md')
-rw-r--r-- | doc/development/background_migrations.md | 99 |
1 files changed, 35 insertions, 64 deletions
diff --git a/doc/development/background_migrations.md b/doc/development/background_migrations.md
index 4a18b2123da..49835085f96 100644
--- a/doc/development/background_migrations.md
+++ b/doc/development/background_migrations.md
@@ -83,23 +83,11 @@ replacing the class name and arguments with whatever values are necessary for
 your migration:
 
 ```ruby
-migrate_async('BackgroundMigrationClassName', [arg1, arg2, ...])
+migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
 ```
 
-Usually it's better to enqueue jobs in bulk, for this you can use
-`bulk_migrate_async`:
-
-```ruby
-bulk_migrate_async(
-  [['BackgroundMigrationClassName', [1]],
-   ['BackgroundMigrationClassName', [2]]]
-)
-```
-
-Note that this will queue a Sidekiq job immediately: if you have a large number
-of records, this may not be what you want. You can use the function
-`queue_background_migration_jobs_by_range_at_intervals` to split the job into
-batches:
+You can use the function `queue_background_migration_jobs_by_range_at_intervals`
+to automatically split the job into batches:
 
 ```ruby
 queue_background_migration_jobs_by_range_at_intervals(
@@ -117,16 +105,6 @@ consuming migrations it's best to schedule a background job using an
 updates. Removals in turn can be handled by simply defining foreign keys with
 cascading deletes.
 
-If you would like to schedule jobs in bulk with a delay, you can use
-`BackgroundMigrationWorker.bulk_perform_in`:
-
-```ruby
-jobs = [['BackgroundMigrationClassName', [1]],
-        ['BackgroundMigrationClassName', [2]]]
-
-bulk_migrate_in(5.minutes, jobs)
-```
-
 ### Rescheduling background migrations
 
 If one of the background migrations contains a bug that is fixed in a patch
@@ -197,53 +175,47 @@ the new format.
 
 ## Example
 
-To explain all this, let's use the following example: the table `services` has a
+To explain all this, let's use the following example: the table `integrations` has a
 field called `properties` which is stored in JSON. For all rows you want to
-extract the `url` key from this JSON object and store it in the `services.url`
-column. There are millions of services and parsing JSON is slow, thus you can't
+extract the `url` key from this JSON object and store it in the `integrations.url`
+column. There are millions of integrations and parsing JSON is slow, thus you can't
 do this in a regular migration.
 
 To do this using a background migration we'll start with defining our migration
 class:
 
 ```ruby
-class Gitlab::BackgroundMigration::ExtractServicesUrl
-  class Service < ActiveRecord::Base
-    self.table_name = 'services'
+class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
+  class Integration < ActiveRecord::Base
+    self.table_name = 'integrations'
   end
 
-  def perform(service_id)
-    # A row may be removed between scheduling and starting of a job, thus we
-    # need to make sure the data is still present before doing any work.
-    service = Service.select(:properties).find_by(id: service_id)
+  def perform(start_id, end_id)
+    Integration.where(id: start_id..end_id).each do |integration|
+      json = JSON.load(integration.properties)
 
-    return unless service
-
-    begin
-      json = JSON.load(service.properties)
+      integration.update(url: json['url']) if json['url']
     rescue JSON::ParserError
       # If the JSON is invalid we don't want to keep the job around forever,
       # instead we'll just leave the "url" field to whatever the default value
       # is.
-      return
+      next
     end
-
-    service.update(url: json['url']) if json['url']
   end
 end
 ```
 
 Next we'll need to adjust our code so we schedule the above migration for newly
-created and updated services. We can do this using something along the lines of
+created and updated integrations. We can do this using something along the lines of
 the following:
 
 ```ruby
-class Service < ActiveRecord::Base
-  after_commit :schedule_service_migration, on: :update
-  after_commit :schedule_service_migration, on: :create
+class Integration < ActiveRecord::Base
+  after_commit :schedule_integration_migration, on: :update
+  after_commit :schedule_integration_migration, on: :create
 
-  def schedule_service_migration
-    BackgroundMigrationWorker.perform_async('ExtractServicesUrl', [id])
+  def schedule_integration_migration
+    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
   end
 end
 ```
@@ -253,21 +225,20 @@ before the transaction completes as doing so can lead to race conditions where
 the changes are not yet visible to the worker.
 
 Next we'll need a post-deployment migration that schedules the migration for
-existing data. Since we're dealing with a lot of rows we'll schedule jobs in
-batches instead of doing this one by one:
+existing data.
 
 ```ruby
-class ScheduleExtractServicesUrl < Gitlab::Database::Migration[1.0]
+class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
   disable_ddl_transaction!
 
-  def up
-    define_batchable_model('services').select(:id).in_batches do |relation|
-      jobs = relation.pluck(:id).map do |id|
-        ['ExtractServicesUrl', [id]]
-      end
+  MIGRATION = 'ExtractIntegrationsUrl'
+  DELAY_INTERVAL = 2.minutes
 
-      BackgroundMigrationWorker.bulk_perform_async(jobs)
-    end
+  def up
+    queue_background_migration_jobs_by_range_at_intervals(
+      define_batchable_model('integrations'),
+      MIGRATION,
+      DELAY_INTERVAL)
   end
 
   def down
@@ -284,18 +255,18 @@ jobs and manually run on any un-migrated rows. Such a migration would look like
 this:
 
 ```ruby
-class ConsumeRemainingExtractServicesUrlJobs < Gitlab::Database::Migration[1.0]
+class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
   disable_ddl_transaction!
 
   def up
     # This must be included
-    Gitlab::BackgroundMigration.steal('ExtractServicesUrl')
+    Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')
 
     # This should be included, but can be skipped - see below
-    define_batchable_model('services').where(url: nil).each_batch(of: 50) do |batch|
+    define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
       range = batch.pluck('MIN(id)', 'MAX(id)').first
 
-      Gitlab::BackgroundMigration::ExtractServicesUrl.new.perform(*range)
+      Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
     end
   end
 
@@ -313,9 +284,9 @@ If the application does not depend on the data being 100% migrated (for
 instance, the data is advisory, and not mission-critical), then this final step
 can be skipped.
 
-This migration will then process any jobs for the ExtractServicesUrl migration
+This migration will then process any jobs for the ExtractIntegrationsUrl migration
 and continue once all jobs have been processed. Once done you can safely remove
-the `services.properties` column.
+the `integrations.properties` column.
 
 ## Testing
 
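The main behavioural change in this diff is that `perform` now takes an ID range (`start_id`, `end_id`) instead of a single ID, and that scheduling is delegated to a helper that splits the table into delayed batches. The following is a minimal plain-Ruby sketch of that idea, not GitLab's implementation: the in-memory `ROWS` array, the `schedule_ranges` helper, the batch size, and the choice to delay the first batch by one interval are all assumptions made for illustration.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the `integrations` table.
ROWS = [
  { id: 1, properties: '{"url":"https://a.example"}', url: nil },
  { id: 2, properties: 'not json',                    url: nil },
  { id: 3, properties: '{"other":"x"}',               url: nil },
  { id: 4, properties: '{"url":"https://b.example"}', url: nil }
]

# Hypothetical helper: splits IDs into [delay_seconds, start_id, end_id]
# batches, each batch delayed one interval more than the previous one.
def schedule_ranges(ids, batch_size:, delay_interval:)
  ids.sort.each_slice(batch_size).each_with_index.map do |batch, index|
    [(index + 1) * delay_interval, batch.first, batch.last]
  end
end

# Same shape as the new perform(start_id, end_id): process every row in the
# range, and skip (rather than abort on) rows whose JSON cannot be parsed.
def perform(rows, start_id, end_id)
  rows.select { |row| (start_id..end_id).cover?(row[:id]) }.each do |row|
    json = JSON.parse(row[:properties])
    row[:url] = json['url'] if json['url']
  rescue JSON::ParserError
    next
  end
end

# Run each batch immediately; a real scheduler would wait `delay` seconds.
ids = ROWS.map { |row| row[:id] }
schedule_ranges(ids, batch_size: 2, delay_interval: 120).each do |_delay, first, last|
  perform(ROWS, first, last)
end
```

Note that a single-row update can reuse the same range-based entry point by passing the same ID twice, which is what the `[id, id]` arguments in the diff do.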