gitlab.com/gitlab-org/gitlab-foss.git
author     GitLab Bot <gitlab-bot@gitlab.com>  2022-01-07 18:15:57 +0300
committer  GitLab Bot <gitlab-bot@gitlab.com>  2022-01-07 18:15:57 +0300
commit     c68ee79c332a9a08abaed7eb48fbc563a584d31d (patch)
tree       f98369bbc2a0317e0efce3d02054905b82298fb5 /doc/development/background_migrations.md
parent     bf57aa76628654e15c2035e21fb29ab39fdea131 (diff)
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development/background_migrations.md'):
 -rw-r--r--  doc/development/background_migrations.md | 99
 1 file changed, 35 insertions(+), 64 deletions(-)
diff --git a/doc/development/background_migrations.md b/doc/development/background_migrations.md
index 4a18b2123da..49835085f96 100644
--- a/doc/development/background_migrations.md
+++ b/doc/development/background_migrations.md
@@ -83,23 +83,11 @@ replacing the class name and arguments with whatever values are necessary for
your migration:
```ruby
-migrate_async('BackgroundMigrationClassName', [arg1, arg2, ...])
+migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
```
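The helper schedules the job through `BackgroundMigrationWorker`, so when the job should not start immediately you can pass a delay before the class name. A minimal sketch, assuming that delegation (the duration here is illustrative and not taken from this commit):

```ruby
migrate_in(10.minutes, 'BackgroundMigrationClassName', [arg1, arg2, ...])
```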
-Usually it's better to enqueue jobs in bulk, for this you can use
-`bulk_migrate_async`:
-
-```ruby
-bulk_migrate_async(
- [['BackgroundMigrationClassName', [1]],
- ['BackgroundMigrationClassName', [2]]]
-)
-```
-
-Note that this will queue a Sidekiq job immediately: if you have a large number
-of records, this may not be what you want. You can use the function
-`queue_background_migration_jobs_by_range_at_intervals` to split the job into
-batches:
+You can use the function `queue_background_migration_jobs_by_range_at_intervals`
+to automatically split the job into batches:
```ruby
queue_background_migration_jobs_by_range_at_intervals(
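  # A sketch of the arguments this helper typically receives (illustrative
  # values, assumed rather than taken from this commit): the model to iterate
  # over in batches, the background migration class name, and the delay
  # between scheduled batches, optionally tuned with batch_size:.
  #
  #   define_batchable_model('integrations'),
  #   'BackgroundMigrationClassName',
  #   2.minutes,
  #   batch_size: 10_000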
@@ -117,16 +105,6 @@ consuming migrations it's best to schedule a background job using an
updates. Removals in turn can be handled by simply defining foreign keys with
cascading deletes.
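For example, a cascading foreign key removes dependent rows for you, so deletions need no background job at all. A minimal sketch using `add_concurrent_foreign_key` (the migration class name and columns are illustrative):

```ruby
class AddProjectForeignKeyToIntegrations < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # Dependent integration rows are removed automatically when the parent project is deleted.
    add_concurrent_foreign_key :integrations, :projects, column: :project_id, on_delete: :cascade
  end

  def down
    remove_foreign_key_if_exists :integrations, column: :project_id
  end
end
```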
-If you would like to schedule jobs in bulk with a delay, you can use
-`BackgroundMigrationWorker.bulk_perform_in`:
-
-```ruby
-jobs = [['BackgroundMigrationClassName', [1]],
- ['BackgroundMigrationClassName', [2]]]
-
-bulk_migrate_in(5.minutes, jobs)
-```
-
### Rescheduling background migrations
If one of the background migrations contains a bug that is fixed in a patch
@@ -197,53 +175,47 @@ the new format.
## Example
-To explain all this, let's use the following example: the table `services` has a
+To explain all this, let's use the following example: the table `integrations` has a
field called `properties` which is stored in JSON. For all rows you want to
-extract the `url` key from this JSON object and store it in the `services.url`
-column. There are millions of services and parsing JSON is slow, thus you can't
+extract the `url` key from this JSON object and store it in the `integrations.url`
+column. There are millions of integrations and parsing JSON is slow, thus you can't
do this in a regular migration.
To do this using a background migration we'll start with defining our migration
class:
```ruby
-class Gitlab::BackgroundMigration::ExtractServicesUrl
- class Service < ActiveRecord::Base
- self.table_name = 'services'
+class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
+ class Integration < ActiveRecord::Base
+ self.table_name = 'integrations'
end
- def perform(service_id)
- # A row may be removed between scheduling and starting of a job, thus we
- # need to make sure the data is still present before doing any work.
- service = Service.select(:properties).find_by(id: service_id)
+ def perform(start_id, end_id)
+ Integration.where(id: start_id..end_id).each do |integration|
+ json = JSON.load(integration.properties)
- return unless service
-
- begin
- json = JSON.load(service.properties)
+ integration.update(url: json['url']) if json['url']
rescue JSON::ParserError
# If the JSON is invalid we don't want to keep the job around forever,
# instead we'll just leave the "url" field to whatever the default value
# is.
- return
+ next
end
-
- service.update(url: json['url']) if json['url']
end
end
```
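Each job receives a range of IDs and processes every row in that range. A minimal sketch of running a single batch by hand, for example from a Rails console, to verify the behaviour (the ID range is illustrative):

```ruby
# Processes rows with IDs 1..1000; real ranges are computed by the scheduling helper.
Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(1, 1_000)
```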
Next we'll need to adjust our code so we schedule the above migration for newly
-created and updated services. We can do this using something along the lines of
+created and updated integrations. We can do this using something along the lines of
the following:
```ruby
-class Service < ActiveRecord::Base
- after_commit :schedule_service_migration, on: :update
- after_commit :schedule_service_migration, on: :create
+class Integration < ActiveRecord::Base
+ after_commit :schedule_integration_migration, on: :update
+ after_commit :schedule_integration_migration, on: :create
- def schedule_service_migration
- BackgroundMigrationWorker.perform_async('ExtractServicesUrl', [id])
+ def schedule_integration_migration
+ BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
end
end
```
@@ -253,21 +225,20 @@ before the transaction completes as doing so can lead to race conditions where
the changes are not yet visible to the worker.
Next we'll need a post-deployment migration that schedules the migration for
-existing data. Since we're dealing with a lot of rows we'll schedule jobs in
-batches instead of doing this one by one:
+existing data.
```ruby
-class ScheduleExtractServicesUrl < Gitlab::Database::Migration[1.0]
+class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
disable_ddl_transaction!
- def up
- define_batchable_model('services').select(:id).in_batches do |relation|
- jobs = relation.pluck(:id).map do |id|
- ['ExtractServicesUrl', [id]]
- end
+ MIGRATION = 'ExtractIntegrationsUrl'
+ DELAY_INTERVAL = 2.minutes
- BackgroundMigrationWorker.bulk_perform_async(jobs)
- end
+ def up
+ queue_background_migration_jobs_by_range_at_intervals(
+ define_batchable_model('integrations'),
+ MIGRATION,
+ DELAY_INTERVAL)
end
def down
@@ -284,18 +255,18 @@ jobs and manually run on any un-migrated rows. Such a migration would look like
this:
```ruby
-class ConsumeRemainingExtractServicesUrlJobs < Gitlab::Database::Migration[1.0]
+class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
disable_ddl_transaction!
def up
# This must be included
- Gitlab::BackgroundMigration.steal('ExtractServicesUrl')
+ Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')
# This should be included, but can be skipped - see below
- define_batchable_model('services').where(url: nil).each_batch(of: 50) do |batch|
+ define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
range = batch.pluck('MIN(id)', 'MAX(id)').first
- Gitlab::BackgroundMigration::ExtractServicesUrl.new.perform(*range)
+ Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
end
end
@@ -313,9 +284,9 @@ If the application does not depend on the data being 100% migrated (for
instance, the data is advisory, and not mission-critical), then this final step
can be skipped.
-This migration will then process any jobs for the ExtractServicesUrl migration
+This migration will then process any jobs for the ExtractIntegrationsUrl migration
and continue once all jobs have been processed. Once done you can safely remove
-the `services.properties` column.
+the `integrations.properties` column.
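Dropping that column would itself happen in a later migration, following the usual process for removing columns without downtime. A simplified sketch (the class name is illustrative and the ignore-column steps are omitted):

```ruby
class RemoveIntegrationsPropertiesColumn < Gitlab::Database::Migration[1.0]
  enable_lock_retries!

  def up
    remove_column :integrations, :properties
  end

  def down
    add_column :integrations, :properties, :text
  end
end
```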
## Testing