Diffstat (limited to 'doc/development/database/batched_background_migrations.md')
-rw-r--r--  doc/development/database/batched_background_migrations.md  198
1 file changed, 123 insertions(+), 75 deletions(-)
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index f3ea82b5c61..edb22fcf436 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -105,11 +105,16 @@ for more details.
 
 ## Batched background migrations for EE-only features
 
-All the background migration classes for EE-only features should be present in GitLab CE.
-For this purpose, create an empty class for GitLab CE, and extend it for GitLab EE
+All the background migration classes for EE-only features should be present in GitLab FOSS.
+For this purpose, create an empty class for GitLab FOSS, and extend it for GitLab EE
 as explained in the guidelines for
 [implementing Enterprise Edition features](../ee_features.md#code-in-libgitlabbackground_migration).
 
+NOTE:
+Background migration classes for EE-only features that use job arguments should define them
+in the GitLab FOSS class. This is required to prevent job arguments validation from failing when
+the migration is scheduled in the GitLab FOSS context.
+
 Batched background migrations are simple classes that define a `perform` method. A
 Sidekiq worker then executes such a class, passing any arguments to it. All
 migration classes must be defined in the namespace
@@ -132,6 +137,10 @@ queue_batched_background_migration(
 )
 ```
 
+NOTE:
+This helper raises an error if the number of provided job arguments does not match
+the number of [job arguments](#job-arguments) defined in `JOB_CLASS_NAME`.
+
 Make sure the newly-created data is either migrated, or
 saved in both the old and new version upon creation. Removals in
 turn can be handled by defining foreign keys with cascading deletes.
@@ -186,6 +195,115 @@ Bump to the [import/export version](../../user/project/settings/import_export.md
 be required, if importing a project from a prior version of GitLab requires the data
 to be in the new format.
 
+## Job arguments
+
+`BatchedMigrationJob` provides the `job_arguments` helper method for job classes to define the job
+arguments they need.
+
+Batched migrations scheduled with `queue_batched_background_migration` **must** use the helper to
+define the job arguments:
+
+```ruby
+queue_batched_background_migration(
+  'CopyColumnUsingBackgroundMigrationJob',
+  TABLE_NAME,
+  'name', 'name_convert_to_text',
+  job_interval: DELAY_INTERVAL
+)
+```
+
+NOTE:
+If the number of defined job arguments does not match the number of job arguments provided when
+scheduling the migration, `queue_batched_background_migration` raises an error.
+
+In this example, `copy_from` returns `name`, and `copy_to` returns `name_convert_to_text`:
+
+```ruby
+class CopyColumnUsingBackgroundMigrationJob < BatchedMigrationJob
+  job_arguments :copy_from, :copy_to
+
+  def perform
+    from_column = connection.quote_column_name(copy_from)
+    to_column = connection.quote_column_name(copy_to)
+
+    assignment_clause = "#{to_column} = #{from_column}"
+
+    each_sub_batch(operation_name: :update_all) do |relation|
+      relation.update_all(assignment_clause)
+    end
+  end
+end
+```
+
+### Additional filters
+
+By default, when creating background jobs to perform the migration, batched background migrations
+iterate over the full specified table. This iteration is done using the
+[`PrimaryKeyBatchingStrategy`](https://gitlab.com/gitlab-org/gitlab/-/blob/c9dabd1f4b8058eece6d8cb4af95e9560da9a2ee/lib/gitlab/database/migrations/batched_background_migration_helpers.rb#L17).
+If the table has 1000 records and the batch size is 100, the work is batched into 10 jobs. For
+illustrative purposes, `EachBatch` is used like this:
+
+```ruby
+# PrimaryKeyBatchingStrategy
+Namespace.each_batch(of: 100) do |relation|
+  relation.where(type: nil).update_all(type: 'User') # this happens in each background job
+end
+```
+
+In some cases, only a subset of records must be examined.
+If only 10% of the 1000 records need examination, apply a filter to the initial relation when the
+jobs are created:
+
+```ruby
+Namespace.where(type: nil).each_batch(of: 100) do |relation|
+  relation.update_all(type: 'User')
+end
+```
+
+In the first example, we don't know how many records will be updated in each batch.
+In the second (filtered) example, we know exactly 100 will be updated with each batch.
+
+`BatchedMigrationJob` provides a `scope_to` helper method to apply additional filters and
+achieve this:
+
+1. Create a new migration job class that inherits from `BatchedMigrationJob` and defines the
+   additional filter:
+
+   ```ruby
+   class BackfillNamespaceType < BatchedMigrationJob
+     scope_to ->(relation) { relation.where(type: nil) }
+
+     def perform
+       each_sub_batch(operation_name: :update_all) do |sub_batch|
+         sub_batch.update_all(type: 'User')
+       end
+     end
+   end
+   ```
+
+1. In the post-deployment migration, enqueue the batched background migration:
+
+   ```ruby
+   class BackfillNamespaceType < Gitlab::Database::Migration[2.0]
+     MIGRATION = 'BackfillNamespaceType'
+     DELAY_INTERVAL = 2.minutes
+
+     restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+     def up
+       queue_batched_background_migration(
+         MIGRATION,
+         :namespaces,
+         :id,
+         job_interval: DELAY_INTERVAL
+       )
+     end
+
+     def down
+       delete_batched_background_migration(MIGRATION, :namespaces, :id, [])
+     end
+   end
+   ```
+
+NOTE:
+When applying additional filters, it is important to ensure they are properly covered by an index
+to optimize `EachBatch` performance. In the example above, we need an index on `(type, id)` to
+support the filters. See [the `EachBatch` documentation for more information](../iterating_tables_in_batches.md).
+
 ## Example
 
 The `routes` table has a `source_type` field that's used for a polymorphic relationship.
@@ -221,8 +339,6 @@ background migration.
    correctly handled by the batched migration framework.
    Any subclass of `BatchedMigrationJob` is initialized with necessary
    arguments to execute the batch, as well as a connection to the tracking database.
-   Additional `job_arguments` set on the migration are passed to the
-   job's `perform` method.
 
 1. Add a new trigger to the database to update newly created and updated routes,
    similar to this example:
@@ -320,7 +436,7 @@ The default batching strategy provides an efficient way to iterate over primary
 However, if you need to iterate over columns where values are not unique, you must use a
 different batching strategy.
 
-The `LooseIndexScanBatchingStrategy` batching strategy uses a special version of [`EachBatch`](../iterating_tables_in_batches.md#loose-index-scan-with-distinct_each_batch)
+The `LooseIndexScanBatchingStrategy` batching strategy uses a special version of [`EachBatch`](iterating_tables_in_batches.md#loose-index-scan-with-distinct_each_batch)
 to provide efficient and stable iteration over the distinct column values.
 
 This example shows a batched background migration where the `issues.project_id` column is used as
@@ -374,76 +490,8 @@ module Gitlab
 end
 ```
 
-### Adding filters to the initial batching
-
-By default, when creating background jobs to perform the migration, batched background migrations
-will iterate over the full specified table. This is done using the
-[`PrimaryKeyBatchingStrategy`](https://gitlab.com/gitlab-org/gitlab/-/blob/c9dabd1f4b8058eece6d8cb4af95e9560da9a2ee/lib/gitlab/database/migrations/batched_background_migration_helpers.rb#L17).
-This means if there are 1000 records in the table and the batch size is 100, there will be 10 jobs.
-For illustrative purposes, `EachBatch` is used like this:
-
-```ruby
-# PrimaryKeyBatchingStrategy
-Projects.all.each_batch(of: 100) do |relation|
-  relation.where(foo: nil).update_all(foo: 'bar') # this happens in each background job
-end
-```
-
-There are cases where we only need to look at a subset of records.
-Perhaps we only need to update 1 out of every 10 of those 1000 records. It would be best if we
-could apply a filter to the initial relation when the jobs are created:
-
-```ruby
-Projects.where(foo: nil).each_batch(of: 100) do |relation|
-  relation.update_all(foo: 'bar')
-end
-```
-
-In the `PrimaryKeyBatchingStrategy` example, we do not know how many records will be updated in
-each batch. In the filtered example, we know exactly 100 will be updated with each batch.
-
-The `PrimaryKeyBatchingStrategy` contains [a method that can be overwritten](https://gitlab.com/gitlab-org/gitlab/-/blob/dd1e70d3676891025534dc4a1e89ca9383178fe7/lib/gitlab/background_migration/batching_strategies/primary_key_batching_strategy.rb#L38-52)
-to apply additional filtering on the initial `EachBatch`.
-
-We can accomplish this by:
-
-1. Create a new class that inherits from `PrimaryKeyBatchingStrategy` and overrides the method
-   using the desired filter (this may be the same filter used in the sub-batch):
-
-   ```ruby
-   # frozen_string_literal: true
-
-   module GitLab
-     module BackgroundMigration
-       module BatchingStrategies
-         class FooStrategy < PrimaryKeyBatchingStrategy
-           def apply_additional_filters(relation, job_arguments: [], job_class: nil)
-             relation.where(foo: nil)
-           end
-         end
-       end
-     end
-   end
-   ```
-
-1. In the post-deployment migration that queues the batched background migration, specify the new
-   batching strategy using the `batch_class_name` parameter:
-
-   ```ruby
-   class BackfillProjectsFoo < Gitlab::Database::Migration[2.0]
-     MIGRATION = 'BackfillProjectsFoo'
-     DELAY_INTERVAL = 2.minutes
-     BATCH_CLASS_NAME = 'FooStrategy'
-
-     restrict_gitlab_migration gitlab_schema: :gitlab_main
-
-     def up
-       queue_batched_background_migration(
-         MIGRATION,
-         :routes,
-         :id,
-         job_interval: DELAY_INTERVAL,
-         batch_class_name: BATCH_CLASS_NAME
-       )
-     end
-
-     def down
-       delete_batched_background_migration(MIGRATION, :routes, :id, [])
-     end
-   end
-   ```
-
-When applying a batching strategy, it is important to ensure the filter is properly covered by an
-index to optimize `EachBatch` performance. See [the `EachBatch` docs for more information](../iterating_tables_in_batches.md).
+NOTE:
+[Additional filters](#additional-filters) defined with `scope_to` will be ignored by
+`LooseIndexScanBatchingStrategy` and `distinct_each_batch`.
 
 ## Testing
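
The trade-off that both the new `scope_to` section and the removed `FooStrategy` section describe — fewer, fuller batches when the relation is filtered up front — can be sketched in plain Ruby, with an array standing in for the table. The 10%-match ratio and the batch size of 100 come from the text above; the record layout is purely illustrative:

```ruby
# Illustrative stand-in for a 1000-row table where 10% of rows have a NULL type.
records = Array.new(1000) { |id| { id: id, type: (id % 10).zero? ? nil : 'Group' } }

# Unfiltered batching (PrimaryKeyBatchingStrategy-style): batch the whole
# table, then filter inside each job.
unfiltered_jobs = records.each_slice(100).count

# Filtered batching (scope_to-style): restrict the relation first, then batch.
filtered_jobs = records.select { |r| r[:type].nil? }.each_slice(100).count

puts "unfiltered: #{unfiltered_jobs} jobs" # 10 jobs, only ~10 matching rows each
puts "filtered:   #{filtered_jobs} job"    # 1 job of 100 matching rows
```

With the filter applied up front, every batch does a full batch's worth of useful work, which is why the documentation recommends `scope_to` when only a subset of records needs migrating.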
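
The mechanism behind `scope_to` can be approximated with a minimal stand-in class. `MiniMigrationJob` and `BackfillTypeSketch` are hypothetical names, and the real `BatchedMigrationJob` is considerably more involved; this sketch only shows the "store a lambda, apply it to the relation before batching" idea:

```ruby
# Hypothetical, minimal sketch of the scope_to mechanism: the class method
# stores a lambda that the framework applies to the relation before batching.
class MiniMigrationJob
  def self.scope_to(filter)
    @filter = filter
  end

  def self.filter
    @filter || ->(relation) { relation } # default: no additional filter
  end
end

class BackfillTypeSketch < MiniMigrationJob
  # Keep only rows whose type is nil, mirroring `relation.where(type: nil)`.
  scope_to ->(relation) { relation.select { |row| row[:type].nil? } }
end

rows = [{ id: 1, type: nil }, { id: 2, type: 'Group' }, { id: 3, type: nil }]
scoped = BackfillTypeSketch.filter.call(rows)
# Only the scoped rows would be handed to the batching strategy.
```

In the real framework the stored scope is applied to an ActiveRecord relation rather than an array, which is why the filter must be backed by an index such as `(type, id)`.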