Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGitLab Bot <gitlab-bot@gitlab.com>2023-07-20 06:08:56 +0300
committerGitLab Bot <gitlab-bot@gitlab.com>2023-07-20 06:08:56 +0300
commit0624f8b515fe2e40268f397cf698e3277c5d3877 (patch)
treeb6fae3b679b2b14619e8480e3845013deafafbec /doc/development
parent3c25c55bf591097e96170c2d2da1b40d295cc981 (diff)
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development')
-rw-r--r--doc/development/database/batched_background_migrations.md437
-rw-r--r--doc/development/features_inside_dot_gitlab.md2
2 files changed, 215 insertions, 224 deletions
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index 1bdd24afcad..10490df7b5e 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -13,6 +13,13 @@ in our guidelines. For example, you can use batched background
migrations to migrate data that's stored in a single JSON column
to a separate table instead.
+NOTE:
+Batched background migrations replaced the legacy background migrations framework.
+Check that documentation in reference to any changes involving that framework.
+
+NOTE:
+The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.
+
## When to use batched background migrations
Use a batched background migration when you migrate _data_ in tables containing
@@ -201,6 +208,13 @@ models defined in `app/models` except the `ApplicationRecord` classes).
Because these migrations can take a long time to run, it's possible
for new versions to deploy while the migrations are still running.
+### Depending on migrated data
+
+Unlike a regular or a post migration, waiting for the next release is not enough to guarantee that the data was fully migrated.
+That means that you shouldn't depend on the data until the BBM is finished. If having 100% of the data migrated is a requirement,
+then, the `ensure_batched_background_migration_is_finished` helper can be used to guarantee that the migration was finished and the
+data fully migrated. ([See an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)).
+
## How to
### Generate a batched background migration
@@ -538,6 +552,107 @@ Bumping the [import/export version](../../user/project/settings/import_export.md
be required, if importing a project from a prior version of GitLab requires the
data to be in the new format.
+### Add indexes to support batched background migrations
+
+Sometimes it is necessary to add a new or temporary index to support a batched background migration.
+To do this, create the index in a post-deployment migration that precedes the post-deployment
+migration that queues the background migration.
+
+See the documentation for [adding database indexes](adding_database_indexes.md#analyzing-a-new-index-before-a-batched-background-migration)
+for additional information about some cases that require special attention to allow the index to be used directly after
+creation.
+
+### Execute a particular batch on the database testing pipeline
+
+NOTE:
+Only [database maintainers](https://gitlab.com/groups/gitlab-org/maintainers/database/-/group_members?with_inherited_permissions=exclude) can view the database testing pipeline artifacts. Ask one for help if you need to use this method.
+
+Let's assume that a batched background migration failed on a particular batch on GitLab.com and you want to figure out which query failed and why. At the moment, we don't have a good way to retrieve query information (especially the query parameters) and rerunning the entire migration with more logging would be a long process.
+
+Fortunately you can leverage our [database migration pipeline](database_migration_pipeline.md) to rerun a particular batch with additional logging and/or fix to see if it solves the problem.
+
+For an example see [Draft: `Test PG::CardinalityViolation` fix](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/110910) but make sure to read the entire section.
+
+To do that, you need to:
+
+1. [Find the batch `start_id` and `end_id`](#find-the-batch-start_id-and-end_id)
+1. [Create a regular migration](#create-a-regular-migration)
+1. [Apply a workaround for our migration helpers](#apply-a-workaround-for-our-migration-helpers-optional) (optional)
+1. [Start the database migration pipeline](#start-the-database-migration-pipeline)
+
+#### Find the batch `start_id` and `end_id`
+
+You should be able to find those in [Kibana](#viewing-failure-error-logs).
+
+#### Create a regular migration
+
+Schedule the batch in the `up` block of a regular migration:
+
+```ruby
+def up
+ instance = Gitlab::BackgroundMigration::YourBackgroundMigrationClass.new(
+ start_id: <batch start_id>,
+ end_id: <batch end_id>,
+ batch_table: <table name>,
+ batch_column: <batching column>,
+ sub_batch_size: <sub batch size>,
+ pause_ms: <miliseconds between batches>,
+ job_arguments: <job arguments if any>,
+ connection: connection
+ )
+
+ instance.perform
+end
+
+def down
+ # no-op
+end
+```
+
+#### Apply a workaround for our migration helpers (optional)
+
+If your batched background migration touches tables from a schema other than the one you specified by using `restrict_gitlab_migration` helper (example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema) then the migration will fail. To prevent that from happening you must to monkey patch database helpers so they don't fail the testing pipeline job:
+
+1. Add the schema names to [`RestrictGitlabSchema`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb#L57)
+
+```diff
+diff --git a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
+index b8d1d21a0d2d2a23d9e8c8a0a17db98ed1ed40b7..912e20659a6919f771045178c66828563cb5a4a1 100644
+--- a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
++++ b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
+@@ -55,7 +55,7 @@ def unmatched_schemas
+ end
+
+ def allowed_schemas_for_connection
+- Gitlab::Database.gitlab_schemas_for_connection(connection)
++ Gitlab::Database.gitlab_schemas_for_connection(connection) << :gitlab_ci
+ end
+ end
+ end
+```
+
+1. Add the schema names to [`RestrictAllowedSchemas`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb#L82)
+
+```diff
+diff --git a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
+index 4ae3622479f0800c0553959e132143ec9051898e..d556ec7f55adae9d46a56665ce02de782cb09f2d 100644
+--- a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
++++ b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
+@@ -79,7 +79,7 @@ def restrict_to_dml_only(parsed)
+ tables = self.dml_tables(parsed)
+ schemas = self.dml_schemas(tables)
+
+- if (schemas - self.allowed_gitlab_schemas).any?
++ if (schemas - (self.allowed_gitlab_schemas << :gitlab_ci)).any?
+ raise DMLAccessDeniedError, \
+ "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
+ "which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
+```
+
+#### Start the database migration pipeline
+
+Create a Draft merge request with your changes and trigger the manual `db:gitlabcom-database-testing` job.
+
## Managing
NOTE:
@@ -683,22 +798,109 @@ migration is scheduled in the GitLab FOSS context.
You can use the [generator](#generate-a-batched-background-migration) to generate an EE-only migration scaffold by passing
`--ee-only` flag when generating a new batched background migration.
-## Throttling batched migrations
+## Debug
+
+### Viewing failure error logs
-Because batched migrations are update-heavy and there were few incidents in the past because of the heavy load from migrations while the database was underperforming, a throttling mechanism exists to mitigate them.
+You can view failures in two ways:
-These database indicators are checked to throttle a migration. On getting a
-stop signal, the migration is paused for a set time (10 minutes):
+- Via GitLab logs:
+ 1. After running a batched background migration, if any jobs fail,
+ view the logs in [Kibana](https://log.gprd.gitlab.net/goto/4cb43f40-f861-11ec-b86b-d963a1a6788e).
+ View the production Sidekiq log and filter for:
-- WAL queue pending archival crossing a threshold.
-- Active autovacuum on the tables on which the migration works on.
-- Patroni apdex SLI dropping below the SLO.
+ - `json.new_state: failed`
+ - `json.job_class_name: <Batched Background Migration job class name>`
+ - `json.job_arguments: <Batched Background Migration job class arguments>`
-It's an ongoing effort to add more indicators to further enhance the
-database health check framework. For more details, see
-[epic 7594](https://gitlab.com/groups/gitlab-org/-/epics/7594).
+ 1. Review the `json.exception_class` and `json.exception_message` values to help
+ understand why the jobs failed.
+
+ 1. Remember the retry mechanism. Having a failure does not mean the job failed.
+ Always check the last status of the job.
+
+- Via database:
+
+ 1. Get the batched background migration `CLASS_NAME`.
+ 1. Execute the following query in the PostgreSQL console:
+
+ ```sql
+ SELECT migration.id, migration.job_class_name, transition_logs.exception_class, transition_logs.exception_message
+ FROM batched_background_migrations as migration
+ INNER JOIN batched_background_migration_jobs as jobs
+ ON jobs.batched_background_migration_id = migration.id
+ INNER JOIN batched_background_migration_job_transition_logs as transition_logs
+ ON transition_logs.batched_background_migration_job_id = jobs.id
+ WHERE transition_logs.next_status = '2' AND migration.job_class_name = "CLASS_NAME";
+ ```
+
+## Testing
+
+Writing tests is required for:
+
+- The batched background migrations' queueing migration.
+- The batched background migration itself.
+- A cleanup migration.
+
+The `:migration` and `schema: :latest` RSpec tags are automatically set for
+background migration specs. Refer to the
+[Testing Rails migrations](../testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
+style guide.
+
+Remember that `before` and `after` RSpec hooks
+migrate your database down and up. These hooks can result in other batched background
+migrations being called. Using `spy` test doubles with
+`have_received` is encouraged, instead of using regular test doubles, because
+your expectations defined in a `it` block can conflict with what is
+called in RSpec hooks. Refer to [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
+for more details.
+
+## Best practices
+
+1. Know how much data you're dealing with.
+1. Make sure the batched background migration jobs are idempotent.
+1. Confirm the tests you write are not false positives.
+1. If the data being migrated is critical and cannot be lost, the
+ clean-up migration must also check the final state of the data before completing.
+1. Discuss the numbers with a database specialist. The migration may add
+ more pressure on DB than you expect. Measure on staging,
+ or ask someone to measure on production.
+1. Know how much time is required to run the batched background migration.
+1. Be careful when silently rescuing exceptions inside job classes. This may lead to
+ jobs being marked as successful, even in a failure scenario.
+
+ ```ruby
+ # good
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ end
+ end
+
+ # acceptable
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ rescue Exception => error
+ logger.error(message: error.message, class: error.class)
+
+ raise
+ end
+ end
+
+ # bad
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ rescue Exception => error
+ logger.error(message: error.message, class: self.class.name)
+ end
+ end
+ ```
+
+## Examples
-## Example
+### Routes use-case
The `routes` table has a `source_type` field that's used for a polymorphic relationship.
As part of a database redesign, we're removing the polymorphic relationship. One step of
@@ -867,216 +1069,3 @@ background migration.
After the batched migration is completed, you can safely depend on the
data in `routes.namespace_id` being populated.
-
-## Testing
-
-Writing tests is required for:
-
-- The batched background migrations' queueing migration.
-- The batched background migration itself.
-- A cleanup migration.
-
-The `:migration` and `schema: :latest` RSpec tags are automatically set for
-background migration specs. Refer to the
-[Testing Rails migrations](../testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
-style guide.
-
-Remember that `before` and `after` RSpec hooks
-migrate your database down and up. These hooks can result in other batched background
-migrations being called. Using `spy` test doubles with
-`have_received` is encouraged, instead of using regular test doubles, because
-your expectations defined in a `it` block can conflict with what is
-called in RSpec hooks. Refer to [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
-for more details.
-
-## Best practices
-
-1. Know how much data you're dealing with.
-1. Make sure the batched background migration jobs are idempotent.
-1. Confirm the tests you write are not false positives.
-1. If the data being migrated is critical and cannot be lost, the
- clean-up migration must also check the final state of the data before completing.
-1. Discuss the numbers with a database specialist. The migration may add
- more pressure on DB than you expect. Measure on staging,
- or ask someone to measure on production.
-1. Know how much time is required to run the batched background migration.
-1. Be careful when silently rescuing exceptions inside job classes. This may lead to
- jobs being marked as successful, even in a failure scenario.
-
- ```ruby
- # good
- def perform
- each_sub_batch do |sub_batch|
- sub_batch.update_all(name: 'My Name')
- end
- end
-
- # acceptable
- def perform
- each_sub_batch do |sub_batch|
- sub_batch.update_all(name: 'My Name')
- rescue Exception => error
- logger.error(message: error.message, class: error.class)
-
- raise
- end
- end
-
- # bad
- def perform
- each_sub_batch do |sub_batch|
- sub_batch.update_all(name: 'My Name')
- rescue Exception => error
- logger.error(message: error.message, class: self.class.name)
- end
- end
- ```
-
-## Additional tips and strategies
-
-### ChatOps integration
-
-The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.
-
-### Viewing failure error logs
-
-You can view failures in two ways:
-
-- Via GitLab logs:
- 1. After running a batched background migration, if any jobs fail,
- view the logs in [Kibana](https://log.gprd.gitlab.net/goto/4cb43f40-f861-11ec-b86b-d963a1a6788e).
- View the production Sidekiq log and filter for:
-
- - `json.new_state: failed`
- - `json.job_class_name: <Batched Background Migration job class name>`
- - `json.job_arguments: <Batched Background Migration job class arguments>`
-
- 1. Review the `json.exception_class` and `json.exception_message` values to help
- understand why the jobs failed.
-
- 1. Remember the retry mechanism. Having a failure does not mean the job failed.
- Always check the last status of the job.
-
-- Via database:
-
- 1. Get the batched background migration `CLASS_NAME`.
- 1. Execute the following query in the PostgreSQL console:
-
- ```sql
- SELECT migration.id, migration.job_class_name, transition_logs.exception_class, transition_logs.exception_message
- FROM batched_background_migrations as migration
- INNER JOIN batched_background_migration_jobs as jobs
- ON jobs.batched_background_migration_id = migration.id
- INNER JOIN batched_background_migration_job_transition_logs as transition_logs
- ON transition_logs.batched_background_migration_job_id = jobs.id
- WHERE transition_logs.next_status = '2' AND migration.job_class_name = "CLASS_NAME";
- ```
-
-### Executing a particular batch on the database testing pipeline
-
-NOTE:
-Only [database maintainers](https://gitlab.com/groups/gitlab-org/maintainers/database/-/group_members?with_inherited_permissions=exclude) can view the database testing pipeline artifacts. Ask one for help if you need to use this method.
-
-Let's assume that a batched background migration failed on a particular batch on GitLab.com and you want to figure out which query failed and why. At the moment, we don't have a good way to retrieve query information (especially the query parameters) and rerunning the entire migration with more logging would be a long process.
-
-Fortunately you can leverage our [database migration pipeline](database_migration_pipeline.md) to rerun a particular batch with additional logging and/or fix to see if it solves the problem.
-
-<!-- vale gitlab.Substitutions = NO -->
-
-For an example see [Draft: Test PG::CardinalityViolation fix](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/110910) but make sure to read the entire section.
-
-To do that, you need to:
-
-1. Find the batch `start_id` and `end_id`
-1. Create a regular migration
-1. Apply a workaround for our migration helpers (optional)
-1. Start the database migration pipeline
-
-#### 1. Find the batch `start_id` and `end_id`
-
-You should be able to find those in [Kibana](#viewing-failure-error-logs).
-
-#### 2. Create a regular migration
-
-Schedule the batch in the `up` block of a regular migration:
-
-```ruby
-def up
- instance = Gitlab::BackgroundMigration::YourBackgroundMigrationClass.new(
- start_id: <batch start_id>,
- end_id: <batch end_id>,
- batch_table: <table name>,
- batch_column: <batching column>,
- sub_batch_size: <sub batch size>,
- pause_ms: <miliseconds between batches>,
- job_arguments: <job arguments if any>,
- connection: connection
- )
-
- instance.perform
-end
-
-
-def down
- # no-op
-end
-```
-
-#### 3. Apply a workaround for our migration helpers (optional)
-
-If your batched background migration touches tables from a schema other than the one you specified by using `restrict_gitlab_migration` helper (example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema) then the migration will fail. To prevent that from happening you must to monkey patch database helpers so they don't fail the testing pipeline job:
-
-1. Add the schema names to [`RestrictGitlabSchema`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb#L57)
-
-```diff
-diff --git a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
-index b8d1d21a0d2d2a23d9e8c8a0a17db98ed1ed40b7..912e20659a6919f771045178c66828563cb5a4a1 100644
---- a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
-+++ b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
-@@ -55,7 +55,7 @@ def unmatched_schemas
- end
-
- def allowed_schemas_for_connection
-- Gitlab::Database.gitlab_schemas_for_connection(connection)
-+ Gitlab::Database.gitlab_schemas_for_connection(connection) << :gitlab_ci
- end
- end
- end
-```
-
-1. Add the schema names to [`RestrictAllowedSchemas`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb#L82)
-
-```diff
-diff --git a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
-index 4ae3622479f0800c0553959e132143ec9051898e..d556ec7f55adae9d46a56665ce02de782cb09f2d 100644
---- a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
-+++ b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
-@@ -79,7 +79,7 @@ def restrict_to_dml_only(parsed)
- tables = self.dml_tables(parsed)
- schemas = self.dml_schemas(tables)
-
-- if (schemas - self.allowed_gitlab_schemas).any?
-+ if (schemas - (self.allowed_gitlab_schemas << :gitlab_ci)).any?
- raise DMLAccessDeniedError, \
- "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
- "which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
-```
-
-#### 4. Start the database migration pipeline
-
-Create a Draft merge request with your changes and trigger the manual `db:gitlabcom-database-testing` job.
-
-### Adding indexes to support batched background migrations
-
-Sometimes it is necessary to add a new or temporary index to support a batched background migration.
-To do this, create the index in a post-deployment migration that precedes the post-deployment
-migration that queues the background migration.
-
-See the documentation for [adding database indexes](adding_database_indexes.md#analyzing-a-new-index-before-a-batched-background-migration)
-for additional information about some cases that require special attention to allow the index to be used directly after
-creation.
-
-## Legacy background migrations
-
-Batched background migrations replaced the legacy background migrations framework.
-Check that documentation in reference to any changes involving that framework.
diff --git a/doc/development/features_inside_dot_gitlab.md b/doc/development/features_inside_dot_gitlab.md
index f55d7e52fbc..b7eda423584 100644
--- a/doc/development/features_inside_dot_gitlab.md
+++ b/doc/development/features_inside_dot_gitlab.md
@@ -17,3 +17,5 @@ When implementing new features, please refer to these existing features to avoid
- [Customize Auto DevOps Helm Values](../topics/autodevops/customize.md#customize-helm-chart-values): `.gitlab/auto-deploy-values.yaml`.
- [Insights](../user/project/insights/index.md#configure-project-insights): `.gitlab/insights.yml`.
- [Service Desk Templates](../user/project/service_desk.md#customize-emails-sent-to-the-requester): `.gitlab/service_desk_templates/`.
+- [Secret Detection Custom Rulesets](../user/application_security/secret_detection/index.md#disable-predefined-analyzer-rules): `.gitlab/secret-detection-ruleset.toml`
+- [Static Analysis Custom Rulesets](../user/application_security/sast/customize_rulesets.md#create-the-configuration-file): `.gitlab/sast-ruleset.toml`