gitlab.com/gitlab-org/gitlab-foss.git

author    GitLab Bot <gitlab-bot@gitlab.com>  2023-08-18 13:50:51 +0300
committer GitLab Bot <gitlab-bot@gitlab.com>  2023-08-18 13:50:51 +0300
commit    db384e6b19af03b4c3c82a5760d83a3fd79f7982 (patch)
tree      34beaef37df5f47ccbcf5729d7583aae093cffa0 /doc/development/database
parent    54fd7b1bad233e3944434da91d257fa7f63c3996 (diff)

Add latest changes from gitlab-org/gitlab@16-3-stable-ee (v16.3.0-rc42)
Diffstat (limited to 'doc/development/database')
-rw-r--r--  doc/development/database/avoiding_downtime_in_migrations.md    23
-rw-r--r--  doc/development/database/batched_background_migrations.md     437
-rw-r--r--  doc/development/database/database_lab.md                         6
-rw-r--r--  doc/development/database/database_migration_pipeline.md          7
-rw-r--r--  doc/development/database/database_reviewer_guidelines.md         4
-rw-r--r--  doc/development/database/efficient_in_operator_queries.md        3
-rw-r--r--  doc/development/database/iterating_tables_in_batches.md         21
-rw-r--r--  doc/development/database/multiple_databases.md                 195
-rw-r--r--  doc/development/database/not_null_constraints.md                97
-rw-r--r--  doc/development/database/pagination_guidelines.md                4
-rw-r--r--  doc/development/database/query_recorder.md                      48
-rw-r--r--  doc/development/database/single_table_inheritance.md            60
-rw-r--r--  doc/development/database/strings_and_the_text_data_type.md       2
13 files changed, 534 insertions, 373 deletions
diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md
index 6a819e9f6cd..371df5b45ff 100644
--- a/doc/development/database/avoiding_downtime_in_migrations.md
+++ b/doc/development/database/avoiding_downtime_in_migrations.md
@@ -29,22 +29,23 @@ and upgrade processes for self-managed installations that lump together any of t
### Ignoring the column (release M)
-The first step is to ignore the column in the application code. This step is
-necessary because Rails caches the columns and re-uses this cache in various
-places. This can be done by defining the columns to ignore. For example, to ignore
+The first step is to ignore the column in the application code and remove all code references to it, including
+model validations.
+This step is necessary because Rails caches the columns and re-uses this cache in various
+places. This can be done by defining the columns to ignore. For example, in release `12.5`, to ignore
`updated_at` in the User model you'd use the following:
```ruby
class User < ApplicationRecord
include IgnorableColumns
- ignore_column :updated_at, remove_with: '12.7', remove_after: '2020-01-22'
+ ignore_column :updated_at, remove_with: '12.7', remove_after: '2019-12-22'
end
```
Multiple columns can be ignored, too:
```ruby
-ignore_columns %i[updated_at created_at], remove_with: '12.7', remove_after: '2020-01-22'
+ignore_columns %i[updated_at created_at], remove_with: '12.7', remove_after: '2019-12-22'
```
If the model exists in CE and EE, the column has to be ignored in the CE model. If the
@@ -52,21 +53,21 @@ model only exists in EE, then it has to be added there.
We require indication of when it is safe to remove the column ignore rule with:
-- `remove_with`: set to a GitLab release typically two releases (M+2) after adding the
+- `remove_with`: set to a GitLab release typically two releases (M+2) (`12.7` in our example) after adding the
column ignore.
- `remove_after`: set to a date after which we consider it safe to remove the column
- ignore, typically after the M+1 release date, during the M+2 development cycle.
+ ignore, typically after the M+1 release date, during the M+2 development cycle. For example, since the development cycle of `12.7` is between `2019-12-18` and `2020-01-17`, and `12.6` is the release to [drop the column](#dropping-the-column-release-m1), it's safe to set the date to the release date of `12.6` as `2019-12-22`.
This information allows us to reason better about column ignores and makes sure we
don't remove column ignores too early for both regular releases and deployments to GitLab.com. For
example, this avoids a situation where we deploy a bulk of changes that include both changes
to ignore the column and subsequently remove the column ignore (which would result in a downtime).
-In this example, the change to ignore the column went into release 12.5.
+In this example, the change to ignore the column went into release `12.5`.
### Dropping the column (release M+1)
-Continuing our example, dropping the column goes into a _post-deployment_ migration in release 12.6:
+Continuing our example, dropping the column goes into a _post-deployment_ migration in release `12.6`:
Start by creating the **post-deployment migration**:
@@ -135,7 +136,7 @@ for more information about database migrations.
### Removing the ignore rule (release M+2)
-With the next release, in this example 12.7, we set up another merge request to remove the ignore rule.
+With the next release, in this example `12.7`, we set up another merge request to remove the ignore rule.
This removes the `ignore_column` line and, if not needed anymore, also the inclusion of `IgnorableColumns`.
This should only get merged with the release indicated with `remove_with` and once
@@ -195,7 +196,7 @@ This step is similar to [the first step when column is dropped](#ignoring-the-co
```ruby
class User < ApplicationRecord
include IgnorableColumns
- ignore_column :updated_at, remove_with: '12.7', remove_after: '2020-01-22'
+ ignore_column :updated_at, remove_with: '12.7', remove_after: '2019-12-22'
end
```
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index 1bdd24afcad..10490df7b5e 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -13,6 +13,13 @@ in our guidelines. For example, you can use batched background
migrations to migrate data that's stored in a single JSON column
to a separate table instead.
+NOTE:
+Batched background migrations replaced the legacy background migrations framework.
+Check that documentation for any changes involving that framework.
+
+NOTE:
+The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.
+
## When to use batched background migrations
Use a batched background migration when you migrate _data_ in tables containing
@@ -201,6 +208,13 @@ models defined in `app/models` except the `ApplicationRecord` classes).
Because these migrations can take a long time to run, it's possible
for new versions to deploy while the migrations are still running.
+### Depending on migrated data
+
+Unlike a regular or a post-deployment migration, waiting for the next release is not enough to guarantee that the data was fully migrated.
+That means you shouldn't depend on the data until the BBM is finished. If having 100% of the data migrated is a requirement,
+the `ensure_batched_background_migration_is_finished` helper can be used to guarantee that the migration finished and the
+data was fully migrated. ([See an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)).
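+
+A minimal sketch of such a finalization migration follows. The job class, table, and arguments are hypothetical, not a real GitLab migration:
+
+```ruby
+# Post-deployment migration; all names here are illustrative.
+class FinalizeBackfillExampleColumn < Gitlab::Database::Migration[2.1]
+  disable_ddl_transaction!
+
+  restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+  def up
+    # Blocks until the batched background migration has finished, so
+    # everything deployed after this point can rely on the migrated data.
+    ensure_batched_background_migration_is_finished(
+      job_class_name: 'BackfillExampleColumn',
+      table_name: :examples,
+      column_name: :id,
+      job_arguments: []
+    )
+  end
+
+  def down
+    # no-op
+  end
+end
+```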
+
## How to
### Generate a batched background migration
@@ -538,6 +552,107 @@ Bumping the [import/export version](../../user/project/settings/import_export.md
be required, if importing a project from a prior version of GitLab requires the
data to be in the new format.
+### Add indexes to support batched background migrations
+
+Sometimes it is necessary to add a new or temporary index to support a batched background migration.
+To do this, create the index in a post-deployment migration that precedes the post-deployment
+migration that queues the background migration.
+
+See the documentation for [adding database indexes](adding_database_indexes.md#analyzing-a-new-index-before-a-batched-background-migration)
+for additional information about some cases that require special attention to allow the index to be used directly after
+creation.
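+
+A minimal sketch of such a preceding post-deployment migration, assuming a hypothetical `examples` table and a temporary partial index (all names are illustrative):
+
+```ruby
+# Runs before the post-deployment migration that queues the BBM.
+class AddTmpIndexForExampleBackfill < Gitlab::Database::Migration[2.1]
+  disable_ddl_transaction!
+
+  INDEX_NAME = 'tmp_index_examples_on_id_where_name_is_null'
+
+  def up
+    # Partial index covering only the rows the background job must visit.
+    add_concurrent_index :examples, :id, where: 'name IS NULL', name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :examples, INDEX_NAME
+  end
+end
+```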
+
+### Execute a particular batch on the database testing pipeline
+
+NOTE:
+Only [database maintainers](https://gitlab.com/groups/gitlab-org/maintainers/database/-/group_members?with_inherited_permissions=exclude) can view the database testing pipeline artifacts. Ask one for help if you need to use this method.
+
+Let's assume that a batched background migration failed on a particular batch on GitLab.com and you want to figure out which query failed and why. At the moment, we don't have a good way to retrieve query information (especially the query parameters) and rerunning the entire migration with more logging would be a long process.
+
+Fortunately, you can leverage our [database migration pipeline](database_migration_pipeline.md) to rerun a particular batch with additional logging and/or a fix to see if it solves the problem.
+
+For an example, see [Draft: `Test PG::CardinalityViolation` fix](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/110910), but make sure to read the entire section.
+
+To do that, you need to:
+
+1. [Find the batch `start_id` and `end_id`](#find-the-batch-start_id-and-end_id)
+1. [Create a regular migration](#create-a-regular-migration)
+1. [Apply a workaround for our migration helpers](#apply-a-workaround-for-our-migration-helpers-optional) (optional)
+1. [Start the database migration pipeline](#start-the-database-migration-pipeline)
+
+#### Find the batch `start_id` and `end_id`
+
+You should be able to find those in [Kibana](#viewing-failure-error-logs).
+
+#### Create a regular migration
+
+Schedule the batch in the `up` block of a regular migration:
+
+```ruby
+def up
+ instance = Gitlab::BackgroundMigration::YourBackgroundMigrationClass.new(
+ start_id: <batch start_id>,
+ end_id: <batch end_id>,
+ batch_table: <table name>,
+ batch_column: <batching column>,
+ sub_batch_size: <sub batch size>,
+ pause_ms: <milliseconds between batches>,
+ job_arguments: <job arguments if any>,
+ connection: connection
+ )
+
+ instance.perform
+end
+
+def down
+ # no-op
+end
+```
+
+#### Apply a workaround for our migration helpers (optional)
+
+If your batched background migration touches tables from a schema other than the one you specified with the `restrict_gitlab_migration` helper (for example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema), then the migration will fail. To prevent that from happening, you must monkey patch the database helpers so they don't fail the testing pipeline job:
+
+1. Add the schema names to [`RestrictGitlabSchema`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb#L57)
+
+```diff
+diff --git a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
+index b8d1d21a0d2d2a23d9e8c8a0a17db98ed1ed40b7..912e20659a6919f771045178c66828563cb5a4a1 100644
+--- a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
++++ b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
+@@ -55,7 +55,7 @@ def unmatched_schemas
+ end
+
+ def allowed_schemas_for_connection
+- Gitlab::Database.gitlab_schemas_for_connection(connection)
++ Gitlab::Database.gitlab_schemas_for_connection(connection) << :gitlab_ci
+ end
+ end
+ end
+```
+
+1. Add the schema names to [`RestrictAllowedSchemas`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb#L82)
+
+```diff
+diff --git a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
+index 4ae3622479f0800c0553959e132143ec9051898e..d556ec7f55adae9d46a56665ce02de782cb09f2d 100644
+--- a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
++++ b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
+@@ -79,7 +79,7 @@ def restrict_to_dml_only(parsed)
+ tables = self.dml_tables(parsed)
+ schemas = self.dml_schemas(tables)
+
+- if (schemas - self.allowed_gitlab_schemas).any?
++ if (schemas - (self.allowed_gitlab_schemas << :gitlab_ci)).any?
+ raise DMLAccessDeniedError, \
+ "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
+ "which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
+```
+
+#### Start the database migration pipeline
+
+Create a Draft merge request with your changes and trigger the manual `db:gitlabcom-database-testing` job.
+
## Managing
NOTE:
@@ -683,22 +798,109 @@ migration is scheduled in the GitLab FOSS context.
You can use the [generator](#generate-a-batched-background-migration) to generate an EE-only migration scaffold by passing
`--ee-only` flag when generating a new batched background migration.
-## Throttling batched migrations
+## Debug
+
+### Viewing failure error logs
-Because batched migrations are update-heavy and there were few incidents in the past because of the heavy load from migrations while the database was underperforming, a throttling mechanism exists to mitigate them.
+You can view failures in two ways:
-These database indicators are checked to throttle a migration. On getting a
-stop signal, the migration is paused for a set time (10 minutes):
+- Via GitLab logs:
+ 1. After running a batched background migration, if any jobs fail,
+ view the logs in [Kibana](https://log.gprd.gitlab.net/goto/4cb43f40-f861-11ec-b86b-d963a1a6788e).
+ View the production Sidekiq log and filter for:
-- WAL queue pending archival crossing a threshold.
-- Active autovacuum on the tables on which the migration works on.
-- Patroni apdex SLI dropping below the SLO.
+ - `json.new_state: failed`
+ - `json.job_class_name: <Batched Background Migration job class name>`
+ - `json.job_arguments: <Batched Background Migration job class arguments>`
-It's an ongoing effort to add more indicators to further enhance the
-database health check framework. For more details, see
-[epic 7594](https://gitlab.com/groups/gitlab-org/-/epics/7594).
+ 1. Review the `json.exception_class` and `json.exception_message` values to help
+ understand why the jobs failed.
+
+ 1. Remember the retry mechanism: a single failure does not mean the job failed for good.
+ Always check the last status of the job.
+
+- Via database:
+
+ 1. Get the batched background migration `CLASS_NAME`.
+ 1. Execute the following query in the PostgreSQL console:
+
+ ```sql
+ SELECT migration.id, migration.job_class_name, transition_logs.exception_class, transition_logs.exception_message
+ FROM batched_background_migrations as migration
+ INNER JOIN batched_background_migration_jobs as jobs
+ ON jobs.batched_background_migration_id = migration.id
+ INNER JOIN batched_background_migration_job_transition_logs as transition_logs
+ ON transition_logs.batched_background_migration_job_id = jobs.id
+ WHERE transition_logs.next_status = '2' AND migration.job_class_name = 'CLASS_NAME';
+ ```
+
+## Testing
+
+Writing tests is required for:
+
+- The batched background migrations' queueing migration.
+- The batched background migration itself.
+- A cleanup migration.
+
+The `:migration` and `schema: :latest` RSpec tags are automatically set for
+background migration specs. Refer to the
+[Testing Rails migrations](../testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
+style guide.
+
+Remember that `before` and `after` RSpec hooks
+migrate your database down and up. These hooks can result in other batched background
+migrations being called. Using `spy` test doubles with
+`have_received` is encouraged, instead of using regular test doubles, because
+your expectations defined in an `it` block can conflict with what is
+called in RSpec hooks. Refer to [issue #18839](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
+for more details.
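+
+A hedged sketch of the spy pattern in plain RSpec (the collaborator and its method are hypothetical):
+
+```ruby
+# A spy records calls instead of requiring them up front, so it does not
+# conflict with whatever the `before`/`after` hooks have already run.
+runner = spy('migration runner') # hypothetical collaborator
+
+runner.run_batch(start_id: 1, end_id: 100)
+
+expect(runner).to have_received(:run_batch).with(start_id: 1, end_id: 100)
+```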
+
+## Best practices
+
+1. Know how much data you're dealing with.
+1. Make sure the batched background migration jobs are idempotent.
+1. Confirm the tests you write are not false positives.
+1. If the data being migrated is critical and cannot be lost, the
+ clean-up migration must also check the final state of the data before completing.
+1. Discuss the numbers with a database specialist. The migration may add
+ more pressure on the database than you expect. Measure on staging,
+ or ask someone to measure on production.
+1. Know how much time is required to run the batched background migration.
+1. Be careful when silently rescuing exceptions inside job classes. This may lead to
+ jobs being marked as successful, even in a failure scenario.
+
+ ```ruby
+ # good
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ end
+ end
+
+ # acceptable
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ rescue Exception => error
+ logger.error(message: error.message, class: error.class)
+
+ raise
+ end
+ end
+
+ # bad
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ rescue Exception => error
+ logger.error(message: error.message, class: self.class.name)
+ end
+ end
+ ```
+
+## Examples
-## Example
+### Routes use-case
The `routes` table has a `source_type` field that's used for a polymorphic relationship.
As part of a database redesign, we're removing the polymorphic relationship. One step of
@@ -867,216 +1069,3 @@ background migration.
After the batched migration is completed, you can safely depend on the
data in `routes.namespace_id` being populated.
-
-## Testing
-
-Writing tests is required for:
-
-- The batched background migrations' queueing migration.
-- The batched background migration itself.
-- A cleanup migration.
-
-The `:migration` and `schema: :latest` RSpec tags are automatically set for
-background migration specs. Refer to the
-[Testing Rails migrations](../testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
-style guide.
-
-Remember that `before` and `after` RSpec hooks
-migrate your database down and up. These hooks can result in other batched background
-migrations being called. Using `spy` test doubles with
-`have_received` is encouraged, instead of using regular test doubles, because
-your expectations defined in a `it` block can conflict with what is
-called in RSpec hooks. Refer to [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
-for more details.
-
-## Best practices
-
-1. Know how much data you're dealing with.
-1. Make sure the batched background migration jobs are idempotent.
-1. Confirm the tests you write are not false positives.
-1. If the data being migrated is critical and cannot be lost, the
- clean-up migration must also check the final state of the data before completing.
-1. Discuss the numbers with a database specialist. The migration may add
- more pressure on DB than you expect. Measure on staging,
- or ask someone to measure on production.
-1. Know how much time is required to run the batched background migration.
-1. Be careful when silently rescuing exceptions inside job classes. This may lead to
- jobs being marked as successful, even in a failure scenario.
-
- ```ruby
- # good
- def perform
- each_sub_batch do |sub_batch|
- sub_batch.update_all(name: 'My Name')
- end
- end
-
- # acceptable
- def perform
- each_sub_batch do |sub_batch|
- sub_batch.update_all(name: 'My Name')
- rescue Exception => error
- logger.error(message: error.message, class: error.class)
-
- raise
- end
- end
-
- # bad
- def perform
- each_sub_batch do |sub_batch|
- sub_batch.update_all(name: 'My Name')
- rescue Exception => error
- logger.error(message: error.message, class: self.class.name)
- end
- end
- ```
-
-## Additional tips and strategies
-
-### ChatOps integration
-
-The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.
-
-### Viewing failure error logs
-
-You can view failures in two ways:
-
-- Via GitLab logs:
- 1. After running a batched background migration, if any jobs fail,
- view the logs in [Kibana](https://log.gprd.gitlab.net/goto/4cb43f40-f861-11ec-b86b-d963a1a6788e).
- View the production Sidekiq log and filter for:
-
- - `json.new_state: failed`
- - `json.job_class_name: <Batched Background Migration job class name>`
- - `json.job_arguments: <Batched Background Migration job class arguments>`
-
- 1. Review the `json.exception_class` and `json.exception_message` values to help
- understand why the jobs failed.
-
- 1. Remember the retry mechanism. Having a failure does not mean the job failed.
- Always check the last status of the job.
-
-- Via database:
-
- 1. Get the batched background migration `CLASS_NAME`.
- 1. Execute the following query in the PostgreSQL console:
-
- ```sql
- SELECT migration.id, migration.job_class_name, transition_logs.exception_class, transition_logs.exception_message
- FROM batched_background_migrations as migration
- INNER JOIN batched_background_migration_jobs as jobs
- ON jobs.batched_background_migration_id = migration.id
- INNER JOIN batched_background_migration_job_transition_logs as transition_logs
- ON transition_logs.batched_background_migration_job_id = jobs.id
- WHERE transition_logs.next_status = '2' AND migration.job_class_name = "CLASS_NAME";
- ```
-
-### Executing a particular batch on the database testing pipeline
-
-NOTE:
-Only [database maintainers](https://gitlab.com/groups/gitlab-org/maintainers/database/-/group_members?with_inherited_permissions=exclude) can view the database testing pipeline artifacts. Ask one for help if you need to use this method.
-
-Let's assume that a batched background migration failed on a particular batch on GitLab.com and you want to figure out which query failed and why. At the moment, we don't have a good way to retrieve query information (especially the query parameters) and rerunning the entire migration with more logging would be a long process.
-
-Fortunately you can leverage our [database migration pipeline](database_migration_pipeline.md) to rerun a particular batch with additional logging and/or fix to see if it solves the problem.
-
-<!-- vale gitlab.Substitutions = NO -->
-
-For an example see [Draft: Test PG::CardinalityViolation fix](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/110910) but make sure to read the entire section.
-
-To do that, you need to:
-
-1. Find the batch `start_id` and `end_id`
-1. Create a regular migration
-1. Apply a workaround for our migration helpers (optional)
-1. Start the database migration pipeline
-
-#### 1. Find the batch `start_id` and `end_id`
-
-You should be able to find those in [Kibana](#viewing-failure-error-logs).
-
-#### 2. Create a regular migration
-
-Schedule the batch in the `up` block of a regular migration:
-
-```ruby
-def up
- instance = Gitlab::BackgroundMigration::YourBackgroundMigrationClass.new(
- start_id: <batch start_id>,
- end_id: <batch end_id>,
- batch_table: <table name>,
- batch_column: <batching column>,
- sub_batch_size: <sub batch size>,
- pause_ms: <miliseconds between batches>,
- job_arguments: <job arguments if any>,
- connection: connection
- )
-
- instance.perform
-end
-
-
-def down
- # no-op
-end
-```
-
-#### 3. Apply a workaround for our migration helpers (optional)
-
-If your batched background migration touches tables from a schema other than the one you specified by using `restrict_gitlab_migration` helper (example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema) then the migration will fail. To prevent that from happening you must to monkey patch database helpers so they don't fail the testing pipeline job:
-
-1. Add the schema names to [`RestrictGitlabSchema`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb#L57)
-
-```diff
-diff --git a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
-index b8d1d21a0d2d2a23d9e8c8a0a17db98ed1ed40b7..912e20659a6919f771045178c66828563cb5a4a1 100644
---- a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
-+++ b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
-@@ -55,7 +55,7 @@ def unmatched_schemas
- end
-
- def allowed_schemas_for_connection
-- Gitlab::Database.gitlab_schemas_for_connection(connection)
-+ Gitlab::Database.gitlab_schemas_for_connection(connection) << :gitlab_ci
- end
- end
- end
-```
-
-1. Add the schema names to [`RestrictAllowedSchemas`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb#L82)
-
-```diff
-diff --git a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
-index 4ae3622479f0800c0553959e132143ec9051898e..d556ec7f55adae9d46a56665ce02de782cb09f2d 100644
---- a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
-+++ b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
-@@ -79,7 +79,7 @@ def restrict_to_dml_only(parsed)
- tables = self.dml_tables(parsed)
- schemas = self.dml_schemas(tables)
-
-- if (schemas - self.allowed_gitlab_schemas).any?
-+ if (schemas - (self.allowed_gitlab_schemas << :gitlab_ci)).any?
- raise DMLAccessDeniedError, \
- "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
- "which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
-```
-
-#### 4. Start the database migration pipeline
-
-Create a Draft merge request with your changes and trigger the manual `db:gitlabcom-database-testing` job.
-
-### Adding indexes to support batched background migrations
-
-Sometimes it is necessary to add a new or temporary index to support a batched background migration.
-To do this, create the index in a post-deployment migration that precedes the post-deployment
-migration that queues the background migration.
-
-See the documentation for [adding database indexes](adding_database_indexes.md#analyzing-a-new-index-before-a-batched-background-migration)
-for additional information about some cases that require special attention to allow the index to be used directly after
-creation.
-
-## Legacy background migrations
-
-Batched background migrations replaced the legacy background migrations framework.
-Check that documentation in reference to any changes involving that framework.
diff --git a/doc/development/database/database_lab.md b/doc/development/database/database_lab.md
index 357133d8bca..7edb8ab4de5 100644
--- a/doc/development/database/database_lab.md
+++ b/doc/development/database/database_lab.md
@@ -146,13 +146,13 @@ the `pgai` Gem:
1. To get started, you need to gather some values from the [Postgres.ai instances page](https://console.postgres.ai/gitlab/instances):
- 1. Navigate to the instance that you want to configure and the on right side of the screen.
+ 1. Go to the instance that you want to configure, then look on the right side of the screen.
1. Under **Connection**, select **Connect**. The menu might be collapsed.
- A pop-up with everything that's needed for configuration appears, using this format:
+ A dialog with everything that's needed for configuration appears, using this format:
```shell
- dblab init --url http://127.0.0.1:1234 --token TOKEN --environment-id <environment-id>
+ dblab init --url "http://127.0.0.1:1234" --token TOKEN --environment-id <environment-id>
```
```shell
diff --git a/doc/development/database/database_migration_pipeline.md b/doc/development/database/database_migration_pipeline.md
index a9d525e2a41..280254d90b0 100644
--- a/doc/development/database/database_migration_pipeline.md
+++ b/doc/development/database/database_migration_pipeline.md
@@ -44,6 +44,13 @@ The next section of the comment contains detailed information for each migration
calls, timings, and the number of the changed rows.
- **Runtime histogram** - Indicates the distribution of query times for the migration.
+### Database size increase
+
+Occasionally, a migration shows a +8.00 KiB size increase, even if the migration was not
+expected to result in a size increase. Completing any migration adds a row to the
+`schema_migrations` table, which may require a new disk page to be created.
+If a new disk page is created, the size of the database will grow by exactly 8 KiB.
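+
+You can observe this yourself with standard PostgreSQL size functions; for example, run the following before and after a migration:
+
+```sql
+-- Reports the current on-disk size of the connected database;
+-- comparing before/after shows the page-sized (8 KiB) step.
+SELECT pg_size_pretty(pg_database_size(current_database()));
+```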
+
## Background migration details
The next section of the comment contains detailed information about each batched background migration, including:
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index c297c90918f..c75503c503b 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -68,8 +68,8 @@ The following guides provide a quick introduction and links to follow on more ad
- Guide on [understanding EXPLAIN plans](understanding_explain_plans.md).
- [Explaining the unexplainable series in `depesz`](https://www.depesz.com/tag/unexplainable/).
-We also have licensed access to The Art of PostgreSQL. If you are interested in getting access, check out the
-[issue (confidential)](https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/23).
+We also have licensed access to The Art of PostgreSQL. If you are interested in getting access, GitLab team
+members can check out the issue here: `https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/23`.
Finally, you can find various guides in the [Database guides](index.md) page that cover more specific
topics and use cases. The most frequently required during database reviewing are the following:
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
index a0c71f49e2d..87976df5711 100644
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -228,9 +228,6 @@ Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
If it's not given, the `IN` operator optimization only makes the `ORDER BY` columns available to
the end-user and not the full database row.
- If it's not given, the `IN` operator optimization only makes the `ORDER BY` columns available to
- the end-user and not the full database row.
-
The following database index on the `issues` table must be present
to make the query execute efficiently:
diff --git a/doc/development/database/iterating_tables_in_batches.md b/doc/development/database/iterating_tables_in_batches.md
index a927242e8d8..84b82b16255 100644
--- a/doc/development/database/iterating_tables_in_batches.md
+++ b/doc/development/database/iterating_tables_in_batches.md
@@ -135,11 +135,24 @@ Let's consider that we want to iterate over the `users` table and print the `Use
standard output. The `users` table contains millions of records, thus running one query to fetch
the users likely times out.
-![Users table overview](../img/each_batch_users_table_v13_7.png)
-
-This is a simplified version of the `users` table which contains several rows. We have a few
+This table is a simplified version of the `users` table, which contains several rows. We have a few
smaller gaps in the `id` column to make the example a bit more realistic (a few records were
-already deleted). Currently, we have one index on the `id` field.
+already deleted). One index exists on the `id` field:
+
+| `id` | `sign_in_count` | `created_at` |
+| ---- | :-------------: | ------------ |
+| 1    |        1        | 2020-01-01   |
+| 2    |        4        | 2020-01-01   |
+| 9    |        1        | 2020-01-03   |
+| 300  |        5        | 2020-01-03   |
+| 301  |        9        | 2020-01-03   |
+| 302  |        8        | 2020-01-03   |
+| 303  |        2        | 2020-01-03   |
+| 350  |        1        | 2020-01-03   |
+| 351  |        3        | 2020-01-04   |
+| 352  |        0        | 2020-01-05   |
+| 353  |        9        | 2020-01-11   |
+| 354  |        3        | 2020-01-12   |
Loading all users into memory (avoid):
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index 7c6b94144b4..7037ab22983 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -163,6 +163,92 @@ allows you to avoid the join, while still avoiding the N+1 query.
You can see a real example of this solution being used in
<https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67655>.
+##### Remove a redundant join
+
+Sometimes a query does excess (or redundant) joins.
+
+A common example occurs where a query joins from `A` to `C` via some
+table `B` that has foreign keys to both.
+When you only care about counting how
+many rows there are in `C`, and the foreign keys in `B` have `NOT NULL` constraints,
+it might be enough to count the rows in `B` instead.
+For example, in
+[MR 71811](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71811), it was
+previously doing `project.runners.count`, which would produce a query like:
+
+```sql
+select count(*) from projects
+inner join ci_runner_projects on ci_runner_projects.project_id = projects.id
+where ci_runner_projects.runner_id IN (1, 2, 3)
+```
+
+This was changed to avoid the cross-join by changing the code to
+`project.runner_projects.count`. It produces the same response with the
+following query:
+
+```sql
+select count(*) from ci_runner_projects
+where ci_runner_projects.runner_id IN (1, 2, 3)
+```
+
+Another common redundant join is joining all the way to another table,
+then filtering by primary key when you could have instead filtered on a foreign
+key. See an example in
+[MR 71614](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71614). The previous
+code was `joins(scan: :build).where(ci_builds: { id: build_ids })`, which
+generated a query like:
+
+```sql
+select ...
+inner join security_scans
+inner join ci_builds on security_scans.build_id = ci_builds.id
+where ci_builds.id IN (1, 2, 3)
+```
+
+However, as `security_scans` already has a foreign key `build_id`, the code
+can be changed to `joins(:scan).where(security_scans: { build_id: build_ids })`,
+which produces the same response with the following query:
+
+```sql
+select ...
+inner join security_scans
+where security_scans.build_id IN (1, 2, 3)
+```
+
+Both of these examples remove the cross-joins and have the added benefit
+of producing simpler and faster queries.
+
+##### Limited pluck followed by a find
+
+Using `pluck` or `pick` to get an array of `id`s is not advisable unless the returned
+array is guaranteed to be bounded in size. This is usually a good pattern where
+you know the result will be at most one row, or where you have a list of
+in-memory IDs (or usernames) that need to be mapped to another list of equal size.
+It would not be suitable when mapping a list of IDs in a one-to-many
+relationship, as the result would be unbounded. We can then use the
+returned `id`s to obtain the related record:
+
+```ruby
+allowed_user_id = board_user_finder
+ .where(user_id: params['assignee_id'])
+ .pick(:user_id)
+
+User.find_by(id: allowed_user_id)
+```
+
+You can see an example where this was used in
+<https://gitlab.com/gitlab-org/gitlab/-/merge_requests/126856>
+
+Sometimes it might seem easy to convert a join into a `pluck`, but this often
+results in loading an unbounded number of IDs into memory and then
+re-serializing those in a following query back to Postgres. These cases do not
+scale, and we recommend attempting one of the other options instead. It might seem like a
+good idea to just apply some `limit` to the plucked data to bound the memory usage,
+but this introduces unpredictable results for users and often is most
+problematic for our largest customers (including ourselves), so we
+advise against it.
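+
+A hedged sketch of the anti-pattern and a bounded single-database alternative (the scenario and the in-memory `group_ids` list are illustrative):
+
+```ruby
+# Unbounded: loads every matching ID into memory, then re-serializes
+# them back to Postgres in the next query. This does not scale.
+namespace_ids = Namespace.where(parent_id: group_ids).pluck(:id)
+projects = Project.where(namespace_id: namespace_ids)
+
+# Bounded alternative when both tables live in the same database:
+# keep the filtering in SQL with a subquery instead of a pluck.
+projects = Project.where(namespace_id: Namespace.where(parent_id: group_ids).select(:id))
+```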
+
##### De-normalize some foreign key to the table
De-normalization refers to adding redundant precomputed (duplicated) data to
@@ -263,62 +349,6 @@ logic to delete these rows if or whenever necessary in your domain.
Finally, this de-normalization and new query also improves performance because
it does less joins and needs less filtering.
-##### Remove a redundant join
-
-Sometimes there are cases where a query is doing excess (or redundant) joins.
-
-A common example occurs where a query is joining from `A` to `C`, via some
-table with both foreign keys, `B`.
-When you only care about counting how
-many rows there are in `C` and if there are foreign keys and `NOT NULL` constraints
-on the foreign keys in `B`, then it might be enough to count those rows.
-For example, in
-[MR 71811](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71811), it was
-previously doing `project.runners.count`, which would produce a query like:
-
-```sql
-select count(*) from projects
-inner join ci_runner_projects on ci_runner_projects.project_id = projects.id
-where ci_runner_projects.runner_id IN (1, 2, 3)
-```
-
-This was changed to avoid the cross-join by changing the code to
-`project.runner_projects.count`. It produces the same response with the
-following query:
-
-```sql
-select count(*) from ci_runner_projects
-where ci_runner_projects.runner_id IN (1, 2, 3)
-```
-
-Another common redundant join is joining all the way to another table,
-then filtering by primary key when you could have instead filtered on a foreign
-key. See an example in
-[MR 71614](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71614). The previous
-code was `joins(scan: :build).where(ci_builds: { id: build_ids })`, which
-generated a query like:
-
-```sql
-select ...
-inner join security_scans
-inner join ci_builds on security_scans.build_id = ci_builds.id
-where ci_builds.id IN (1, 2, 3)
-```
-
-However, as `security_scans` already has a foreign key `build_id`, the code
-can be changed to `joins(:scan).where(security_scans: { build_id: build_ids })`,
-which produces the same response with the following query:
-
-```sql
-select ...
-inner join security_scans
-where security_scans.build_id IN (1, 2, 3)
-```
-
-Both of these examples of removing redundant joins remove the cross-joins,
-but they have the added benefit of producing simpler and faster
-queries.
-
##### Use `disable_joins` for `has_one` or `has_many` `through:` relations
Sometimes a join query is caused by using `has_one ... through:` or `has_many
@@ -628,3 +658,56 @@ If this task was run against a GitLab setup that uses only a single database
for both `gitlab_main` and `gitlab_ci` tables, then no tables will be locked.
To undo the operation, run the opposite Rake task: `gitlab:db:unlock_writes`.
+
+## Truncating tables
+
+When the databases `main` and `ci` are fully split, we can free up disk
+space by truncating tables. This results in a smaller data set: for example,
+the data in the `users` table on the CI database is no longer read and no
+longer updated, so it can be removed by truncating the table.
+
+For this purpose, GitLab provides two Rake tasks, one for each database:
+
+- `gitlab:db:truncate_legacy_tables:main` truncates the CI tables in the Main database.
+- `gitlab:db:truncate_legacy_tables:ci` truncates the Main tables in the CI database.
+
+NOTE:
+These tasks can only be run when the tables in the database are
+[locked for writes](#locking-writes-on-the-tables-that-dont-belong-to-the-database-schemas).
+
+WARNING:
+The examples in this section use `DRY_RUN=true`. This ensures no data is actually
+truncated. GitLab highly recommends having a backup available before you run any of
+these tasks without `DRY_RUN=true`.
+
+These tasks have the option to see what they do without actually changing the
+data:
+
+```shell
+$ sudo DRY_RUN=true gitlab-rake gitlab:db:truncate_legacy_tables:main
+I, [2023-07-14T17:08:06.665151 #92505] INFO -- : DRY RUN:
+I, [2023-07-14T17:08:06.761586 #92505] INFO -- : Truncating legacy tables for the database main
+I, [2023-07-14T17:08:06.761709 #92505] INFO -- : SELECT set_config('lock_writes.ci_build_needs', 'false', false)
+I, [2023-07-14T17:08:06.765272 #92505] INFO -- : SELECT set_config('lock_writes.ci_build_pending_states', 'false', false)
+I, [2023-07-14T17:08:06.768220 #92505] INFO -- : SELECT set_config('lock_writes.ci_build_report_results', 'false', false)
+[...]
+I, [2023-07-14T17:08:06.957294 #92505] INFO -- : TRUNCATE TABLE ci_build_needs, ci_build_pending_states, ci_build_report_results, ci_build_trace_chunks, ci_build_trace_metadata, ci_builds, ci_builds_metadata, ci_builds_runner_session, ci_cost_settings, ci_daily_build_group_report_results, ci_deleted_objects, ci_editor_ai_conversation_messages, ci_freeze_periods, ci_group_variables, ci_instance_variables, ci_job_artifact_states, ci_job_artifacts, ci_job_token_project_scope_links, ci_job_variables, ci_minutes_additional_packs, ci_namespace_mirrors, ci_namespace_monthly_usages, ci_partitions, ci_pending_builds, ci_pipeline_artifacts, ci_pipeline_chat_data, ci_pipeline_messages, ci_pipeline_metadata, ci_pipeline_schedule_variables, ci_pipeline_schedules, ci_pipeline_variables, ci_pipelines, ci_pipelines_config, ci_platform_metrics, ci_project_mirrors, ci_project_monthly_usages, ci_refs, ci_resource_groups, ci_resources, ci_runner_machines, ci_runner_namespaces, ci_runner_projects, ci_runner_versions, ci_runners, ci_running_builds, ci_secure_file_states, ci_secure_files, ci_sources_pipelines, ci_sources_projects, ci_stages, ci_subscriptions_projects, ci_trigger_requests, ci_triggers, ci_unit_test_failures, ci_unit_tests, ci_variables, external_pull_requests, p_ci_builds, p_ci_builds_metadata, p_ci_job_annotations, p_ci_runner_machine_builds, taggings, tags RESTRICT
+```
+
+The tasks first determine which tables need to be truncated. Truncation
+happens in stages, because we need to limit the amount of data removed in one database
+transaction. The tables are processed in a specific order depending on the definition
+of the foreign keys. The number of tables processed in one stage can be changed by
+adding a number when invoking the task. The default value is 5:
+
+```shell
+sudo DRY_RUN=true gitlab-rake gitlab:db:truncate_legacy_tables:main\[10\]
+```
+
+It is also possible to limit the number of tables to be truncated by setting the `UNTIL_TABLE`
+variable. For example, in this case the process stops when `ci_unit_test_failures` has been
+truncated:
+
+```shell
+sudo DRY_RUN=true UNTIL_TABLE=ci_unit_test_failures gitlab-rake gitlab:db:truncate_legacy_tables:main
+```
diff --git a/doc/development/database/not_null_constraints.md b/doc/development/database/not_null_constraints.md
index 2b10e440799..e1b6868c68e 100644
--- a/doc/development/database/not_null_constraints.md
+++ b/doc/development/database/not_null_constraints.md
@@ -63,9 +63,10 @@ The steps required are:
1. Release `N.M` (current release)
- - Ensure the constraint is enforced at the application level (that is, add a model validation).
- - Add a post-deployment migration to add the `NOT NULL` constraint with `validate: false`.
- - Add a post-deployment migration to fix the existing records.
+ 1. Ensure the attribute value is being set at the application level.
+ 1. If the attribute has a default value, add the default value to the model so it is set for new records.
+ 1. Update all places in the code where the attribute is being set to `nil`, if any, for new and existing records.
+ 1. Add a post-deployment migration to fix the existing records.
NOTE:
Depending on the size of the table, a background migration for cleanup could be required in the next release.
@@ -75,22 +76,16 @@ The steps required are:
1. Release `N.M+1` (next release)
- - Validate the `NOT NULL` constraint using a post-deployment migration.
+ 1. Make sure all existing records on GitLab.com have the attribute set. If not, go back to step 1 from release `N.M`.
+ 1. If step 1 looks fine and the backfill from release `N.M` was done via a batched background migration, add a
+ post-deployment migration to
+ [finalize the background migration](batched_background_migrations.md#depending-on-migrated-data).
+ 1. Add a validation for the attribute in the model to prevent records with a `nil` attribute, as all existing and new records should now be valid.
+ 1. Add a post-deployment migration to add the `NOT NULL` constraint.
### Example
-Considering a given release milestone, such as 13.0, a model validation has been added into `epic.rb`
-to require a description:
-
-```ruby
-class Epic < ApplicationRecord
- validates :description, presence: true
-end
-```
-
-The same constraint should be added at the database level for consistency purposes.
-We only want to enforce the `NOT NULL` constraint without setting a default, as we have decided
-that all epics should have a user-generated description.
+Consider a given release milestone, such as 13.0.
After checking our production database, we know that there are `epics` with `NULL` descriptions,
so we cannot add and validate the constraint in one step.
@@ -101,33 +96,16 @@ such records, so we would follow the same process either way.
#### Prevent new invalid records (current release)
-We first add the `NOT NULL` constraint with a `NOT VALID` parameter, which enforces consistency
-when new records are inserted or current records are updated.
-
-In the example above, the existing epics with a `NULL` description are not affected and you are
-still able to update records in the `epics` table. However, when you try to update or insert
-an epic without providing a description, the constraint causes a database error.
-
-Adding or removing a `NOT NULL` clause requires that any application changes are deployed _first_.
-Thus, adding a `NOT NULL` constraint to an existing column should happen in a post-deployment migration.
+Update all the code paths where the attribute is being set to `nil`, if any, so they set the attribute to a non-nil value
+for new and existing records.
-Still in our example, for the 13.0 milestone example (current), we add the `NOT NULL` constraint
-with `validate: false` in a post-deployment migration,
-`db/post_migrate/20200501000001_add_not_null_constraint_to_epics_description.rb`:
+An attribute with a default, using the
+[Rails attributes API](https://api.rubyonrails.org/classes/ActiveRecord/Attributes/ClassMethods.html), has been added in
+`epic.rb` so that a default value is set for new records:
```ruby
-class AddNotNullConstraintToEpicsDescription < Gitlab::Database::Migration[2.1]
- disable_ddl_transaction!
-
- def up
- # This will add the `NOT NULL` constraint WITHOUT validating it
- add_not_null_constraint :epics, :description, validate: false
- end
-
- def down
- # Down is required as `add_not_null_constraint` is not reversible
- remove_not_null_constraint :epics, :description
- end
+class Epic < ApplicationRecord
+ attribute :description, default: 'No description'
end
```
@@ -176,24 +154,47 @@ class CleanupEpicsWithNullDescription < Gitlab::Database::Migration[2.1]
end
```
-#### Validate the `NOT NULL` constraint (next release)
+#### Check if all records are fixed (next release)
+
+Use postgres.ai to [create a thin clone](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/database/doc/gitlab-com-database.html#use-postgresai-to-work-with-a-thin-clone-of-the-database-includes-direct-psql-access-to-the-thin-clone)
+of the production database and check whether all records on GitLab.com have the attribute set.
+If not, go back to the [Prevent new invalid records](#prevent-new-invalid-records-current-release) step and figure out where
+in the code the attribute is explicitly set to `nil`. Fix the code path, then reschedule the migration to fix the existing
+records, and wait for the next release to do the following steps.
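+
+For this example, a query along these lines against the thin clone would confirm the state (a sketch using the `epics` table from this page):
+
+```sql
+-- Zero means every epic has a description and it is safe to proceed.
+SELECT COUNT(*) FROM epics WHERE description IS NULL;
+```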
+
+#### Finalize the background migration (next release)
+
+If the migration was done using a background migration then [finalize the migration](batched_background_migrations.md#depending-on-migrated-data).
-Validating the `NOT NULL` constraint scans the whole table and make sure that each record is correct.
+#### Add validation to the model (next release)
-Still in our example, for the 13.1 milestone (next), we run the `validate_not_null_constraint`
-migration helper in a final post-deployment migration,
-`db/post_migrate/20200601000001_validate_not_null_constraint_on_epics_description.rb`:
+Add a validation for the attribute to the model to prevent records with a `nil` attribute, as all existing and new records should now be valid.
```ruby
-class ValidateNotNullConstraintOnEpicsDescription < Gitlab::Database::Migration[2.1]
+class Epic < ApplicationRecord
+ validates :description, presence: true
+end
+```
+
+#### Add the `NOT NULL` constraint (next release)
+
+Adding the `NOT NULL` constraint scans the whole table and makes sure that each record is correct.
+
+Still in our example, for the 13.1 milestone (next), we run the `add_not_null_constraint`
+migration helper in a final post-deployment migration:
+
+```ruby
+class AddNotNullConstraintToEpicsDescription < Gitlab::Database::Migration[2.1]
disable_ddl_transaction!
def up
- validate_not_null_constraint :epics, :description
+ # This will add the `NOT NULL` constraint and validate it
+ add_not_null_constraint :epics, :description
end
def down
- # no-op
+ # Down is required as `add_not_null_constraint` is not reversible
+ remove_not_null_constraint :epics, :description
end
end
```
diff --git a/doc/development/database/pagination_guidelines.md b/doc/development/database/pagination_guidelines.md
index d6550d0a515..8b07dcada05 100644
--- a/doc/development/database/pagination_guidelines.md
+++ b/doc/development/database/pagination_guidelines.md
@@ -218,7 +218,9 @@ We can argue that a typical user does not visit these pages, however, API users
### Keyset pagination
-Keyset pagination addresses the performance concerns of "skipping" previous rows when requesting a large page, however, it's not a drop-in replacement for offset-based pagination. Keyset pagination is used only in the [GraphQL API](../graphql_guide/pagination.md)
+Keyset pagination addresses the performance concerns of "skipping" previous rows when requesting a large page. However, it's not a drop-in replacement for offset-based pagination. When moving an API endpoint from offset-based pagination to keyset-based pagination, both must be supported. Removing one type of pagination entirely is a [breaking change](../../update/terminology.md#breaking-change).
+
+Keyset pagination is used in both the [GraphQL API](../graphql_guide/pagination.md#keyset-pagination) and the [REST API](../../api/rest/index.md#keyset-based-pagination).
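+
+For instance, a REST request that opts into keyset pagination, following the documented `projects` example from the linked REST API page:
+
+```shell
+curl --header "PRIVATE-TOKEN: <your_access_token>" \
+  "https://gitlab.example.com/api/v4/projects?pagination=keyset&per_page=50&order_by=id&sort=asc"
+```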
Consider the following `issues` table:
diff --git a/doc/development/database/query_recorder.md b/doc/development/database/query_recorder.md
index bae211c1618..39d54954268 100644
--- a/doc/development/database/query_recorder.md
+++ b/doc/development/database/query_recorder.md
@@ -20,7 +20,7 @@ This style of test works by counting the number of SQL queries executed by Activ
it "avoids N+1 database queries" do
control = ActiveRecord::QueryRecorder.new { visit_some_page }
create_list(:issue, 5)
- expect { visit_some_page }.not_to exceed_query_limit(control)
+ expect { visit_some_page }.to issue_same_number_of_queries_as(control)
end
```
@@ -33,13 +33,15 @@ it "avoids N+1 database queries" do
create_list(:issue, 5)
action = ActiveRecord::QueryRecorder.new { visit_some_page }
- expect(action).not_to exceed_query_limit(control)
+ expect(action).to issue_same_number_of_queries_as(control)
end
```
As an example you might create 5 issues in between counts, which would cause the query count to increase by 5 if an N+1 problem exists.
-In some cases, the query count might change slightly between runs for unrelated reasons. In this case you might need to test `exceed_query_limit(control_count + acceptable_change)`, but this should be avoided if possible.
+In some cases, the query count might change slightly between runs for unrelated reasons.
+In this case you might need to test `issue_same_number_of_queries_as(control_count + acceptable_change)`,
+but this should be avoided if possible.
If this test fails, and the control was passed as a `QueryRecorder`, then the
failure message indicates where the extra queries are by matching queries on
@@ -50,18 +52,45 @@ warm the cache, second one to establish a control, third one to validate that
there are no N+1 queries. Rather than make an extra request to warm the cache, prefer two requests
(control and test) and configure your test to ignore [cached queries](#cached-queries) in N+1 specs.
+```ruby
+it "avoids N+1 database queries" do
+ # warm up
+ visit_some_page
+
+ control = ActiveRecord::QueryRecorder.new(skip_cached: true) { visit_some_page }
+ create_list(:issue, 5)
+ expect { visit_some_page }.to issue_same_number_of_queries_as(control)
+end
+```
+
## Cached queries
-By default, QueryRecorder ignores [cached queries](../merge_request_concepts/performance.md#cached-queries) in the count. However, it may be better to count
-all queries to avoid introducing an N+1 query that may be masked by the statement cache.
+By default, QueryRecorder ignores [cached queries](../merge_request_concepts/performance.md#cached-queries) in the count.
+However, it may be better to count all queries to avoid introducing an N+1 query that may be masked by the statement cache.
To do this, set the `:use_sql_query_cache` flag.
-You should pass the `skip_cached` variable to `QueryRecorder` and use the `exceed_all_query_limit` matcher:
+You should pass the `skip_cached` variable to `QueryRecorder` and use the `issue_same_number_of_queries_as` matcher:
```ruby
it "avoids N+1 database queries", :use_sql_query_cache do
control = ActiveRecord::QueryRecorder.new(skip_cached: false) { visit_some_page }
create_list(:issue, 5)
- expect { visit_some_page }.not_to exceed_all_query_limit(control)
+ expect { visit_some_page }.to issue_same_number_of_queries_as(control)
+end
+```
+
+## Using RequestStore
+
+[`RequestStore` / `Gitlab::SafeRequestStore`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/gems/gitlab-safe_request_store/README.md)
+helps us to avoid N+1 queries by caching data in memory for the duration of a request. However, it is disabled by default in tests
+and can lead to false negatives when testing for N+1 queries.
+
+To enable `RequestStore` in tests, use the `request_store` helper when needed:
+
+```ruby
+it "avoids N+1 database queries", :request_store do
+ control = ActiveRecord::QueryRecorder.new(skip_cached: true) { visit_some_page }
+ create_list(:issue, 5)
+ expect { visit_some_page }.to issue_same_number_of_queries_as(control)
end
```
@@ -72,6 +101,11 @@ Use a [request spec](https://gitlab.com/gitlab-org/gitlab/-/tree/master/spec/req
Controller specs should not be used to write N+1 tests as the controller is only initialized once per example.
This could lead to false successes where subsequent "requests" could have queries reduced (for example, because of memoization).
+## Never trust a test you haven't seen fail
+
+Before you add a test for N+1 queries, you should first verify that the test fails without your change.
+This is because the test may be broken, or the test may be passing for the wrong reasons.
+
## Finding the source of the query
There are multiple ways to find the source of queries.
diff --git a/doc/development/database/single_table_inheritance.md b/doc/development/database/single_table_inheritance.md
index dcf696b85bc..40b608bd110 100644
--- a/doc/development/database/single_table_inheritance.md
+++ b/doc/development/database/single_table_inheritance.md
@@ -6,22 +6,58 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Single Table Inheritance
-**Summary:** don't use Single Table Inheritance (STI), use separate tables
-instead.
+**Summary:** Don't design new tables using Single Table Inheritance (STI). For existing tables that use STI as a pattern, avoid adding new types, and consider splitting them into separate tables.
-Rails makes it possible to have multiple models stored in the same table and map
-these rows to the correct models using a `type` column. This can be used to for
-example store two different types of SSH keys in the same table.
+STI is a database design pattern where a single table stores
+different types of records. These records have a subset of shared columns and another column
+that instructs the application which class each record should be represented by.
+This can be used, for example, to store two different types of SSH keys in the same
+table. ActiveRecord makes use of it and provides some features that make STI usage
+more convenient.
-While tempting to use one should avoid this at all costs for the same reasons as
-outlined in the document ["Polymorphic Associations"](polymorphic_associations.md).
+We no longer allow new STI tables because they:
-## Solution
+- Lead to tables with a large number of rows, when we should strive to keep tables small.
+- Need additional indexes, increasing our usage of lightweight locks, whose saturation can cause incidents.
+- Add overhead by having to filter all of the data by a value, leading to more page accesses on read.
+- Use the `class_name` to load the correct class for an object, but storing
+ the class name is costly and unnecessary.
-The solution is very simple: just use a separate table for every type you'd
-otherwise store in the same table. For example, instead of having a `keys` table
-with `type` set to either `Key` or `DeployKey` you'd have two separate tables:
-`keys` and `deploy_keys`.
+Instead of using STI, consider the following alternatives:
+
+- Use a different table for each type.
+- Avoid adding `*_type` columns. This is a code smell that might indicate that new types will be added in the future, and refactoring in the future will be much harder.
+- If you already have a table that is effectively an STI on a `_type` column, consider:
+ - Splitting the existing data into multiple tables.
+ - Refactoring so that new types can be added as new tables while keeping existing ones (for example, move logic of the base class into a concern).
+
+If, **after considering all of the above downsides and alternatives**, STI
+is the only solution for the problem at hand, we can at least avoid the
+issues with saving the class name in the record by using an enum type
+instead and the `EnumInheritance` concern:
+
+```ruby
+class Animal < ActiveRecord::Base
+ include EnumInheritance
+
+ enum species: {
+ dog: 1,
+ cat: 2
+ }
+
+ def self.inheritance_column_to_class_map = {
+ dog: 'Dog',
+ cat: 'Cat'
+ }
+
+ def self.inheritance_column = 'species'
+end
+
+class Dog < Animal; end
+class Cat < Animal; end
+```
+
+If your table already has a `*_type` column, new classes for the different types can be added as needed.
## In migrations
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index 54bdeaa7b28..adb715e219d 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -129,8 +129,6 @@ class AddExtendedTitleToSprints < Gitlab::Database::Migration[2.1]
end
def down
- remove_text_limit :sprints, :extended_title
-
with_lock_retries do
remove_column :sprints, :extended_title, if_exists: true
end