gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/development/database')
 doc/development/database/add_foreign_key_to_existing_column.md | 19
 doc/development/database/adding_database_indexes.md | 34
 doc/development/database/avoiding_downtime_in_migrations.md | 2
 doc/development/database/batched_background_migrations.md | 253
 doc/development/database/ci_mirrored_tables.md | 18
 doc/development/database/clickhouse/index.md | 62
 doc/development/database/clickhouse/merge_request_analytics.md | 355
 doc/development/database/clickhouse/tiered_storage.md | 138
 doc/development/database/creating_enums.md | 12
 doc/development/database/database_dictionary.md | 16
 doc/development/database/database_lab.md | 96
 doc/development/database/database_migration_pipeline.md | 42
 doc/development/database/database_reviewer_guidelines.md | 5
 doc/development/database/efficient_in_operator_queries.md | 2
 doc/development/database/index.md | 5
 doc/development/database/iterating_tables_in_batches.md | 62
 doc/development/database/load_balancing.md | 59
 doc/development/database/loose_foreign_keys.md | 2
 doc/development/database/multiple_databases.md | 22
 doc/development/database/query_performance.md | 4
 doc/development/database/required_stops.md | 56
 doc/development/database/strings_and_the_text_data_type.md | 39
 doc/development/database/table_partitioning.md | 192
 doc/development/database/transaction_guidelines.md | 2
 doc/development/database/understanding_explain_plans.md | 31
 25 files changed, 1381 insertions(+), 147 deletions(-)
diff --git a/doc/development/database/add_foreign_key_to_existing_column.md b/doc/development/database/add_foreign_key_to_existing_column.md
index 2c2999e69d6..823fb49a9ab 100644
--- a/doc/development/database/add_foreign_key_to_existing_column.md
+++ b/doc/development/database/add_foreign_key_to_existing_column.md
@@ -155,13 +155,13 @@ To limit impact on GitLab.com, a process exists to validate them asynchronously
during weekend hours. Due to generally lower traffic and fewer deployments,
FK validation can proceed at a lower level of risk.
-### Schedule foreign key validation for a low-impact time
+#### Schedule foreign key validation for a low-impact time
1. [Schedule the FK to be validated](#schedule-the-fk-to-be-validated).
1. [Verify the MR was deployed and the FK is valid in production](#verify-the-mr-was-deployed-and-the-fk-is-valid-in-production).
1. [Add a migration to validate the FK synchronously](#add-a-migration-to-validate-the-fk-synchronously).
-### Schedule the FK to be validated
+#### Schedule the FK to be validated
1. Create a merge request containing a post-deployment migration, which prepares
the foreign key for asynchronous validation.
@@ -198,7 +198,7 @@ def down
end
```
-### Verify the MR was deployed and the FK is valid in production
+#### Verify the MR was deployed and the FK is valid in production
1. Verify that the post-deploy migration was executed on GitLab.com using ChatOps with
`/chatops run auto_deploy status <merge_sha>`. If the output returns `db/gprd`,
@@ -208,7 +208,7 @@ end
1. Use [Database Lab](database_lab.md) to check if validation was successful.
Ensure the output does not indicate the foreign key is `NOT VALID`.
-### Add a migration to validate the FK synchronously
+#### Add a migration to validate the FK synchronously
After the foreign key is valid on the production database, create a second
merge request that validates the foreign key synchronously. The schema changes
@@ -240,19 +240,20 @@ end
```
-## Test database FK changes locally
+### Test database FK changes locally
You must test the database foreign key changes locally before creating a merge request.
-### Verify the foreign keys validated asynchronously
+#### Verify the foreign keys validated asynchronously
Use the asynchronous helpers on your local environment to test changes for
validating a foreign key:
-1. Enable the feature flags by running `Feature.enable(:database_async_foreign_key_validation)`
- and `Feature.enable(:database_reindexing)` in the Rails console.
+1. Enable the feature flag by running `Feature.enable(:database_async_foreign_key_validation)`
+ in the Rails console.
1. Run `bundle exec rails db:migrate` so that it creates an entry in the async validation table.
-1. Run `bundle exec rails gitlab:db:reindex` so that the FK is validated asynchronously.
+1. Run `bundle exec rails gitlab:db:validate_async_constraints:all` so that the FK is validated
+ asynchronously on all databases.
1. To verify the foreign key, open the PostgreSQL console using the
[GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/postgresql.md)
command `gdk psql` and run the command `\d+ table_name` to check that your
diff --git a/doc/development/database/adding_database_indexes.md b/doc/development/database/adding_database_indexes.md
index 1e3a1de9b69..7b29b1b14de 100644
--- a/doc/development/database/adding_database_indexes.md
+++ b/doc/development/database/adding_database_indexes.md
@@ -38,6 +38,15 @@ when adding a new index:
1. Is the overhead of maintaining the index worth the reduction in query
timings?
+In some situations, an index might not be required:
+
+- The table is small (less than `1,000` records) and it's not expected to grow exponentially in size.
+- Any existing indexes filter out enough rows.
+- The reduction in query timings after the index is added is not significant.
+
+Additionally, an index does not need to be wide enough to match all filter criteria of a query. It
+only needs to cover enough columns so that the index lookup returns a small enough set of rows.
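+
+For example, a simplified sketch (the query, table, and index here are illustrative):
+
+```sql
+-- Hypothetical query with three filter criteria
+SELECT *
+FROM issues
+WHERE project_id = 42
+  AND state_id = 1
+  AND created_at > '2023-01-01';
+
+-- An index on two of the three columns is often selective enough; the
+-- remaining filter runs on the few rows the index lookup returns.
+CREATE INDEX index_issues_on_project_id_and_created_at
+  ON issues (project_id, created_at);
+```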
+
## Re-using Queries
The first step is to make sure your query re-uses as many existing indexes as
@@ -183,6 +192,29 @@ for `index_exists?`, causing a required index to not be created
properly. By always requiring a name for certain types of indexes, the
chance of error is greatly reduced.
+## Testing for existence of indexes
+
+The easiest way to test for existence of an index by name is to use the `index_name_exists?` method, but the `index_exists?` method can also be used with a name option. For example:
+
+```ruby
+class MyMigration < Gitlab::Database::Migration[2.1]
+ INDEX_NAME = 'index_name'
+
+ def up
+ # an index must be conditionally created due to schema inconsistency
+ unless index_exists?(:table_name, :column_name, name: INDEX_NAME)
+ add_index :table_name, :column_name, name: INDEX_NAME
+ end
+ end
+
+ def down
+ # no op
+ end
+end
+```
+
+Keep in mind that concurrent index helpers like `add_concurrent_index`, `remove_concurrent_index`, and `remove_concurrent_index_by_name` already perform existence checks internally.
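+
+For instance, a minimal sketch of a migration using a concurrent helper (the table, column,
+and index names are hypothetical):
+
+```ruby
+class AddIndexToTableNameOnColumnName < Gitlab::Database::Migration[2.1]
+  disable_ddl_transaction!
+
+  INDEX_NAME = 'index_table_name_on_column_name'
+
+  def up
+    # No explicit existence check is needed: the helper skips creation
+    # when an index with this name already exists.
+    add_concurrent_index :table_name, :column_name, name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :table_name, INDEX_NAME
+  end
+end
+```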
+
## Temporary indexes
There may be times when an index is only needed temporarily.
@@ -448,7 +480,7 @@ You must test the database index changes locally before creating a merge request
the post-deploy migration has been executed in the production database. For more information, see
[How to determine if a post-deploy migration has been executed on GitLab.com](https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/post_deploy_migration/readme.md#how-to-determine-if-a-post-deploy-migration-has-been-executed-on-gitlabcom).
1. In the case of an [index removed asynchronously](#schedule-the-index-to-be-removed), wait
- until the next week so that the index can be created over a weekend.
+ until the next week so that the index can be removed over a weekend.
1. Use Database Lab [to check if removal was successful](database_lab.md#checking-indexes).
[Database Lab](database_lab.md)
should report an error when trying to find the removed index. If not, the index may still exist.
diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md
index 8e1eeee7a42..25310554c24 100644
--- a/doc/development/database/avoiding_downtime_in_migrations.md
+++ b/doc/development/database/avoiding_downtime_in_migrations.md
@@ -605,7 +605,7 @@ See example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_request
### Remove the trigger and old `integer` columns (release N + 2)
Using post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper,
-drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)).
+drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70351)).
### Remove ignore rules (release N + 3)
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index c6fe6d16faf..6a6b43e52a0 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -34,10 +34,13 @@ Background migrations can help when:
- Populating one column based on JSON stored in another column.
- Migrating data that depends on the output of external services. (For example, an API.)
-NOTE:
-If the batched background migration is part of an important upgrade, it must be announced
-in the release post. Discuss with your Project Manager if you're unsure if the migration falls
-into this category.
+### Notes
+
+- If the batched background migration is part of an important upgrade, it must be announced
+ in the release post. Discuss with your Project Manager if you're unsure if the migration falls
+ into this category.
+- You should use the [generator](#generator) to create batched background migrations,
+ so that required files are created by default.
## Isolation
@@ -145,6 +148,49 @@ Make sure the newly-created data is either migrated, or
saved in both the old and new version upon creation. Removals in
turn can be handled by defining foreign keys with cascading deletes.
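+For example, a minimal sketch of such a cascading foreign key (the table and column names
+are hypothetical):
+
+```ruby
+class AddCascadingForeignKeyToDependentRecords < Gitlab::Database::Migration[2.1]
+  disable_ddl_transaction!
+
+  def up
+    # Rows in the dependent table are deleted together with their parent row
+    add_concurrent_foreign_key :dependent_records, :parents,
+      column: :parent_id, on_delete: :cascade
+  end
+
+  def down
+    remove_foreign_key_if_exists :dependent_records, column: :parent_id
+  end
+end
+```
+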
+### Job retry mechanism
+
+The batched background migrations retry mechanism ensures that a job is executed again in case of failure.
+The following diagram shows the different stages of our retry mechanism:
+
+```plantuml
+@startuml
+hide empty description
+note as N1
+ can_split?:
+ the failure is due to a query timeout
+end note
+[*] --> Running
+Running --> Failed
+note on link
+ if number of retries <= MAX_ATTEMPTS
+end note
+Running --> Succeeded
+Failed --> Running
+note on link
+ if number of retries > MAX_ATTEMPTS
+ and can_split? == true
+ then two jobs with smaller
+ batch size will be created
+end note
+Failed --> [*]
+Succeeded --> [*]
+@enduml
+```
+
+- `MAX_ATTEMPTS` is defined in the [`Gitlab::Database::BackgroundMigration`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/background_migration/batched_job.rb)
+class.
+- `can_split?` is defined in the [`Gitlab::Database::BatchedJob`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/background_migration/batched_job.rb) class.
+
+### Failed batched background migrations
+
+The whole batched background migration is marked as `failed`
+(`/chatops run batched_background_migrations status MIGRATION_ID` will show
+the migration as `failed`) if any of the following are true:
+
+- There are no more jobs to consume, and there are failed jobs.
+- More than [half of the jobs failed since the background migration was started](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/background_migration/batched_migration.rb).
+
### Requeuing batched background migrations
If one of the batched background migrations contains a bug that is fixed in a patch
@@ -311,6 +357,22 @@ NOTE:
When applying additional filters, it is important to ensure they are properly covered by an index to optimize `EachBatch` performance.
In the example above we need an index on `(type, id)` to support the filters. See [the `EachBatch` documentation for more information](iterating_tables_in_batches.md).
+## Generator
+
+The custom generator `batched_background_migration` scaffolds necessary files and
+accepts `table_name`, `column_name`, and `feature_category` as arguments. Usage:
+
+```shell
+bundle exec rails g batched_background_migration my_batched_migration --table_name=<table-name> --column_name=<column-name> --feature_category=<feature-category>
+```
+
+This command creates these files:
+
+- `db/post_migrate/20230214231008_queue_my_batched_migration.rb`
+- `spec/migrations/20230214231008_queue_my_batched_migration_spec.rb`
+- `lib/gitlab/background_migration/my_batched_migration.rb`
+- `spec/lib/gitlab/background_migration/my_batched_migration_spec.rb`
+
## Example
The `routes` table has a `source_type` field that's used for a polymorphic relationship.
@@ -319,8 +381,13 @@ the work is migrating data from the `source_id` column into a new singular forei
Because we intend to delete old rows later, there's no need to update them as part of the
background migration.
-1. Start by defining our migration class, which should inherit
- from `Gitlab::BackgroundMigration::BatchedMigrationJob`:
+1. Start by using the generator to create batched background migration files:
+
+ ```shell
+ bundle exec rails g batched_background_migration BackfillRouteNamespaceId --table_name=routes --column_name=id --feature_category=source_code_management
+ ```
+
+1. Update the migration job (subclass of `BatchedMigrationJob`) to copy `source_id` values to `namespace_id`:
```ruby
class Gitlab::BackgroundMigration::BackfillRouteNamespaceId < BatchedMigrationJob
@@ -344,10 +411,10 @@ background migration.
```
NOTE:
- Job classes must be subclasses of `BatchedMigrationJob` to be
+ Job classes inherit from `BatchedMigrationJob` to ensure they are
correctly handled by the batched migration framework. Any subclass of
- `BatchedMigrationJob` is initialized with necessary arguments to
- execute the batch, as well as a connection to the tracking database.
+ `BatchedMigrationJob` is initialized with the necessary arguments to
+ execute the batch, and a connection to the tracking database.
1. Create a database migration that adds a new trigger to the database. Example:
@@ -380,12 +447,14 @@ background migration.
end
```
-1. Create a post-deployment migration that queues the migration for existing data:
+1. Update the created post-deployment migration with required delay and batch sizes:
```ruby
class QueueBackfillRoutesNamespaceId < Gitlab::Database::Migration[2.1]
MIGRATION = 'BackfillRouteNamespaceId'
DELAY_INTERVAL = 2.minutes
+ BATCH_SIZE = 1000
+ SUB_BATCH_SIZE = 100
restrict_gitlab_migration gitlab_schema: :gitlab_main
@@ -394,7 +463,9 @@ background migration.
MIGRATION,
:routes,
:id,
- job_interval: DELAY_INTERVAL
+ job_interval: DELAY_INTERVAL,
+ batch_size: BATCH_SIZE,
+ sub_batch_size: SUB_BATCH_SIZE
)
end
@@ -416,24 +487,6 @@ background migration.
- Continues using the data as before.
- Ensures that both existing and new data are migrated.
-1. In the next release, add a database migration to remove the trigger.
-
- ```ruby
- class RemoveNamepaceIdTriggerFromRoutes < Gitlab::Database::Migration[2.1]
- FUNCTION_NAME = 'example_function'
- TRIGGER_NAME = 'example_trigger'
-
- def up
- drop_trigger(TRIGGER_NAME, :routes)
- drop_function(FUNCTION_NAME)
- end
-
- def down
- # Should reverse the trigger and the function in the up method of the migration that added it
- end
- end
- ```
-
1. Add a new post-deployment migration
that checks that the batched background migration is completed. For example:
@@ -469,6 +522,24 @@ background migration.
instance, the data is advisory, and not mission-critical), then you can skip this
final step. This step confirms that the migration is completed, and all of the rows were migrated.
+1. Add a database migration to remove the trigger.
+
+ ```ruby
+ class RemoveNamespaceIdTriggerFromRoutes < Gitlab::Database::Migration[2.1]
+ FUNCTION_NAME = 'example_function'
+ TRIGGER_NAME = 'example_trigger'
+
+ def up
+ drop_trigger(TRIGGER_NAME, :routes)
+ drop_function(FUNCTION_NAME)
+ end
+
+ def down
+ # Should reverse the trigger and the function in the up method of the migration that added it
+ end
+ end
+ ```
+
After the batched migration is completed, you can safely depend on the
data in `routes.namespace_id` being populated.
@@ -569,6 +640,37 @@ for more details.
more pressure on DB than you expect. Measure on staging,
or ask someone to measure on production.
1. Know how much time is required to run the batched background migration.
+1. Be careful when silently rescuing exceptions inside job classes. This may lead to
+ jobs being marked as successful, even in a failure scenario.
+
+ ```ruby
+ # good
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ end
+ end
+
+ # acceptable
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ rescue Exception => error
+ logger.error(message: error.message, class: error.class)
+
+ raise
+ end
+ end
+
+ # bad
+ def perform
+ each_sub_batch do |sub_batch|
+ sub_batch.update_all(name: 'My Name')
+ rescue Exception => error
+ logger.error(message: error.message, class: self.class.name)
+ end
+ end
+ ```
## Additional tips and strategies
@@ -719,6 +821,99 @@ You can view failures in two ways:
WHERE transition_logs.next_status = '2' AND migration.job_class_name = "CLASS_NAME";
```
+### Executing a particular batch on the database testing pipeline
+
+NOTE:
+Only [database maintainers](https://gitlab.com/groups/gitlab-org/maintainers/database/-/group_members?with_inherited_permissions=exclude) can view the database testing pipeline artifacts. Ask one for help if you need to use this method.
+
+Let's assume that a batched background migration failed on a particular batch on GitLab.com and you want to figure out which query failed and why. At the moment, we don't have a good way to retrieve query information (especially the query parameters), and rerunning the entire migration with more logging would be a long process.
+
+Fortunately, you can leverage our [database migration pipeline](database_migration_pipeline.md) to rerun a particular batch with additional logging and/or a fix to see if it solves the problem.
+
+<!-- vale gitlab.Substitutions = NO -->
+For an example, see [Draft: Test PG::CardinalityViolation fix](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/110910), but make sure to read the entire section.
+
+To do that, you need to:
+
+1. Find the batch `start_id` and `end_id`
+1. Create a regular migration
+1. Apply a workaround for our migration helpers (optional)
+1. Start the database migration pipeline
+
+#### 1. Find the batch `start_id` and `end_id`
+
+You should be able to find those in [Kibana](#viewing-failure-error-logs).
+
+#### 2. Create a regular migration
+
+Schedule the batch in the `up` block of a regular migration:
+
+```ruby
+def up
+ instance = Gitlab::BackgroundMigration::YourBackgroundMigrationClass.new(
+ start_id: <batch start_id>,
+ end_id: <batch end_id>,
+ batch_table: <table name>,
+ batch_column: <batching column>,
+ sub_batch_size: <sub batch size>,
+    pause_ms: <milliseconds between batches>,
+ job_arguments: <job arguments if any>,
+ connection: connection
+ )
+
+ instance.perform
+end
+
+
+def down
+ # no-op
+end
+```
+
+#### 3. Apply a workaround for our migration helpers (optional)
+
+If your batched background migration touches tables from a schema other than the one you specified by using the `restrict_gitlab_migration` helper (example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema), then the migration fails. To prevent that from happening, you must monkey patch the database helpers so they don't fail the testing pipeline job:
+
+1. Add the schema names to [`RestrictGitlabSchema`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb#L57)
+
+```diff
+diff --git a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
+index b8d1d21a0d2d2a23d9e8c8a0a17db98ed1ed40b7..912e20659a6919f771045178c66828563cb5a4a1 100644
+--- a/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
++++ b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb
+@@ -55,7 +55,7 @@ def unmatched_schemas
+ end
+
+ def allowed_schemas_for_connection
+- Gitlab::Database.gitlab_schemas_for_connection(connection)
++ Gitlab::Database.gitlab_schemas_for_connection(connection) << :gitlab_ci
+ end
+ end
+ end
+```
+
+1. Add the schema names to [`RestrictAllowedSchemas`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb#L82)
+
+```diff
+diff --git a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
+index 4ae3622479f0800c0553959e132143ec9051898e..d556ec7f55adae9d46a56665ce02de782cb09f2d 100644
+--- a/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
++++ b/lib/gitlab/database/query_analyzers/restrict_allowed_schemas.rb
+@@ -79,7 +79,7 @@ def restrict_to_dml_only(parsed)
+ tables = self.dml_tables(parsed)
+ schemas = self.dml_schemas(tables)
+
+- if (schemas - self.allowed_gitlab_schemas).any?
++ if (schemas - (self.allowed_gitlab_schemas << :gitlab_ci)).any?
+ raise DMLAccessDeniedError, \
+ "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
+ "which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
+```
+
+#### 4. Start the database migration pipeline
+
+Create a Draft merge request with your changes and trigger the manual `db:gitlabcom-database-testing` job.
+
### Adding indexes to support batched background migrations
Sometimes it is necessary to add a new or temporary index to support a batched background migration.
diff --git a/doc/development/database/ci_mirrored_tables.md b/doc/development/database/ci_mirrored_tables.md
index bf3a744b936..1e37739bdc4 100644
--- a/doc/development/database/ci_mirrored_tables.md
+++ b/doc/development/database/ci_mirrored_tables.md
@@ -76,9 +76,8 @@ the source and the target tables in sync:
1. Deleting namespaces/projects.
```mermaid
-graph TD
-
- subgraph "CI database (tables)"
+graph LR
+ subgraph CI["CI Tables"]
E[other CI tables]
F{queries with joins allowed}
G[ci_project_mirrors]
@@ -89,17 +88,18 @@ graph TD
F---H
end
- A---B
- B---C
- B---D
+ Main["Main Tables"]---L["⛔ ← Joins are not allowed → ⛔"]
+ L---CI
-L["⛔ ← Joins are not allowed → ⛔"]
-
- subgraph "Main database (tables)"
+ subgraph Main["Main Tables"]
A[other main tables]
B{queries with joins allowed}
C[projects]
D[namespaces]
+
+ A---B
+ B---C
+ B---D
end
```
diff --git a/doc/development/database/clickhouse/index.md b/doc/development/database/clickhouse/index.md
index a26bac261fd..032e4f5f6ee 100644
--- a/doc/development/database/clickhouse/index.md
+++ b/doc/development/database/clickhouse/index.md
@@ -83,3 +83,65 @@ Quoting the [documentation](https://clickhouse.com/docs/en/sql-reference/stateme
> If there's some aggregation in the view query, it's applied only to the batch
> of freshly inserted data. Any changes to existing data of the source table
> (like update, delete, drop a partition, etc.) do not change the materialized view.
+
+## Secure and sensible defaults
+
+ClickHouse instances should follow these security recommendations:
+
+### Users
+
+Files: `users.xml` and `config.xml`.
+
+| Topic | Security Requirement | Reason |
+| ----- | -------------------- | ------ |
+| [`user_name/password`](https://clickhouse.com/docs/en/operations/settings/settings-users/#user-namepassword) | Usernames **must not** be blank. Passwords **must** use `password_sha256_hex` and **must not** be blank. | `plaintext` and `password_double_sha1_hex` are insecure. If username isn't specified, [`default` is used with no password](https://clickhouse.com/docs/en/operations/settings/settings-users/). |
+| [`access_management`](https://clickhouse.com/docs/en/operations/settings/settings-users/#access_management-user-setting) | Use Server [configuration files](https://clickhouse.com/docs/en/operations/configuration-files) `users.xml` and `config.xml`. Avoid SQL-driven workflow. | SQL-driven workflow implies that at least one user has `access_management` which can be avoided via configuration files. These files are easier to audit and monitor too, considering that ["You can't manage the same access entity by both configuration methods simultaneously."](https://clickhouse.com/docs/en/operations/access-rights/#access-control). |
+| [`user_name/networks`](https://clickhouse.com/docs/en/operations/settings/settings-users/#user-namenetworks) | At least one of `<ip>`, `<host>`, `<host_regexp>` **must** be set. Do not use `<ip>::/0</ip>` to open access for any network. | Network controls. ([Trust cautiously](https://about.gitlab.com/handbook/security/architecture/#trust-cautiously) principle) |
+| [`user_name/profile`](https://clickhouse.com/docs/en/operations/settings/settings-users/#user-nameprofile) | Use profiles to set similar properties across multiple users and set limits (from the user interface). | [Least privilege](https://about.gitlab.com/handbook/security/architecture/#assign-the-least-privilege-possible) principle and limits. |
+| [`user_name/quota`](https://clickhouse.com/docs/en/operations/settings/settings-users/#user-namequota) | Set quotas for users whenever possible. | Limit resource usage over a period of time or track the use of resources. |
+| [`user_name/databases`](https://clickhouse.com/docs/en/operations/settings/settings-users/#user-namedatabases) | Restrict access to data, and avoid users with full access. | [Least privilege](https://about.gitlab.com/handbook/security/architecture/#assign-the-least-privilege-possible) principle. |
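+
+For illustration, a hedged `users.xml` sketch following the table above (the user name,
+network range, and hash value are placeholders):
+
+```xml
+<clickhouse>
+    <users>
+        <analytics_ro>
+            <!-- SHA256 hex of a strong password; never plaintext -->
+            <password_sha256_hex>e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</password_sha256_hex>
+            <!-- restrict which networks may connect -->
+            <networks>
+                <ip>10.0.0.0/8</ip>
+            </networks>
+            <!-- shared limits and least-privilege settings -->
+            <profile>readonly</profile>
+            <quota>default</quota>
+        </analytics_ro>
+    </users>
+</clickhouse>
+```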
+
+### Network
+
+Files: `config.xml`
+
+| Topic | Security Requirement | Reason |
+| ----- | -------------------- | ------ |
+| [`mysql_port`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#server_configuration_parameters-mysql_port) | Disable MySQL access unless strictly necessary:<br/> `<!-- <mysql_port>9004</mysql_port> -->`. | Close unnecessary ports and features exposure. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth) principle) |
+| [`postgresql_port`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#server_configuration_parameters-postgresql_port) | Disable PostgreSQL access unless strictly necessary:<br/> `<!-- <postgresql_port>9005</postgresql_port> -->` | Close unnecessary ports and features exposure. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth) principle) |
+| [`http_port/https_port`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#http-porthttps-port) & [`tcp_port/tcp_port_secure`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#http-porthttps-port) | Configure [SSL-TLS](https://clickhouse.com/docs/en/guides/sre/configuring-ssl), and disable non SSL ports:<br/>`<!-- <http_port>8123</http_port> -->`<br/>`<!-- <tcp_port>9000</tcp_port> -->`<br/>and enable secure ports:<br/>`<https_port>8443</https_port>`<br/>`<tcp_port_secure>9440</tcp_port_secure>` | Encrypt data in transit. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth) principle) |
+| [`interserver_http_host`](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#interserver-http-host) | Disable `interserver_http_host` in favor of `interserver_https_host` (`<interserver_https_port>9010</interserver_https_port>`) if ClickHouse is configured as a cluster. | Encrypt data in transit. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth) principle) |
+
+### Storage
+
+| Topic | Security Requirement | Reason |
+| ----- | -------------------- | ------ |
+| Permissions | ClickHouse runs by default with the `clickhouse` user. Running as `root` is never needed. Use the principle of least privileges for the folders: `/etc/clickhouse-server`, `/var/lib/clickhouse`, `/var/log/clickhouse-server`. These folders must belong to the `clickhouse` user and group, and no other system user must have access to them. | Default passwords, ports and rules are "open doors". ([Fail securely & use secure defaults](https://about.gitlab.com/handbook/security/architecture/#fail-securely--use-secure-defaults) principle) |
+| Encryption | Use an encrypted storage for logs and data if RED data is processed. On Kubernetes, the [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) used must be encrypted. | Encrypt data at rest. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth)) |
+
+### Logging
+
+| Topic | Security Requirement | Reason |
+| ----- | -------------------- | ------ |
+| `logger` | `Log` and `errorlog` **must** be defined and writable by `clickhouse`. | Make sure logs are stored. |
+| SIEM | If hosted on GitLab.com, the ClickHouse instance or cluster **must** report [logs to our SIEM](https://internal-handbook.gitlab.io/handbook/security/infrastructure_security_logging/tooling/devo/) (internal link). | [GitLab logs critical information system activity](https://about.gitlab.com/handbook/security/audit-logging-policy.html). |
+| Log sensitive data | Query masking rules **must** be used if sensitive data can be logged. See [example masking rules](#example-masking-rules). | [Column-level encryption](https://clickhouse.com/docs/en/sql-reference/functions/encryption-functions/) functions can leak sensitive data (such as keys) into logs. |
+
+#### Example masking rules
+
+```xml
+<query_masking_rules>
+ <rule>
+ <name>hide SSN</name>
+ <regexp>(^|\D)\d{3}-\d{2}-\d{4}($|\D)</regexp>
+ <replace>000-00-0000</replace>
+ </rule>
+ <rule>
+ <name>hide encrypt/decrypt arguments</name>
+ <regexp>
+ ((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)
+ </regexp>
+ <replace>\1(???)</replace>
+ </rule>
+</query_masking_rules>
+```
diff --git a/doc/development/database/clickhouse/merge_request_analytics.md b/doc/development/database/clickhouse/merge_request_analytics.md
new file mode 100644
index 00000000000..34da71d6c4c
--- /dev/null
+++ b/doc/development/database/clickhouse/merge_request_analytics.md
@@ -0,0 +1,355 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Merge request analytics with ClickHouse
+
+The [merge request analytics feature](../../../user/analytics/merge_request_analytics.md)
+shows statistics about the merged merge requests in the project and also exposes record-level metadata.
+Aggregations include:
+
+- **Average time to merge**: The duration between the creation time and the merge time.
+- **Monthly aggregations**: A chart of 12 months of the merged merge requests.
+
+Under the chart, the user can see the paginated list of merge requests, 12 months per page.
+
+You can filter by:
+
+- Author
+- Assignee
+- Labels
+- Milestone
+- Source branch
+- Target branch
+
+## Current performance problems
+
+- The aggregation queries require specialized indexes, which cost additional
+ disk space (index-only scans).
+- Querying the whole 12 months is slow (statement timeout). Instead, the frontend
+ requests data per month (12 database queries).
+- Even with specialized indexes, making the feature available on the group level
+ would not be feasible due to the large volume of merge requests.
+
+## Example queries
+
+Get the number of merge requests merged in a given month:
+
+```sql
+SELECT COUNT(*)
+FROM "merge_requests"
+INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
+WHERE (NOT EXISTS
+ (SELECT 1
+ FROM "banned_users"
+ WHERE (merge_requests.author_id = banned_users.user_id)))
+ AND "merge_request_metrics"."target_project_id" = 278964
+ AND "merge_request_metrics"."merged_at" >= '2022-12-01 00:00:00'
+ AND "merge_request_metrics"."merged_at" <= '2023-01-01 00:00:00'
+```
+
+The `merge_request_metrics` table was de-normalized (by adding `target_project_id`)
+to improve the first-page load time. The query itself works well for smaller date ranges;
+however, it can time out as the date range increases.
+
+After an extra filter is added, the query becomes more complex because it must also
+filter the `merge_requests` table:
+
+```sql
+SELECT COUNT(*)
+FROM "merge_requests"
+INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
+WHERE (NOT EXISTS
+ (SELECT 1
+ FROM "banned_users"
+ WHERE (merge_requests.author_id = banned_users.user_id)))
+ AND "merge_requests"."author_id" IN
+ (SELECT "users"."id"
+ FROM "users"
+ WHERE (LOWER("users"."username") IN (LOWER('ahegyi'))))
+ AND "merge_request_metrics"."target_project_id" = 278964
+ AND "merge_request_metrics"."merged_at" >= '2022-12-01 00:00:00'
+ AND "merge_request_metrics"."merged_at" <= '2023-01-01 00:00:00'
+```
+
+To calculate mean time to merge, we also query the total time between the
+merge request creation time and merge time.
+
+```sql
+SELECT EXTRACT(epoch
+ FROM SUM(AGE(merge_request_metrics.merged_at, merge_request_metrics.created_at)))
+FROM "merge_requests"
+INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
+WHERE (NOT EXISTS
+ (SELECT 1
+ FROM "banned_users"
+ WHERE (merge_requests.author_id = banned_users.user_id)))
+ AND "merge_requests"."author_id" IN
+ (SELECT "users"."id"
+ FROM "users"
+ WHERE (LOWER("users"."username") IN (LOWER('ahegyi'))))
+ AND "merge_request_metrics"."target_project_id" = 278964
+ AND "merge_request_metrics"."merged_at" >= '2022-08-01 00:00:00'
+ AND "merge_request_metrics"."merged_at" <= '2022-09-01 00:00:00'
+ AND "merge_request_metrics"."merged_at" > "merge_request_metrics"."created_at"
+LIMIT 1
+```
+
+## Store merge request data in ClickHouse
+
+Several other use cases exist for storing and querying merge request data in
+ClickHouse. In this document, we focus on this particular feature.
+
+The core data exists in the `merge_request_metrics` and in the `merge_requests`
+database tables. Some filters require extra tables to be joined:
+
+- `banned_users`: Filter out merge requests created by banned users.
+- `labels`: A merge request can have one or more assigned labels.
+- `assignees`: A merge request can have one or more assignees.
+- `merged_at`: The `merged_at` column is located in the `merge_request_metrics` table.
+
+The `merge_requests` table contains data that can be filtered directly:
+
+- **Author**: via the `author_id` column.
+- **Milestone**: via the `milestone_id` column.
+- **Source branch**.
+- **Target branch**.
+- **Project**: via the `project_id` column.
+
+### Keep ClickHouse data up to date
+
+Replicating or syncing the `merge_requests` table is unfortunately not enough.
+Separate queries to associated tables are required to insert one de-normalized
+`merge_requests` row into the ClickHouse database.
+
+Change detection is non-trivial to implement. A few corners we could cut:
+
+- The feature is available for GitLab Premium and GitLab Ultimate customers.
+ We don't have to sync all the data, but instead sync only the `merge_requests` records
+ which are part of licensed groups.
+- Data changes (often) happen via the `MergeRequest` services, where bumping the
+ `updated_at` timestamp column is mostly consistent. Some sort of incremental
+  synchronization process could be implemented (see the sketch after this list).
+- We only need to query the merged merge requests. After the merge, the record rarely changes.
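+
+A minimal sketch of the incremental synchronization idea, assuming we persist the timestamp
+of the last successful run (`:last_synced_at` is a hypothetical bind parameter):
+
+```sql
+-- PostgreSQL side: collect merged merge requests touched since the last sync
+SELECT merge_requests.id
+FROM merge_requests
+INNER JOIN merge_request_metrics
+  ON merge_request_metrics.merge_request_id = merge_requests.id
+WHERE merge_requests.updated_at > :last_synced_at
+  AND merge_request_metrics.merged_at IS NOT NULL
+ORDER BY merge_requests.updated_at
+LIMIT 1000;
+```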
+
+### Database table structure
+
+The database table structure uses de-normalization to make all required columns
+available in one database table. This eliminates the need for `JOINs`.
+
+```sql
+CREATE TABLE merge_requests
+(
+ `id` UInt64,
+ `project_id` UInt64 DEFAULT 0 NOT NULL,
+ `author_id` UInt64 DEFAULT 0 NOT NULL,
+ `milestone_id` UInt64 DEFAULT 0 NOT NULL,
+ `label_ids` Array(UInt64) DEFAULT [] NOT NULL,
+ `assignee_ids` Array(UInt64) DEFAULT [] NOT NULL,
+ `source_branch` String DEFAULT '' NOT NULL,
+ `target_branch` String DEFAULT '' NOT NULL,
+ `merged_at` DateTime64(6, 'UTC') NOT NULL,
+ `created_at` DateTime64(6, 'UTC') DEFAULT now() NOT NULL,
+ `updated_at` DateTime64(6, 'UTC') DEFAULT now() NOT NULL
+)
+ENGINE = ReplacingMergeTree(updated_at)
+ORDER BY (project_id, merged_at, id);
+```
+
+Similarly to the [activity data example](gitlab_activity_data.md), we use the
+`ReplacingMergeTree` engine. Several columns of the merge request record may change,
+so keeping the table up-to-date is important.
+
+The database table is ordered by the `project_id, merged_at, id` columns. This ordering
+optimizes the table data for our use case: querying the `merged_at` column in a project.
+
+## Rewrite the count query
+
+First, let's generate some data for the table.
+
+```sql
+INSERT INTO merge_requests (id, project_id, author_id, milestone_id, label_ids, merged_at, created_at)
+SELECT id, project_id, author_id, milestone_id, label_ids, merged_at, created_at
+FROM generateRandom('id UInt64, project_id UInt8, author_id UInt8, milestone_id UInt8, label_ids Array(UInt8), merged_at DateTime64(6, \'UTC\'), created_at DateTime64(6, \'UTC\')')
+LIMIT 1000000;
+```
+
+NOTE:
+Some integer data types were cast as `UInt8`, so it is highly probable that they
+have the same values across different rows.
+
+The original count query only aggregated data for one month. With ClickHouse, we can
+attempt to aggregate the data for the whole year.
+
+PostgreSQL-based count query:
+
+```sql
+SELECT COUNT(*)
+FROM "merge_requests"
+INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
+WHERE (NOT EXISTS
+ (SELECT 1
+ FROM "banned_users"
+ WHERE (merge_requests.author_id = banned_users.user_id)))
+ AND "merge_request_metrics"."target_project_id" = 278964
+ AND "merge_request_metrics"."merged_at" >= '2022-12-01 00:00:00'
+ AND "merge_request_metrics"."merged_at" <= '2023-01-01 00:00:00'
+```
+
+ClickHouse query:
+
+```sql
+SELECT
+ toYear(merged_at) AS year,
+ toMonth(merged_at) AS month,
+ COUNT(*)
+FROM merge_requests
+WHERE
+ project_id = 200
+ AND merged_at BETWEEN '2022-01-01 00:00:00'
+ AND '2023-01-01 00:00:00'
+GROUP BY year, month
+```
+
+The query processed significantly fewer rows than the total amount of generated data.
+The `ORDER BY` clause (primary key) helps the query execution:
+
+```plaintext
+11 rows in set. Elapsed: 0.010 sec.
+Processed 8.19 thousand rows, 131.07 KB (783.45 thousand rows/s., 12.54 MB/s.)
+```
+
+## Rewrite the Mean time to merge query
+
+The query calculates the mean time to merge as:
+`duration(created_at, merged_at) / merge_request_count`. The calculation is done in
+several steps:
+
+1. Request the monthly counts and the monthly duration values.
+1. Sum the counts to get the yearly count.
+1. Sum the durations to get the yearly duration.
+1. Divide the durations by the count.
+
+In ClickHouse, we can calculate the mean time to merge with one query:
+
+```sql
+SELECT
+ SUM(
+    dateDiff('second', created_at, merged_at) / 3600 / 24
+ ) / COUNT(*) AS mean_time_to_merge -- mean_time_to_merge is in days
+FROM merge_requests
+WHERE
+ project_id = 200
+ AND merged_at BETWEEN '2022-01-01 00:00:00'
+ AND '2023-01-01 00:00:00'
+```
+
+## Filtering
+
+The database queries above can be used as base queries. You can add more filters.
+For example, filtering for a label and a milestone:
+
+```sql
+SELECT
+ toYear(merged_at) AS year,
+ toMonth(merged_at) AS month,
+ COUNT(*)
+FROM merge_requests
+WHERE
+ project_id = 200
+ AND milestone_id = 15
+  AND has(label_ids, 118) -- array includes 118
+  AND merged_at BETWEEN '2022-01-01 00:00:00'
+ AND '2023-01-01 00:00:00'
+GROUP BY year, month
+```
+
+Optimizing a particular filter is usually done with a database index. This particular
+query reads 8000 rows:
+
+```plaintext
+1 row in set. Elapsed: 0.016 sec.
+Processed 8.19 thousand rows, 589.99 KB (505.38 thousand rows/s., 36.40 MB/s.)
+```
+
+Adding an index on `milestone_id`:
+
+```sql
+ALTER TABLE merge_requests
+ADD
+ INDEX milestone_id_index milestone_id TYPE minmax GRANULARITY 10;
+ALTER TABLE
+ merge_requests MATERIALIZE INDEX milestone_id_index;
+```
+
+On the generated data, adding the index didn't improve the performance.
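+
+To check whether ClickHouse actually uses the new index, you can inspect the query plan
+(a hedged sketch; the `indexes = 1` setting requires a recent ClickHouse version):
+
+```sql
+EXPLAIN indexes = 1
+SELECT COUNT(*)
+FROM merge_requests
+WHERE project_id = 200 AND milestone_id = 15;
+```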
+
+### Banned users filter
+
+A recently added feature in GitLab filters out merge requests where the author is
+banned by the admins. The banned users are tracked on the instance level in the
+`banned_users` database table.
+
+#### Idea 1: Enumerate the banned user IDs
+
+This would require no structural changes to the ClickHouse database schema.
+We could query the banned users in the project and filter the values out at query time.
+
+Get the banned users (in PostgreSQL):
+
+```sql
+SELECT user_id FROM banned_users
+```
+
+In ClickHouse:
+
+```sql
+SELECT
+ toYear(merged_at) AS year,
+ toMonth(merged_at) AS month,
+ COUNT(*)
+FROM merge_requests
+WHERE
+  author_id NOT IN (1, 2, 3, 4) -- banned users
+  AND project_id = 200
+  AND milestone_id = 15
+  AND has(label_ids, 118) -- array includes 118
+  AND merged_at BETWEEN '2022-01-01 00:00:00'
+ AND '2023-01-01 00:00:00'
+GROUP BY year, month
+```
+
+The problem with this approach is that the number of banned users could increase significantly, which would make the query bigger and slower.
+
+#### Idea 2: Replicate the `banned_users` table
+
+Assuming that the `banned_users` table doesn't grow to millions of rows, we could
+attempt to periodically sync the whole table to ClickHouse. With this approach,
+a mostly consistent `banned_users` table could be used in the ClickHouse database query:
+
+```sql
+SELECT
+ toYear(merged_at) AS year,
+ toMonth(merged_at) AS month,
+ COUNT(*)
+FROM merge_requests
+WHERE
+ author_id NOT IN (SELECT user_id FROM banned_users) AND
+ project_id = 200 AND
+ milestone_id = 15 AND
+ has(label_ids, 118) AND -- array includes 118
+ merged_at BETWEEN '2022-01-01 00:00:00' AND '2023-01-01 00:00:00'
+GROUP BY year, month
+```
+
+Alternatively, the `banned_users` table could be stored as a
+[dictionary](https://clickhouse.com/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts)
+to further improve the query performance.
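+
+A hedged sketch of such a dictionary (assuming the `banned_users` table is already synced
+into ClickHouse; the names and refresh intervals are illustrative):
+
+```sql
+CREATE DICTIONARY banned_users_dict
+(
+    user_id UInt64
+)
+PRIMARY KEY user_id
+SOURCE(CLICKHOUSE(DB 'default' TABLE 'banned_users'))
+LAYOUT(FLAT())
+LIFETIME(MIN 300 MAX 360);
+
+-- The NOT IN subquery can then be replaced with a dictionary lookup:
+-- WHERE NOT dictHas('banned_users_dict', author_id)
+```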
+
+#### Idea 3: Alter the feature
+
+For analytical calculations, it might be acceptable to drop this particular filter.
+This approach assumes that including the merge requests of banned users doesn't skew the statistics significantly.
diff --git a/doc/development/database/clickhouse/tiered_storage.md b/doc/development/database/clickhouse/tiered_storage.md
new file mode 100644
index 00000000000..d9026f47e28
--- /dev/null
+++ b/doc/development/database/clickhouse/tiered_storage.md
@@ -0,0 +1,138 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Tiered Storages in ClickHouse
+
+NOTE:
+The MergeTree table engine in ClickHouse supports tiered storage.
+See the documentation for [Using Multiple Block Devices for Data Storage](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-multiple-volumes)
+for details on setup and further explanation.
+
+Quoting from the [MergeTree documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-multiple-volumes):
+
+<!-- vale gitlab.Simplicity = NO -->
+
+> MergeTree family table engines can store data on multiple block devices. For example,
+> it can be useful when the data of a certain table are implicitly split into "hot" and "cold".
+> The most recent data is regularly requested but requires only a small amount of space.
+> On the contrary, the fat-tailed historical data is requested rarely.
+
+<!-- vale gitlab.Simplicity = YES -->
+
+When used with remote storage backends such as
+[Amazon S3](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-s3),
+this makes for a very efficient storage scheme. It allows for storage policies, which
+let data stay on local disks for a period of time before moving it to object storage.
+
+An [example configuration](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-multiple-volumes_configure) can look like this:
+
+```xml
+<storage_configuration>
+ <disks>
+ <fast_ssd>
+ <path>/mnt/fast_ssd/clickhouse/</path>
+ </fast_ssd>
+ <gcs>
+ <support_batch_delete>false</support_batch_delete>
+ <type>s3</type>
+ <endpoint>https://storage.googleapis.com/${BUCKET_NAME}/${ROOT_FOLDER}/</endpoint>
+ <access_key_id>${SERVICE_ACCOUNT_HMAC_KEY}</access_key_id>
+ <secret_access_key>${SERVICE_ACCOUNT_HMAC_SECRET}</secret_access_key>
+ <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
+ </gcs>
+ ...
+ </disks>
+ ...
+ <policies>
+
+ <move_from_local_disks_to_gcs> <!-- policy name -->
+ <volumes>
+ <hot> <!-- volume name -->
+ <disk>fast_ssd</disk> <!-- disk name -->
+ </hot>
+ <cold>
+ <disk>gcs</disk>
+ </cold>
+ </volumes>
+ <move_factor>0.2</move_factor>
+ <!-- The move factor determines when to move data from hot volume to cold.
+ See ClickHouse docs for more details. -->
+ </move_from_local_disks_to_gcs>
+ ....
+</storage_configuration>
+```
+
+In this storage policy, two volumes are defined: `hot` and `cold`. When free space on the `hot` volume drops below `disk_size * move_factor`, data is moved to Google Cloud Storage (GCS).
+
+If this storage policy is not the default, attach it to tables at creation time. For example:
+
+```sql
+CREATE TABLE key_value_table (
+ event_date Date,
+ key String,
+ value String,
+) ENGINE = MergeTree
+ORDER BY (key)
+PARTITION BY toYYYYMM(event_date)
+SETTINGS storage_policy = 'move_from_local_disks_to_gcs'
+```
+
+NOTE:
+In this storage policy, the move happens implicitly. It is also possible to keep
+_hot_ data on local disks for a fixed period of time and then move it to _cold_ storage.
+
+This approach is possible with
+[Table TTLs](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-table-ttl),
+which are also available with MergeTree table engine.
+
+The ClickHouse documentation shows this feature in detail, in the example of
+[implementing a hot - warm - cold architecture](https://clickhouse.com/docs/en/guides/developer/ttl/#implementing-a-hotwarmcold-architecture).
+
+You can take a similar approach for the example shown above. First, adjust the storage policy:
+
+```xml
+<storage_configuration>
+ ...
+ <policies>
+ <local_disk_and_gcs> <!-- policy name -->
+ <volumes>
+ <hot> <!-- volume name -->
+ <disk>fast_ssd</disk> <!-- disk name -->
+ </hot>
+ <cold>
+ <disk>gcs</disk>
+ </cold>
+ </volumes>
+ </local_disk_and_gcs>
+ ....
+</storage_configuration>
+```
+
+Then create the table as:
+
+```sql
+CREATE TABLE another_key_value_table (
+ event_date Date,
+ key String,
+ value String,
+) ENGINE = MergeTree
+ORDER BY (key)
+PARTITION BY toYYYYMM(event_date)
+TTL
+ event_date TO VOLUME 'hot',
+ event_date + INTERVAL 1 YEAR TO VOLUME 'cold'
+SETTINGS storage_policy = 'local_disk_and_gcs';
+```
+
+This creates the table so that data older than 1 year (evaluated against the
+`event_date` column) is moved to GCS. Such a storage policy can be helpful for append-only
+tables (like audit events) where only the most recent data is accessed frequently.
+With an additional `DELETE` TTL rule, you can drop old data altogether, which can be a regulatory requirement (see the sketch below).
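+
+For instance, a hedged sketch (the table name is hypothetical): a third TTL rule with
+`DELETE` drops rows entirely once they reach the age limit:
+
+```sql
+CREATE TABLE expiring_key_value_table (
+    event_date Date,
+    key String,
+    value String
+) ENGINE = MergeTree
+ORDER BY (key)
+PARTITION BY toYYYYMM(event_date)
+TTL
+    event_date TO VOLUME 'hot',
+    event_date + INTERVAL 1 YEAR TO VOLUME 'cold',
+    event_date + INTERVAL 2 YEAR DELETE
+SETTINGS storage_policy = 'local_disk_and_gcs';
+```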
+
+Modifying the TTL of an existing table is not covered in this guide, but it is possible as well.
+See ClickHouse documentation for
+[modifying TTL](https://clickhouse.com/docs/en/sql-reference/statements/alter/ttl/#modify-ttl)
+for details.
diff --git a/doc/development/database/creating_enums.md b/doc/development/database/creating_enums.md
index e2ae36f7481..908656dae84 100644
--- a/doc/development/database/creating_enums.md
+++ b/doc/development/database/creating_enums.md
@@ -63,7 +63,7 @@ module EE
module Pipeline
override :failure_reasons
def failure_reasons
- super.merge(activity_limit_exceeded: 2)
+ super.merge(job_activity_limit_exceeded: 2)
end
end
end
@@ -73,9 +73,9 @@ end
This works as-is; however, it has a couple of downsides:
- Someone could define a key/value pair in EE that is **conflicted** with a value defined in FOSS.
- For example, define `activity_limit_exceeded: 1` in `EE::Enums::Pipeline`.
+ For example, define `job_activity_limit_exceeded: 1` in `EE::Enums::Pipeline`.
- When it happens, the feature works totally differently.
- For example, we cannot figure out `failure_reason` is either `config_error` or `activity_limit_exceeded`.
+  For example, we cannot figure out whether `failure_reason` is `config_error` or `job_activity_limit_exceeded`.
- When it happens, we have to ship a database migration to fix the data integrity,
which might be impossible if you cannot recover the original value.
@@ -88,7 +88,7 @@ module EE
module Pipeline
override :failure_reasons
def failure_reasons
- super.merge(activity_limit_exceeded: 1_000, size_limit_exceeded: 1_001)
+ super.merge(job_activity_limit_exceeded: 1_000, size_limit_exceeded: 1_001)
end
end
end
@@ -98,7 +98,7 @@ end
This works as a workaround; however, this approach has some downsides:
- Features could move from EE to FOSS or vice versa. Therefore, the offset might be mixed between FOSS and EE in the future.
- For example, when you move `activity_limit_exceeded` to FOSS, you see `{ unknown_failure: 0, config_error: 1, activity_limit_exceeded: 1_000 }`.
+ For example, when you move `job_activity_limit_exceeded` to FOSS, you see `{ unknown_failure: 0, config_error: 1, job_activity_limit_exceeded: 1_000 }`.
- The integer column for the `enum` is likely created [as `SMALLINT`](#creating-enums).
Therefore, you need to be careful that the offset doesn't exceed the maximum value of a 2-byte integer.
@@ -110,7 +110,7 @@ class Pipeline < ApplicationRecord
enum failure_reason: {
unknown_failure: 0,
config_error: 1,
- activity_limit_exceeded: 2
+ job_activity_limit_exceeded: 2
}
end
```
diff --git a/doc/development/database/database_dictionary.md b/doc/development/database/database_dictionary.md
index b7e6fa4b5b3..84b76ddc34c 100644
--- a/doc/development/database/database_dictionary.md
+++ b/doc/development/database/database_dictionary.md
@@ -12,7 +12,8 @@ locate the feature categories responsible for specific database tables.
## Location
Database dictionary metadata files are stored in the `gitlab` project under `db/docs/` for the `main` and `ci` databases.
-For the `geo` database, the dictionary files are stored under `ee/db/docs/`.
+For the `embedding` database, the dictionary files are stored under `ee/db/embedding/docs/`.
+For the `geo` database, the dictionary files are stored under `ee/db/geo/docs/`.
## Example dictionary file
@@ -26,6 +27,7 @@ feature_categories:
description: Represents a Terraform state backend
introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26619
milestone: '13.0'
+gitlab_schema: gitlab_main
```
## Adding tables
@@ -50,7 +52,8 @@ When adding a table, you should:
- `gitlab_main` table: `db/docs/`
- `gitlab_ci` table: `db/docs/`
- `gitlab_shared` table: `db/docs/`
- - `gitlab_geo` table: `ee/db/docs/`
+ - `gitlab_embedding` table: `ee/db/embedding/docs/`
+ - `gitlab_geo` table: `ee/db/geo/docs/`
1. Name the file `<table_name>.yml`, and include as much information as you know about the table.
1. Include this file in the commit with the migration that creates the table.
@@ -78,7 +81,8 @@ When dropping a table, you should:
- `gitlab_main` table: `db/docs/deleted_tables/`
- `gitlab_ci` table: `db/docs/deleted_tables/`
- `gitlab_shared` table: `db/docs/deleted_tables/`
- - `gitlab_geo` table: `ee/db/docs/deleted_tables/`
+ - `gitlab_embedding` table: `ee/db/embedding/docs/deleted_tables/`
+ - `gitlab_geo` table: `ee/db/geo/docs/deleted_tables/`
1. Add the fields `removed_by_url` and `removed_in_milestone` to the dictionary file.
1. Include this change in the commit with the migration that drops the table.
@@ -104,7 +108,8 @@ When adding a new view, you should:
- `gitlab_main` view: `db/docs/views/`
- `gitlab_ci` view: `db/docs/views/`
- `gitlab_shared` view: `db/docs/views/`
- - `gitlab_geo` view: `ee/db/docs/views/`
+ - `gitlab_embedding` view: `ee/db/embedding/docs/views/`
+ - `gitlab_geo` view: `ee/db/geo/docs/views/`
1. Name the file `<view_name>.yml`, and include as much information as you know about the view.
1. Include this file in the commit with the migration that creates the view.
@@ -132,6 +137,7 @@ When dropping a view, you should:
- `gitlab_main` view: `db/docs/deleted_views/`
- `gitlab_ci` view: `db/docs/deleted_views/`
- `gitlab_shared` view: `db/docs/deleted_views/`
- - `gitlab_geo` view: `ee/db/docs/deleted_views/`
+ - `gitlab_embedding` view: `ee/db/embedding/docs/deleted_views/`
+ - `gitlab_geo` view: `ee/db/geo/docs/deleted_views/`
1. Add the fields `removed_by_url` and `removed_in_milestone` to the dictionary file.
1. Include this change in the commit with the migration that drops the view.
diff --git a/doc/development/database/database_lab.md b/doc/development/database/database_lab.md
index 162fc597cc4..357133d8bca 100644
--- a/doc/development/database/database_lab.md
+++ b/doc/development/database/database_lab.md
@@ -12,6 +12,17 @@ on replicated production data. Unlike a typical read-only production replica, in
also create, update, and delete rows. You can also test the performance of
schema changes, like additional indexes or columns, in an isolated copy of production data.
+## Database Lab quick start
+
+1. [Visit the console](https://console.postgres.ai/).
+1. Select **Sign in with Google**. (Not GitLab, as you need Google SSO to connect with our project.)
+1. After you sign in, select the GitLab organization, and then visit **Ask Joe** in the sidebar.
+1. Select the database you're testing against:
+ - Most queries for the GitLab project run against `gitlab-production-tunnel-pg12`.
+ - If the query is for a CI table, select `gitlab-production-ci`.
+ - If the query is for the container registry, select `gitlab-production-registry`.
+1. Type `explain <Query Text>` in the chat box to get a plan.
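+
+   For example, a hypothetical request:
+
+   ```plaintext
+   explain SELECT COUNT(*) FROM merge_requests WHERE target_project_id = 278964
+   ```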
+
## Access Database Lab Engine
Access to the DLE is helpful for:
@@ -21,27 +32,25 @@ Access to the DLE is helpful for:
To access the DLE's services, you can:
-- Perform query testing in the `#database_lab` Slack channel, or in the Postgres.ai web console.
+- Perform query testing in the Postgres.ai web console.
Employees access both services with their GitLab Google account. Query testing
provides `EXPLAIN` (analyze, buffers) plans for queries executed there.
- Migration testing by triggering a job as a part of a merge request.
- Direct `psql` access to DLE instead of a production replica. Available to authorized users only.
- To request `psql` access, file an [access request](https://about.gitlab.com/handbook/business-technology/team-member-enablement/onboarding-access-requests/access-requests/#individual-or-bulk-access-request).
+ To request `psql` access, file an [access request](https://about.gitlab.com/handbook/business-technology/end-user-services/onboarding-access-requests/access-requests/#individual-or-bulk-access-request).
For more assistance, use the `#database` Slack channel.
NOTE:
If you need only temporary access to a production replica, instead of a Database Lab
clone, follow the runbook procedure for connecting to the
-[database console with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Database_Console_via_Teleport.md).
-This procedure is similar to [Rails console access with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Rails_Console_via_Teleport.md#how-to-use-teleport-to-connect-to-rails-console).
+[database console with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/teleport/Connect_to_Database_Console_via_Teleport.md).
+This procedure is similar to [Rails console access with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/teleport/Connect_to_Rails_Console_via_Teleport.md#how-to-use-teleport-to-connect-to-rails-console).
### Query testing
You can access Database Lab's query analysis features either:
-- In the `#database_lab` Slack channel. Shows everyone's commands and results, but
- your own commands are still isolated in their own clone.
- In [the Postgres.ai web console](https://console.postgres.ai/GitLab/joe-instances).
Shows only the commands you run.
@@ -86,7 +95,7 @@ Caveats:
[`ci_builds`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/docs/ci_builds.yml#L14),
use `gitlab-production-ci`.
- Database Lab typically has a small delay of a few hours. If more up-to-date information
- is required, you can instead request access to a replica [via Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Database_Console_via_Teleport.md)
+ is required, you can instead request access to a replica [via Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/teleport/Connect_to_Database_Console_via_Teleport.md)
For example: `\d index_design_management_designs_on_project_id` produces:
@@ -121,6 +130,79 @@ For information on testing migrations, review our
### Access the console with `psql`
+NOTE:
+You must have `AllFeaturesUser` [`psql` access](#access-database-lab-engine) to access the console with `psql`.
+
+#### Simplified access through `pgai` Ruby gem
+
+[@mbobin](https://gitlab.com/mbobin) created the [`pgai` Ruby Gem](https://gitlab.com/mbobin/pgai/#pgai) that
+greatly simplifies access to a database clone, with support for:
+
+- Access to all database clones listed in the [Postgres.ai instances page](https://console.postgres.ai/gitlab/instances);
+- Multiple `psql` sessions on the same clone.
+
+If you have `AllFeaturesUser` [`psql` access](#access-database-lab-engine), you can follow the steps below to configure
+the `pgai` Gem:
+
+1. To get started, you need to gather some values from the [Postgres.ai instances page](https://console.postgres.ai/gitlab/instances):
+
+    1. Navigate to the instance that you want to configure.
+    1. On the right side of the screen, under **Connection**, select **Connect**. The menu might be collapsed.
+
+ A pop-up with everything that's needed for configuration appears, using this format:
+
+ ```shell
+ dblab init --url http://127.0.0.1:1234 --token TOKEN --environment-id <environment-id>
+ ```
+
+ ```shell
+ ssh -NTML 1234:localhost:<environment-port> <postgresai-user>@<postgresai-proxy> -i ~/.ssh/id_rsa
+ ```
+
+1. Add the following snippet to your SSH configuration file at `~/.ssh/config`, replacing the variable values:
+
+ ```plaintext
+ Host pgai-proxy
+ HostName <postgresai-proxy>
+ User <postgresai-user>
+ IdentityFile ~/.ssh/id_ed25519
+ ```
+
+1. Run the following command so you can accept the server key fingerprint:
+
+ ```shell
+ ssh pgai-proxy
+ ```
+
+1. Run the following commands:
+
+ ```shell
+ gem install pgai
+
+ # Grab an access token: https://console.postgres.ai/gitlab/tokens
+ # GITLAB_USER is your GitLab handle
+ pgai config --dbname=gitlabhq_dblab --prefix=$GITLAB_USER --proxy=pgai-proxy
+
+ # Grab the respective port values from https://console.postgres.ai/gitlab/instances
+ # for the instances you'll be using (in this case, for the `main` database instance)
+ pgai env add --alias main --id <environment-id> --port <environment-port>
+ ```
+
+1. Once this one-time configuration is done, you can use `pgai connect` to connect to a particular database. For
+ instance, to connect to the `main` database:
+
+ ```shell
+ pgai connect main
+ ```
+
+1. Once done with the clone, you can destroy it:
+
+ ```shell
+ pgai destroy main
+ ```
+
+#### Manual access through the Postgres.ai instances page
+
Team members with [`psql` access](#access-database-lab-engine) can gain direct access
to a clone via `psql`. Access to `psql` enables you to see data, not just metadata.
diff --git a/doc/development/database/database_migration_pipeline.md b/doc/development/database/database_migration_pipeline.md
index 06e16b4c7f1..a9d525e2a41 100644
--- a/doc/development/database/database_migration_pipeline.md
+++ b/doc/development/database/database_migration_pipeline.md
@@ -9,13 +9,13 @@ info: To determine the technical writer assigned to the Stage/Group associated w
> [Introduced](https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/171) in GitLab 14.2.
With the [automated migration testing pipeline](https://gitlab.com/gitlab-org/database-team/gitlab-com-database-testing)
-we can automatically test migrations in a production-like environment (similar to `#database-lab`).
+we can automatically test migrations in a production-like environment (using [Database Lab](database_lab.md)).
It is based on an [architecture blueprint](../../architecture/blueprints/database_testing/index.md).
Migration testing is enabled in the [GitLab project](https://gitlab.com/gitlab-org/gitlab)
for changes that add a new database migration. Trigger this job manually by running the
`db:gitlabcom-database-testing` job within the `test` stage. To avoid wasting resources,
-only run this job when your MR is ready for review.
+only run this job when your MR is ready for review. Additionally, ensure that the MR has the `~database` label, so the job appears in the `test` stage.
The job starts a pipeline on the [ops GitLab instance](https://ops.gitlab.net/).
For security reasons, access to the pipeline is restricted to database maintainers.
@@ -73,3 +73,41 @@ Some additional information is included at the bottom of the comment:
migration (ending in `.log`) are available there, and only accessible by
database maintainers or with an access request. Details of the specific
batched background migration batches sampled are also available.
+
+## Test changes to the database testing pipeline
+
+To test a change to the database testing pipeline itself, you need:
+
+1. A merge request against GitLab Org.
+1. The change to be tested, present on a branch on GitLab Ops.
+
+Use this self-documented script to test a merge request on GitLab Org against an arbitrary branch on GitLab Ops:
+
+```shell
+#!/usr/bin/env bash
+
+# The following must be set on a per-invocation basis:
+TESTING_TRIGGER_TOKEN='[REDACTED]' # Testing trigger token created in the CI section of the project
+CI_COMMIT_REF_NAME='55-post-notice-on-failure' # The branch on ops that you want to run against
+CI_MERGE_REQUEST_IID='117901' # Merge request ID of the MR on gitlab.com that you want to test
+SHA="fed6dd8a58d75a0e053a4972765b4fc08c5814a3" # The commit SHA of the HEAD of the branch you want to test on gitlab-org/gitlab
+
+# The following should not be changed between invocations:
+CI_JOB_URL='https://gitlab.com/gitlab-org/database-team/gitlab-com-database-testing/-/jobs/1590162939'
+# It doesn't appear that CI_JOB_URL has to be set to anything in particular for the pipeline to run
+# successfully, but this would normally be the URL to the upstream job that invokes the DB testing pipeline.
+CI_MERGE_REQUEST_PROJECT_ID='278964' # gitlab-org/gitlab numeric ID. Shouldn't change.
+CI_PROJECT_ID="gitlab-org/gitlab" # The slug identifying gitlab-org/gitlab.
+
+curl --verbose --request POST \
+ --form "token=$TESTING_TRIGGER_TOKEN" \
+ --form "ref=$CI_COMMIT_REF_NAME" \
+ --form "variables[TOP_UPSTREAM_MERGE_REQUEST_IID]=$CI_MERGE_REQUEST_IID" \
+ --form "variables[TOP_UPSTREAM_MERGE_REQUEST_PROJECT_ID]=$CI_MERGE_REQUEST_PROJECT_ID" \
+ --form "variables[TOP_UPSTREAM_SOURCE_JOB]=$CI_JOB_URL" \
+ --form "variables[TOP_UPSTREAM_SOURCE_PROJECT]=$CI_PROJECT_ID" \
+ --form "variables[VALIDATION_PIPELINE]=true" \
+ --form "variables[GITLAB_COMMIT_SHA]=$SHA" \
+ --form "variables[TRIGGER_SOURCE]=$CI_JOB_URL" \
+ "https://ops.gitlab.net/api/v4/projects/429/trigger/pipeline"
+```
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index aa92134018d..933bbe9c060 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -53,9 +53,8 @@ that require a more in-depth discussion between the database reviewers and maint
- [Database Office Hours Agenda](https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit).
- <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [YouTube playlist with past recordings](https://www.youtube.com/playlist?list=PL05JrBw4t0Kp-kqXeiF7fF7cFYaKtdqXM).
-You should also join the [#database-lab](understanding_explain_plans.md#database-lab-engine)
-Slack channel and get familiar with how to use Joe, the Slackbot that provides developers
-with their own clone of the production database.
+Get familiar with using [Database Lab from postgres.ai](database_lab.md), a service that
+provides developers with their own clone of the production database.
Understanding and efficiently using `EXPLAIN` plans is at the core of the database review process.
The following guides provide a quick introduction and links to follow on more advanced topics:
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
index 78b310ae708..a770dfe6531 100644
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -433,7 +433,7 @@ construct the following table:
For the `issue_types` query we can construct a value list without querying a table:
```ruby
-value_list = Arel::Nodes::ValuesList.new([[Issue.issue_types[:incident]],[Issue.issue_types[:test_case]]])
+value_list = Arel::Nodes::ValuesList.new([[WorkItems::Type.base_types[:incident]],[WorkItems::Type.base_types[:test_case]]])
issue_type_values = Arel::Nodes::Grouping.new(value_list).as('issue_type_values (value)').to_sql
array_scope = Group
diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index 8f22eaac496..f532e054849 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -4,7 +4,7 @@ group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
-# Database guides
+# Database development guidelines
## Database Reviews
@@ -64,6 +64,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
- [Hash indexes](hash_indexes.md)
- [Insert into tables in batches](insert_into_tables_in_batches.md)
- [Iterating tables in batches](iterating_tables_in_batches.md)
+- [Load balancing](load_balancing.md)
- [`NOT NULL` constraints](not_null_constraints.md)
- [Ordering table columns](ordering_table_columns.md)
- [Pagination guidelines](pagination_guidelines.md)
@@ -109,6 +110,8 @@ including the major methods:
- [Introduction](clickhouse/index.md)
- [Optimizing query execution](clickhouse/optimization.md)
- [Rebuild GitLab features using ClickHouse 1: Activity data](clickhouse/gitlab_activity_data.md)
+- [Rebuild GitLab features using ClickHouse 2: Merge Request analytics](clickhouse/merge_request_analytics.md)
+- [Tiered Storage in ClickHouse](clickhouse/tiered_storage.md)
## Miscellaneous
diff --git a/doc/development/database/iterating_tables_in_batches.md b/doc/development/database/iterating_tables_in_batches.md
index 6357bed8b00..a927242e8d8 100644
--- a/doc/development/database/iterating_tables_in_batches.md
+++ b/doc/development/database/iterating_tables_in_batches.md
@@ -44,9 +44,13 @@ all of the arguments that `in_batches` supports. You should always use
## Iterating over non-unique columns
-One should proceed with extra caution. When you iterate over an attribute that is not unique,
-even with the applied max batch size, there is no guarantee that the resulting batches do not
-surpass it. The following snippet demonstrates this situation when one attempt to select
+You should not use the `each_batch` method with a non-unique column (in the context of the relation) as it
+[may result in an infinite loop](https://gitlab.com/gitlab-org/gitlab/-/issues/285097).
+Additionally, the inconsistent batch sizes cause performance issues when you
+iterate over non-unique columns. Even when you apply a max batch size
+when iterating over an attribute, there's no guarantee that the resulting
+batches don't surpass it. The following snippet demonstrates this situation:
+when you attempt to select
`Ci::Build` entries for users with `id` between `1` and `10,000`, the database returns
`1 215 178` matching rows.
@@ -465,6 +469,58 @@ Issue.each_batch(of: 1000) do |relation|
end
```
+### Counting records
+
+For tables with a large amount of data, counting records through queries can result
+in timeouts. The `EachBatch` module provides an alternative way to iteratively count
+records. The downside of using `each_batch` is the extra count query, which is executed
+on the yielded relation object.
+
+The `each_batch_count` method is a more efficient approach that eliminates the need
+for the extra count query. By invoking this method, the iteration process can be
+paused and resumed as needed. This feature is particularly useful in situations
+where error budget violations are triggered after five minutes, such as when performing
+counting operations within Sidekiq workers.
+
+To illustrate, counting records using `EachBatch` involves invoking an additional
+count query as follows:
+
+```ruby
+count = 0
+
+Issue.each_batch do |relation|
+ count += relation.count
+end
+
+puts count
+```
+
+On the other hand, the `each_batch_count` method enables the counting process to be
+performed more efficiently (counting is part of the iteration query) without invoking
+an extra count query:
+
+```ruby
+count, _last_value = Issue.each_batch_count # last value can be ignored here
+```
+
+Furthermore, the `each_batch_count` method allows the counting process to be paused
+and resumed at any point. This capability is demonstrated in the following code snippet:
+
+```ruby
+stop_at = Time.current + 3.minutes
+
+count, last_value = Issue.each_batch_count do
+ Time.current > stop_at # condition for stopping the counting
+end
+
+# Continue the counting later
+stop_at = Time.current + 3.minutes
+
+count, last_value = Issue.each_batch_count(last_count: count, last_value: last_value) do
+ Time.current > stop_at
+end
+```
+
### `EachBatch` vs `BatchCount`
When adding new counters for Service Ping, the preferred way to count records is using the
diff --git a/doc/development/database/load_balancing.md b/doc/development/database/load_balancing.md
new file mode 100644
index 00000000000..f623ad1eab0
--- /dev/null
+++ b/doc/development/database/load_balancing.md
@@ -0,0 +1,59 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Database load balancing
+
+With database load balancing, read-only queries can be distributed across multiple
+PostgreSQL nodes to increase performance.
+
+This documentation provides a technical overview of how database load balancing
+is implemented in GitLab Rails and Sidekiq.
+
+## Nomenclature
+
+1. **Host**: Each database host. It could be a primary or a replica.
+1. **Primary**: Primary PostgreSQL host that is used for write-only and read-and-write operations.
+1. **Replica**: Secondary PostgreSQL hosts that are used for read-only operations.
+1. **Workload**: A Rails request or a Sidekiq job that requires database connections.
+
+## Components
+
+A few Ruby classes are involved in the load balancing process. All of them are
+in the namespace `Gitlab::Database::LoadBalancing`:
+
+1. `Host`
+1. `LoadBalancer`
+1. `ConnectionProxy`
+1. `Session`
+
+Each workload begins with a new instance of `Gitlab::Database::LoadBalancing::Session`.
+The `Session` keeps track of the database operations that have been performed. It then
+determines if the workload requires a connection to either the primary host or a replica host.
+
+When the workload requires a database connection through `ActiveRecord`,
+`ConnectionProxy` first redirects the connection request to `LoadBalancer`.
+`ConnectionProxy` requests either a `read` or `read_write` connection from the `LoadBalancer`
+depending on a few criteria:
+
+1. Whether the query is read-only or requires write access.
+1. Whether the `Session` has recorded a write operation previously.
+1. Whether any special blocks have been used to prefer primary or replica (see the example at the end of this page), such as:
+ - `use_primary`
+ - `ignore_writes`
+ - `use_replicas_for_read_queries`
+ - `fallback_to_replicas_for_ambiguous_queries`
+
+`LoadBalancer` then yields the requested connection from the respective database connection pool.
+It yields either:
+
+- A `read_write` connection from the primary's connection pool.
+- A `read` connection from the replicas' connection pools.
+
+When responding to a request for a `read` connection, `LoadBalancer` first
+attempts to load balance the connection across the replica hosts.
+It looks for the next `online` replica host and yields a connection from the host's connection pool.
+A replica host is considered `online` if it is up-to-date with the primary, based on
+either the replication lag size or time. The thresholds for these requirements are configurable.
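+
+For illustration, here is a minimal sketch of how a workload can use the block
+helpers listed above to influence these decisions. The model and queries are
+hypothetical; the `Session` API is the one described in this document:
+
+```ruby
+session = Gitlab::Database::LoadBalancing::Session.current
+
+# Force all queries in the block to the primary host, even pure reads.
+# Useful when a workload must read data it has just written.
+session.use_primary do
+  Project.find_by(path: 'gitlab') # served by the primary
+end
+
+# Prefer replicas for read-only queries, even if this session recorded
+# a write earlier.
+session.use_replicas_for_read_queries do
+  Project.count # served by a replica, subject to the lag thresholds
+end
+```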
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index daa022a3de2..91a22d8c26b 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -64,7 +64,7 @@ The tool ensures that all aspects of swapping a foreign key are covered. This in
- Creating a migration to remove a foreign key.
- Updating `db/structure.sql` with the new migration.
-- Updating `lib/gitlab/database/gitlab_loose_foreign_keys.yml` to add the new loose foreign key.
+- Updating `config/gitlab_loose_foreign_keys.yml` to add the new loose foreign key.
- Creating or updating a model's specs to ensure that the loose foreign key is properly supported.
The tool is located at `scripts/decomposition/generate-loose-foreign-key`:
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index d22e3209096..6adfdc90cf2 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -1,6 +1,6 @@
---
stage: Data Stores
-group: Pods
+group: Tenant Scale
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
@@ -545,7 +545,7 @@ end
```
Don't hesitate to reach out to the
-[pods group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/pods/)
+[Pods group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/tenant-scale/)
for advice.
##### Avoid `dependent: :nullify` and `dependent: :destroy` across databases
@@ -580,6 +580,24 @@ or records that point to nowhere, which might lead to bugs. As such we created
["loose foreign keys"](loose_foreign_keys.md) which is an asynchronous
process of cleaning up orphaned records.
+## Testing for multiple databases
+
+In our testing CI pipelines, we test GitLab by default with multiple databases set up, using
+both `main` and `ci` databases. But in merge requests, for example when we modify some database-related code or
+add the label `~"pipeline:run-single-db"` to the MR, we additionally run our tests in
+[two other database modes](../pipelines/index.md#single-database-testing):
+`single-db` and `single-db-ci-connection`.
+
+To handle situations where our tests need to run in specific database modes, we have some RSpec helpers
+to limit the modes where tests can run, and skip them on any other modes (see the example after the following table).
+
+| Helper name | Test runs |
+|---------------------------------------------|-----------|
+| `skip_if_shared_database(:ci)` | On **multiple databases** |
+| `skip_if_database_exists(:ci)` | On **single-db** and **single-db-ci-connection** |
+| `skip_if_multiple_databases_are_setup(:ci)` | Only on **single-db** |
+| `skip_if_multiple_databases_not_setup(:ci)` | On **single-db-ci-connection** and **multiple databases** |
+
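+For illustration, a minimal sketch of how one of these helpers can be used in a
+spec (the spec name and body are hypothetical):
+
+```ruby
+RSpec.describe 'CI database queries' do
+  before do
+    # Runs only when the ci database is separate; skipped on the
+    # single-db and single-db-ci-connection modes.
+    skip_if_shared_database(:ci)
+  end
+
+  it 'writes to the ci database' do
+    # ...
+  end
+end
+```
+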
## Locking writes on the tables that don't belong to the database schemas
When the CI database is promoted and the two databases are fully split,
diff --git a/doc/development/database/query_performance.md b/doc/development/database/query_performance.md
index 73a6a40f801..10ab726940a 100644
--- a/doc/development/database/query_performance.md
+++ b/doc/development/database/query_performance.md
@@ -44,7 +44,7 @@ automatically includes these options.
If you are making a warm cache query, you see only the `shared hits`.
-For example in #database-lab:
+For example, using [Database Lab](database_lab.md):
```plaintext
Shared buffers:
@@ -60,7 +60,7 @@ Buffers: shared hit=7323
If the cache is cold, you also see `reads`.
-In #database-lab:
+Using [Database Lab](database_lab.md):
```plaintext
Shared buffers:
diff --git a/doc/development/database/required_stops.md b/doc/development/database/required_stops.md
index 46fabb5c1b4..e4f66f4424f 100644
--- a/doc/development/database/required_stops.md
+++ b/doc/development/database/required_stops.md
@@ -11,6 +11,62 @@ disruptive effect on customers. Before adding a required stop, consider if any
alternative approaches exist to avoid a required stop. Sometimes a required
stop is unavoidable. In those cases, follow the instructions below.
+## Common scenarios that require stops
+
+### Long running migrations being finalized
+
+If a migration takes a long time, it could cause a large number of customers to encounter timeouts
+during upgrades. The increased support volume may cause us to introduce a required stop. While any
+background migration may cause these issues with particularly large customers, we typically only
+introduce stops when the impact is widespread.
+
+- **Cause:** When an upgrade takes more than an hour, Omnibus GitLab times out.
+- **Mitigation:** Schedule finalization for the first minor version after the next required stop.
+
+### Improperly finalized background migrations
+
+You may need to introduce a required stop for mitigation when:
+
+- A background migration is not finalized, and
+- A migration is written that depends on that background migration.
+
+- **Cause:** The dependent migration may fail if the background migration is incomplete.
+- **Mitigation:** Ensure that all background migrations are finalized before authoring dependent migrations.
+
+### Remove a migration
+
+If a migration is removed, you may need to introduce a required stop to ensure customers
+don't miss the required change.
+
+- **Cause:** Dependent migrations may fail, or the application may not function, because a required
+ migration was removed.
+- **Mitigation:** Ensure migrations are only removed after they've been a part of a planned
+ required stop.
+
+### A migration timestamp is very old
+
+If a migration timestamp is very old (more than three weeks old, or dated before the last required stop),
+these scenarios may cause issues:
+
+- If the migration depends on another migration with a newer timestamp but introduced in a
+ previous release _after_ a required stop, then the new migration may run sequentially sooner
+ than the prerequisite migration, and thus fail.
+- If the migration timestamp ID is from before the last required stop, it may be inadvertently squashed when the
+ team squashes other migrations from the required stop.
+
+- **Cause:** The migration may fail if it depends on a migration with a later timestamp introduced
+ in an earlier version. Or, the migration may be inadvertently squashed after a required stop.
+- **Mitigation:** Aim for migration timestamps to fall inside the release dates and be sure that
+ they are not dated prior to the last required stop.
+
+### Bugs in migration-related tooling
+
+In a few circumstances, bugs in migration-related tooling have required us to introduce stops. While we aim
+to prevent these in testing, sometimes they happen.
+
+- **Cause:** Tooling bugs of various kinds, typically recognized too late to avoid a stop.
+- **Mitigation:** Typically we try to backport fixes for migrations, but in some cases this is not possible.
+
## Before the required stop is released
Before releasing a known required stop, complete these steps. If the required stop
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index 47e89c1ce0f..5cd325bfa56 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -68,9 +68,17 @@ is held for a brief amount of time, the time `add_column` needs to complete its
depending on how frequently the table is accessed. For example, acquiring an exclusive lock for a very
frequently accessed table may take minutes in GitLab.com and requires the use of `with_lock_retries`.
-For these reasons, it is advised to add the text limit on a separate migration than the `add_column` one.
+When adding a text limit, transactions must be disabled with `disable_ddl_transaction!`. This means that if the
+migration fails afterwards, the added column is not rolled back, and an attempt to re-run the migration raises an error because the column already exists.
-For example, consider a migration that adds a new text column `extended_title` to table `sprints`,
+For these reasons, adding a text column to an existing table can be done by either:
+
+- [Add the column and limit in separate migrations.](#add-the-column-and-limit-in-separate-migrations)
+- [Add the column and limit in one migration with checking if the column already exists.](#add-the-column-and-limit-in-one-migration-with-checking-if-the-column-already-exists)
+
+### Add the column and limit in separate migrations
+
+Consider a migration that adds a new text column `extended_title` to table `sprints`,
`db/migrate/20200501000001_add_extended_title_to_sprints.rb`:
```ruby
@@ -103,6 +111,33 @@ class AddTextLimitToSprintsExtendedTitle < Gitlab::Database::Migration[2.1]
end
```
+### Add the column and limit in one migration with checking if the column already exists
+
+Consider a migration that adds a new text column `extended_title` to table `sprints`,
+`db/migrate/20200501000001_add_extended_title_to_sprints.rb`:
+
+```ruby
+class AddExtendedTitleToSprints < Gitlab::Database::Migration[2.1]
+ disable_ddl_transaction!
+
+ def up
+ with_lock_retries do
+ add_column :sprints, :extended_title, :text, if_not_exists: true
+ end
+
+ add_text_limit :sprints, :extended_title, 512
+ end
+
+ def down
+ remove_text_limit :sprints, :extended_title
+
+ with_lock_retries do
+ remove_column :sprints, :extended_title, if_exists: true
+ end
+ end
+end
+```
+
## Add a text limit constraint to an existing column
Adding text limits to existing database columns requires multiple steps split into at least two different releases:
diff --git a/doc/development/database/table_partitioning.md b/doc/development/database/table_partitioning.md
index 0d5e3c233f6..88b2ccbc6a2 100644
--- a/doc/development/database/table_partitioning.md
+++ b/doc/development/database/table_partitioning.md
@@ -6,6 +6,13 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Database table partitioning
+WARNING:
+If you have questions not answered below, check for and add them
+to [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/398650).
+Tag `@gitlab-org/database-team/triage` and we'll get back to you with an
+answer as soon as possible. If you get an answer in Slack, document
+it on the issue as well so we can update this document in the future.
+
Table partitioning is a powerful database feature that allows a table's
data to be split into smaller physical tables that act as a single large
table. If the application is designed to work with partitioning in mind,
@@ -32,31 +39,38 @@ several releases. Due to the limitations of partitioning and the related
migrations, you should understand how partitioning fits your use case
before attempting to leverage this feature.
-## Determining when to use partitioning
+## Determine when to use partitioning
While partitioning can be very useful when properly applied, it's
imperative to identify if the data and workload of a table naturally fit a
-partitioning scheme. There are a few details you have to understand
-to decide if partitioning is a good fit for your particular
-problem.
-
-First, a table is partitioned on a partition key, which is a column or
-set of columns which determine how the data is split across the
-partitions. The partition key is used by the database when reading or
-writing data, to decide which partitions must be accessed. The
-partition key should be a column that would be included in a `WHERE`
-clause on almost all queries accessing that table.
-
-Second, it's necessary to understand the strategy the database uses
-to split the data across the partitions. The scheme supported by the
-GitLab migration helpers is date-range partitioning, where each partition
-in the table contains data for a single month. In this case, the partitioning
-key must be a timestamp or date column. In order for this type of
+partitioning scheme. Understand a few details to decide if partitioning
+is a good fit for your particular problem:
+
+- **Table partitioning**. A table is partitioned on a partition key, which is a
+  column or set of columns that determines how the data is split across the
+ partitions. The partition key is used by the database when reading or
+ writing data, to decide which partitions must be accessed. The
+ partition key should be a column that would be included in a `WHERE`
+ clause on almost all queries accessing that table.
+
+- **How the data is split**. What strategy does the database use
+ to split the data across the partitions? The available choices are `range`,
+ `hash`, and `list`.
+
+## Determine the appropriate partitioning strategy
+
+The available partitioning strategy choices are `range`, `hash`, and `list`.
+
+### Range partitioning
+
+The scheme best supported by the GitLab migration helpers is date-range partitioning,
+where each partition in the table contains data for a single month. In this case,
+the partitioning key must be a timestamp or date column. For this type of
partitioning to work well, most queries must access data in a
certain date range.
-For a more concrete example, the `audit_events` table can be used, which
-was the first table to be partitioned in the application database
+For a more concrete example, consider using the `audit_events` table.
+It was the first table to be partitioned in the application database
(scheduled for deployment with the GitLab 13.5 release). This
table tracks audit entries of security events that happen in the
application. In almost all cases, users want to see audit activity that
@@ -142,6 +156,31 @@ substantial. Partitioning should only be leveraged if the access patterns
of the data support the partitioning strategy, otherwise performance
suffers.
+### Hash partitioning
+
+Hash partitioning splits a logical table into a series of partitioned
+tables. Each partition corresponds to the set of IDs whose hash matches
+a given modulus and remainder. For example, if partitioning `BY HASH(id)`, rows
+with `hash(id) % 64 == 1` would end up in the partition
+`WITH (MODULUS 64, REMAINDER 1)`.
+
+When hash partitioning, you must include a `WHERE hashed_column = ?` condition in
+every performance-sensitive query issued by the application. If this is not possible,
+hash partitioning may not be the correct fit for your use case.
+
+Hash partitioning has one main advantage: it is the only type of partitioning that
+can enforce uniqueness on a single numeric `id` column. (While also possible with
+range partitioning, it's rarely the correct choice there.)
+
+Hash partitioning has downsides:
+
+- The number of partitions must be known up-front.
+- It's difficult to move new data to an extra partition if current partitions become too large.
+- Range queries, such as `WHERE id BETWEEN ? AND ?`, are unsupported.
+- Lookups by other keys, such as `WHERE other_id = ?`, are unsupported.
+
+For these reasons, it's often best to choose a large number of hash partitions to accommodate future table growth.
+
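+For illustration, here is a minimal sketch of the DDL involved, wrapped in a
+migration that creates a hash-partitioned table with two partitions. The table
+name and modulus are illustrative only:
+
+```ruby
+class CreateHashPartitionedExample < Gitlab::Database::Migration[2.1]
+  def up
+    execute(<<~SQL)
+      CREATE TABLE _test_hash_partitioned (
+        id bigint NOT NULL,
+        PRIMARY KEY (id)
+      ) PARTITION BY HASH (id);
+
+      -- Rows with hash(id) % 2 == 0 land in this partition.
+      CREATE TABLE _test_hash_partitioned_0
+        PARTITION OF _test_hash_partitioned
+        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
+
+      -- Rows with hash(id) % 2 == 1 land in this partition.
+      CREATE TABLE _test_hash_partitioned_1
+        PARTITION OF _test_hash_partitioned
+        FOR VALUES WITH (MODULUS 2, REMAINDER 1);
+    SQL
+  end
+
+  def down
+    execute('DROP TABLE _test_hash_partitioned')
+  end
+end
+```
+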
## Partitioning a table (Range)
Unfortunately, tables can only be partitioned at their creation, making
@@ -203,6 +242,10 @@ Continuing the above example, the migration would look like:
class BackfillPartitionAuditEvents < Gitlab::Database::Migration[2.1]
include Gitlab::Database::PartitioningMigrationHelpers
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
def up
enqueue_partitioning_data_migration :audit_events
end
@@ -213,17 +256,12 @@ class BackfillPartitionAuditEvents < Gitlab::Database::Migration[2.1]
end
```
-This step uses the same mechanism as any background migration, so you
-may want to read the [Background Migration](background_migrations.md)
-guide for details on that process. Background jobs are scheduled every
-2 minutes and copy `50_000` records at a time, which can be used to
-estimate the timing of the background migration portion of the
-partitioning migration.
+This step [queues a batched background migration](batched_background_migrations.md#queueing) internally, with `BATCH_SIZE` and `SUB_BATCH_SIZE` set to `50,000` and `2,500` respectively. See the [batched background migrations guide](batched_background_migrations.md) for more details.
### Step 3: Post-backfill cleanup (Release N+1)
-The third step must occur at least one release after the release that
-includes the background migration. This gives time for the background
+This step must occur at least one release after the release that
+includes step (2). This gives time for the background
migration to execute properly in self-managed installations. In this step,
add another post-deployment migration that cleans up after the
background migration. This includes forcing any remaining jobs to
@@ -236,6 +274,10 @@ Once again, continuing the example, this migration would look like:
class CleanupPartitionedAuditEventsBackfill < Gitlab::Database::Migration[2.1]
include Gitlab::Database::PartitioningMigrationHelpers
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
def up
finalize_backfilling_partitioned_table :audit_events
end
@@ -246,16 +288,57 @@ class CleanupPartitionedAuditEventsBackfill < Gitlab::Database::Migration[2.1]
end
```
-After this migration has completed, the original table and partitioned
+After this migration completes, the original table and partitioned
table should contain identical data. The trigger installed on the
original table guarantees that the data remains in sync going forward.
### Step 4: Swap the partitioned and non-partitioned tables (Release N+1)
-The final step of the migration makes the partitioned table ready
-for use by the application. This section will be updated when the
-migration helper is ready, for now development can be followed in the
-[Tracking Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/241267).
+This step replaces the non-partitioned table with its partitioned copy. Use it only after all other migration steps have completed successfully.
+
+Some limitations to this method MUST be handled before, or during, the swap migration:
+
+- Secondary indexes and foreign keys are not automatically recreated on the partitioned table.
+- Some types of constraints (`UNIQUE` and `EXCLUDE`) that rely on indexes are not automatically recreated
+  on the partitioned table, because the underlying index is not present.
+- Foreign keys referencing the original non-partitioned table should be updated to reference the
+ partitioned table. This is not supported in PostgreSQL 11.
+- Views referencing the original table are not automatically updated to reference the partitioned table.
+
+```ruby
+# frozen_string_literal: true
+
+class SwapPartitionedAuditEvents < ActiveRecord::Migration[6.0]
+ include Gitlab::Database::PartitioningMigrationHelpers
+
+ def up
+ replace_with_partitioned_table :audit_events
+ end
+
+ def down
+ rollback_replace_with_partitioned_table :audit_events
+ end
+end
+```
+
+After this migration completes:
+
+- The partitioned table replaces the non-partitioned (original) table.
+- The sync trigger created earlier is dropped.
+
+The partitioned table is now ready for use by the application.
+
+## Partitioning a table (Hash)
+
+Hash partitioning divides data into partitions based on a hash of their ID.
+It works well only if most queries against the table include a clause like `WHERE id = ?`,
+so that PostgreSQL can decide which partition to look in based on the ID or IDs being requested.
+
+Another key downside is that hash partitioning does not support adding partitions after table creation.
+The correct number of partitions must be chosen up-front.
+
+Hash partitioning is the only type of partitioning (aside from some complex uses of list partitioning) that can guarantee
+uniqueness of an ID across multiple partitions at the database level.
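+
+For illustration, assuming a hypothetical `PartitionedModel` backed by a
+hash-partitioned table keyed on `id`:
+
+```ruby
+# The partition key is in the condition, so PostgreSQL prunes to one partition.
+PartitionedModel.find(42)
+
+# No partition key in the condition, so every partition is scanned.
+PartitionedModel.where(name: 'example')
+```
+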
## Partitioning a table (List)
@@ -504,3 +587,48 @@ class Model < ApplicationRecord
self.sequence_name = 'model_id_seq'
end
```
+
+If the partitioning constraint migration takes [more than 10 minutes](../migration_style_guide.md#how-long-a-migration-should-take) to finish,
+it can be made to run asynchronously to avoid running the post-migration during busy hours.
+
+Prepend the following migration, `AsyncPrepareTableConstraintsForListPartitioning`,
+and use the `async: true` option. This change marks the partitioning constraint as `NOT VALID`
+and enqueues a scheduled job to validate the existing data in the table during the weekend.
+
+Then the second post-migration, `PrepareTableConstraintsForListPartitioning`, only
+marks the partitioning constraint as validated, because the existing data was already
+checked during the previous weekend.
+
+For example:
+
+```ruby
+class AsyncPrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1]
+ include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers
+
+ disable_ddl_transaction!
+
+ TABLE_NAME = :table_name
+ PARENT_TABLE_NAME = :p_table_name
+ FIRST_PARTITION = 100
+ PARTITION_COLUMN = :partition_id
+
+ def up
+ prepare_constraint_for_list_partitioning(
+ table_name: TABLE_NAME,
+ partitioning_column: PARTITION_COLUMN,
+ parent_table_name: PARENT_TABLE_NAME,
+ initial_partitioning_value: FIRST_PARTITION,
+ async: true
+ )
+ end
+
+ def down
+ revert_preparing_constraint_for_list_partitioning(
+ table_name: TABLE_NAME,
+ partitioning_column: PARTITION_COLUMN,
+ parent_table_name: PARENT_TABLE_NAME,
+ initial_partitioning_value: FIRST_PARTITION
+ )
+ end
+end
+```
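+
+A minimal sketch of that second post-migration follows, assuming the same helper
+is called without the `async: true` option, so the constraint is only marked
+as validated:
+
+```ruby
+class PrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1]
+  include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers
+
+  disable_ddl_transaction!
+
+  TABLE_NAME = :table_name
+  PARENT_TABLE_NAME = :p_table_name
+  FIRST_PARTITION = 100
+  PARTITION_COLUMN = :partition_id
+
+  def up
+    prepare_constraint_for_list_partitioning(
+      table_name: TABLE_NAME,
+      partitioning_column: PARTITION_COLUMN,
+      parent_table_name: PARENT_TABLE_NAME,
+      initial_partitioning_value: FIRST_PARTITION
+    )
+  end
+
+  def down
+    revert_preparing_constraint_for_list_partitioning(
+      table_name: TABLE_NAME,
+      partitioning_column: PARTITION_COLUMN,
+      parent_table_name: PARENT_TABLE_NAME,
+      initial_partitioning_value: FIRST_PARTITION
+    )
+  end
+end
+```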
diff --git a/doc/development/database/transaction_guidelines.md b/doc/development/database/transaction_guidelines.md
index 26bb6c2ce8f..1648cf58937 100644
--- a/doc/development/database/transaction_guidelines.md
+++ b/doc/development/database/transaction_guidelines.md
@@ -12,7 +12,7 @@ For further reference, check PostgreSQL documentation about [transactions](https
## Database decomposition and sharding
-The [Pods Group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/pods/) plans
+The [Pods group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/tenant-scale/) plans
to split the main GitLab database and move some of the database tables to other database servers.
We start decomposing the `ci_*`-related database tables first. To maintain the current application
diff --git a/doc/development/database/understanding_explain_plans.md b/doc/development/database/understanding_explain_plans.md
index 094bd6b346f..560744430f9 100644
--- a/doc/development/database/understanding_explain_plans.md
+++ b/doc/development/database/understanding_explain_plans.md
@@ -714,8 +714,7 @@ SQL optimization tool - [Joe Bot](https://gitlab.com/postgres-ai/joe).
Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
-Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
-and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
+Joe Bot is available through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements.
@@ -792,34 +791,6 @@ Planning time: 0.411 ms
Execution time: 0.113 ms
```
-### ChatOps
-
-GitLab team members can also use our ChatOps solution, available in Slack
-using the [`/chatops` slash command](../chatops_on_gitlabcom.md).
-
-NOTE:
-While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
-
-You can use ChatOps to get a query plan by running the following:
-
-```sql
-/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
-```
-
-Visualising the plan using <https://explain.depesz.com/> is also supported:
-
-```sql
-/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
-```
-
-Quoting the query is not necessary.
-
-For more information about the available options, run:
-
-```sql
-/chatops run explain --help
-```
-
## Further reading
A more extensive guide on understanding query plans can be found in