author    GitLab Bot <gitlab-bot@gitlab.com>  2023-06-20 13:43:29 +0300
committer GitLab Bot <gitlab-bot@gitlab.com>  2023-06-20 13:43:29 +0300
commit    3b1af5cc7ed2666ff18b718ce5d30fa5a2756674 (patch)
tree      3bc4a40e0ee51ec27eabf917c537033c0c5b14d4 /doc/development/database
parent    9bba14be3f2c211bf79e15769cd9b77bc73a13bc (diff)

Add latest changes from gitlab-org/gitlab@16-1-stable-ee (tag: v16.1.0-rc42)
Diffstat (limited to 'doc/development/database')
-rw-r--r--  doc/development/database/adding_database_indexes.md        |   2
-rw-r--r--  doc/development/database/batched_background_migrations.md  | 216
-rw-r--r--  doc/development/database/clickhouse/index.md               |   2
-rw-r--r--  doc/development/database/database_dictionary.md            |   2
-rw-r--r--  doc/development/database/foreign_keys.md                   |   2
-rw-r--r--  doc/development/database/query_performance.md              |   2

6 files changed, 160 insertions, 66 deletions
diff --git a/doc/development/database/adding_database_indexes.md b/doc/development/database/adding_database_indexes.md
index 7b29b1b14de..23a12413975 100644
--- a/doc/development/database/adding_database_indexes.md
+++ b/doc/development/database/adding_database_indexes.md
@@ -429,7 +429,7 @@ Use the asynchronous index helpers on your local environment to test changes for
For very large tables, index destruction can be a challenge to manage.
While `remove_concurrent_index` removes indexes in a way that does not block
ordinary traffic, it can still be problematic if index destruction runs for
-during `autovacuum`. Necessary database operations like `autovacuum` cannot run, and
+many hours. Necessary database operations like `autovacuum` cannot run, and
the deployment process on GitLab.com is blocked while waiting for index
destruction to finish.
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index 6a6b43e52a0..3e54a78757a 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -42,7 +42,159 @@ Background migrations can help when:
- You should use the [generator](#generator) to create batched background migrations,
so that required files are created by default.
-## Isolation
+## How it works
+
+Batched background migrations (BBM) are subclasses of
+`Gitlab::BackgroundMigration::BatchedMigrationJob` that define a `perform` method.
+As the first step, a regular migration creates a `batched_background_migrations`
+record with the BBM class and the required arguments. By default,
+the `batched_background_migrations` record is created in an active state, and active
+migrations are picked up by a Sidekiq worker to execute the actual batched migration.
+
+All migration classes must be defined in the namespace `Gitlab::BackgroundMigration`. Place the files
+in the directory `lib/gitlab/background_migration/`.
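+
+A minimal sketch of such a class (the migration and column names are
+hypothetical; `operation_name` and `each_sub_batch` are helpers provided by
+`BatchedMigrationJob`):
+
+```ruby
+# lib/gitlab/background_migration/backfill_example_column.rb
+module Gitlab
+  module BackgroundMigration
+    class BackfillExampleColumn < BatchedMigrationJob
+      operation_name :backfill_example_column
+      feature_category :database
+
+      def perform
+        # Process the current batch in smaller sub-batches.
+        each_sub_batch do |sub_batch|
+          sub_batch.update_all('example_column = 0')
+        end
+      end
+    end
+  end
+end
+```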
+
+### Execution mechanism
+
+Batched background migrations are picked from the queue in the order they are enqueued. Multiple migrations are fetched
+and executed in parallel, as long as they are in an active state and do not target the same database table.
+The default number of migrations processed in parallel is 2; for GitLab.com, this limit is configured to 4.
+Once a migration is picked for execution, a job is created for the specific batch. After each job execution, the migration's
+batch size may be increased or decreased, based on the performance of the last 20 jobs.
+
+```plantuml
+@startuml
+hide empty description
+skinparam ConditionEndStyle hline
+left to right direction
+rectangle "Batched Background Migration Queue" as migrations {
+ rectangle "Migration N (active)" as migrationn
+ rectangle "Migration 1 (completed)" as migration1
+ rectangle "Migration 2 (active)" as migration2
+ rectangle "Migration 3 (on hold)" as migration3
+ rectangle "Migration 4 (active)" as migration4
+ migration1 -[hidden]> migration2
+ migration2 -[hidden]> migration3
+ migration3 -[hidden]> migration4
+ migration4 -[hidden]> migrationn
+}
+rectangle "Execution Workers" as workers {
+ rectangle "Execution Worker 1 (busy)" as worker1
+ rectangle "Execution Worker 2 (available)" as worker2
+ worker1 -[hidden]> worker2
+}
+migration2 --> [Scheduling Worker]
+migration4 --> [Scheduling Worker]
+[Scheduling Worker] --> worker2
+@enduml
+```
+
+As soon as a worker is available, the BBM is processed by the runner.
+
+```plantuml
+@startuml
+hide empty description
+start
+rectangle Runner {
+ :Migration;
+ if (Have reached batching bounds?) then (Yes)
+ if (Have jobs to retry?) then (Yes)
+ :Fetch the batched job;
+ else (No)
+ :Finish active migration;
+ stop
+ endif
+ else (No)
+ :Create a batched job;
+ endif
+ :Execute batched job;
+ :Evaluate DB health;
+ note right: Checks for table autovacuum, Patroni Apdex, Write-ahead logging
+ if (Evaluation signals to stop?) then (Yes)
+ :Put migration on hold;
+ else (No)
+ :Optimize migration;
+ endif
+}
+@enduml
+```
+
+### Idempotence
+
+Batched background migrations are executed in the context of a Sidekiq process.
+The usual Sidekiq rules apply, especially the rule that jobs should be small
+and idempotent. Make sure that if your migration job is retried, data
+integrity is guaranteed.
+
+See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
+for more details.
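+
+For example, a retry-safe `perform` can guard its update with a `WHERE` clause
+so that re-running the same batch is a no-op for rows that were already
+migrated (a sketch with hypothetical column names):
+
+```ruby
+def perform
+  each_sub_batch do |sub_batch|
+    # Only touch rows that still need backfilling, so a retried job
+    # does not modify rows the previous attempt already migrated.
+    sub_batch.where(new_column: nil).update_all('new_column = old_column')
+  end
+end
+```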
+
+### Migration optimization
+
+After each job execution, a verification takes place to check if the migration can be optimized.
+The underlying optimization mechanism is based on the concept of time efficiency. It calculates
+the exponential moving average of time efficiencies for the last N jobs and updates the batch
+size of the batched background migration to its optimal value.
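+
+For illustration only (the constant, method names, and scaling factors below
+are assumptions, not the actual optimizer), the idea looks roughly like this:
+
+```ruby
+# Sketch of batch-size optimization based on an exponential moving average.
+# "Efficiency" is assumed to be the ratio of target to actual job duration;
+# values below 1 mean jobs run slower than desired.
+SMOOTHING_FACTOR = 0.4
+
+def exponential_moving_average(efficiencies)
+  efficiencies.inject { |avg, eff| avg + SMOOTHING_FACTOR * (eff - avg) }
+end
+
+def next_batch_size(current_size, efficiencies)
+  ema = exponential_moving_average(efficiencies)
+  ema >= 1 ? (current_size * 1.1).to_i : (current_size * 0.9).to_i
+end
+
+next_batch_size(1000, [1.2, 1.1, 0.95]) # => 1100 (EMA of about 1.08, so grow)
+```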
+
+### Job retry mechanism
+
+The batched background migrations retry mechanism ensures that a job is executed again in case of failure.
+The following diagram shows the different stages of our retry mechanism:
+
+```plantuml
+@startuml
+hide empty description
+note as N1
+ can_split?:
+ the failure is due to a query timeout
+end note
+ [*] --> Running
+Running --> Failed
+note on link
+ if number of retries <= MAX_ATTEMPTS
+end note
+Running --> Succeeded
+Failed --> Running
+note on link
+ if number of retries > MAX_ATTEMPTS
+ and can_split? == true
+ then two jobs with smaller
+ batch size will be created
+end note
+Failed --> [*]
+Succeeded --> [*]
+@enduml
+```
+
+- `MAX_ATTEMPTS` is defined in the [`Gitlab::Database::BackgroundMigration`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/background_migration/batched_job.rb)
+ class.
+- `can_split?` is defined in the [`Gitlab::Database::BatchedJob`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/background_migration/batched_job.rb) class.
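+
+The retry and split behavior can be sketched like this (illustrative method
+names, not the real `BatchedJob` API):
+
+```ruby
+# Sketch of the retry/split decision for a failed batched job.
+def handle_failure(job)
+  if job.attempts <= MAX_ATTEMPTS
+    job.retry!                 # run the same batch again
+  elsif job.can_split?         # the failure was a query timeout
+    middle = (job.min_value + job.max_value) / 2
+    # Replace the failed job with two jobs, each covering half the range.
+    create_job(job.min_value, middle)
+    create_job(middle + 1, job.max_value)
+  else
+    job.fail!                  # give up on this batch
+  end
+end
+```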
+
+### Failed batched background migrations
+
+The whole batched background migration is marked as `failed`
+(`/chatops run batched_background_migrations status MIGRATION_ID` shows
+the migration as `failed`) if any of the following is true:
+
+- There are no more jobs to consume, and there are failed jobs.
+- More than [half of the jobs failed since the background migration was started](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/background_migration/batched_migration.rb#L160).
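+
+Expressed as a simplified predicate (the names here are illustrative; the real
+check lives in `batched_migration.rb`, linked above):
+
+```ruby
+# Sketch of the failure condition for a whole batched background migration.
+def migration_failed?(migration)
+  out_of_work = migration.pending_jobs.none?
+
+  (out_of_work && migration.failed_jobs.any?) ||
+    migration.failed_jobs.count > migration.total_jobs_count / 2
+end
+```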
+
+### Throttling batched migrations
+
+Batched migrations are update-heavy, and in the past the load they generate caused incidents while the database was underperforming. A throttling mechanism exists to mitigate such incidents.
+
+These database indicators are checked to throttle a migration. On getting a
+stop signal, the migration is paused for a set time (10 minutes):
+
+- WAL queue pending archival crossing a threshold.
+- Active autovacuum on the tables the migration works on.
+- Patroni apdex SLI dropping below the SLO.
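+
+Conceptually, the throttling decision looks like this (a sketch; the indicator
+objects and `hold_until` attribute are simplified assumptions, not the real
+framework):
+
+```ruby
+# Sketch of how a stop signal pauses a migration for 10 minutes.
+HOLD_DURATION_SECONDS = 10 * 60
+
+def put_on_hold_if_unhealthy(migration, indicators)
+  # Each indicator (WAL backlog, autovacuum, Patroni apdex) is assumed
+  # to report whether it currently signals "stop".
+  return unless indicators.any? { |indicator| indicator.stop_signal? }
+
+  # The runner re-evaluates the migration after the hold expires.
+  migration.hold_until = Time.now + HOLD_DURATION_SECONDS
+end
+```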
+
+It's an ongoing effort to add more indicators to further enhance the
+database health check framework. For more details, see
+[epic 7594](https://gitlab.com/groups/gitlab-org/-/epics/7594).
+
+### Isolation
Batched background migrations must be isolated and cannot use application code (for example,
models defined in `app/models` except the `ApplicationRecord` classes).
@@ -96,16 +248,6 @@ ApplicationRecord.connection.execute("SELECT * FROM projects")
ActiveRecord::Base.connection.execute("SELECT * FROM projects")
```
-## Idempotence
-
-Batched background migrations are executed in a context of a Sidekiq process.
-The usual Sidekiq rules apply, especially the rule that jobs should be small
-and idempotent. Make sure that in case that your migration job is retried, data
-integrity is guaranteed.
-
-See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
-for more details.
-
## Batched background migrations for EE-only features
All the background migration classes for EE-only features should be present in GitLab FOSS.
@@ -118,12 +260,6 @@ Background migration classes for EE-only features that use job arguments should
in the GitLab FOSS class. This is required to prevent job arguments validation from failing when
migration is scheduled in GitLab FOSS context.
-Batched Background migrations are simple classes that define a `perform` method. A
-Sidekiq worker then executes such a class, passing any arguments to it. All
-migration classes must be defined in the namespace
-`Gitlab::BackgroundMigration`. Place the files in the directory
-`lib/gitlab/background_migration/`.
-
## Queueing
Queueing a batched background migration should be done in a post-deployment
@@ -148,49 +284,6 @@ Make sure the newly-created data is either migrated, or
saved in both the old and new version upon creation. Removals in
turn can be handled by defining foreign keys with cascading deletes.
-### Job retry mechanism
-
-The batched background migrations retry mechanism ensures that a job is executed again in case of failure.
-The following diagram shows the different stages of our retry mechanism:
-
-```plantuml
-@startuml
-hide empty description
-note as N1
- can_split?:
- the failure is due to a query timeout
-end note
-[*] --> Running
-Running --> Failed
-note on link
- if number of retries <= MAX_ATTEMPTS
-end note
-Running --> Succeeded
-Failed --> Running
-note on link
- if number of retries > MAX_ATTEMPTS
- and can_split? == true
- then two jobs with smaller
- batch size will be created
-end note
-Failed --> [*]
-Succeeded --> [*]
-@enduml
-```
-
-- `MAX_ATTEMPTS` is defined in the [`Gitlab::Database::BackgroundMigration`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/background_migration/batched_job.rb)
-class.
-- `can_split?` is defined in the [`Gitlab::Database::BatchedJob`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/background_migration/batched_job.rb) class.
-
-### Failed batched background migrations
-
-The whole batched background migration is marked as `failed`
-(`/chatops run batched_background_migrations status MIGRATION_ID` will show
-the migration as `failed`) if any of the following are true:
-
-- There are no more jobs to consume, and there are failed jobs.
-- More than [half of the jobs failed since the background migration was started](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/background_migration/batched_migration.rb).
-
### Requeuing batched background migrations
If one of the batched background migrations contains a bug that is fixed in a patch
@@ -831,6 +924,7 @@ Let's assume that a batched background migration failed on a particular batch on
Fortunately you can leverage our [database migration pipeline](database_migration_pipeline.md) to rerun a particular batch with additional logging and/or a fix to see if it solves the problem.
<!-- vale gitlab.Substitutions = NO -->
+
For an example see [Draft: Test PG::CardinalityViolation fix](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/110910) but make sure to read the entire section.
To do that, you need to:
@@ -872,7 +966,7 @@ end
#### 3. Apply a workaround for our migration helpers (optional)
-If your batched background migration touches tables from a schema other than the one you specified by using `restrict_gitlab_migration` helper (example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema) then the migration will fail. To prevent that from happening you'll have to monkey patch database helpers so they don't fail the testing pipeline job:
+If your batched background migration touches tables from a schema other than the one you specified by using the `restrict_gitlab_migration` helper (example: the scheduling migration has `restrict_gitlab_migration gitlab_schema: :gitlab_main` but the background job uses tables from the `:gitlab_ci` schema) then the migration will fail. To prevent that from happening, you must monkey patch database helpers so they don't fail the testing pipeline job:
1. Add the schema names to [`RestrictGitlabSchema`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb#L57)
diff --git a/doc/development/database/clickhouse/index.md b/doc/development/database/clickhouse/index.md
index 032e4f5f6ee..8ca6240e0f1 100644
--- a/doc/development/database/clickhouse/index.md
+++ b/doc/development/database/clickhouse/index.md
@@ -117,7 +117,7 @@ Files: `config.xml`
| Topic | Security Requirement | Reason |
| ----- | -------------------- | ------ |
| Permissions | ClickHouse runs by default with the `clickhouse` user. Running as `root` is never needed. Use the principle of least privileges for the folders: `/etc/clickhouse-server`, `/var/lib/clickhouse`, `/var/log/clickhouse-server`. These folders must belong to the `clickhouse` user and group, and no other system user must have access to them. | Default passwords, ports and rules are "open doors". ([Fail securely & use secure defaults](https://about.gitlab.com/handbook/security/architecture/#fail-securely--use-secure-defaults) principle) |
-| Encryption | Use an encrypted storage for logs and data if RED data is processed. On Kubernetes, the [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) used must be encrypted. | Encrypt data at rest. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth)) |
+| Encryption | Use an encrypted storage for logs and data if RED data is processed. On Kubernetes, the [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) used must be encrypted. [GKE](https://cloud.google.com/blog/products/containers-kubernetes/exploring-container-security-use-your-own-keys-to-protect-your-data-on-gke) and [EKS](https://aws.github.io/aws-eks-best-practices/security/docs/data/) encrypt all data at rest already. In this case, using your own key is best but not required. | Encrypt data at rest. ([Defense in depth](https://about.gitlab.com/handbook/security/architecture/#implement-defense-in-depth)) |
### Logging
diff --git a/doc/development/database/database_dictionary.md b/doc/development/database/database_dictionary.md
index 84b76ddc34c..70691d8746c 100644
--- a/doc/development/database/database_dictionary.md
+++ b/doc/development/database/database_dictionary.md
@@ -18,7 +18,7 @@ For the `geo` database, the dictionary files are stored under `ee/db/geo/docs/`.
## Example dictionary file
```yaml
-----
+---
table_name: terraform_states
classes:
- Terraform::State
diff --git a/doc/development/database/foreign_keys.md b/doc/development/database/foreign_keys.md
index 25b3d815d7a..5dda3dd55a3 100644
--- a/doc/development/database/foreign_keys.md
+++ b/doc/development/database/foreign_keys.md
@@ -195,5 +195,5 @@ end
```
Using a foreign key as primary key saves space but can make
-[batch counting](../service_ping/implement.md#batch-counters) in [Service Ping](../service_ping/index.md) less efficient.
+[batch counting](../internal_analytics/service_ping/implement.md#batch-counters) in [Service Ping](../service_ping/index.md) less efficient.
Consider using a regular `id` column if the table is relevant for Service Ping.
diff --git a/doc/development/database/query_performance.md b/doc/development/database/query_performance.md
index 10ab726940a..77067e2979d 100644
--- a/doc/development/database/query_performance.md
+++ b/doc/development/database/query_performance.md
@@ -22,7 +22,7 @@ When you are optimizing your SQL queries, there are two dimensions to pay attent
| Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. |
| Concurrent operations in a post migration | `20min` | Concurrent operations do not block the database, but they block the GitLab post update process. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. If index creation exceeds 20 minutes, consider [async index creation](adding_database_indexes.md#create-indexes-asynchronously). |
| Background migrations | `1s` | |
-| Service Ping | `1s` | See the [Service Ping docs](../service_ping/implement.md) for more details. |
+| Service Ping | `1s` | See the [Service Ping docs](../internal_analytics/service_ping/implement.md) for more details. |
- When analyzing your query's performance, pay attention to whether the time you are seeing is on a [cold or warm cache](#cold-and-warm-cache). These guidelines apply for both cache types.
- When working with batched queries, change the range and batch size to see how it affects the query timing and caching.