gitlab.com/gitlab-org/gitlab-foss.git
author    GitLab Bot <gitlab-bot@gitlab.com>  2022-06-20 14:10:13 +0300
committer GitLab Bot <gitlab-bot@gitlab.com>  2022-06-20 14:10:13 +0300
commit    0ea3fcec397b69815975647f5e2aa5fe944a8486 (patch)
tree      7979381b89d26011bcf9bdc989a40fcc2f1ed4ff /doc/development/database
parent    72123183a20411a36d607d70b12d57c484394c8e (diff)

Add latest changes from gitlab-org/gitlab@15-1-stable-ee (v15.1.0-rc42)
Diffstat (limited to 'doc/development/database')
-rw-r--r--  doc/development/database/add_foreign_key_to_existing_column.md | 2
-rw-r--r--  doc/development/database/avoiding_downtime_in_migrations.md | 21
-rw-r--r--  doc/development/database/background_migrations.md | 32
-rw-r--r--  doc/development/database/batched_background_migrations.md | 39
-rw-r--r--  doc/development/database/client_side_connection_pool.md | 2
-rw-r--r--  doc/development/database/constraint_naming_convention.md | 2
-rw-r--r--  doc/development/database/database_lab.md | 2
-rw-r--r--  doc/development/database/database_migration_pipeline.md | 2
-rw-r--r--  doc/development/database/database_reviewer_guidelines.md | 4
-rw-r--r--  doc/development/database/dbcheck-migrations-job.md | 2
-rw-r--r--  doc/development/database/deleting_migrations.md | 2
-rw-r--r--  doc/development/database/efficient_in_operator_queries.md | 46
-rw-r--r--  doc/development/database/index.md | 2
-rw-r--r--  doc/development/database/keyset_pagination.md | 4
-rw-r--r--  doc/development/database/layout_and_access_patterns.md | 2
-rw-r--r--  doc/development/database/loose_foreign_keys.md | 58
-rw-r--r--  doc/development/database/maintenance_operations.md | 2
-rw-r--r--  doc/development/database/migrations_for_multiple_databases.md | 35
-rw-r--r--  doc/development/database/multiple_databases.md | 34
-rw-r--r--  doc/development/database/not_null_constraints.md | 18
-rw-r--r--  doc/development/database/pagination_guidelines.md | 36
-rw-r--r--  doc/development/database/pagination_performance_guidelines.md | 30
-rw-r--r--  doc/development/database/post_deployment_migrations.md | 2
-rw-r--r--  doc/development/database/rename_database_tables.md | 4
-rw-r--r--  doc/development/database/setting_multiple_values.md | 2
-rw-r--r--  doc/development/database/strings_and_the_text_data_type.md | 18
-rw-r--r--  doc/development/database/table_partitioning.md | 51
-rw-r--r--  doc/development/database/transaction_guidelines.md | 10
28 files changed, 260 insertions, 204 deletions
diff --git a/doc/development/database/add_foreign_key_to_existing_column.md b/doc/development/database/add_foreign_key_to_existing_column.md
index bfd455ef9da..9842814816f 100644
--- a/doc/development/database/add_foreign_key_to_existing_column.md
+++ b/doc/development/database/add_foreign_key_to_existing_column.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md
index 3cf9ab1ab5c..2d079656e23 100644
--- a/doc/development/database/avoiding_downtime_in_migrations.md
+++ b/doc/development/database/avoiding_downtime_in_migrations.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -15,7 +15,7 @@ requiring downtime.
## Dropping Columns
Removing columns is tricky because running GitLab processes may still be using
-the columns. To work around this safely, you will need three steps in three releases:
+the columns. To work around this safely, you need three steps in three releases:
1. Ignoring the column (release M)
1. Dropping the column (release M+1)
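The "ignore" step in release M is typically done with the `IgnorableColumns` concern rather than a migration. A minimal, hedged sketch, reusing the `users.updated_at` example from the rest of this section (the milestone and date below are illustrative):

```ruby
# Release M: stop the application from reading or writing the column before it
# is dropped in release M+1. The remove_with/remove_after values are placeholders.
class User < ApplicationRecord
  include IgnorableColumns

  ignore_column :updated_at, remove_with: '15.2', remove_after: '2022-07-22'
end
```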
@@ -77,7 +77,7 @@ bundle exec rails g post_deployment_migration remove_users_updated_at_column
There are two scenarios that you need to consider
to write a migration that removes a column:
-#### A. The removed column has no indexes or constraints that belong to it
+#### A. The removed column has no indexes or constraints that belong to it
In this case, a **transactional migration** can be used. Something as simple as:
@@ -170,12 +170,12 @@ class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0
end
```
-This will take care of renaming the column, ensuring data stays in sync, and
+This takes care of renaming the column, ensuring data stays in sync, and
copying over indexes and foreign keys.
If a column contains one or more indexes that don't contain the name of the
-original column, the previously described procedure will fail. In that case,
-you'll first need to rename these indexes.
+original column, the previously described procedure fails. In that case,
+you need to rename these indexes.
### Step 2: Add A Post-Deployment Migration
@@ -270,7 +270,7 @@ And that's it, we're done!
Some type changes require casting data to a new type. For example when changing from `text` to `jsonb`.
In this case, use the `type_cast_function` option.
-Make sure there is no bad data and the cast will always succeed. You can also provide a custom function that handles
+Make sure there is no bad data and the cast always succeeds. You can also provide a custom function that handles
casting errors.
Example migration:
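A hedged sketch of such a migration; the table and column follow the examples used elsewhere on this page, and `my_cast_to_jsonb` stands in for whatever built-in or custom cast function you use (check the helper names against the current migration helpers):

```ruby
class MigrateIntegrationsPropertiesToJsonb < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # `my_cast_to_jsonb` is a placeholder for a SQL function that converts the
    # old text values and handles casting errors.
    change_column_type_concurrently :integrations, :properties, :jsonb,
      type_cast_function: 'my_cast_to_jsonb'
  end

  def down
    undo_change_column_type_concurrently :integrations, :properties
  end
end
```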
@@ -291,8 +291,9 @@ They can also produce a lot of pressure on the database due to it rapidly
updating many rows in sequence.
To reduce database pressure you should instead use a background migration
-when migrating a column in a large table (for example, `issues`). This will
-spread the work / load over a longer time period, without slowing down deployments.
+when migrating a column in a large table (for example, `issues`). Background
+migrations spread the work / load over a longer time period, without slowing
+down deployments.
For more information, see [the documentation on cleaning up background
migrations](background_migrations.md#cleaning-up).
@@ -533,7 +534,7 @@ step approach:
Usually this works, but not always. For example, if a field's format is to be
changed from JSON to something else we have a bit of a problem. If we were to
-change existing data before deploying application code we'll most likely run
+change existing data before deploying application code we would most likely run
into errors. On the other hand, if we were to migrate after deploying the
application code we could run into the same problems.
diff --git a/doc/development/database/background_migrations.md b/doc/development/database/background_migrations.md
index 80ba0336bda..0124dbae51f 100644
--- a/doc/development/database/background_migrations.md
+++ b/doc/development/database/background_migrations.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -65,7 +65,7 @@ and idempotent.
See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
for more details.
-Make sure that in case that your migration job is going to be retried data
+Make sure that if your migration job is retried, data
integrity is guaranteed.
## Background migrations for EE-only features
@@ -77,7 +77,7 @@ as explained in the [guidelines for implementing Enterprise Edition features](..
## How It Works
Background migrations are simple classes that define a `perform` method. A
-Sidekiq worker will then execute such a class, passing any arguments to it. All
+Sidekiq worker then executes such a class, passing any arguments to it. All
migration classes must be defined in the namespace
`Gitlab::BackgroundMigration`, the files should be placed in the directory
`lib/gitlab/background_migration/`.
@@ -100,13 +100,13 @@ to automatically split the job into batches:
```ruby
queue_background_migration_jobs_by_range_at_intervals(
ClassName,
- BackgroundMigrationClassName,
+ 'BackgroundMigrationClassName',
2.minutes,
batch_size: 10_000
)
```
-You'll also need to make sure that newly created data is either migrated, or
+You also need to make sure that newly created data is either migrated, or
saved in both the old and new version upon creation. For complex and time
consuming migrations it's best to schedule a background job using an
`after_create` hook so this doesn't affect response timings. The same applies to
@@ -142,7 +142,7 @@ or minor release, you _must not_ do this in a patch release.
Because background migrations can take a long time you can't immediately clean
things up after scheduling them. For example, you can't drop a column that's
used in the migration process as this would cause jobs to fail. This means that
-you'll need to add a separate _post deployment_ migration in a future release
+you need to add a separate _post deployment_ migration in a future release
that finishes any remaining jobs before cleaning things up (for example, removing a
column).
@@ -189,7 +189,7 @@ extract the `url` key from this JSON object and store it in the `integrations.ur
column. There are millions of integrations and parsing JSON is slow, thus you can't
do this in a regular migration.
-To do this using a background migration we'll start with defining our migration
+To do this using a background migration we start with defining our migration
class:
```ruby
@@ -213,7 +213,7 @@ class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
end
```
-Next we'll need to adjust our code so we schedule the above migration for newly
+Next we need to adjust our code so we schedule the above migration for newly
created and updated integrations. We can do this using something along the lines of
the following:
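A hedged sketch of what that hook could look like; the scheduling call mirrors the pattern used on this page, and the arguments array must match whatever the migration's `perform` method expects:

```ruby
class Integration < ApplicationRecord
  # Schedule the migration only after the transaction commits, so the worker
  # can see the new or updated row.
  after_commit :schedule_extract_integrations_url, on: [:create, :update]

  private

  def schedule_extract_integrations_url
    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id])
  end
end
```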
@@ -232,7 +232,7 @@ We're using `after_commit` here to ensure the Sidekiq job is not scheduled
before the transaction completes as doing so can lead to race conditions where
the changes are not yet visible to the worker.
-Next we'll need a post-deployment migration that schedules the migration for
+Next we need a post-deployment migration that schedules the migration for
existing data.
```ruby
@@ -254,11 +254,11 @@ class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
end
```
-Once deployed our application will continue using the data as before but at the
-same time will ensure that both existing and new data is migrated.
+Once deployed, our application continues using the data as before, but at the
+same time ensures that both existing and new data is migrated.
In the next release we can remove the `after_commit` hooks and related code. We
-will also need to add a post-deployment migration that consumes any remaining
+also need to add a post-deployment migration that consumes any remaining
jobs and manually run on any un-migrated rows. Such a migration would look like
this:
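A hedged sketch of that clean-up migration, using the `finalize_background_migration` helper (verify the helper name and behavior against the GitLab version you are working with):

```ruby
class FinalizeExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # Steals any still-queued jobs and processes rows the background migration
    # has not handled yet.
    finalize_background_migration('ExtractIntegrationsUrl')
  end

  def down
    # No-op: the data migration is not reversible.
  end
end
```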
@@ -292,7 +292,7 @@ If the application does not depend on the data being 100% migrated (for
instance, the data is advisory, and not mission-critical), then this final step
can be skipped.
-This migration will then process any jobs for the ExtractIntegrationsUrl migration
+This migration then processes any jobs for the `ExtractIntegrationsUrl` migration
and continues once all jobs have been processed. Once done you can safely remove
the `integrations.properties` column.
@@ -325,13 +325,13 @@ for more details.
1. Make sure that tests you write are not false positives.
1. Make sure that if the data being migrated is critical and cannot be lost, the
clean-up migration also checks the final state of the data before completing.
-1. When migrating many columns, make sure it won't generate too many
+1. When migrating many columns, make sure it does not generate too many
dead tuples in the process (you may need to directly query the number of dead tuples
and adjust the scheduling according to this piece of data).
1. Make sure to discuss the numbers with a database specialist, the migration may add
more pressure on DB than you expect (measure on staging,
or ask someone to measure on production).
-1. Make sure to know how much time it'll take to run all scheduled migrations.
+1. Make sure to know how much time it takes to run all scheduled migrations.
1. Provide an estimation section in the description, estimating both the total migration
run time and the query times for each background migration job. Explain plans for each query
should also be provided.
@@ -503,6 +503,6 @@ View the production Sidekiq log and filter for:
- `json.meta.caller_id: <MyBackgroundMigrationSchedulingMigrationClassName>`
- `json.args: <MyBackgroundMigrationClassName>`
-Looking at the `json.error_class`, `json.error_message` and `json.error_backtrace` values may be helpful in understanding why the jobs failed.
+Looking at the `json.exception.class`, `json.exception.message`, `json.exception.backtrace`, and `json.exception.sql` values may be helpful in understanding why the jobs failed.
Depending on when and how the failure occurred, you may find other helpful information by filtering with `json.class: <MyBackgroundMigrationClassName>`.
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index 3a0fa77eff9..6d3d5fa7f92 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -1,6 +1,6 @@
---
type: reference, dev
-stage: Enablement
+stage: Data Stores
group: Database
info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines"
---
@@ -152,9 +152,7 @@ When you start the second post-deployment migration, delete the
previously batched migration with the provided code:
```ruby
-Gitlab::Database::BackgroundMigration::BatchedMigration
- .for_configuration(MIGRATION_NAME, TABLE_NAME, COLUMN, JOB_ARGUMENTS)
- .delete_all
+delete_batched_background_migration(MIGRATION_NAME, TABLE_NAME, COLUMN, JOB_ARGUMENTS)
```
## Cleaning up
@@ -192,7 +190,7 @@ data to be in the new format.
The `routes` table has a `source_type` field that's used for a polymorphic relationship.
As part of a database redesign, we're removing the polymorphic relationship. One step of
-the work will be migrating data from the `source_id` column into a new singular foreign key.
+the work is migrating data from the `source_id` column into a new singular foreign key.
Because we intend to delete old rows later, there's no need to update them as part of the
background migration.
@@ -221,9 +219,9 @@ background migration.
NOTE:
Job classes must be subclasses of `BatchedMigrationJob` to be
correctly handled by the batched migration framework. Any subclass of
- `BatchedMigrationJob` will be initialized with necessary arguments to
+ `BatchedMigrationJob` is initialized with necessary arguments to
execute the batch, as well as a connection to the tracking database.
- Additional `job_arguments` set on the migration will be passed to the
+ Additional `job_arguments` set on the migration are passed to the
job's `perform` method.
1. Add a new trigger to the database to update newly created and updated routes,
@@ -245,12 +243,14 @@ background migration.
1. Create a post-deployment migration that queues the migration for existing data:
```ruby
- class QueueBackfillRoutesNamespaceId < Gitlab::Database::Migration[1.0]
+ class QueueBackfillRoutesNamespaceId < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
MIGRATION = 'BackfillRouteNamespaceId'
DELAY_INTERVAL = 2.minutes
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
def up
queue_batched_background_migration(
MIGRATION,
@@ -261,12 +261,19 @@ background migration.
end
def down
- Gitlab::Database::BackgroundMigration::BatchedMigration
- .for_configuration(MIGRATION, :routes, :id, []).delete_all
+ delete_batched_background_migration(MIGRATION, :routes, :id, [])
end
end
```
+ NOTE:
+ When queuing a batched background migration, you need to restrict
+ the schema to the database where you make the actual changes.
+ In this case, we are updating `routes` records, so we set
+ `restrict_gitlab_migration gitlab_schema: :gitlab_main`. If, however,
+ you need to perform a CI data migration, you would set
+ `restrict_gitlab_migration gitlab_schema: :gitlab_ci`.
+
After deployment, our application:
- Continues using the data as before.
- Ensures that both existing and new data are migrated.
@@ -275,16 +282,19 @@ background migration.
that checks that the batched background migration is completed. For example:
```ruby
- class FinalizeBackfillRouteNamespaceId < Gitlab::Database::Migration[1.0]
+ class FinalizeBackfillRouteNamespaceId < Gitlab::Database::Migration[2.0]
MIGRATION = 'BackfillRouteNamespaceId'
disable_ddl_transaction!
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
def up
ensure_batched_background_migration_is_finished(
job_class_name: MIGRATION,
table_name: :routes,
column_name: :id,
- job_arguments: []
+ job_arguments: [],
+ finalize: true
)
end
@@ -294,6 +304,11 @@ background migration.
end
```
+ NOTE:
+ If the batched background migration is not finished, the system will
+ execute the batched background migration inline. If you don't want
+ to see this behavior, you need to pass `finalize: false`.
+
If the application does not depend on the data being 100% migrated (for
instance, the data is advisory, and not mission-critical), then you can skip this
final step. This step confirms that the migration is completed, and all of the rows were migrated.
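For reference, a hedged sketch of the job class queued in the steps above (`BackfillRouteNamespaceId`). The batching helper follows the `BatchedMigrationJob` API described on this page; the real implementation may differ:

```ruby
module Gitlab
  module BackgroundMigration
    class BackfillRouteNamespaceId < BatchedMigrationJob
      def perform
        # Process the assigned batch in smaller sub-batches to keep each
        # UPDATE short and low impact.
        each_sub_batch(operation_name: :update_all) do |sub_batch|
          sub_batch.update_all('namespace_id = source_id')
        end
      end
    end
  end
end
```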
diff --git a/doc/development/database/client_side_connection_pool.md b/doc/development/database/client_side_connection_pool.md
index 60c8665df87..dc52a551407 100644
--- a/doc/development/database/client_side_connection_pool.md
+++ b/doc/development/database/client_side_connection_pool.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/constraint_naming_convention.md b/doc/development/database/constraint_naming_convention.md
index a22ddc1551c..72f16c20559 100644
--- a/doc/development/database/constraint_naming_convention.md
+++ b/doc/development/database/constraint_naming_convention.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/database_lab.md b/doc/development/database/database_lab.md
index 1c8694b113d..5346df2690d 100644
--- a/doc/development/database/database_lab.md
+++ b/doc/development/database/database_lab.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/database_migration_pipeline.md b/doc/development/database/database_migration_pipeline.md
index ce7e1801abc..496bd09bf1d 100644
--- a/doc/development/database/database_migration_pipeline.md
+++ b/doc/development/database/database_migration_pipeline.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index ca9ca36b156..b6bbfe690c1 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -47,7 +47,7 @@ As a database reviewer, join the internal `#database` Slack channel and ask ques
database related issues with other database reviewers and maintainers.
There is also an optional database office hours call held bi-weekly, alternating between
-European/US and APAC friendly hours. You can join the office hours call and bring topics
+European/US and Asia-Pacific (APAC) friendly hours. You can join the office hours call and bring topics
that require a more in-depth discussion between the database reviewers and maintainers:
- [Database Office Hours Agenda](https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit).
diff --git a/doc/development/database/dbcheck-migrations-job.md b/doc/development/database/dbcheck-migrations-job.md
index af72e28a875..49f8b183272 100644
--- a/doc/development/database/dbcheck-migrations-job.md
+++ b/doc/development/database/dbcheck-migrations-job.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/deleting_migrations.md b/doc/development/database/deleting_migrations.md
index be9009f365d..8354cb62d0c 100644
--- a/doc/development/database/deleting_migrations.md
+++ b/doc/development/database/deleting_migrations.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
index 2503be826ea..a2481577e8c 100644
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -26,7 +26,7 @@ Pagination may be used to fetch subsequent records.
Example tasks requiring querying nested domain objects from the group level:
- Show first 20 issues by creation date or due date from the group `gitlab-org`.
-- Show first 20 merge_requests by merged at date from the group `gitlab-com`.
+- Show first 20 merge requests by merged at date from the group `gitlab-com`.
Unfortunately, ordered group-level queries typically perform badly
as their executions require heavy I/O, memory, and computations.
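For example, the first task above might be written as the following straightforward query (a hedged sketch; identifiers are illustrative). This is exactly the shape of group-level query that tends to perform badly:

```ruby
group = Group.find_by_full_path('gitlab-org')

# ORDER BY + LIMIT over a potentially huge `project_id IN (...)` filter.
Issue
  .where(project_id: group.all_projects.select(:id))
  .order(:created_at, :id)
  .limit(20)
```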
@@ -163,7 +163,7 @@ The technique can only optimize `IN` queries that satisfy the following requirem
(the combination of the columns uniquely identifies one particular column in the table).
WARNING:
-This technique will not improve the performance of the `COUNT(*)` queries.
+This technique does not improve the performance of the `COUNT(*)` queries.
## The `InOperatorOptimization` module
@@ -183,7 +183,7 @@ in `Gitlab::Pagination::Keyset::InOperatorOptimization`.
### Basic usage of `QueryBuilder`
-To illustrate a basic usage, we will build a query that
+To illustrate a basic usage, we build a query that
fetches 20 issues with the oldest `created_at` from the group `gitlab-org`.
The following ActiveRecord query would produce a query similar to
@@ -226,10 +226,10 @@ Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
the order by column expressions is available for locating the record. In this example, the
yielded values are `created_at` and `id` SQL expressions. Finding a record is very fast via the
primary key, so we don't use the `created_at` value. Providing the `finder_query` lambda is optional.
- If it's not given, the IN operator optimization will only make the ORDER BY columns available to
+ If it's not given, the `IN` operator optimization only makes the `ORDER BY` columns available to
the end-user and not the full database row.
- If it's not given, the IN operator optimization will only make the ORDER BY columns available to
+ If it's not given, the `IN` operator optimization only makes the `ORDER BY` columns available to
the end-user and not the full database row.
The following database index on the `issues` table must be present
@@ -416,7 +416,7 @@ scope = Issue
.limit(20)
```
-To construct the array scope, we'll need to take the Cartesian product of the `project_id IN` and
+To construct the array scope, we need to take the Cartesian product of the `project_id IN` and
the `issue_type IN` queries. `issue_type` is an ActiveRecord enum, so we need to
construct the following table:
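Conceptually, the array scope is the Cartesian product of the two `IN` lists. A hedged Ruby illustration of the idea (the documentation builds this in SQL, and the project IDs below are made up):

```ruby
project_ids = [1, 2, 3]                      # from the `project_id IN` list
issue_type_values = Issue.issue_types.values # enum values for `issue_type IN`

# Every (project_id, issue_type) pair becomes one row of the array scope.
array_scope_rows = project_ids.product(issue_type_values)
# => [[1, 0], [1, 1], ..., [3, 2]]
```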
@@ -589,7 +589,7 @@ LIMIT 20
NOTE:
To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
-#### Using calculated ORDER BY expression
+#### Using calculated `ORDER BY` expression
The following example orders epic records by the duration between the creation time and closed
time. It is calculated with the following formula:
@@ -766,7 +766,7 @@ using the generalized `IN` optimization technique.
### Array CTE
-As the first step, we use a common table expression (CTE) for collecting the `projects.id` values.
+As the first step, we use a Common Table Expression (CTE) for collecting the `projects.id` values.
This is done by wrapping the incoming `array_scope` ActiveRecord relation parameter with a CTE.
```sql
@@ -792,7 +792,7 @@ This query produces the following result set with only one column (`projects.id`
### Array mapping
For each project (that is, each record storing a project ID in `array_cte`),
-we will fetch the cursor value identifying the first issue respecting the `ORDER BY` clause.
+we fetch the cursor value identifying the first issue respecting the `ORDER BY` clause.
As an example, let's pick the first record `ID=9` from `array_cte`.
The following query should fetch the cursor value `(created_at, id)` identifying
@@ -805,7 +805,7 @@ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
LIMIT 1;
```
-We will use `LATERAL JOIN` to loop over the records in the `array_cte` and find the
+We use `LATERAL JOIN` to loop over the records in the `array_cte` and find the
cursor value for each project. The query would be built using the `array_mapping_scope` lambda
function.
@@ -854,11 +854,11 @@ The table shows the cursor values (`created_at, id`) of the first record for eac
respecting the `ORDER BY` clause.
At this point, we have the initial data. To start collecting the actual records from the database,
-we'll use a recursive CTE query where each recursion locates one row until
+we use a recursive CTE query where each recursion locates one row until
the `LIMIT` is reached or no more data can be found.
-Here's an outline of the steps we will take in the recursive CTE query
-(expressing the steps in SQL is non-trivial but will be explained next):
+Here's an outline of the steps we take in the recursive CTE query
+(expressing the steps in SQL is non-trivial but is explained next):
1. Sort the initial resultset according to the `ORDER BY` clause.
1. Pick the top cursor to fetch the record, this is our first record. In the example,
@@ -877,7 +877,7 @@ this cursor would be (`2020-01-05`, `3`) for `project_id=9`.
### Initializing the recursive CTE query
-For the initial recursive query, we'll need to produce exactly one row, we call this the
+For the initial recursive query, we need to produce exactly one row; we call this the
initializer query (`initializer_query`).
Use `ARRAY_AGG` function to compact the initial result set into a single row
@@ -994,7 +994,7 @@ After this, the recursion starts again by finding the next lowest cursor value.
### Finalizing the query
-For producing the final `issues` rows, we're going to wrap the query with another `SELECT` statement:
+For producing the final `issues` rows, we wrap the query with another `SELECT` statement:
```sql
SELECT "issues".*
@@ -1031,17 +1031,17 @@ Optimized `IN` query:
| issue lookup query | 519 | 20 | 10 000 |
The group and project queries are not using sorting, the necessary columns are read from database
-indexes. These values are accessed frequently so it's very likely that most of the data will be
+indexes. These values are accessed frequently so it's very likely that most of the data is
in the PostgreSQL's buffer cache.
-The optimized `IN` query will read maximum 519 entries (cursor values) from the index:
+The optimized `IN` query reads a maximum of 519 entries (cursor values) from the index:
- 500 index-only scans for populating the arrays for each project. The cursor values of the first
-record will be here.
+record are here.
- Maximum 19 additional index-only scans for the consecutive records.
-The optimized `IN` query will sort the array (cursor values per project array) 20 times, which
-means we'll sort 20 x 500 rows. However, this might be a less memory-intensive task than
+The optimized `IN` query sorts the array (cursor values per project array) 20 times, which
+means we sort 20 x 500 rows. However, this might be a less memory-intensive task than
sorting 10 000 rows at once.
Performance comparison for the `gitlab-org` group:
@@ -1053,5 +1053,5 @@ Performance comparison for the `gitlab-org` group:
NOTE:
Before taking measurements, the group lookup query was executed separately in order to make
-the group data available in the buffer cache. Since it's a frequently called query, it's going to
-hit many shared buffers during the query execution in the production environment.
+the group data available in the buffer cache. Since it's a frequently called query, it
+hits many shared buffers during the query execution in the production environment.
diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index 0363d13ed4c..b427f54ff3c 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/keyset_pagination.md b/doc/development/database/keyset_pagination.md
index 88928feb927..4aec64b8cce 100644
--- a/doc/development/database/keyset_pagination.md
+++ b/doc/development/database/keyset_pagination.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -178,7 +178,7 @@ To make keyset pagination work, we must configure custom order objects, to do so
collect information about the order columns:
- `relative_position` can have duplicated values because no unique index is present.
-- `relative_position` can have null values because we don't have a not null constraint on the column. For this, we must determine where we see NULL values, at the beginning of the result set, or the end (`NULLS LAST`).
+- `relative_position` can have null values because we don't have a not null constraint on the column. For this, we must determine where we see `NULL` values, at the beginning of the result set, or the end (`NULLS LAST`).
- Keyset pagination requires distinct order columns, so we must add the primary key (`id`) to make the order distinct.
- Jumping to the last page and paginating backwards actually reverses the `ORDER BY` clause. For this, we must provide the reversed `ORDER BY` clause.
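A hedged sketch of such a custom order object, based on the keyset pagination classes referenced on this page; the exact keyword arguments may differ between GitLab versions:

```ruby
order = Gitlab::Pagination::Keyset::Order.build([
  Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
    attribute_name: 'relative_position',
    column_expression: Issue.arel_table[:relative_position],
    order_expression: Gitlab::Database.nulls_last_order('relative_position', 'ASC'),
    reversed_order_expression: Gitlab::Database.nulls_first_order('relative_position', 'DESC'),
    order_direction: :asc,
    nullable: :nulls_last, # NULL values are expected and sorted last
    distinct: false        # duplicated values are possible
  ),
  # Tie-breaker column that makes the order distinct.
  Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
    attribute_name: 'id',
    order_expression: Issue.arel_table[:id].asc
  )
])

Issue.order(order).keyset_paginate(per_page: 20)
```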
diff --git a/doc/development/database/layout_and_access_patterns.md b/doc/development/database/layout_and_access_patterns.md
index a3e2fefb2a3..99a50b503aa 100644
--- a/doc/development/database/layout_and_access_patterns.md
+++ b/doc/development/database/layout_and_access_patterns.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index 3db24793f1b..dec51d484fd 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -255,7 +255,7 @@ When the loose foreign key definition is no longer needed (parent table is remov
we need to remove the definition from the YAML file and ensure that we don't leave pending deleted
records in the database.
-1. Remove the loose foreign key definition from the config (`config/gitlab_loose_foreign_keys.yml`).
+1. Remove the loose foreign key definition from the configuration (`config/gitlab_loose_foreign_keys.yml`).
1. Remove the deletion tracking trigger from the parent table (if the parent table is still there).
1. Remove leftover deleted records from the `loose_foreign_keys_deleted_records` table.
@@ -429,7 +429,7 @@ ALTER TABLE ONLY vulnerability_occurrence_pipelines
In this example we expect to delete all associated `vulnerability_occurrence_pipelines` records
whenever we delete the `ci_pipelines` record associated with them. In this case
you might end up with some vulnerability page in GitLab which shows an occurrence
-of a vulnerability. However, when you try to click a link to the pipeline, you get
+of a vulnerability. However, when you try to select a link to the pipeline, you get
a 404, because the pipeline is deleted. Then, when you navigate back you might find the
occurrence has disappeared too.
@@ -515,13 +515,13 @@ referenced child tables.
### Database structure
The feature relies on triggers installed on the parent tables. When a parent record is deleted,
-the trigger will automatically insert a new record into the `loose_foreign_keys_deleted_records`
+the trigger automatically inserts a new record into the `loose_foreign_keys_deleted_records`
database table.
-The inserted record will store the following information about the deleted record:
+The inserted record stores the following information about the deleted record:
- `fully_qualified_table_name`: name of the database table where the record was located.
-- `primary_key_value`: the ID of the record, the value will be present in the child tables as
+- `primary_key_value`: the ID of the record, the value is present in the child tables as
the foreign key value. At the moment, composite primary keys are not supported, the parent table
must have an `id` column.
- `status`: defaults to pending, represents the status of the cleanup process.
@@ -532,7 +532,7 @@ several runs.
#### Database decomposition
-The `loose_foreign_keys_deleted_records` table will exist on both database servers (Ci and Main)
+The `loose_foreign_keys_deleted_records` table exists on both database servers (`ci` and `main`)
after the [database decomposition](https://gitlab.com/groups/gitlab-org/-/epics/6168). The worker
will determine which parent tables belong to which database by reading the
`lib/gitlab/database/gitlab_schemas.yml` YAML file.
@@ -547,10 +547,10 @@ Example:
- `ci_builds`
- `ci_pipelines`
-When the worker is invoked for the Ci database, the worker will load deleted records only from the
+When the worker is invoked for the `ci` database, the worker loads deleted records only from the
`ci_builds` and `ci_pipelines` tables. During the cleanup process, `DELETE` and `UPDATE` queries
-will mostly run on tables located in the Main database. In this example, one `UPDATE` query will
-nullify the `merge_requests.head_pipeline_id` column.
+mostly run on tables located in the Main database. In this example, one `UPDATE` query
+nullifies the `merge_requests.head_pipeline_id` column.
#### Database partitioning
@@ -561,7 +561,7 @@ strategy was considered for the feature but due to the large data volume we deci
new strategy.
A deleted record is considered fully processed when all its direct children records have been
-cleaned up. When this happens, the loose foreign key worker will update the `status` column of
+cleaned up. When this happens, the loose foreign key worker updates the `status` column of
the deleted record. After this step, the record is no longer needed.
The sliding partitioning strategy provides an efficient way of cleaning up old, unused data by
@@ -591,7 +591,7 @@ Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_84 FOR
```
The `partition` column controls the insert direction, the `partition` value determines which
-partition will get the deleted rows inserted via the trigger. Notice that the default value of
+partition gets the deleted rows inserted via the trigger. Notice that the default value of
the `partition` table matches with the value of the list partition (84). In `INSERT` query
within the trigger the value of the `partition` is omitted, the trigger always relies on the
default value of the column.
@@ -607,20 +607,20 @@ SELECT TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, old_table.id FROM old_table;
The partition "sliding" process is controlled by two, regularly executed callbacks. These
callbacks are defined within the `LooseForeignKeys::DeletedRecord` model.
-The `next_partition_if` callback controls when to create a new partition. A new partition will
-be created when the current partition has at least one record older than 24 hours. A new partition
+The `next_partition_if` callback controls when to create a new partition. A new partition is
+created when the current partition has at least one record older than 24 hours. A new partition
is added by the [`PartitionManager`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/partitioning/partition_manager.rb)
using the following steps:
1. Create a new partition, where the `VALUE` for the partition is `CURRENT_PARTITION + 1`.
1. Update the default value of the `partition` column to `CURRENT_PARTITION + 1`.
-With these steps, new `INSERT`-s via the triggers will end up in the new partition. At this point,
+With these steps, all new `INSERT` queries via the triggers end up in the new partition. At this point,
the database table has two partitions.
The `detach_partition_if` callback determines if the old partitions can be detached from the table.
A partition is detachable if there are no pending (unprocessed) records in the partition
-(`status = 1`). The detached partitions will be available for some time, you can see the list
+(`status = 1`). The detached partitions are available for some time, and you can see the list of
detached partitions in the `detached_partitions` table:
```sql
@@ -663,7 +663,7 @@ WHERE ("merge_requests"."id") IN
These queries are batched, which means that in many cases, several invocations are needed to clean
up all associated child records.
-The batching is implemented with loops, the processing will stop when all associated child records
+The batching is implemented with loops, the processing stops when all associated child records
are cleaned up or the limit is reached.
```ruby
@@ -682,14 +682,14 @@ end
The loop-based batch processing is preferred over `EachBatch` for the following reasons:
-- The records in the batch are modified, so the next batch will contain different records.
+- The records in the batch are modified, so the next batch contains different records.
- There is always an index on the foreign key column however, the column is usually not unique.
`EachBatch` requires a unique column for the iteration.
- The record order doesn't matter for the cleanup.
-Notice that we have two loops. The initial loop will process records with the `SKIP LOCKED` clause.
-The query will skip rows that are locked by other application processes. This will ensure that the
-cleanup worker will less likely to become blocked. The second loop will execute the database
+Notice that we have two loops. The initial loop processes records with the `SKIP LOCKED` clause.
+The query skips rows that are locked by other application processes. This ensures that the
+cleanup worker is less likely to become blocked. The second loop executes the database
queries without `SKIP LOCKED` to ensure that all records have been processed.
#### Processing limits
@@ -709,19 +709,19 @@ To mitigate these issues, several limits are applied when the worker runs.
The limit rules are implemented in the `LooseForeignKeys::ModificationTracker` class. When one of
the limits (record modification count, time limit) is reached the processing is stopped
-immediately. After some time, the next scheduled worker will continue the cleanup process.
+immediately. After some time, the next scheduled worker continues the cleanup process.
#### Performance characteristics
-The database trigger on the parent tables will **decrease** the record deletion speed. Each
-statement that removes rows from the parent table will invoke the trigger to insert records
+The database trigger on the parent tables **decreases** the record deletion speed. Each
+statement that removes rows from the parent table invokes the trigger to insert records
into the `loose_foreign_keys_deleted_records` table.
The queries within the cleanup worker are fairly efficient index scans, with limits in place
they're unlikely to affect other parts of the application.
The database queries are not running in a transaction. When an error happens, for example a statement
-timeout or a worker crash, the next job will continue the processing.
+timeout or a worker crash, the next job continues the processing.
## Troubleshooting
@@ -730,13 +730,13 @@ timeout or a worker crash, the next job will continue the processing.
There can be cases where the workers need to process an unusually large amount of data. This can
happen under normal usage, for example when a large project or group is deleted. In this scenario,
there can be several million rows to be deleted or nullified. Due to the limits enforced by the
-worker, processing this data will take some time.
+worker, processing this data takes some time.
When cleaning up "heavy-hitters", the feature ensures fair processing by rescheduling larger
batches for later. This gives time for other deleted records to be processed.
For example, a project with millions of `ci_builds` records is deleted. The `ci_builds` records
-will be deleted by the loose foreign keys feature.
+are deleted by the loose foreign keys feature.
1. The cleanup worker is scheduled and picks up a batch of deleted `projects` records. The large
project is part of the batch.
@@ -746,7 +746,7 @@ project is part of the batch.
1. Go to step 1. The next cleanup worker continues the cleanup.
1. When the `cleanup_attempts` reaches 3, the batch is re-scheduled 10 minutes later by updating
the `consume_after` column.
-1. The next cleanup worker will process a different batch.
+1. The next cleanup worker processes a different batch.
We have Prometheus metrics in place to monitor the deleted record cleanup:
@@ -812,7 +812,7 @@ runtime.
LooseForeignKeys::CleanupWorker.new.perform
```
-When the cleanup is done, the older partitions will be automatically detached by the
+When the cleanup is done, the older partitions are automatically detached by the
`PartitionManager`.
### PartitionManager bug
diff --git a/doc/development/database/maintenance_operations.md b/doc/development/database/maintenance_operations.md
index 9e7a35531ca..85df185c024 100644
--- a/doc/development/database/maintenance_operations.md
+++ b/doc/development/database/maintenance_operations.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/migrations_for_multiple_databases.md b/doc/development/database/migrations_for_multiple_databases.md
index ce326a6ce4a..df9607f5672 100644
--- a/doc/development/database/migrations_for_multiple_databases.md
+++ b/doc/development/database/migrations_for_multiple_databases.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -13,11 +13,6 @@ for [the decomposed GitLab application using multiple databases](https://gitlab.
Learn more about general multiple databases support in a [separate document](multiple_databases.md).
-WARNING:
-If you experience any issues using `Gitlab::Database::Migration[2.0]`,
-you can temporarily revert back to the previous behavior by changing the version to `Gitlab::Database::Migration[1.0]`.
-Please report any issues with `Gitlab::Database::Migration[2.0]` in [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/358430).
-
The design for multiple databases (except for the Geo database) assumes
that all decomposed databases have **the same structure** (for example, schema), but **the data is different** in each database. This means that some tables do not contain data on each database.
@@ -78,6 +73,30 @@ class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migratio
end
```
+#### Example: Add a new table to store in a single database
+
+1. Define the [GitLab Schema](multiple_databases.md#gitlab-schema) of the table in [`lib/gitlab/database/gitlab_schemas.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/gitlab_schemas.yml):
+
+ ```yaml
+ ssh_signatures: :gitlab_main
+ ```
+
+1. Create the table in a schema migration:
+
+ ```ruby
+ class CreateSshSignatures < Gitlab::Database::Migration[2.0]
+ def change
+ create_table :ssh_signatures do |t|
+ t.timestamps_with_timezone null: false
+ t.bigint :project_id, null: false, index: true
+ t.bigint :key_id, null: false, index: true
+ t.integer :verification_status, default: 0, null: false, limit: 2
+ t.binary :commit_sha, null: false, index: { unique: true }
+ end
+ end
+ end
+ ```
+
### Data Manipulation Language (DML)
The DML migrations are all migrations that:
@@ -241,7 +260,7 @@ the `database_tasks: false` set. `gitlab:db:validate_config` always runs before
## Validation
-Validation in a nutshell uses [pg_query](https://github.com/pganalyze/pg_query) to analyze
+Validation in a nutshell uses [`pg_query`](https://github.com/pganalyze/pg_query) to analyze
each query and classify tables with information from [`gitlab_schema.yml`](multiple_databases.md#gitlab-schema).
The migration is skipped if the specified `gitlab_schema` is outside of a list of schemas
managed by a given database connection (`Gitlab::Database::gitlab_schemas_for_connection`).
@@ -408,7 +427,7 @@ updating all `ci_pipelines`, you would set
As with all DML migrations, you cannot query another database outside of
`restrict_gitlab_migration` or `gitlab_shared`. If you need to query another database,
-you'll likely need to separate these into two migrations somehow.
+separate the migrations.
Because the actual migration logic (not the queueing step) for background
migrations runs in a Sidekiq worker, the logic can perform DML queries on
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index c622d4f50ff..7badd7f76fa 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Sharding
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -23,7 +23,8 @@ Each table of GitLab needs to have a `gitlab_schema` assigned:
- `gitlab_main`: describes all tables that are being stored in the `main:` database (for example, like `projects`, `users`).
- `gitlab_ci`: describes all CI tables that are being stored in the `ci:` database (for example, `ci_pipelines`, `ci_builds`).
-- `gitlab_shared`: describe all application tables that contain data across all decomposed databases (for example, `loose_foreign_keys_deleted_records`).
+- `gitlab_shared`: describes all application tables that contain data across all decomposed databases (for example, `loose_foreign_keys_deleted_records`) for models that inherit from `Gitlab::Database::SharedModel`.
+- `gitlab_internal`: describes all internal tables of Rails and PostgreSQL (for example, `ar_internal_metadata`, `schema_migrations`, `pg_*`).
- `...`: more schemas to be introduced with additional decomposed databases
The usage of schema enforces the base class to be used:
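For illustration, a hedged sketch of that mapping (class names follow common GitLab conventions; verify against the codebase):

```ruby
class Project < ApplicationRecord # gitlab_main
end

module Ci
  class Pipeline < Ci::ApplicationRecord # gitlab_ci, bound to the ci: database
  end
end

module LooseForeignKeys
  class DeletedRecord < Gitlab::Database::SharedModel # gitlab_shared
  end
end
```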
@@ -44,10 +45,8 @@ This is used as a primary source of classification for:
### The special purpose of `gitlab_shared`
-`gitlab_shared` is a special case describing tables or views that by design contain data across
-all decomposed databases. This does describe application-defined tables (like `loose_foreign_keys_deleted_records`),
-Rails-defined tables (like `schema_migrations` or `ar_internal_metadata` as well as internal PostgreSQL tables
-(for example, `pg_attribute`).
+`gitlab_shared` is a special case that describes tables or views that, by design, contain data across
+all decomposed databases. This classification describes application-defined tables (like `loose_foreign_keys_deleted_records`).
**Be careful** to use `gitlab_shared` as it requires special handling while accessing data.
Since `gitlab_shared` shares not only structure but also data, the application needs to be written in a way
@@ -62,6 +61,11 @@ end
As such, migrations modifying data of `gitlab_shared` tables are expected to run across
all decomposed databases.
+### The special purpose of `gitlab_internal`
+
+`gitlab_internal` describes Rails-defined tables (like `schema_migrations` or `ar_internal_metadata`), as well as internal PostgreSQL tables (for example, `pg_attribute`). Its primary purpose is to [support other databases](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/85842#note_943453682), like Geo, that
+might be missing some of those application-defined `gitlab_shared` tables (like `loose_foreign_keys_deleted_records`), but are valid Rails databases.
+
## Migrations
Read [Migrations for Multiple Databases](migrations_for_multiple_databases.md).
@@ -597,3 +601,21 @@ way to replace cascading deletes so we don't end up with orphaned data
or records that point to nowhere, which might lead to bugs. As such we created
["loose foreign keys"](loose_foreign_keys.md) which is an asynchronous
process of cleaning up orphaned records.
+
+## Locking writes on the tables that don't belong to the database schemas
+
+When the CI database is promoted and the two databases are fully split,
+as an extra safeguard against creating a split brain situation,
+run the Rake task `gitlab:db:lock_writes`. This command locks writes on:
+
+- The `gitlab_main` tables on the CI Database.
+- The `gitlab_ci` tables on the Main Database.
+
+This Rake task adds triggers to all the tables, to prevent any
+`INSERT`, `UPDATE`, `DELETE`, or `TRUNCATE` statements from running
+against the tables that need to be locked.
+
+If this task was run against a GitLab setup that uses only a single database
+for both `gitlab_main` and `gitlab_ci` tables, then no tables will be locked.
+
+To undo the operation, run the opposite Rake task: `gitlab:db:unlock_writes`.
diff --git a/doc/development/database/not_null_constraints.md b/doc/development/database/not_null_constraints.md
index af7d569e282..3962307f80d 100644
--- a/doc/development/database/not_null_constraints.md
+++ b/doc/development/database/not_null_constraints.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -99,8 +99,8 @@ such records, so we would follow the same process either way.
We first add the `NOT NULL` constraint with a `NOT VALID` parameter, which enforces consistency
when new records are inserted or current records are updated.
-In the example above, the existing epics with a `NULL` description will not be affected and you'll
-still be able to update records in the `epics` table. However, when you try to update or insert
+In the example above, the existing epics with a `NULL` description are not affected and you are
+still able to update records in the `epics` table. However, when you try to update or insert
an epic without providing a description, the constraint causes a database error.
Adding or removing a `NOT NULL` clause requires that any application changes are deployed _first_.
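A hedged sketch of the migration that adds such a constraint, using the `add_not_null_constraint` helper with the epics example from above:

```ruby
class AddNotNullConstraintToEpicsDescription < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # validate: false adds the constraint as NOT VALID: existing NULL rows are
    # tolerated, while new and updated rows are checked.
    add_not_null_constraint :epics, :description, validate: false
  end

  def down
    remove_not_null_constraint :epics, :description
  end
end
```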
@@ -129,7 +129,7 @@ end
#### Data migration to fix existing records (current release)
The approach here depends on the data volume and the cleanup strategy. The number of records that
-must be fixed on GitLab.com is a nice indicator that will help us decide whether to use a
+must be fixed on GitLab.com is a nice indicator that helps us decide whether to use a
post-deployment migration or a background data migration:
- If the data volume is less than `1000` records, then the data migration can be executed within the post-migration.
@@ -138,7 +138,7 @@ post-deployment migration or a background data migration:
When unsure about which option to use, please contact the Database team for advice.
Back to our example, the epics table is not considerably large nor frequently accessed,
-so we are going to add a post-deployment migration for the 13.0 milestone (current),
+so we add a post-deployment migration for the 13.0 milestone (current),
`db/post_migrate/20200501000002_cleanup_epics_with_null_description.rb`:
```ruby
@@ -173,7 +173,7 @@ end
#### Validate the `NOT NULL` constraint (next release)
-Validating the `NOT NULL` constraint will scan the whole table and make sure that each record is correct.
+Validating the `NOT NULL` constraint scans the whole table and makes sure that each record is correct.
Still in our example, for the 13.1 milestone (next), we run the `validate_not_null_constraint`
migration helper in a final post-deployment migration,
@@ -196,11 +196,11 @@ end
## `NOT NULL` constraints on large tables
If you have to clean up a nullable column for a [high-traffic table](../migration_style_guide.md#high-traffic-tables)
-(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
+(for example, the `artifacts` in `ci_builds`), your background migration goes on for a while and
+it needs an additional [background migration cleaning up](background_migrations.md#cleaning-up)
in the release after adding the data migration.
-In that rare case you will need 3 releases end-to-end:
+In that rare case you need 3 releases end-to-end:
1. Release `N.M` - Add the `NOT NULL` constraint and the background-migration to fix the existing records.
1. Release `N.M+1` - Cleanup the background migration.
diff --git a/doc/development/database/pagination_guidelines.md b/doc/development/database/pagination_guidelines.md
index 08840124535..1641708ce01 100644
--- a/doc/development/database/pagination_guidelines.md
+++ b/doc/development/database/pagination_guidelines.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -28,9 +28,9 @@ We have two options for rendering the content:
Rendering long lists can significantly affect both the frontend and backend performance:
-- The database will need to read a lot of data from the disk.
-- The result of the query (records) will eventually be transformed to Ruby objects which increases memory allocation.
-- Large responses will take more time to send over the wire, to the user's browser.
+- The database reads a lot of data from the disk.
+- The result of the query (records) is eventually transformed to Ruby objects, which increases memory allocation.
+- Large responses take more time to send over the wire to the user's browser.
- Rendering long lists might freeze the browser (bad user experience).
With pagination, the data is split into equal pieces (pages). On the first visit, the user receives only a limited number of items (page size). The user can see more items by paginating forward which results in a new HTTP request and a new database query.
@@ -127,17 +127,17 @@ We can produce the same query in Rails:
Issue.where(project_id: 1).page(1).per(20)
```
-The SQL query will return a maximum of 20 rows from the database. However, it doesn't mean that the database will only read 20 rows from the disk to produce the result.
+The SQL query returns a maximum of 20 rows from the database. However, it doesn't mean that the database only reads 20 rows from the disk to produce the result.
-This is what will happen:
+This is what happens:
-1. The database will try to plan the execution in the most efficient way possible based on the table statistics and the available indexes.
+1. The database tries to plan the execution in the most efficient way possible based on the table statistics and the available indexes.
1. The planner knows that we have an index covering the `project_id` column.
-1. The database will read all rows using the index on `project_id`.
-1. The rows at this point are not sorted, so the database will need to sort the rows.
+1. The database reads all rows using the index on `project_id`.
+1. The rows at this point are not sorted, so the database sorts the rows.
1. The database returns the first 20 rows.
-In case the project has 10_000 rows, the database will read 10_000 rows and sort them in memory (or on disk). This is not going to scale well in the long term.
+If the project has 10,000 rows, the database reads 10,000 rows and sorts them in memory (or on disk). This does not scale well in the long term.
To fix this we need the following index:
@@ -145,16 +145,16 @@ To fix this we need the following index:
CREATE INDEX index_on_issues_project_id ON issues (project_id, id);
```
-By making the `id` column part of the index, the previous query will read maximum 20 rows. The query will perform well regardless of the number of issues within a project. So with this change, we've also improved the initial page load (when the user loads the issue page).
+By making the `id` column part of the index, the previous query reads a maximum of 20 rows. The query performs well regardless of the number of issues within a project. So with this change, we've also improved the initial page load (when the user loads the issue page).
NOTE:
-Here we're leveraging the ordered property of the b-tree database index. Values in the index are sorted so reading 20 rows will not require further sorting.
+Here we're leveraging the ordered property of the b-tree database index. Values in the index are sorted, so reading 20 rows does not require further sorting.
#### Limitations
##### `COUNT(*)` on a large dataset
-Kaminari by default executes a count query to determine the number of pages for rendering the page links. Count queries can be quite expensive for a large table, in an unfortunate scenario the queries will simply time out.
+Kaminari by default executes a count query to determine the number of pages for rendering the page links. Count queries can be quite expensive for a large table. In an unfortunate scenario, the queries simply time out.
To work around this, we can run Kaminari without invoking the count SQL query.
@@ -162,11 +162,11 @@ To work around this, we can run Kaminari without invoking the count SQL query.
Issue.where(project_id: 1).page(1).per(20).without_count
```
-In this case, the count query will not be executed and the pagination will no longer render the page numbers. We'll see only the next and previous links.
+In this case, the count query is not executed and the pagination no longer renders the page numbers. We see only the next and previous links.
##### `OFFSET` on a large dataset
-When we paginate over a large dataset, we might notice that the response time will get slower and slower. This is due to the `OFFSET` clause that seeks through the rows and skips N rows.
+When we paginate over a large dataset, we might notice that the response time gets slower and slower. This is due to the `OFFSET` clause that seeks through the rows and skips N rows.
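For example, loading page 500 with a page size of 20 produces a query roughly like the following (a sketch that reuses the shape of the earlier `issues` example):

```sql
-- Sketch only: the database must walk past 9,980 rows before returning 20.
SELECT "issues".*
FROM "issues"
WHERE "issues"."project_id" = 1
ORDER BY "issues"."id"
LIMIT 20
OFFSET 9980;
```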
From the user point of view, this might not be always noticeable. As the user paginates forward, the previous rows might be still in the buffer cache of the database. If the user shares the link with someone else and it's opened after a few minutes or hours, the response time might be significantly higher or it would even time out.
@@ -214,7 +214,7 @@ Limit (cost=137878.89..137881.65 rows=20 width=1309) (actual time=5523.588..552
(8 rows)
```
-We can argue that a normal user will not be going to visit these pages, however, API users could easily navigate to very high page numbers (scraping, collecting data).
+We can argue that a normal user does not visit these pages; however, API users could easily navigate to very high page numbers (scraping, collecting data).
### Keyset pagination
@@ -279,7 +279,7 @@ eyJpZCI6Ijk0NzMzNTk0IiwidXBkYXRlZF9hdCI6IjIwMjEtMDQtMDkgMDg6NTA6MDUuODA1ODg0MDAw
```
NOTE:
-Pagination parameters will be visible to the user, so we need to be careful about which columns we order by.
+Pagination parameters are visible to the user, so be careful about which columns you order by.
Keyset pagination can only provide the next, previous, first, and last pages.
@@ -302,7 +302,7 @@ LIMIT 20
##### Tooling
-A generic keyset pagination library is available within the GitLab project which can most of the cases easily replace the existing, kaminari based pagination with significant performance improvements when dealing with large datasets.
+A generic keyset pagination library is available within the GitLab project which, in most cases, can easily replace the existing Kaminari-based pagination with significant performance improvements when dealing with large datasets.
Example:
diff --git a/doc/development/database/pagination_performance_guidelines.md b/doc/development/database/pagination_performance_guidelines.md
index 90e4faf2de7..b5040e499e4 100644
--- a/doc/development/database/pagination_performance_guidelines.md
+++ b/doc/development/database/pagination_performance_guidelines.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -55,13 +55,13 @@ LIMIT 20
OFFSET 0
```
-With PostgreSQL version 11, the planner will first look up all issues matching the `project_id` filter and then join all `issue_metrics` rows. The ordering of rows will happen in memory. In case the joined relation is always present (1:1 relationship), the database will read `N * 2` rows where N is the number of rows matching the `project_id` filter.
+With PostgreSQL version 11, the planner first looks up all issues matching the `project_id` filter and then joins all `issue_metrics` rows. The ordering of rows happens in memory. If the joined relation is always present (1:1 relationship), the database reads `N * 2` rows, where N is the number of rows matching the `project_id` filter.
For performance reasons, we should avoid mixing columns from different tables when specifying the `ORDER BY` clause.
-In this particular case there is no simple way (like index creation) to improve the query. We might think that changing the `issues.id` column to `issue_metrics.issue_id` will help, however, this will likely make the query perform worse because it might force the database to process all rows in the `issue_metrics` table.
+In this particular case, there is no simple way (like index creation) to improve the query. We might think that changing the `issues.id` column to `issue_metrics.issue_id` helps; however, this likely makes the query perform worse because it might force the database to process all rows in the `issue_metrics` table.
-One idea to address this problem is denormalization. Adding the `project_id` column to the `issue_metrics` table will make the filtering and sorting efficient:
+One idea to address this problem is denormalization. Adding the `project_id` column to the `issue_metrics` table makes the filtering and sorting efficient:
```sql
SELECT issues.* FROM issues
@@ -73,7 +73,7 @@ OFFSET 0
```
NOTE:
-The query will require an index on `issue_metrics` table with the following column configuration: `(project_id, first_mentioned_in_commit_at DESC, issue_id DESC)`.
+The query requires an index on the `issue_metrics` table with the following column configuration: `(project_id, first_mentioned_in_commit_at DESC, issue_id DESC)`.
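For illustration, such an index could be created like this (a sketch; the index name below is arbitrary):

```sql
-- Sketch only: the index name is arbitrary.
CREATE INDEX index_issue_metrics_on_project_mentioned_at
  ON issue_metrics (project_id, first_mentioned_in_commit_at DESC, issue_id DESC);
```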
## Filtering
@@ -81,7 +81,7 @@ The query will require an index on `issue_metrics` table with the following colu
Filtering by a project is a very common use case since we have many features on the project level. Examples: merge requests, issues, boards, iterations.
-These features will have a filter on `project_id` in their base query. Loading issues for a project:
+These features have a filter on `project_id` in their base query. Loading issues for a project:
```ruby
project = Project.find(5)
@@ -108,9 +108,9 @@ This index fully covers the database query and the pagination.
### By group
-Unfortunately, there is no efficient way to sort and paginate on the group level. The database query execution time will increase based on the number of records in the group.
+Unfortunately, there is no efficient way to sort and paginate on the group level. The database query execution time increases based on the number of records in the group.
-Things get worse when group level actually means group and its subgroups. To load the first page, the database needs to look up the group hierarchy, find all projects and then look up all issues.
+Things get worse when group level actually means a group and its subgroups. To load the first page, the database looks up the group hierarchy, finds all projects, and then looks up all issues.
The main reason behind the inefficient queries on the group level is the way our database schema is designed; our core domain models are associated with a project, and projects are associated with groups. This doesn't mean that the database structure is bad, it's just in a well-normalized form that is not optimized for efficient group level queries. We might need to look into denormalization in the long term.
@@ -184,7 +184,7 @@ LIMIT 20
OFFSET 0
```
-Keep in mind that the index above will not support the following project level query:
+The index above does not support the following project level query:
```sql
SELECT "issues".*
@@ -213,7 +213,7 @@ OFFSET 0
We might be tempted to add an index on `project_id`, `confidential`, and `iid` to improve the database query, however, in this case it's probably unnecessary. Based on the data distribution in the table, confidential issues are rare. Filtering them out does not make the database query significantly slower. The database might read a few extra rows, the performance difference might not even be visible to the end-user.
-On the other hand, if we would implement a special filter where we only show confidential issues, we will surely need the index. Finding 20 confidential issues might require the database to scan hundreds of rows or in the worst case, all issues in the project.
+On the other hand, if we implemented a special filter that shows only confidential issues, we would need the index. Finding 20 confidential issues might require the database to scan hundreds of rows or, in the worst case, all issues in the project.
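If that filter were needed, the supporting index might look roughly like this (a sketch; the index name is arbitrary, and a partial index limited to confidential issues is another option):

```sql
-- Sketch only: the index name is arbitrary.
-- A partial index (... WHERE confidential = true) is an alternative
-- if only confidential issues ever need to be listed this way.
CREATE INDEX index_issues_on_project_confidential_iid
  ON issues (project_id, confidential, iid);
```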
NOTE:
Be aware of the data distribution and the table access patterns (how features work) when introducing a new database index. Sampling production data might be necessary to make the right decision.
@@ -253,7 +253,7 @@ Example database (oversimplified) execution plan:
- `SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 5`
1. The database estimates the number of rows and the costs to run these queries.
1. The database executes the cheapest query first.
-1. Using the query result, load the rows from the other table (from the other query) using the JOIN column and filter the rows further.
+1. Using the query result, load the rows from the other table (from the other query) using the `JOIN` column and filter the rows further.
In this particular example, the `issue_assignees` query would likely be executed first.
@@ -276,17 +276,17 @@ Running the query in production for the GitLab project produces the following ex
(13 rows)
```
-The query looks up the `assignees` first, filtered by the `user_id` (`user_id = 4156052`) and it finds 215 rows. Using that 215 rows, the database will look up the 215 associated issue rows by the primary key. Notice that the filter on the `project_id` column is not backed by an index.
+The query looks up the `assignees` first, filtered by the `user_id` (`user_id = 4156052`), and finds 215 rows. Using those 215 rows, the database looks up the 215 associated issue rows by the primary key. Notice that the filter on the `project_id` column is not backed by an index.
-In most cases, we are lucky that the joined relation will not be going to return too many rows, therefore, we will end up with a relatively efficient database query that accesses low number of rows. As the database grows, these queries might start to behave differently. Let's say the number `issue_assignees` records for a particular user is very high (millions), then this join query will not perform well, and it will likely time out.
+In most cases, we are lucky that the joined relation does not return too many rows; therefore, we end up with a relatively efficient database query that accesses a small number of rows. As the database grows, these queries might start to behave differently. Let's say the number of `issue_assignees` records for a particular user is very high, in the millions. In that case, the join query does not perform well, and it likely times out.
-A similar problem could be a double join, where the filter exists in the 2nd JOIN query. Example: `Issue -> LabelLink -> Label(name=bug)`.
+A similar problem could be a double join, where the filter exists in the 2nd `JOIN` query. Example: `Issue -> LabelLink -> Label(name=bug)`.
There is no easy way to fix these problems. Denormalization of data could help significantly, however, it has also negative effects (data duplication and keeping the data up to date).
Ideas for improving the `issue_assignees` filter:
-- Add `project_id` column to the `issue_assignees` table so when JOIN-ing, the extra `project_id` filter will further filter the rows. The sorting will likely happen in memory:
+- Add a `project_id` column to the `issue_assignees` table so that when performing the `JOIN`, the extra `project_id` filter further filters the rows. The sorting likely happens in memory:
```sql
SELECT "issues".*
diff --git a/doc/development/database/post_deployment_migrations.md b/doc/development/database/post_deployment_migrations.md
index 799eefdb875..a49c77ca047 100644
--- a/doc/development/database/post_deployment_migrations.md
+++ b/doc/development/database/post_deployment_migrations.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/rename_database_tables.md b/doc/development/database/rename_database_tables.md
index 7a76c028042..cbcbd507204 100644
--- a/doc/development/database/rename_database_tables.md
+++ b/doc/development/database/rename_database_tables.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -135,4 +135,4 @@ database, ActiveRecord fetches the column information again. At this time, our p
marked table (`TABLES_TO_BE_RENAMED`) instructs ActiveRecord to use the new database table name
when fetching the database table information.
-The new version of the application will use the new database table.
+The new version of the application uses the new database table.
diff --git a/doc/development/database/setting_multiple_values.md b/doc/development/database/setting_multiple_values.md
index 0f23aae9f79..cba15a73430 100644
--- a/doc/development/database/setting_multiple_values.md
+++ b/doc/development/database/setting_multiple_values.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index 7aa529e1518..73e023f8d45 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0.
-When adding new columns that will be used to store strings or other textual information:
+When adding new columns to store strings or other textual information:
1. We always use the `text` data type instead of the `string` data type.
1. `text` columns should always have a limit set, either by using the `create_table` with
@@ -142,8 +142,8 @@ instance of GitLab could have such records, so we would follow the same process
We first add the limit as a `NOT VALID` check constraint to the table, which enforces consistency when
new records are inserted or current records are updated.
-In the example above, the existing issues with more than 1024 characters in their title will not be
-affected and you'll be still able to update records in the `issues` table. However, when you'd try
+In the example above, the existing issues with more than 1024 characters in their title are not
+affected, and you are still able to update records in the `issues` table. However, when you try
to update the `title_html` with a title that has more than 1024 characters, the constraint causes
a database error.
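For illustration, the limit added by the migration helper is roughly equivalent to the following SQL (a sketch; the actual constraint name is generated by the helper):

```sql
-- Sketch only: roughly what the text limit helper creates behind the scenes.
-- The real constraint name is generated by the migration helper.
ALTER TABLE issues
  ADD CONSTRAINT check_title_html_max_length
  CHECK (char_length(title_html) <= 1024) NOT VALID;
```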
@@ -182,7 +182,7 @@ end
#### Data migration to fix existing records (current release)
The approach here depends on the data volume and the cleanup strategy. The number of records that must
-be fixed on GitLab.com is a nice indicator that will help us decide whether to use a post-deployment
+be fixed on GitLab.com is a nice indicator that helps us decide whether to use a post-deployment
migration or a background data migration:
- If the data volume is less than `1,000` records, then the data migration can be executed within the post-migration.
@@ -233,7 +233,7 @@ You can find more information on the guide about [background migrations](backgro
#### Validate the text limit (next release)
-Validating the text limit will scan the whole table and make sure that each record is correct.
+Validating the text limit scans the whole table and makes sure that each record is correct.
Still in our example, for the 13.1 milestone (next), we run the `validate_text_limit` migration
helper in a final post-deployment migration,
@@ -276,11 +276,11 @@ end
## Text limit constraints on large tables
If you have to clean up a text column for a really [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
-(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
+(for example, the `artifacts` in `ci_builds`), your background migration goes on for a while and
+it needs an additional [background migration cleaning up](background_migrations.md#cleaning-up)
in the release after adding the data migration.
-In that rare case you will need 3 releases end-to-end:
+In that rare case you need 3 releases end-to-end:
1. Release `N.M` - Add the text limit and the background migration to fix the existing records.
1. Release `N.M+1` - Cleanup the background migration.
diff --git a/doc/development/database/table_partitioning.md b/doc/development/database/table_partitioning.md
index 34cb73978bc..582c988bef9 100644
--- a/doc/development/database/table_partitioning.md
+++ b/doc/development/database/table_partitioning.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -36,23 +36,23 @@ before attempting to leverage this feature.
While partitioning can be very useful when properly applied, it's
imperative to identify if the data and workload of a table naturally fit a
-partitioning scheme. There are a few details you'll have to understand
-in order to decide if partitioning is a good fit for your particular
+partitioning scheme. There are a few details you have to understand
+to decide if partitioning is a good fit for your particular
problem.
First, a table is partitioned on a partition key, which is a column or
-set of columns which determine how the data will be split across the
+set of columns that determines how the data is split across the
partitions. The partition key is used by the database when reading or
-writing data, to decide which partitions need to be accessed. The
+writing data, to decide which partitions must be accessed. The
partition key should be a column that would be included in a `WHERE`
clause on almost all queries accessing that table.
-Second, it's necessary to understand the strategy the database will
-use to split the data across the partitions. The scheme supported by the
+Second, it's necessary to understand the strategy the database uses
+to split the data across the partitions. The scheme supported by the
GitLab migration helpers is date-range partitioning, where each partition
in the table contains data for a single month. In this case, the partitioning
-key would need to be a timestamp or date column. In order for this type of
-partitioning to work well, most queries would need to access data within a
+key must be a timestamp or date column. For this type of
+partitioning to work well, most queries must access data in a
certain date range.
For a more concrete example, the `audit_events` table can be used, which
@@ -73,7 +73,7 @@ CREATE TABLE audit_events (
created_at timestamptz NOT NULL);
```
-Now imagine typical queries in the UI would display the data within a
+Now imagine typical queries in the UI would display the data in a
certain date range, like a single week:
```sql
@@ -117,7 +117,7 @@ partition key falls in the specified range. For example, the partition
greater than or equal to `2020-01-01` and less than `2020-02-01`.
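As an illustration (not generated by the migration helpers), such a monthly partition could be declared like this, assuming `audit_events` is range-partitioned on `created_at`:

```sql
-- Sketch only: a monthly range partition for the example above,
-- assuming audit_events is declared PARTITION BY RANGE (created_at).
CREATE TABLE audit_events_202001 PARTITION OF audit_events
  FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');
```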
Now, if we look at the previous example query again, the database can
-use the `WHERE` to recognize that all matching rows will be in the
+use the `WHERE` condition to recognize that all matching rows are in the
`audit_events_202001` partition. Rather than searching all of the data
in all of the partitions, it can search only the single month's worth
of data in the appropriate partition. In a large table, this can
@@ -136,11 +136,11 @@ LIMIT 100
In this example, the database can't prune any partitions from the search,
because matching data could exist in any of them. As a result, it has to
query each partition individually, and aggregate the rows into a single result
-set. Since `author_id` would be indexed, the performance impact could
+set. Because `author_id` would be indexed, the performance impact could
likely be acceptable, but on more complex queries the overhead can be
substantial. Partitioning should only be leveraged if the access patterns
-of the data support the partitioning strategy, otherwise performance will
-suffer.
+of the data support the partitioning strategy; otherwise, performance
+suffers.
## Partitioning a table
@@ -158,15 +158,15 @@ migration to copy data into the new table. Changes to the original table
schema can be made in parallel with the partitioning migration, but they
must take care to not break the underlying mechanism that makes the migration
work. For example, if a column is added to the table that is being
-partitioned, both the partitioned table and the trigger definition need to
+partitioned, both the partitioned table and the trigger definition must
be updated to match.
### Step 1: Creating the partitioned copy (Release N)
The first step is to add a migration to create the partitioned copy of
-the original table. This migration will also create the appropriate
+the original table. This migration creates the appropriate
partitions based on the data in the original table, and installs a
-trigger that will sync writes from the original table into the
+trigger that syncs writes from the original table into the
partitioned copy.
An example migration of partitioning the `audit_events` table by its
@@ -186,15 +186,15 @@ class PartitionAuditEvents < Gitlab::Database::Migration[1.0]
end
```
-Once this has executed, any inserts, updates or deletes in the
-original table will also be duplicated in the new table. For updates and
-deletes, the operation will only have an effect if the corresponding row
+After this has executed, any inserts, updates, or deletes in the
+original table are also duplicated in the new table. For updates and
+deletes, the operation only has an effect if the corresponding row
exists in the partitioned table.
### Step 2: Backfill the partitioned copy (Release N)
-The second step is to add a post-deployment migration that will schedule
-the background jobs that will backfill existing data from the original table
+The second step is to add a post-deployment migration that schedules
+the background jobs that backfill existing data from the original table
into the partitioned copy.
Continuing the above example, the migration would look like:
@@ -225,7 +225,7 @@ partitioning migration.
The third step must occur at least one release after the release that
includes the background migration. This gives time for the background
migration to execute properly in self-managed installations. In this step,
-add another post-deployment migration that will cleanup after the
+add another post-deployment migration that cleans up after the
background migration. This includes forcing any remaining jobs to
execute, and copying data that may have been missed, due to dropped or
failed jobs.
@@ -248,12 +248,11 @@ end
After this migration has completed, the original table and partitioned
table should contain identical data. The trigger installed on the
-original table guarantees that the data will remain in sync going
-forward.
+original table guarantees that the data remains in sync going forward.
### Step 4: Swap the partitioned and non-partitioned tables (Release N+1)
-The final step of the migration will make the partitioned table ready
+The final step of the migration makes the partitioned table ready
for use by the application. This section will be updated when the
migration helper is ready, for now development can be followed in the
[Tracking Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/241267).
diff --git a/doc/development/database/transaction_guidelines.md b/doc/development/database/transaction_guidelines.md
index 2806bd217db..d96d11f05a5 100644
--- a/doc/development/database/transaction_guidelines.md
+++ b/doc/development/database/transaction_guidelines.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -15,8 +15,8 @@ For further reference, check PostgreSQL documentation about [transactions](https
The [sharding group](https://about.gitlab.com/handbook/engineering/development/enablement/sharding/) plans
to split the main GitLab database and move some of the database tables to other database servers.
-We'll start decomposing the `ci_*`-related database tables first. To maintain the current application
-development experience, we'll add tooling and static analyzers to the codebase to ensure correct
+We start decomposing the `ci_*`-related database tables first. To maintain the current application
+development experience, we add tooling and static analyzers to the codebase to ensure correct
data access and data modification methods. By using the correct form for defining database transactions,
we can save significant refactoring work in the future.
@@ -60,7 +60,7 @@ end
The database tries to acquire the `FOR UPDATE` lock for the referenced `issue` and
`project` records. In our case, we have two competing transactions for these locks,
-and only one of them will successfully acquire them. The other transaction will have
+and only one of them successfully acquires them. The other transaction has
to wait in the lock queue until the first transaction finishes. The execution of the
second transaction is blocked at this point.
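In SQL terms, each of the competing transactions is roughly doing the following (a sketch; the IDs are made up):

```sql
-- Sketch only: the row locks taken by each competing transaction.
BEGIN;
SELECT * FROM issues WHERE id = 1 FOR UPDATE;   -- locks the issue row
SELECT * FROM projects WHERE id = 1 FOR UPDATE; -- locks the project row
-- ... application work ...
COMMIT;
```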
@@ -139,5 +139,5 @@ end
```
The `ActiveRecord::Base` class uses a different database connection than the `Ci::Build` records.
-The two statements in the transaction block will not be part of the transaction and will not be
+The two statements in the transaction block are not part of the transaction and are not
rolled back in case something goes wrong. They act as 3rd party calls.