gitlab.com/gitlab-org/gitlab-foss.git
author    GitLab Bot <gitlab-bot@gitlab.com>  2022-06-18 00:09:20 +0300
committer GitLab Bot <gitlab-bot@gitlab.com>  2022-06-18 00:09:20 +0300
commit    3e20234984524c3ccfb09eace7b9d170cbcc32d7 (patch)
tree      5586f86210db467613ba99040c96b617066de8c9 /doc
parent    9a1066298169f8ebecacb9e55fe895f4f8962000 (diff)
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc')

 doc/architecture/blueprints/database_scaling/size-limits.md | 14
 doc/development/adding_database_indexes.md                  | 16
 doc/development/creating_enums.md                           |  2
 doc/development/database/avoiding_downtime_in_migrations.md |  4
 doc/development/database/background_migrations.md           | 26
 doc/development/database/batched_background_migrations.md   |  6
 doc/development/database/efficient_in_operator_queries.md   |  2
 doc/development/database/loose_foreign_keys.md              | 30
 doc/development/database/rename_database_tables.md          |  2
 doc/development/database/table_partitioning.md              | 20
 doc/development/database_review.md                          | 24
 doc/development/fips_compliance.md                          |  4
 doc/development/serializing_data.md                         | 19
 doc/development/verifying_database_capabilities.md          |  2
 doc/user/project/merge_requests/drafts.md                   |  7
 doc/user/project/quick_actions.md                           |  3
 doc/user/tasks.md                                           | 10

 17 files changed, 95 insertions, 96 deletions
diff --git a/doc/architecture/blueprints/database_scaling/size-limits.md b/doc/architecture/blueprints/database_scaling/size-limits.md
index cb380d11739..375c82f8833 100644
--- a/doc/architecture/blueprints/database_scaling/size-limits.md
+++ b/doc/architecture/blueprints/database_scaling/size-limits.md
@@ -7,12 +7,12 @@ description: 'Database Scalability / Limit table sizes'
# Database Scalability: Limit on-disk table size to < 100 GB for GitLab.com
-This document is a proposal to work towards reducing and limiting table sizes on GitLab.com. We establish a **measurable target** by limiting table size to a certain threshold. This will be used as an indicator to drive database focus and decision making. With GitLab.com growing, we continuously re-evaluate which tables need to be worked on to prevent or otherwise fix violations.
+This document is a proposal to work towards reducing and limiting table sizes on GitLab.com. We establish a **measurable target** by limiting table size to a certain threshold. This is used as an indicator to drive database focus and decision making. With GitLab.com growing, we continuously re-evaluate which tables need to be worked on to prevent or otherwise fix violations.
This is not meant to be a hard rule but rather a strong indication that work needs to be done to break a table apart or otherwise reduce its size.
This is meant to be read in context with the [Database Sharding blueprint](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64115),
-which paints the bigger picture. This proposal here is thought to be part of the "debloating step" below, as we aim to reduce storage requirements and improve data modeling. Partitioning is part of the standard tool-belt: where possible, we can already use partitioning as a solution to cut physical table sizes significantly. Both will help to prepare efforts like decomposition (database usage is already optimized) and sharding (database is already partitioned along an identified data access dimension).
+which paints the bigger picture. This proposal here is thought to be part of the "debloating step" below, as we aim to reduce storage requirements and improve data modeling. Partitioning is part of the standard tool-belt: where possible, we can already use partitioning as a solution to cut physical table sizes significantly. Both help to prepare efforts like decomposition (database usage is already optimized) and sharding (database is already partitioned along an identified data access dimension).
```mermaid
graph LR
@@ -125,11 +125,11 @@ In order to maintain and improve operational stability and lessen development bu
1. Indexes are smaller, can be maintained more efficiently and fit better into memory
1. Data migrations are easier to reason about, take less time to implement and execute
-This target is *pragmatic*: We understand table sizes depend on feature usage, code changes and other factors - which all change over time. We may not always find solutions where we can tightly limit the size of physical tables once and for all. That is acceptable though and we primarily aim to keep the situation on GitLab.com under control. We adapt our efforts to the situation present on GitLab.com and will re-evaluate frequently.
+This target is *pragmatic*: We understand table sizes depend on feature usage, code changes and other factors - which all change over time. We may not always find solutions where we can tightly limit the size of physical tables once and for all. That is acceptable though and we primarily aim to keep the situation on GitLab.com under control. We adapt our efforts to the situation present on GitLab.com and re-evaluate frequently.
-While there are changes we can make that lead to a constant maximum physical table size over time, this doesn't need to be the case necessarily. Consider for example hash partitioning, which breaks a table down into a static number of partitions. With data growth over time, individual partitions will also grow in size and may eventually reach the threshold size again. We strive to get constant table sizes, but it is acceptable to ship easier solutions that don't have this characteristic but improve the situation for a considerable amount of time.
+While there are changes we can make that lead to a constant maximum physical table size over time, this doesn't need to be the case necessarily. Consider for example hash partitioning, which breaks a table down into a static number of partitions. With data growth over time, individual partitions also grow in size and may eventually reach the threshold size again. We strive to get constant table sizes, but it is acceptable to ship easier solutions that don't have this characteristic but improve the situation for a considerable amount of time.
-As such, the target size of a physical table after refactoring depends on the situation and there is no hard rule for it. We suggest to consider historic data growth and forecast when physical tables will reach the threshold of 100 GB again. This allows us to understand how long a particular solution is expected to last until the model has to be revisited.
+As such, the target size of a physical table after refactoring depends on the situation and there is no hard rule for it. We suggest considering historic data growth and forecasting when physical tables reach the threshold of 100 GB again. This allows us to understand how long a particular solution is expected to last until the model has to be revisited.
## Solutions
@@ -154,10 +154,10 @@ For solutions like normalization, this is a trade-off: Denormalized models can s
A few examples can be found below, many more are organized under the epic [Database efficiency](https://gitlab.com/groups/gitlab-org/-/epics/5585).
1. [Reduce number of indexes on `ci_builds`](https://gitlab.com/groups/gitlab-org/-/epics/6203)
-1. [Normalize and de-duplicate committer and author details in merge_request_diff_commits](https://gitlab.com/gitlab-org/gitlab/-/issues/331823)
+1. [Normalize and de-duplicate committer and author details in `merge_request_diff_commits`](https://gitlab.com/gitlab-org/gitlab/-/issues/331823)
1. [Retention strategy for `ci_build_trace_sections`](https://gitlab.com/gitlab-org/gitlab/-/issues/32565#note_603138100)
1. [Implement worker that hard-deletes old CI jobs metadata](https://gitlab.com/gitlab-org/gitlab/-/issues/215646)
-1. [merge_request_diff_files violates < 100 GB target](https://gitlab.com/groups/gitlab-org/-/epics/6215) (epic)
+1. [`merge_request_diff_files` violates < 100 GB target](https://gitlab.com/groups/gitlab-org/-/epics/6215) (epic)
## Goal
diff --git a/doc/development/adding_database_indexes.md b/doc/development/adding_database_indexes.md
index b44a2861735..f524b04c6eb 100644
--- a/doc/development/adding_database_indexes.md
+++ b/doc/development/adding_database_indexes.md
@@ -32,14 +32,14 @@ data and are slower to update compared to B-tree indexes.
Because of all this, one should not blindly add a new index for every column used
to filter data by. Instead, ask yourself the following questions:
-1. Can I write my query in such a way that it re-uses as many existing indexes
+1. Can you write your query in such a way that it re-uses as many existing indexes
as possible?
-1. Is the data going to be large enough that using an index will actually be
+1. Is the data large enough that using an index is actually
faster than just iterating over the rows in the table?
1. Is the overhead of maintaining the index worth the reduction in query
timings?
-We'll explore every question in detail below.
+We explore every question in detail below.
## Re-using Queries
@@ -54,7 +54,7 @@ AND state = 'open';
```
Now imagine we already have an index on the `user_id` column but not on the
-`state` column. One may think this query will perform badly due to `state` being
+`state` column. One may think this query performs badly due to `state` being
unindexed. In reality the query may perform just fine given the index on
`user_id` can filter out enough rows.
@@ -85,8 +85,8 @@ enough rows you may _not_ want to add a new index.
## Maintenance Overhead
Indexes have to be updated on every table write. In case of PostgreSQL _all_
-existing indexes will be updated whenever data is written to a table. As a
-result of this having many indexes on the same table will slow down writes.
+existing indexes are updated whenever data is written to a table. As a
+result of this, having many indexes on the same table slows down writes.
Because of this one should ask themselves: is the reduction in query performance
worth the overhead of maintaining an extra index?
@@ -184,8 +184,8 @@ def up
end
```
-The call to `index_exists?` will return true if **any** index exists on
-`:my_table` and `:my_column`, and index creation will be bypassed.
+The call to `index_exists?` returns true if **any** index exists on
+`:my_table` and `:my_column`, and index creation is bypassed.
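To avoid that pitfall, one pattern (a hedged sketch; the table, column, and index names here are illustrative) is to name the index explicitly so the existence check targets one specific index:

```ruby
# Sketch only: names are illustrative. An explicit name makes the
# existence check apply to this exact index rather than any index
# that happens to cover the column.
class AddIndexToMyTableMyColumn < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  INDEX_NAME = 'index_my_table_on_my_column'

  def up
    add_concurrent_index :my_table, :my_column, name: INDEX_NAME
  end

  def down
    remove_concurrent_index_by_name :my_table, INDEX_NAME
  end
end
```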
The `add_concurrent_index` helper is a requirement for creating indexes
on populated tables. Since it cannot be used inside a transactional
diff --git a/doc/development/creating_enums.md b/doc/development/creating_enums.md
index a5a642eed86..450cb97d978 100644
--- a/doc/development/creating_enums.md
+++ b/doc/development/creating_enums.md
@@ -98,7 +98,7 @@ end
This works as a workaround; however, this approach has some downsides:
- Features could move from EE to FOSS or vice versa. Therefore, the offset might be mixed between FOSS and EE in the future.
- For example, when you move `activity_limit_exceeded` to FOSS, you'll see `{ unknown_failure: 0, config_error: 1, activity_limit_exceeded: 1_000 }`.
+ For example, when you move `activity_limit_exceeded` to FOSS, you see `{ unknown_failure: 0, config_error: 1, activity_limit_exceeded: 1_000 }`.
- The integer column for the `enum` is likely created [as `SMALLINT`](#creating-enums).
Therefore, you need to be careful that the offset doesn't exceed the maximum value of a 2-byte integer.
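As a sketch of the offset workaround under discussion (the model and attribute names are illustrative, not the document's own):

```ruby
# Illustrative only: explicit values keep FOSS entries stable while the
# EE-only entry starts at an offset of 1_000.
class CommitStatus < ApplicationRecord
  enum failure_reason: {
    unknown_failure: 0,
    config_error: 1,
    activity_limit_exceeded: 1_000 # EE-only value, offset to avoid clashes
  }
end
```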
diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md
index 91254d5c88f..2d079656e23 100644
--- a/doc/development/database/avoiding_downtime_in_migrations.md
+++ b/doc/development/database/avoiding_downtime_in_migrations.md
@@ -170,7 +170,7 @@ class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0
end
```
-This will take care of renaming the column, ensuring data stays in sync, and
+This takes care of renaming the column, ensuring data stays in sync, and
copying over indexes and foreign keys.
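The migration body is cut off by the diff context; a hedged sketch using the rename helper pair named in this guide would be:

```ruby
# Sketch: rename_column_concurrently installs a synced duplicate column;
# cleanup happens in a later post-deployment migration.
class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    rename_column_concurrently :users, :updated_at, :updated_at_timestamp
  end

  def down
    undo_rename_column_concurrently :users, :updated_at, :updated_at_timestamp
  end
end
```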
If a column contains one or more indexes that don't contain the name of the
@@ -270,7 +270,7 @@ And that's it, we're done!
Some type changes require casting data to a new type. For example when changing from `text` to `jsonb`.
In this case, use the `type_cast_function` option.
-Make sure there is no bad data and the cast will always succeed. You can also provide a custom function that handles
+Make sure there is no bad data and the cast always succeeds. You can also provide a custom function that handles
casting errors.
Example migration:
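(The example itself is cut off by the diff context. The following is a hedged sketch, not the document's own code; the table, column, and cast function are assumptions.)

```ruby
# Sketch only: 'jsonb' relies on PostgreSQL's function-style cast syntax;
# a custom SQL function could be supplied instead to handle bad data.
class CastIntegrationPropertiesToJsonb < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    change_column_type_concurrently :integrations, :properties, :jsonb,
      type_cast_function: 'jsonb'
  end
end
```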
diff --git a/doc/development/database/background_migrations.md b/doc/development/database/background_migrations.md
index 4d20d85fce2..0124dbae51f 100644
--- a/doc/development/database/background_migrations.md
+++ b/doc/development/database/background_migrations.md
@@ -65,7 +65,7 @@ and idempotent.
See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
for more details.
-Make sure that in case that your migration job is going to be retried data
+Make sure that if your migration job is retried, data
integrity is guaranteed.
## Background migrations for EE-only features
@@ -77,7 +77,7 @@ as explained in the [guidelines for implementing Enterprise Edition features](..
## How It Works
Background migrations are simple classes that define a `perform` method. A
-Sidekiq worker will then execute such a class, passing any arguments to it. All
+Sidekiq worker then executes such a class, passing any arguments to it. All
migration classes must be defined in the namespace
`Gitlab::BackgroundMigration`, the files should be placed in the directory
`lib/gitlab/background_migration/`.
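A minimal sketch of such a class (the class name and ID-range arguments are hypothetical):

```ruby
# Hypothetical file: lib/gitlab/background_migration/my_example_migration.rb
module Gitlab
  module BackgroundMigration
    class MyExampleMigration
      def perform(start_id, end_id)
        # Migrate rows with IDs in [start_id, end_id]. Keep the work
        # idempotent: Sidekiq may retry this job on failure.
      end
    end
  end
end
```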
@@ -106,7 +106,7 @@ queue_background_migration_jobs_by_range_at_intervals(
)
```
-You'll also need to make sure that newly created data is either migrated, or
+You also need to make sure that newly created data is either migrated, or
saved in both the old and new version upon creation. For complex and time
consuming migrations it's best to schedule a background job using an
`after_create` hook so this doesn't affect response timings. The same applies to
@@ -142,7 +142,7 @@ or minor release, you _must not_ do this in a patch release.
Because background migrations can take a long time you can't immediately clean
things up after scheduling them. For example, you can't drop a column that's
used in the migration process as this would cause jobs to fail. This means that
-you'll need to add a separate _post deployment_ migration in a future release
+you need to add a separate _post deployment_ migration in a future release
that finishes any remaining jobs before cleaning things up (for example, removing a
column).
@@ -189,7 +189,7 @@ extract the `url` key from this JSON object and store it in the `integrations.ur
column. There are millions of integrations and parsing JSON is slow, thus you can't
do this in a regular migration.
-To do this using a background migration we'll start with defining our migration
+To do this using a background migration, we start by defining our migration
class:
```ruby
@@ -213,7 +213,7 @@ class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
end
```
-Next we'll need to adjust our code so we schedule the above migration for newly
+Next we need to adjust our code so we schedule the above migration for newly
created and updated integrations. We can do this using something along the lines of
the following:
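(The snippet itself is cut off by the diff context. A hedged sketch along those lines, with the hook name assumed:)

```ruby
# Sketch: schedule the migration after commit so the worker sees the row.
class Integration < ApplicationRecord
  after_commit :schedule_extract_integrations_url, on: [:create, :update]

  private

  def schedule_extract_integrations_url
    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
  end
end
```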
@@ -232,7 +232,7 @@ We're using `after_commit` here to ensure the Sidekiq job is not scheduled
before the transaction completes as doing so can lead to race conditions where
the changes are not yet visible to the worker.
-Next we'll need a post-deployment migration that schedules the migration for
+Next we need a post-deployment migration that schedules the migration for
existing data.
```ruby
@@ -254,11 +254,11 @@ class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
end
```
-Once deployed our application will continue using the data as before but at the
-same time will ensure that both existing and new data is migrated.
+Once deployed, our application continues using the data as before, but at the
+same time ensures that both existing and new data is migrated.
In the next release we can remove the `after_commit` hooks and related code. We
-will also need to add a post-deployment migration that consumes any remaining
+also need to add a post-deployment migration that consumes any remaining
jobs and manually run on any un-migrated rows. Such a migration would look like
this:
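(The migration body is cut off by the diff context. The following is a hedged sketch, not the document's own code; the NULL-`url` filter for unmigrated rows is an assumption.)

```ruby
# Sketch: drain queued jobs, then cover any unmigrated rows manually.
class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')

    # define_batchable_model gives a lightweight model for batching.
    define_batchable_model('integrations').where(url: nil).each_batch do |batch|
      range = batch.pluck(Arel.sql('MIN(id), MAX(id)')).first
      Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
    end
  end

  def down
    # no-op
  end
end
```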
@@ -292,7 +292,7 @@ If the application does not depend on the data being 100% migrated (for
instance, the data is advisory, and not mission-critical), then this final step
can be skipped.
-This migration will then process any jobs for the ExtractIntegrationsUrl migration
+This migration then processes any jobs for the `ExtractIntegrationsUrl` migration
and continues once all jobs have been processed. Once done, you can safely remove
the `integrations.properties` column.
@@ -325,13 +325,13 @@ for more details.
1. Make sure that tests you write are not false positives.
1. Make sure that if the data being migrated is critical and cannot be lost, the
clean-up migration also checks the final state of the data before completing.
-1. When migrating many columns, make sure it won't generate too many
+1. When migrating many columns, make sure it does not generate too many
dead tuples in the process (you may need to directly query the number of dead tuples
and adjust the scheduling according to this piece of data).
1. Make sure to discuss the numbers with a database specialist, the migration may add
more pressure on DB than you expect (measure on staging,
or ask someone to measure on production).
-1. Make sure to know how much time it'll take to run all scheduled migrations.
+1. Make sure to know how much time it takes to run all scheduled migrations.
1. Provide an estimation section in the description, estimating both the total migration
run time and the query times for each background migration job. Explain plans for each query
should also be provided.
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index d77cbd23935..6d3d5fa7f92 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -190,7 +190,7 @@ data to be in the new format.
The `routes` table has a `source_type` field that's used for a polymorphic relationship.
As part of a database redesign, we're removing the polymorphic relationship. One step of
-the work will be migrating data from the `source_id` column into a new singular foreign key.
+the work is migrating data from the `source_id` column into a new singular foreign key.
Because we intend to delete old rows later, there's no need to update them as part of the
background migration.
@@ -219,9 +219,9 @@ background migration.
NOTE:
Job classes must be subclasses of `BatchedMigrationJob` to be
correctly handled by the batched migration framework. Any subclass of
- `BatchedMigrationJob` will be initialized with necessary arguments to
+ `BatchedMigrationJob` is initialized with necessary arguments to
execute the batch, as well as a connection to the tracking database.
- Additional `job_arguments` set on the migration will be passed to the
+ Additional `job_arguments` set on the migration are passed to the
job's `perform` method.
1. Add a new trigger to the database to update newly created and updated routes,
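Putting the note above into a sketch for this `routes` backfill (the `each_sub_batch` usage is an assumption about this era's API, not confirmed by the diff):

```ruby
# Sketch: BatchedMigrationJob supplies batching and the tracking-database
# connection; job_arguments set on the migration are available to perform.
module Gitlab
  module BackgroundMigration
    class BackfillRouteNamespaceId < BatchedMigrationJob
      def perform
        each_sub_batch(operation_name: :update_all) do |sub_batch|
          sub_batch.update_all('namespace_id = source_id')
        end
      end
    end
  end
end
```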
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
index ef6ef232c9e..a2481577e8c 100644
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -1031,7 +1031,7 @@ Optimized `IN` query:
| issue lookup query | 519 | 20 | 10 000 |
The group and project queries are not using sorting, the necessary columns are read from database
-indexes. These values are accessed frequently so it's very likely that most of the data will be
+indexes. These values are accessed frequently so it's very likely that most of the data is
in the PostgreSQL's buffer cache.
The optimized `IN` query reads maximum 519 entries (cursor values) from the index:
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index 1a6d995e78c..dec51d484fd 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -607,20 +607,20 @@ SELECT TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, old_table.id FROM old_table;
The partition "sliding" process is controlled by two, regularly executed callbacks. These
callbacks are defined within the `LooseForeignKeys::DeletedRecord` model.
-The `next_partition_if` callback controls when to create a new partition. A new partition will
-be created when the current partition has at least one record older than 24 hours. A new partition
+The `next_partition_if` callback controls when to create a new partition. A new partition is
+created when the current partition has at least one record older than 24 hours. A new partition
is added by the [`PartitionManager`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/partitioning/partition_manager.rb)
using the following steps:
1. Create a new partition, where the `VALUE` for the partition is `CURRENT_PARTITION + 1`.
1. Update the default value of the `partition` column to `CURRENT_PARTITION + 1`.
-With these steps, new `INSERT`-s via the triggers will end up in the new partition. At this point,
+With these steps, all new `INSERT` queries via the triggers end up in the new partition. At this point,
the database table has two partitions.
The `detach_partition_if` callback determines if the old partitions can be detached from the table.
A partition is detachable if there are no pending (unprocessed) records in the partition
-(`status = 1`). The detached partitions will be available for some time, you can see the list
+(`status = 1`). The detached partitions are available for some time, and you can see the list of
detached partitions in the `detached_partitions` table:
```sql
@@ -663,7 +663,7 @@ WHERE ("merge_requests"."id") IN
These queries are batched, which means that in many cases, several invocations are needed to clean
up all associated child records.
-The batching is implemented with loops, the processing will stop when all associated child records
+The batching is implemented with loops; the processing stops when all associated child records
are cleaned up or the limit is reached.
```ruby
@@ -682,14 +682,14 @@ end
The loop-based batch processing is preferred over `EachBatch` for the following reasons:
-- The records in the batch are modified, so the next batch will contain different records.
+- The records in the batch are modified, so the next batch contains different records.
- There is always an index on the foreign key column; however, the column is usually not unique.
`EachBatch` requires a unique column for the iteration.
- The record order doesn't matter for the cleanup.
-Notice that we have two loops. The initial loop will process records with the `SKIP LOCKED` clause.
-The query will skip rows that are locked by other application processes. This will ensure that the
-cleanup worker will less likely to become blocked. The second loop will execute the database
+Notice that we have two loops. The initial loop processes records with the `SKIP LOCKED` clause.
+The query skips rows that are locked by other application processes. This ensures that the
+cleanup worker is less likely to become blocked. The second loop executes the database
queries without `SKIP LOCKED` to ensure that all records have been processed.
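In outline (a sketch with hypothetical helper names, not the worker's real API), the two passes look like:

```ruby
# Sketch of the two-pass strategy: first skip locked rows to avoid
# blocking on other processes, then process everything that remains.
[true, false].each do |skip_locked|
  loop do
    deleted = cleanup_batch(skip_locked: skip_locked) # assumed helper
    break if deleted.zero? || limit_reached?          # assumed helper
  end
end
```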
#### Processing limits
@@ -713,7 +713,7 @@ immediately. After some time, the next scheduled worker continues the cleanup pr
#### Performance characteristics
-The database trigger on the parent tables will **decrease** the record deletion speed. Each
+The database trigger on the parent tables **decreases** the record deletion speed. Each
statement that removes rows from the parent table invokes the trigger to insert records
into the `loose_foreign_keys_deleted_records` table.
@@ -721,7 +721,7 @@ The queries within the cleanup worker are fairly efficient index scans, with lim
they're unlikely to affect other parts of the application.
The database queries are not running in a transaction; when an error happens, for example a statement
-timeout or a worker crash, the next job will continue the processing.
+timeout or a worker crash, the next job continues the processing.
## Troubleshooting
@@ -730,13 +730,13 @@ timeout or a worker crash, the next job will continue the processing.
There can be cases where the workers need to process an unusually large amount of data. This can
happen under normal usage, for example when a large project or group is deleted. In this scenario,
there can be several million rows to be deleted or nullified. Due to the limits enforced by the
-worker, processing this data will take some time.
+worker, processing this data takes some time.
When cleaning up "heavy-hitters", the feature ensures fair processing by rescheduling larger
batches for later. This gives time for other deleted records to be processed.
For example, a project with millions of `ci_builds` records is deleted. The `ci_builds` records
-will be deleted by the loose foreign keys feature.
+are deleted by the loose foreign keys feature.
1. The cleanup worker is scheduled and picks up a batch of deleted `projects` records. The large
project is part of the batch.
@@ -746,7 +746,7 @@ project is part of the batch.
1. Go to step 1. The next cleanup worker continues the cleanup.
1. When the `cleanup_attempts` reaches 3, the batch is re-scheduled 10 minutes later by updating
the `consume_after` column.
-1. The next cleanup worker will process a different batch.
+1. The next cleanup worker processes a different batch.
We have Prometheus metrics in place to monitor the deleted record cleanup:
@@ -812,7 +812,7 @@ runtime.
LooseForeignKeys::CleanupWorker.new.perform
```
-When the cleanup is done, the older partitions will be automatically detached by the
+When the cleanup is done, the older partitions are automatically detached by the
`PartitionManager`.
### PartitionManager bug
diff --git a/doc/development/database/rename_database_tables.md b/doc/development/database/rename_database_tables.md
index 44f95b1d277..cbcbd507204 100644
--- a/doc/development/database/rename_database_tables.md
+++ b/doc/development/database/rename_database_tables.md
@@ -135,4 +135,4 @@ database, ActiveRecord fetches the column information again. At this time, our p
marked table (`TABLES_TO_BE_RENAMED`) instructs ActiveRecord to use the new database table name
when fetching the database table information.
-The new version of the application will use the new database table.
+The new version of the application uses the new database table.
diff --git a/doc/development/database/table_partitioning.md b/doc/development/database/table_partitioning.md
index c16e6486f99..582c988bef9 100644
--- a/doc/development/database/table_partitioning.md
+++ b/doc/development/database/table_partitioning.md
@@ -37,13 +37,13 @@ before attempting to leverage this feature.
While partitioning can be very useful when properly applied, it's
imperative to identify if the data and workload of a table naturally fit a
partitioning scheme. There are a few details you have to understand
-in order to decide if partitioning is a good fit for your particular
+to decide if partitioning is a good fit for your particular
problem.
First, a table is partitioned on a partition key, which is a column or
set of columns which determine how the data is split across the
partitions. The partition key is used by the database when reading or
-writing data, to decide which partitions need to be accessed. The
+writing data, to decide which partitions must be accessed. The
partition key should be a column that would be included in a `WHERE`
clause on almost all queries accessing that table.
@@ -51,8 +51,8 @@ Second, it's necessary to understand the strategy the database uses
to split the data across the partitions. The scheme supported by the
GitLab migration helpers is date-range partitioning, where each partition
in the table contains data for a single month. In this case, the partitioning
-key would need to be a timestamp or date column. In order for this type of
-partitioning to work well, most queries would need to access data within a
+key must be a timestamp or date column. In order for this type of
+partitioning to work well, most queries must access data in a
certain date range.
For a more concrete example, the `audit_events` table can be used, which
@@ -73,7 +73,7 @@ CREATE TABLE audit_events (
created_at timestamptz NOT NULL);
```
-Now imagine typical queries in the UI would display the data within a
+Now imagine typical queries in the UI would display the data in a
certain date range, like a single week:
```sql
@@ -136,11 +136,11 @@ LIMIT 100
In this example, the database can't prune any partitions from the search,
because matching data could exist in any of them. As a result, it has to
query each partition individually, and aggregate the rows into a single result
-set. Since `author_id` would be indexed, the performance impact could
+set. Because `author_id` would be indexed, the performance impact could
likely be acceptable, but on more complex queries the overhead can be
substantial. Partitioning should only be leveraged if the access patterns
-of the data support the partitioning strategy, otherwise performance will
-suffer.
+of the data support the partitioning strategy; otherwise, performance
+suffers.
## Partitioning a table
@@ -158,7 +158,7 @@ migration to copy data into the new table. Changes to the original table
schema can be made in parallel with the partitioning migration, but they
must take care to not break the underlying mechanism that makes the migration
work. For example, if a column is added to the table that is being
-partitioned, both the partitioned table and the trigger definition need to
+partitioned, both the partitioned table and the trigger definition must
be updated to match.
### Step 1: Creating the partitioned copy (Release N)
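The details of this step are cut off by the diff context; as a hedged sketch, the date-range helper this guide describes might be used roughly like this (the helper and class names are assumptions):

```ruby
# Sketch: creates the partitioned copy of audit_events keyed by created_at;
# sync triggers keep the copy up to date with the original table.
class PartitionAuditEvents < Gitlab::Database::Migration[1.0]
  include Gitlab::Database::PartitioningMigrationHelpers

  def up
    partition_table_by_date :audit_events, :created_at
  end

  def down
    drop_partitioned_table_for :audit_events
  end
end
```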
@@ -252,7 +252,7 @@ original table guarantees that the data remains in sync going forward.
### Step 4: Swap the partitioned and non-partitioned tables (Release N+1)
-The final step of the migration will make the partitioned table ready
+The final step of the migration makes the partitioned table ready
for use by the application. This section will be updated when the
migration helper is ready; for now, development can be followed in the
[Tracking Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/241267).
diff --git a/doc/development/database_review.md b/doc/development/database_review.md
index e5795d71efc..2b215190e6d 100644
--- a/doc/development/database_review.md
+++ b/doc/development/database_review.md
@@ -6,7 +6,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Database Review Guidelines
-This page is specific to database reviews. Please refer to our
+This page is specific to database reviews. Refer to our
[code review guide](code_review.md) for broader advice and best
practices for code review in general.
@@ -47,7 +47,7 @@ If new migrations are introduced, in the MR **you are required to provide**:
- The output of both migrating (`db:migrate`) and rolling back (`db:rollback`) for all migrations.
-Note that we have automated tooling for
+We have automated tooling for
[GitLab](https://gitlab.com/gitlab-org/gitlab) (provided by the
[`db:check-migrations`](database/dbcheck-migrations-job.md) pipeline job) that provides this output for migrations on
~database merge requests. You do not need to provide this information manually
@@ -88,7 +88,7 @@ A database **maintainer**'s role is to:
database reviewer and the MR author.
- Finally approve the MR and relabel the MR with ~"database::approved"
- Merge the MR if no other approvals are pending or pass it on to
- other maintainers as required (frontend, backend, docs).
+ other maintainers as required (frontend, backend, documentation).
- If not merging, remove yourself as a reviewer.
### Distributing review workload
@@ -106,7 +106,7 @@ make sure you have applied the ~database label and rerun the
### How to prepare the merge request for a database review
-In order to make reviewing easier and therefore faster, please take
+To make reviewing easier and therefore faster, take
the following preparations into account.
#### Preparation when adding migrations
@@ -139,7 +139,7 @@ of error that would result in corruption or loss of production data.
Include in the MR description:
-- If the migration itself is not reversible, details of how data changes could be reverted in the event of an incident. For example, in the case of a migration that deletes records (an operation that most of the times is not automatically revertable), how _could_ the deleted records be recovered.
+- If the migration itself is not reversible, details of how data changes could be reverted in the event of an incident. For example, in the case of a migration that deletes records (an operation that most of the time is not automatically reversible), how _could_ the deleted records be recovered.
- If the migration deletes data, apply the label `~data-deletion`.
- Concise descriptions of possible user experience impact of an error; for example, "Issues would unexpectedly go missing from Epics".
- Relevant data from the [query plans](#query-plans) that indicate the query works as expected; such as the approximate number of records that are modified or deleted.
@@ -157,7 +157,7 @@ Include in the MR description:
- In case of queries generated dynamically by using parameters, there should be one raw SQL query for each variation.
For example, a finder for issues that may take as a parameter an optional filter on projects,
- should include both the version of the simple query over issues and the one that joins issues
+ should include both the version of the query over issues and the one that joins issues
and projects and applies the filter.
There are finders or other methods that can generate a very large number of permutations.
@@ -167,7 +167,7 @@ Include in the MR description:
For example, if joins or a group by clause are optional, the versions without the group by clause
and with fewer joins should also be included, while keeping the appropriate filters for the remaining tables.
-- If a query is going to be always used with a limit and an offset, those should always be
+- If a query is always used with a limit and an offset, those should always be
included with the maximum allowed limit used and a non 0 offset.
##### Query Plans
@@ -175,7 +175,7 @@ Include in the MR description:
- The query plan for each raw SQL query included in the merge request along with the link to the query plan following each raw SQL snippet.
- Provide a public link to the plan from either:
- [postgres.ai](https://postgres.ai/): Follow the link in `#database-lab` and generate a shareable, public link
- by clicking the **Share** button in the upper right corner.
+ by clicking **Share** in the upper right corner.
- [explain.depesz.com](https://explain.depesz.com) or [explain.dalibo.com](https://explain.dalibo.com): Paste both the plan and the query used in the form.
- When providing query plans, make sure it hits enough data:
- You can use a GitLab production replica to test your queries on a large scale,
@@ -204,11 +204,11 @@ Include in the MR description:
- Add foreign keys to any columns pointing to data in other tables, including [an index](migration_style_guide.md#adding-foreign-key-constraints).
- Add indexes for fields that are used in statements such as `WHERE`, `ORDER BY`, `GROUP BY`, and `JOIN`s.
- New tables and columns are not necessarily risky, but over time some access patterns are inherently
- difficult to scale. To identify these risky patterns in advance, we need to document expectations for
+ difficult to scale. To identify these risky patterns in advance, we must document expectations for
access and size. Include in the MR description answers to these questions:
- What is the anticipated growth for the new table over the next 3 months, 6 months, 1 year? What assumptions are these based on?
- How many reads and writes per hour would you expect this table to have in 3 months, 6 months, 1 year? Under what circumstances are rows updated? What assumptions are these based on?
- - Based on the anticipated data volume and access patterns, does the new table pose an availability risk to GitLab.com or self-managed instances? Will the proposed design scale to support the needs of GitLab.com and self-managed customers?
+ - Based on the anticipated data volume and access patterns, does the new table pose an availability risk to GitLab.com or self-managed instances? Does the proposed design scale to support the needs of GitLab.com and self-managed customers?
#### Preparation when removing columns, tables, indexes, or other structures
@@ -235,7 +235,7 @@ Include in the MR description:
- Check consistency with `db/structure.sql` and that migrations are [reversible](migration_style_guide.md#reversibility)
- Check that the relevant version files under `db/schema_migrations` were added or removed.
- Check queries timing (If any): In a single transaction, cumulative query time executed in a migration
- needs to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
+ needs to fit comfortably in `15s` - preferably much less than that - on GitLab.com.
- For column removals, make sure the column has been [ignored in a previous release](database/avoiding_downtime_in_migrations.md#dropping-columns)
- Check [background migrations](database/background_migrations.md):
- Establish a time estimate for execution on GitLab.com. For historical purposes,
@@ -266,7 +266,7 @@ Include in the MR description:
- Query performance
- Check for any overly complex queries and queries the author specifically
points out for review (if any)
- - If not present yet, ask the author to provide SQL queries and query plans
+ - If not present, ask the author to provide SQL queries and query plans
(for example, by using [ChatOps](understanding_explain_plans.md#chatops) or direct
database access)
- For given queries, review parameters regarding data distribution
diff --git a/doc/development/fips_compliance.md b/doc/development/fips_compliance.md
index d545e162e60..5b6f6ba0d98 100644
--- a/doc/development/fips_compliance.md
+++ b/doc/development/fips_compliance.md
@@ -1,6 +1,6 @@
---
-stage: none
-group: unassigned
+stage: Create
+group: Source Code
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/serializing_data.md b/doc/development/serializing_data.md
index 76e9dc0617e..97e6f665484 100644
--- a/doc/development/serializing_data.md
+++ b/doc/development/serializing_data.md
@@ -20,7 +20,7 @@ end
```
While it may be tempting to store serialized data in the database, there are many
-problems with this. This document will outline these problems and provide an
+problems with this. This document outlines these problems and provides an
alternative.
## Serialized Data Is Less Powerful
@@ -34,13 +34,13 @@ turn there's no way to query the data at all.
## Waste Of Space
-Storing serialized data such as JSON or YAML will end up wasting a lot of space.
+Storing serialized data such as JSON or YAML ends up wasting a lot of space.
This is because these formats often include additional characters (for example, double
quotes or newlines) besides the data that you are storing.
## Difficult To Manage
-There comes a time where you will need to add a new field to the serialized
+There comes a time when you must add a new field to the serialized
data, or change an existing one. Using serialized data this becomes difficult
and very time consuming as the only way of doing so is to re-write all the
stored values. To do so you would have to:
@@ -51,8 +51,7 @@ stored values. To do so you would have to:
1. Serialize it back to a String
1. Store it in the database
-On the other hand, if one were to use regular columns adding a column would be
-as easy as this:
+On the other hand, if one were to use regular columns, adding a column would be:
```sql
ALTER TABLE table_name ADD COLUMN column_name type;
@@ -62,9 +61,9 @@ Such a query would take very little to no time and would immediately apply to
all rows, without having to re-write large JSON or YAML structures.
Finally, there comes a time when the JSON or YAML structure is no longer
-sufficient and you need to migrate away from it. When storing only a few rows
+sufficient and you must migrate away from it. When storing only a few rows
this may not be a problem, but when storing millions of rows such a migration
-can easily take hours or even days to complete.
+can take hours or even days to complete.
## Relational Databases Are Not Document Stores
@@ -85,7 +84,7 @@ you don't need.
## The Solution
-The solution is very simple: just use separate columns and/or separate tables.
-This will allow you to use all the features provided by your database, it will
-make it easier to manage and migrate the data, you'll conserve space, you can
+The solution is to use separate columns and/or separate tables.
+This allows you to use all the features provided by your database, makes it
+easier to manage and migrate the data, conserves space, and lets you
index the data efficiently and so forth.
diff --git a/doc/development/verifying_database_capabilities.md b/doc/development/verifying_database_capabilities.md
index 9433e9d5f78..55347edf4ec 100644
--- a/doc/development/verifying_database_capabilities.md
+++ b/doc/development/verifying_database_capabilities.md
@@ -34,5 +34,5 @@ to be wrapped in a `Gitlab::Database.read_only?` or `Gitlab::Database.read_write
guard, to make sure it doesn't for read-only databases.
We have a Rails Middleware that filters any potentially writing
-operations (the CUD operations of CRUD) and prevent the user from trying
+operations (the `CUD` operations of CRUD) and prevents the user from trying
to update the database and getting a 500 error (see `Gitlab::Middleware::ReadOnly`).
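A minimal sketch of the `Gitlab::Database.read_only?` guard mentioned above (the method and model are illustrative):

```ruby
# Sketch: skip the write entirely when the database is read-only,
# rather than letting Gitlab::Middleware::ReadOnly reject the request.
def track_visit(user)
  return if Gitlab::Database.read_only?

  Visit.create!(user: user) # illustrative write operation
end
```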
diff --git a/doc/user/project/merge_requests/drafts.md b/doc/user/project/merge_requests/drafts.md
index 8e193fb84d0..13cc68f02dd 100644
--- a/doc/user/project/merge_requests/drafts.md
+++ b/doc/user/project/merge_requests/drafts.md
@@ -30,7 +30,7 @@ There are several ways to flag a merge request as a draft:
- **Commenting in an existing merge request**: Add the `/draft`
[quick action](../quick_actions.md#issues-merge-requests-and-epics)
in a comment. This quick action is a toggle, and can be repeated to change the status
- again. This quick action discards any other text in the comment.
+ back to Ready.
- **Creating a commit**: Add `draft:`, `Draft:`, `fixup!`, or `Fixup!` to the
beginning of a commit message targeting the merge request's source branch. This
is not a toggle, and adding this text again in a later commit doesn't mark the
@@ -49,10 +49,9 @@ When a merge request is ready to be merged, you can remove the `Draft` flag in s
- **Editing an existing merge request**: Remove `[Draft]`, `Draft:` or `(Draft)`
from the beginning of the title, or select **Remove the Draft: prefix from the title**
below the **Title** field.
-- **Commenting in an existing merge request**: Add the `/draft`
+- **Commenting in an existing merge request**: Add the `/ready`
[quick action](../quick_actions.md#issues-merge-requests-and-epics)
- in a comment in the merge request. This quick action is a toggle, and can be repeated
- to change the status back. This quick action discards any other text in the comment.
+ in a comment in the merge request.
In [GitLab 13.10 and later](https://gitlab.com/gitlab-org/gitlab/-/issues/15332),
when you mark a merge request as ready, notifications are triggered to
diff --git a/doc/user/project/quick_actions.md b/doc/user/project/quick_actions.md
index b73b381ffcd..d5a7058d3d2 100644
--- a/doc/user/project/quick_actions.md
+++ b/doc/user/project/quick_actions.md
@@ -67,7 +67,7 @@ threads. Some quick actions might not be available to all subscription tiers.
| `/copy_metadata <#issue>` | **{check-circle}** Yes | **{check-circle}** Yes | **{dotted-circle}** No | Copy labels and milestone from another issue in the project. |
| `/create_merge_request <branch name>` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Create a new merge request starting from the current issue. |
| `/done` | **{check-circle}** Yes | **{check-circle}** Yes | **{check-circle}** Yes | Mark to do as done. |
-| `/draft` | **{dotted-circle}** No | **{check-circle}** Yes | **{dotted-circle}** No | Toggle the draft status. |
+| `/draft` | **{dotted-circle}** No | **{check-circle}** Yes | **{dotted-circle}** No | Toggle the [draft status](merge_requests/drafts.md). |
| `/due <date>` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Set due date. Examples of valid `<date>` include `in 2 days`, `this Friday` and `December 31st`. |
| `/duplicate <#issue>` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Close this issue and mark as a duplicate of another issue. Also, mark both as related. |
| `/epic <epic>` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Add to epic `<epic>`. The `<epic>` value should be in the format of `&epic`, `group&epic`, or a URL to an epic. |
@@ -85,6 +85,7 @@ threads. Some quick actions might not be available to all subscription tiers.
| `/promote_to_incident` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Promote issue to incident ([introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/296787) in GitLab 14.5). |
| `/page <policy name>` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Start escalations for the incident ([introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/79977) in GitLab 14.9). |
| `/publish` | **{check-circle}** Yes | **{dotted-circle}** No | **{dotted-circle}** No | Publish issue to an associated [Status Page](../../operations/incident_management/status_page.md) ([Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/30906) in GitLab 13.0) |
+| `/ready` | **{dotted-circle}** No | **{check-circle}** Yes | **{dotted-circle}** No | Set the [ready status](merge_requests/drafts.md#mark-merge-requests-as-ready) ([Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/90361) in GitLab 15.1). |
| `/reassign @user1 @user2` | **{check-circle}** Yes | **{check-circle}** Yes | **{dotted-circle}** No | Replace current assignees with those specified. |
| `/reassign_reviewer @user1 @user2` | **{dotted-circle}** No | **{check-circle}** Yes | **{dotted-circle}** No | Replace current reviewers with those specified. |
| `/rebase` | **{dotted-circle}** No | **{check-circle}** Yes | **{dotted-circle}** No | Rebase source branch. This schedules a background task that attempts to rebase the changes in the source branch on the latest commit of the target branch. If `/rebase` is used, `/merge` is ignored to avoid a race condition where the source branch is merged or deleted before it is rebased. If there are merge conflicts, GitLab displays a message that a rebase cannot be scheduled. Rebase failures are displayed with the merge request status. |
diff --git a/doc/user/tasks.md b/doc/user/tasks.md
index 4ec6a851b26..fc49661c61c 100644
--- a/doc/user/tasks.md
+++ b/doc/user/tasks.md
@@ -6,13 +6,13 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Tasks **(FREE)**
-> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/334812) in GitLab 14.5 [with a flag](../administration/feature_flags.md) named `work_items`. Disabled by default.
-> - [Enabled on GitLab.com and self-managed](https://gitlab.com/gitlab-org/gitlab/-/issues/339664) in GitLab 15.1.
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/334812) in GitLab 14.5 [with a flag](../administration/feature_flags.md) named `work_items`. Disabled by default.
FLAG:
-On self-managed GitLab, by default this feature is available. To hide the feature,
-ask an administrator to [disable the feature flag](../administration/feature_flags.md) named `work_items`.
-On GitLab.com, this feature is available.
+On self-managed GitLab, by default this feature is not available. To make it available,
+ask an administrator to [enable the feature flag](../administration/feature_flags.md) named `work_items`.
+On GitLab.com, this feature is not available.
+The feature is not ready for production use.
Use tasks to track steps needed for the [issue](project/issues/index.md) to be closed.