
gitlab.com/gitlab-org/gitlab-foss.git
author     GitLab Bot <gitlab-bot@gitlab.com>  2022-06-07 21:09:27 +0300
committer  GitLab Bot <gitlab-bot@gitlab.com>  2022-06-07 21:09:27 +0300
commit     5cda8c8a420399ca9687c4a981fefd50ce5a1fdd (patch)
tree       6050d7517a36798c9586e153df20a0696c5fcd4f /doc/development
parent     7bbc731c75d0b8bf7c74ba77d521266d2ed0a1fc (diff)
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development')
-rw-r--r--  doc/development/database/efficient_in_operator_queries.md      |  4
-rw-r--r--  doc/development/database/loose_foreign_keys.md                  | 22
-rw-r--r--  doc/development/database/migrations_for_multiple_databases.md   |  4
-rw-r--r--  doc/development/database/strings_and_the_text_data_type.md      | 16
-rw-r--r--  doc/development/database_query_comments.md                      |  4
-rw-r--r--  doc/development/insert_into_tables_in_batches.md                | 24
-rw-r--r--  doc/development/ordering_table_columns.md                       | 14
-rw-r--r--  doc/development/query_performance.md                            |  8
-rw-r--r--  doc/development/swapping_tables.md                              | 12
-rw-r--r--  doc/development/understanding_explain_plans.md                  | 42
10 files changed, 75 insertions, 75 deletions
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
index 4f18c937198..ef6ef232c9e 100644
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -26,7 +26,7 @@ Pagination may be used to fetch subsequent records.
Example tasks requiring querying nested domain objects from the group level:
- Show first 20 issues by creation date or due date from the group `gitlab-org`.
-- Show first 20 merge_requests by merged at date from the group `gitlab-com`.
+- Show first 20 merge requests by merged at date from the group `gitlab-com`.
Unfortunately, ordered group-level queries typically perform badly
as their executions require heavy I/O, memory, and computations.
@@ -877,7 +877,7 @@ this cursor would be (`2020-01-05`, `3`) for `project_id=9`.
### Initializing the recursive CTE query
-For the initial recursive query, we'll need to produce exactly one row, we call this the
+For the initial recursive query, we need to produce exactly one row; we call this the
initializer query (`initializer_query`).
Use the `ARRAY_AGG` function to compact the initial result set into a single row
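For illustration only, an initializer built this way might look like the following; the `issues` table, filter, and ordering are assumptions for the sketch, not the exact query from this page.

```sql
-- Minimal sketch: collapse the first matching row into arrays so the
-- recursive part of the CTE always starts from exactly one row.
SELECT ARRAY_AGG(created_at) AS created_at, ARRAY_AGG(id) AS id
FROM (
  SELECT created_at, id
  FROM issues
  WHERE project_id = 9
  ORDER BY created_at ASC, id ASC
  LIMIT 1
) AS initializer;
```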
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index 6889b9123ca..1a6d995e78c 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -515,13 +515,13 @@ referenced child tables.
### Database structure
The feature relies on triggers installed on the parent tables. When a parent record is deleted,
-the trigger will automatically insert a new record into the `loose_foreign_keys_deleted_records`
+the trigger automatically inserts a new record into the `loose_foreign_keys_deleted_records`
database table.
-The inserted record will store the following information about the deleted record:
+The inserted record stores the following information about the deleted record:
- `fully_qualified_table_name`: name of the database table where the record was located.
-- `primary_key_value`: the ID of the record, the value will be present in the child tables as
+- `primary_key_value`: the ID of the record; the value is present in the child tables as
the foreign key value. At the moment, composite primary keys are not supported; the parent table
must have an `id` column.
- `status`: defaults to pending, represents the status of the cleanup process.
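For illustration, a row recorded by the trigger could look like this; the deleted `ci_pipelines` ID and the simplified `INSERT` are assumptions, only the table and column names come from this page.

```sql
-- Illustrative only: what gets recorded when a ci_pipelines row with id 102
-- is deleted; the status column keeps its pending default.
INSERT INTO loose_foreign_keys_deleted_records
  (fully_qualified_table_name, primary_key_value)
VALUES ('public.ci_pipelines', 102);
```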
@@ -532,7 +532,7 @@ several runs.
#### Database decomposition
-The `loose_foreign_keys_deleted_records` table will exist on both database servers (Ci and Main)
+The `loose_foreign_keys_deleted_records` table exists on both database servers (`ci` and `main`)
after the [database decomposition](https://gitlab.com/groups/gitlab-org/-/epics/6168). The worker
will determine which parent tables belong to which database by reading the
`lib/gitlab/database/gitlab_schemas.yml` YAML file.
@@ -547,10 +547,10 @@ Example:
- `ci_builds`
- `ci_pipelines`
-When the worker is invoked for the Ci database, the worker will load deleted records only from the
+When the worker is invoked for the `ci` database, the worker loads deleted records only from the
`ci_builds` and `ci_pipelines` tables. During the cleanup process, `DELETE` and `UPDATE` queries
-will mostly run on tables located in the Main database. In this example, one `UPDATE` query will
-nullify the `merge_requests.head_pipeline_id` column.
+mostly run on tables located in the `main` database. In this example, one `UPDATE` query
+nullifies the `merge_requests.head_pipeline_id` column.
#### Database partitioning
@@ -561,7 +561,7 @@ strategy was considered for the feature but due to the large data volume we deci
new strategy.
A deleted record is considered fully processed when all its direct children records have been
-cleaned up. When this happens, the loose foreign key worker will update the `status` column of
+cleaned up. When this happens, the loose foreign key worker updates the `status` column of
the deleted record. After this step, the record is no longer needed.
The sliding partitioning strategy provides an efficient way of cleaning up old, unused data by
@@ -591,7 +591,7 @@ Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_84 FOR
```
The `partition` column controls the insert direction; the `partition` value determines which
-partition will get the deleted rows inserted via the trigger. Notice that the default value of
+partition gets the deleted rows inserted via the trigger. Notice that the default value of
the `partition` column matches the value of the list partition (84). In the `INSERT` query
within the trigger, the value of `partition` is omitted; the trigger always relies on the
default value of the column.
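As a sketch of that mechanism (the statement is illustrative, not the actual partition-management code): because the trigger's `INSERT` omits `partition`, changing the column default is enough to redirect newly deleted records into the next sliding partition.

```sql
-- Hypothetical example: move the insert direction from partition 84 to 85.
ALTER TABLE loose_foreign_keys_deleted_records
  ALTER COLUMN partition SET DEFAULT 85;
```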
@@ -709,12 +709,12 @@ To mitigate these issues, several limits are applied when the worker runs.
The limit rules are implemented in the `LooseForeignKeys::ModificationTracker` class. When one of
the limits (record modification count, time limit) is reached, the processing is stopped
-immediately. After some time, the next scheduled worker will continue the cleanup process.
+immediately. After some time, the next scheduled worker continues the cleanup process.
#### Performance characteristics
The database trigger on the parent tables will **decrease** the record deletion speed. Each
-statement that removes rows from the parent table will invoke the trigger to insert records
+statement that removes rows from the parent table invokes the trigger to insert records
into the `loose_foreign_keys_deleted_records` table.
The queries within the cleanup worker are fairly efficient index scans, with limits in place
diff --git a/doc/development/database/migrations_for_multiple_databases.md b/doc/development/database/migrations_for_multiple_databases.md
index 2500071f4cf..df9607f5672 100644
--- a/doc/development/database/migrations_for_multiple_databases.md
+++ b/doc/development/database/migrations_for_multiple_databases.md
@@ -260,7 +260,7 @@ the `database_tasks: false` set. `gitlab:db:validate_config` always runs before
## Validation
-Validation in a nutshell uses [pg_query](https://github.com/pganalyze/pg_query) to analyze
+In a nutshell, validation uses [`pg_query`](https://github.com/pganalyze/pg_query) to analyze
each query and classify tables with information from [`gitlab_schema.yml`](multiple_databases.md#gitlab-schema).
The migration is skipped if the specified `gitlab_schema` is outside of a list of schemas
managed by a given database connection (`Gitlab::Database::gitlab_schemas_for_connection`).
@@ -427,7 +427,7 @@ updating all `ci_pipelines`, you would set
As with all DML migrations, you cannot query another database outside of
`restrict_gitlab_migration` or `gitlab_shared`. If you need to query another database,
-you'll likely need to separate these into two migrations somehow.
+separate the migrations.
Because the actual migration logic (not the queueing step) for background
migrations runs in a Sidekiq worker, the logic can perform DML queries on
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index d764e54aa76..73e023f8d45 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0.
-When adding new columns that will be used to store strings or other textual information:
+When adding new columns to store strings or other textual information:
1. We always use the `text` data type instead of the `string` data type.
1. `text` columns should always have a limit set, either by using the `create_table` with
@@ -142,8 +142,8 @@ instance of GitLab could have such records, so we would follow the same process
We first add the limit as a `NOT VALID` check constraint to the table, which enforces consistency when
new records are inserted or current records are updated.
-In the example above, the existing issues with more than 1024 characters in their title will not be
-affected and you'll be still able to update records in the `issues` table. However, when you'd try
+In the example above, the existing issues with more than 1024 characters in their title are not
+affected, and you are still able to update records in the `issues` table. However, when you try
to update the `title_html` with a title that has more than 1024 characters, the constraint causes
a database error.
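For reference, a `NOT VALID` check constraint like the one described here corresponds roughly to the following SQL; the constraint name is illustrative, and in practice the change is made through the migration helpers.

```sql
-- Sketch: new and updated rows must satisfy the limit; existing rows are
-- not checked until the constraint is validated in a later release.
ALTER TABLE issues
  ADD CONSTRAINT check_title_html_length
  CHECK (char_length(title_html) <= 1024) NOT VALID;
```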
@@ -182,7 +182,7 @@ end
#### Data migration to fix existing records (current release)
The approach here depends on the data volume and the cleanup strategy. The number of records that must
-be fixed on GitLab.com is a nice indicator that will help us decide whether to use a post-deployment
+be fixed on GitLab.com is a nice indicator that helps us decide whether to use a post-deployment
migration or a background data migration:
- If the data volume is less than `1,000` records, then the data migration can be executed within the post-migration.
@@ -233,7 +233,7 @@ You can find more information on the guide about [background migrations](backgro
#### Validate the text limit (next release)
-Validating the text limit will scan the whole table and make sure that each record is correct.
+Validating the text limit scans the whole table and makes sure that each record is correct.
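Under the hood, validating the constraint maps to roughly this statement (constraint name illustrative), which is why the whole table is scanned:

```sql
-- Sketch: validation reads every row but only takes a light lock,
-- so it can run as a post-deployment migration.
ALTER TABLE issues VALIDATE CONSTRAINT check_title_html_length;
```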
Still in our example, for the 13.1 milestone (next), we run the `validate_text_limit` migration
helper in a final post-deployment migration,
@@ -276,11 +276,11 @@ end
## Text limit constraints on large tables
If you have to clean up a text column for a really [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
-(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
+(for example, the `artifacts` in `ci_builds`), your background migration goes on for a while and
+it needs an additional [background migration cleaning up](background_migrations.md#cleaning-up)
in the release after adding the data migration.
-In that rare case you will need 3 releases end-to-end:
+In that rare case, you need 3 releases end-to-end:
1. Release `N.M` - Add the text limit and the background migration to fix the existing records.
1. Release `N.M+1` - Cleanup the background migration.
diff --git a/doc/development/database_query_comments.md b/doc/development/database_query_comments.md
index 9cfa5540c83..2798071bc06 100644
--- a/doc/development/database_query_comments.md
+++ b/doc/development/database_query_comments.md
@@ -12,7 +12,7 @@ queries generated by ActiveRecord.
It is very useful for tracing problematic queries back to the application source.
-An engineer during an on-call incident will have the full context of a query
+During an on-call incident, an engineer has the full context of a query
and its application source from the comments.
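As an illustration, a comment-annotated query might look something like the following; the values are made up, and the exact keys are listed in the next section.

```sql
-- Illustrative only; real queries carry the metadata keys listed below.
SELECT "routes".* FROM "routes" WHERE "routes"."source_id" = 75
/*application:web,endpoint_id:ProjectsController#show,line:app/models/project.rb:123*/
```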
## Metadata information in comments
@@ -24,7 +24,7 @@ Queries generated from **Rails** include the following metadata in comments:
- `endpoint_id`
- `line`
-Queries generated from **Sidekiq** workers will include the following metadata
+Queries generated from **Sidekiq** workers include the following metadata
in comments:
- `application`
diff --git a/doc/development/insert_into_tables_in_batches.md b/doc/development/insert_into_tables_in_batches.md
index c8bb4ce1c6d..ebed3d16319 100644
--- a/doc/development/insert_into_tables_in_batches.md
+++ b/doc/development/insert_into_tables_in_batches.md
@@ -48,7 +48,7 @@ records = [MyModel.new, ...]
MyModel.bulk_insert!(records)
```
-Note that calls to `bulk_insert!` will always attempt to insert _new records_. If instead
+Calls to `bulk_insert!` always attempt to insert _new records_. If instead
you would like to replace existing records with new values, while still inserting those
that do not already exist, then you can use `bulk_upsert!`:
@@ -59,9 +59,9 @@ MyModel.bulk_upsert!(records, unique_by: [:name])
```
In this example, `unique_by` specifies the columns by which records are considered to be
-unique and as such will be updated if they existed prior to insertion. For example, if
+unique and as such are updated if they existed prior to insertion. For example, if
`existing_model` has a `name` attribute, and if a record with the same `name` value already
-exists, its fields will be updated with those of `existing_model`.
+exists, its fields are updated with those of `existing_model`.
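Conceptually, this maps to a PostgreSQL upsert along the following lines; the table, columns, and values are placeholders, not the SQL the helper actually generates.

```sql
-- Sketch of an upsert keyed on the unique "name" column.
INSERT INTO my_models (name, value, updated_at)
VALUES ('example', 42, now())
ON CONFLICT (name)
DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```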
The `unique_by` parameter can also be passed as a `Symbol`, in which case it specifies
a database index by which a column is considered unique:
@@ -72,8 +72,8 @@ MyModel.bulk_insert!(records, unique_by: :index_on_name)
### Record validation
-The `bulk_insert!` method guarantees that `records` will be inserted transactionally, and
-will run validations on each record prior to insertion. If any record fails to validate,
+The `bulk_insert!` method guarantees that `records` are inserted transactionally, and
+runs validations on each record prior to insertion. If any record fails to validate,
an error is raised and the transaction is rolled back. You can turn off validations via
the `:validate` option:
@@ -83,7 +83,7 @@ MyModel.bulk_insert!(records, validate: false)
### Batch size configuration
-In those cases where the number of `records` is above a given threshold, insertions will
+In those cases where the number of `records` is above a given threshold, insertions
occur in multiple batches. The default batch size is defined in
[`BulkInsertSafe::DEFAULT_BATCH_SIZE`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
Assuming a default threshold of 500, inserting 950 records
@@ -95,7 +95,7 @@ MyModel.bulk_insert!(records, batch_size: 100)
```
Assuming the same number of 950 records, this would result in 10 batches being written instead.
-Since this will also affect the number of `INSERT`s that occur, make sure you measure the
+Since this also affects the number of `INSERT` statements that occur, make sure you measure the
performance impact this might have on your code. There is a trade-off between the number of
`INSERT` statements the database has to process and the size and cost of each `INSERT`.
@@ -127,7 +127,7 @@ records are inserted in bulk, we currently prevent their use.
The specifics around which callbacks are explicitly allowed are defined in
[`BulkInsertSafe`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
Consult the module source code for details. If your class uses callbacks that are not explicitly designated
-safe and you `include BulkInsertSafe` the application will fail with an error.
+safe and you `include BulkInsertSafe`, the application fails with an error.
### `BulkInsertSafe` versus `InsertAll`
@@ -155,7 +155,7 @@ owner = OwnerModel.new(owned_relations: array_of_owned_relations)
owner.save!
```
-This will issue a single `INSERT`, and transaction, for every record in `owned_relations`, which is inefficient if
+This issues a single `INSERT` (and transaction) for every record in `owned_relations`, which is inefficient if
`array_of_owned_relations` is large. To remedy this, the `BulkInsertableAssociations` concern can be
used to declare that the owner defines associations that are safe for bulk insertion:
@@ -180,8 +180,8 @@ BulkInsertableAssociations.with_bulk_insert do
end
```
-Note that you can still save relations that are not `BulkInsertSafe` in this block; they will
-simply be treated as if you had invoked `save` from outside the block.
+You can still save relations that are not `BulkInsertSafe` in this block; they
+are simply treated as if you had invoked `save` from outside the block.
## Known limitations
@@ -192,5 +192,5 @@ There are a few restrictions to how these APIs can be used:
- It does not yet support `has_many through: ...` relations.
Moreover, input data should either be limited to around 1000 records at most,
-or already batched prior to calling bulk insert. The `INSERT` statement will run in a single
+or already batched prior to calling bulk insert. The `INSERT` statement runs in a single
transaction, so for large amounts of records it may negatively affect database stability.
diff --git a/doc/development/ordering_table_columns.md b/doc/development/ordering_table_columns.md
index 42e5588e010..7cd3d4fb208 100644
--- a/doc/development/ordering_table_columns.md
+++ b/doc/development/ordering_table_columns.md
@@ -24,15 +24,15 @@ The first column is a 4-byte integer. The next is text of variable length. The
bytes. To meet the alignment requirements, four zeros are to be added right
after the first column, so `id` occupies 4 bytes, then 4 bytes of alignment
padding, and only then is `name` stored. Therefore, in this case, 8 bytes
-will be spent for storing a 4-byte integer.
+are spent for storing a 4-byte integer.
The space between rows is also subject to alignment padding. The `user_id`
-column takes only 4 bytes, and on 64-bit platform, 4 zeroes will be added for
+column takes only 4 bytes, and on a 64-bit platform, 4 zeroes are added for
alignment padding, to allow storing the next row beginning with the "clear" word.
As a result, the actual size of each column would be (omitting variable length
data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that
-each row will require at least 16 bytes for the two 4-byte integers. If a table
+each row requires at least 16 bytes for the two 4-byte integers. If a table
has only a few rows, this is not an issue. However, once you start storing millions of
rows you can save space by using a different order. For the above example, the
ideal column order would be the following:
@@ -49,7 +49,7 @@ or
In these examples, the `id` and `user_id` columns are packed together, which
means we only need 8 bytes to store _both_ of them. This in turn means each row
-will require 8 bytes less space.
+requires 8 bytes less space.
Since Ruby on Rails 5.1, the default data type for IDs is `bigint`, which uses 8 bytes.
We are using `integer` in the examples to showcase a more realistic reordering scenario.
@@ -57,7 +57,7 @@ We are using `integer` in the examples to showcase a more realistic reordering s
## Type Sizes
While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty
-of information we will list the sizes of common types here so it's easier to
+of information, we list the sizes of common types here so it's easier to
look them up. Here "word" refers to the word size, which is 4 bytes on a 32-bit
platform and 8 bytes on a 64-bit platform.
@@ -69,7 +69,7 @@ bits platform and 8 bytes for a 64 bits platform.
| `real` | 4 bytes | 1 word |
| `double precision` | 8 bytes | 8 bytes |
| `boolean` | 1 byte | not needed |
-| `text` / `string` | variable, 1 byte plus the data | 1 word |
+| `text` / `string` | variable, 1 byte plus the data | 1 word |
| `bytea` | variable, 1 or 4 bytes plus the data | 1 word |
| `timestamp` | 8 bytes | 8 bytes |
| `timestamptz` | 8 bytes | 8 bytes |
@@ -77,7 +77,7 @@ bits platform and 8 bytes for a 64 bits platform.
A "variable" size means the actual size depends on the value being stored. If
PostgreSQL determines this can be embedded directly into a row it may do so, but
-for very large values it will store the data externally and store a pointer (of
+for very large values it stores the data externally and stores a pointer (of
1 word in size) in the column. Because of this, variable-sized columns should
always be at the end of a table.
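Putting these guidelines together, a hypothetical table ordered from fixed 8-byte columns down to variable-length ones might be defined like this:

```sql
-- Illustrative ordering: 8-byte columns first, then 4-byte, then 1-byte,
-- with variable-length columns last to minimize alignment padding.
CREATE TABLE example_events (
  id         bigint PRIMARY KEY,
  created_at timestamptz NOT NULL,
  user_id    integer NOT NULL,
  active     boolean NOT NULL DEFAULT true,
  name       text
);
```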
diff --git a/doc/development/query_performance.md b/doc/development/query_performance.md
index 87e41c78e19..4fe27d42c38 100644
--- a/doc/development/query_performance.md
+++ b/doc/development/query_performance.md
@@ -11,7 +11,7 @@ This document describes various guidelines to follow when optimizing SQL queries
When you are optimizing your SQL queries, there are two dimensions to pay attention to:
1. The query execution time. This is paramount as it reflects how the user experiences GitLab.
-1. The query plan. Optimizing the query plan is important in allowing queries to independently scale over time. Realizing that an index will keep a query performing well as the table grows before the query degrades is an example of why we analyze these plans.
+1. The query plan. Optimizing the query plan is important in allowing queries to independently scale over time. Realizing that an index keeps a query performing well as the table grows before the query degrades is an example of why we analyze these plans.
## Timing guidelines for queries
@@ -39,9 +39,9 @@ cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query.
When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see
the difference not only in the timing, but by looking at the output for `Buffers`
by running your explain with `EXPLAIN(analyze, buffers)`. [Database Lab](understanding_explain_plans.md#database-lab-engine)
-will automatically include these options.
+automatically includes these options.
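For example, a buffer-aware plan can be requested directly like this (the query itself is a placeholder):

```sql
-- Shows shared hit/read counters in the Buffers lines of the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20);
```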
-If you are making a warm cache query, you will only see the `shared hits`.
+If you are making a warm cache query, you see only the `shared hits`.
For example in #database-lab:
@@ -57,7 +57,7 @@ Or in the explain plan from `psql`:
Buffers: shared hit=7323
```
-If the cache is cold, you will also see `reads`.
+If the cache is cold, you also see `reads`.
In #database-lab:
diff --git a/doc/development/swapping_tables.md b/doc/development/swapping_tables.md
index d6c5b8f0662..efb481ccf35 100644
--- a/doc/development/swapping_tables.md
+++ b/doc/development/swapping_tables.md
@@ -10,12 +10,12 @@ Sometimes you need to replace one table with another. For example, when
migrating data in a very large table it's often better to create a copy of the
table and insert & migrate the data into this new table in the background.
-Let's say you want to swap the table "events" with "events_for_migration". In
+Let's say you want to swap the table `events` with `events_for_migration`. In
this case you need to follow 3 steps:
-1. Rename "events" to "events_temporary"
-1. Rename "events_for_migration" to "events"
-1. Rename "events_temporary" to "events_for_migration"
+1. Rename `events` to `events_temporary`
+1. Rename `events_for_migration` to `events`
+1. Rename `events_temporary` to `events_for_migration`
Rails allows you to do this using the `rename_table` method:
@@ -27,7 +27,7 @@ rename_table :events_temporary, :events_for_migration
This does not require any downtime as long as the 3 `rename_table` calls are
executed in the _same_ database transaction. Rails by default uses database
-transactions for migrations, but if it doesn't you'll need to start one
+transactions for migrations, but if it doesn't, you need to start one
manually:
```ruby
@@ -45,7 +45,7 @@ PostgreSQL you can use the `reset_pk_sequence!` method like so:
reset_pk_sequence!('events')
```
-Failure to reset the primary keys will result in newly created rows starting
+Failure to reset the primary keys results in newly created rows starting
with an ID value of 1. Depending on the existing data, this can then lead to
duplicate key constraint violations, preventing users from creating new
data.
diff --git a/doc/development/understanding_explain_plans.md b/doc/development/understanding_explain_plans.md
index 3fc071bc5ff..17fcd5b3e88 100644
--- a/doc/development/understanding_explain_plans.md
+++ b/doc/development/understanding_explain_plans.md
@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Understanding EXPLAIN plans
PostgreSQL allows you to obtain query plans using the `EXPLAIN` command. This
-command can be invaluable when trying to determine how a query will perform.
+command can be invaluable when trying to determine how a query performs.
You can use this command directly in your SQL query, as long as the query starts
with it:
@@ -26,7 +26,7 @@ Aggregate (cost=922411.76..922411.77 rows=1 width=8)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
```
-When using _just_ `EXPLAIN`, PostgreSQL won't actually execute our query,
+When using _just_ `EXPLAIN`, PostgreSQL does not actually execute our query;
instead, it produces an _estimated_ execution plan based on the available
statistics. This means the actual plan can differ quite a bit. Fortunately,
PostgreSQL provides us with the option to execute the query as well. To do so,
@@ -39,7 +39,7 @@ FROM projects
WHERE visibility_level IN (0, 20);
```
-This will produce:
+This produces:
```sql
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
@@ -54,7 +54,7 @@ As we can see this plan is quite different, and includes a lot more data. Let's
discuss this step by step.
Because `EXPLAIN ANALYZE` executes the query, care should be taken when using a
-query that will write data or might time out. If the query modifies data,
+query that writes data or might time out. If the query modifies data,
consider wrapping it in a transaction that rolls back automatically like so:
```sql
@@ -73,7 +73,7 @@ FROM projects
WHERE visibility_level IN (0, 20);
```
-This will then produce:
+This then produces:
```sql
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
@@ -120,10 +120,10 @@ Aggregate (cost=922411.76..922411.77 rows=1 width=8)
Here the first node executed is `Seq scan on projects`. The `Filter:` is an
additional filter applied to the results of the node. A filter is very similar
to Ruby's `Array#select`: it takes the input rows, applies the filter, and
-produces a new list of rows. Once the node is done, we perform the `Aggregate`
+produces a new list of rows. After the node is done, we perform the `Aggregate`
above it.
-Nested nodes will look like this:
+Nested nodes look like this:
```sql
Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1)
@@ -152,7 +152,7 @@ number of rows produced, the number of loops performed, and more. For example:
Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
```
-Here we can see that our cost ranges from `0.00..908044.47` (we'll cover this in
+Here we can see that our cost ranges from `0.00..908044.47` (we cover this in
a moment), and we estimate (since we're using `EXPLAIN` and not `EXPLAIN
ANALYZE`) a total of 5,746,914 rows to be produced by this node. The `width`
statistic describes the estimated width of each row, in bytes.
@@ -171,7 +171,7 @@ The startup cost states how expensive it was to start the node, with the total
cost describing how expensive the entire node was. In general: the greater the
values, the more expensive the node.
-When using `EXPLAIN ANALYZE`, these statistics will also include the actual time
+When using `EXPLAIN ANALYZE`, these statistics also include the actual time
(in milliseconds) spent, and other runtime statistics (for example, the actual number of
produced rows):
@@ -183,7 +183,7 @@ Here we can see we estimated 5,746,969 rows to be returned, but in reality we
returned 5,746,940 rows. We can also see that _just_ this sequential scan took
2.98 seconds to run.
-Using `EXPLAIN (ANALYZE, BUFFERS)` will also give us information about the
+Using `EXPLAIN (ANALYZE, BUFFERS)` also gives us information about the
number of rows removed by a filter, the number of buffers used, and more. For
example:
@@ -242,7 +242,7 @@ retrieving lots of rows, so it's best to avoid these for large tables.
A scan on an index that did not require fetching anything from the table. In
certain cases, an index only scan may still fetch data from the table; in this
-case the node will include a `Heap Fetches:` statistic.
+case the node includes a `Heap Fetches:` statistic.
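A hypothetical plan fragment with that statistic (the index name, costs, and timings are made up):

```sql
Index Only Scan using index_users_on_username on users
  (cost=0.43..8.45 rows=1 width=16) (actual time=0.021..0.022 rows=1 loops=1)
  Index Cond: (username = 'example')
  Heap Fetches: 0
```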
### Index Scan
@@ -273,7 +273,7 @@ Sorts the input rows as specified using an `ORDER BY` statement.
### Nested Loop
-A nested loop will execute its child nodes for every row produced by a node that
+A nested loop executes its child nodes for every row produced by a node that
precedes it. For example:
```sql
@@ -316,7 +316,7 @@ FROM users
WHERE twitter != '';
```
-This will produce the following plan:
+This produces the following plan:
```sql
Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1)
@@ -435,7 +435,7 @@ This index would only index the `email` value of rows that match `WHERE id <
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
```
-Once created, if we run our query again we will be given the following plan:
+After the index is created, if we run our query again we are given the following plan:
```sql
Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
@@ -466,7 +466,7 @@ be used for comparison (for example, it depends a lot on the state of cache).
When optimizing a query, we usually need to reduce the amount of data we're
dealing with. Indexes are the way to work with fewer pages (buffers) to get the
result, so, during optimization, look at the number of buffers used (read and hit),
-and work on reducing these numbers. Reduced timing will be the consequence of reduced
+and work on reducing these numbers. Reduced timing is the consequence of reduced
buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
identical to production (and overall number of buffers is the same as on production),
but difference in cache state and I/O speed may lead to different timings.
@@ -508,8 +508,8 @@ index on `projects.visibility_level` to somehow turn this Sequential scan +
filter into an index-only scan.
Unfortunately, doing so is unlikely to improve anything. Contrary to what some
-might believe, an index being present _does not guarantee_ that PostgreSQL will
-actually use it. For example, when doing a `SELECT * FROM projects` it is much
+might believe, an index being present _does not guarantee_ that PostgreSQL
+actually uses it. For example, when doing a `SELECT * FROM projects` it is much
cheaper to just scan the entire table, instead of using an index and then
fetching data from the table. In such cases PostgreSQL may decide to not use an
index.
@@ -539,7 +539,7 @@ For GitLab.com this produces:
Here the total number of projects is 5,811,804, and 5,746,126 of those are of
level 0 or 20. That's 98% of the entire table!
-So no matter what we do, this query will retrieve 98% of the entire table. Since
+So no matter what we do, this query retrieves 98% of the entire table. Since
most time is spent doing exactly that, there isn't really much we can do to
improve this query, other than _not_ running it at all.
@@ -589,7 +589,7 @@ Foreign-key constraints:
"fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
```
-Let's rewrite our query to JOIN this table onto our projects, and get the
+Let's rewrite our query to `JOIN` this table onto our projects, and get the
projects for a specific user:
```sql
@@ -604,7 +604,7 @@ AND user_interacted_projects.user_id = 1;
What we do here is the following:
1. Get our projects.
-1. INNER JOIN `user_interacted_projects`, meaning we're only left with rows in
+1. `INNER JOIN` `user_interacted_projects`, meaning we're only left with rows in
`projects` that have a corresponding row in `user_interacted_projects`.
1. Limit this to the projects with `visibility_level` of 0 or 20, and to
projects that the user with ID 1 interacted with.
@@ -765,7 +765,7 @@ The web interface comes with the following execution plan visualizers included:
#### Tips & Tricks
-The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings will be applied to all subsequent commands until you reset them. For example you can disable parallel queries with
+The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings are applied to all subsequent commands until you reset them. For example, you can disable parallel queries with
```sql
exec SET max_parallel_workers_per_gather = 0