gitlab.com/gitlab-org/gitlab-foss.git
author: GitLab Bot <gitlab-bot@gitlab.com> 2021-06-17 13:07:47 +0300
committer: GitLab Bot <gitlab-bot@gitlab.com> 2021-06-17 13:07:47 +0300
commit: d670c3006e6e44901bce0d53cc4768d1d80ffa92
tree: 8f65743c232e5b76850c4cc264ba15e1185815ff /doc/development
parent: a5f4bba440d7f9ea47046a0a561d49adf0a1e6d4
Add latest changes from gitlab-org/gitlab@14-0-stable-ee
Diffstat (limited to 'doc/development')
-rw-r--r--  doc/development/database/database_reviewer_guidelines.md  2
-rw-r--r--  doc/development/database/keyset_pagination.md  251
-rw-r--r--  doc/development/database/pagination_guidelines.md  21
-rw-r--r--  doc/development/documentation/styleguide/index.md  20
-rw-r--r--  doc/development/fe_guide/graphql.md  64
-rw-r--r--  doc/development/query_performance.md  4
-rw-r--r--  doc/development/sidekiq_style_guide.md  67
-rw-r--r--  doc/development/understanding_explain_plans.md  149
-rw-r--r--  doc/development/usage_ping/index.md  31
9 files changed, 515 insertions, 94 deletions
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index de131ddffbc..16734dada13 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -52,7 +52,7 @@ that require a more in-depth discussion between the database reviewers and maint
- [Database Office Hours Agenda](https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit).
- <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [YouTube playlist with past recordings](https://www.youtube.com/playlist?list=PL05JrBw4t0Kp-kqXeiF7fF7cFYaKtdqXM).
-You should also join the [#database-labs](../understanding_explain_plans.md#database-lab)
+You should also join the [#database-lab](../understanding_explain_plans.md#database-lab-engine)
Slack channel and get familiar with how to use Joe, the Slackbot that provides developers
with their own clone of the production database.
diff --git a/doc/development/database/keyset_pagination.md b/doc/development/database/keyset_pagination.md
new file mode 100644
index 00000000000..e30c3cc8832
--- /dev/null
+++ b/doc/development/database/keyset_pagination.md
@@ -0,0 +1,251 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Keyset pagination
+
+The keyset pagination library can be used in HAML-based views and the REST API within the GitLab project.
+
+You can read about keyset pagination and how it compares to the offset based pagination on our [pagination guidelines](pagination_guidelines.md) page.
+
+## API overview
+
+### Synopsis
+
+Keyset pagination with `ActiveRecord` in Rails controllers:
+
+```ruby
+cursor = params[:cursor] # this is nil when the first page is requested
+paginator = Project.order(:created_at).keyset_paginate(cursor: cursor, per_page: 20)
+
+paginator.each do |project|
+ puts project.name # prints maximum 20 projects
+end
+```
+
+### Usage
+
+This library adds a single method to ActiveRecord relations: [`#keyset_paginate`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/initializers/active_record_keyset_pagination.rb).
+
+This is similar in spirit (but not in implementation) to Kaminari's `paginate` method.
+
+Keyset pagination works without any configuration for simple ActiveRecord queries:
+
+- Order by one column.
+- Order by two columns, where the last column is the primary key.
+
+The library can detect nullable and non-distinct columns and based on these, it will add extra ordering using the primary key. This is necessary because keyset pagination expects distinct order by values:
+
+```ruby
+Project.order(:created_at).keyset_paginate.records # ORDER BY created_at, id
+
+Project.order(:name).keyset_paginate.records # ORDER BY name, id
+
+Project.order(:created_at, id: :desc).keyset_paginate.records # ORDER BY created_at, id DESC
+
+Project.order(created_at: :asc, id: :desc).keyset_paginate.records # ORDER BY created_at, id DESC
+```
+
+The `keyset_paginate` method returns [a special paginator object](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/pagination/keyset/paginator.rb) which contains the loaded records and additional information for requesting various pages.
+
+The method accepts the following keyword arguments:
+
+- `cursor` - Encoded order by column values for requesting the next page (can be `nil`).
+- `per_page` - Number of records to load per page (default 20).
+- `keyset_order_options` - Extra options for building the keyset paginated database query, see an example for `UNION` queries in the performance section (optional).
+
+The paginator object has the following methods:
+
+- `records` - Returns the records for the current page.
+- `has_next_page?` - Tells whether there is a next page.
+- `has_previous_page?` - Tells whether there is a previous page.
+- `cursor_for_next_page` - Encoded values as `String` for requesting the next page (can be `nil`).
+- `cursor_for_previous_page` - Encoded values as `String` for requesting the previous page (can be `nil`).
+- `cursor_for_first_page` - Encoded values as `String` for requesting the first page.
+- `cursor_for_last_page` - Encoded values as `String` for requesting the last page.
+- The paginator object includes the `Enumerable` module and delegates the enumerable functionality to the `records` method/array.
+
+Example for getting the first and the second page:
+
+```ruby
+paginator = Project.order(:name).keyset_paginate
+
+paginator.to_a # same as .records
+
+cursor = paginator.cursor_for_next_page # encoded column attributes for the next page
+
+next_page_records = Project.order(:name).keyset_paginate(cursor: cursor).records # loading the next page
+```
+
+Since keyset pagination does not support page numbers, we are restricted to go to the following pages:
+
+- Next page
+- Previous page
+- Last page
+- First page
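
To illustrate the cursor idea, here is a minimal in-memory sketch in plain Ruby. It is not the GitLab library: `Row`, `ROWS`, `encode_cursor`, `decode_cursor`, and `keyset_page` are hypothetical names, and the Base64-encoded JSON cursor format is an assumption made only for this example. It orders by `name` with `id` as the tie-breaker and returns a cursor only when a next page exists.

```ruby
require 'base64'
require 'json'

# Hypothetical in-memory stand-in for a database table (not the GitLab library).
Row = Struct.new(:id, :name)

ROWS = [
  Row.new(1, 'alpha'), Row.new(2, 'beta'), Row.new(3, 'beta'),
  Row.new(4, 'gamma'), Row.new(5, 'delta')
].freeze

# Encode the order-by values of a row (name plus the id tie-breaker).
# The Base64/JSON format is an assumption for this sketch.
def encode_cursor(row)
  Base64.urlsafe_encode64(JSON.generate('name' => row.name, 'id' => row.id))
end

def decode_cursor(cursor)
  JSON.parse(Base64.urlsafe_decode64(cursor))
end

# Return one page ordered by (name, id) and the cursor for the next page.
# The cursor is nil when there is no next page.
def keyset_page(rows, cursor: nil, per_page: 2)
  sorted = rows.sort_by { |row| [row.name, row.id] }

  if cursor
    last = decode_cursor(cursor)
    # Keep only rows that sort strictly after the cursor position.
    sorted = sorted.select do |row|
      ([row.name, row.id] <=> [last['name'], last['id']]) == 1
    end
  end

  page = sorted.first(per_page)
  has_next_page = sorted.size > per_page
  [page, has_next_page ? encode_cursor(page.last) : nil]
end

first_page, cursor = keyset_page(ROWS)
second_page, _next_cursor = keyset_page(ROWS, cursor: cursor)
puts first_page.map(&:id).inspect
puts second_page.map(&:id).inspect
```

Because the cursor only stores the last seen order-by values, there is no way to jump to an arbitrary page number, which matches the restriction listed above.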
+
+#### Usage in Rails with HAML views
+
+Consider the following controller action, where we list the projects ordered by name:
+
+```ruby
+def index
+ @projects = Project.order(:name).keyset_paginate(cursor: params[:cursor])
+end
+```
+
+In the HAML file, we can render the records:
+
+```haml
+- if @projects.any?
+  - @projects.each do |project|
+    .project-container
+      = project.name
+
+  = keyset_paginate @projects
+```
+
+## Performance
+
+The performance of the keyset pagination depends on the database index configuration and the number of columns we use in the `ORDER BY` clause.
+
+In case we order by the primary key (`id`), then the generated queries will be efficient since the primary key is covered by a database index.
+
+When two or more columns are used in the `ORDER BY` clause, it's advised to check the generated database query and make sure that the correct index configuration is used. More information can be found on the [pagination guideline page](pagination_guidelines.md#index-coverage).
+
+NOTE:
+While the query performance of the first page might look good, the second page (where the cursor attributes are used in the query) might yield poor performance. It's advised to always verify the performance of both queries: first page and second page.
+
+Example database query with tie-breaker (`id`) column:
+
+```sql
+SELECT "issues".*
+FROM "issues"
+WHERE (("issues"."id" > 99
+ AND "issues"."created_at" = '2021-02-16 11:26:17.408466')
+ OR ("issues"."created_at" > '2021-02-16 11:26:17.408466')
+ OR ("issues"."created_at" IS NULL))
+ORDER BY "issues"."created_at" DESC NULLS LAST, "issues"."id" DESC
+LIMIT 20
+```
+
+`OR` queries are difficult to optimize in PostgreSQL, so we generally advise using [`UNION` queries](../sql.md#use-unions) instead. The keyset pagination library can generate efficient `UNION` queries when multiple columns are present in the `ORDER BY` clause. This is triggered when we specify the `use_union_optimization: true` option in the options passed to `Relation#keyset_paginate`.
+
+Example:
+
+```ruby
+# Triggers a simple query for the first page.
+paginator1 = Project.order(:created_at, id: :desc).keyset_paginate(per_page: 2, keyset_order_options: { use_union_optimization: true })
+
+cursor = paginator1.cursor_for_next_page
+
+# Triggers UNION query for the second page
+paginator2 = Project.order(:created_at, id: :desc).keyset_paginate(per_page: 2, cursor: cursor, keyset_order_options: { use_union_optimization: true })
+
+puts paginator2.records.to_a # UNION query
+```
+
+## Complex order configuration
+
+Common `ORDER BY` configurations will be handled by the `keyset_paginate` method automatically so no manual configuration is needed. There are a few edge cases where order object configuration is necessary:
+
+- `NULLS LAST` ordering.
+- Function-based ordering.
+- Ordering with a custom tie-breaker column, like `iid`.
+
+These order objects can be defined in the model classes as normal ActiveRecord scopes. No special behavior prevents using these scopes elsewhere (Kaminari, background jobs).
+
+### `NULLS LAST` ordering
+
+Consider the following scope:
+
+```ruby
+scope = Issue.where(project_id: 10).order(Gitlab::Database.nulls_last_order('relative_position', 'DESC'))
+# SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 10 ORDER BY relative_position DESC NULLS LAST
+
+scope.keyset_paginate # raises: Gitlab::Pagination::Keyset::Paginator::UnsupportedScopeOrder: The order on the scope does not support keyset pagination
+```
+
+The `keyset_paginate` method raises an error because the order value on the query is a custom SQL string and not an [`Arel`](https://www.rubydoc.info/gems/arel) AST node. The keyset library cannot automatically infer configuration values from these kinds of queries.
+
+To make keyset pagination work, we need to configure custom order objects. To do so, we need to collect information about the order columns:
+
+- `relative_position` can have duplicated values since no unique index is present.
+- `relative_position` can have null values because we don't have a not null constraint on the column. For this, we need to determine where we will see NULL values: at the beginning of the result set or at the end (`NULLS LAST`).
+- Keyset pagination requires distinct order columns, so we'll need to add the primary key (`id`) to make the order distinct.
+- Jumping to the last page and paginating backwards actually reverses the `ORDER BY` clause. For this, we'll need to provide the reversed `ORDER BY` clause.
+
+Example:
+
+```ruby
+order = Gitlab::Pagination::Keyset::Order.build([
+ # The attributes are documented in the `lib/gitlab/pagination/keyset/column_order_definition.rb` file
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'relative_position',
+ column_expression: Issue.arel_table[:relative_position],
+ order_expression: Gitlab::Database.nulls_last_order('relative_position', 'DESC'),
+ reversed_order_expression: Gitlab::Database.nulls_first_order('relative_position', 'ASC'),
+ nullable: :nulls_last,
+ order_direction: :desc,
+ distinct: false
+ ),
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'id',
+ order_expression: Issue.arel_table[:id].asc,
+ nullable: :not_nullable,
+ distinct: true
+ )
+])
+
+scope = Issue.where(project_id: 10).order(order) # or reorder()
+
+scope.keyset_paginate.records # works
+```
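
The need for a reversed `ORDER BY` clause can be illustrated with a small in-memory sketch. This is plain Ruby, not the keyset library: `ROWS_WITH_NULLS`, `forward_order`, and `reversed_order` are hypothetical names, `nil` models SQL `NULL`, and the sketch assumes the tie-breaker direction is flipped together with the main column.

```ruby
# Hypothetical rows as [id, relative_position]; nil models SQL NULL.
ROWS_WITH_NULLS = [[1, 30], [2, nil], [3, 10], [4, 30], [5, nil]].freeze

# Models: ORDER BY relative_position DESC NULLS LAST, id ASC
def forward_order(rows)
  rows.sort do |(id_a, pos_a), (id_b, pos_b)|
    if pos_a.nil? && pos_b.nil?
      id_a <=> id_b
    elsif pos_a.nil?
      1 # NULLs sort after every non-NULL value
    elsif pos_b.nil?
      -1
    else
      [-pos_a, id_a] <=> [-pos_b, id_b]
    end
  end
end

# Models: ORDER BY relative_position ASC NULLS FIRST, id DESC
def reversed_order(rows)
  rows.sort do |(id_a, pos_a), (id_b, pos_b)|
    if pos_a.nil? && pos_b.nil?
      id_b <=> id_a
    elsif pos_a.nil?
      -1 # NULLs sort before every non-NULL value
    elsif pos_b.nil?
      1
    else
      [pos_a, -id_a] <=> [pos_b, -id_b]
    end
  end
end

forward = forward_order(ROWS_WITH_NULLS).map(&:first)
backward = reversed_order(ROWS_WITH_NULLS).map(&:first)
puts forward.inspect
puts backward.inspect
```

Flipping the direction without also flipping the `NULL` placement would not produce the exact mirror image of the forward ordering, which is why both the order expression and its reversed counterpart are declared explicitly.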
+
+### Function-based ordering
+
+In the following example, we multiply the `id` by 10 and order by that value. Since the `id` column is unique, we need to define only one column:
+
+```ruby
+order = Gitlab::Pagination::Keyset::Order.build([
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'id_times_ten',
+ order_expression: Arel.sql('id * 10').asc,
+ nullable: :not_nullable,
+ order_direction: :asc,
+ distinct: true,
+ add_to_projections: true
+ )
+])
+
+paginator = Issue.where(project_id: 10).order(order).keyset_paginate(per_page: 5)
+puts paginator.records.map(&:id_times_ten)
+
+cursor = paginator.cursor_for_next_page
+
+paginator = Issue.where(project_id: 10).order(order).keyset_paginate(cursor: cursor, per_page: 5)
+puts paginator.records.map(&:id_times_ten)
+```
+
+The `add_to_projections` flag tells the paginator to expose the column expression in the `SELECT` clause. This is necessary because the keyset pagination needs to somehow extract the last value from the records to request the next page.
+
+### `iid` based ordering
+
+When ordering issues, the database ensures that we'll have distinct `iid` values within a project. Ordering by one column is enough to make the pagination work if the `project_id` filter is present:
+
+```ruby
+order = Gitlab::Pagination::Keyset::Order.build([
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'iid',
+ order_expression: Issue.arel_table[:iid].asc,
+ nullable: :not_nullable,
+ distinct: true
+ )
+])
+
+scope = Issue.where(project_id: 10).order(order)
+
+scope.keyset_paginate.records # works
+```
diff --git a/doc/development/database/pagination_guidelines.md b/doc/development/database/pagination_guidelines.md
index 3308ebfcaae..ce656851f86 100644
--- a/doc/development/database/pagination_guidelines.md
+++ b/doc/development/database/pagination_guidelines.md
@@ -58,9 +58,7 @@ It's not possible to make all filter and sort combinations performant, so we sho
### Prepare for scaling
-Offset-based pagination is the easiest way to paginate over records, however, it does not scale well for large tables. As a long-term solution, keyset pagination is preferred. The tooling around keyset pagination is not as mature as for offset pagination so currently, it's easier to start with offset pagination and then switch to keyset pagination.
-
-To avoid losing functionality and maintaining backward compatibility when switching pagination methods, it's advised to consider the following approach in the design phase:
+Offset-based pagination is the easiest way to paginate over records; however, it does not scale well for large database tables. As a long-term solution, [keyset pagination](keyset_pagination.md) is preferred. Switching between offset and keyset pagination is generally straightforward and can be done without affecting the end-user if the following conditions are met:
- Avoid presenting total counts, prefer limit counts.
- Example: count maximum 1001 records, and then on the UI show 1000+ if the count is 1001, show the actual number otherwise.
@@ -304,7 +302,22 @@ LIMIT 20
##### Tooling
-Using keyset pagination outside of GraphQL is not straightforward. We have the low-level blocks for building keyset pagination database queries, however, the usage in application code is still not streamlined yet.
+A generic keyset pagination library is available within the GitLab project. In most cases, it can easily replace the existing Kaminari-based pagination, with significant performance improvements when dealing with large datasets.
+
+Example:
+
+```ruby
+# first page
+paginator = Project.order(:created_at, :id).keyset_paginate(per_page: 20)
+puts paginator.to_a # records
+
+# next page
+cursor = paginator.cursor_for_next_page
+paginator = Project.order(:created_at, :id).keyset_paginate(cursor: cursor, per_page: 20)
+puts paginator.to_a # records
+```
+
+For a comprehensive overview, take a look at the [keyset pagination guide](keyset_pagination.md) page.
#### Performance
diff --git a/doc/development/documentation/styleguide/index.md b/doc/development/documentation/styleguide/index.md
index 225db273cb6..7787366dbf4 100644
--- a/doc/development/documentation/styleguide/index.md
+++ b/doc/development/documentation/styleguide/index.md
@@ -216,15 +216,15 @@ to update.
Put files for a specific product area into the related folder:
-| Directory | What belongs here |
+| Directory | Contents |
|:----------------------|:------------------|
-| `doc/user/` | User related documentation. Anything that can be done in the GitLab user interface goes here, including usage of the `/admin` interface. |
+| `doc/user/` | Documentation for users. Anything that can be done in the GitLab user interface goes here, including usage of the `/admin` interface. |
| `doc/administration/` | Documentation that requires the user to have access to the server where GitLab is installed. Administrator settings in the GitLab user interface are under `doc/user/admin_area/`. |
-| `doc/api/` | API-related documentation. |
+| `doc/api/` | Documentation for the API. |
| `doc/development/` | Documentation related to the development of GitLab, whether contributing code or documentation. Related process and style guides should go here. |
| `doc/legal/` | Legal documents about contributing to GitLab. |
-| `doc/install/` | Contains instructions for installing GitLab. |
-| `doc/update/` | Contains instructions for updating GitLab. |
+| `doc/install/` | Instructions for installing GitLab. |
+| `doc/update/` | Instructions for updating GitLab. |
| `doc/topics/` | Indexes per topic (`doc/topics/topic_name/index.md`): all resources for that topic. |
### Work with directories and files
@@ -300,11 +300,17 @@ Do not include the same information in multiple places.
## Language
-GitLab documentation should be clear and easy to understand.
+GitLab documentation should be clear and easy to understand. Avoid unnecessary words.
-- Be clear, concise, and stick to the goal of the documentation.
+- Be clear, concise, and stick to the goal of the topic.
- Write in US English with US grammar. (Tested in [`British.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/British.yml).)
- Use [inclusive language](#inclusive-language).
+- Rewrite to avoid wordiness:
+ - there is
+ - there are
+ - enables you to
+ - in order to
+ - because of the fact that
### Capitalization
diff --git a/doc/development/fe_guide/graphql.md b/doc/development/fe_guide/graphql.md
index 870605c82f4..844ef2156d9 100644
--- a/doc/development/fe_guide/graphql.md
+++ b/doc/development/fe_guide/graphql.md
@@ -842,6 +842,70 @@ Keep in mind, this means your app will not batch queries.
Once subscriptions are mature, this process can be replaced by using them and we can remove the separate link library and return to batching queries.
+#### Subscriptions
+
+We use [subscriptions](https://www.apollographql.com/docs/react/data/subscriptions/) to receive real-time updates from the GraphQL API via websockets. Currently, the number of existing subscriptions is limited. You can check the list of available ones in the [GraphiQL explorer](https://gitlab.com/-/graphql-explorer).
+
+**NOTE:**
+We cannot test subscriptions using GraphiQL, because they require an ActionCable client, which GraphiQL does not support at the moment.
+
+Subscriptions don't require any additional configuration of the Apollo Client instance, so you can use them in the application right away. To distinguish subscriptions from queries and mutations, we recommend naming them with the `.subscription.graphql` extension:
+
+```graphql
+# ~/sidebar/queries/issuable_assignees.subscription.graphql
+
+subscription issuableAssigneesUpdated($issuableId: IssuableID!) {
+ issuableAssigneesUpdated(issuableId: $issuableId) {
+ ... on Issue {
+ assignees {
+ nodes {
+ ...User
+ status {
+ availability
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+When using GraphQL subscriptions in a Vue application, we recommend updating existing Apollo query results with the [`subscribeToMore`](https://apollo.vuejs.org/guide/apollo/subscriptions.html#subscribe-to-more) option:
+
+```javascript
+import issuableAssigneesSubscription from '~/sidebar/queries/issuable_assignees.subscription.graphql'
+
+apollo: {
+ issuable: {
+ query() {
+ return assigneesQueries[this.issuableType].query;
+ },
+ subscribeToMore: {
+ // Specify the subscription that will update the query
+ document() {
+ return issuableAssigneesSubscription;
+ },
+ variables() {
+ return {
+ issuableId: convertToGraphQLId(this.issuableClass, this.issuableId),
+ };
+ },
+ // Describe how subscription should update the query
+ updateQuery(prev, { subscriptionData }) {
+ if (prev && subscriptionData?.data?.issuableAssigneesUpdated) {
+ const data = produce(prev, (draftData) => {
+ draftData.workspace.issuable.assignees.nodes =
+ subscriptionData.data.issuableAssigneesUpdated.assignees.nodes;
+ });
+ return data;
+ }
+ return prev;
+ },
+ },
+ },
+},
+```
+
### Testing
#### Generating the GraphQL schema
diff --git a/doc/development/query_performance.md b/doc/development/query_performance.md
index 87e26cf42df..3ff36c7d005 100644
--- a/doc/development/query_performance.md
+++ b/doc/development/query_performance.md
@@ -38,8 +38,8 @@ cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query.
When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see
the difference not only in the timing, but by looking at the output for `Buffers`
-by running your explain with `EXPLAIN(analyze, buffers)`. The [#database-lab](understanding_explain_plans.md#database-lab)
-tool will automatically include these options.
+by running your explain with `EXPLAIN(analyze, buffers)`. [Database Lab](understanding_explain_plans.md#database-lab-engine)
+will automatically include these options.
If you are making a warm cache query, you will only see the `shared hits`.
diff --git a/doc/development/sidekiq_style_guide.md b/doc/development/sidekiq_style_guide.md
index c87870b088c..7bc3ecf002f 100644
--- a/doc/development/sidekiq_style_guide.md
+++ b/doc/development/sidekiq_style_guide.md
@@ -155,7 +155,7 @@ A job scheduled for an idempotent worker is [deduplicated](#deduplication) when
an unstarted job with the same arguments is already in the queue.
WARNING:
-For [data consistency jobs](#job-data-consistency), the deduplication is not compatible with the
+For [data consistency jobs](#job-data-consistency-strategies), the deduplication is not compatible with the
`data_consistency` attribute set to `:sticky` or `:delayed`.
The reason for this is that deduplication always takes into account the latest binary replication pointer, not the first one.
There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
@@ -462,18 +462,56 @@ If we expect an increase of **less than 5%**, then no further action is needed.
Otherwise, please ping `@gitlab-org/scalability` on the merge request and ask
for a review.
-## Job data consistency
+## Job data consistency strategies
-In order to utilize [Sidekiq read-only database replicas capabilities](../administration/database_load_balancing.md#enable-the-load-balancer-for-sidekiq),
-set the `data_consistency` attribute of the job to `:always`, `:sticky`, or `:delayed`.
+In GitLab 13.11 and earlier, Sidekiq workers would always send database queries to the primary
+database node,
+both for reads and writes. This ensured that data integrity
+is both guaranteed and immediate, since in a single-node scenario it is impossible to encounter
+stale reads even for workers that read their own writes.
+If a worker writes to the primary, but reads from a replica, however, the possibility
+of reading a stale record is non-zero due to replicas potentially lagging behind the primary.
+
+When the number of jobs that rely on the database increases, ensuring immediate data consistency
+can put unsustainable load on the primary database server. We therefore added the ability to use
+[database load-balancing in Sidekiq workers](../administration/database_load_balancing.md#enable-the-load-balancer-for-sidekiq).
+By configuring a worker's `data_consistency` field, we can then allow the scheduler to target read replicas
+under several strategies outlined below.
+
+## Trading immediacy for reduced primary load
+
+Not requiring immediate data consistency allows developers to decide to either:
+
+- Ensure immediately consistent reads, but increase load on the primary database.
+- Prefer read replicas to add relief to the primary, but increase the likelihood of stale reads that have to be retried.
+
+By default, any worker has a data consistency requirement of `:always`, so, as before, all
+database operations target the primary. To allow for reads to be served from replicas instead, we
+added two additional consistency modes: `:sticky` and `:delayed`.
+
+When you declare either `:sticky` or `:delayed` consistency, workers become eligible for database
+load-balancing. In both cases, jobs are enqueued with a short delay.
+This minimizes the likelihood of replication lag after a write.
+
+The difference is in what happens when there is replication lag after the delay: `sticky` workers
+switch over to the primary right away, whereas `delayed` workers fail fast and are retried once.
+If they still encounter replication lag, they also switch to the primary instead.
+**If your worker never performs any writes, it is strongly advised to apply one of these consistency settings,
+since it will never need to rely on the primary database node.**
+
+The table below shows the `data_consistency` attribute and its values, ordered by the degree to which
+they prefer read replicas and will wait for replicas to catch up:
| **Data Consistency** | **Description** |
|--------------|-----------------------------|
-| `:always` | The job is required to use the primary database (default). |
-| `:sticky` | The job uses a replica as long as possible. It switches to primary either on write or long replication lag. It should be used on jobs that require to be executed as fast as possible. |
-| `:delayed` | The job always uses replica, but switches to primary on write. The job is delayed if there's a long replication lag. If the replica is not up-to-date with the next retry, it switches to the primary. It should be used on jobs where we are fine to delay the execution of a given job due to their importance such as expire caches, execute hooks, etc. |
+| `:always` | The job is required to use the primary database (default). It should be used for workers that primarily perform writes or that have very strict requirements around reading their writes without suffering any form of delay. |
+| `:sticky` | The job prefers replicas, but switches to the primary for writes or when encountering replication lag. It should be used for jobs that must be executed as fast as possible but can sustain a small initial queuing delay. |
+| `:delayed` | The job prefers replicas, but switches to the primary for writes. When encountering replication lag before the job starts, the job is retried once. If the replica is still not up to date on the next retry, it switches to the primary. It should be used for jobs where delaying execution further typically does not matter, such as cache expiration or web hooks execution. |
+
+In all cases workers read either from a replica that is fully caught up,
+or from the primary node, so data consistency is always ensured.
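
The read-path behavior described above can be summarized as a small decision function. This is only a sketch under stated assumptions: `database_for_read`, its keyword arguments, and the `:retry_job` return value are hypothetical and are not part of the GitLab or Sidekiq API. Writes are not modeled, because they always go to the primary.

```ruby
# Hypothetical helper that models where a job reads from.
# Names and return values are illustrative only.
def database_for_read(strategy, replica_caught_up:, already_retried: false)
  case strategy
  when :always
    # Default: every database operation targets the primary.
    :primary
  when :sticky
    # Prefer a replica, but switch to the primary right away on lag.
    replica_caught_up ? :replica : :primary
  when :delayed
    return :replica if replica_caught_up

    # On lag, fail fast and retry once; after that, use the primary.
    already_retried ? :primary : :retry_job
  else
    raise ArgumentError, "unknown data consistency: #{strategy.inspect}"
  end
end

puts database_for_read(:sticky, replica_caught_up: false)
puts database_for_read(:delayed, replica_caught_up: false)
puts database_for_read(:delayed, replica_caught_up: false, already_retried: true)
```

In every branch the job ends up reading either from a caught-up replica or from the primary, which is the consistency guarantee stated above.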
-To set a data consistency for a job, use the `data_consistency` class method:
+To set a data consistency for a worker, use the `data_consistency` class method:
```ruby
class DelayedWorker
@@ -499,8 +537,8 @@ When `feature_flag` is disabled, the job defaults to `:always`, which means that
The `feature_flag` property does not allow the use of
[feature gates based on actors](../development/feature_flags/index.md).
This means that the feature flag cannot be toggled only for particular
-projects, groups, or users, but instead, you can safely use [percentage of time rollout](../development/feature_flags/index.md).
-Note that since we check the feature flag on both Sidekiq client and server, rolling out a 10% of the time,
+projects, groups, or users, but instead, you can safely use [percentage of time rollout](../development/feature_flags/index.md).
+Note that since we check the feature flag on both Sidekiq client and server, rolling out a 10% of the time,
will likely result in 1% (0.1 [from client] * 0.1 [from server]) of effective jobs using replicas.
Example:
@@ -515,15 +553,6 @@ class DelayedWorker
end
```
-### Delayed job execution
-
-Scheduling workers that utilize [Sidekiq read-only database replicas capabilities](#job-data-consistency),
-(workers with `data_consistency` attribute set to `:sticky` or `:delayed`),
-by calling `SomeWorker.perform_async` results in a worker performing in the future (1 second in the future).
-
-This way, the replica has a chance to catch up, and the job will likely use the replica.
-For workers with `data_consistency` set to `:delayed`, it can also reduce the number of retried jobs.
-
## Jobs with External Dependencies
Most background jobs in the GitLab application communicate with other GitLab
diff --git a/doc/development/understanding_explain_plans.md b/doc/development/understanding_explain_plans.md
index 66dc1fef31a..f9d1e7e2eee 100644
--- a/doc/development/understanding_explain_plans.md
+++ b/doc/development/understanding_explain_plans.md
@@ -198,13 +198,39 @@ Here we can see that our filter has to remove 65,677 rows, and that we use
208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our
above node uses *1.6 GB of buffers*. That's a lot!
+Keep in mind that some statistics are per-loop averages, while others are total values:
+
+| Field name | Value type |
+| --- | --- |
+| Actual Total Time | per-loop average |
+| Actual Rows | per-loop average |
+| Buffers Shared Hit | total value |
+| Buffers Shared Read | total value |
+| Buffers Shared Dirtied | total value |
+| Buffers Shared Written | total value |
+| I/O Read Time | total value |
+| I/O Write Time | total value |
+
+For example:
+
+```sql
+ -> Index Scan using users_pkey on public.users (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888)
+ Index Cond: (users.id = issues.author_id)
+ Buffers: shared hit=3543 read=9
+ I/O Timings: read=17.760 write=0.000
+```
+
+Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`).
+17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache.
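
The arithmetic above can be reproduced with a short Ruby helper. `node_totals` and `BUFFER_SIZE_BYTES` are hypothetical names used only for this sketch. The input values are copied from the Index Scan node shown above, and the 8 KB buffer size is the one stated earlier on this page.

```ruby
# PostgreSQL buffer size, as stated earlier on this page (8 KB).
BUFFER_SIZE_BYTES = 8192

# Hypothetical helper: turn per-loop averages into totals for one plan node.
def node_totals(loops:, time_per_loop_ms:, rows_per_loop:, shared_hit:, shared_read:)
  buffers = shared_hit + shared_read # buffer counts are already totals
  {
    total_time_ms: (loops * time_per_loop_ms).round(3), # per-loop average * loops
    total_rows: loops * rows_per_loop,                  # per-loop average * loops
    total_buffers: buffers,
    buffer_mib: (buffers * BUFFER_SIZE_BYTES / (1024.0 * 1024)).round(2)
  }
end

totals = node_totals(
  loops: 888,
  time_per_loop_ms: 0.025,
  rows_per_loop: 1,
  shared_hit: 3543,
  shared_read: 9
)
totals.each { |name, value| puts "#{name}: #{value}" }
```

Only the time and row figures are multiplied by `loops`; the buffer and I/O figures are reported as totals and must not be multiplied again.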
+
## Node types
There are quite a few different types of nodes, so we only cover some of the
more common ones here.
A full list of all the available nodes and their descriptions can be found in
-the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h)
+the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h).
+pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer a detailed look into nodes and their fields.
### Seq Scan
@@ -441,7 +467,7 @@ When optimizing a query, we usually need to reduce the amount of data we're
dealing with. Indexes are the way to work with fewer pages (buffers) to get the
result, so, during optimization, look at the number of buffers used (read and hit),
and work on reducing these numbers. Reduced timing will be the consequence of reduced
-buffer numbers. [#database-lab](#database-lab) guarantees that the plan is structurally
+buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
identical to production (and overall number of buffers is the same as on production),
but difference in cache state and I/O speed may lead to different timings.
@@ -617,7 +643,7 @@ If we look at the plan we also see our costs are very low:
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
```
-Here our cost is only 3.45, and it only takes us 0.050 milliseconds to do so.
+Here our cost is only 3.45, and it takes us 7.25 milliseconds to do so (0.05 * 145).
The next index scan is a bit more expensive:
```sql
@@ -681,64 +707,26 @@ There are a few ways to get the output of a query plan. Of course you
can directly run the `EXPLAIN` query in the `psql` console, or you can
follow one of the other options below.
-### Rails console
+### Database Lab Engine
-Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
-you can directly generate the query plan from the Rails console:
+GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab), and the companion
+SQL optimization tool - [Joe Bot](https://gitlab.com/postgres-ai/joe).
-```ruby
-pry(main)> require 'activerecord-explain-analyze'
-=> true
-pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
- Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
- ↳ (pry):12
-=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
-Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
- Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
- Filter: (projects.build_timeout > 3600)
- Rows Removed by Filter: 14
- Buffers: shared hit=2
-Planning time: 0.411 ms
-Execution time: 0.113 ms
-```
+Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
-### ChatOps
+Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
+and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
-[GitLab team members can also use our ChatOps solution, available in Slack using the
-`/chatops` slash command](chatops_on_gitlabcom.md).
-You can use ChatOps to get a query plan by running the following:
+With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements.
-```sql
-/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
-```
+For example, to test a new index on a column that does not exist in production yet, you can do the following:
-Visualising the plan using <https://explain.depesz.com/> is also supported:
+Create the column:
```sql
-/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
```
-Quoting the query is not necessary.
-
-For more information about the available options, run:
-
-```sql
-/chatops run explain --help
-```
-
-### `#database-lab`
-
-Another tool GitLab team members can use is a chatbot powered by [Joe](https://gitlab.com/postgres-ai/joe)
-which uses [Database Lab](https://gitlab.com/postgres-ai/database-lab) to instantly provide developers
-with their own clone of the production database.
-
-Joe is available in the
-[`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack.
-
-Unlike ChatOps, it gives you a way to execute DDL statements (like creating indexes and tables) and get query plan not only for `SELECT` but also `UPDATE` and `DELETE`.
-
-For example, in order to test new index you can do the following:
-
Create the index:
```sql
@@ -769,18 +757,67 @@ For more information about the available options, run:
help
```
+The web interface comes with the following execution plan visualizers included:
+
+- [Depesz](https://explain.depesz.com/)
+- [PEV2](https://github.com/dalibo/pev2)
+- [FlameGraph](https://github.com/mgartner/pg_flame)
+
#### Tips & Tricks
-The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings will be applied to all subsequent commands until you reset them.
+The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings are applied to all subsequent commands until you reset them. For example, you can disable parallel queries with:
+
+```sql
+exec SET max_parallel_workers_per_gather = 0
+```
+
+### Rails console
+
+Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
+you can directly generate the query plan from the Rails console:
+
+```ruby
+pry(main)> require 'activerecord-explain-analyze'
+=> true
+pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
+ Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
+ ↳ (pry):12
+=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
+Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
+ Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
+ Filter: (projects.build_timeout > 3600)
+ Rows Removed by Filter: 14
+ Buffers: shared hit=2
+Planning time: 0.411 ms
+Execution time: 0.113 ms
+```
+
+### ChatOps
+
+[GitLab team members can also use our ChatOps solution, available in Slack using the
+`/chatops` slash command](chatops_on_gitlabcom.md).
+
+NOTE:
+While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
-It is also possible to use transactions. This may be useful when you are working on statements that modify the data, for example INSERT, UPDATE, and DELETE. The `explain` command will perform `EXPLAIN ANALYZE`, which executes the statement. In order to run each `explain` starting from a clean state you can wrap it in a transaction, for example:
+You can use ChatOps to get a query plan by running the following:
```sql
-exec BEGIN
+/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+```
-explain UPDATE some_table SET some_column = TRUE
+Visualising the plan using <https://explain.depesz.com/> is also supported:
+
+```sql
+/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+```
-exec ROLLBACK
+Quoting the query is not necessary.
+
+For more information about the available options, run:
+
+```sql
+/chatops run explain --help
```
## Further reading
diff --git a/doc/development/usage_ping/index.md b/doc/development/usage_ping/index.md
index 95dc4f2979a..de6a234e20c 100644
--- a/doc/development/usage_ping/index.md
+++ b/doc/development/usage_ping/index.md
@@ -24,11 +24,32 @@ More links:
## What is Usage Ping?
-- GitLab sends a weekly payload containing usage data to GitLab Inc. Usage Ping provides high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific data. The information from the usage ping is not anonymous, it is linked to the hostname of the instance. Sending usage ping is optional, and any instance can disable analytics.
-- The usage data is primarily composed of row counts for different tables in the instance's database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features in the product. In addition to counts, other facts
- that help us classify and understand GitLab installations are collected.
-- Usage ping is important to GitLab as we use it to calculate our Stage Monthly Active Users (SMAU) which helps us measure the success of our stages and features.
-- While usage ping is enabled, GitLab gathers data from the other instances and can show usage statistics of your instance to your users.
+Usage Ping is a process in GitLab that collects and sends a weekly payload to GitLab Inc.
+The payload provides important high-level data that helps our product, support,
+and sales teams understand how GitLab is used. For example, the data helps to:
+
+- Compare counts month over month (or week over week) to get a rough sense for how an instance uses
+ different product features.
+- Collect other facts that help us classify and understand GitLab installations.
+- Calculate our Stage Monthly Active Users (SMAU), which helps to measure the success of our stages
+ and features.
+
+Usage Ping information is not anonymous. It's linked to the instance's hostname. However, it does
+not contain project names, usernames, or any other specific data.
+
+Sending a Usage Ping payload is optional and can be [disabled](#disable-usage-ping) on any instance.
+When Usage Ping is enabled, GitLab gathers data from the other instances
+and can show your instance's usage statistics to your users.
+
+### Terminology
+
+We use the following terminology to describe the Usage Ping components:
+
+- **Usage Ping**: the process that collects and generates a JSON payload.
+- **Usage data**: the contents of the Usage Ping JSON payload. This includes metrics.
+- **Metrics**: primarily made up of row counts for different tables in an instance's database. Each
+ metric has a corresponding [metric definition](metrics_dictionary.md#metrics-definition-and-validation)
+ in a YAML file.
### Why should we enable Usage Ping?