author    GitLab Bot <gitlab-bot@gitlab.com> 2021-09-14 15:10:35 +0300
committer GitLab Bot <gitlab-bot@gitlab.com> 2021-09-14 15:10:35 +0300
commit    d378fdaa60adb7217e3fc798580ad206127728d5 (patch)
tree      9cb715513dd4d4197f76b2908458551940b0a094 /doc/development/database/efficient_in_operator_queries.md
parent    7b69070a7468c4a9b6fe0ed7fbf1b3f2b58434e0 (diff)

Add latest changes from gitlab-org/gitlab@master

Diffstat (limited to 'doc/development/database/efficient_in_operator_queries.md')
 doc/development/database/efficient_in_operator_queries.md (new, mode -rw-r--r--) | 948 +++
 1 file changed, 948 insertions(+), 0 deletions(-)
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
new file mode 100644
index 00000000000..207fa6c3832
--- /dev/null
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -0,0 +1,948 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Efficient `IN` operator queries
+
+This document describes a technique for building efficient, ordered database queries that use the
+`IN` SQL operator, and the GitLab utility module that helps apply the technique.
+
+NOTE:
+The described technique makes heavy use of
+[keyset pagination](pagination_guidelines.md#keyset-pagination).
+It's advised to get familiar with the topic first.
+
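+As a quick refresher, keyset pagination continues from the cursor of the last row on the
+previous page instead of using `OFFSET`. A minimal sketch, assuming an `(created_at, id)`
+ordering with no `NULL` values:
+
+```sql
+-- Fetch the page following the row with (created_at, id) = ('2020-01-05', 3).
+-- The row-comparison form is equivalent to:
+--   created_at > '2020-01-05' OR (created_at = '2020-01-05' AND id > 3)
+SELECT "issues".*
+FROM "issues"
+WHERE ("issues"."created_at", "issues"."id") > ('2020-01-05', 3)
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 20;
+```
+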
+## Motivation
+
+In GitLab, many domain objects like `Issue` live under nested hierarchies of projects and groups.
+To fetch nested database records for domain objects at the group level,
+we often perform queries with the `IN` SQL operator.
+We are usually interested in ordering the records by some attributes
+and limiting the number of records using `ORDER BY` and `LIMIT` clauses for performance.
+Pagination may be used to fetch subsequent records.
+
+Example tasks that require querying nested domain objects from the group level:
+
+- Show the first 20 issues by creation date or due date from the group `gitlab-org`.
+- Show the first 20 merge requests by merge date from the group `gitlab-com`.
+
+Unfortunately, ordered group-level queries typically perform badly,
+because executing them requires heavy disk I/O, memory, and CPU.
+Let's do an in-depth examination of executing one such query.
+
+### Performance problems with `IN` queries
+
+Consider the task of fetching the twenty oldest created issues
+from the group `gitlab-org` with the following query:
+
+```sql
+SELECT "issues".*
+FROM "issues"
+WHERE "issues"."project_id" IN
+ (SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+ (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+ FROM "namespaces"
+ WHERE (traversal_ids @> ('{9970}'))))
+ORDER BY "issues"."created_at" ASC,
+ "issues"."id" ASC
+LIMIT 20
+```
+
+NOTE:
+For pagination, ordering by the `created_at` column is not enough;
+we must also add the `id` column as a
+[tie-breaker](pagination_performance_guidelines.md#tie-breaker-column).
+
+The execution of the query can be largely broken down into three steps:
+
+1. The database accesses both `namespaces` and `projects` tables
+ to find all projects from all groups in the group hierarchy.
+1. The database retrieves `issues` records for each project causing heavy disk I/O.
+ Ideally, an appropriate index configuration should optimize this process.
+1. The database sorts the `issues` rows in memory by `created_at` and returns `LIMIT 20` rows to
+ the end-user. For large groups, this final step requires both large memory and CPU resources.
+
+<details>
+<summary>Expand to see the execution plan for this database query.</summary>
+<pre><code>
+ Limit (cost=90170.07..90170.12 rows=20 width=1329) (actual time=967.597..967.607 rows=20 loops=1)
+ Buffers: shared hit=239127 read=3060
+ I/O Timings: read=336.879
+ -> Sort (cost=90170.07..90224.02 rows=21578 width=1329) (actual time=967.596..967.603 rows=20 loops=1)
+ Sort Key: issues.created_at, issues.id
+ Sort Method: top-N heapsort Memory: 74kB
+ Buffers: shared hit=239127 read=3060
+ I/O Timings: read=336.879
+ -> Nested Loop (cost=1305.66..89595.89 rows=21578 width=1329) (actual time=4.709..797.659 rows=241534 loops=1)
+ Buffers: shared hit=239121 read=3060
+ I/O Timings: read=336.879
+ -> HashAggregate (cost=1305.10..1360.22 rows=5512 width=4) (actual time=4.657..5.370 rows=1528 loops=1)
+ Group Key: projects.id
+ Buffers: shared hit=2597
+ -> Nested Loop (cost=576.76..1291.32 rows=5512 width=4) (actual time=2.427..4.244 rows=1528 loops=1)
+ Buffers: shared hit=2597
+ -> HashAggregate (cost=576.32..579.06 rows=274 width=25) (actual time=2.406..2.447 rows=265 loops=1)
+ Group Key: namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)]
+ Buffers: shared hit=334
+ -> Bitmap Heap Scan on namespaces (cost=141.62..575.63 rows=274 width=25) (actual time=1.933..2.330 rows=265 loops=1)
+ Recheck Cond: (traversal_ids @> '{9970}'::integer[])
+ Heap Blocks: exact=243
+ Buffers: shared hit=334
+ -> Bitmap Index Scan on index_namespaces_on_traversal_ids (cost=0.00..141.55 rows=274 width=0) (actual time=1.897..1.898 rows=265 loops=1)
+ Index Cond: (traversal_ids @> '{9970}'::integer[])
+ Buffers: shared hit=91
+ -> Index Only Scan using index_projects_on_namespace_id_and_id on projects (cost=0.44..2.40 rows=20 width=8) (actual time=0.004..0.006 rows=6 loops=265)
+ Index Cond: (namespace_id = (namespaces.traversal_ids)[array_length(namespaces.traversal_ids, 1)])
+ Heap Fetches: 51
+ Buffers: shared hit=2263
+ -> Index Scan using index_issues_on_project_id_and_iid on issues (cost=0.57..10.57 rows=544 width=1329) (actual time=0.114..0.484 rows=158 loops=1528)
+ Index Cond: (project_id = projects.id)
+ Buffers: shared hit=236524 read=3060
+ I/O Timings: read=336.879
+ Planning Time: 7.750 ms
+ Execution Time: 967.973 ms
+(36 rows)
+</code></pre>
+</details>
+
+The performance of the query depends on the number of rows in the database.
+For a typical large group, we can assume the following:
+
+- Number of groups in the group hierarchy: less than 1 000
+- Number of projects: less than 5 000
+- Number of issues: less than 100 000
+
+From the list, it's apparent that the number of `issues` records has
+the largest impact on the performance.
+Under normal usage, the number of `issues` records grows
+at a faster rate than the number of `namespaces` and `projects` records.
+
+This problem affects most of our group-level features where records are listed
+in a specific order, such as group-level issues, merge requests pages, and APIs.
+For very large groups, the database queries can easily time out, causing HTTP 500 errors.
+
+## Optimizing ordered `IN` queries
+
+In the talk
+["How to teach an elephant to dance rock'n'roll"](https://www.youtube.com/watch?v=Ha38lcjVyhQ),
+Maxim Boguk demonstrated a technique to optimize a special class of ordered `IN` queries,
+such as our ordered group-level queries.
+
+A typical ordered `IN` query may look like this:
+
+```sql
+SELECT t.* FROM t
+WHERE t.fkey IN (value_set)
+ORDER BY t.pkey
+LIMIT N;
+```
+
+Here's the key insight of the technique: rather than retrieving all records that satisfy
+the condition `t.fkey IN (value_set)` and sorting them, we need at most `|value_set| + N`
+record lookups, where `|value_set|` is the number of values in `value_set`.
+
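+To build intuition, here's a simplified sketch of the idea using a `LATERAL` join: it reads at
+most `N` rows per value and merges them, already avoiding the full sort. The recursive technique
+described in the rest of this document refines this further, towards the `|value_set| + N`
+lookup bound.
+
+```sql
+SELECT top_n.*
+FROM (VALUES (1), (2), (3)) AS value_set (fkey) -- stands in for the `IN` subquery
+LEFT JOIN LATERAL (
+  SELECT t.*
+  FROM t
+  WHERE t.fkey = value_set.fkey
+  ORDER BY t.pkey
+  LIMIT 20 -- N rows per value
+) top_n ON TRUE
+ORDER BY top_n.pkey
+LIMIT 20; -- N
+```
+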
+We adopted and generalized the technique for use in GitLab by implementing utilities in the
+`Gitlab::Pagination::Keyset::InOperatorOptimization` module to facilitate building efficient `IN`
+queries.
+
+### Requirements
+
+The technique is not a drop-in replacement for the existing group-level queries using the `IN` operator.
+The technique can only optimize `IN` queries that satisfy the following requirements:
+
+- `LIMIT` is present, which usually means that the query is paginated
+ (offset or keyset pagination).
+- The column used with the `IN` query and the columns in the `ORDER BY`
+ clause are covered with a database index. The columns in the index must be
+ in the following order: `column_for_the_in_query`, `order by column 1`, and
+ `order by column 2`.
+- The columns in the `ORDER BY` clause are distinct
+  (the combination of the columns uniquely identifies one particular row in the table).
+
+WARNING:
+This technique does not improve the performance of `COUNT(*)` queries.
+
+## The `InOperatorOptimization` module
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67352) in GitLab 14.3.
+
+The `Gitlab::Pagination::Keyset::InOperatorOptimization` module implements utilities for applying a generalized version of
+the efficient `IN` query technique described in the previous section.
+
+To build optimized, ordered `IN` queries that meet [the requirements](#requirements),
+use the utility class `QueryBuilder` from the module.
+
+NOTE:
+The generic keyset pagination module introduced in the merge request
+[51481](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/51481)
+plays a fundamental role in the generalized implementation of the technique
+in `Gitlab::Pagination::Keyset::InOperatorOptimization`.
+
+### Basic usage of `QueryBuilder`
+
+To illustrate basic usage, we build a query that
+fetches the 20 issues with the oldest `created_at` values from the group `gitlab-org`.
+
+The following ActiveRecord query would produce a query similar to
+[the unoptimized query](#performance-problems-with-in-queries) that we examined earlier:
+
+```ruby
+scope = Issue
+ .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
+ .order(:created_at, :id)
+ .limit(20)
+```
+
+Instead, use the query builder `InOperatorOptimization::QueryBuilder` to produce an optimized
+version:
+
+```ruby
+scope = Issue.order(:created_at, :id)
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: scope,
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+).execute.limit(20)
+```
+
+- `scope` represents the original `ActiveRecord::Relation` object without the `IN` query. The
+ relation should define an order which must be supported by the
+ [keyset pagination library](keyset_pagination.md#usage).
+- `array_scope` contains the `ActiveRecord::Relation` object, which represents the original
+ `IN (subquery)`. The select values must contain the columns by which the subquery is "connected"
+ to the main query: the `id` of the project record.
+- `array_mapping_scope` defines a lambda returning an `ActiveRecord::Relation` object. The lambda
+  matches (`=`) the select values from the `array_scope` to columns in the main query. It yields
+  as many arguments as there are select values defined in the `array_scope`; the arguments are
+  Arel SQL expressions.
+- `finder_query` loads the actual record row from the database. It must also be a lambda, where
+  the `ORDER BY` column expressions are available for locating the record. In this example, the
+  yielded values are the `created_at` and `id` SQL expressions. Finding a record via the primary
+  key is very fast, so we don't use the `created_at` value.
+
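+To see what the lambdas produce, you can call them with literal Arel expressions in the Rails
+console. An illustrative sketch (the `array_cte.id` reference is a stand-in; the query builder
+supplies the real expressions):
+
+```ruby
+array_mapping_scope.call(Arel.sql('array_cte.id')).to_sql
+# => SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = array_cte.id
+
+finder_query.call(Arel.sql('created_at_value'), Arel.sql('id_value')).to_sql
+# => SELECT "issues".* FROM "issues" WHERE "issues"."id" = id_value
+```
+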
+The following database index on the `issues` table must be present
+to make the query execute efficiently:
+
+```sql
+"idx_issues_on_project_id_and_created_at_and_id" btree (project_id, created_at, id)
+```
+
+<details>
+<summary>Expand to see the SQL query.</summary>
+<pre><code>
+SELECT "issues".*
+FROM
+ (WITH RECURSIVE "array_cte" AS MATERIALIZED
+ (SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+ (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+ FROM "namespaces"
+ WHERE (traversal_ids @> ('{9970}')))),
+ "recursive_keyset_cte" AS ( -- initializer row start
+ (SELECT NULL::issues AS records,
+ array_cte_id_array,
+ issues_created_at_array,
+ issues_id_array,
+ 0::bigint AS COUNT
+ FROM
+ (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+ ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+ ARRAY_AGG("issues"."id") AS issues_id_array
+ FROM
+ (SELECT "array_cte"."id"
+ FROM array_cte) array_cte
+ LEFT JOIN LATERAL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = "array_cte"."id"
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) issues ON TRUE
+ WHERE "issues"."created_at" IS NOT NULL
+ AND "issues"."id" IS NOT NULL) array_scope_lateral_query
+ LIMIT 1)
+ -- initializer row finished
+ UNION ALL
+ (SELECT
+ -- result row start
+ (SELECT issues -- record finder query as the first column
+ FROM "issues"
+ WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[position]
+ LIMIT 1),
+ array_cte_id_array,
+ recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:],
+ recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:],
+ recursive_keyset_cte.count + 1
+ -- result row finished
+ FROM recursive_keyset_cte,
+ LATERAL
+ -- finding the cursor values of the next record start
+ (SELECT created_at,
+ id,
+ position
+ FROM UNNEST(issues_created_at_array, issues_id_array) WITH
+ ORDINALITY AS u(created_at, id, position)
+ WHERE created_at IS NOT NULL
+ AND id IS NOT NULL
+ ORDER BY "created_at" ASC, "id" ASC
+ LIMIT 1) AS position_query,
+ -- finding the cursor values of the next record end
+ -- finding the next cursor values (next_cursor_values_query) start
+ LATERAL
+ (SELECT "record"."created_at",
+ "record"."id"
+ FROM (
+ VALUES (NULL,
+ NULL)) AS nulls
+ LEFT JOIN
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM (
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NULL
+ AND "issues"."created_at" IS NULL
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+ AND "issues"."created_at" IS NULL
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+ AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[position]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+ AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[position]
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) record ON TRUE
+ LIMIT 1) AS next_cursor_values))
+ -- finding the next cursor values (next_cursor_values_query) END
+SELECT (records).*
+ FROM "recursive_keyset_cte" AS "issues"
+ WHERE (COUNT <> 0)) issues -- filtering out the initializer row
+LIMIT 20
+</code></pre>
+</details>
+
+### Using the `IN` query optimization
+
+#### Adding more filters
+
+In this example, let's add an extra filter by `milestone_id`.
+
+Be careful when adding extra filters to the query. If the column is not covered by the same index,
+then the query might perform worse than the non-optimized query. The `milestone_id` column in the
+`issues` table is currently covered by a different index:
+
+```sql
+"index_issues_on_milestone_id" btree (milestone_id)
+```
+
+Adding the `milestone_id = X` filter to the `scope` argument or to the optimized scope causes poor performance.
+
+Example (bad):
+
+```ruby
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: scope,
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+).execute
+ .where(milestone_id: 5)
+ .limit(20)
+```
+
+To address this concern, we could define another index:
+
+```sql
+"idx_issues_on_project_id_and_milestone_id_and_created_at_and_id" btree (project_id, milestone_id, created_at, id)
+```
+
+Adding more indexes to the `issues` table could significantly affect the performance of
+the `UPDATE` queries. In this case, it's better to rely on the original query. This means that if
+we want to use the optimization for the unfiltered page only, we need extra logic in the application code:
+
+```ruby
+if optimization_possible? # no extra params or params covered with the same index as the ORDER BY clause
+ run_optimized_query
+else
+ run_normal_in_query
+end
+```
+
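+A minimal sketch of such a predicate, assuming the filter parameters arrive as a hash (the
+parameter names are hypothetical):
+
+```ruby
+# Filters not covered by the (project_id, created_at, id) index force a fallback
+# to the original, non-optimized query.
+FILTERS_REQUIRING_FALLBACK = %i[milestone_id label_name assignee_id].freeze
+
+def optimization_possible?(params)
+  (params.keys.map(&:to_sym) & FILTERS_REQUIRING_FALLBACK).empty?
+end
+```
+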
+#### Multiple `IN` queries
+
+Let's assume that we want to extend the group-level queries to include only incident and test case
+issue types.
+
+The original ActiveRecord query would look like this:
+
+```ruby
+scope = Issue
+ .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
+ .where(issue_type: [:incident, :test_case]) # 1, 2
+ .order(:created_at, :id)
+ .limit(20)
+```
+
+To construct the array scope, we'll need to take the Cartesian product of the `project_id IN` and
+the `issue_type IN` queries. `issue_type` is an ActiveRecord enum, so we need to
+construct the following table:
+
+| `project_id` | `issue_type_value` |
+| ------------ | ------------------ |
+| 2 | 1 |
+| 2 | 2 |
+| 5 | 1 |
+| 5 | 2 |
+| 10 | 1 |
+| 10 | 2 |
+| 9 | 1 |
+| 9 | 2 |
+
+For the `issue_type` filter, we can construct a value list without querying a table:
+
+```ruby
+value_list = Arel::Nodes::ValuesList.new([[Issue.issue_types[:incident]],[Issue.issue_types[:test_case]]])
+issue_type_values = Arel::Nodes::Grouping.new(value_list).as('issue_type_values (value)').to_sql
+
+array_scope = Group
+ .find(9970)
+ .all_projects
+ .from("#{Project.table_name}, #{issue_type_values}")
+ .select(:id, :value)
+```
+
+Building the `array_mapping_scope` query requires two arguments: `id` and `issue_type_value`:
+
+```ruby
+array_mapping_scope = -> (id_expression, issue_type_value_expression) do
+  Issue
+    .where(Issue.arel_table[:project_id].eq(id_expression))
+    .where(Issue.arel_table[:issue_type].eq(issue_type_value_expression))
+end
+```
+
+The `scope` and the `finder_query` don't change:
+
+```ruby
+scope = Issue.order(:created_at, :id)
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: scope,
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+).execute.limit(20)
+```
+
+<details>
+<summary>Expand to see the SQL query.</summary>
+<pre><code lang='sql'>
+SELECT "issues".*
+FROM
+ (WITH RECURSIVE "array_cte" AS MATERIALIZED
+ (SELECT "projects"."id", "value"
+ FROM projects, (
+ VALUES (1), (2)) AS issue_type_values (value)
+ WHERE "projects"."namespace_id" IN
+ (WITH RECURSIVE "base_and_descendants" AS (
+ (SELECT "namespaces".*
+ FROM "namespaces"
+ WHERE "namespaces"."type" = 'Group'
+ AND "namespaces"."id" = 9970)
+ UNION
+ (SELECT "namespaces".*
+ FROM "namespaces", "base_and_descendants"
+ WHERE "namespaces"."type" = 'Group'
+ AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
+ FROM "base_and_descendants" AS "namespaces")),
+ "recursive_keyset_cte" AS (
+ (SELECT NULL::issues AS records,
+ array_cte_id_array,
+ array_cte_value_array,
+ issues_created_at_array,
+ issues_id_array,
+ 0::bigint AS COUNT
+ FROM
+ (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+ ARRAY_AGG("array_cte"."value") AS array_cte_value_array,
+ ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+ ARRAY_AGG("issues"."id") AS issues_id_array
+ FROM
+ (SELECT "array_cte"."id",
+ "array_cte"."value"
+ FROM array_cte) array_cte
+ LEFT JOIN LATERAL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = "array_cte"."id"
+ AND "issues"."issue_type" = "array_cte"."value"
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) issues ON TRUE
+ WHERE "issues"."created_at" IS NOT NULL
+ AND "issues"."id" IS NOT NULL) array_scope_lateral_query
+ LIMIT 1)
+ UNION ALL
+ (SELECT
+ (SELECT issues
+ FROM "issues"
+ WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[POSITION]
+ LIMIT 1), array_cte_id_array,
+ array_cte_value_array,
+ recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:], recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:], recursive_keyset_cte.count + 1
+ FROM recursive_keyset_cte,
+ LATERAL
+ (SELECT created_at,
+ id,
+ POSITION
+ FROM UNNEST(issues_created_at_array, issues_id_array) WITH
+ ORDINALITY AS u(created_at, id, POSITION)
+ WHERE created_at IS NOT NULL
+ AND id IS NOT NULL
+ ORDER BY "created_at" ASC, "id" ASC
+ LIMIT 1) AS position_query,
+ LATERAL
+ (SELECT "record"."created_at",
+ "record"."id"
+ FROM (
+ VALUES (NULL,
+ NULL)) AS nulls
+ LEFT JOIN
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM (
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NULL
+ AND "issues"."created_at" IS NULL
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+ AND "issues"."created_at" IS NULL
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+ AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[POSITION]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+ AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[POSITION]
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) record ON TRUE
+ LIMIT 1) AS next_cursor_values)) SELECT (records).*
+ FROM "recursive_keyset_cte" AS "issues"
+ WHERE (COUNT <> 0)) issues
+LIMIT 20
+</code>
+</details>
+
+NOTE:
+To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
+
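+For reference, a hedged migration sketch that would add such a covering index (the index name and
+migration class are assumptions for illustration, not an existing migration):
+
+```ruby
+# frozen_string_literal: true
+
+class AddIssuesProjectIdIssueTypeCreatedAtIdIndex < ActiveRecord::Migration[6.1]
+  include Gitlab::Database::MigrationHelpers
+
+  # Concurrent index creation cannot run inside a transaction.
+  disable_ddl_transaction!
+
+  INDEX_NAME = 'index_issues_on_project_id_issue_type_created_at_id'
+
+  def up
+    add_concurrent_index :issues, [:project_id, :issue_type, :created_at, :id], name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :issues, INDEX_NAME
+  end
+end
+```
+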
+#### Batch iteration
+
+Batch iteration over the records is possible via the keyset `Iterator` class.
+
+```ruby
+scope = Issue.order(:created_at, :id)
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+opts = {
+ in_operator_optimization_options: {
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+ }
+}
+
+Gitlab::Pagination::Keyset::Iterator.new(scope: scope, **opts).each_batch(of: 100) do |records|
+ puts records.select(:id).map { |r| [r.id] }
+end
+```
+
+#### Keyset pagination
+
+The optimization works out of the box with GraphQL and the `keyset_paginate` helper method.
+Read more about [keyset pagination](keyset_pagination.md).
+
+```ruby
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+opts = {
+ in_operator_optimization_options: {
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+ }
+}
+
+issues = Issue
+ .order(:created_at, :id)
+ .keyset_paginate(per_page: 20, keyset_order_options: opts)
+ .records
+```
+
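+To fetch the next page, pass the cursor back to the paginator. A sketch, assuming the paginator
+object exposes `cursor_for_next_page` (check the keyset pagination documentation for the exact
+interface):
+
+```ruby
+paginator = Issue
+  .order(:created_at, :id)
+  .keyset_paginate(per_page: 20, keyset_order_options: opts)
+
+first_page = paginator.records
+cursor = paginator.cursor_for_next_page
+
+second_page = Issue
+  .order(:created_at, :id)
+  .keyset_paginate(cursor: cursor, per_page: 20, keyset_order_options: opts)
+  .records
+```
+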
+#### Offset pagination with Kaminari
+
+The `ActiveRecord` scope produced by the `InOperatorOptimization` class can be used in
+[offset-paginated](pagination_guidelines.md#offset-pagination)
+queries.
+
+```ruby
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
+ .new(...)
+ .execute
+ .page(1)
+ .per(20)
+ .without_count
+```
+
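+For example, reusing the `scope`, `array_scope`, `array_mapping_scope`, and `finder_query`
+objects from the basic usage section, a complete sketch might look like this:
+
+```ruby
+records = Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
+  .new(
+    scope: scope,
+    array_scope: array_scope,
+    array_mapping_scope: array_mapping_scope,
+    finder_query: finder_query
+  )
+  .execute
+  .page(2)
+  .per(20)
+  .without_count # a separate COUNT(*) query would not benefit from the optimization
+```
+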
+## Generalized `IN` optimization technique
+
+Let's dive into how `QueryBuilder` builds the optimized query
+to fetch the twenty oldest created issues from the group `gitlab-org`
+using the generalized `IN` optimization technique.
+
+### Array CTE
+
+As the first step, we use a common table expression (CTE) for collecting the `projects.id` values.
+This is done by wrapping the incoming `array_scope` ActiveRecord relation parameter with a CTE.
+
+```sql
+WITH array_cte AS MATERIALIZED (
+ SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+ (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+ FROM "namespaces"
+ WHERE (traversal_ids @> ('{9970}')))
+)
+```
+
+This query produces the following result set with only one column (`projects.id`):
+
+| ID |
+| --- |
+| 9 |
+| 2 |
+| 5 |
+| 10 |
+
+### Array mapping
+
+For each project (that is, each record storing a project ID in `array_cte`),
+we will fetch the cursor value identifying the first issue respecting the `ORDER BY` clause.
+
+As an example, let's pick the first record `ID=9` from `array_cte`.
+The following query should fetch the cursor value `(created_at, id)` identifying
+the first issue record respecting the `ORDER BY` clause for the project with `ID=9`:
+
+```sql
+SELECT "issues"."created_at", "issues"."id"
+FROM "issues"."project_id"=9
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 1;
+```
+
+We use a `LATERAL JOIN` to loop over the records in the `array_cte` and find the
+cursor value for each project. The inner query is built from the `array_mapping_scope` lambda
+function.
+
+```sql
+SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+ ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+ ARRAY_AGG("issues"."id") AS issues_id_array
+FROM (
+ SELECT "array_cte"."id" FROM array_cte
+) array_cte
+LEFT JOIN LATERAL
+(
+ SELECT "issues"."created_at", "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = "array_cte"."id"
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1
+) issues ON TRUE
+```
+
+Since we have an index on `project_id`, `created_at`, and `id`,
+index-only scans should quickly locate all the cursor values.
+
+This is how the query could be translated to Ruby:
+
+```ruby
+created_at_values = []
+id_values = []
+project_ids.map do |project_id|
+ created_at, id = Issue.select(:created_at, :id).where(project_id: project_id).order(:created_at, :id).limit(1).first # N+1 but it's fast
+ created_at_values << created_at
+ id_values << id
+end
+```
+
+This is what the result set would look like:
+
+| `project_ids` | `created_at_values` | `id_values` |
+| ------------- | ------------------- | ----------- |
+| 2 | 2020-01-10 | 5 |
+| 5 | 2020-01-05 | 4 |
+| 10 | 2020-01-15 | 7 |
+| 9 | 2020-01-05 | 3 |
+
+The table shows the cursor values (`created_at, id`) of the first record for each project
+respecting the `ORDER BY` clause.
+
+At this point, we have the initial data. To start collecting the actual records from the database,
+we'll use a recursive CTE query where each recursion locates one row until
+the `LIMIT` is reached or no more data can be found.
+
+Here's an outline of the steps we will take in the recursive CTE query
+(expressing the steps in SQL is non-trivial but will be explained next):
+
+1. Sort the initial resultset according to the `ORDER BY` clause.
+1. Pick the top cursor to fetch the record; this is our first record. In the example,
+this cursor would be (`2020-01-05`, `3`) for `project_id=9`.
+1. We can use (`2020-01-05`, `3`) to fetch the next issue respecting the `ORDER BY` clause and the
+`project_id=9` filter. This produces an updated resultset.
+
+ | `project_ids` | `created_at_values` | `id_values` |
+ | ------------- | ------------------- | ----------- |
+ | 2 | 2020-01-10 | 5 |
+ | 5 | 2020-01-05 | 4 |
+ | 10 | 2020-01-15 | 7 |
+ | **9** | **2020-01-06** | **6** |
+
+1. Repeat steps 1 to 3 with the updated resultset until we have fetched `N=20` records;
+the Ruby sketch after this list walks through the same loop.
+
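+A plain-Ruby sketch of this loop, using the example data above (purely illustrative; the real
+implementation does all of this inside a single recursive CTE):
+
+```ruby
+# Populate the cursor arrays, as in the array mapping step above.
+project_ids = [2, 5, 10, 9]
+created_at_values = []
+id_values = []
+
+project_ids.each do |project_id|
+  first = Issue.select(:created_at, :id).where(project_id: project_id).order(:created_at, :id).first
+  created_at_values << first&.created_at
+  id_values << first&.id
+end
+
+records = []
+while records.size < 20
+  # Step 1: order the cursors and pick the position of the lowest (created_at, id) pair.
+  position = created_at_values.each_index
+    .reject { |i| created_at_values[i].nil? } # skip exhausted projects
+    .min_by { |i| [created_at_values[i], id_values[i]] }
+  break if position.nil? # no more data
+
+  # Step 2: load the record behind the cursor (a single primary-key lookup).
+  records << Issue.find(id_values[position])
+
+  # Step 3: advance the consumed cursor to the project's next issue.
+  next_issue = Issue
+    .select(:created_at, :id)
+    .where(project_id: project_ids[position])
+    .where('(created_at, id) > (?, ?)', created_at_values[position], id_values[position])
+    .order(:created_at, :id)
+    .first
+
+  created_at_values[position] = next_issue&.created_at
+  id_values[position] = next_issue&.id
+end
+```
+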
+### Initializing the recursive CTE query
+
+To start the recursive CTE query, we need to produce exactly one row; we call this the
+initializer query (`initializer_query`).
+
+We use the `ARRAY_AGG` function to compact the initial result set into a single row
+and use that row as the initial value for the recursive CTE query.
+
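+A trimmed-down sketch of the initializer, with the lateral part elided (the full version appears
+in the collapsed query shown earlier):
+
+```sql
+SELECT NULL::issues AS records,
+       array_cte_id_array,
+       issues_created_at_array,
+       issues_id_array,
+       0::bigint AS count
+FROM (
+  SELECT ARRAY_AGG("array_cte"."id")      AS array_cte_id_array,
+         ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+         ARRAY_AGG("issues"."id")         AS issues_id_array
+  FROM array_cte
+  LEFT JOIN LATERAL ( ... ) issues ON TRUE -- the array mapping query from the previous section
+) array_scope_lateral_query
+LIMIT 1
+```
+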
+Example initializer row:
+
+| `records` | `project_ids` | `created_at_values` | `id_values` | `Count` | `Position` |
+| -------------- | --------------- | ------------------- | ----------- | ------- | ---------- |
+| `NULL::issues` | `[9, 2, 5, 10]` | `[...]` | `[...]` | `0` | `NULL` |
+
+- The `records` column contains our sorted database records, and the initializer query sets the
+first value to `NULL`, which is filtered out later.
+- The `count` column tracks the number of records found. We use this column to filter out the
+initializer row from the result set.
+
+### Recursive portion of the CTE query
+
+The result row is produced with the following steps:
+
+1. [Order the keyset arrays.](#order-the-keyset-arrays)
+1. [Find the next cursor.](#find-the-next-cursor)
+1. [Produce a new row.](#produce-a-new-row)
+
+#### Order the keyset arrays
+
+Order the keyset arrays according to the original `ORDER BY` clause with `LIMIT 1` using the
+`UNNEST [] WITH ORDINALITY` table function. The function locates the "lowest" keyset cursor
+values and gives us the array position. These cursor values are used to locate the record.
+
+NOTE:
+At this point, we haven't read anything from the database tables, because we relied on
+fast index-only scans.
+
+| `project_ids` | `created_at_values` | `id_values` |
+| ------------- | ------------------- | ----------- |
+| 2 | 2020-01-10 | 5 |
+| 5 | 2020-01-05 | 4 |
+| 10 | 2020-01-15 | 7 |
+| 9 | 2020-01-05 | 3 |
+
+The first row is the 4th one (`position = 4`), because it has the lowest `created_at` and
+`id` values. The `UNNEST` function also exposes the position using an extra column (note:
+PostgreSQL arrays use 1-based indexing).
+
+Demonstration of the `UNNEST [] WITH ORDINALITY` table function:
+
+```sql
+SELECT position FROM unnest('{2020-01-10, 2020-01-05, 2020-01-15, 2020-01-05}'::timestamp[], '{5, 4, 7, 3}'::int[])
+ WITH ORDINALITY AS t(created_at, id, position) ORDER BY created_at ASC, id ASC LIMIT 1;
+```
+
+Result:
+
+```plaintext
+position
+----------
+ 4
+(1 row)
+```
+
+#### Find the next cursor
+
+Now, let's find the next cursor values (`next_cursor_values_query`) for the project with `id = 9`.
+To do that, we build a keyset pagination SQL query. Find the next row after
+`created_at = 2020-01-05` and `id = 3`. Because we order by two database columns, there can be two
+cases:
+
+- There are rows with `created_at = 2020-01-05` and `id > 3`.
+- There are rows with `created_at > 2020-01-05`.
+
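+Conceptually, the keyset query looks like this simplified sketch (the generated queries in the
+collapsed examples above express the same logic as `UNION ALL` branches to keep the index scans
+efficient):
+
+```sql
+SELECT "issues"."created_at", "issues"."id"
+FROM "issues"
+WHERE "issues"."project_id" = 9
+  AND (
+    ("issues"."created_at" = '2020-01-05' AND "issues"."id" > 3) OR
+    ("issues"."created_at" > '2020-01-05')
+  )
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 1;
+```
+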
+The generic keyset pagination library generates this query for us. After the query finishes,
+we have a temporary table with the next cursor values:
+
+| `created_at` | ID |
+| ------------ | --- |
+| 2020-01-06 | 6 |
+
+#### Produce a new row
+
+As the final step, we need to produce a new row by manipulating the initializer row
+(`data_collector_query` method). Two things happen here:
+
+- Read the full row from the database and return it in the `records` column
+(`result_collector_columns` method).
+- Replace the cursor values at the current position with the results from the keyset query.
+
+Reading the full row from the database takes only one index scan by the primary key. We use the
+ActiveRecord query passed as the `finder_query`:
+
+```sql
+(SELECT "issues".* FROM issues WHERE id = id_values[position] LIMIT 1)
+```
+
+By adding parentheses, the result row can be put into the `records` column.
+
+Replacing the cursor values at `position` can be done via standard PostgreSQL array operators:
+
+```sql
+-- created_at_values column value
+created_at_values[:position-1]||next_cursor_values.created_at||created_at_values[position+1:]
+
+-- id_values column value
+id_values[:position-1]||next_cursor_values.id||id_values[position+1:]
+```
+
+The Ruby equivalent would be the following:
+
+```ruby
+id_values[0..(position - 1)] + [next_cursor_values.id] + id_values[(position + 1)..-1]
+```
+
+After this, the recursion starts again by finding the next lowest cursor value.
+
+### Finalizing the query
+
+For producing the final `issues` rows, we're going to wrap the query with another `SELECT` statement:
+
+```sql
+SELECT "issues".*
+FROM (
+  SELECT (records).* -- similar to the Ruby splat operator
+ FROM recursive_keyset_cte
+ WHERE recursive_keyset_cte.count <> 0 -- filter out the initializer row
+) AS issues
+```
+
+### Performance comparison
+
+Assuming that we have the correct database index in place, we can compare the query performance by
+looking at the number of database rows accessed by the query.
+
+- Number of groups: 100
+- Number of projects: 500
+- Number of issues (in the group hierarchy): 50 000
+
+Standard `IN` query:
+
+| Query | Entries read from index | Rows read from the table | Rows sorted in memory |
+| ------------------------ | ----------------------- | ------------------------ | --------------------- |
+| group hierarchy subquery | 100 | 0 | 0 |
+| project lookup query | 500 | 0 | 0 |
+| issue lookup query | 50 000 | 20 | 50 000 |
+
+Optimized `IN` query:
+
+| Query | Entries read from index | Rows read from the table | Rows sorted in memory |
+| ------------------------ | ----------------------- | ------------------------ | --------------------- |
+| group hierarchy subquery | 100 | 0 | 0 |
+| project lookup query | 500 | 0 | 0 |
+| issue lookup query | 519 | 20 | 10 000 |
+
+The group and project queries don't use sorting; the necessary columns are read from database
+indexes. These values are accessed frequently, so it's very likely that most of the data is
+in PostgreSQL's buffer cache.
+
+The optimized `IN` query reads a maximum of 519 entries (cursor values) from the index:
+
+- 500 index-only scans to populate the arrays, one for each project. The cursor values of the
+first record of each project are read here.
+- A maximum of 19 additional index-only scans for the consecutive records.
+
+The optimized `IN` query sorts the array (the cursor values per project) 20 times, which
+means we sort 20 x 500 rows in total. However, this might be a less memory-intensive task than
+sorting 10 000 rows at once.
+
+Performance comparison for the `gitlab-org` group:
+
+| Query | Number of 8K Buffers involved | Uncached execution time | Cached execution time |
+| -------------------- | ----------------------------- | ----------------------- | --------------------- |
+| `IN` query | 240833 | 1.2s | 660ms |
+| Optimized `IN` query | 9783 | 450ms | 22ms |
+
+NOTE:
+Before taking measurements, the group lookup query was executed separately to make
+the group data available in the buffer cache. Since it's a frequently called query, it
+hits many shared buffers during query execution in the production environment.