author    GitLab Bot <gitlab-bot@gitlab.com> 2021-09-14 15:10:35 +0300
committer GitLab Bot <gitlab-bot@gitlab.com> 2021-09-14 15:10:35 +0300
commit    d378fdaa60adb7217e3fc798580ad206127728d5 (patch)
tree      9cb715513dd4d4197f76b2908458551940b0a094 /doc/development/database/efficient_in_operator_queries.md
parent    7b69070a7468c4a9b6fe0ed7fbf1b3f2b58434e0 (diff)

Add latest changes from gitlab-org/gitlab@master

Diffstat (limited to 'doc/development/database/efficient_in_operator_queries.md')
 doc/development/database/efficient_in_operator_queries.md (new, mode -rw-r--r--) | 948 +++
 1 file changed, 948 insertions(+), 0 deletions(-)
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
new file mode 100644
index 00000000000..207fa6c3832
--- /dev/null
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -0,0 +1,948 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Efficient `IN` operator queries
+
+This document describes a technique for building efficient, ordered database queries that use the
+`IN` SQL operator, and the GitLab utility module that helps apply the technique.
+
+NOTE:
+The described technique makes heavy use of
+[keyset pagination](pagination_guidelines.md#keyset-pagination).
+It's advised to get familiar with the topic first.
+
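+As a quick refresher, keyset pagination continues from the cursor of the last row on the
+previous page instead of using `OFFSET`. A minimal sketch, assuming an `(created_at, id)`
+ordering with no `NULL` values:
+
+```sql
+-- Fetch the page following the row with (created_at, id) = ('2020-01-05', 3).
+-- The row-comparison form is equivalent to:
+--   created_at > '2020-01-05' OR (created_at = '2020-01-05' AND id > 3)
+SELECT "issues".*
+FROM "issues"
+WHERE ("issues"."created_at", "issues"."id") > ('2020-01-05', 3)
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 20;
+```
+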
+## Motivation
+
+In GitLab, many domain objects like `Issue` live under nested hierarchies of projects and groups.
+To fetch nested database records for domain objects at the group level,
+we often perform queries with the `IN` SQL operator.
+We are usually interested in ordering the records by some attributes
+and limiting the number of records using `ORDER BY` and `LIMIT` clauses for performance.
+Pagination may be used to fetch subsequent records.
+
+Example tasks that require querying nested domain objects from the group level:
+
+- Show the first 20 issues by creation date or due date from the group `gitlab-org`.
+- Show the first 20 merge requests by merge date from the group `gitlab-com`.
+
+Unfortunately, ordered group-level queries typically perform badly,
+because executing them requires heavy disk I/O, memory, and CPU.
+Let's do an in-depth examination of executing one such query.
+
+### Performance problems with `IN` queries
+
+Consider the task of fetching the twenty oldest created issues
+from the group `gitlab-org` with the following query:
+
+```sql
+SELECT "issues".*
+FROM "issues"
+WHERE "issues"."project_id" IN
+ (SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+ (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+ FROM "namespaces"
+ WHERE (traversal_ids @> ('{9970}'))))
+ORDER BY "issues"."created_at" ASC,
+ "issues"."id" ASC
+LIMIT 20
+```
+
+NOTE:
+For pagination, ordering by the `created_at` column is not enough;
+we must also add the `id` column as a
+[tie-breaker](pagination_performance_guidelines.md#tie-breaker-column).
+
+The execution of the query can be largely broken down into three steps:
+
+1. The database accesses both `namespaces` and `projects` tables
+ to find all projects from all groups in the group hierarchy.
+1. The database retrieves `issues` records for each project causing heavy disk I/O.
+ Ideally, an appropriate index configuration should optimize this process.
+1. The database sorts the `issues` rows in memory by `created_at` and returns `LIMIT 20` rows to
+ the end-user. For large groups, this final step requires both large memory and CPU resources.
+
+<details>
+<summary>Expand to see the execution plan for this database query.</summary>
+<pre><code>
+ Limit (cost=90170.07..90170.12 rows=20 width=1329) (actual time=967.597..967.607 rows=20 loops=1)
+ Buffers: shared hit=239127 read=3060
+ I/O Timings: read=336.879
+ -> Sort (cost=90170.07..90224.02 rows=21578 width=1329) (actual time=967.596..967.603 rows=20 loops=1)
+ Sort Key: issues.created_at, issues.id
+ Sort Method: top-N heapsort Memory: 74kB
+ Buffers: shared hit=239127 read=3060
+ I/O Timings: read=336.879
+ -> Nested Loop (cost=1305.66..89595.89 rows=21578 width=1329) (actual time=4.709..797.659 rows=241534 loops=1)
+ Buffers: shared hit=239121 read=3060
+ I/O Timings: read=336.879
+ -> HashAggregate (cost=1305.10..1360.22 rows=5512 width=4) (actual time=4.657..5.370 rows=1528 loops=1)
+ Group Key: projects.id
+ Buffers: shared hit=2597
+ -> Nested Loop (cost=576.76..1291.32 rows=5512 width=4) (actual time=2.427..4.244 rows=1528 loops=1)
+ Buffers: shared hit=2597
+ -> HashAggregate (cost=576.32..579.06 rows=274 width=25) (actual time=2.406..2.447 rows=265 loops=1)
+ Group Key: namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)]
+ Buffers: shared hit=334
+ -> Bitmap Heap Scan on namespaces (cost=141.62..575.63 rows=274 width=25) (actual time=1.933..2.330 rows=265 loops=1)
+ Recheck Cond: (traversal_ids @> '{9970}'::integer[])
+ Heap Blocks: exact=243
+ Buffers: shared hit=334
+ -> Bitmap Index Scan on index_namespaces_on_traversal_ids (cost=0.00..141.55 rows=274 width=0) (actual time=1.897..1.898 rows=265 loops=1)
+ Index Cond: (traversal_ids @> '{9970}'::integer[])
+ Buffers: shared hit=91
+ -> Index Only Scan using index_projects_on_namespace_id_and_id on projects (cost=0.44..2.40 rows=20 width=8) (actual time=0.004..0.006 rows=6 loops=265)
+ Index Cond: (namespace_id = (namespaces.traversal_ids)[array_length(namespaces.traversal_ids, 1)])
+ Heap Fetches: 51
+ Buffers: shared hit=2263
+ -> Index Scan using index_issues_on_project_id_and_iid on issues (cost=0.57..10.57 rows=544 width=1329) (actual time=0.114..0.484 rows=158 loops=1528)
+ Index Cond: (project_id = projects.id)
+ Buffers: shared hit=236524 read=3060
+ I/O Timings: read=336.879
+ Planning Time: 7.750 ms
+ Execution Time: 967.973 ms
+(36 rows)
+</code></pre>
+</details>
+
+The performance of the query depends on the number of rows in the database.
+For a typical large group, we can assume the following:
+
+- Number of groups in the group hierarchy: less than 1 000
+- Number of projects: less than 5 000
+- Number of issues: less than 100 000
+
+From the list, it's apparent that the number of `issues` records has
+the largest impact on the performance.
+Under normal usage, the number of `issues` records grows
+at a faster rate than the number of `namespaces` and `projects` records.
+
+This problem affects most of our group-level features where records are listed
+in a specific order, such as group-level issues, merge requests pages, and APIs.
+For very large groups, the database queries can easily time out, causing HTTP 500 errors.
+
+## Optimizing ordered `IN` queries
+
+In the talk
+["How to teach an elephant to dance rock'n'roll"](https://www.youtube.com/watch?v=Ha38lcjVyhQ),
+Maxim Boguk demonstrated a technique to optimize a special class of ordered `IN` queries,
+such as our ordered group-level queries.
+
+A typical ordered `IN` query may look like this:
+
+```sql
+SELECT t.* FROM t
+WHERE t.fkey IN (value_set)
+ORDER BY t.pkey
+LIMIT N;
+```
+
+Here's the key insight of the technique: rather than retrieving all records that satisfy
+the condition `t.fkey IN (value_set)` and sorting them, we need at most `|value_set| + N`
+record lookups, where `|value_set|` is the number of values in `value_set`.
+
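+To build intuition, here's a simplified sketch of the idea using a `LATERAL` join: it reads at
+most `N` rows per value and merges them, already avoiding the full sort. The recursive technique
+described in the rest of this document refines this further, towards the `|value_set| + N`
+lookup bound.
+
+```sql
+SELECT top_n.*
+FROM (VALUES (1), (2), (3)) AS value_set (fkey) -- stands in for the `IN` subquery
+LEFT JOIN LATERAL (
+  SELECT t.*
+  FROM t
+  WHERE t.fkey = value_set.fkey
+  ORDER BY t.pkey
+  LIMIT 20 -- N rows per value
+) top_n ON TRUE
+ORDER BY top_n.pkey
+LIMIT 20; -- N
+```
+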
+We adopted and generalized the technique for use in GitLab by implementing utilities in the
+`Gitlab::Pagination::Keyset::InOperatorOptimization` module to facilitate building efficient `IN`
+queries.
+
+### Requirements
+
+The technique is not a drop-in replacement for the existing group-level queries using the `IN` operator.
+The technique can only optimize `IN` queries that satisfy the following requirements:
+
+- `LIMIT` is present, which usually means that the query is paginated
+ (offset or keyset pagination).
+- The column used with the `IN` query and the columns in the `ORDER BY`
+ clause are covered with a database index. The columns in the index must be
+ in the following order: `column_for_the_in_query`, `order by column 1`, and
+ `order by column 2`.
+- The columns in the `ORDER BY` clause are distinct
+  (the combination of the columns uniquely identifies one particular row in the table).
+
+WARNING:
+This technique does not improve the performance of `COUNT(*)` queries.
+
+## The `InOperatorOptimization` module
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67352) in GitLab 14.3.
+
+The `Gitlab::Pagination::Keyset::InOperatorOptimization` module implements utilities for applying a generalized version of
+the efficient `IN` query technique described in the previous section.
+
+To build optimized, ordered `IN` queries that meet [the requirements](#requirements),
+use the utility class `QueryBuilder` from the module.
+
+NOTE:
+The generic keyset pagination module introduced in the merge request
+[51481](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/51481)
+plays a fundamental role in the generalized implementation of the technique
+in `Gitlab::Pagination::Keyset::InOperatorOptimization`.
+
+### Basic usage of `QueryBuilder`
+
+To illustrate basic usage, we build a query that
+fetches the 20 issues with the oldest `created_at` values from the group `gitlab-org`.
+
+The following ActiveRecord query would produce a query similar to
+[the unoptimized query](#performance-problems-with-in-queries) that we examined earlier:
+
+```ruby
+scope = Issue
+ .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
+ .order(:created_at, :id)
+ .limit(20)
+```
+
+Instead, use the query builder `InOperatorOptimization::QueryBuilder` to produce an optimized
+version:
+
+```ruby
+scope = Issue.order(:created_at, :id)
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: scope,
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+).execute.limit(20)
+```
+
+- `scope` represents the original `ActiveRecord::Relation` object without the `IN` query. The
+ relation should define an order which must be supported by the
+ [keyset pagination library](keyset_pagination.md#usage).
+- `array_scope` contains the `ActiveRecord::Relation` object, which represents the original
+ `IN (subquery)`. The select values must contain the columns by which the subquery is "connected"
+ to the main query: the `id` of the project record.
+- `array_mapping_scope` defines a lambda returning an `ActiveRecord::Relation` object. The lambda
+  matches (`=`) the select values from the `array_scope` to columns in the main query. It yields
+  as many arguments as there are select values defined in the `array_scope`; the arguments are
+  Arel SQL expressions.
+- `finder_query` loads the actual record row from the database. It must also be a lambda, where
+  the `ORDER BY` column expressions are available for locating the record. In this example, the
+  yielded values are the `created_at` and `id` SQL expressions. Finding a record via the primary
+  key is very fast, so we don't use the `created_at` value.
+
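+To see what the lambdas produce, you can call them with literal Arel expressions in the Rails
+console. An illustrative sketch (the `array_cte.id` reference is a stand-in; the query builder
+supplies the real expressions):
+
+```ruby
+array_mapping_scope.call(Arel.sql('array_cte.id')).to_sql
+# => SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = array_cte.id
+
+finder_query.call(Arel.sql('created_at_value'), Arel.sql('id_value')).to_sql
+# => SELECT "issues".* FROM "issues" WHERE "issues"."id" = id_value
+```
+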
+The following database index on the `issues` table must be present
+to make the query execute efficiently:
+
+```sql
+"idx_issues_on_project_id_and_created_at_and_id" btree (project_id, created_at, id)
+```
+
+<details>
+<summary>Expand to see the SQL query.</summary>
+<pre><code>
+SELECT "issues".*
+FROM
+ (WITH RECURSIVE "array_cte" AS MATERIALIZED
+ (SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+ (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+ FROM "namespaces"
+ WHERE (traversal_ids @> ('{9970}')))),
+ "recursive_keyset_cte" AS ( -- initializer row start
+ (SELECT NULL::issues AS records,
+ array_cte_id_array,
+ issues_created_at_array,
+ issues_id_array,
+ 0::bigint AS COUNT
+ FROM
+ (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+ ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+ ARRAY_AGG("issues"."id") AS issues_id_array
+ FROM
+ (SELECT "array_cte"."id"
+ FROM array_cte) array_cte
+ LEFT JOIN LATERAL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = "array_cte"."id"
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) issues ON TRUE
+ WHERE "issues"."created_at" IS NOT NULL
+ AND "issues"."id" IS NOT NULL) array_scope_lateral_query
+ LIMIT 1)
+ -- initializer row finished
+ UNION ALL
+ (SELECT
+ -- result row start
+ (SELECT issues -- record finder query as the first column
+ FROM "issues"
+ WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[position]
+ LIMIT 1),
+ array_cte_id_array,
+ recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:],
+ recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:],
+ recursive_keyset_cte.count + 1
+ -- result row finished
+ FROM recursive_keyset_cte,
+ LATERAL
+ -- finding the cursor values of the next record start
+ (SELECT created_at,
+ id,
+ position
+ FROM UNNEST(issues_created_at_array, issues_id_array) WITH
+ ORDINALITY AS u(created_at, id, position)
+ WHERE created_at IS NOT NULL
+ AND id IS NOT NULL
+ ORDER BY "created_at" ASC, "id" ASC
+ LIMIT 1) AS position_query,
+ -- finding the cursor values of the next record end
+ -- finding the next cursor values (next_cursor_values_query) start
+ LATERAL
+ (SELECT "record"."created_at",
+ "record"."id"
+ FROM (
+ VALUES (NULL,
+ NULL)) AS nulls
+ LEFT JOIN
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM (
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NULL
+ AND "issues"."created_at" IS NULL
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+ AND "issues"."created_at" IS NULL
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+ AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[position]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+ AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+ AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[position]
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) record ON TRUE
+ LIMIT 1) AS next_cursor_values))
+ -- finding the next cursor values (next_cursor_values_query) END
+SELECT (records).*
+ FROM "recursive_keyset_cte" AS "issues"
+ WHERE (COUNT <> 0)) issues -- filtering out the initializer row
+LIMIT 20
+</code></pre>
+</details>
+
+### Using the `IN` query optimization
+
+#### Adding more filters
+
+In this example, let's add an extra filter by `milestone_id`.
+
+Be careful when adding extra filters to the query. If the column is not covered by the same index,
+then the query might perform worse than the non-optimized query. The `milestone_id` column in the
+`issues` table is currently covered by a different index:
+
+```sql
+"index_issues_on_milestone_id" btree (milestone_id)
+```
+
+Adding the `milestone_id = X` filter to the `scope` argument or to the optimized scope causes poor performance.
+
+Example (bad):
+
+```ruby
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: scope,
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+).execute
+ .where(milestone_id: 5)
+ .limit(20)
+```
+
+To address this concern, we could define another index:
+
+```sql
+"idx_issues_on_project_id_and_milestone_id_and_created_at_and_id" btree (project_id, milestone_id, created_at, id)
+```
+
+Adding more indexes to the `issues` table could significantly affect the performance of
+the `UPDATE` queries. In this case, it's better to rely on the original query. This means that if
+we want to use the optimization for the unfiltered page only, we need extra logic in the application code:
+
+```ruby
+if optimization_possible? # no extra params or params covered with the same index as the ORDER BY clause
+ run_optimized_query
+else
+ run_normal_in_query
+end
+```
+
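+A minimal sketch of such a predicate, assuming the filter parameters arrive as a hash (the
+parameter names are hypothetical):
+
+```ruby
+# Filters not covered by the (project_id, created_at, id) index force a fallback
+# to the original, non-optimized query.
+FILTERS_REQUIRING_FALLBACK = %i[milestone_id label_name assignee_id].freeze
+
+def optimization_possible?(params)
+  (params.keys.map(&:to_sym) & FILTERS_REQUIRING_FALLBACK).empty?
+end
+```
+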
+#### Multiple `IN` queries
+
+Let's assume that we want to extend the group-level queries to include only incident and test case
+issue types.
+
+The original ActiveRecord query would look like this:
+
+```ruby
+scope = Issue
+ .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
+ .where(issue_type: [:incident, :test_case]) # 1, 2
+ .order(:created_at, :id)
+ .limit(20)
+```
+
+To construct the array scope, we'll need to take the Cartesian product of the `project_id IN` and
+the `issue_type IN` queries. `issue_type` is an ActiveRecord enum, so we need to
+construct the following table:
+
+| `project_id` | `issue_type_value` |
+| ------------ | ------------------ |
+| 2 | 1 |
+| 2 | 2 |
+| 5 | 1 |
+| 5 | 2 |
+| 10 | 1 |
+| 10 | 2 |
+| 9 | 1 |
+| 9 | 2 |
+
+For the `issue_type` filter, we can construct a value list without querying a table:
+
+```ruby
+value_list = Arel::Nodes::ValuesList.new([[Issue.issue_types[:incident]],[Issue.issue_types[:test_case]]])
+issue_type_values = Arel::Nodes::Grouping.new(value_list).as('issue_type_values (value)').to_sql
+
+array_scope = Group
+ .find(9970)
+ .all_projects
+ .from("#{Project.table_name}, #{issue_type_values}")
+ .select(:id, :value)
+```
+
+Building the `array_mapping_scope` query requires two arguments: `id` and `issue_type_value`:
+
+```ruby
+array_mapping_scope = -> (id_expression, issue_type_value_expression) do
+  Issue
+    .where(Issue.arel_table[:project_id].eq(id_expression))
+    .where(Issue.arel_table[:issue_type].eq(issue_type_value_expression))
+end
+```
+
+The `scope` and the `finder_query` don't change:
+
+```ruby
+scope = Issue.order(:created_at, :id)
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: scope,
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+).execute.limit(20)
+```
+
+<details>
+<summary>Expand to see the SQL query.</summary>
+<pre><code lang='sql'>
+SELECT "issues".*
+FROM
+ (WITH RECURSIVE "array_cte" AS MATERIALIZED
+ (SELECT "projects"."id", "value"
+ FROM projects, (
+ VALUES (1), (2)) AS issue_type_values (value)
+ WHERE "projects"."namespace_id" IN
+ (WITH RECURSIVE "base_and_descendants" AS (
+ (SELECT "namespaces".*
+ FROM "namespaces"
+ WHERE "namespaces"."type" = 'Group'
+ AND "namespaces"."id" = 9970)
+ UNION
+ (SELECT "namespaces".*
+ FROM "namespaces", "base_and_descendants"
+ WHERE "namespaces"."type" = 'Group'
+ AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
+ FROM "base_and_descendants" AS "namespaces")),
+ "recursive_keyset_cte" AS (
+ (SELECT NULL::issues AS records,
+ array_cte_id_array,
+ array_cte_value_array,
+ issues_created_at_array,
+ issues_id_array,
+ 0::bigint AS COUNT
+ FROM
+ (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+ ARRAY_AGG("array_cte"."value") AS array_cte_value_array,
+ ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+ ARRAY_AGG("issues"."id") AS issues_id_array
+ FROM
+ (SELECT "array_cte"."id",
+ "array_cte"."value"
+ FROM array_cte) array_cte
+ LEFT JOIN LATERAL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = "array_cte"."id"
+ AND "issues"."issue_type" = "array_cte"."value"
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) issues ON TRUE
+ WHERE "issues"."created_at" IS NOT NULL
+ AND "issues"."id" IS NOT NULL) array_scope_lateral_query
+ LIMIT 1)
+ UNION ALL
+ (SELECT
+ (SELECT issues
+ FROM "issues"
+ WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[POSITION]
+ LIMIT 1), array_cte_id_array,
+ array_cte_value_array,
+ recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:], recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:], recursive_keyset_cte.count + 1
+ FROM recursive_keyset_cte,
+ LATERAL
+ (SELECT created_at,
+ id,
+ POSITION
+ FROM UNNEST(issues_created_at_array, issues_id_array) WITH
+ ORDINALITY AS u(created_at, id, POSITION)
+ WHERE created_at IS NOT NULL
+ AND id IS NOT NULL
+ ORDER BY "created_at" ASC, "id" ASC
+ LIMIT 1) AS position_query,
+ LATERAL
+ (SELECT "record"."created_at",
+ "record"."id"
+ FROM (
+ VALUES (NULL,
+ NULL)) AS nulls
+ LEFT JOIN
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM (
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NULL
+ AND "issues"."created_at" IS NULL
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+ AND "issues"."created_at" IS NULL
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+ AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[POSITION]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+ UNION ALL
+ (SELECT "issues"."created_at",
+ "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+ AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+ AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+ AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[POSITION]
+ AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1) record ON TRUE
+ LIMIT 1) AS next_cursor_values)) SELECT (records).*
+ FROM "recursive_keyset_cte" AS "issues"
+ WHERE (COUNT <> 0)) issues
+LIMIT 20
+</code>
+</details>
+
+NOTE:
+To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
+
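+For reference, a hedged migration sketch that would add such a covering index (the index name and
+migration class are assumptions for illustration, not an existing migration):
+
+```ruby
+# frozen_string_literal: true
+
+class AddIssuesProjectIdIssueTypeCreatedAtIdIndex < ActiveRecord::Migration[6.1]
+  include Gitlab::Database::MigrationHelpers
+
+  # Concurrent index creation cannot run inside a transaction.
+  disable_ddl_transaction!
+
+  INDEX_NAME = 'index_issues_on_project_id_issue_type_created_at_id'
+
+  def up
+    add_concurrent_index :issues, [:project_id, :issue_type, :created_at, :id], name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :issues, INDEX_NAME
+  end
+end
+```
+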
+#### Batch iteration
+
+Batch iteration over the records is possible via the keyset `Iterator` class.
+
+```ruby
+scope = Issue.order(:created_at, :id)
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+opts = {
+ in_operator_optimization_options: {
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+ }
+}
+
+Gitlab::Pagination::Keyset::Iterator.new(scope: scope, **opts).each_batch(of: 100) do |records|
+ puts records.select(:id).map { |r| [r.id] }
+end
+```
+
+#### Keyset pagination
+
+The optimization works out of the box with GraphQL and the `keyset_paginate` helper method.
+Read more about [keyset pagination](keyset_pagination.md).
+
+```ruby
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+
+opts = {
+ in_operator_optimization_options: {
+ array_scope: array_scope,
+ array_mapping_scope: array_mapping_scope,
+ finder_query: finder_query
+ }
+}
+
+issues = Issue
+ .order(:created_at, :id)
+ .keyset_paginate(per_page: 20, keyset_order_options: opts)
+ .records
+```
+
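+To fetch the next page, pass the cursor back to the paginator. A sketch, assuming the paginator
+object exposes `cursor_for_next_page` (check the keyset pagination documentation for the exact
+interface):
+
+```ruby
+paginator = Issue
+  .order(:created_at, :id)
+  .keyset_paginate(per_page: 20, keyset_order_options: opts)
+
+first_page = paginator.records
+cursor = paginator.cursor_for_next_page
+
+second_page = Issue
+  .order(:created_at, :id)
+  .keyset_paginate(cursor: cursor, per_page: 20, keyset_order_options: opts)
+  .records
+```
+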
+#### Offset pagination with Kaminari
+
+The `ActiveRecord` scope produced by the `InOperatorOptimization` class can be used in
+[offset-paginated](pagination_guidelines.md#offset-pagination)
+queries.
+
+```ruby
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
+ .new(...)
+ .execute
+ .page(1)
+ .per(20)
+ .without_count
+```
+
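+For example, reusing the `scope`, `array_scope`, `array_mapping_scope`, and `finder_query`
+objects from the basic usage section, a complete sketch might look like this:
+
+```ruby
+records = Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
+  .new(
+    scope: scope,
+    array_scope: array_scope,
+    array_mapping_scope: array_mapping_scope,
+    finder_query: finder_query
+  )
+  .execute
+  .page(2)
+  .per(20)
+  .without_count # a separate COUNT(*) query would not benefit from the optimization
+```
+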
+## Generalized `IN` optimization technique
+
+Let's dive into how `QueryBuilder` builds the optimized query
+to fetch the twenty oldest created issues from the group `gitlab-org`
+using the generalized `IN` optimization technique.
+
+### Array CTE
+
+As the first step, we use a common table expression (CTE) for collecting the `projects.id` values.
+This is done by wrapping the incoming `array_scope` ActiveRecord relation parameter with a CTE.
+
+```sql
+WITH array_cte AS MATERIALIZED (
+ SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+ (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+ FROM "namespaces"
+ WHERE (traversal_ids @> ('{9970}')))
+)
+```
+
+This query produces the following result set with only one column (`projects.id`):
+
+| ID |
+| --- |
+| 9 |
+| 2 |
+| 5 |
+| 10 |
+
+### Array mapping
+
+For each project (that is, each record storing a project ID in `array_cte`),
+we will fetch the cursor value identifying the first issue respecting the `ORDER BY` clause.
+
+As an example, let's pick the first record `ID=9` from `array_cte`.
+The following query should fetch the cursor value `(created_at, id)` identifying
+the first issue record respecting the `ORDER BY` clause for the project with `ID=9`:
+
+```sql
+SELECT "issues"."created_at", "issues"."id"
+FROM "issues"."project_id"=9
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 1;
+```
+
+We use a `LATERAL JOIN` to loop over the records in the `array_cte` and find the
+cursor value for each project. The inner query is built from the `array_mapping_scope` lambda
+function.
+
+```sql
+SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+ ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+ ARRAY_AGG("issues"."id") AS issues_id_array
+FROM (
+ SELECT "array_cte"."id" FROM array_cte
+) array_cte
+LEFT JOIN LATERAL
+(
+ SELECT "issues"."created_at", "issues"."id"
+ FROM "issues"
+ WHERE "issues"."project_id" = "array_cte"."id"
+ ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+ LIMIT 1
+) issues ON TRUE
+```
+
+Since we have an index on `project_id`, `created_at`, and `id`,
+index-only scans should quickly locate all the cursor values.
+
+This is how the query could be translated to Ruby:
+
+```ruby
+created_at_values = []
+id_values = []
+project_ids.map do |project_id|
+ created_at, id = Issue.select(:created_at, :id).where(project_id: project_id).order(:created_at, :id).limit(1).first # N+1 but it's fast
+ created_at_values << created_at
+ id_values << id
+end
+```
+
+This is what the result set would look like:
+
+| `project_ids` | `created_at_values` | `id_values` |
+| ------------- | ------------------- | ----------- |
+| 2 | 2020-01-10 | 5 |
+| 5 | 2020-01-05 | 4 |
+| 10 | 2020-01-15 | 7 |
+| 9 | 2020-01-05 | 3 |
+
+The table shows the cursor values (`created_at, id`) of the first record for each project
+respecting the `ORDER BY` clause.
+
+At this point, we have the initial data. To start collecting the actual records from the database,
+we'll use a recursive CTE query where each recursion locates one row until
+the `LIMIT` is reached or no more data can be found.
+
+Here's an outline of the steps we will take in the recursive CTE query
+(expressing the steps in SQL is non-trivial but will be explained next):
+
+1. Sort the initial resultset according to the `ORDER BY` clause.
+1. Pick the top cursor to fetch the record; this is our first record. In the example,
+this cursor would be (`2020-01-05`, `3`) for `project_id=9`.
+1. We can use (`2020-01-05`, `3`) to fetch the next issue respecting the `ORDER BY` clause and the
+`project_id=9` filter. This produces an updated resultset.
+
+ | `project_ids` | `created_at_values` | `id_values` |
+ | ------------- | ------------------- | ----------- |
+ | 2 | 2020-01-10 | 5 |
+ | 5 | 2020-01-05 | 4 |
+ | 10 | 2020-01-15 | 7 |
+ | **9** | **2020-01-06** | **6** |
+
+1. Repeat steps 1 to 3 with the updated resultset until we have fetched `N=20` records;
+the Ruby sketch after this list walks through the same loop.
+
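+A plain-Ruby sketch of this loop, using the example data above (purely illustrative; the real
+implementation does all of this inside a single recursive CTE):
+
+```ruby
+# Populate the cursor arrays, as in the array mapping step above.
+project_ids = [2, 5, 10, 9]
+created_at_values = []
+id_values = []
+
+project_ids.each do |project_id|
+  first = Issue.select(:created_at, :id).where(project_id: project_id).order(:created_at, :id).first
+  created_at_values << first&.created_at
+  id_values << first&.id
+end
+
+records = []
+while records.size < 20
+  # Step 1: order the cursors and pick the position of the lowest (created_at, id) pair.
+  position = created_at_values.each_index
+    .reject { |i| created_at_values[i].nil? } # skip exhausted projects
+    .min_by { |i| [created_at_values[i], id_values[i]] }
+  break if position.nil? # no more data
+
+  # Step 2: load the record behind the cursor (a single primary-key lookup).
+  records << Issue.find(id_values[position])
+
+  # Step 3: advance the consumed cursor to the project's next issue.
+  next_issue = Issue
+    .select(:created_at, :id)
+    .where(project_id: project_ids[position])
+    .where('(created_at, id) > (?, ?)', created_at_values[position], id_values[position])
+    .order(:created_at, :id)
+    .first
+
+  created_at_values[position] = next_issue&.created_at
+  id_values[position] = next_issue&.id
+end
+```
+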
+### Initializing the recursive CTE query
+
+To start the recursive CTE query, we need to produce exactly one row; we call this the
+initializer query (`initializer_query`).
+
+We use the `ARRAY_AGG` function to compact the initial result set into a single row
+and use that row as the initial value for the recursive CTE query.
+
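+A trimmed-down sketch of the initializer, with the lateral part elided (the full version appears
+in the collapsed query shown earlier):
+
+```sql
+SELECT NULL::issues AS records,
+       array_cte_id_array,
+       issues_created_at_array,
+       issues_id_array,
+       0::bigint AS count
+FROM (
+  SELECT ARRAY_AGG("array_cte"."id")      AS array_cte_id_array,
+         ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+         ARRAY_AGG("issues"."id")         AS issues_id_array
+  FROM array_cte
+  LEFT JOIN LATERAL ( ... ) issues ON TRUE -- the array mapping query from the previous section
+) array_scope_lateral_query
+LIMIT 1
+```
+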
+Example initializer row:
+
+| `records` | `project_ids` | `created_at_values` | `id_values` | `Count` | `Position` |
+| -------------- | --------------- | ------------------- | ----------- | ------- | ---------- |
+| `NULL::issues` | `[9, 2, 5, 10]` | `[...]` | `[...]` | `0` | `NULL` |
+
+- The `records` column contains our sorted database records, and the initializer query sets the
+first value to `NULL`, which is filtered out later.
+- The `count` column tracks the number of records found. We use this column to filter out the
+initializer row from the result set.
+
+### Recursive portion of the CTE query
+
+The result row is produced with the following steps:
+
+1. [Order the keyset arrays.](#order-the-keyset-arrays)
+1. [Find the next cursor.](#find-the-next-cursor)
+1. [Produce a new row.](#produce-a-new-row)
+
+#### Order the keyset arrays
+
+Order the keyset arrays according to the original `ORDER BY` clause with `LIMIT 1` using the
+`UNNEST [] WITH ORDINALITY` table function. The function locates the "lowest" keyset cursor
+values and gives us the array position. These cursor values are used to locate the record.
+
+NOTE:
+At this point, we haven't read anything from the database tables, because we relied on
+fast index-only scans.
+
+| `project_ids` | `created_at_values` | `id_values` |
+| ------------- | ------------------- | ----------- |
+| 2 | 2020-01-10 | 5 |
+| 5 | 2020-01-05 | 4 |
+| 10 | 2020-01-15 | 7 |
+| 9 | 2020-01-05 | 3 |
+
+The first row is the 4th one (`position = 4`), because it has the lowest `created_at` and
+`id` values. The `UNNEST` function also exposes the position using an extra column (note:
+PostgreSQL arrays use 1-based indexing).
+
+Demonstration of the `UNNEST [] WITH ORDINALITY` table function:
+
+```sql
+SELECT position FROM unnest('{2020-01-10, 2020-01-05, 2020-01-15, 2020-01-05}'::timestamp[], '{5, 4, 7, 3}'::int[])
+ WITH ORDINALITY AS t(created_at, id, position) ORDER BY created_at ASC, id ASC LIMIT 1;
+```
+
+Result:
+
+```plaintext
+position
+----------
+ 4
+(1 row)
+```
+
+#### Find the next cursor
+
+Now, let's find the next cursor values (`next_cursor_values_query`) for the project with `id = 9`.
+To do that, we build a keyset pagination SQL query. Find the next row after
+`created_at = 2020-01-05` and `id = 3`. Because we order by two database columns, there can be two
+cases:
+
+- There are rows with `created_at = 2020-01-05` and `id > 3`.
+- There are rows with `created_at > 2020-01-05`.
+
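+Conceptually, the keyset query looks like this simplified sketch (the generated queries in the
+collapsed examples above express the same logic as `UNION ALL` branches to keep the index scans
+efficient):
+
+```sql
+SELECT "issues"."created_at", "issues"."id"
+FROM "issues"
+WHERE "issues"."project_id" = 9
+  AND (
+    ("issues"."created_at" = '2020-01-05' AND "issues"."id" > 3) OR
+    ("issues"."created_at" > '2020-01-05')
+  )
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 1;
+```
+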
+The generic keyset pagination library generates this query for us. After the query finishes,
+we have a temporary table with the next cursor values:
+
+| `created_at` | ID |
+| ------------ | --- |
+| 2020-01-06 | 6 |
+
+#### Produce a new row
+
+As the final step, we need to produce a new row by manipulating the initializer row
+(`data_collector_query` method). Two things happen here:
+
+- Read the full row from the database and return it in the `records` column
+(`result_collector_columns` method).
+- Replace the cursor values at the current position with the results from the keyset query.
+
+Reading the full row from the database takes only one index scan by the primary key. We use the
+ActiveRecord query passed as the `finder_query`:
+
+```sql
+(SELECT "issues".* FROM issues WHERE id = id_values[position] LIMIT 1)
+```
+
+By adding parentheses, the result row can be put into the `records` column.
+
+Replacing the cursor values at `position` can be done via standard PostgreSQL array operators:
+
+```sql
+-- created_at_values column value
+created_at_values[:position-1]||next_cursor_values.created_at||created_at_values[position+1:]
+
+-- id_values column value
+id_values[:position-1]||next_cursor_values.id||id_values[position+1:]
+```
+
+The Ruby equivalent would be the following:
+
+```ruby
+id_values[0..(position - 1)] + [next_cursor_values.id] + id_values[(position + 1)..-1]
+```
+
+After this, the recursion starts again by finding the next lowest cursor value.
+
+### Finalizing the query
+
+For producing the final `issues` rows, we're going to wrap the query with another `SELECT` statement:
+
+```sql
+SELECT "issues".*
+FROM (
+  SELECT (records).* -- similar to the Ruby splat operator
+ FROM recursive_keyset_cte
+ WHERE recursive_keyset_cte.count <> 0 -- filter out the initializer row
+) AS issues
+```
+
+### Performance comparison
+
+Assuming that we have the correct database index in place, we can compare the query performance by
+looking at the number of database rows accessed by the query.
+
+- Number of groups: 100
+- Number of projects: 500
+- Number of issues (in the group hierarchy): 50 000
+
+Standard `IN` query:
+
+| Query | Entries read from index | Rows read from the table | Rows sorted in memory |
+| ------------------------ | ----------------------- | ------------------------ | --------------------- |
+| group hierarchy subquery | 100 | 0 | 0 |
+| project lookup query | 500 | 0 | 0 |
+| issue lookup query | 50 000 | 20 | 50 000 |
+
+Optimized `IN` query:
+
+| Query | Entries read from index | Rows read from the table | Rows sorted in memory |
+| ------------------------ | ----------------------- | ------------------------ | --------------------- |
+| group hierarchy subquery | 100 | 0 | 0 |
+| project lookup query | 500 | 0 | 0 |
+| issue lookup query | 519 | 20 | 10 000 |
+
+The group and project queries don't use sorting; the necessary columns are read from database
+indexes. These values are accessed frequently, so it's very likely that most of the data is
+in PostgreSQL's buffer cache.
+
+The optimized `IN` query reads a maximum of 519 entries (cursor values) from the index:
+
+- 500 index-only scans to populate the arrays, one for each project. The cursor values of the
+first record of each project are read here.
+- A maximum of 19 additional index-only scans for the consecutive records.
+
+The optimized `IN` query sorts the array (the cursor values per project) 20 times, which
+means we sort 20 x 500 rows in total. However, this might be a less memory-intensive task than
+sorting 10 000 rows at once.
+
+Performance comparison for the `gitlab-org` group:
+
+| Query | Number of 8K Buffers involved | Uncached execution time | Cached execution time |
+| -------------------- | ----------------------------- | ----------------------- | --------------------- |
+| `IN` query | 240833 | 1.2s | 660ms |
+| Optimized `IN` query | 9783 | 450ms | 22ms |
+
+NOTE:
+Before taking measurements, the group lookup query was executed separately to make
+the group data available in the buffer cache. Since it's a frequently called query, it
+hits many shared buffers during query execution in the production environment.