author    GitLab Bot <gitlab-bot@gitlab.com> 2020-11-19 11:27:35 +0300
committer GitLab Bot <gitlab-bot@gitlab.com> 2020-11-19 11:27:35 +0300
commit    7e9c479f7de77702622631cff2628a9c8dcbc627 (patch)
tree      c8f718a08e110ad7e1894510980d2155a6549197 /doc/development/graphql_guide
parent    e852b0ae16db4052c1c567d9efa4facc81146e88 (diff)
tag       v13.6.0-rc42

Add latest changes from gitlab-org/gitlab@13-6-stable-ee

Diffstat (limited to 'doc/development/graphql_guide'):
 doc/development/graphql_guide/batchloader.md | 121
 doc/development/graphql_guide/index.md       |   6
 doc/development/graphql_guide/pagination.md  | 173
 3 files changed, 296 insertions(+), 4 deletions(-)
diff --git a/doc/development/graphql_guide/batchloader.md b/doc/development/graphql_guide/batchloader.md
new file mode 100644
index 00000000000..6d529358499
--- /dev/null
+++ b/doc/development/graphql_guide/batchloader.md
@@ -0,0 +1,121 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# GraphQL BatchLoader
+
+GitLab uses the [batch-loader](https://github.com/exAspArk/batch-loader) Ruby gem to optimize and avoid N+1 SQL queries.
+
+The shape of the GraphQL query tree creates opportunities for batching like this: disconnected nodes might need the same data, but cannot know about each other.
+
+## When should you use it?
+
+We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch load during **mutations**, because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not identical) queries, then consider using the batch-loader.
+
+When implementing a new endpoint we should aim to minimize the number of SQL queries it runs. For stability and scalability we must also ensure that our queries do not suffer from N+1 performance issues.
+
+## Implementation
+
+Batch loading is useful when a series of queries for inputs `Qα, Qβ, ... Qω` can be combined into a single query for `Q[α, β, ... ω]`. An example of this is key lookups, where we can find two users by username as cheaply as one, but real-world examples can be more complex.
+
+Batch loading is not suitable when the result sets have different sort orders, grouping, aggregation, or other non-composable features.
+
+There are two ways to use the batch-loader in your code. For simple ID lookups, use `::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find`. For more complex cases, you can use the batch API directly.
+
+For example, to load a `User` by `username`, we can add batching as follows:
+
+```ruby
+class UserResolver < BaseResolver
+  type UserType, null: true
+  argument :username, ::GraphQL::STRING_TYPE, required: true
+
+  def resolve(username:)
+    BatchLoader::GraphQL.for(username).batch do |usernames, loader|
+      User.by_username(usernames).each do |user|
+        loader.call(user.username, user)
+      end
+    end
+  end
+end
+```
+
+- `username` is the username of the `User` being queried
+- `loader.call` is used to map the result back to the input key (here a username)
+- `BatchLoader::GraphQL` returns a lazy object (a suspended promise to fetch the data)
+
+Here is an [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46549) illustrating how to use our `BatchLoading` mechanism.
+
+## How does it work exactly?
+
+Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they will be loaded along with all other similar objects in the current batch.
+
+Inside the block we execute a batch query for our items (`User`). After that, all we have to do is call the loader, passing the key that was given to the `BatchLoader::GraphQL.for` method (here `user.username`, matching each input `username`) and the loaded object itself (`user`):
+
+```ruby
+BatchLoader::GraphQL.for(username).batch do |usernames, loader|
+  User.by_username(usernames).each do |user|
+    loader.call(user.username, user)
+  end
+end
+```
+
+### What does lazy mean?
+
+It is important to avoid syncing batches too early. In the example below we can see how calling sync too early can eliminate opportunities for batching:
+
+```ruby
+x = find_lazy(1)
+y = find_lazy(2)
+
+# calling .sync will flush the current batch and will inhibit maximum laziness
+x.sync
+
+z = find_lazy(3)
+
+y.sync
+z.sync
+
+# => will run 2 queries
+```
+
+```ruby
+x = find_lazy(1)
+y = find_lazy(2)
+z = find_lazy(3)
+
+x.sync
+y.sync
+z.sync
+
+# => will run 1 query
+```
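The effect can be made concrete with a toy, plain-Ruby sketch of lazy batching. This is an illustration only, not the `batch-loader` gem's implementation; the `ToyBatch` class and `find_lazy` method are hypothetical names invented for this example:

```ruby
# Toy illustration of lazy batching. This is NOT the batch-loader gem:
# it only demonstrates why deferring sync lets keys share one query.
class ToyBatch
  attr_reader :queries

  def initialize(&fetcher)
    @fetcher = fetcher # receives all queued keys at once, returns key => value
    @pending = []      # keys queued since the last flush
    @cache   = {}      # key => value, filled in when a batch runs
    @queries = 0
  end

  # Queue a key and return a thunk; nothing is fetched yet.
  def find_lazy(key)
    @pending << key
    -> { sync(key) }
  end

  private

  # Resolving any key flushes the whole current batch in one "query".
  def sync(key)
    flush unless @cache.key?(key)
    @cache[key]
  end

  def flush
    @queries += 1
    @cache.merge!(@fetcher.call(@pending))
    @pending = []
  end
end

USERS = { 1 => 'alice', 2 => 'bob', 3 => 'carol' }.freeze

batch = ToyBatch.new { |ids| USERS.slice(*ids) }
x = batch.find_lazy(1)
y = batch.find_lazy(2)
z = batch.find_lazy(3)

[x, y, z].map(&:call) # the first call flushes all three keys together
batch.queries         # => 1
```

As in the examples above, calling the thunk early (before all keys are queued) splits the work into two batches, which is why `#sync` should be deferred as long as possible.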
+
+## Testing
+
+Any GraphQL field that supports `BatchLoading` should be tested using the `batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb).
+
+```ruby
+it 'returns data as a batch' do
+  results = batch_sync(max_queries: 1) do
+    [{ id: 1 }, { id: 2 }].map { |args| resolve_user(args) }
+  end
+
+  expect(results).to eq(expected_results)
+end
+
+# Named `resolve_user` so it does not shadow the `resolve` helper from GraphQLHelpers.
+def resolve_user(args = {}, context = { current_user: current_user })
+  resolve(described_class, obj: obj, args: args, ctx: context)
+end
+```
+
+We can also use [QueryRecorder](../query_recorder.md) to make sure we are performing only **one SQL query** per call.
+
+```ruby
+it 'executes only 1 SQL query' do
+  query_count = ActiveRecord::QueryRecorder.new { subject }.count
+
+  expect(query_count).to eq(1)
+end
+```
diff --git a/doc/development/graphql_guide/index.md b/doc/development/graphql_guide/index.md
index 9d7fb5ba0a8..12b4f9796c7 100644
--- a/doc/development/graphql_guide/index.md
+++ b/doc/development/graphql_guide/index.md
@@ -1,3 +1,9 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
# GraphQL development guidelines
This guide contains all the information to successfully contribute to GitLab's
diff --git a/doc/development/graphql_guide/pagination.md b/doc/development/graphql_guide/pagination.md
index bf9eaa99158..d5140363396 100644
--- a/doc/development/graphql_guide/pagination.md
+++ b/doc/development/graphql_guide/pagination.md
@@ -1,3 +1,9 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
# GraphQL pagination
## Types of pagination
@@ -59,13 +65,13 @@ Some of the benefits and tradeoffs of keyset pagination are
- Performance is much better.
-- Data stability is greater since you're not going to miss records due to
+- More data stability for end-users since records are not missing from lists due to
deletions or insertions.
- It's the best way to do infinite scrolling.
- It's more difficult to program and maintain. Easy for `updated_at` and
- `sort_order`, complicated (or impossible) for complex sorting scenarios.
+ `sort_order`, complicated (or impossible) for [complex sorting scenarios](#limitations-of-query-complexity).
## Implementation
@@ -80,12 +86,171 @@ However, there are some cases where we have to use the offset
pagination connection, `OffsetActiveRecordRelationConnection`, such as when
sorting by label priority in issues, due to the complexity of the sort.
-<!-- ### Keyset pagination -->
+### Keyset pagination
+
+The keyset pagination implementation is a subclass of `GraphQL::Pagination::ActiveRecordRelationConnection`,
+which is a part of the `graphql` gem. This is installed as the default for all `ActiveRecord::Relation`.
+However, instead of using a cursor based on an offset (which is the default), GitLab uses a more specialized cursor.
+
+The cursor is created by encoding a JSON object which contains the relevant ordering fields. For example:
+
+```ruby
+ordering = {"id"=>"72410125", "created_at"=>"2020-10-08 18:05:21.953398000 UTC"}
+json = ordering.to_json
+cursor = Base64Bp.urlsafe_encode64(json, padding: false)
+
+# => "eyJpZCI6IjcyNDEwMTI1IiwiY3JlYXRlZF9hdCI6IjIwMjAtMTAtMDggMTg6MDU6MjEuOTUzMzk4MDAwIFVUQyJ9"
+
+json = Base64Bp.urlsafe_decode64(cursor)
+Gitlab::Json.parse(json)
+
+# => {"id"=>"72410125", "created_at"=>"2020-10-08 18:05:21.953398000 UTC"}
+```
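`Base64Bp` and `Gitlab::Json` are GitLab's internal wrappers; the same round-trip can be reproduced for experimentation with Ruby's standard library alone (`Base64.urlsafe_encode64` with `padding: false` and `JSON`):

```ruby
require 'base64'
require 'json'

ordering = { 'id' => '72410125', 'created_at' => '2020-10-08 18:05:21.953398000 UTC' }

# Encode: JSON-serialize the ordering fields, then URL-safe Base64 without padding.
cursor = Base64.urlsafe_encode64(ordering.to_json, padding: false)

# Decode: reverse the steps to recover the ordering fields.
decoded = JSON.parse(Base64.urlsafe_decode64(cursor))

decoded == ordering # => true
```

The round-trip recovers the original ordering fields, which is all the cursor needs to carry.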
+
+The benefits of storing the order attribute values in the cursor:
+
+- If only the ID of the object were stored, the object and its attributes could be queried.
+ That would require an additional query, and if the object is no longer there, then the needed
+ attributes are not available.
+- Whether an attribute is `NULL` can be read from the cursor itself, so the correct
+  variant of the SQL query can be chosen without fetching the record.
+
+Based on whether the main attribute field being sorted on is `NULL` in the cursor, the proper query
+condition is built. The last ordering field is considered to be unique (a primary key), meaning the
+column never contains `NULL` values.
+
+#### Limitations of query complexity
+
+We only support two ordering fields, and one of those fields needs to be the primary key.
+
+Here are two examples of pseudocode for the query:
+
+- **Two-condition query.** `X` represents the values from the cursor. `C` represents
+ the columns in the database, sorted in ascending order, using an `:after` cursor, and with `NULL`
+ values sorted last.
+
+ ```plaintext
+ X1 IS NOT NULL
+ AND
+ (C1 > X1)
+ OR
+ (C1 IS NULL)
+ OR
+ (C1 = X1
+ AND
+ C2 > X2)
+
+ X1 IS NULL
+ AND
+ (C1 IS NULL
+ AND
+ C2 > X2)
+ ```
+
+ Below is an example based on the relation `Issue.order(relative_position: :asc).order(id: :asc)`
+ with an after cursor of `relative_position: 1500, id: 500`:
+
+ ```plaintext
+ when cursor[relative_position] is not NULL
+
+ ("issues"."relative_position" > 1500)
+ OR (
+ "issues"."relative_position" = 1500
+ AND
+ "issues"."id" > 500
+ )
+ OR ("issues"."relative_position" IS NULL)
+
+ when cursor[relative_position] is NULL
+
+ "issues"."relative_position" IS NULL
+ AND
+ "issues"."id" > 500
+ ```
+
+- **Three-condition query.** The example below is not complete, but shows the
+ complexity of adding one more condition. `X` represents the values from the cursor. `C` represents
+ the columns in the database, sorted in ascending order, using an `:after` cursor, and with `NULL`
+ values sorted last.
+
+ ```plaintext
+ X1 IS NOT NULL
+ AND
+ (C1 > X1)
+ OR
+ (C1 IS NULL)
+ OR
+ (C1 = X1 AND C2 > X2)
+ OR
+ (C1 = X1
+ AND
+ X2 IS NOT NULL
+ AND
+ ((C2 > X2)
+ OR
+ (C2 IS NULL)
+ OR
+ (C2 = X2 AND C3 > X3)
+ OR
+ X2 IS NULL.....
+ ```
+
+By using
+[`Gitlab::Graphql::Pagination::Keyset::QueryBuilder`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/keyset/query_builder.rb),
+we're able to build the necessary SQL conditions and apply them to the Active Record relation.
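The branching for the two-condition case can be sketched in plain Ruby. This is an illustration only, not the real `QueryBuilder` (which builds Arel nodes rather than interpolating SQL strings); the `keyset_condition` helper name is hypothetical:

```ruby
# Simplified sketch of the two-condition keyset WHERE clause (ascending order,
# :after cursor, NULLs sorted last). Illustration only: the real QueryBuilder
# builds Arel nodes, never SQL strings.
def keyset_condition(nullable_col, unique_col, cursor)
  x1 = cursor[nullable_col]
  x2 = cursor[unique_col]

  if x1.nil?
    # The cursor row had NULL in the nullable column: stay within the NULL
    # rows, ordered by the unique column.
    "#{nullable_col} IS NULL AND #{unique_col} > #{x2}"
  else
    # Rows strictly after the cursor value, ties broken by the unique column,
    # then all the NULL rows (they sort last).
    "(#{nullable_col} > #{x1}) OR " \
      "(#{nullable_col} = #{x1} AND #{unique_col} > #{x2}) OR " \
      "(#{nullable_col} IS NULL)"
  end
end

keyset_condition('relative_position', 'id', 'relative_position' => 1500, 'id' => 500)
# => "(relative_position > 1500) OR (relative_position = 1500 AND id > 500) OR (relative_position IS NULL)"
```

With the `relative_position: 1500, id: 500` cursor this mirrors the first pseudocode query above, and a `NULL` `relative_position` mirrors the second.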
+
+Some sort orders are too complex to support with keyset pagination. For example,
+in [`issuable.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb),
+the `order_due_date_and_labels_priority` method creates a very complex query.
+
+These types of queries are not supported. In these instances, you can use offset pagination.
+
+### Offset pagination
+
+There are times when the [complexity of sorting](#limitations-of-query-complexity)
+is more than our keyset pagination can handle.
+
+For example, in [`IssuesResolver`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/resolvers/issues_resolver.rb),
+when sorting by `priority_asc`, we can't use keyset pagination as the ordering is much
+too complex. For more information, read [`issuable.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb).
-<!-- ### Offset pagination -->
+In cases like this, we can fall back to regular offset pagination by returning a
+[`Gitlab::Graphql::Pagination::OffsetActiveRecordRelationConnection`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/offset_active_record_relation_connection.rb)
+instead of an `ActiveRecord::Relation`:
+
+```ruby
+ def resolve(parent, finder, **args)
+ issues = apply_lookahead(Gitlab::Graphql::Loaders::IssuableLoader.new(parent, finder).batching_find_all)
+
+ if non_stable_cursor_sort?(args[:sort])
+ # Certain complex sorts are not supported by the stable cursor pagination yet.
+ # In these cases, we use offset pagination, so we return the correct connection.
+ Gitlab::Graphql::Pagination::OffsetActiveRecordRelationConnection.new(issues)
+ else
+ issues
+ end
+ end
+```
<!-- ### External pagination -->
+### External pagination
+
+There may be times when you need to return data through the GitLab API that is stored in
+another system. In these cases you may have to paginate a third party's API.
+
+An example of this is with our [Error Tracking](../../operations/error_tracking.md) implementation,
+where we proxy [Sentry errors](../../operations/error_tracking.md#sentry-error-tracking) through
+the GitLab API. We do this by calling the Sentry API which enforces its own pagination rules.
+This means we cannot access the collection within GitLab to perform our own custom pagination.
+
+For consistency, we manually set the pagination cursors based on values returned by the external API, using `Gitlab::Graphql::ExternallyPaginatedArray.new(previous_cursor, next_cursor, *items)`.
+
+You can see an example implementation in the following files:
+
+- [`types/error_tracking/sentry_error_collection_type.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/types/error_tracking/sentry_error_collection_type.rb), which adds an extension to `field :errors`.
+- [`resolvers/error_tracking/sentry_errors_resolver.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/app/graphql/resolvers/error_tracking/sentry_errors_resolver.rb), which returns the data from the resolver.
+
## Testing
Any GraphQL field that supports pagination and sorting should be tested