Diffstat (limited to 'doc/development/graphql_guide/batchloader.md')
-rw-r--r-- doc/development/graphql_guide/batchloader.md | 86
1 file changed, 78 insertions, 8 deletions
diff --git a/doc/development/graphql_guide/batchloader.md b/doc/development/graphql_guide/batchloader.md
index 0e90f89ff7a..492d3bc9007 100644
--- a/doc/development/graphql_guide/batchloader.md
+++ b/doc/development/graphql_guide/batchloader.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -12,7 +12,7 @@ It is the properties of the GraphQL query tree that create opportunities for bat
## When should you use it?
-We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch loading during **mutations** because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not identical) queries, then consider using the batch-loader.
+We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch loading during **mutations** because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not necessarily identical) queries, then consider using the batch-loader.
When implementing a new endpoint we should aim to minimise the number of SQL queries. For stability and scalability we must also ensure that our queries do not suffer from N+1 performance issues.
@@ -20,7 +20,7 @@ When implementing a new endpoint we should aim to minimise the number of SQL que
Batch loading is useful when a series of queries for inputs `Qα, Qβ, ... Qω` can be combined to a single query for `Q[α, β, ... ω]`. An example of this is lookups by ID, where we can find two users by usernames as cheaply as one, but real-world examples can be more complex.
-Batch loading is not suitable when the result sets have different sort-orders, grouping, aggregation or other non-composable features.
+Batch loading is not suitable when the result sets have different sort orders, grouping, aggregation, or other non-composable features.
There are two ways to use the batch-loader in your code. For simple ID lookups, use `::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find`. For more complex cases, you can use the batch API directly.
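The idea behind combining `Qα, Qβ, ... Qω` into `Q[α, β, ... ω]` can be shown with a toy sketch in plain Ruby. This is not GitLab code; the `USERS` hash and both helper methods are hypothetical stand-ins for a database and its query layer.

```ruby
# Toy illustration (not GitLab code): combining per-ID lookups Q1, Q2, ... Qn
# into a single query Q[1, 2, ... n]. The USERS hash stands in for a database.
USERS = { 1 => "alice", 2 => "bob", 3 => "carol" }

# One query per ID: n round-trips for n IDs.
def find_users_individually(ids)
  queries = 0
  results = ids.map do |id|
    queries += 1 # each lookup is a separate round-trip
    USERS[id]
  end
  [results, queries]
end

# One query for the whole batch, regardless of how many IDs were requested.
def find_users_batched(ids)
  [USERS.values_at(*ids), 1]
end
```

Finding two users is as cheap as finding one, which is exactly the property the batch-loader exploits during query execution.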
@@ -47,9 +47,29 @@ end
Here is an [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46549) illustrating how to use our `BatchLoading` mechanism.
+## The `BatchModelLoader`
+
+For ID lookups, the advice is to use the `BatchModelLoader`:
+
+```ruby
+def project
+ ::Gitlab::Graphql::Loaders::BatchModelLoader.new(::Project, object.project_id).find
+end
+```
+
+To preload associations, you can pass an array of them:
+
+```ruby
+def issue(lookahead:)
+ preloads = [:author] if lookahead.selects?(:author)
+
+ ::Gitlab::Graphql::Loaders::BatchModelLoader.new(::Issue, object.issue_id, preloads).find
+end
+```
+
## How does it work exactly?
-Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they will be loaded along with all other similar objects in the current batch.
+Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they are loaded along with all other similar objects in the current batch.
Inside the block we execute a batch query for our items (`User`). After that, all we have to do is call the loader, passing an item that was used in the `BatchLoader::GraphQL.for` method (`usernames`) and the loaded object itself (`user`):
@@ -61,9 +81,28 @@ BatchLoader::GraphQL.for(username).batch do |usernames, loader|
end
```
+The batch-loader uses the source code location of the block to determine
+which requests belong in the same queue, but only one instance of the block
+is evaluated for each batch. You do not control which one.
+
+For this reason, it is important that:
+
+- The block must not refer to (close over) any instance state on objects. The best
+  practice is to pass all the data the block needs through the `for(data)` call.
+- The block must be specific to a kind of batched data. Implementing generic
+ loaders (such as the `BatchModelLoader`) is possible, but it requires the use
+ of an injective `key` argument.
+- Batches are not shared unless they refer to the same block: two textually identical
+  blocks with the same behavior, parameters, and keys are still not shared. For this
+  reason, never implement batched ID lookups on your own; instead, use the
+  `BatchModelLoader` for maximum sharing. If you see two fields define the same
+  batch-loading, consider extracting it into a new `Loader` so they can share.
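The "same block" rule above can be sketched with a toy queue keyed by block identity. This is a hypothetical model, not the batch-loader's implementation: `ToyBatcher`, its methods, and the `key:` parameter are all invented for illustration, but they mirror the idea that the block's source location plus an injective key decide which batch a request joins.

```ruby
# Hypothetical sketch: requests sharing the same block (by source location)
# and the same key join one batch; everything else lands in a separate queue.
class ToyBatcher
  def initialize
    @queues = Hash.new { |hash, key| hash[key] = [] }
  end

  def enqueue(item, key: nil, &block)
    # block.source_location identifies the block; a `key` argument lets one
    # generic block serve several distinct batches (for example, per model class).
    @queues[[block.source_location, key]] << item
  end

  def queue_count
    @queues.size
  end
end

batcher = ToyBatcher.new
loader = ->(ids) { ids } # placeholder loading block

2.times { |i| batcher.enqueue(i, key: :user, &loader) } # same block, same key: one queue
batcher.enqueue(9, key: :project, &loader)              # same block, new key: second queue
```

Two lexically identical blocks defined in different places would produce different `source_location` values, and therefore different queues, which is why sharing one loader class beats copy-pasting the block.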
+
### What does lazy mean?
-It is important to avoid syncing batches too early. In the example below we can see how calling sync too early can eliminate opportunities for batching:
+It is important to avoid syncing batches (forcing their evaluation) too early. The following example shows how calling sync too early can eliminate opportunities for batching.
+
+This example calls sync on `x` too early:
```ruby
x = find_lazy(1)
@@ -80,6 +119,8 @@ z.sync
# => will run 2 queries
```
+However, this example waits until all requests are queued, and eliminates the extra query:
+
```ruby
x = find_lazy(1)
y = find_lazy(2)
@@ -92,9 +133,38 @@ z.sync
# => will run 1 query
```
+NOTE:
+There is no dependency analysis in the use of batch-loading. There is simply
+a pending queue of requests, and as soon as any one result is needed, all pending
+requests are evaluated.
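The "pending queue, flushed as soon as any result is needed" behavior can be modeled in a few lines of plain Ruby. This toy class is not the real batch-loader; the names and structure are invented, but syncing one entry forcing the entire queue is the behavior being described.

```ruby
# Toy model (not the real batch-loader): a single pending queue, flushed
# entirely the first time any one value is needed.
class ToyLazy
  @pending = []

  class << self
    def enqueue(id, &compute)
      entry = { id: id, value: nil, done: false, compute: compute }
      @pending << entry
      entry
    end

    def sync(entry)
      flush unless entry[:done]
      entry[:value]
    end

    def flush
      # One pass evaluates every pending request, not just the one being synced:
      # there is no dependency analysis.
      @pending.reject { |e| e[:done] }.each do |e|
        e[:value] = e[:compute].call(e[:id])
        e[:done] = true
      end
    end
  end
end

a = ToyLazy.enqueue(1) { |id| id * 10 }
b = ToyLazy.enqueue(2) { |id| id * 10 }
first = ToyLazy.sync(a) # forces the whole pending queue: b is loaded too
```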
+
+You should never call `batch.sync` or use `Lazy.force` in resolver code.
+If you depend on a lazy value, use `Lazy.with_value` instead:
+
+```ruby
+def publisher
+ ::Gitlab::Graphql::Loaders::BatchModelLoader.new(::Publisher, object.publisher_id).find
+end
+
+# Here we need the publisher in order to generate the catalog URL
+def catalog_url
+ ::Gitlab::Graphql::Lazy.with_value(publisher) do |p|
+    UrlHelpers.book_catalog_url(p, object.isbn)
+ end
+end
+```
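Why `with_value` is safe where `force` is not can be seen in a hedged sketch of the mechanism. This is not the real `Gitlab::Graphql::Lazy` API; `ToyLazyValue` and `with_value` here are invented stand-ins showing that chaining builds a *new* lazy value instead of evaluating the current one early.

```ruby
# Hedged sketch (not the real Gitlab::Graphql::Lazy): with_value-style chaining
# wraps the pending computation rather than forcing it on the spot.
ToyLazyValue = Struct.new(:thunk) do
  def force
    thunk.call
  end
end

def with_value(lazy, &blk)
  # Returns a new lazy value; `lazy` is only forced when the result is needed.
  ToyLazyValue.new(-> { blk.call(lazy.force) })
end

publisher = ToyLazyValue.new(-> { "Acme Press" }) # stands in for a batch-loaded record
url = with_value(publisher) do |p|
  "https://example.com/#{p.downcase.tr(' ', '-')}/catalog"
end
```

Because `url` is itself lazy, the framework can delay evaluation until the whole batch is queued, preserving the batching opportunity that an eager `force` would destroy.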
+
## Testing
-Any GraphQL field that supports `BatchLoading` should be tested using the `batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb).
+Ideally, do all your testing in request specs that run `Schema.execute`. That way
+you do not need to manage the lifecycle of lazy values yourself, and you are
+assured accurate results.
+
+GraphQL fields that return lazy values may need those values forced in tests.
+Forcing refers to explicitly demanding evaluation, where the framework would
+normally arrange this.
+
+You can force a lazy value with the `GraphqlHelpers#batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb), or by using `Gitlab::Graphql::Lazy.force`. For example:
```ruby
it 'returns data as a batch' do
@@ -114,8 +184,8 @@ We can also use [QueryRecorder](../query_recorder.md) to make sure we are perfor
```ruby
it 'executes only 1 SQL query' do
-  query_count = ActiveRecord::QueryRecorder.new { subject }.count
+  recorder = ActiveRecord::QueryRecorder.new { subject }
-  expect(query_count).to eq(1)
+  expect(recorder).not_to exceed_query_limit(1)
end
```