diff options
Diffstat (limited to 'doc/development/graphql_guide/batchloader.md')
-rw-r--r-- | doc/development/graphql_guide/batchloader.md | 86 |
1 files changed, 78 insertions, 8 deletions
diff --git a/doc/development/graphql_guide/batchloader.md b/doc/development/graphql_guide/batchloader.md index 0e90f89ff7a..492d3bc9007 100644 --- a/doc/development/graphql_guide/batchloader.md +++ b/doc/development/graphql_guide/batchloader.md @@ -1,5 +1,5 @@ --- -stage: Enablement +stage: Data Stores group: Database info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- @@ -12,7 +12,7 @@ It is the properties of the GraphQL query tree that create opportunities for bat ## When should you use it? -We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch loading during **mutations** because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not identical) queries, then consider using the batch-loader. +We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch loading during **mutations** because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not necessarily identical) queries, then consider using the batch-loader. When implementing a new endpoint we should aim to minimise the number of SQL queries. For stability and scalability we must also ensure that our queries do not suffer from N+1 performance issues. @@ -20,7 +20,7 @@ When implementing a new endpoint we should aim to minimise the number of SQL que Batch loading is useful when a series of queries for inputs `Qα, Qβ, ... Qω` can be combined to a single query for `Q[α, β, ... ω]`. An example of this is lookups by ID, where we can find two users by usernames as cheaply as one, but real-world examples can be more complex. -Batch loading is not suitable when the result sets have different sort-orders, grouping, aggregation or other non-composable features. +Batch loading is not suitable when the result sets have different sort orders, grouping, aggregation, or other non-composable features. There are two ways to use the batch-loader in your code. For simple ID lookups, use `::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find`. For more complex cases, you can use the batch API directly. @@ -47,9 +47,29 @@ end Here an [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46549) illustrating how to use our `BatchLoading` mechanism. +## The `BatchModelLoader` + +For ID lookups, the advice is to use the `BatchModelLoader`: + +```ruby +def project + ::Gitlab::Graphql::Loaders::BatchModelLoader.new(::Project, object.project_id).find +end +``` + +To preload associations, you can pass an array of them: + +```ruby +def issue(lookahead:) + preloads = [:author] if lookahead.selects?(:author) + + ::Gitlab::Graphql::Loaders::BatchModelLoader.new(::Issue, object.issue_id, preloads).find +end +``` + ## How does it work exactly? -Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they will be loaded along with all other similar objects in the current batch. +Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they are loaded along with all other similar objects in the current batch. Inside the block we execute a batch query for our items (`User`). After that, all we have to do is to call loader by passing an item which was used in `BatchLoader::GraphQL.for` method (`usernames`) and the loaded object itself (`user`): @@ -61,9 +81,28 @@ BatchLoader::GraphQL.for(username).batch do |usernames, loader| end ``` +The batch-loader uses the source code location of the block to determine +which requests belong in the same queue, but only one instance of the block +is evaluated for each batch. You do not control which one. + +For this reason, it is important that: + +- The block must not refer to (close over) any instance state on objects. The best practice + is to pass all data the block needs through to it in the `for(data)` call. +- The block must be specific to a kind of batched data. Implementing generic + loaders (such as the `BatchModelLoader`) is possible, but it requires the use + of an injective `key` argument. +- Batches are not shared unless they refer to the same block - two identical blocks + with the same behavior, parameters, and keys do not get shared. For this reason, + never implement batched ID lookups on your own, instead use the `BatchModelLoader` for + maximum sharing. If you see two fields define the same batch-loading, consider + extracting that out to a new `Loader`, and enabling them to share. + ### What does lazy mean? -It is important to avoid syncing batches too early. In the example below we can see how calling sync too early can eliminate opportunities for batching: +It is important to avoid syncing batches (forcing their evaluation) too early. The following example shows how calling sync too early can eliminate opportunities for batching. + +This example calls sync on `x` too early: ```ruby x = find_lazy(1) @@ -80,6 +119,8 @@ z.sync # => will run 2 queries ``` +However, this example waits until all requests are queued, and eliminates the extra query: + ```ruby x = find_lazy(1) y = find_lazy(2) @@ -92,9 +133,38 @@ z.sync # => will run 1 query ``` +NOTE: +There is no dependency analysis in the use of batch-loading. There is simply +a pending queue of requests, and as soon as any one result is needed, all pending +requests are evaluated. + +You should never call `batch.sync` or use `Lazy.force` in resolver code. +If you depend on a lazy value, use `Lazy.with_value` instead: + +```ruby +def publisher + ::Gitlab::Graphql::Loaders::BatchModelLoader.new(::Publisher, object.publisher_id).find +end + +# Here we need the publisher in order to generate the catalog URL +def catalog_url + ::Gitlab::Graphql::Lazy.with_value(publisher) do |p| + UrlHelpers.book_catalog_url(publisher, object.isbn) + end +end +``` + ## Testing -Any GraphQL field that supports `BatchLoading` should be tested using the `batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb). +Ideally, do all your testing using request specs, and using `Schema.execute`. If +you do so, you do not need to manage the lifecycle of lazy values yourself, and +you are assured accurate results. + +GraphQL fields that return lazy values may need these values forced in tests. +Forcing refers to explicit demands for evaluation, where this would normally +be arranged by the framework. + +You can force a lazy value with the `GraphqlHelpers#batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb), or by using `Gitlab::Graphql::Lazy.force`. For example: ```ruby it 'returns data as a batch' do @@ -114,8 +184,8 @@ We can also use [QueryRecorder](../query_recorder.md) to make sure we are perfor ```ruby it 'executes only 1 SQL query' do - query_count = ActiveRecord::QueryRecorder.new { subject }.count + query_count = ActiveRecord::QueryRecorder.new { subject } - expect(query_count).to eq(1) + expect(query_count).not_to exceed_query_limit(1) end ``` |