Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/graphql_guide/batchloader.md')
-rw-r--r--doc/development/graphql_guide/batchloader.md121
1 files changed, 121 insertions, 0 deletions
diff --git a/doc/development/graphql_guide/batchloader.md b/doc/development/graphql_guide/batchloader.md
new file mode 100644
index 00000000000..6d529358499
--- /dev/null
+++ b/doc/development/graphql_guide/batchloader.md
@@ -0,0 +1,121 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# GraphQL BatchLoader
+
+GitLab uses the [batch-loader](https://github.com/exAspArk/batch-loader) Ruby gem to optimize and avoid N+1 SQL queries.
+
+It is the properties of the GraphQL query tree that create opportunities for batching like this - disconnected nodes might need the same data, but cannot know about themselves.
+
+## When should you use it?
+
+We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch loading during **mutations** because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not identical) queries, then consider using the batch-loader.
+
+When implementing a new endpoint we should aim to minimise the number of SQL queries. For stability and scalability we must also ensure that our queries do not suffer from N+1 performance issues.
+
+## Implementation
+
+Batch loading is useful when a series of queries for inputs `Qα, Qβ, ... Qω` can be combined to a single query for `Q[α, β, ... ω]`. An example of this is lookups by ID, where we can find two users by usernames as cheaply as one, but real-world examples can be more complex.
+
+Batchloading is not suitable when the result sets have different sort-orders, grouping, aggregation or other non-composable features.
+
+There are two ways to use the batch-loader in your code. For simple ID lookups, use `::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find`. For more complex cases, you can use the batch API directly.
+
+For example, to load a `User` by `username`, we can add batching as follows:
+
+```ruby
+class UserResolver < BaseResolver
+ type UserType, null: true
+ argument :username, ::GraphQL::STRING_TYPE, required: true
+
+ def resolve(**args)
+ BatchLoader::GraphQL.for(username).batch do |usernames, loader|
+ User.by_username(usernames).each do |user|
+ loader.call(user.username, user)
+ end
+ end
+ end
+end
+```
+
+- `project_id` is the `ID` of the current project being queried
+- `loader.call` is used to map the result back to the input key (here a project ID)
+- `BatchLoader::GraphQL` returns a lazy object (suspended promise to fetch the data)
+
+Here an [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46549) illustrating how to use our `BatchLoading` mechanism.
+
+## How does it work exactly?
+
+Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they will be loaded along with all other similar objects in the current batch.
+
+Inside the block we execute a batch query for our items (`User`). After that, all we have to do is to call loader by passing an item which was used in `BatchLoader::GraphQL.for` method (`usernames`) and the loaded object itself (`user`):
+
+```ruby
+BatchLoader::GraphQL.for(username).batch do |usernames, loader|
+ User.by_username(usernames).each do |user|
+ loader.call(user.username, user)
+ end
+end
+```
+
+### What does lazy mean?
+
+It is important to avoid syncing batches too early. In the example below we can see how calling sync too early can eliminate opportunities for batching:
+
+```ruby
+x = find_lazy(1)
+y = find_lazy(2)
+
+# calling .sync will flush the current batch and will inhibit maximum laziness
+x.sync
+
+z = find_lazy(3)
+
+y.sync
+z.sync
+
+# => will run 2 queries
+```
+
+```ruby
+x = find_lazy(1)
+y = find_lazy(2)
+z = find_lazy(3)
+
+x.sync
+y.sync
+z.sync
+
+# => will run 1 query
+```
+
+## Testing
+
+Any GraphQL field that supports `BatchLoading` should be tested using the `batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb).
+
+```ruby
+it 'returns data as a batch' do
+ results = batch_sync(max_queries: 1) do
+ [{ id: 1 }, { id: 2 }].map { |args| resolve(args) }
+ end
+
+ expect(results).to eq(expected_results)
+end
+
+def resolve(args = {}, context = { current_user: current_user })
+ resolve(described_class, obj: obj, args: args, ctx: context)
+end
+```
+
+We can also use [QueryRecorder](../query_recorder.md) to make sure we are performing only **one SQL query** per call.
+
+```ruby
+it 'executes only 1 SQL query' do
+ query_count = ActiveRecord::QueryRecorder.new { subject }.count
+
+ expect(query_count).to eq(1)
+end
+```