gitlab.com/gitlab-org/gitlab-foss.git
author    GitLab Bot <gitlab-bot@gitlab.com>  2020-12-17 14:59:07 +0300
committer GitLab Bot <gitlab-bot@gitlab.com>  2020-12-17 14:59:07 +0300
commit    8b573c94895dc0ac0e1d9d59cf3e8745e8b539ca (patch)
tree      544930fb309b30317ae9797a9683768705d664c4 /doc/development/merge_request_performance_guidelines.md
parent    4b1de649d0168371549608993deac953eb692019 (diff)

Add latest changes from gitlab-org/gitlab@13-7-stable-ee (tag: v13.7.0-rc42)

Diffstat (limited to 'doc/development/merge_request_performance_guidelines.md'):
 doc/development/merge_request_performance_guidelines.md | 60
 1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/doc/development/merge_request_performance_guidelines.md b/doc/development/merge_request_performance_guidelines.md
index a5d9a653472..8d5b2db828e 100644
--- a/doc/development/merge_request_performance_guidelines.md
+++ b/doc/development/merge_request_performance_guidelines.md
@@ -1,7 +1,7 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Merge Request Performance Guidelines
@@ -46,8 +46,8 @@ and running.
Can the queries used potentially take down any critical services and result in
engineers being woken up in the night? Can a malicious user abuse the code to
-take down a GitLab instance? Will my changes simply make loading a certain page
-slower? Will execution time grow exponentially given enough load or data in the
+take down a GitLab instance? Do my changes simply make loading a certain page
+slower? Does execution time grow exponentially given enough load or data in the
database?
These are all questions one should ask themselves before submitting a merge
@@ -67,14 +67,14 @@ in turn can request a performance specialist to review the changes.
## Think outside of the box
-Everyone has their own perception how the new feature is going to be used.
+Everyone has their own perception of how to use the new feature.
Always consider how users might be using the feature instead. Usually,
users test our features in a very unconventional way,
for example by brute forcing or abusing edge conditions.
## Data set
-The data set that will be processed by the merge request should be known
+The data set the merge request processes should be known
and documented. The feature should clearly document what the expected
data set is for this feature to process, and what problems it might cause.
@@ -86,8 +86,8 @@ from the repository and perform search for the set of files.
As an author, you should consider the following
in the context of that problem:
-1. What repositories are going to be supported?
-1. How long it will take for big repositories like Linux kernel?
+1. What repositories are planned to be supported?
+1. How long do big repositories like the Linux kernel take?
1. Is there something that we can do differently to not process such a
big data set?
1. Should we build some fail-safe mechanism to contain
@@ -96,28 +96,28 @@ the following:
## Query plans and database structure
-The query plan can tell us if we will need additional
+The query plan can tell us if we need additional
indexes, or expensive filtering (such as using sequential scans).
Each query plan should be run against a data set of substantial size.
For example, if you look for issues with specific conditions,
you should consider validating a query against
a small number (a few hundred) and a big number (100_000) of issues.
-See how the query will behave if the result will be a few
+See how the query behaves if the result is a few
and a few thousand.
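
For example (a minimal sketch, not taken from this page; the `Issue` model, the `project` variable, and the filters are only placeholders), you can print the plan straight from the Rails console with ActiveRecord's `explain`, once against a project with a few hundred issues and once against one seeded with around 100_000:

```ruby
# Hedged illustration: inspect the PostgreSQL plan for a scoped query.
# Compare the output for a small and a large seeded data set to spot
# sequential scans or missing indexes before they reach production.
puts Issue.where(project_id: project.id)
          .order(created_at: :desc)
          .limit(20)
          .explain
```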
This is needed as we have users using GitLab for very big projects and
in a very unconventional way. Even if it seems that it's unlikely
-that such a big data set will be used, it's still plausible that one
-of our customers will encounter a problem with the feature.
+that such a big data set is used, it's still plausible that one
+of our customers could encounter a problem with the feature.
-Understanding ahead of time how it's going to behave at scale, even if we accept it,
-is the desired outcome. We should always have a plan or understanding of what it will take
+Understanding ahead of time how it behaves at scale, even if we accept it,
+is the desired outcome. We should always have a plan or understanding of what is needed
to optimize the feature for higher usage patterns.
Every database structure should be optimized and sometimes even over-described
in preparation for easy extension. The hardest part after some point is
-data migration. Migrating millions of rows will always be troublesome and
+data migration. Migrating millions of rows is always troublesome and
can have a negative impact on the application.
To better understand how to get help with the query plan reviews
@@ -130,7 +130,7 @@ queries unless absolutely necessary.
The total number of queries executed by the code modified or added by a merge request
must not increase unless absolutely necessary. When building features it's
-entirely possible you will need some extra queries, but you should try to keep
+entirely possible you need some extra queries, but you should try to keep
this at a minimum.
As an example, say you introduce a feature that updates a number of database
@@ -144,7 +144,7 @@ objects_to_update.each do |object|
end
```
-This will end up running one query for every object to update. This code can
+This means running one query for every object to update. This code can
easily overload a database given enough rows to update or many instances of this
code running in parallel. This particular problem is known as the
["N+1 query problem"](https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations). You can write a test with [QueryRecorder](query_recorder.md) to detect this and prevent regressions.
@@ -162,11 +162,11 @@ query. This in turn makes it much harder for this code to overload a database.
**Summary:** a merge request **should not** execute duplicated cached queries.
-Rails provides an [SQL Query Cache](cached_queries.md#cached-queries-guidelines),
-used to cache the results of database queries for the duration of the request.
+Rails provides an [SQL Query Cache](cached_queries.md#cached-queries-guidelines),
+used to cache the results of database queries for the duration of the request.
-See [why cached queries are considered bad](cached_queries.md#why-cached-queries-are-considered-bad) and
-[how to detect them](cached_queries.md#how-to-detect).
+See [why cached queries are considered bad](cached_queries.md#why-cached-queries-are-considered-bad) and
+[how to detect them](cached_queries.md#how-to-detect-cached-queries).
The code introduced by a merge request should not execute multiple duplicated cached queries.
@@ -189,8 +189,8 @@ build.project == pipeline_project
# => true
```
-When we call `build.project`, it will not hit the database, it will use the cached result, but it will re-instantiate
-same pipeline project object. It turns out that associated objects do not point to the same in-memory object.
+When we call `build.project`, it doesn't hit the database; it uses the cached result, but it re-instantiates
+the same pipeline project object. It turns out that associated objects do not point to the same in-memory object.
If we try to serialize each build:
@@ -200,7 +200,7 @@ pipeline.builds.each do |build|
end
```
-It will re-instantiate project object for each build, instead of using the same in-memory object.
+It re-instantiates the project object for each build, instead of using the same in-memory object.
In this particular case the workaround is fairly easy:
@@ -212,7 +212,7 @@ end
```
We can assign `pipeline.project` to each `build.project`, since we know it should point to the same project.
-This will allow us that each build point to the same in-memory project,
+This allows each build to point to the same in-memory project,
avoiding the cached SQL query and re-instantiation of the project object for each build.
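
A hedged sketch of that assignment (the `to_json` call stands in for whatever serialization the loop performs):

```ruby
pipeline.builds.each do |build|
  # Reuse the project already loaded on the pipeline instead of letting the
  # cached association re-instantiate a fresh copy for every build.
  build.project = pipeline.project
  build.to_json
end
```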
## Executing Queries in Loops
@@ -323,7 +323,7 @@ Certain UI elements may not always be needed. For example, when hovering over a
diff line there's a small icon displayed that can be used to create a new
comment. Instead of always rendering these kinds of elements, they should only be
rendered when actually needed. This ensures we don't spend time generating
-Haml/HTML when it's not going to be used.
+Haml/HTML when it's not used.
## Instrumenting New Code
@@ -411,8 +411,8 @@ The quota should be optimised to a level that we consider the feature to
be performant and usable for the user, but **not limiting**.
**We want the features to be fully usable for the users.**
-**However, we want to ensure that the feature will continue to perform well if used at its limit**
-**and it won't cause availability issues.**
+**However, we want to ensure that the feature continues to perform well if used at its limit**
+**and it doesn't cause availability issues.**
Consider that it's always better to start with some kind of limitation,
instead of later introducing a breaking change that would result in some
@@ -433,11 +433,11 @@ The intent of quotas could be different:
Examples:
-1. Pipeline Schedules: It's very unlikely that user will want to create
+1. Pipeline Schedules: It's very unlikely that a user wants to create
more than 50 schedules.
In such cases it's rather expected that this is either misuse
or abuse of the feature. Lack of the upper limit can result
- in service degradation as the system will try to process all schedules
+ in service degradation as the system tries to process all schedules
   assigned to the project.
1. GitLab CI/CD includes: We started with a limit of 50 nested includes.
@@ -477,7 +477,7 @@ We can consider the following types of storages:
for most of our implementations. Even though this allows the above limit to be significantly larger,
it does not really mean that you can use more. The shared temporary storage is shared by
   all nodes. Thus, the job that uses a significant amount of that space or performs a lot
- of operations will create a contention on execution of all other jobs and request
+   of operations creates contention on the execution of all other jobs and requests
    across the whole application. This can easily impact the stability of GitLab as a whole.
Be respectful of that.