Diffstat (limited to 'doc/development/performance.md')
-rw-r--r-- | doc/development/performance.md | 293 |
1 files changed, 192 insertions, 101 deletions
diff --git a/doc/development/performance.md b/doc/development/performance.md index b5294c8359d..1e3e0570206 100644 --- a/doc/development/performance.md +++ b/doc/development/performance.md @@ -7,7 +7,38 @@ info: To determine the technical writer assigned to the Stage/Group associated w # Performance Guidelines This document describes various guidelines to follow to ensure good and -consistent performance of GitLab. +consistent performance of GitLab. Refer to the [Index](#performance-documentation) section below to navigate to Performance-related pages. + +## Performance Documentation + +- General: + - [Solving performance issues](#workflow) + - [Handbook performance page](https://about.gitlab.com/handbook/engineering/performance/) + - [Merge request performance guidelines](../development/merge_request_performance_guidelines.md) +- Backend: + - [Tooling](#tooling) + - Database: + - [Query performance guidelines](../development/query_performance.md) + - [Pagination performance guidelines](../development/database/pagination_performance_guidelines.md) + - [Keyset pagination performance](../development/database/keyset_pagination.md#performance) + - [Troubleshooting import/export performance issues](../development/import_export.md#troubleshooting-performance-issues) + - [Pipelines performance in the `gitlab` project](../development/pipelines.md#performance) +- Frontend: + - [Performance guidelines](../development/fe_guide/performance.md) + - [Performance dashboards and monitoring guidelines](../development/new_fe_guide/development/performance.md) + - [Browser performance testing guidelines](../user/project/merge_requests/browser_performance_testing.md) + - [`gdk measure` and `gdk measure-workflow`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/gdk_commands.md#measure-performance) +- QA: + - [Load performance testing](../user/project/merge_requests/load_performance_testing.md) + - [GitLab Performance Tool 
project](https://gitlab.com/gitlab-org/quality/performance) + - [Review apps performance metrics](../development/testing_guide/review_apps.md#performance-metrics) +- Monitoring & Overview: + - [GitLab performance monitoring](../administration/monitoring/performance/index.md) + - [Development department performance indicators](https://about.gitlab.com/handbook/engineering/development/performance-indicators/) + - [Service measurement](../development/service_measurement.md) +- Self-managed administration and customer-focused: + - [File system performance benchmarking](../administration/operations/filesystem_benchmarking.md) + - [Sidekiq performance troubleshooting](../administration/troubleshooting/sidekiq.md) ## Workflow @@ -95,8 +126,13 @@ end This however leads to the question: how many iterations should we run to get meaningful statistics? -The benchmark-ips Gem basically takes care of all this and much more, and as a -result of this should be used instead of the `Benchmark` module. +The [`benchmark-ips`](https://github.com/evanphx/benchmark-ips) +gem takes care of all this and much more. You should therefore use it instead of the `Benchmark` +module. + +The GitLab Gemfile also contains the [`benchmark-memory`](https://github.com/michaelherold/benchmark-memory) +gem, which works similarly to the `benchmark` and `benchmark-ips` gems. However, `benchmark-memory` +instead returns the memory size, objects, and strings allocated and retained during the benchmark. In short: @@ -110,7 +146,7 @@ In short: - If you must write a benchmark use the benchmark-ips Gem instead of Ruby's `Benchmark` module. -## Profiling +## Profiling with Stackprof By collecting snapshots of process state at regular intervals, profiling allows you to see where time is spent in a process. The @@ -124,15 +160,36 @@ frequency (for example, 100hz, that is 100 stacks per second). This type of prof has quite a low (albeit non-zero) overhead and is generally considered to be safe for production. 
-### Development - A profiler can be a very useful tool during development, even if it does run *in an unrepresentative environment*. In particular, a method is not necessarily troublesome just because it's executed many times, or takes a long time to execute. Profiles are tools you can use to better understand what is happening in an application - using that information wisely is up to you! -Keeping that in mind, to create a profile, identify (or create) a spec that +There are multiple ways to create a profile with Stackprof. + +### Wrapping a code block + +To profile a specific code block, you can wrap that block in a `Stackprof.run` call: + +```ruby +StackProf.run(mode: :wall, out: 'tmp/stackprof-profiling.dump') do + #... +end +``` + +This creates a `.dump` file that you can [read](#reading-a-stackprof-profile). +For all available options, see the [Stackprof documentation](https://github.com/tmm1/stackprof#all-options). + +### Performance bar + +With the [Performance bar](../administration/monitoring/performance/performance_bar.md), +you have the option to profile a request using Stackprof and immediately output the results to a +[Speedscope flamegraph](profiling.md#speedscope-flamegraphs). + +### RSpec profiling with Stackprof + +To create a profile from a spec, identify (or create) a spec that exercises the troublesome code path, then run it using the `bin/rspec-stackprof` helper, for example: @@ -161,89 +218,10 @@ Finished in 18.19 seconds (files took 4.8 seconds to load) 187 (1.1%) 187 (1.1%) block (4 levels) in class_attribute ``` -You can limit the specs that are run by passing any arguments `rspec` would +You can limit the specs that are run by passing any arguments `RSpec` would normally take. -The output is sorted by the `Samples` column by default. This is the number of -samples taken where the method is the one currently being executed. 
The `Total` -column shows the number of samples taken where the method, or any of the methods -it calls, were being executed. - -To create a graphical view of the call stack: - -```shell -stackprof tmp/project_policy_spec.rb.dump --graphviz > project_policy_spec.dot -dot -Tsvg project_policy_spec.dot > project_policy_spec.svg -``` - -To load the profile in [KCachegrind](https://kcachegrind.github.io/): - -```shell -stackprof tmp/project_policy_spec.rb.dump --callgrind > project_policy_spec.callgrind -kcachegrind project_policy_spec.callgrind # Linux -qcachegrind project_policy_spec.callgrind # Mac -``` - -For flame graphs, enable raw collection first. Note that raw -collection can generate a very large file, so increase the `INTERVAL`, or -run on a smaller number of specs for smaller file size: - -```shell -RAW=true bin/rspec-stackprof spec/policies/group_member_policy_spec.rb -``` - -You can then generate, and view the resultant flame graph. It might take a -while to generate based on the output file size: - -```shell -# Generate -stackprof --flamegraph tmp/group_member_policy_spec.rb.dump > group_member_policy_spec.flame - -# View -stackprof --flamegraph-viewer=group_member_policy_spec.flame -``` - -It may be useful to zoom in on a specific method, for example: - -```shell -$ stackprof tmp/project_policy_spec.rb.dump --method warm_asset_cache - -TestEnv#warm_asset_cache (/Users/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/spec/support/test_env.rb:164) - samples: 0 self (0.0%) / 6288 total (36.9%) - callers: - 6288 ( 100.0%) block (2 levels) in <top (required)> - callees (6288 total): - 6288 ( 100.0%) Capybara::RackTest::Driver#visit - code: - | 164 | def warm_asset_cache - | 165 | return if warm_asset_cache? 
- | 166 | return unless defined?(Capybara) - | 167 | - 6288 (36.9%) | 168 | Capybara.current_session.driver.visit '/' - | 169 | end -$ stackprof tmp/project_policy_spec.rb.dump --method BasePolicy#abilities -BasePolicy#abilities (/Users/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/app/policies/base_policy.rb:79) - samples: 0 self (0.0%) / 50 total (0.3%) - callers: - 25 ( 50.0%) BasePolicy.abilities - 25 ( 50.0%) BasePolicy#collect_rules - callees (50 total): - 25 ( 50.0%) ProjectPolicy#rules - 25 ( 50.0%) BasePolicy#collect_rules - code: - | 79 | def abilities - | 80 | return RuleSet.empty if @user && @user.blocked? - | 81 | return anonymous_abilities if @user.nil? - 50 (0.3%) | 82 | collect_rules { rules } - | 83 | end -``` - -Since the profile includes the work done by the test suite as well as the -application code, these profiles can be used to investigate slow tests as well. -However, for smaller runs (like this example), this means that the cost of -setting up the test suite tends to dominate. - -### Production +### Using Stackprof in production Stackprof can also be used to profile production workloads. @@ -274,8 +252,8 @@ the timeout. Once profiling stops, the profile is written out to disk at `$STACKPROF_FILE_PREFIX/stackprof.$PID.$RAND.profile`. It can then be inspected -further via the `stackprof` command line tool, as described in the previous -section. +further through the `stackprof` command line tool, as described in the +[Reading a Stackprof profile section](#reading-a-stackprof-profile). Currently supported profiling targets are: @@ -295,14 +273,85 @@ For Sidekiq, the signal can be sent to the `sidekiq-cluster` process via `pkill -USR2 bin/sidekiq-cluster`, which forwards the signal to all Sidekiq children. Alternatively, you can also select a specific PID of interest. -Production profiles can be especially noisy. It can be helpful to visualize them -as a [flame graph](https://github.com/brendangregg/FlameGraph). 
This can be done -via: +### Reading a Stackprof profile + +The output is sorted by the `Samples` column by default. This is the number of samples taken where +the method is the one currently executed. The `Total` column shows the number of samples taken where +the method (or any of the methods it calls) is executed. + +To create a graphical view of the call stack: ```shell -bundle exec stackprof --stackcollapse /tmp/stackprof.55769.c6c3906452.profile | flamegraph.pl > flamegraph.svg +stackprof tmp/project_policy_spec.rb.dump --graphviz > project_policy_spec.dot +dot -Tsvg project_policy_spec.dot > project_policy_spec.svg ``` +To load the profile in [KCachegrind](https://kcachegrind.github.io/): + +```shell +stackprof tmp/project_policy_spec.rb.dump --callgrind > project_policy_spec.callgrind +kcachegrind project_policy_spec.callgrind # Linux +qcachegrind project_policy_spec.callgrind # Mac +``` + +You can also generate and view the resultant flame graph. To view a flame graph that +`bin/rspec-stackprof` creates, you must set the `RAW` environment variable to `true` when running +`bin/rspec-stackprof`. + +It might take a while to generate based on the output file size: + +```shell +# Generate +stackprof --flamegraph tmp/group_member_policy_spec.rb.dump > group_member_policy_spec.flame + +# View +stackprof --flamegraph-viewer=group_member_policy_spec.flame +``` + +To export the flame graph to an SVG file, use [Brendan Gregg's FlameGraph tool](https://github.com/brendangregg/FlameGraph): + +```shell +stackprof --stackcollapse /tmp/group_member_policy_spec.rb.dump | flamegraph.pl > flamegraph.svg +``` + +It's also possible to view flame graphs through [speedscope](https://github.com/jlfwong/speedscope). +You can do this when using the [performance bar](profiling.md#speedscope-flamegraphs) +and when [profiling code blocks](https://github.com/jlfwong/speedscope/wiki/Importing-from-stackprof-(ruby)). +This option isn't supported by `bin/rspec-stackprof`. 
+ +You can profile specific methods by using `--method method_name`: + +```shell +$ stackprof tmp/project_policy_spec.rb.dump --method access_allowed_to + +ProjectPolicy#access_allowed_to? (/Users/royzwambag/work/gitlab-development-kit/gitlab/app/policies/project_policy.rb:793) + samples: 0 self (0.0%) / 578 total (0.7%) + callers: + 397 ( 68.7%) block (2 levels) in <class:ProjectPolicy> + 95 ( 16.4%) block in <class:ProjectPolicy> + 86 ( 14.9%) block in <class:ProjectPolicy> + callees (578 total): + 399 ( 69.0%) ProjectPolicy#team_access_level + 141 ( 24.4%) Project::GeneratedAssociationMethods#project_feature + 30 ( 5.2%) DeclarativePolicy::Base#can? + 8 ( 1.4%) Featurable#access_level + code: + | 793 | def access_allowed_to?(feature) + 141 (0.2%) | 794 | return false unless project.project_feature + | 795 | + 8 (0.0%) | 796 | case project.project_feature.access_level(feature) + | 797 | when ProjectFeature::DISABLED + | 798 | false + | 799 | when ProjectFeature::PRIVATE + 429 (0.5%) | 800 | can?(:read_all_resources) || team_access_level >= ProjectFeature.required_minimum_access_level(feature) + | 801 | else +``` + +When using Stackprof to profile specs, the profile includes the work done by the test suite and the +application code. You can therefore use these profiles to investigate slow tests as well. However, +for smaller runs (like this example), this means that the cost of setting up the test suite tends to +dominate. + ## RSpec profiling The GitLab development environment also includes the @@ -459,11 +508,14 @@ The `mem_*` values represent different aspects of how objects and memory are all We can use `memory_profiler` for profiling. -The [`memory_profiler`](https://github.com/SamSaffron/memory_profiler) gem is already present in the GitLab `Gemfile`, -you just need to require it: +The [`memory_profiler`](https://github.com/SamSaffron/memory_profiler) +gem is already present in the GitLab `Gemfile`. 
It's also available in the [performance bar](../administration/monitoring/performance/performance_bar.md) +for the current URL. + +To use the memory profiler directly in your code, use `require` to add it: ```ruby -require 'sidekiq/testing' +require 'memory_profiler' report = MemoryProfiler.report do # Code you want to profile @@ -473,10 +525,17 @@ output = File.open('/tmp/profile.txt','w') report.pretty_print(output) ``` -The report breaks down 2 key concepts: +The report shows the retained and allocated memory grouped by gem, file, location, and class. The +memory profiler also performs a string analysis that shows how often a string is allocated and +retained. -- Retained: long lived memory use and object count retained due to the execution of the code block. -- Allocated: all object allocation and memory allocation during code block. +#### Retained versus allocated + +- Retained memory: long-lived memory use and object count retained due to the execution of the code + block. This has a direct impact on memory and the garbage collector. +- Allocated memory: all object allocation and memory allocation during the code block. This might + have minimal impact on memory, but substantial impact on performance. The more objects you + allocate, the more work is being done and the slower the application is. As a general rule, **retained** is always smaller than or equal to **allocated**. @@ -512,6 +571,32 @@ Fragmented Ruby heap snapshot could look like this: Memory fragmentation could be reduced by tuning GC parameters [as described in this post](https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html). This should be considered as a tradeoff, as it may affect overall performance of memory allocation and GC cycles. +### Derailed Benchmarks + +`derailed_benchmarks` is a [gem](https://github.com/zombocom/derailed_benchmarks) +described as "A series of things you can use to benchmark a Rails or Ruby app." +We include `derailed_benchmarks` in our `Gemfile`. 
+ +We run `derailed exec perf:mem` in every pipeline with a `test` stage, in a job +called `memory-on-boot`. ([Read an example job.](https://gitlab.com/gitlab-org/gitlab/-/jobs/2144695684).) +You can find the results: + +- On the merge request **Overview** tab, in the merge request reports area, in the + **Metrics Reports** [dropdown list](../ci/metrics_reports.md). +- In the `memory-on-boot` artifacts for a full report and a dependency breakdown. + +`derailed_benchmarks` also provides other methods to investigate memory. To learn more, +refer to the [gem documentation](https://github.com/zombocom/derailed_benchmarks#running-derailed-exec). +Most of the methods (`derailed exec perf:*`) attempt to boot your Rails app in a +`production` environment and run benchmarks against it. +This is possible in both GDK and GCK: + +- For GDK, follow the + [instructions](https://github.com/zombocom/derailed_benchmarks#running-in-production-locally) + on the gem page. You must make similar changes to the Redis configuration to avoid errors. +- GCK includes `production` configuration sections + [out of the box](https://gitlab.com/gitlab-org/gitlab-compose-kit#running-production-like). + ## Importance of Changes When working on performance improvements, it's important to always ask yourself @@ -612,7 +697,7 @@ end ## String Freezing -In recent Ruby versions calling `freeze` on a String leads to it being allocated +In recent Ruby versions calling `.freeze` on a String leads to it being allocated only once and re-used. For example, on Ruby 2.3 or later this only allocates the "foo" String once: @@ -626,6 +711,12 @@ Depending on the size of the String and how frequently it would be allocated (before the `.freeze` call was added), this _may_ make things faster, but this isn't guaranteed. +Freezing strings saves memory, as every allocated string uses at least one `RVALUE_SIZE` bytes (40 +bytes on x64) of memory. 
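The literal-deduplication behavior described above can be checked directly from plain Ruby; a minimal stdlib-only sketch:

```ruby
# On Ruby 2.3 and later, `.freeze` on a string literal is optimized at
# compile time: the literal is allocated once and the same frozen
# object is returned on every evaluation.
a = 'foo'.freeze
b = 'foo'.freeze

puts a.frozen?   # => true
puts a.equal?(b) # => true (same object; no second allocation)

# Without `.freeze`, each evaluation allocates a new String object.
c = 'foo'
d = 'foo'
puts c.equal?(d) # => false
```

`Object#equal?` compares object identity, so it shows whether two literals share a single allocation.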
+ +You can use the [memory profiler](#using-memory-profiler) +to see which strings are allocated often and could potentially benefit from a `.freeze`. + Strings are expected to become frozen by default in a future Ruby version. To prepare our codebase for this eventuality, we are adding the following header to all Ruby files:
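For context, Ruby's `frozen_string_literal` magic comment freezes every string literal in a file without explicit `.freeze` calls; a minimal sketch of its effect:

```ruby
# frozen_string_literal: true

# With the magic comment above as the first line of the file, every
# string literal in this file is frozen automatically.
s = 'foo'
puts s.frozen? # => true
```

Because frozen literals with the same content are deduplicated, enabling the comment file-by-file reduces both allocations and retained memory.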