Diffstat (limited to 'doc/development/performance.md')
-rw-r--r-- | doc/development/performance.md | 293 |
1 files changed, 192 insertions, 101 deletions
diff --git a/doc/development/performance.md b/doc/development/performance.md index b5294c8359d..1e3e0570206 100644 --- a/doc/development/performance.md +++ b/doc/development/performance.md @@ -7,7 +7,38 @@ info: To determine the technical writer assigned to the Stage/Group associated w # Performance Guidelines This document describes various guidelines to follow to ensure good and -consistent performance of GitLab. +consistent performance of GitLab. Refer to the [Index](#performance-documentation) section below to navigate to Performance-related pages. + +## Performance Documentation + +- General: + - [Solving performance issues](#workflow) + - [Handbook performance page](https://about.gitlab.com/handbook/engineering/performance/) + - [Merge request performance guidelines](../development/merge_request_performance_guidelines.md) +- Backend: + - [Tooling](#tooling) + - Database: + - [Query performance guidelines](../development/query_performance.md) + - [Pagination performance guidelines](../development/database/pagination_performance_guidelines.md) + - [Keyset pagination performance](../development/database/keyset_pagination.md#performance) + - [Troubleshooting import/export performance issues](../development/import_export.md#troubleshooting-performance-issues) + - [Pipelines performance in the `gitlab` project](../development/pipelines.md#performance) +- Frontend: + - [Performance guidelines](../development/fe_guide/performance.md) + - [Performance dashboards and monitoring guidelines](../development/new_fe_guide/development/performance.md) + - [Browser performance testing guidelines](../user/project/merge_requests/browser_performance_testing.md) + - [`gdk measure` and `gdk measure-workflow`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/gdk_commands.md#measure-performance) +- QA: + - [Load performance testing](../user/project/merge_requests/load_performance_testing.md) + - [GitLab Performance Tool 
project](https://gitlab.com/gitlab-org/quality/performance) + - [Review apps performance metrics](../development/testing_guide/review_apps.md#performance-metrics) +- Monitoring & Overview: + - [GitLab performance monitoring](../administration/monitoring/performance/index.md) + - [Development department performance indicators](https://about.gitlab.com/handbook/engineering/development/performance-indicators/) + - [Service measurement](../development/service_measurement.md) +- Self-managed administration and customer-focused: + - [File system performance benchmarking](../administration/operations/filesystem_benchmarking.md) + - [Sidekiq performance troubleshooting](../administration/troubleshooting/sidekiq.md) ## Workflow @@ -95,8 +126,13 @@ end This however leads to the question: how many iterations should we run to get meaningful statistics? -The benchmark-ips Gem basically takes care of all this and much more, and as a -result of this should be used instead of the `Benchmark` module. +The [`benchmark-ips`](https://github.com/evanphx/benchmark-ips) +gem takes care of all this and much more. You should therefore use it instead of the `Benchmark` +module. + +The GitLab Gemfile also contains the [`benchmark-memory`](https://github.com/michaelherold/benchmark-memory) +gem, which works similarly to the `benchmark` and `benchmark-ips` gems. However, `benchmark-memory` +instead returns the memory size, objects, and strings allocated and retained during the benchmark. In short: @@ -110,7 +146,7 @@ In short: - If you must write a benchmark use the benchmark-ips Gem instead of Ruby's `Benchmark` module. -## Profiling +## Profiling with Stackprof By collecting snapshots of process state at regular intervals, profiling allows you to see where time is spent in a process. The @@ -124,15 +160,36 @@ frequency (for example, 100hz, that is 100 stacks per second). This type of prof has quite a low (albeit non-zero) overhead and is generally considered to be safe for production. 
-### Development - A profiler can be a very useful tool during development, even if it does run *in an unrepresentative environment*. In particular, a method is not necessarily troublesome just because it's executed many times, or takes a long time to execute. Profiles are tools you can use to better understand what is happening in an application - using that information wisely is up to you! -Keeping that in mind, to create a profile, identify (or create) a spec that +There are multiple ways to create a profile with Stackprof. + +### Wrapping a code block + +To profile a specific code block, you can wrap that block in a `Stackprof.run` call: + +```ruby +StackProf.run(mode: :wall, out: 'tmp/stackprof-profiling.dump') do + #... +end +``` + +This creates a `.dump` file that you can [read](#reading-a-stackprof-profile). +For all available options, see the [Stackprof documentation](https://github.com/tmm1/stackprof#all-options). + +### Performance bar + +With the [Performance bar](../administration/monitoring/performance/performance_bar.md), +you have the option to profile a request using Stackprof and immediately output the results to a +[Speedscope flamegraph](profiling.md#speedscope-flamegraphs). + +### RSpec profiling with Stackprof + +To create a profile from a spec, identify (or create) a spec that exercises the troublesome code path, then run it using the `bin/rspec-stackprof` helper, for example: @@ -161,89 +218,10 @@ Finished in 18.19 seconds (files took 4.8 seconds to load) 187 (1.1%) 187 (1.1%) block (4 levels) in class_attribute ``` -You can limit the specs that are run by passing any arguments `rspec` would +You can limit the specs that are run by passing any arguments `RSpec` would normally take. -The output is sorted by the `Samples` column by default. This is the number of -samples taken where the method is the one currently being executed. 
The `Total` -column shows the number of samples taken where the method, or any of the methods -it calls, were being executed. - -To create a graphical view of the call stack: - -```shell -stackprof tmp/project_policy_spec.rb.dump --graphviz > project_policy_spec.dot -dot -Tsvg project_policy_spec.dot > project_policy_spec.svg -``` - -To load the profile in [KCachegrind](https://kcachegrind.github.io/): - -```shell -stackprof tmp/project_policy_spec.rb.dump --callgrind > project_policy_spec.callgrind -kcachegrind project_policy_spec.callgrind # Linux -qcachegrind project_policy_spec.callgrind # Mac -``` - -For flame graphs, enable raw collection first. Note that raw -collection can generate a very large file, so increase the `INTERVAL`, or -run on a smaller number of specs for smaller file size: - -```shell -RAW=true bin/rspec-stackprof spec/policies/group_member_policy_spec.rb -``` - -You can then generate, and view the resultant flame graph. It might take a -while to generate based on the output file size: - -```shell -# Generate -stackprof --flamegraph tmp/group_member_policy_spec.rb.dump > group_member_policy_spec.flame - -# View -stackprof --flamegraph-viewer=group_member_policy_spec.flame -``` - -It may be useful to zoom in on a specific method, for example: - -```shell -$ stackprof tmp/project_policy_spec.rb.dump --method warm_asset_cache - -TestEnv#warm_asset_cache (/Users/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/spec/support/test_env.rb:164) - samples: 0 self (0.0%) / 6288 total (36.9%) - callers: - 6288 ( 100.0%) block (2 levels) in <top (required)> - callees (6288 total): - 6288 ( 100.0%) Capybara::RackTest::Driver#visit - code: - | 164 | def warm_asset_cache - | 165 | return if warm_asset_cache? 
- | 166 | return unless defined?(Capybara) - | 167 | - 6288 (36.9%) | 168 | Capybara.current_session.driver.visit '/' - | 169 | end -$ stackprof tmp/project_policy_spec.rb.dump --method BasePolicy#abilities -BasePolicy#abilities (/Users/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/app/policies/base_policy.rb:79) - samples: 0 self (0.0%) / 50 total (0.3%) - callers: - 25 ( 50.0%) BasePolicy.abilities - 25 ( 50.0%) BasePolicy#collect_rules - callees (50 total): - 25 ( 50.0%) ProjectPolicy#rules - 25 ( 50.0%) BasePolicy#collect_rules - code: - | 79 | def abilities - | 80 | return RuleSet.empty if @user && @user.blocked? - | 81 | return anonymous_abilities if @user.nil? - 50 (0.3%) | 82 | collect_rules { rules } - | 83 | end -``` - -Since the profile includes the work done by the test suite as well as the -application code, these profiles can be used to investigate slow tests as well. -However, for smaller runs (like this example), this means that the cost of -setting up the test suite tends to dominate. - -### Production +### Using Stackprof in production Stackprof can also be used to profile production workloads. @@ -274,8 +252,8 @@ the timeout. Once profiling stops, the profile is written out to disk at `$STACKPROF_FILE_PREFIX/stackprof.$PID.$RAND.profile`. It can then be inspected -further via the `stackprof` command line tool, as described in the previous -section. +further through the `stackprof` command line tool, as described in the +[Reading a Stackprof profile section](#reading-a-stackprof-profile). Currently supported profiling targets are: @@ -295,14 +273,85 @@ For Sidekiq, the signal can be sent to the `sidekiq-cluster` process via `pkill -USR2 bin/sidekiq-cluster`, which forwards the signal to all Sidekiq children. Alternatively, you can also select a specific PID of interest. -Production profiles can be especially noisy. It can be helpful to visualize them -as a [flame graph](https://github.com/brendangregg/FlameGraph). 
This can be done -via: +### Reading a Stackprof profile + +The output is sorted by the `Samples` column by default. This is the number of samples taken where +the method is the one currently executed. The `Total` column shows the number of samples taken where +the method (or any of the methods it calls) is executed. + +To create a graphical view of the call stack: ```shell -bundle exec stackprof --stackcollapse /tmp/stackprof.55769.c6c3906452.profile | flamegraph.pl > flamegraph.svg +stackprof tmp/project_policy_spec.rb.dump --graphviz > project_policy_spec.dot +dot -Tsvg project_policy_spec.dot > project_policy_spec.svg ``` +To load the profile in [KCachegrind](https://kcachegrind.github.io/): + +```shell +stackprof tmp/project_policy_spec.rb.dump --callgrind > project_policy_spec.callgrind +kcachegrind project_policy_spec.callgrind # Linux +qcachegrind project_policy_spec.callgrind # Mac +``` + +You can also generate and view the resultant flame graph. To view a flame graph that +`bin/rspec-stackprof` creates, you must set the `RAW` environment variable to `true` when running +`bin/rspec-stackprof`. + +It might take a while to generate based on the output file size: + +```shell +# Generate +stackprof --flamegraph tmp/group_member_policy_spec.rb.dump > group_member_policy_spec.flame + +# View +stackprof --flamegraph-viewer=group_member_policy_spec.flame +``` + +To export the flame graph to an SVG file, use [Brendan Gregg's FlameGraph tool](https://github.com/brendangregg/FlameGraph): + +```shell +stackprof --stackcollapse /tmp/group_member_policy_spec.rb.dump | flamegraph.pl > flamegraph.svg +``` + +It's also possible to view flame graphs through [speedscope](https://github.com/jlfwong/speedscope). +You can do this when using the [performance bar](profiling.md#speedscope-flamegraphs) +and when [profiling code blocks](https://github.com/jlfwong/speedscope/wiki/Importing-from-stackprof-(ruby)). +This option isn't supported by `bin/rspec-stackprof`. 
+ +You can profile specific methods by using `--method method_name`: + +```shell +$ stackprof tmp/project_policy_spec.rb.dump --method access_allowed_to + +ProjectPolicy#access_allowed_to? (/Users/royzwambag/work/gitlab-development-kit/gitlab/app/policies/project_policy.rb:793) + samples: 0 self (0.0%) / 578 total (0.7%) + callers: + 397 ( 68.7%) block (2 levels) in <class:ProjectPolicy> + 95 ( 16.4%) block in <class:ProjectPolicy> + 86 ( 14.9%) block in <class:ProjectPolicy> + callees (578 total): + 399 ( 69.0%) ProjectPolicy#team_access_level + 141 ( 24.4%) Project::GeneratedAssociationMethods#project_feature + 30 ( 5.2%) DeclarativePolicy::Base#can? + 8 ( 1.4%) Featurable#access_level + code: + | 793 | def access_allowed_to?(feature) + 141 (0.2%) | 794 | return false unless project.project_feature + | 795 | + 8 (0.0%) | 796 | case project.project_feature.access_level(feature) + | 797 | when ProjectFeature::DISABLED + | 798 | false + | 799 | when ProjectFeature::PRIVATE + 429 (0.5%) | 800 | can?(:read_all_resources) || team_access_level >= ProjectFeature.required_minimum_access_level(feature) + | 801 | else +``` + +When using Stackprof to profile specs, the profile includes the work done by the test suite and the +application code. You can therefore use these profiles to investigate slow tests as well. However, +for smaller runs (like this example), this means that the cost of setting up the test suite tends to +dominate. + ## RSpec profiling The GitLab development environment also includes the @@ -459,11 +508,14 @@ The `mem_*` values represent different aspects of how objects and memory are all We can use `memory_profiler` for profiling. -The [`memory_profiler`](https://github.com/SamSaffron/memory_profiler) gem is already present in the GitLab `Gemfile`, -you just need to require it: +The [`memory_profiler`](https://github.com/SamSaffron/memory_profiler) +gem is already present in the GitLab `Gemfile`. 
It's also available in the [performance bar](../administration/monitoring/performance/performance_bar.md) +for the current URL. + +To use the memory profiler directly in your code, use `require` to add it: ```ruby -require 'sidekiq/testing' +require 'memory_profiler' report = MemoryProfiler.report do # Code you want to profile @@ -473,10 +525,17 @@ output = File.open('/tmp/profile.txt','w') report.pretty_print(output) ``` -The report breaks down 2 key concepts: +The report shows the retained and allocated memory grouped by gem, file, location, and class. The +memory profiler also performs a string analysis that shows how often a string is allocated and +retained. -- Retained: long lived memory use and object count retained due to the execution of the code block. -- Allocated: all object allocation and memory allocation during code block. +#### Retained versus allocated + +- Retained memory: long-lived memory use and object count retained due to the execution of the code + block. This has a direct impact on memory and the garbage collector. +- Allocated memory: all object allocation and memory allocation during the code block. This might + have minimal impact on memory, but substantial impact on performance. The more objects you + allocate, the more work is being done and the slower the application is. As a general rule, **retained** is always smaller than or equal to **allocated**. @@ -512,6 +571,32 @@ Fragmented Ruby heap snapshot could look like this: Memory fragmentation could be reduced by tuning GC parameters [as described in this post](https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html). This should be considered as a tradeoff, as it may affect overall performance of memory allocation and GC cycles. +### Derailed Benchmarks + +`derailed_benchmarks` is a [gem](https://github.com/zombocom/derailed_benchmarks) +described as "A series of things you can use to benchmark a Rails or Ruby app." +We include `derailed_benchmarks` in our `Gemfile`. 
+ +We run `derailed exec perf:mem` in every pipeline with a `test` stage, in a job +called `memory-on-boot`. ([Read an example job.](https://gitlab.com/gitlab-org/gitlab/-/jobs/2144695684).) +You can find the results: + +- On the merge request **Overview** tab, in the merge request reports area, in the + **Metrics Reports** [dropdown list](../ci/metrics_reports.md). +- In the `memory-on-boot` artifacts for a full report and a dependency breakdown. + +`derailed_benchmarks` also provides other methods to investigate memory. To learn more, +refer to the [gem documentation](https://github.com/zombocom/derailed_benchmarks#running-derailed-exec). +Most of the methods (`derailed exec perf:*`) attempt to boot your Rails app in a +`production` environment and run benchmarks against it. +This is possible in both GDK and GCK: + +- For GDK, follow the + [instructions](https://github.com/zombocom/derailed_benchmarks#running-in-production-locally) + on the gem page. You must make similar changes to the Redis configuration to avoid errors. +- GCK includes `production` configuration sections + [out of the box](https://gitlab.com/gitlab-org/gitlab-compose-kit#running-production-like). + ## Importance of Changes When working on performance improvements, it's important to always ask yourself @@ -612,7 +697,7 @@ end ## String Freezing -In recent Ruby versions calling `freeze` on a String leads to it being allocated +In recent Ruby versions calling `.freeze` on a String leads to it being allocated only once and re-used. For example, on Ruby 2.3 or later this only allocates the "foo" String once: @@ -626,6 +711,12 @@ Depending on the size of the String and how frequently it would be allocated (before the `.freeze` call was added), this _may_ make things faster, but this isn't guaranteed. +Freezing strings saves memory, as every allocated string uses at least one `RVALUE_SIZE` bytes (40 +bytes on x64) of memory. 
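The literal-deduplication behavior described above can be checked directly from plain Ruby; a minimal stdlib-only sketch:

```ruby
# On Ruby 2.3 and later, `.freeze` on a string literal is optimized at
# compile time: the literal is allocated once and the same frozen
# object is returned on every evaluation.
a = 'foo'.freeze
b = 'foo'.freeze

puts a.frozen?   # => true
puts a.equal?(b) # => true (same object; no second allocation)

# Without `.freeze`, each evaluation allocates a new String object.
c = 'foo'
d = 'foo'
puts c.equal?(d) # => false
```

`Object#equal?` compares object identity, so it shows whether two literals share a single allocation.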
+ +You can use the [memory profiler](#using-memory-profiler) +to see which strings are allocated often and could potentially benefit from a `.freeze`. + Strings are expected to become frozen by default in a future Ruby version. To prepare our codebase for this eventuality, we are adding the following header to all Ruby files:
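For context, Ruby's `frozen_string_literal` magic comment freezes every string literal in a file without explicit `.freeze` calls; a minimal sketch of its effect:

```ruby
# frozen_string_literal: true

# With the magic comment above as the first line of the file, every
# string literal in this file is frozen automatically.
s = 'foo'
puts s.frozen? # => true
```

Because frozen literals with the same content are deduplicated, enabling the comment file-by-file reduces both allocations and retained memory.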