Add latest changes from gitlab-org/gitlab@15-9-stable-ee (v15.9.0-rc42)

author: GitLab Bot <gitlab-bot@gitlab.com> 2023-02-20 16:49:51 +0300
committer: GitLab Bot <gitlab-bot@gitlab.com> 2023-02-20 16:49:51 +0300
commit: 71786ddc8e28fbd3cb3fcc4b3ff15e5962a1c82e (patch)
tree: 6a2d93ef3fb2d353bb7739e4b57e6541f51cdd71 /doc/development/sidekiq
parent: a7253423e3403b8c08f8a161e5937e1488f5f407 (diff)
2 files changed, 39 insertions, 21 deletions
diff --git a/doc/development/sidekiq/index.md b/doc/development/sidekiq/index.md
index f4f98641d39..355f5a3b753 100644
--- a/doc/development/sidekiq/index.md
+++ b/doc/development/sidekiq/index.md
@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 # Sidekiq guides
 
 We use [Sidekiq](https://github.com/mperham/sidekiq) as our background
-job processor. These guides are for writing jobs that will work well on
+job processor. These guides are for writing jobs that work well on
 GitLab.com and be consistent with our existing worker classes. For
 information on administering GitLab, see [configuring Sidekiq](../../administration/sidekiq/index.md).
 
@@ -74,7 +74,7 @@ A lower retry count may be applicable if any of the below apply:
 1. The worker is not idempotent and running it multiple times could
    leave the system in an inconsistent state. For example, a worker that
    posts a system note and then performs an action: if the second step
-   fails and the worker retries, the system note will be posted again.
+   fails and the worker retries, the system note is posted again.
 1. The worker is a cronjob that runs frequently. For example, if a cron
    job runs every hour, then we don't need to retry beyond an hour
    because we don't need two of the same job running at once.
@@ -96,6 +96,24 @@ def perform
 end
 ```
 
+## Failure handling
+
+Failures are typically handled by Sidekiq itself, which takes advantage of the inbuilt retry mechanism mentioned above. You should allow exceptions to be raised so that Sidekiq can reschedule the job.
+
+If you need to perform an action when a job fails after all of its retry attempts, add it to the `sidekiq_retries_exhausted` method.
+
+```ruby
+sidekiq_retries_exhausted do |msg, ex|
+  project = Project.find(msg['args'].first)
+  project.perform_a_rollback # handle the permanent failure
+end
+
+def perform(project_id)
+  project = Project.find(project_id)
+  project.some_action # raises an exception
+end
+```
+
 ## Sidekiq Queues
 
 Previously, each worker had its own queue, which was automatically set based on the
@@ -113,8 +131,8 @@ gitlab:sidekiq:all_queues_yml:generate` to regenerate
 `app/workers/all_queues.yml` or `ee/app/workers/all_queues.yml` so that
 it can be picked up by
 [`sidekiq-cluster`](../../administration/sidekiq/extra_sidekiq_processes.md)
-in installations that don't use routing rules. To learn more about potential changes,
-read [Use routing rules by default and deprecate queue selectors for self-managed](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/596).
+in installations that don't use routing rules. For more information about potential changes,
+see [epic 596](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/596).
 
 Additionally, run
 `bin/rake gitlab:sidekiq:sidekiq_queues_yml:generate` to regenerate
@@ -156,7 +174,7 @@ queues in a namespace (technically: all queues prefixed with the namespace name)
 when a namespace is provided instead of a simple queue name in the `--queue`
 (`-q`) option, or in the `:queues:` section in `config/sidekiq_queues.yml`.
 
-Note that adding a worker to an existing namespace should be done with care, as
+Adding a worker to an existing namespace should be done with care, as
 the extra jobs take resources away from jobs from workers that were already
 there, if the resources available to the Sidekiq process handling the namespace
 are not adjusted appropriately.
@@ -195,9 +213,9 @@ can read the number or type of provided arguments.
 
 GitLab stores Sidekiq jobs and their arguments in Redis. To avoid
 excessive memory usage, we compress the arguments of Sidekiq jobs
-if their original size is bigger than 100KB.
+if their original size is bigger than 100 KB.
 
-After compression, if their size still exceeds 5MB, it raises an
+After compression, if their size still exceeds 5 MB, it raises an
 [`ExceedLimitError`](https://gitlab.com/gitlab-org/gitlab/-/blob/f3dd89e5e510ea04b43ffdcb58587d8f78a8d77c/lib/gitlab/sidekiq_middleware/size_limiter/exceed_limit_error.rb#L8)
 error when scheduling the job.
 
@@ -227,6 +245,6 @@ tests should be placed in `spec/workers`.
 
 ## Interacting with Sidekiq Redis and APIs
 
-The application should minimise interaction with of any `Sidekiq.redis` and Sidekiq [APIs](https://github.com/mperham/sidekiq/blob/main/lib/sidekiq/api.rb). Such interactions in generic application logic should be abstracted to a [Sidekiq middleware](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/sidekiq_middleware) for re-use across teams. By decoupling application logic from Sidekiq's datastore, it allows for greater freedom when horizontally scaling the GitLab background processing setup.
+The application should minimise interaction with `Sidekiq.redis` and Sidekiq [APIs](https://github.com/mperham/sidekiq/blob/main/lib/sidekiq/api.rb). Such interactions in generic application logic should be abstracted to a [Sidekiq middleware](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/sidekiq_middleware) for re-use across teams. Decoupling application logic from the Sidekiq datastore allows for greater freedom when horizontally scaling the GitLab background processing setup.
 
 Some exceptions to this rule would be migration-related logic or administration operations.
diff --git a/doc/development/sidekiq/worker_attributes.md b/doc/development/sidekiq/worker_attributes.md
index 4fcd8e33d5c..a3bfe5f27cc 100644
--- a/doc/development/sidekiq/worker_attributes.md
+++ b/doc/development/sidekiq/worker_attributes.md
@@ -37,7 +37,7 @@ end
 ### Latency sensitive jobs
 
 If a large number of background jobs get scheduled at once, queueing of jobs may
-occur while jobs wait for a worker node to be become available. This is normal
+occur while jobs wait for a worker node to become available. This is standard
 and gives the system resilience by allowing it to gracefully handle spikes in
 traffic. Some jobs, however, are more sensitive to latency than others.
 
@@ -79,7 +79,7 @@ On GitLab.com, we run Sidekiq in several
 each of which represents a particular type of workload.
 
 When changing a queue's urgency, or adding a new queue, we need to take
-into account the expected workload on the new shard. Note that, if we're
+into account the expected workload on the new shard. If we're
 changing an existing queue, there is also an effect on the old shard,
 but that always reduces work.
 
@@ -108,7 +108,7 @@ shard_consumption = shard_rps * shard_duration_avg
 
 If we expect an increase of **less than 5%**, then no further action is needed.
 
-Otherwise, please ping `@gitlab-org/scalability` on the merge request and ask
+Otherwise, ping `@gitlab-org/scalability` on the merge request and ask
 for a review.
 
 ## Jobs with External Dependencies
@@ -121,7 +121,7 @@ However, some jobs are dependent on external services to complete
 successfully. Some examples include:
 
 1. Jobs which call web-hooks configured by a user.
-1. Jobs which deploy an application to a k8s cluster configured by a user.
+1. Jobs which deploy an application to a Kubernetes cluster configured by a user.
 
 These jobs have "external dependencies". This is important for the operation of
 the background processing cluster in several ways:
@@ -179,8 +179,8 @@ performance.
 Likewise, if a worker uses large amounts of memory, we can run these on a
 bespoke low concurrency, high memory fleet.
 
-Note that memory-bound workers create heavy GC workloads, with pauses of
-10-50ms. This has an impact on the latency requirements for the
+Memory-bound workers create heavy GC workloads, with pauses of
+10-50 ms. This has an impact on the latency requirements for the
 worker. For this reason, `memory` bound, `urgency :high` jobs are not
 permitted and fail CI. In general, `memory` bound workers are
 discouraged, and alternative approaches to processing the work should be
@@ -219,7 +219,7 @@ We use the following approach to determine whether a worker is CPU-bound:
 - Divide `cpu_s` by `duration` to get the percentage of time spent on-CPU.
 - If this ratio exceeds 33%, the worker is considered CPU-bound and should be
   annotated as such.
-- Note that these values should not be used over small sample sizes, but
+- These values should not be used over small sample sizes, but
   rather over fairly large aggregates.
 
 ## Feature category
@@ -254,7 +254,7 @@ When setting this field, consider the following trade-off:
 - Prefer read replicas to add relief to the primary, but increase the likelihood of stale reads that have to be retried.
 
 To maintain the same behavior compared to before this field was introduced, set it to `:always`, so
-database operations will only target the primary. Reasons for having to do so include workers
+database operations only target the primary. Reasons for having to do so include workers
 that mostly or exclusively perform writes, or workers that read their own writes and who might run
 into data consistency issues should a stale record be read back from a replica. **Try to avoid
 these scenarios, since `:always` should be considered the exception, not the rule.**
@@ -270,10 +270,10 @@ The difference is in what happens when there is still replication lag after the
 switch over to the primary right away, whereas `delayed` workers fail fast and are retried once.
 If they still encounter replication lag, they also switch to the primary instead.
 **If your worker never performs any writes, it is strongly advised to apply one of these consistency settings,
-since it will never need to rely on the primary database node.**
+since it never needs to rely on the primary database node.**
 
 The table below shows the `data_consistency` attribute and its values, ordered by the degree to which
-they prefer read replicas and will wait for replicas to catch up:
+they prefer read replicas and wait for replicas to catch up:
 
 | **Data Consistency**  | **Description**  |
 |--------------|-----------------------------|
@@ -300,14 +300,14 @@ end
 
 The `feature_flag` property allows you to toggle a job's `data_consistency`,
 which permits you to safely toggle load balancing capabilities for a specific job.
-When `feature_flag` is disabled, the job defaults to `:always`, which means that the job will always use the primary database.
+When `feature_flag` is disabled, the job defaults to `:always`, which means that the job always uses the primary database.
 
 The `feature_flag` property does not allow the use of
 [feature gates based on actors](../feature_flags/index.md).
 This means that the feature flag cannot be toggled only for particular
 projects, groups, or users, but instead, you can safely use [percentage of time rollout](../feature_flags/index.md).
-Note that since we check the feature flag on both Sidekiq client and server, rolling out a 10% of the time,
-will likely results in 1% (`0.1` `[from client]*0.1` `[from server]`) of effective jobs using replicas.
+Since we check the feature flag on both the Sidekiq client and server, rolling out to 10% of the time
+likely results in 1% (`0.1` `[from client]*0.1` `[from server]`) of effective jobs using replicas.
 
 Example: