Diffstat (limited to 'doc/development/sidekiq/logging.md')
-rw-r--r-- | doc/development/sidekiq/logging.md | 155 |
1 files changed, 155 insertions, 0 deletions
diff --git a/doc/development/sidekiq/logging.md b/doc/development/sidekiq/logging.md
new file mode 100644
index 00000000000..015376b0fc6
--- /dev/null
+++ b/doc/development/sidekiq/logging.md
@@ -0,0 +1,155 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Sidekiq logging
+
+## Worker context
+
+> [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/9) in GitLab 12.8.
+
+To have some more information about workers in the logs, we add
+[metadata to the jobs in the form of an
+`ApplicationContext`](../logging.md#logging-context-metadata-through-rails-or-grape-requests).
+In most cases, when scheduling a job from a request, this context is already
+deduced from the request and added to the scheduled job.
+
+When a job runs, the context that was active when it was scheduled
+is restored. This causes the context to be propagated to any job
+scheduled from within the running job.
+
+All this means that in most cases, to add context to jobs, we don't
+need to do anything.
+
+There are, however, some instances when there would be no context
+present when the job is scheduled, or the context that is present is
+likely to be incorrect. For these instances, we've added RuboCop rules
+to draw attention and avoid incorrect metadata in our logs.
+
+As with most of our cops, there are perfectly valid reasons for disabling
+them. In this case it could be that the context from the request is
+correct. Or maybe you've specified a context already in a way that
+isn't picked up by the cops. In any case, leave a code comment
+pointing to which context to use when disabling the cops.
+
+When you do provide objects to the context, make sure that the
+route for namespaces and projects is pre-loaded. This can be done by using
+the `.with_route` scope defined on all `Routable`s.
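
The propagation described in the "Worker context" section (a job restores the context that was active when it was scheduled, so nested jobs inherit it) can be illustrated with a minimal plain-Ruby sketch. This is an assumption-laden model, not GitLab's implementation: `current_context`, `schedule`, `run_all`, the `:sketch_context` thread-local key, and this stand-alone `with_context` are hypothetical names invented for the illustration, and the queue is a plain array rather than Sidekiq.

```ruby
# Hypothetical model of context propagation; none of this is GitLab code.

# The context active on the current thread (empty when nothing is set).
def current_context
  Thread.current[:sketch_context] || {}
end

# Run the block with extra context attributes, then restore the old context.
def with_context(attributes)
  previous = current_context
  Thread.current[:sketch_context] = previous.merge(attributes)
  yield
ensure
  Thread.current[:sketch_context] = previous
end

# Scheduling stores a snapshot of the context that is active right now.
def schedule(queue, &work)
  queue << { context: current_context, work: work }
end

# Running a job restores the snapshot taken when the job was scheduled.
def run_all(queue)
  while (job = queue.shift)
    with_context(job[:context]) { job[:work].call }
  end
end

queue = []
seen = []

with_context(project: 'gitlab-org/gitlab') do
  schedule(queue) do
    seen << current_context
    # Scheduled from inside a running job, so it inherits the same context.
    schedule(queue) { seen << current_context }
  end
end

run_all(queue)
```

In this model both the outer and the nested job observe the `project` attribute, although no context is active while `run_all` executes, which mirrors why most jobs need no explicit context.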
+
+### Cron workers
+
+The context is automatically cleared for workers in the cronjob queue
+(`include CronjobQueue`), even when scheduling them from
+requests. We do this to avoid incorrect metadata when other jobs are
+scheduled from the cron worker.
+
+Cron workers themselves run instance-wide, so they aren't scoped to
+users, namespaces, projects, or other resources that should be added to
+the context.
+
+However, they often schedule other jobs that _do_ require context.
+
+That is why there needs to be an indication of context somewhere in
+the worker. This can be done by using one of the following methods
+somewhere within the worker:
+
+1. Wrap the code that schedules jobs in the `with_context` helper:
+
+   ```ruby
+   def perform
+     deletion_cutoff = Gitlab::CurrentSettings
+       .deletion_adjourned_period.days.ago.to_date
+     projects = Project.with_route.with_namespace
+       .aimed_for_deletion(deletion_cutoff)
+
+     projects.find_each(batch_size: 100).with_index do |project, index|
+       delay = index * INTERVAL
+
+       with_context(project: project) do
+         AdjournedProjectDeletionWorker.perform_in(delay, project.id)
+       end
+     end
+   end
+   ```
+
+1. Use a batch scheduling method that provides context:
+
+   ```ruby
+   def schedule_projects_in_batch(projects)
+     ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
+       projects,
+       arguments_proc: -> (project) { project.id },
+       context_proc: -> (project) { { project: project } }
+     )
+   end
+   ```
+
+   Or, when scheduling with delays:
+
+   ```ruby
+   diffs.each_batch(of: BATCH_SIZE) do |diffs, index|
+     DeleteDiffFilesWorker
+       .bulk_perform_in_with_contexts(index * 5.minutes,
+                                      diffs,
+                                      arguments_proc: -> (diff) { diff.id },
+                                      context_proc: -> (diff) { { project: diff.merge_request.target_project } })
+   end
+   ```
+
+### Jobs scheduled in bulk
+
+Often, when scheduling jobs in bulk, these jobs should have a separate
+context rather than the overarching context.
+
+If that is the case, `bulk_perform_async` can be replaced by the
+`bulk_perform_async_with_contexts` helper, and instead of
+`bulk_perform_in` use `bulk_perform_in_with_contexts`.
+
+For example:
+
+```ruby
+ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
+  projects,
+  arguments_proc: -> (project) { project.id },
+  context_proc: -> (project) { { project: project } }
+)
+```
+
+Each object from the enumerable in the first argument is yielded into two
+procs:
+
+- The `arguments_proc` which needs to return the list of arguments the
+  job needs to be scheduled with.
+
+- The `context_proc` which needs to return a hash with the context
+  information for the job.
+
+## Arguments logging
+
+As of GitLab 13.6, Sidekiq job arguments are logged by default, unless [`SIDEKIQ_LOG_ARGUMENTS`](../../administration/troubleshooting/sidekiq.md#log-arguments-to-sidekiq-jobs)
+is disabled.
+
+By default, the only arguments logged are numeric arguments, because
+arguments of other types could contain sensitive information. To
+override this, use `loggable_arguments` inside a worker with the indexes
+of the arguments to be logged. (Numeric arguments do not need to be
+specified here.)
+
+For example:
+
+```ruby
+class MyWorker
+  include ApplicationWorker
+
+  loggable_arguments 1, 3
+
+  # object_id will be logged as it's numeric
+  # string_a will be logged due to the loggable_arguments call
+  # string_b will be filtered from logs
+  # string_c will be logged due to the loggable_arguments call
+  def perform(object_id, string_a, string_b, string_c)
+  end
+end
+```
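
The filtering rule from the "Arguments logging" section (numeric arguments are always logged, other arguments only when their zero-based index is listed) can be sketched in plain Ruby. This is only an illustration under stated assumptions: `filter_arguments_for_log` and the `'[FILTERED]'` placeholder are hypothetical and are not taken from GitLab's code.

```ruby
# Hypothetical helper, not GitLab's implementation: returns the arguments as
# they might appear in the log under the rules described above.
def filter_arguments_for_log(args, loggable_indexes = [], placeholder: '[FILTERED]')
  args.each_with_index.map do |arg, index|
    if arg.is_a?(Numeric) || loggable_indexes.include?(index)
      arg          # numeric, or explicitly marked as loggable by index
    else
      placeholder  # anything else could be sensitive, so hide it
    end
  end
end

# Mirrors MyWorker above: `loggable_arguments 1, 3` with four arguments.
logged = filter_arguments_for_log([42, 'string_a', 'string_b', 'string_c'], [1, 3])
puts logged.inspect
```

With the indexes `1` and `3`, only the argument at index 2 is replaced by the placeholder, which matches the comments in the `MyWorker` example.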