Welcome to mirror list, hosted at ThFree Co, Russian Federation.

index.md « application_slis « development « doc - gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
blob: c1d7ac9fa0cfcd4daeca7e9f17ac2441a0a4dbdf (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
---
stage: Platforms
group: Scalability
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# GitLab Application Service Level Indicators (SLIs)

> [Introduced](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525) in GitLab 14.4

It is possible to define [Service Level Indicators
(SLIs)](https://en.wikipedia.org/wiki/Service_level_indicator)
directly in the Ruby codebase. This keeps the definition of operations
and their success close to the implementation and allows the people
building features to easily define how these features should be
monitored.

Defining an SLI causes 2
[Prometheus
counters](https://prometheus.io/docs/concepts/metric_types/#counter)
to be emitted from the rails application:

- `gitlab_sli:<sli name>:total`: incremented for each operation.
- `gitlab_sli:<sli_name>:success_total`: incremented for successful
  operations.

## Existing SLIs

1. [`rails_request_apdex`](rails_request_apdex.md)

## Defining a new SLI

An SLI can be defined using the `Gitlab::Metrics::Sli` class.

Before the first scrape, it is important to have [initialized the SLI
with all possible
label-combinations](https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics). This
avoid confusing results when using these counters in calculations.

To initialize an SLI, use the `.inilialize_sli` class method, for
example:

```ruby
Gitlab::Metrics::Sli.initialize_sli(:received_email, [
  {
    feature_category: :issue_tracking,
    email_type: :create_issue
  },
  {
    feature_category: :service_desk,
    email_type: :service_desk
  },
  {
    feature_category: :code_review,
    email_type: :create_merge_request
  }
])
```

Metrics must be initialized before they get
scraped for the first time. This could be done at the start time of the
process that will emit them, in which case we need to pay attention
not to increase application's boot time too much. This is preferable
if possible.

Alternatively, if initializing would take too long, this can be done
during the first scrape. We need to make sure we don't do it for every
scrape. This can be done as follows:

```ruby
def initialize_request_slis_if_needed!
  return if Gitlab::Metrics::Sli.initialized?(:rails_request_apdex)
  Gitlab::Metrics::Sli.initialize_sli(:rails_request_apdex, possible_request_labels)
end
```

Also pay attention to do it for the different metrics
endpoints we have. Currently the
[`WebExporter`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/metrics/exporter/web_exporter.rb)
and the
[`HealthController`](https://gitlab.com/gitlab-org/gitlab/blob/master/app/controllers/health_controller.rb)
for Rails and
[`SidekiqExporter`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/metrics/exporter/sidekiq_exporter.rb)
for Sidekiq.

## Tracking operations for an SLI

Tracking an operation in the newly defined SLI can be done like this:

```ruby
Gitlab::Metrics::Sli[:received_email].increment(
  labels: {
    feature_category: :service_desk,
    email_type: :service_desk
  },
  success: issue_created?
)
```

Calling `#increment` on this SLI will increment the total Prometheus counter

```prometheus
gitlab_sli:received_email:total{ feature_category='service_desk', email_type='service_desk' }
```

If the `success:` argument passed is truthy, then the success counter
will also be incremented:

```prometheus
gitlab_sli:received_email:success_total{ feature_category='service_desk', email_type='service_desk' }
```

## Using the SLI in service monitoring and alerts

When the application is emitting metrics for the new SLI, those need
to be consumed in the service catalog to result in alerts, and be
included in the error budget for stage groups and GitLab.com's overall
availability.

This is currently being worked on in [this
project](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/573). As
part of [this
issue](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1307)
we will update the documentation.

For any question, please don't hesitate to createan issue in [the
Scalability issue
tracker](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues)
or come find us in
[#g_scalability](https://gitlab.slack.com/archives/CMMF8TKR9) on Slack.