---
stage: Platforms
group: Scalability
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Stage group dashboard

The stage group dashboard is a generated dashboard that contains metrics
for common components used by most stage groups. The dashboard is
fully customizable and owned by the stage groups.

This page explains what is on these dashboards, how to use their
contents, and how they can be customized.

## Dashboard contents

### Error budget panels

![28 day budget](img/stage_group_dashboards_28d_budget.png)

The top panels display the [error budget](../index.md#error-budget).
These panels always show the 28 days before the end time selected in the
[time range controls](index.md#time-range-controls). This data doesn't
follow the selected range. It does respect the filters for environment
and stage.
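
The fixed window can be sketched in a few lines. This is only an illustration of the rule described above, not code from the dashboards. The function and constant names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# The 28-day span comes from the text above; the names are illustrative only.
ERROR_BUDGET_WINDOW = timedelta(days=28)


def error_budget_window(selected_end: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) range covered by the error budget panels.

    Only the end of the selected time range matters. The start of the
    selected range is ignored.
    """
    return selected_end - ERROR_BUDGET_WINDOW, selected_end


# Any selection ending at this time yields the same budget window.
end = datetime(2020, 12, 25, 0, 0, tzinfo=timezone.utc)
start, _ = error_budget_window(end)
print(start.isoformat())  # 2020-11-27T00:00:00+00:00
```

Selecting the last hour or the last seven days makes no difference to these panels as long as both selections end at the same time.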

### Metrics panels

![Metrics panels](img/stage_group_dashboards_metrics.png)

Although most of the metrics displayed in the panels are self-explanatory in their title and nearby
description, note the following:

- The events are counted, measured, accumulated, collected, and stored as
  [time series](https://prometheus.io/docs/concepts/data_model/). The data is calculated using
  statistical methods to produce metrics. This means that metrics are approximately correct and
  meaningful over a time period. They help you get an overview of the state of a system over time.
  They are not meant to give you precise numbers for discrete events.

  If you need a higher level of accuracy, use another monitoring tool, such as
  [logs](https://about.gitlab.com/handbook/engineering/monitoring/#logs).
  Read the following examples for more explanations.
- All the rate metrics' units are `requests per second`. The default aggregate time frame is 1 minute.

  For example, a panel shows the requests per second number at `2020-12-25 00:42:00` to be `34.13`.
  It means that during minute 42 (from `2020-12-25 00:42:00` to `2020-12-25 00:42:59`), approximately
  `34.13 * 60 = ~ 2047` requests were processed by the web servers.
- Rounding of decimal fractions can be noticeable, especially in low-traffic cases. For example,
  the error rate of `RepositoryUpdateMirrorWorker` at `2020-12-25 02:04:00` is displayed as `0.07`,
  equivalent to `4.2` jobs per minute. The raw result is `0.06666666667`, equivalent to 4 jobs
  per minute.
- All the rate metrics are more accurate when there is enough data. The default floating-point
  precision is 2. In some extremely low-traffic panels, you can see `0.00`, even though there is
  still some real traffic.
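
The arithmetic in the examples above can be reproduced with a short Python sketch. The helper names are illustrative, and the `0.004` rate in the last step is a made-up value chosen to show the `0.00` case.

```python
SECONDS_PER_MINUTE = 60


def per_minute(rate_per_second: float) -> float:
    """Convert a per-second rate into an approximate per-minute count."""
    return rate_per_second * SECONDS_PER_MINUTE


def displayed(rate_per_second: float, decimals: int = 2) -> str:
    """Format a rate the way a panel with a precision of 2 shows it."""
    return f"{rate_per_second:.{decimals}f}"


# Web requests: 34.13 requests per second over one minute.
print(round(per_minute(34.13), 1))  # 2047.8, quoted above as ~2047

# Low traffic: 4 failed jobs in one minute is a raw rate of 4 / 60.
raw = 4 / 60
shown = displayed(raw)
print(shown)                               # 0.07
print(round(per_minute(float(shown)), 1))  # 4.2
print(round(per_minute(raw), 1))           # 4.0

# Very low traffic (hypothetical rate): the panel shows 0.00 despite real traffic.
print(displayed(0.004))  # 0.00
```

The gap between `4.2` and `4.0` comes entirely from rounding the displayed rate. Use the raw data when you need the exact figure.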

To inspect the raw data of the panel for further calculation, select **Inspect** from the dropdown list of a panel.
Queries, raw data, and panel JSON structure are available.
Read more at [Grafana panel inspection](http://grafana.com/docs/grafana/next/panels/query-a-data-source/).

All the dashboards are powered by [Grafana](https://grafana.com/), a frontend for displaying metrics.
Grafana consumes the data returned from queries to the backend Prometheus data source, then presents it
with visualizations. The stage group dashboards are built to serve the most common use cases with a
limited set of filters and pre-built queries. Grafana provides a way to explore and visualize the
metrics data with [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/). This requires
some knowledge of the [Prometheus PromQL query language](https://prometheus.io/docs/prometheus/latest/querying/basics/).
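
As a starting point for Grafana Explore, the following Python sketch assembles a PromQL expression for the request rate of one feature category. It reuses the metric and label names from the customization example later on this page. The helper name, the `gprd` environment value, and the `1m` window are assumptions made for illustration.

```python
def request_rate_query(feature_category: str, environment: str, window: str = "1m") -> str:
    """Build a PromQL expression for the total request rate of a feature category."""
    selector = ", ".join([
        f'env="{environment}"',
        f'feature_category="{feature_category}"',
    ])
    # Doubled braces in the f-string produce the literal braces of the label selector.
    return f"sum(rate(gitlab_transaction_duration_seconds_count{{{selector}}}[{window}]))"


# "gprd" is an assumed environment name; use the value from your dashboard's filter.
print(request_rate_query("source_code_management", "gprd"))
```

Paste the printed expression into the query field of Grafana Explore and adjust the labels as needed.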

## Example: Debugging with dashboards

Example debugging workflow:

1. A team member in the Code Review group merged an MR that was deployed to production.
1. To verify the deployment, you can check the
   [Code Review group's dashboard](https://dashboards.gitlab.net/d/stage-groups-code_review/stage-groups-group-dashboard-create-code-review?orgId=1).
1. The Sidekiq Error Rate panel shows an elevated error rate, specifically for `UpdateMergeRequestsWorker`.

   ![Debug 1](img/stage_group_dashboards_debug_1.png)

1. If you select **Kibana: Kibana Sidekiq failed request logs** in the **Extra links** section, you can filter for `UpdateMergeRequestsWorker` and read through the logs.

   ![Debug 2](img/stage_group_dashboards_debug_2.png)

1. In [Sentry](https://sentry.gitlab.net/gitlab/gitlabcom/), you can find the exception by filtering
   by transaction type and by the `correlation_id` from a Kibana result item.

   ![Debug 3](img/stage_group_dashboards_debug_3.png)

1. A precise exception, including a stack trace, job arguments, and other information should now appear.
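
The log filtering step of this workflow amounts to narrowing entries by worker class and collecting their correlation IDs. The sketch below shows that idea on hypothetical data. The field names (`class`, `job_status`, `correlation_id`) and the sample values are assumptions, and real Kibana documents contain many more fields.

```python
# Hypothetical log entries for illustration only.
sample_logs = [
    {"class": "UpdateMergeRequestsWorker", "job_status": "fail", "correlation_id": "abc123"},
    {"class": "UpdateMergeRequestsWorker", "job_status": "done", "correlation_id": "def456"},
    {"class": "RepositoryUpdateMirrorWorker", "job_status": "fail", "correlation_id": "ghi789"},
    {"class": "UpdateMergeRequestsWorker", "job_status": "fail", "correlation_id": "jkl012"},
]


def failed_correlation_ids(logs: list[dict], worker: str) -> list[str]:
    """Return the correlation IDs of failed jobs for one worker class."""
    return [
        entry["correlation_id"]
        for entry in logs
        if entry["class"] == worker and entry["job_status"] == "fail"
    ]


print(failed_correlation_ids(sample_logs, "UpdateMergeRequestsWorker"))
# ['abc123', 'jkl012']
```

Each ID collected this way can then be used as the `correlation_id` filter in Sentry.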

Happy debugging!

## Customizing the dashboard

All Grafana dashboards at GitLab are generated from the [Jsonnet files](https://github.com/grafana/grafonnet-lib)
stored in [the runbooks project](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards).
In particular, the stage group dashboard definitions are stored in
[`/dashboards/stage-groups`](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards/stage-groups).

By convention, each group has a corresponding Jsonnet file. The dashboards are synced with GitLab
[stage group data](https://gitlab.com/gitlab-com/www-gitlab-com/-/raw/master/data/stages.yml) every
month.

Expansion and customization are key principles we followed when designing this system.
To customize your group's dashboard, edit the corresponding file and follow the
[Runbook workflow](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards#dashboard-source).
The dashboard is updated after the MR is merged.

Looking at an autogenerated file, for example,
[`product_planning.dashboard.jsonnet`](https://gitlab.com/gitlab-com/runbooks/-/blob/master/dashboards/stage-groups/product_planning.dashboard.jsonnet):

```jsonnet
// This file is autogenerated using scripts/update_stage_groups_dashboards.rb
// Please feel free to customize this file.
local stageGroupDashboards = import './stage-group-dashboards.libsonnet';

stageGroupDashboards.dashboard('product_planning')
.stageGroupDashboardTrailer()
```
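
To make the "one file per group" convention concrete, the following Python sketch renders the same file content for a list of group keys. It is not the actual generator, which is the Ruby script named in the file header. The function names and the list of groups are assumptions for illustration.

```python
# Mirrors the autogenerated file shown above; {group} is the only placeholder.
TEMPLATE = """\
// This file is autogenerated using scripts/update_stage_groups_dashboards.rb
// Please feel free to customize this file.
local stageGroupDashboards = import './stage-group-dashboards.libsonnet';

stageGroupDashboards.dashboard('{group}')
.stageGroupDashboardTrailer()
"""


def dashboard_filename(group_key: str) -> str:
    """Return the file name used for a group's dashboard definition."""
    return f"{group_key}.dashboard.jsonnet"


def render_dashboard(group_key: str) -> str:
    """Render the default Jsonnet dashboard definition for one group."""
    return TEMPLATE.format(group=group_key)


# Illustrative group keys; the real list comes from the stage group data.
for group in ["product_planning", "source_code", "code_review"]:
    print(dashboard_filename(group))

print(render_dashboard("product_planning"))
```

Because every group starts from the same minimal template, any customization shows up as a small, reviewable diff against it.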

We provide basic customization to filter out the components essential to your group's activities.
By default, only the `web`, `api`, and `sidekiq` components are available in the dashboard, while
`git` is hidden. See [how to enable available components and optional graphs](#optional-graphs).

You can also append further information or custom metrics to a dashboard. The following example
adds some links and a total request rate to the top of the page:

```jsonnet
local stageGroupDashboards = import './stage-group-dashboards.libsonnet';
local grafana = import 'github.com/grafana/grafonnet-lib/grafonnet/grafana.libsonnet';
local basic = import 'grafana/basic.libsonnet';

stageGroupDashboards.dashboard('source_code')
.addPanel(
  grafana.text.new(
    title='Group information',
    mode='markdown',
    content=|||
      Useful links for the Source Code Management group dashboard:
      - [Issue list](https://gitlab.com/groups/gitlab-org/-/issues?scope=all&state=opened&label_name%5B%5D=repository)
      - [Epic list](https://gitlab.com/groups/gitlab-org/-/epics?label_name[]=repository)
    |||,
  ),
  gridPos={ x: 0, y: 0, w: 24, h: 4 }
)
.addPanel(
  basic.timeseries(
    title='Total Request Rate',
    yAxisLabel='Requests per Second',
    decimals=2,
    query=|||
      sum (
        rate(gitlab_transaction_duration_seconds_count{
          env='$environment',
          environment='$environment',
          feature_category=~'source_code_management',
        }[$__interval])
      )
    |||
  ),
  gridPos={ x: 0, y: 0, w: 24, h: 7 }
)
.stageGroupDashboardTrailer()
```

![Stage Group Dashboard Customization](img/stage_group_dashboards_time_customization.png)

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
If you want to see the workflow in action, we've recorded a pairing session on customizing a dashboard,
available on [GitLab Unfiltered](https://youtu.be/shEd_eiUjdI).

For deeper customization and more complicated metrics, visit the
[Grafonnet lib](https://github.com/grafana/grafonnet-lib) project and the
[GitLab Prometheus Metrics](../../../administration/monitoring/prometheus/gitlab_metrics.md#gitlab-prometheus-metrics)
documentation.

### Optional graphs

Some graphs aren't relevant for all groups, so they aren't added to
the dashboard by default. They can be added by customizing the
dashboard.

By default, only the `web`, `api`, and `sidekiq` metrics are
shown. If you wish to see the metrics from the `git` fleet (or any
other component that might be added in the future), you can configure it as follows:

```jsonnet
stageGroupDashboards
.dashboard('source_code', components=stageGroupDashboards.supportedComponents)
.stageGroupDashboardTrailer()
```

If your group is interested in Sidekiq job durations and their
thresholds, you can add these graphs by calling the `.addSidekiqJobDurationByUrgency` function:

```jsonnet
stageGroupDashboards
.dashboard('access')
.addSidekiqJobDurationByUrgency()
.stageGroupDashboardTrailer()
```