Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/stage_group_observability/index.md')
-rw-r--r--doc/development/stage_group_observability/index.md42
1 files changed, 39 insertions, 3 deletions
diff --git a/doc/development/stage_group_observability/index.md b/doc/development/stage_group_observability/index.md
index b275b0bfec2..ba17b4cc73a 100644
--- a/doc/development/stage_group_observability/index.md
+++ b/doc/development/stage_group_observability/index.md
@@ -68,11 +68,11 @@ component can have two indicators:
and
[Web](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/web.jsonnet#L154)
services, that threshold is **5 seconds** when not opted in to the
- [`rails_requests` SLI](../application_slis/rails_request_apdex.md).
+ [`rails_request` SLI](../application_slis/rails_request.md).
We've made this target configurable in [this project](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525).
- To learn how to customize the request Apdex, see
- [Rails request Apdex SLI](../application_slis/rails_request_apdex.md).
+ To customize the request Apdex, see
+ [Rails request SLIs](../application_slis/rails_request.md).
This new Apdex measurement is not part of the error budget until you
[opt in](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1451).
@@ -136,3 +136,39 @@ For example, see the `server` component of the `web-pages` service:
![web-pages-server-component SLI](img/stage_group_dashboards_service_sli_detail.png)
To add more SLIs tailored to specific features, you can use an [Application SLI](../application_slis/index.md).
+
+## Kibana dashboard for error budgets
+
+For a detailed analysis you can use [a specialized Kibana dashboard](https://log.gprd.gitlab.net/goto/771b5c10-c0ec-11ed-85ed-e7557b0a598c), like this:
+
+![Kibana dashboard](img/error_budgets_kibana_dashboard_v15_10.png)
+
+Description:
+
+- **Apdex requests over limit (graph)** - Displays only requests that exceeded their
+ target duration.
+- **Apdex operations over-limit duration (graph)** - Displays the distribution of duration
+ components (database, Redis, Gitaly, and Rails app).
+- **Apdex requests** (pie chart) - Displays the percentage of `2xx`, `3xx`, `4xx` and
+ `5xx` requests.
+- **Slow request component distribution** - Highlights the component responsible
+ for Apdex violation.
+- **Apdex operations over limit** (table) - Displays a number of operations over
+ limit for each endpoint.
+- **Apdex requests over limit** - Displays a list of individual requests responsible
+ for Apdex violation.
+
+### Use the dashboard
+
+1. Select the feature category you want to investigate.
+ 1. Scroll to the **Feature Category** section. Enter the feature name.
+ 1. Select **Apply changes**. Selected results contain only requests related to this feature category.
+1. Select the time frame for the investigation.
+1. Review dashboard and pay attention to the type of failures.
+
+Questions to answer:
+
+1. Does the failure pattern look like a spike? Or does it persist?
+1. Does the failure look related to a particular component? (database, Redis, ...)
+1. Does the failure affect a specific endpoint? Or is it system-wide?
+1. Does the failure appear caused by infrastructure incidents?