doc/development/testing_guide/flaky_tests.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326

---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Flaky tests

## What's a flaky test?

It's a test that sometimes fails, but if you retry it enough times, it passes,
eventually.

## What are the potential cause for a test to be flaky?

### State leak

**Label:** `flaky-test::state leak`

**Description:** Data state has leaked from a previous test. The actual cause is probably not the flaky test here.

**Difficulty to reproduce:** Moderate. Usually, running the same spec files until the one that's failing reproduces the problem.

**Resolution:** Fix the previous tests and/or places where the test data or environment is modified, so that
it's reset to a pristine test after each test.

**Examples:**

- [Example 1](https://gitlab.com/gitlab-org/gitlab/-/issues/402915): State leakage can result from
  data records created with `let_it_be` shared between test examples, while some test modifies the model
  either deliberately or unwillingly causing out-of-sync data in test examples. This can result in `PG::QueryCanceled: ERROR` in the subsequent test examples or retries.
  For more information about state leakages and resolution options, see [GitLab testing best practices](best_practices.md#lets-talk-about-let).
- [Example 2](https://gitlab.com/gitlab-org/gitlab/-/issues/378414#note_1142026988): A migration
  test might roll-back the database, perform its testing, and then roll-up the database in an
  inconsistent state, so that following tests might not know about certain columns.
- [Example 3](https://gitlab.com/gitlab-org/gitlab/-/issues/368500): A test modifies data that is
  used by a following test.
- [Example 4](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/103434#note_1172316521): A test for a database query passes in a fresh database, but in a
  CI/CD pipeline where the database is used to process previous test sequences, the test fails. This likely
  means that the query itself needs to be updated to work in a non-clean database.

### Dataset-specific

**Label:** `flaky-test::dataset-specific`

**Description:** The test assumes the dataset is in a particular (usually limited) state or order, which
might not be true depending on when the test run during the test suite.

**Difficulty to reproduce:** Moderate, as the amount of data needed to reproduce the issue might be
difficult to achieve locally. Ordering issues are easier to reproduce by repeatedly running the tests several times.

**Resolution:**

- Fix the test to not assume that the dataset is in a particular state, don't hardcode IDs.
- Loosen the assertion if the test shouldn't care about ordering but only on the elements.
- Fix the test by specifying a deterministic ordering.
- Fix the app code by specifying a deterministic ordering.

**Examples:**

- [Example 1](https://gitlab.com/gitlab-org/gitlab/-/issues/378381): The database is recreated when
  any table has more than 500 columns. It could pass in the merge request, but fail later in
  `master` if the order of tests changes.
- [Example 2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/91016/diffs): A test asserts
  that trying to find a record with an nonexistent ID returns an error message. The test uses an
  hardcoded ID that's supposed to not exist (e.g. `42`). If the test is run early in the test
  suite, it might pass as not enough records were created before it, but as soon as it would run
  later in the suite, there could be a record that actually has the ID `42`, hence the test would
  start to fail.
- [Example 3](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10148/diffs): Without
  specifying `ORDER BY`, database is not given deterministic ordering, or data race can happen
  in the tests.
- [Example 4](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/106936/diffs).

### Random input

**Label:** `flaky-test::random input`

**Description:** The test use random values, that sometimes match the expectations, and sometimes not.

**Difficulty to reproduce:** Easy, as the test can be modified locally to use the "random value"
used at the time the test failed

**Resolution:** Once the problem is reproduced, it should be easy to debug and fix either the test
or the app.

**Examples:**

- [Example 1](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/20121): The test isn't robust enough to handle a specific data, that only appears sporadically since the data input is random.

### Unreliable DOM Selector

**Label:** `flaky-test::unreliable dom selector`

**Description:** The DOM selector used in the test is unreliable.

**Difficulty to reproduce:** Moderate to difficult. Depending on whether the DOM selector is duplicated, or appears after a delay etc.
Adding a delay in API or controller could help reproducing the issue.

**Resolution:** It really depends on the problem here. It could be to wait for requests to finish, to scroll down the page etc.

**Examples:**

- [Example 1](https://gitlab.com/gitlab-org/gitlab/-/issues/338341): A non-unique CSS selector
  matching more than one element, or a non-waiting selector method that does not allow rendering
  time before throwing an `element not found` error.
- [Example 2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/101728/diffs): A CSS selector
  only appears after a GraphQL requests has finished, and the UI has updated.
- [Example 3](https://gitlab.com/gitlab-org/gitlab/-/issues/408215): A false-positive test, Capybara imediatly returns true after
page visit and page is not fully loaded, or if the element is not detectable by webdriver (such as being rendered outside the viewport or behind other elements).

### Datetime-sensitive

**Label:** `flaky-test::datetime-sensitive`

**Description:** The test is assuming a specific date or time.

**Difficulty to reproduce:** Easy to moderate, depending on whether the test consistently fails after a certain date, or only fails at a given time or date.

**Resolution:** Freezing the time is usually a good solution.

**Examples:**

- [Example 1](https://gitlab.com/gitlab-org/gitlab/-/issues/118612): A test that breaks after some time passed.
- [Example 2](https://gitlab.com/gitlab-org/gitlab/-/issues/403332): A test that breaks in the last day of the month.

### Unstable infrastructure

**Label:** `flaky-test::unstable infrastructure`

**Description:** The test fails from time to time due to infrastructure issues.

**Difficulty to reproduce:** Hard. It's really hard to reproduce CI infrastructure issues. It might
be possible by using containers locally.

**Resolution:** Starting a conversation with the Infrastructure department in a dedicated issue is
usually a good idea.

**Examples:**

- [Example 1](https://gitlab.com/gitlab-org/gitlab/-/issues/363214): The runner is under heavy load at this time.
- [Example 2](https://gitlab.com/gitlab-org/gitlab/-/issues/360559): The runner is having networking issues, making a job failing early

## Quarantined tests

When we have a flaky test in `master`:

1. Create [a ~"failure::flaky-test" issue](https://about.gitlab.com/handbook/engineering/workflow/#broken-master) with the relevant group label.
1. Quarantine the test after the first failure.
   If the test cannot be fixed in a timely fashion, there is an impact on the
   productivity of all the developers, so it should be quarantined.

### RSpec

#### Fast quarantine

To quickly quarantine a test without having to open a merge request and wait for pipelines,
you can follow [the fast quarantining process](https://gitlab.com/gitlab-org/quality/engineering-productivity/fast-quarantine/-/tree/main/#fast-quarantine-a-test).

#### Long-term quarantine

Once a test is fast-quarantined, you can proceed with the long-term quarantining process. This can be done by opening a merge request.

First, ensure the test file has a [`feature_category` metadata](../feature_categorization/index.md#rspec-examples), to ensure correct attribution of the test file.

Then, you can use the `quarantine: '<issue url>'` metadata with the URL of the
~"failure::flaky-test" issue you created previously.

```ruby
it 'succeeds', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
  expect(response).to have_gitlab_http_status(:ok)
end
```

This means it is skipped unless run with `--tag quarantine`:

```shell
bin/rspec --tag quarantine
```

After the long-term quarantining MR has reached production, you should revert the fast-quarantine MR you created earlier.

### Jest

For Jest specs, you can use the `.skip` method along with the `eslint-disable-next-line` comment to disable the `jest/no-disabled-tests` ESLint rule and include the issue URL. Here's an example:

```javascript
// https://gitlab.com/gitlab-org/gitlab/-/issues/56789
// eslint-disable-next-line jest/no-disabled-tests
it.skip('should throw an error', () => {
  expect(response).toThrowError(expected_error)
});
```

This means it is skipped unless the test suit is run with `--runInBand` Jest command line option:

```shell
jest --runInBand
```

For both test frameworks, make sure to add the `~"quarantined test"` label to the issue.

Once a test is in quarantine, there are 3 choices:

- Fix the test (that is, get rid of its flakiness).
- Move the test to a lower level of testing.
- Remove the test entirely (for example, because there's already a
  lower-level test, or it's duplicating another same-level test, or it's testing
  too much etc.).

## Automatic retries and flaky tests detection

On our CI, we use [RSpec::Retry](https://github.com/NoRedInk/rspec-retry) to automatically retry a failing example a few
times (see [`spec/spec_helper.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/spec_helper.rb) for the precise retries count).

We also use a custom [`RspecFlaky::Listener`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/tooling/rspec_flaky/listener.rb).
This listener runs in the `update-tests-metadata` job in `maintenance` scheduled pipelines
on the `master` branch, and saves flaky examples to `rspec/flaky/report-suite.json`.
The report file is then retrieved by the `retrieve-tests-metadata` job in all pipelines.

This was originally implemented in: <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/13021>.

If you want to enable retries locally, you can use the `RETRIES` environment variable.
For instance `RETRIES=1 bin/rspec ...` would retry the failing examples once.

To generate the reports locally, use the `FLAKY_RSPEC_GENERATE_REPORT` environment variable.
For example, `FLAKY_RSPEC_GENERATE_REPORT=1 bin/rspec ...`.

### Usage of the `rspec/flaky/report-suite.json` report

The `rspec/flaky/report-suite.json` report is
[imported into Snowflake](https://gitlab.com/gitlab-data/analytics/-/blob/7085bea51bb2f8f823e073393934ba5f97259459/extract/gitlab_flaky_tests/upload.py#L19)
once per day, for monitoring with the
[internal dashboard](https://app.periscopedata.com/app/gitlab/888968/EP---Flaky-tests).

## Problems we had in the past at GitLab

- [`rspec-retry` is biting us when some API specs fail](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/29242): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9825>
- [Sporadic RSpec failures due to `PG::UniqueViolation`](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28307#note_24958837): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9846>
  - Follow-up: <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10688>
  - [Capybara.reset_session! should be called before requests are blocked](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/33779): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/12224>
- ffaker generates funky data that tests are not ready to handle (and tests should be predictable so that's bad!):
  - [Make `spec/mailers/notify_spec.rb` more robust](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/20121): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10015>
  - [Transient failure in `spec/requests/api/commits_spec.rb`](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/27988#note_25342521): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9944>
  - [Replace ffaker factory data with sequences](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/29643): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10184>
  - [Transient failure in spec/finders/issues_finder_spec.rb](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/30211#note_26707685): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10404>

### Order-dependent flaky tests

These flaky tests can fail depending on the order they run with other tests. For example:

- <https://gitlab.com/gitlab-org/gitlab/-/issues/327668>

To identify the tests that lead to such failure, we can use `scripts/rspec_bisect_flaky`,
which would give us the minimal test combination to reproduce the failure:

1. First obtain the list of specs that ran before the flaky test. You can search
   for the list under `Knapsack node specs:` in the CI job output log.
1. Save the list of specs as a file, and run:

   ```shell
   cat knapsack_specs.txt | xargs scripts/rspec_bisect_flaky
   ```

If there is an order-dependency issue, the script above will print the minimal
reproduction.

### Time-sensitive flaky tests

- <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10046>
- <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10306>

### Array order expectation

- <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10148>

### Feature tests

- [Be sure to create all the data the test need before starting exercise](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/32622#note_31128195): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/12059>
- [Bis](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/34609#note_34048715): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/12604>
- [Bis](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/34698#note_34276286): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/12664>
- [Assert against the underlying database state instead of against a page's content](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/31437): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10934>
- In JS tests, shifting elements can cause Capybara to mis-click when the element moves at the exact time Capybara sends the click
  - [Dropdowns rendering upward or downward due to window size and scroll position](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/17660)
  - [Lazy loaded images can cause Capybara to mis-click](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/18713)
- [Triggering JS events before the event handlers are set up](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/18742)
- [Wait for the image to be lazy-loaded when asserting on a Markdown image's `src` attribute](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25408)
- [Avoid asserting against flash notice banners](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/79432)

#### Capybara viewport size related issues

- [Transient failure of spec/features/issues/filtered_search/filter_issues_spec.rb](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/29241#note_26743936): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10411>

#### Capybara JS driver related issues

- [Don't wait for AJAX when no AJAX request is fired](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/30461): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10454>
- [Bis](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/34647): <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/12626>

#### Capybara expectation times out

- [Test imports a project (via Sidekiq) that is growing over time, leading to timeouts when the import takes longer than 60 seconds](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/22599)

### Hanging specs

If a spec hangs, it might be caused by a [bug in Rails](https://github.com/rails/rails/issues/45994):

- <https://gitlab.com/gitlab-org/gitlab/-/merge_requests/81112>
- <https://gitlab.com/gitlab-org/gitlab/-/issues/337039>

## Suggestions

### Split the test file

It could help to split the large RSpec files in multiple files in order to narrow down the context and identify the problematic tests.

## Resources

- [Flaky Tests: Are You Sure You Want to Rerun Them?](https://semaphoreci.com/blog/2017/04/20/flaky-tests.html)
- [How to Deal With and Eliminate Flaky Tests](https://semaphoreci.com/community/tutorials/how-to-deal-with-and-eliminate-flaky-tests)
- [Tips on Treating Flakiness in your Rails Test Suite](https://semaphoreci.com/blog/2017/08/03/tips-on-treating-flakiness-in-your-test-suite.html)
- ['Flaky' tests: a short story](https://www.ombulabs.com/blog/rspec/continuous-integration/how-to-track-down-a-flaky-test.html)
- [Using Insights to Discover Flaky, Slow, and Failed Tests](https://circleci.com/blog/using-insights-to-discover-flaky-slow-and-failed-tests/)

---

[Return to Testing documentation](index.md)