Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/filtering_by_label.md')
-rw-r--r--doc/development/filtering_by_label.md182
1 files changed, 7 insertions, 175 deletions
diff --git a/doc/development/filtering_by_label.md b/doc/development/filtering_by_label.md
index 9e759744a1a..675fe004c22 100644
--- a/doc/development/filtering_by_label.md
+++ b/doc/development/filtering_by_label.md
@@ -1,179 +1,11 @@
---
-stage: Plan
-group: Project Management
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/filtering_by_label.md'
+remove_date: '2022-11-05'
---
-# Filtering by label
-## Introduction
+This document was moved to [another location](database/filtering_by_label.md).
-GitLab has [labels](../user/project/labels.md) that can be assigned to issues,
-merge requests, and epics. Labels on those objects are a many-to-many relation
-through the polymorphic `label_links` table.
-
-To filter these objects by multiple labels - for instance, 'all open
-issues with the label ~Plan and the label ~backend' - we generate a
-query containing a `GROUP BY` clause. In a simple form, this looks like:
-
-```sql
-SELECT
- issues.*
-FROM
- issues
- INNER JOIN label_links ON label_links.target_id = issues.id
- AND label_links.target_type = 'Issue'
- INNER JOIN labels ON labels.id = label_links.label_id
-WHERE
- issues.project_id = 13083
- AND (issues.state IN ('opened'))
- AND labels.title IN ('Plan',
- 'backend')
-GROUP BY
- issues.id
-HAVING (COUNT(DISTINCT labels.title) = 2)
-ORDER BY
- issues.updated_at DESC,
- issues.id DESC
-LIMIT 20 OFFSET 0
-```
-
-In particular, note that:
-
-1. We `GROUP BY issues.id` so that we can ...
-1. Use the `HAVING (COUNT(DISTINCT labels.title) = 2)` condition to ensure that
- all matched issues have both labels.
-
-This is more complicated than is ideal. It makes the query construction more
-prone to errors (such as
-[issue #15557](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/15557)).
-
-## Attempt A: `WHERE EXISTS`
-
-### Attempt A1: use multiple subqueries with `WHERE EXISTS`
-
-In [issue #37137](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/37137)
-and its associated [merge request](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/14022),
-we tried to replace the `GROUP BY` with multiple uses of `WHERE EXISTS`. For the
-example above, this would give:
-
-```sql
-WHERE (EXISTS (
- SELECT
- TRUE
- FROM
- label_links
- INNER JOIN labels ON labels.id = label_links.label_id
- WHERE
- labels.title = 'Plan'
- AND target_type = 'Issue'
- AND target_id = issues.id))
-AND (EXISTS (
- SELECT
- TRUE
- FROM
- label_links
- INNER JOIN labels ON labels.id = label_links.label_id
- WHERE
- labels.title = 'backend'
- AND target_type = 'Issue'
- AND target_id = issues.id))
-```
-
-While this worked without schema changes, and did improve readability somewhat,
-it did not improve query performance.
-
-### Attempt A2: use label IDs in the `WHERE EXISTS` clause
-
-In [merge request #34503](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34503), we followed a similar approach to A1. But this time, we
-did a separate query to fetch the IDs of the labels used in the filter so that we avoid the `JOIN` in the `EXISTS` clause and filter directly by
-`label_links.label_id`. We also added a new index on `label_links` for the `target_id`, `label_id`, and `target_type` columns to speed up this query.
-
-Finding the label IDs wasn't straightforward because there could be multiple labels with the same title within a single root namespace. We solved
-this by grouping the label IDs by title and then using the array of IDs in the `EXISTS` clauses.
-
-This resulted in a significant performance improvement. However, this optimization could not be applied to the dashboard pages
-where we do not have a project or group context. We could not easily search for the label IDs here because that would mean searching across all
-projects and groups that the user has access to.
-
-## Attempt B: Denormalize using an array column
-
-Having [removed MySQL support in GitLab 12.1](https://about.gitlab.com/blog/2019/06/27/removing-mysql-support/),
-using [PostgreSQL's arrays](https://www.postgresql.org/docs/11/arrays.html) became more
-tractable as we didn't have to support two databases. We discussed denormalizing
-the `label_links` table for querying in
-[issue #49651](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/49651),
-with two options: label IDs and titles.
-
-We can think of both of those as array columns on `issues`, `merge_requests`,
-and `epics`: `issues.label_ids` would be an array column of label IDs, and
-`issues.label_titles` would be an array of label titles.
-
-These array columns can be complemented with [GIN
-indexes](https://www.postgresql.org/docs/11/gin-intro.html) to improve
-matching.
-
-### Attempt B1: store label IDs for each object
-
-This has some strong advantages over titles:
-
-1. Unless a label is deleted, or a project is moved, we never need to
- bulk-update the denormalized column.
-1. It uses less storage than the titles.
-
-Unfortunately, our application design makes this hard. If we were able to query
-just by label ID easily, we wouldn't need the `INNER JOIN labels` in the initial
-query at the start of this document. GitLab allows users to filter by label
-title across projects and even across groups, so a filter by the label ~Plan may
-include labels with multiple distinct IDs.
-
-We do not want users to have to know about the different IDs, which means that
-given this data set:
-
-| Project | ~Plan label ID | ~backend label ID |
-| ------- | -------------- | ----------------- |
-| A | 11 | 12 |
-| B | 21 | 22 |
-| C | 31 | 32 |
-
-We would need something like:
-
-```sql
-WHERE
- label_ids @> ARRAY[11, 12]
- OR label_ids @> ARRAY[21, 22]
- OR label_ids @> ARRAY[31, 32]
-```
-
-This can get even more complicated when we consider that in some cases, there
-might be two ~backend labels - with different IDs - that could apply to the same
-object, so the number of combinations would balloon further.
-
-### Attempt B2: store label titles for each object
-
-From the perspective of updating the object, this is the worst
-option. We have to bulk update the objects when:
-
-1. The objects are moved from one project to another.
-1. The project is moved from one group to another.
-1. The label is renamed.
-1. The label is deleted.
-
-It also uses much more storage. Querying is simple, though:
-
-```sql
-WHERE
- label_titles @> ARRAY['Plan', 'backend']
-```
-
-And our [tests in issue #49651](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/49651#note_188777346)
-showed that this could be fast.
-
-However, at present, the disadvantages outweigh the advantages.
-
-## Conclusion
-
-We found a method A2 that does not need denormalization and improves the query performance significantly. This
-did not apply to all cases, but we were able to apply method A1 to the rest of the cases so that we remove the
-`GROUP BY` and `HAVING` clauses in all scenarios.
-
-This simplified the query and improved the performance in the most common cases.
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->