gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/architecture/blueprints/ci_data_decay')
-rw-r--r-- doc/architecture/blueprints/ci_data_decay/index.md 8
-rw-r--r-- doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md 16
2 files changed, 17 insertions, 7 deletions
diff --git a/doc/architecture/blueprints/ci_data_decay/index.md b/doc/architecture/blueprints/ci_data_decay/index.md
index 8808a526df0..7c0bdf299db 100644
--- a/doc/architecture/blueprints/ci_data_decay/index.md
+++ b/doc/architecture/blueprints/ci_data_decay/index.md
@@ -48,8 +48,8 @@ PostgreSQL database running on GitLab.com.
This volume contributes to significant performance problems, development
challenges and is often related to production incidents.
-We also expect a [significant growth in the number of builds executed on
-GitLab.com](../ci_scale/index.md) in the upcoming years.
+We also expect a [significant growth in the number of builds executed on GitLab.com](../ci_scale/index.md)
+in the upcoming years.
## Opportunity
@@ -61,8 +61,8 @@ pipelines that are older than a few months might help us to move this data out
of the primary database, to a different storage, that is more performant and
cost effective.
-It is already possible to prevent processing builds [that have been
-archived](../../../user/admin_area/settings/continuous_integration.md#archive-jobs).
+It is already possible to prevent processing builds
+[that have been archived](../../../user/admin_area/settings/continuous_integration.md#archive-jobs).
When a build gets archived it will not be possible to retry it, but we still do
keep all the processing metadata in the database, and it consumes resources
that are scarce in the primary database.
diff --git a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
index 60b20c50696..868dae4fc6c 100644
--- a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
+++ b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
@@ -306,9 +306,19 @@ We also need to build a proof of concept for removing data on the PostgreSQL
side (using foreign keys with `ON DELETE CASCADE`) and removing data through
Rails associations, as this might be an important area of uncertainty.
-We need to [better understand](https://gitlab.com/gitlab-org/gitlab/-/issues/360148)
-how unique constraints we are currently using will perform when using the
-partitioned schema.
+We [learned](https://gitlab.com/gitlab-org/gitlab/-/issues/360148) that PostgreSQL
+does not allow creating a single index (unique or otherwise) across all partitions of a table.
+
+One way to solve this problem is to embed the partitioning key inside the uniqueness constraint.
+
+This might mean prepending the partition ID in a hexadecimal format before the token itself and storing
+the concatenated string in the database. To do that we would need to reserve an appropriate number of
+leading bytes in a token to accommodate the maximum number of partitions we may have in the future.
+It seems that reserving four characters, which would translate into a 16-bit number in base-16,
+might be sufficient. The maximum number we can encode this way would be FFFF, which is 65535 in decimal.
+
+This would provide a unique constraint per-partition which
+is sufficient for global uniqueness.
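
Note (not part of this commit): the prefixing scheme described in the added lines above can be illustrated with a minimal Python sketch. The function names `prefix_token` and `split_token`, the constants, and the choice of uppercase hex digits are assumptions made for illustration only; the blueprint does not specify an implementation, and GitLab's actual code may differ.

```python
import string
from typing import Tuple

# Assumption: four hexadecimal characters are reserved for the partition ID,
# which gives a 16-bit range (0x0000 to 0xFFFF, that is 0 to 65535).
PARTITION_PREFIX_LENGTH = 4
MAX_PARTITION_ID = 0xFFFF


def prefix_token(partition_id: int, token: str) -> str:
    """Return the string to store: zero-padded hex partition ID, then the token."""
    if not 0 <= partition_id <= MAX_PARTITION_ID:
        raise ValueError(f"partition_id must be in 0..{MAX_PARTITION_ID}")
    return f"{partition_id:0{PARTITION_PREFIX_LENGTH}X}{token}"


def split_token(stored: str) -> Tuple[int, str]:
    """Recover (partition_id, token) from a stored, prefixed string."""
    prefix = stored[:PARTITION_PREFIX_LENGTH]
    if len(prefix) < PARTITION_PREFIX_LENGTH or any(
        c not in string.hexdigits for c in prefix
    ):
        raise ValueError("stored value does not start with a hex partition prefix")
    return int(prefix, 16), stored[PARTITION_PREFIX_LENGTH:]


# The same raw token in two different partitions yields two different stored
# values, so a uniqueness check inside each partition is enough overall.
stored_a = prefix_token(1, "glrt-example")
stored_b = prefix_token(2, "glrt-example")
```

Because the partition ID is part of the stored value, two rows can only collide if they are in the same partition, where the per-partition unique index already rejects duplicates.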
We have also designed a query analyzer that makes it possible to detect direct
usage of zero partitions, legacy tables that have been attached as first