gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/architecture/blueprints')
-rw-r--r--  doc/architecture/blueprints/ci_data_decay/decomposition_partitioning_comparison.png  bin 0 -> 33212 bytes
-rw-r--r--  doc/architecture/blueprints/ci_data_decay/index.md  22
-rw-r--r--  doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md  478
-rw-r--r--  doc/architecture/blueprints/ci_scale/index.md  6
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/index.md  2
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/read_mostly.md  4
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/time_decay.md  14
-rw-r--r--  doc/architecture/blueprints/database_scaling/size-limits.md  19
-rw-r--r--  doc/architecture/blueprints/database_testing/index.md  5
-rw-r--r--  doc/architecture/blueprints/runner_scaling/index.md  16
10 files changed, 533 insertions(+), 33 deletions(-)
diff --git a/doc/architecture/blueprints/ci_data_decay/decomposition_partitioning_comparison.png b/doc/architecture/blueprints/ci_data_decay/decomposition_partitioning_comparison.png
new file mode 100644
index 00000000000..47a53ad9be5
--- /dev/null
+++ b/doc/architecture/blueprints/ci_data_decay/decomposition_partitioning_comparison.png
Binary files differ
diff --git a/doc/architecture/blueprints/ci_data_decay/index.md b/doc/architecture/blueprints/ci_data_decay/index.md
index cbd9f5dea7a..8808a526df0 100644
--- a/doc/architecture/blueprints/ci_data_decay/index.md
+++ b/doc/architecture/blueprints/ci_data_decay/index.md
@@ -135,6 +135,9 @@ builds archival, to make it consistent and reliable.
Epic: [Partition CI/CD pipelines database tables](https://gitlab.com/groups/gitlab-org/-/epics/5417).
+For more technical details about this topic, see
+[pipeline data partitioning design](pipeline_partitioning.md).
+
### Partition CI/CD builds queuing database tables
While working on the [CI/CD Scale](../ci_scale/index.md) blueprint, we have
@@ -159,6 +162,9 @@ business rule is present in the product since the inception of GitLab CI.
Epic: [Partition CI/CD builds queuing database tables](https://gitlab.com/groups/gitlab-org/-/epics/7438).
+For more technical details about this topic, see
+[pipeline data partitioning design](pipeline_partitioning.md).
+
## Principles
All the three tracks we will use to implement CI/CD time decay pattern are
@@ -224,6 +230,22 @@ All three tracks can be worked on in parallel:
In progress.
+## Timeline
+
+- 2021-01-21: Parent [CI Scaling](../ci_scale/) blueprint [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/52203) created.
+- 2021-04-26: CI Scaling blueprint approved and merged.
+- 2021-09-10: CI/CD data time decay blueprint discussions started.
+- 2022-01-07: CI/CD data time decay blueprint [merged](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70052).
+- 2022-02-01: Blueprint [updated](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/79110) with new content and links to epics.
+- 2022-02-08: Pipeline partitioning PoC [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/80186) started.
+- 2022-02-23: Pipeline partitioning PoC [successful](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/80186#note_852704724).
+- 2022-03-07: A way to attach an existing table as a partition [found and proven](https://gitlab.com/gitlab-org/gitlab/-/issues/353380#note_865237214).
+- 2022-03-23: Pipeline partitioning design [Google Doc](https://docs.google.com/document/d/1ARdoTZDy4qLGf6Z1GIHh83-stG_ZLpqsibjKr_OXMgc) started.
+- 2022-03-29: Pipeline partitioning PoC [concluded](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/80186#note_892674358).
+- 2022-04-15: Partitioned pipeline data associations PoC [shipped](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/84071).
+- 2022-04-30: Additional [benchmarking started](https://gitlab.com/gitlab-org/gitlab/-/issues/361019) to evaluate impact.
+- 2022-06-31: [Pipeline partitioning design](pipeline_partitioning.md) document [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/87683) merged.
+
## Who
Proposal:
diff --git a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
new file mode 100644
index 00000000000..60b20c50696
--- /dev/null
+++ b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
@@ -0,0 +1,478 @@
+---
+stage: none
+group: unassigned
+comments: false
+description: 'Pipeline data partitioning design'
+---
+
+# Pipeline data partitioning design
+
+_Disclaimer: The following contains information related to upcoming products,
+features, and functionality._
+
+_It is important to note that the information presented is for informational
+purposes only. Please do not rely on this information for purchasing or
+planning purposes._
+
+_As with all projects, the items mentioned in this document and linked pages
+are subject to change or delay. The development, release and timing of any
+products, features, or functionality remain at the sole discretion of GitLab
+Inc._
+
+## What problem are we trying to solve?
+
+We want to partition the CI/CD dataset because some of the database tables are
+extremely large, which might make it challenging to scale single-node reads,
+even after we ship the CI/CD database decomposition.
+
+We want to reduce the risk of database performance degradation by transforming
+a few of the largest database tables into smaller ones using PostgreSQL
+declarative partitioning.
+
+See more details about this effort in [the parent blueprint](index.md).
+
+![pipeline data time decay](pipeline_data_time_decay.png)
+
+## How are CI/CD data decomposition, partitioning, and time-decay related?
+
+CI/CD decomposition is an extraction of a CI/CD database cluster out of the
+"main" database cluster, to make it possible to have a different primary
+database receiving writes. The main benefit is doubling the capacity for writes
+and data storage. The new database cluster will not have to serve reads /
+writes for non-CI/CD database tables, so this offers some additional capacity
+for reads too.
+
+CI/CD partitioning is dividing large CI/CD database tables into smaller ones.
+This will improve read capacity on every CI/CD database node, because it is
+much less expensive to read data from small tables than from large,
+multi-terabyte tables. We can add more CI/CD database replicas to better
+handle the increase in the number of SQL queries that are reading data, but we
+need partitioning to perform a single read more efficiently. Performance in
+other aspects will improve too, because PostgreSQL is more efficient at
+maintaining multiple small tables than one very large database table.
+
+CI/CD time-decay allows us to benefit from the strong time-decay
+characteristics of pipeline data. It can be implemented in many different ways,
+but using partitioning to implement time-decay might be especially beneficial.
+When implementing time decay we usually mark data as archived, and migrate it
+out of the database to a different place when the data is no longer relevant or
+needed. Our dataset is extremely large (tens of terabytes), so moving such a
+high volume of data is challenging. When time-decay is implemented using
+partitioning, we can archive the entire partition (or set of partitions) by
+simply updating a single record in one of our database tables. It is one of the
+least expensive ways to implement time-decay patterns at a database level.
+
+![Decomposition and partitioning comparison](decomposition_partitioning_comparison.png)
+
+## Why do we need to partition CI/CD data?
+
+We need to partition CI/CD data because our database tables storing pipelines,
+builds, and artifacts are too large. The `ci_builds` database table size is
+currently around 2.5 TB with an index of around 1.4 TB. This is too much and
+violates our [principle of 100 GB max size](../database_scaling/size-limits.md).
+We also want to [build alerting](https://gitlab.com/gitlab-com/gl-infra/tamland/-/issues/5)
+to notify us when this number is exceeded.
+
+We've seen numerous S1 and S2 database-related production environment
+incidents over the last couple of months, for example:
+
+- S1: 2022-03-17 [Increase in writes in `ci_builds` table](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6625)
+- S1: 2021-11-22 [Excessive buffer read in replicas for `ci_job_artifacts`](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5952)
+- S2: 2022-04-12 [Transactions detected that have been running for more than 10m](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6821)
+- S2: 2022-04-06 [Database contention plausibly caused by excessive `ci_builds` reads](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6773)
+- S2: 2022-03-18 [Unable to remove a foreign key on `ci_builds`](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6642)
+
+We have approximately 50 `ci_*` prefixed database tables, and some of them
+would benefit from partitioning.
+
+A simple SQL query to get this data:
+
+```sql
+WITH tables AS (SELECT table_name FROM information_schema.tables WHERE table_name LIKE 'ci_%')
+ SELECT table_name,
+ pg_size_pretty(pg_total_relation_size(quote_ident(table_name))) AS total_size,
+ pg_size_pretty(pg_relation_size(quote_ident(table_name))) AS table_size,
+ pg_size_pretty(pg_indexes_size(quote_ident(table_name))) AS index_size,
+ pg_total_relation_size(quote_ident(table_name)) AS total_size_bytes
+ FROM tables ORDER BY total_size_bytes DESC;
+```
+
+See data from March 2022:
+
+| Table name | Total size | Index size |
+|-------------------------|------------|------------|
+| `ci_builds` | 3.5 TB | 1 TB |
+| `ci_builds_metadata` | 1.8 TB | 150 GB |
+| `ci_job_artifacts` | 600 GB | 300 GB |
+| `ci_pipelines` | 400 GB | 300 GB |
+| `ci_stages` | 200 GB | 120 GB |
+| `ci_pipeline_variables` | 100 GB | 20 GB |
+| (...around 40 more) | | |
+
+The table above makes it clear that several of these tables store hundreds of
+gigabytes, or even terabytes, of data.
+
+While we have almost 50 CI/CD-related database tables, we are initially
+interested in partitioning only 6 of them. We can start by partitioning the
+most important tables in an iterative way, but we should also have a strategy
+for partitioning the remaining ones if needed. This document is an attempt to
+capture that strategy and describe as many details as possible, to share this
+knowledge among engineering teams.
+
+## How do we want to partition CI/CD data?
+
+We want to partition the CI/CD tables in iterations. It might not be feasible
+to partition all 6 initial tables at once, so an iterative strategy
+might be necessary. We also want to have a strategy for partitioning the
+remaining database tables when it becomes necessary.
+
+It is also important to avoid large data migrations. We store almost 6
+terabytes of data in the biggest CI/CD tables, in many different columns and
+indexes. Migrating this amount of data might be challenging and could cause
+instability in the production environment. Because of this concern, we've
+developed a way to attach an existing database table as partition zero without
+downtime and excessive database locking, as demonstrated in one of the
+[first proofs of concept](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/80186).
+This makes it possible to create a partitioned schema without downtime (for
+example using a routing table `p_ci_pipelines`), by attaching an existing
+`ci_pipelines` table as partition zero without exclusive locking. It will be
+possible to use the legacy table as usual, but we can create the next partition
+when needed, and the `p_ci_pipelines` table will be used for routing queries.
+To use the routing table we need to find a good partitioning key.
+
+Our plan is to use logical partition IDs. We want to start with the
+`ci_pipelines` table and create a `partition_id` column with a `DEFAULT` value
+of `100` or `1000`. Using a `DEFAULT` value avoids the challenge of backfilling
+this value for every row. Adding a `CHECK` constraint prior to attaching the
+first partition tells PostgreSQL that we've already ensured consistency and
+there is no need to check it while holding an exclusive table lock when
+attaching this table as a partition to the routing table (partitioned schema
+definition). We will increment this value every time we create a new partition
+for `p_ci_pipelines`, and the partitioning strategy will be `LIST`
+partitioning.
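+
+A minimal SQL sketch of this approach, assuming simplified table definitions
+and an initial `partition_id` value of `100` (the constraint name is
+illustrative, not the actual migration code):
+
+```sql
+-- Add the partitioning key; a DEFAULT avoids backfilling existing rows.
+ALTER TABLE ci_pipelines
+  ADD COLUMN partition_id bigint NOT NULL DEFAULT 100;
+
+-- Create the routing table, partitioned by the logical partition ID.
+-- Indexes and constraints are omitted from this sketch for brevity.
+CREATE TABLE p_ci_pipelines (
+  LIKE ci_pipelines INCLUDING DEFAULTS
+) PARTITION BY LIST (partition_id);
+
+-- Prove consistency up front, without holding an exclusive lock.
+ALTER TABLE ci_pipelines
+  ADD CONSTRAINT partitioning_check CHECK (partition_id = 100) NOT VALID;
+ALTER TABLE ci_pipelines VALIDATE CONSTRAINT partitioning_check;
+
+-- Attach the legacy table as partition zero. PostgreSQL skips the
+-- validating scan, because the CHECK constraint already guarantees it.
+ALTER TABLE p_ci_pipelines
+  ATTACH PARTITION ci_pipelines FOR VALUES IN (100);
+```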
+
+We will also create a `partition_id` column in the other initial 6 database
+tables we want to iteratively partition. After a new pipeline is created, it
+will get a `partition_id` assigned, and all the related resources, like builds
+and artifacts, will share the same value. We want to add the `partition_id`
+column into all 6 problematic tables because we can avoid backfilling this data
+when we decide it is time to start partitioning them.
+
+We want to partition CI/CD data iteratively, so we will start with the
+pipelines table, and create at least one, but likely two, partitions. The
+pipelines table will be partitioned using the `LIST` partitioning strategy. It
+is possible that, after some time, `p_ci_pipelines` will store data in two
+partitions with IDs of `100` and `101`. Then we will try partitioning
+`ci_builds`. In that case we might want to use `RANGE` partitioning in
+`p_ci_builds` for IDs `100` and `101`, because builds for the two logical
+partitions already in use can then still be stored in a single physical table.
+
+Physical partitioning and logical partitioning will be separated, and a
+strategy will be determined when we implement partitioning for the respective
+database tables. Using `RANGE` partitioning works similarly to using `LIST`
+partitioning in database tables other than `ci_pipelines`, but because we can
+guarantee continuity of `partition_id` values, using `RANGE` partitioning might
+be a better strategy.
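+
+For example, a sketch (with assumed table names) of how `RANGE` partitioning
+could keep builds for both logical partitions in one physical table:
+
+```sql
+-- Indexes and constraints are again omitted for brevity.
+CREATE TABLE p_ci_builds (
+  LIKE ci_builds INCLUDING DEFAULTS
+) PARTITION BY RANGE (partition_id);
+
+-- The upper bound is exclusive, so this covers partition IDs 100 and 101.
+ALTER TABLE p_ci_builds
+  ATTACH PARTITION ci_builds FOR VALUES FROM (100) TO (102);
+```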
+
+## Why do we want to use explicit logical partition ids?
+
+Partitioning CI/CD data using a logical `partition_id` has several benefits. We
+could partition by a primary key, but this would introduce much more complexity
+and additional cognitive load required to understand how the data is being
+structured and stored in partitions.
+
+CI/CD data is hierarchical data. Stages belong to pipelines, builds belong to
+stages, artifacts belong to builds (with rare exceptions). We are designing a
+partitioning strategy that reflects this hierarchy, to reduce the complexity
+and therefore cognitive load for contributors. With an explicit `partition_id`
+associated with a pipeline, we can cascade the partition ID number when trying
+to retrieve all resources associated with a pipeline. We know that for a
+pipeline `12345` with a `partition_id` of `102`, we are always able to find
+associated resources in logical partitions with number `102` in other routing
+tables, and PostgreSQL will know in which partitions these records are being
+stored for every table.
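+
+A sketch of such cascading reads (table and column names are assumptions):
+
+```sql
+-- Both queries prune down to the partitions holding logical partition 102.
+SELECT * FROM p_ci_stages WHERE pipeline_id = 12345 AND partition_id = 102;
+SELECT * FROM p_ci_builds WHERE pipeline_id = 12345 AND partition_id = 102;
+```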
+
+Another interesting benefit of using a single, incrementing `partition_id`
+number associated with pipelines is that, in theory, we can cache the latest
+value in Redis or in memory to avoid excessive reads from the database to
+find this number, though we might not need to do this.
+
+The single and uniform `partition_id` value for pipeline data gives us more
+choices later on than primary-keys-based partitioning.
+
+## Splitting large partitions into smaller ones
+
+We want to start with the initial `partition_id` number `100` (or higher, like
+`1000`, depending on our calculations and estimations). We do not want to start
+from 1, because the existing tables are already large, and we might want to
+split them into smaller partitions. If we start with `100`, we will be able to
+create partitions for `partition_id` of `1`, `20`, `45`, and move existing
+records there by updating `partition_id` from `100` to a smaller number.
+
+PostgreSQL will move these records into their respective partitions in a
+consistent way, provided that we do it in a transaction for all pipeline
+resources at the same time. If we ever decide to split large partitions into
+smaller ones (it's not yet clear if we will need to do this), we might be able
+to just use background migrations to update partition IDs, and PostgreSQL is
+smart enough to move rows between partitions on its own.
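+
+A sketch of what such a split could look like, assuming a new logical
+partition `1` and an illustrative pipeline ID range (matching partitions would
+need to exist in the other routing tables too):
+
+```sql
+-- Create the target partition first.
+CREATE TABLE ci_pipelines_p1
+  PARTITION OF p_ci_pipelines FOR VALUES IN (1);
+
+-- Updating the partitioning key moves rows between partitions; a single
+-- transaction keeps all pipeline resources consistent.
+BEGIN;
+UPDATE p_ci_pipelines SET partition_id = 1 WHERE id <= 1000000;
+UPDATE p_ci_builds SET partition_id = 1 WHERE pipeline_id <= 1000000;
+COMMIT;
+```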
+
+## Storing partitions metadata in the database
+
+In order to build an efficient mechanism responsible for creating new
+partitions, and to implement time decay, we want to introduce a partitioning
+metadata table called `ci_partitions`. In that table we would store metadata
+about all the logical partitions, with many pipelines per partition. We may
+need to store a range of pipeline IDs per logical partition. Using it we will
+be able to find the `partition_id` number for a given pipeline ID, and we will
+also find information about which logical partitions are "active" or
+"archived", which will help us to implement a time-decay pattern using
+database declarative partitioning.
+
+The `ci_partitions` table will store information about a partition identifier,
+the range of pipeline IDs it is valid for, and whether the partition has been
+archived or not. Additional columns with timestamps may be helpful too.
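+
+A hypothetical shape of this table (all column names are assumptions, not a
+final schema):
+
+```sql
+CREATE TABLE ci_partitions (
+  id bigint PRIMARY KEY,            -- the logical partition_id
+  pipeline_id_from bigint NOT NULL, -- first pipeline ID stored here
+  pipeline_id_to bigint,            -- last pipeline ID, NULL while open
+  active boolean NOT NULL DEFAULT true,
+  created_at timestamptz NOT NULL DEFAULT now(),
+  archived_at timestamptz
+);
+```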
+
+## Implementing a time-decay pattern using partitioning
+
+We can use `ci_partitions` to implement a time-decay pattern using declarative
+partitioning. By telling PostgreSQL which logical partitions are archived we
+can stop reading from these partitions using a SQL query like the one below.
+
+```sql
+SELECT * FROM ci_builds WHERE partition_id IN (
+ SELECT id FROM ci_partitions WHERE active = true
+);
+```
+
+This query will make it possible to limit the number of partitions we will read
+from, and therefore will cut access to "archived" pipeline data, in line with
+our data retention policy for CI/CD data. Ideally we do not want to read from more than
+two partitions at once, so we need to align the automatic partitioning
+mechanisms with the time-decay policy. We will still need to implement new
+access patterns for the archived data, presumably through the API, but the cost
+of storing archived data in PostgreSQL will be reduced significantly this way.
+
+There are some technical details here that are out of the scope of this
+description, but by using this strategy we can "archive" data, and make it much
+less expensive to reside in our PostgreSQL cluster by simply toggling a boolean
+column value.
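+
+For example, archiving logical partition `100` could be a single-row update:
+
+```sql
+UPDATE ci_partitions SET active = false WHERE id = 100;
+```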
+
+## Accessing partitioned data
+
+In most places in GitLab, it will be possible to access partitioned data
+whether it has been archived or not. On a merge request page, we will always show
+pipeline details even if the merge request was created years ago. We can do
+that because `ci_partitions` will be a lookup table associating a pipeline ID
+with its `partition_id`, and we will be able to find the partition that the
+pipeline data is stored in.
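+
+Assuming the hypothetical `ci_partitions` shape sketched earlier, this lookup
+could be:
+
+```sql
+-- Find the logical partition storing data for pipeline 12345.
+SELECT id FROM ci_partitions
+WHERE 12345 BETWEEN pipeline_id_from AND COALESCE(pipeline_id_to, 12345);
+```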
+
+We will need to constrain access to searching through pipelines, builds,
+artifacts, and so on. Search cannot be done through all partitions, as it would
+not be efficient enough; hence we will need to find a better way of searching
+through archived pipeline data. It will be necessary to have different access
+patterns for archived data in the UI and API.
+
+There are a few challenges in enforcing usage of the `partition_id`
+partitioning key in PostgreSQL. To make it easier to update our application to
+support this, we have designed a new query analyzer in our
+[proof of concept merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/80186).
+It helps to find queries that are not using the partitioning key.
+
+In a [separate proof of concept merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/84071)
+and [related issue](https://gitlab.com/gitlab-org/gitlab/-/issues/357090) we
+demonstrated that using the uniform `partition_id` makes it possible to extend
+Rails associations with an additional scope modifier so we can provide the
+partitioning key in the SQL query.
+
+Using instance-dependent associations, we can easily append a partitioning key
+to SQL queries that are supposed to retrieve associated pipeline resources, for
+example:
+
+```ruby
+has_many :builds, -> (pipeline) { where(partition_id: pipeline.partition_id) }
+```
+
+The problem with this approach is that it makes preloading much more difficult,
+as instance-dependent associations cannot be used with preloading:
+
+```plaintext
+ArgumentError: The association scope 'builds' is instance dependent (the
+scope block takes an argument). Preloading instance dependent scopes is not
+supported.
+```
+
+We also need to build a proof of concept for removing data on the PostgreSQL
+side (using foreign keys with `ON DELETE CASCADE`) and removing data through
+Rails associations, as this might be an important area of uncertainty.
+
+We need to [better understand](https://gitlab.com/gitlab-org/gitlab/-/issues/360148)
+how unique constraints we are currently using will perform when using the
+partitioned schema.
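+
+Both concerns follow from the same PostgreSQL rule: unique constraints (and
+primary keys) on a partitioned table must include the partitioning key, and a
+cascading foreign key between routing tables must reference such a key. A
+hedged sketch, with assumed constraint names:
+
+```sql
+-- Unique constraints on routing tables must include the partitioning key.
+ALTER TABLE p_ci_pipelines
+  ADD CONSTRAINT unique_pipelines_id_partition UNIQUE (id, partition_id);
+
+-- A cascading foreign key between two routing tables must reference a
+-- unique key that includes the partitioning key.
+ALTER TABLE p_ci_builds
+  ADD CONSTRAINT fk_builds_pipeline
+  FOREIGN KEY (pipeline_id, partition_id)
+  REFERENCES p_ci_pipelines (id, partition_id)
+  ON DELETE CASCADE;
+```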
+
+We have also designed a query analyzer that makes it possible to detect direct
+usage of zero partitions (legacy tables that have been attached as the first
+partitions of routing tables), to ensure that all queries target the
+partitioned schema or partitioned routing tables, like `p_ci_pipelines`.
+
+## Why not partition using the project or namespace ID?
+
+We do not want to partition using `project_id` or `namespace_id`, because
+sharding and podding are a different problem to solve, on a different layer of
+the application. It doesn't solve the original problem statement of performance
+growing worse over time as we build up infrequently read data. We may want to
+introduce pods in the future, and that might become the primary mechanism of
+separating data based on the group or project the data is associated with.
+
+In theory we could use either `project_id` or `namespace_id` as a second
+partitioning dimension, but this would add more complexity to a problem that is
+already very complex.
+
+## Partitioning builds queuing tables
+
+We also want to partition our builds queuing tables. We currently have two:
+`ci_pending_builds` and `ci_running_builds`. These tables are different from
+other CI/CD data tables, as there are business rules in our product that make
+all data stored in them invalid after 24 hours.
+
+As a result, we will need to use a different strategy to partition those
+database tables: removing partitions entirely once they are older than 24
+hours, and always reading from two partitions through a routing table. The
+strategy to partition these tables is well understood, but requires solid
+Ruby-based automation to manage the creation and deletion of these partitions.
+To achieve that we will collaborate with the Database team to adapt
+[existing database partitioning tools](../../../development/database/table_partitioning.md)
+to support CI/CD data partitioning.
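+
+A sketch of the core operations such automation would perform (partition names
+and IDs are illustrative):
+
+```sql
+-- Create the partition that will receive new queuing data.
+CREATE TABLE ci_pending_builds_102
+  PARTITION OF p_ci_pending_builds FOR VALUES IN (102);
+
+-- Detach and drop a partition whose data is older than 24 hours.
+ALTER TABLE p_ci_pending_builds DETACH PARTITION ci_pending_builds_100;
+DROP TABLE ci_pending_builds_100;
+```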
+
+## Iterating to reduce the risk
+
+This strategy should reduce the risk of implementing CI/CD partitioning to
+acceptable levels. We are also focusing on reading from only two partitions
+initially, to make it possible to detach zero partitions in case of problems
+in our production environment. Every iteration phase described below has a
+revert strategy, and before shipping database changes we want to test them in
+our benchmarking environment.
+
+The main ways of reducing risk in this effort are iteration and making things
+reversible. Shipping the changes described in this document in a safe and
+reliable way is our priority.
+
+As we move forward with the implementation, we will need to find even more
+ways to iterate on the design, support incremental rollouts, and have better
+control over reverting changes in case something goes wrong. It is sometimes
+challenging to ship database schema changes iteratively, and even more
+difficult to support incremental rollouts to the production environment. This
+can, however, be done; it just sometimes requires additional creativity, which
+we will certainly need here. Some examples of how this could look:
+
+### Incremental rollout of partitioned schema
+
+Once we introduce a first partitioned routing table (presumably
+`p_ci_pipelines`) and attach its zero partition (`ci_pipelines`), we will need
+to start interacting with the new routing table, instead of a concrete
+partition zero. Usually we would override the database table the `Ci::Pipeline`
+Rails model would use with something like `self.table_name = 'p_ci_pipelines'`.
+Unfortunately this approach might not support incremental rollout, because
+`self.table_name` is read upon application boot, and later we might be unable
+to revert this change without restarting the application.
+
+One way of solving this might be to introduce a `Ci::Partitioned::Pipeline`
+model that inherits from `Ci::Pipeline`. In that model we would set
+`self.table_name` to `p_ci_pipelines` and return its meta class from
+`Ci::Pipeline.partitioned` as a scope. This will allow us to use feature flags
+to route reads from `ci_pipelines` to `p_ci_pipelines` with a simple revert
+strategy.
+
+### Incremental experimentation with partitioned reads
+
+Another example would be related to the time when we decide to attach another
+partition. The goal of Phase 1 will be to have two partitions per partitioned
+schema / routing table, meaning that for `p_ci_pipelines` we will have
+`ci_pipelines` attached as partition zero, and a new `ci_pipelines_p1`
+partition created for new data. All reads from `p_ci_pipelines` will also need
+to read data from the `p1` partition and we should also iteratively experiment
+with reads targeting more than one partition, to evaluate performance and
+overhead of partitioning.
+
+We can do that by moving _old_ data to `ci_pipelines_m1` (minus 1) partition
+iteratively. Perhaps we will create `partition_id = 1` and move some really old
+pipelines there. We can then iteratively migrate data into `m1` partition to
+measure the impact, performance and increase our confidence before creating a
+new partition `p1` for _new_ (still not created) data.
+
+## Iterations
+
+We want to focus on Phase 1 iteration first. The goal and the main objective of
+this iteration is to partition the biggest 6 CI/CD database tables into 6
+routing tables (partitioned schema) and 12 partitions. This will leave our
+Rails SQL queries mostly unchanged, but it will also make it possible to
+perform emergency detachment of "zero partitions" if there is a database
+performance degradation. This will cut users off from their old data, but the
+application will remain up and running, which is a better alternative to an
+application-wide outage.
+
+1. **Phase 0**: Build CI/CD data partitioning strategy: Done. ✅
+1. **Phase 1**: Partition the 6 biggest CI/CD database tables.
+ 1. Create partitioned schemas for all 6 database tables.
+ 1. Design a way to cascade `partition_id` to all partitioned resources.
+ 1. Implement initial query analyzers validating that we target routing tables.
+ 1. Attach zero partitions to the partitioned database tables.
+ 1. Update the application to target routing tables and partitioned tables.
+ 1. Measure the performance and efficiency of this solution.
+
+ **Revert strategy**: Switch back to using concrete partitions instead of routing tables.
+
+1. **Phase 2**: Add a partitioning key to all SQL queries targeting partitioned tables.
+ 1. Implement query analyzer to check if queries targeting partitioned tables
+ are using proper partitioning keys.
+ 1. Modify existing queries to make sure that all of them are using a
+ partitioning key as a filter.
+
+ **Revert strategy**: Use feature flags, query by query.
+
+1. **Phase 3**: Build new partitioned data access patterns.
+ 1. Build a new API or extend an existing one to allow access to data stored in
+ partitions that are supposed to be excluded based on the time-decay data
+ retention policy.
+
+ **Revert strategy**: Feature flags.
+
+1. **Phase 4**: Introduce time-decay mechanisms built on top of partitioning.
+ 1. Build time-decay policy mechanisms.
+ 1. Enable the time-decay strategy on GitLab.com.
+1. **Phase 5**: Introduce mechanisms for creating partitions automatically.
+ 1. Make it possible to create partitions in an automatic way.
+ 1. Deliver the new architecture to self-managed instances.
+
+## Conclusions
+
+We want to build a solid strategy for partitioning CI/CD data. We are aware
+that it is difficult to iterate on this design, because a mistake made in
+managing the database schema of our multi-terabyte PostgreSQL instance might
+not be easily reversible without potential downtime. That is the reason we are
+spending a significant amount of time researching and refining our partitioning
+strategy. The strategy described in this document is subject to iteration as
+well. Whenever we find a better way to reduce the risk and improve our plan, we
+should update this document accordingly.
+
+We've managed to find a way to avoid large-scale data migrations, and we are
+building an iterative strategy for partitioning CI/CD data. We documented our
+strategy here to share knowledge and solicit feedback from other team members.
+
+## Who
+
+Authors:
+
+<!-- vale gitlab.Spelling = NO -->
+
+| Role | Who |
+|--------|----------------|
+| Author | Grzegorz Bizon |
+
+Recommenders:
+
+| Role | Who |
+|------------------------|-----------------|
| Distinguished Engineer | Kamil Trzciński |
+
+<!-- vale gitlab.Spelling = YES -->
diff --git a/doc/architecture/blueprints/ci_scale/index.md b/doc/architecture/blueprints/ci_scale/index.md
index be21f824392..1c3aee2f860 100644
--- a/doc/architecture/blueprints/ci_scale/index.md
+++ b/doc/architecture/blueprints/ci_scale/index.md
@@ -37,7 +37,7 @@ to sustain future growth.
### We are running out of the capacity to store primary keys
The primary key in `ci_builds` table is an integer generated in a sequence.
-Historically, Rails used to use [integer](https://www.postgresql.org/docs/9.1/datatype-numeric.html)
+Historically, Rails used to use [integer](https://www.postgresql.org/docs/14/datatype-numeric.html)
type when creating primary keys for a table. We did use the default when we
[created the `ci_builds` table in 2012](https://gitlab.com/gitlab-org/gitlab/-/blob/046b28312704f3131e72dcd2dbdacc5264d4aa62/db/ci/migrate/20121004165038_create_builds.rb).
[The behavior of Rails has changed](https://github.com/rails/rails/pull/26266)
@@ -55,8 +55,8 @@ that have the same problem.
Primary keys problem will be tackled by our Database Team.
-**Status**: As of October 2021 the primary keys in CI tables have been migrated
-to big integers.
+**Status**: In October 2021, the primary keys in CI tables were migrated
+to big integers. See the [related Epic](https://gitlab.com/groups/gitlab-org/-/epics/5657) for more details.
### The table is too large
diff --git a/doc/architecture/blueprints/database/scalability/patterns/index.md b/doc/architecture/blueprints/database/scalability/patterns/index.md
index dadf3634407..4a9bb003763 100644
--- a/doc/architecture/blueprints/database/scalability/patterns/index.md
+++ b/doc/architecture/blueprints/database/scalability/patterns/index.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
diff --git a/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md b/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
index 02b56841507..0780ae3c4d5 100644
--- a/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
+++ b/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
@@ -114,7 +114,7 @@ consider in this pattern (see [#327483](https://gitlab.com/gitlab-org/gitlab/-/i
To reduce the database overhead, we implement a cache for the data and thus significantly
reduce the query frequency on the database side. There are different scopes for caching available:
-- `RequestStore`: per-request in-memory cache (based on [request_store gem](https://github.com/steveklabnik/request_store))
+- `RequestStore`: per-request in-memory cache (based on [`request_store` gem](https://github.com/steveklabnik/request_store))
- [`ProcessMemoryCache`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/process_memory_cache.rb#L4): per-process in-memory cache (a `ActiveSupport::Cache::MemoryStore`)
- [`Gitlab::Redis::Cache`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/redis/cache.rb) and `Rails.cache`: full-blown cache in Redis
diff --git a/doc/architecture/blueprints/database/scalability/patterns/time_decay.md b/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
index b4614cde9d4..7a64f0cb7c6 100644
--- a/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
+++ b/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
@@ -1,5 +1,5 @@
---
-stage: Enablement
+stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
comments: false
@@ -71,7 +71,7 @@ The second and most important characteristic of time-decay data is that most of
able to implicitly or explicitly access the data using a date filter,
**restricting our results based on a time-related dimension**.
-There can be many such dimensions, but we are only going to focus on the creation date as it is both
+There can be many such dimensions, but we focus only on the creation date as it is both
the most commonly used, and the one that we can control and optimize against. It:
- Is immutable.
@@ -107,7 +107,7 @@ perspective, but that definition is volatile and not actionable.
Finally, a characteristic that further differentiates time-decay data in sub-categories with
slightly different approaches available is **whether we want to keep the old data or not**
(for example, retention policy) and/or
-**whether old data will be accessible by users through the application**.
+**whether old data is accessible by users through the application**.
#### (optional) Extended definition of time-decay data
@@ -148,7 +148,7 @@ factors:
would include too many unnecessary records in each partition, as is the case for `web_hook_logs`.
1. **How large are the partitions created?**
The major purpose of partitioning is accessing tables that are as small as possible. If they get too
- large by themselves, queries will start underperforming. We may have to re-partition (split) them
+ large by themselves, queries start underperforming. We may have to re-partition (split) them
in even smaller partitions.
The perfect partitioning scheme keeps **all queries over a dataset almost always over a single partition**,
@@ -194,7 +194,7 @@ The disadvantage of such a solution over large, non-partitioned tables is that w
access and delete all the records that are considered as not relevant any more. That is a very
expensive operation, due to multi-version concurrency control in PostgreSQL. It also leads to the
pruning worker not being able to catch up with new records being created, if that rate exceeds a
-threshold, as is the case of [web_hook_logs](https://gitlab.com/gitlab-org/gitlab/-/issues/256088)
+threshold, as is the case of [`web_hook_logs`](https://gitlab.com/gitlab-org/gitlab/-/issues/256088)
at the time of writing this document.
For the aforementioned reasons, our proposal is that
@@ -315,7 +315,7 @@ The process required follows:
1. After the non-partitioned table is dropped, we can add a worker to implement the
pruning strategy by dropping past partitions.
- In this case, the worker will make sure that only 4 partitions are always active (as the
+ In this case, the worker makes sure that only 4 partitions are always active (as the
retention policy is 90 days) and drop any partitions older than four months. We have to keep 4
months of partitions while the current month is still active, as going 90 days back takes you to
the fourth oldest partition.
@@ -325,7 +325,7 @@ The process required follows:
Related epic: [Partitioning: Design and implement partitioning strategy for Audit Events](https://gitlab.com/groups/gitlab-org/-/epics/3206)
The `audit_events` table shares a lot of characteristics with the `web_hook_logs` table discussed
-in the previous sub-section, so we are going to focus on the points they differ.
+in the previous sub-section, so we focus on the points they differ.
The consensus was that
[partitioning could solve most of the performance issues](https://gitlab.com/groups/gitlab-org/-/epics/3206#note_338157248).
diff --git a/doc/architecture/blueprints/database_scaling/size-limits.md b/doc/architecture/blueprints/database_scaling/size-limits.md
index a0508488620..375c82f8833 100644
--- a/doc/architecture/blueprints/database_scaling/size-limits.md
+++ b/doc/architecture/blueprints/database_scaling/size-limits.md
@@ -1,17 +1,18 @@
---
+stage: Data Stores
+group: Database
comments: false
description: 'Database Scalability / Limit table sizes'
-group: database
---
# Database Scalability: Limit on-disk table size to < 100 GB for GitLab.com
-This document is a proposal to work towards reducing and limiting table sizes on GitLab.com. We establish a **measurable target** by limiting table size to a certain threshold. This will be used as an indicator to drive database focus and decision making. With GitLab.com growing, we continuously re-evaluate which tables need to be worked on to prevent or otherwise fix violations.
+This document is a proposal to work towards reducing and limiting table sizes on GitLab.com. We establish a **measurable target** by limiting table size to a certain threshold. This is used as an indicator to drive database focus and decision making. With GitLab.com growing, we continuously re-evaluate which tables need to be worked on to prevent or otherwise fix violations.
This is not meant to be a hard rule but rather a strong indication that work needs to be done to break a table apart or otherwise reduce its size.
This is meant to be read in context with the [Database Sharding blueprint](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64115),
-which paints the bigger picture. This proposal here is thought to be part of the "debloating step" below, as we aim to reduce storage requirements and improve data modeling. Partitioning is part of the standard tool-belt: where possible, we can already use partitioning as a solution to cut physical table sizes significantly. Both will help to prepare efforts like decomposition (database usage is already optimized) and sharding (database is already partitioned along an identified data access dimension).
+which paints the bigger picture. This proposal here is thought to be part of the "debloating step" below, as we aim to reduce storage requirements and improve data modeling. Partitioning is part of the standard tool-belt: where possible, we can already use partitioning as a solution to cut physical table sizes significantly. Both help to prepare efforts like decomposition (database usage is already optimized) and sharding (database is already partitioned along an identified data access dimension).
```mermaid
graph LR
@@ -36,7 +37,7 @@ Large tables on GitLab.com are a major problem - for both operations and develop
1. **Table maintenance** becomes much more costly. Vacuum activity has become a significant concern on GitLab.com - with large tables only seeing infrequent (once per day) processing and vacuum runs taking many hours to complete. This has various negative consequences and a very large table has potential to impact seemingly unrelated parts of the database and hence overall application performance suffers.
1. **Data migrations** on large tables are significantly more complex to implement and incur development overhead. They have potential to cause stability problems on GitLab.com and take a long time to execute on large datasets.
1. **Indexes size** is significant. This directly impacts performance as smaller parts of the index are kept in memory and also makes the indexes harder to maintain (think repacking).
-1. **Index creation times** go up significantly - in 2021, we see btree creation take up to 6 hours for a single btree index. This impacts our ability to deploy frequently and leads to vacuum-related problems (delayed cleanup).
+1. **Index creation times** go up significantly - in 2021, we see B-Tree creation take up to 6 hours for a single B-Tree index. This impacts our ability to deploy frequently and leads to vacuum-related problems (delayed cleanup).
1. We tend to add **many indexes** to mitigate, but this eventually causes significant overhead, can confuse the query planner and a large number of indexes is a smell of a design problem.
## Examples
@@ -124,11 +125,11 @@ In order to maintain and improve operational stability and lessen development bu
1. Indexes are smaller, can be maintained more efficiently and fit better into memory
1. Data migrations are easier to reason about, take less time to implement and execute
-This target is *pragmatic*: We understand table sizes depend on feature usage, code changes and other factors - which all change over time. We may not always find solutions where we can tightly limit the size of physical tables once and for all. That is acceptable though and we primarily aim to keep the situation on GitLab.com under control. We adapt our efforts to the situation present on GitLab.com and will re-evaluate frequently.
+This target is *pragmatic*: We understand table sizes depend on feature usage, code changes and other factors - which all change over time. We may not always find solutions where we can tightly limit the size of physical tables once and for all. That is acceptable though and we primarily aim to keep the situation on GitLab.com under control. We adapt our efforts to the situation present on GitLab.com and re-evaluate frequently.
-While there are changes we can make that lead to a constant maximum physical table size over time, this doesn't need to be the case necessarily. Consider for example hash partitioning, which breaks a table down into a static number of partitions. With data growth over time, individual partitions will also grow in size and may eventually reach the threshold size again. We strive to get constant table sizes, but it is acceptable to ship easier solutions that don't have this characteristic but improve the situation for a considerable amount of time.
+While there are changes we can make that lead to a constant maximum physical table size over time, this doesn't need to be the case necessarily. Consider for example hash partitioning, which breaks a table down into a static number of partitions. With data growth over time, individual partitions also grow in size and may eventually reach the threshold size again. We strive to get constant table sizes, but it is acceptable to ship easier solutions that don't have this characteristic but improve the situation for a considerable amount of time.
-As such, the target size of a physical table after refactoring depends on the situation and there is no hard rule for it. We suggest to consider historic data growth and forecast when physical tables will reach the threshold of 100 GB again. This allows us to understand how long a particular solution is expected to last until the model has to be revisited.
+As such, the target size of a physical table after refactoring depends on the situation and there is no hard rule for it. We suggest to consider historic data growth and forecast when physical tables reach the threshold of 100 GB again. This allows us to understand how long a particular solution is expected to last until the model has to be revisited.
## Solutions
@@ -153,10 +154,10 @@ For solutions like normalization, this is a trade-off: Denormalized models can s
A few examples can be found below, many more are organized under the epic [Database efficiency](https://gitlab.com/groups/gitlab-org/-/epics/5585).
1. [Reduce number of indexes on `ci_builds`](https://gitlab.com/groups/gitlab-org/-/epics/6203)
-1. [Normalize and de-duplicate committer and author details in merge_request_diff_commits](https://gitlab.com/gitlab-org/gitlab/-/issues/331823)
+1. [Normalize and de-duplicate committer and author details in `merge_request_diff_commits`](https://gitlab.com/gitlab-org/gitlab/-/issues/331823)
1. [Retention strategy for `ci_build_trace_sections`](https://gitlab.com/gitlab-org/gitlab/-/issues/32565#note_603138100)
1. [Implement worker that hard-deletes old CI jobs metadata](https://gitlab.com/gitlab-org/gitlab/-/issues/215646)
-1. [merge_request_diff_files violates < 100 GB target](https://gitlab.com/groups/gitlab-org/-/epics/6215) (epic)
+1. [`merge_request_diff_files` violates < 100 GB target](https://gitlab.com/groups/gitlab-org/-/epics/6215) (epic)
## Goal
diff --git a/doc/architecture/blueprints/database_testing/index.md b/doc/architecture/blueprints/database_testing/index.md
index 22857abf176..30ebd06c81f 100644
--- a/doc/architecture/blueprints/database_testing/index.md
+++ b/doc/architecture/blueprints/database_testing/index.md
@@ -20,7 +20,7 @@ Developers are expected to test database migrations prior to deploying to any en
The [code review phase](../../../development/database_review.md) involves Database Reviewers and Maintainers to manually check the migrations committed. This often involves knowing and spotting problematic patterns and their particular behavior on GitLab.com from experience. There is no large-scale environment available that allows us to test database migrations before they are being merged.
-Testing in CI is done on a very small database. We mainly check forward/backward migration consistency, evaluate Rubocop rules to detect well-known problematic behaviors (static code checking) and have a few other, rather technical checks in place (adding the right files etc). That is, we typically find code or other rather simple errors, but cannot surface any data related errors - which are also typically not covered by unit tests either.
+Testing in CI is done on a very small database. We mainly check forward/backward migration consistency, evaluate RuboCop rules to detect well-known problematic behaviors (static code checking) and have a few other, rather technical checks in place (adding the right files etc). That is, we typically find code or other rather simple errors, but cannot surface any data related errors - which are also typically not covered by unit tests either.
Once merged, migrations are being deployed to the staging environment. Its database size is less than 5% of the production database size as of January 2021 and its recent data distribution does not resemble the production site. Oftentimes, we see migrations succeed in staging but then fail in production due to query timeouts or other unexpected problems. Even if we caught problems in staging, this is still expensive to reconcile and ideally we want to catch those problems as early as possible in the development cycle.
@@ -127,7 +127,7 @@ An alternative approach we have discussed and abandoned is to "scrub" and anonym
<!-- vale gitlab.Spelling = NO -->
-This effort is owned and driven by the [GitLab Database Team](https://about.gitlab.com/handbook/engineering/development/enablement/database/) with support from the [GitLab.com Reliability Datastores](https://about.gitlab.com/handbook/engineering/infrastructure/team/reliability/datastores/) team.
+This effort is owned and driven by the [GitLab Database Team](https://about.gitlab.com/handbook/engineering/development/enablement/database/) with support from the [GitLab.com Reliability Datastores](https://about.gitlab.com/handbook/engineering/infrastructure/team/reliability/) team.
| Role | Who
|------------------------------|-------------------------|
@@ -142,7 +142,6 @@ DRIs:
| Role | Who
|------------------------------|------------------------|
| Product | Fabian Zimmer |
-| Leadership | Craig Gomes |
| Engineering | Andreas Brandl |
<!-- vale gitlab.Spelling = YES -->
diff --git a/doc/architecture/blueprints/runner_scaling/index.md b/doc/architecture/blueprints/runner_scaling/index.md
index 174fe191cc7..c4bd8433ab3 100644
--- a/doc/architecture/blueprints/runner_scaling/index.md
+++ b/doc/architecture/blueprints/runner_scaling/index.md
@@ -36,7 +36,7 @@ rapid growth of CI/CD adoption on GitLab.com.
We can not, however, continue using Docker Machine. Work on that project [was
paused in July 2018](https://github.com/docker/machine/issues/4537) and there
was no development made since that time (except for some highly important
-security fixes). In 2018, after Docker Machine entered the “maintenance mode”,
+security fixes). In 2018, after Docker Machine entered the "maintenance mode",
we decided to create [our own fork](https://gitlab.com/gitlab-org/ci-cd/docker-machine)
to be able to keep using this and ship fixes and updates needed for our use case.
[On September 26th, 2021 the project got archived](https://github.com/docker/docker.github.io/commit/2dc8b49dcbe85686cc7230e17aff8e9944cb47a5)
@@ -48,7 +48,7 @@ new mechanism for GitLab Runner auto-scaling. It not only needs to support
auto-scaling, but it also needs to do that in the way to enable us to build on
top of it to improve efficiency, reliability and availability.
-We call this new mechanism the “next GitLab Runner Scaling architecture”.
+We call this new mechanism the "next GitLab Runner Scaling architecture".
_Disclaimer The following contain information related to upcoming products,
features, and functionality._
@@ -82,11 +82,11 @@ about how people are using Docker Machine right now, and it seems that GitLab
CI is one of the most frequent reasons for people to keep using Docker Machine.
There is also an opportunity in being able to optionally run multiple jobs in a
-single, larger virtual machine. We can’t do that today, but we know that this
+single, larger virtual machine. We can't do that today, but we know that this
can potentially significantly improve efficiency. We might want to build a new
architecture that makes it easier and allows us to test how efficient it is
with PoCs. Running multiple jobs on a single machine can also make it possible
-to reuse what we call a “sticky context” - a space for build artifacts / user
+to reuse what we call a "sticky context" - a space for build artifacts / user
data that can be shared between job runs.
### 💡 Design a simple abstraction that users will be able to build on top of
@@ -165,7 +165,7 @@ sequence diagram.
![GitLab Runner Autoscaling Overview](gitlab-autoscaling-overview.png)
On the diagrams above we see that currently a GitLab Runner Manager runs on a
-machine that has access to a cloud provider’s API. It is using Docker Machine
+machine that has access to a cloud provider's API. It is using Docker Machine
to provision new Virtual Machines with Docker Engine installed and it
configures the Docker daemon there to allow external authenticated requests. It
stores credentials to such ephemeral Docker environments on disk. Once a
@@ -186,8 +186,8 @@ through os/exec system calls.
Thanks to the custom executor abstraction there is no more need to implement
new executors internally in Runner. Users who have specific needs can implement
-their own drivers and don’t need to wait for us to make their work part of the
-“official” GitLab Runner. As each driver is a separate project, it also makes
+their own drivers and don't need to wait for us to make their work part of the
+"official" GitLab Runner. As each driver is a separate project, it also makes
it easier to create communities around them, where interested people can
collaborate together on improvements and bug fixes.
@@ -197,7 +197,7 @@ provide a context and an environment in which a build will be executed by one
of the Custom Executors.
There are multiple solutions to implementing a custom provider abstraction. We
-can use raw Go plugins, Hashcorp’s Go Plugin, HTTP interface or gRPC based
+can use raw Go plugins, HashiCorp's Go Plugin, HTTP interface or gRPC based
facade service. There are many solutions, and we want to choose the most
optimal one. In order to do that, we will describe the solutions in a separate
document, define requirements and score the solution accordingly. This will