Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database/partitioning/index.md')
-rw-r--r--doc/development/database/partitioning/index.md71
1 files changed, 71 insertions, 0 deletions
diff --git a/doc/development/database/partitioning/index.md b/doc/development/database/partitioning/index.md
new file mode 100644
index 00000000000..78b0c069705
--- /dev/null
+++ b/doc/development/database/partitioning/index.md
@@ -0,0 +1,71 @@
+---
+stage: Data Stores
+group: Database
+info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/ee/development/development_processes.html#development-guidelines-review.
+---
+
+# Database table partitioning
+
+WARNING:
+If you have questions not answered below, check for and add them
+to [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/398650).
+Tag `@gitlab-org/database-team/triage` and we'll get back to you with an
+answer as soon as possible. If you get an answer in Slack, document
+it on the issue as well so we can update this document in the future.
+
+Table partitioning is a powerful database feature that allows a table's
+data to be split into smaller physical tables that act as a single large
+table. If the application is designed to work with partitioning in mind,
+there can be multiple benefits, such as:
+
+- Query performance can be improved greatly, because the database can
+ cheaply eliminate much of the data from the search space, while still
+ providing full SQL capabilities.
+
+- Bulk deletes can be achieved with minimal impact on the database by
+ dropping entire partitions. This is a natural fit for features that need
+ to periodically delete data that falls outside the retention window.
+
+- Administrative tasks like `VACUUM` and index rebuilds can operate on
+ individual partitions, rather than across a single massive table.
+
+Unfortunately, not all models fit a partitioning scheme, and there are
+significant drawbacks if implemented incorrectly. Additionally,
+**tables can only be partitioned at their creation**, making it nontrivial
+to apply partitioning to a busy database. A suite of migration tools are available
+to enable backend developers to partition existing tables, but the
+migration process is rather heavy, taking multiple steps split across
+several releases. Due to the limitations of partitioning and the related
+migrations, you should understand how partitioning fits your use case
+before attempting to leverage this feature.
+
+The partitioning migration helpers work by creating a partitioned duplicate
+of the original table and using a combination of a trigger and a background
+migration to copy data into the new table. Changes to the original table
+schema can be made in parallel with the partitioning migration, but they
+must take care to not break the underlying mechanism that makes the migration
+work. For example, if a column is added to the table that is being
+partitioned, both the partitioned table and the trigger definition must
+be updated to match.
+
+## Determine when to use partitioning
+
+While partitioning can be very useful when properly applied, it's
+imperative to identify if the data and workload of a table naturally fit a
+partitioning scheme. Understand a few details to decide if partitioning
+is a good fit for your particular problem:
+
+- **Table partitioning**. A table is partitioned on a partition key, which is a
+ column or set of columns which determine how the data is split across the
+ partitions. The partition key is used by the database when reading or
+ writing data, to decide which partitions must be accessed. The
+ partition key should be a column that would be included in a `WHERE`
+ clause on almost all queries accessing that table.
+
+- **How the data is split**. What strategy does the database use
+ to split the data across the partitions? The available choices are `range`,
+ `hash`, and `list`.
+
+## Determine the appropriate partitioning strategy
+
+The available partitioning strategy choices are `date range`, `int range`, `hash`, and `list`.