diff options
Diffstat (limited to 'doc/development/database/partitioning/list.md')
-rw-r--r-- | doc/development/database/partitioning/list.md | 372 |
1 files changed, 372 insertions, 0 deletions
diff --git a/doc/development/database/partitioning/list.md b/doc/development/database/partitioning/list.md new file mode 100644 index 00000000000..74ce5a00f6a --- /dev/null +++ b/doc/development/database/partitioning/list.md @@ -0,0 +1,372 @@ +--- +stage: Data Stores +group: Database +info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/ee/development/development_processes.html#development-guidelines-review. +--- + +# List partition + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/96815) in GitLab 15.4. + +## Description + +Add the partitioning key column to the table you are partitioning. +Include the partitioning key in the following constraints: + +- The primary key. +- All foreign keys referencing the table to be partitioned. +- All unique constraints. + +## Example + +### Step 1 - Add partition key + +Add the partitioning key column. For example, in a rails migration: + +```ruby +class AddPartitionNumberForPartitioning < Gitlab::Database::Migration[2.1] + enable_lock_retries! + + TABLE_NAME = :table_name + COLUMN_NAME = :partition_id + DEFAULT_VALUE = 100 + + def change + add_column(TABLE_NAME, COLUMN_NAME, :bigint, default: 100) + end +end +``` + +### Step 2 - Create required indexes + +Add indexes including the partitioning key column. For example, in a rails migration: + +```ruby +class PrepareIndexesForPartitioning < Gitlab::Database::Migration[2.1] + disable_ddl_transaction! + + TABLE_NAME = :table_name + INDEX_NAME = :index_name + + def up + add_concurrent_index(TABLE_NAME, [:id, :partition_id], unique: true, name: INDEX_NAME) + end + + def down + remove_concurrent_index_by_name(TABLE_NAME, INDEX_NAME) + end +end +``` + +### Step 3 - Enforce unique constraint + +Change all unique indexes to include the partitioning key column, +including the primary key index. You can start by adding an unique +index on `[primary_key_column, :partition_id]`, which will be +required for the next two steps. For example, in a rails migration: + +```ruby +class PrepareUniqueContraintForPartitioning < Gitlab::Database::Migration[2.1] + disable_ddl_transaction! + + TABLE_NAME = :table_name + OLD_UNIQUE_INDEX_NAME = :index_name_unique + NEW_UNIQUE_INDEX_NAME = :new_index_name + + def up + add_concurrent_index(TABLE_NAME, [:id, :partition_id], unique: true, name: NEW_UNIQUE_INDEX_NAME) + + remove_concurrent_index_by_name(TABLE_NAME, OLD_UNIQUE_INDEX_NAME) + end + + def down + add_concurrent_index(TABLE_NAME, :id, unique: true, name: OLD_UNIQUE_INDEX_NAME) + + remove_concurrent_index_by_name(TABLE_NAME, NEW_UNIQUE_INDEX_NAME) + end +end +``` + +### Step 4 - Enforce foreign key constraint + +Enforce foreign keys including the partitioning key column. For example, in a rails migration: + +```ruby +class PrepareForeignKeyForPartitioning < Gitlab::Database::Migration[2.1] + disable_ddl_transaction! + + SOURCE_TABLE_NAME = :source_table_name + TARGET_TABLE_NAME = :target_table_name + COLUMN = :foreign_key_id + TARGET_COLUMN = :id + FK_NAME = :fk_365d1db505_p + PARTITION_COLUMN = :partition_id + + def up + add_concurrent_foreign_key( + SOURCE_TABLE_NAME, + TARGET_TABLE_NAME, + column: [PARTITION_COLUMN, COLUMN], + target_column: [PARTITION_COLUMN, TARGET_COLUMN], + validate: false, + on_update: :cascade, + name: FK_NAME + ) + + # This should be done in a separate post migration when dealing with a high traffic table + validate_foreign_key(TABLE_NAME, [PARTITION_COLUMN, COLUMN], name: FK_NAME) + end + + def down + with_lock_retries do + remove_foreign_key_if_exists(SOURCE_TABLE_NAME, name: FK_NAME) + end + end +end +``` + +The `on_update: :cascade` option is mandatory if we want the partitioning column +to be updated. This will cascade the update to all dependent rows. Without +specifying it, updating the partition column on the target table we would +result in a `Key is still referenced from table ...` error and updating the +partition column on the source table would raise a +`Key is not present in table ...` error. + +This migration can be automatically generated using: + +```shell +./scripts/partitioning/generate-fk --source source_table_name --target target_table_name +``` + +### Step 5 - Swap primary key + +Swap the primary key including the partitioning key column. This can be done only after +including the partition key for all references foreign keys. For example, in a rails migration: + +```ruby +class PreparePrimaryKeyForPartitioning < Gitlab::Database::Migration[2.1] + disable_ddl_transaction! + + TABLE_NAME = :table_name + PRIMARY_KEY = :primary_key + OLD_INDEX_NAME = :old_index_name + NEW_INDEX_NAME = :new_index_name + + def up + swap_primary_key(TABLE_NAME, PRIMARY_KEY, NEW_INDEX_NAME) + end + + def down + add_concurrent_index(TABLE_NAME, :id, unique: true, name: OLD_INDEX_NAME) + add_concurrent_index(TABLE_NAME, [:id, :partition_id], unique: true, name: NEW_INDEX_NAME) + + unswap_primary_key(TABLE_NAME, PRIMARY_KEY, OLD_INDEX_NAME) + end +end +``` + +NOTE: +Do not forget to set the primary key explicitly in your model as `ActiveRecord` does not support composite primary keys. + +```ruby +class Model < ApplicationRecord + self.primary_key = :id +end +``` + +### Step 6 - Create parent table and attach existing table as the initial partition + +You can now create the parent table attaching the existing table as the initial +partition by using the following helpers provided by the database team. + +For example, using list partitioning in Rails post migrations: + +```ruby +class PrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1] + include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers + + disable_ddl_transaction! + + TABLE_NAME = :table_name + PARENT_TABLE_NAME = :p_table_name + FIRST_PARTITION = 100 + PARTITION_COLUMN = :partition_id + + def up + prepare_constraint_for_list_partitioning( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION + ) + end + + def down + revert_preparing_constraint_for_list_partitioning( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION + ) + end +end +``` + +```ruby +class ConvertTableToListPartitioning < Gitlab::Database::Migration[2.1] + include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers + + disable_ddl_transaction! + + TABLE_NAME = :table_name + TABLE_FK = :table_references_by_fk + PARENT_TABLE_NAME = :p_table_name + FIRST_PARTITION = 100 + PARTITION_COLUMN = :partition_id + + def up + convert_table_to_first_list_partition( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION, + lock_tables: [TABLE_FK, TABLE_NAME] + ) + end + + def down + revert_converting_table_to_first_list_partition( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION + ) + end +end +``` + +NOTE: +Do not forget to set the sequence name explicitly in your model because it will +be owned by the routing table and `ActiveRecord` can't determine it. This can +be cleaned up after the `table_name` is changed to the routing table. + +```ruby +class Model < ApplicationRecord + self.sequence_name = 'model_id_seq' +end +``` + +If the partitioning constraint migration takes [more than 10 minutes](../../migration_style_guide.md#how-long-a-migration-should-take) to finish, +it can be made to run asynchronously to avoid running the post-migration during busy hours. + +Prepend the following migration `AsyncPrepareTableConstraintsForListPartitioning` +and use `async: true` option. This change marks the partitioning constraint as `NOT VALID` +and enqueues a scheduled job to validate the existing data in the table during the weekend. + +Then the second post-migration `PrepareTableConstraintsForListPartitioning` only +marks the partitioning constraint as validated, because the existing data is already +tested during the previous weekend. + +For example: + +```ruby +class AsyncPrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1] + include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers + + disable_ddl_transaction! + + TABLE_NAME = :table_name + PARENT_TABLE_NAME = :p_table_name + FIRST_PARTITION = 100 + PARTITION_COLUMN = :partition_id + + def up + prepare_constraint_for_list_partitioning( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION, + async: true + ) + end + + def down + revert_preparing_constraint_for_list_partitioning( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION + ) + end +end +``` + +### Step 7 - Re-point foreign keys to parent table + +The tables that reference the initial partition must be updated to point to the +parent table now. Without this change, the records from those tables +will not be able to locate the rows in the next partitions because they will look +for them in the initial partition. + +Steps: + +- Add the foreign key to the partitioned table and validate it asynchronously, + [for example](https://gitlab.com/gitlab-org/gitlab/-/blob/65d63f6a00196c3a7d59f15191920f271ab2b145/db/post_migrate/20230524135543_replace_ci_build_pending_states_foreign_key.rb). +- Validate it synchronously after the asynchronously validation was completed on GitLab.com, + [for example](https://gitlab.com/gitlab-org/gitlab/-/blob/65d63f6a00196c3a7d59f15191920f271ab2b145/db/post_migrate/20230530140456_validate_fk_ci_build_pending_states_p_ci_builds.rb). +- Remove the old foreign key and rename the new one to the old name, + [for example](https://gitlab.com/gitlab-org/gitlab/-/blob/65d63f6a00196c3a7d59f15191920f271ab2b145/db/post_migrate/20230615083713_replace_old_fk_ci_build_pending_states_to_builds.rb#L9). + +### Step 8 - Ensure ID uniqueness across partitions + +All uniqueness constraints must include the partitioning key, so we can have +duplicate IDs across partitions. To solve this we enforce that only the database +can set the ID values and use a sequence to generate them because sequences are +guaranteed to generate unique values. + +For example: + +```ruby +class EnsureIdUniquenessForPCiBuilds < Gitlab::Database::Migration[2.1] + include Gitlab::Database::PartitioningMigrationHelpers::UniquenessHelpers + + enable_lock_retries! + + TABLE_NAME = :p_ci_builds + FUNCTION_NAME = :assign_p_ci_builds_id_value + + def up + ensure_unique_id(TABLE_NAME) + end + + def down + execute(<<~SQL.squish) + ALTER TABLE #{TABLE_NAME} + ALTER COLUMN id SET DEFAULT nextval('ci_builds_id_seq'::regclass); + + DROP FUNCTION IF EXISTS #{FUNCTION_NAME} CASCADE; + SQL + end +``` + +### Step 9 - Analyze the partitioned table and create new partitions + +The autovacuum daemon does not process partitioned tables. It is necessary to +periodically run a manual `ANALYZE` to keep the statistics of the table hierarchy +up to date. + +Models that implement `Ci::Partitionable` with `partitioned: true` option are +analyzed by default on a weekly basis. To enable this and create new partitions +you need to register the model in the [PostgreSQL initializer](https://gitlab.com/gitlab-org/gitlab/-/blob/b7f0e3f1bcd2ffc220768bbc373364151775ca8e/config/initializers/postgres_partitioning.rb). + +### Step 10 - Update the application to use the partitioned table + +Now that the parent table is ready, we can update the application to use it: + +```ruby +class Model < ApplicationRecord + self.table_name = :partitioned_table +end +``` + +Depending on the model, it might be safer to use a [change management issue](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16387). |