diff options
author | GitLab Bot <gitlab-bot@gitlab.com> | 2020-05-22 12:08:09 +0300 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2020-05-22 12:08:09 +0300 |
commit | 4a3ba3e5f261eb09e6b2b4fd44373e7a1c454a72 (patch) | |
tree | 1a94467252ebcc5575c7de6a3590360ce05b9967 /doc/development | |
parent | 707c0eca50cf9a5290806ce637be30f4c7288def (diff) |
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development')
-rw-r--r-- | doc/development/README.md | 45 | ||||
-rw-r--r-- | doc/development/database/index.md | 48 | ||||
-rw-r--r-- | doc/development/database/strings_and_the_text_data_type.md | 288 | ||||
-rw-r--r-- | doc/development/migration_style_guide.md | 6 |
4 files changed, 343 insertions, 44 deletions
diff --git a/doc/development/README.md b/doc/development/README.md index 22cc21e12b9..458f5cc2350 100644 --- a/doc/development/README.md +++ b/doc/development/README.md @@ -113,50 +113,7 @@ Complementary reads: ## Database guides -### Tooling - -- [Understanding EXPLAIN plans](understanding_explain_plans.md) -- [explain.depesz.com](https://explain.depesz.com/) for visualizing the output - of `EXPLAIN` -- [pgFormatter](http://sqlformat.darold.net/) a PostgreSQL SQL syntax beautifier - -### Migrations - -- [What requires downtime?](what_requires_downtime.md) -- [SQL guidelines](sql.md) for working with SQL queries -- [Migrations style guide](migration_style_guide.md) for creating safe SQL migrations -- [Testing Rails migrations](testing_guide/testing_migrations_guide.md) guide -- [Post deployment migrations](post_deployment_migrations.md) -- [Background migrations](background_migrations.md) -- [Swapping tables](swapping_tables.md) -- [Deleting migrations](deleting_migrations.md) - -### Debugging - -- Tracing the source of an SQL query using query comments with [Marginalia](database_query_comments.md) -- Tracing the source of an SQL query in Rails console using [Verbose Query Logs](https://guides.rubyonrails.org/debugging_rails_applications.html#verbose-query-logs) - -### Best practices - -- [Adding database indexes](adding_database_indexes.md) -- [Foreign keys & associations](foreign_keys.md) -- [Single table inheritance](single_table_inheritance.md) -- [Polymorphic associations](polymorphic_associations.md) -- [Serializing data](serializing_data.md) -- [Hash indexes](hash_indexes.md) -- [Storing SHA1 hashes as binary](sha1_as_binary.md) -- [Iterating tables in batches](iterating_tables_in_batches.md) -- [Insert into tables in batches](insert_into_tables_in_batches.md) -- [Ordering table columns](ordering_table_columns.md) -- [Verifying database capabilities](verifying_database_capabilities.md) -- [Database Debugging and Troubleshooting](database_debugging.md) -- [Query Count Limits](query_count_limits.md) -- [Creating enums](creating_enums.md) - -### Case studies - -- [Database case study: Filtering by label](filtering_by_label.md) -- [Database case study: Namespaces storage statistics](namespaces_storage_statistics.md) +See [database guidelines](database/index.md). ## Integration guides diff --git a/doc/development/database/index.md b/doc/development/database/index.md new file mode 100644 index 00000000000..8935e5f6efa --- /dev/null +++ b/doc/development/database/index.md @@ -0,0 +1,48 @@ +# Database guides + +## Tooling + +- [Understanding EXPLAIN plans](../understanding_explain_plans.md) +- [explain.depesz.com](https://explain.depesz.com/) for visualizing the output + of `EXPLAIN` +- [pgFormatter](http://sqlformat.darold.net/) a PostgreSQL SQL syntax beautifier + +## Migrations + +- [What requires downtime?](../what_requires_downtime.md) +- [SQL guidelines](../sql.md) for working with SQL queries +- [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations +- [Testing Rails migrations](../testing_guide/testing_migrations_guide.md) guide +- [Post deployment migrations](../post_deployment_migrations.md) +- [Background migrations](../background_migrations.md) +- [Swapping tables](../swapping_tables.md) +- [Deleting migrations](../deleting_migrations.md) + +## Debugging + +- Tracing the source of an SQL query using query comments with [Marginalia](../database_query_comments.md) +- Tracing the source of an SQL query in Rails console using [Verbose Query Logs](https://guides.rubyonrails.org/debugging_rails_applications.html#verbose-query-logs) + +## Best practices + +- [Adding database indexes](../adding_database_indexes.md) +- [Foreign keys & associations](../foreign_keys.md) +- [Adding a foreign key constraint to an existing column](add_foreign_key_to_existing_column.md) +- [Strings and the Text data type](strings_and_the_text_data_type.md) +- [Single table inheritance](../single_table_inheritance.md) +- [Polymorphic associations](../polymorphic_associations.md) +- [Serializing data](../serializing_data.md) +- [Hash indexes](../hash_indexes.md) +- [Storing SHA1 hashes as binary](../sha1_as_binary.md) +- [Iterating tables in batches](../iterating_tables_in_batches.md) +- [Insert into tables in batches](../insert_into_tables_in_batches.md) +- [Ordering table columns](../ordering_table_columns.md) +- [Verifying database capabilities](../verifying_database_capabilities.md) +- [Database Debugging and Troubleshooting](../database_debugging.md) +- [Query Count Limits](../query_count_limits.md) +- [Creating enums](../creating_enums.md) + +## Case studies + +- [Database case study: Filtering by label](../filtering_by_label.md) +- [Database case study: Namespaces storage statistics](../namespaces_storage_statistics.md) diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md new file mode 100644 index 00000000000..67e7f896b61 --- /dev/null +++ b/doc/development/database/strings_and_the_text_data_type.md @@ -0,0 +1,288 @@ +# Strings and the Text data type + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0. + +When adding new columns that will be used to store strings or other textual information: + +1. We always use the `text` data type instead of the `string` data type. +1. `text` columns should always have a limit set by using the `add_text_limit` migration helper. + +The `text` data type can not be defined with a limit, so `add_text_limit` is enforcing that by +adding a [check constraint](https://www.postgresql.org/docs/11/ddl-constraints.html) on the +column and then validating it at a followup step. + +## Background info + +The reason we always want to use `text` instead of `string` is that `string` columns have the +disadvantage that if you want to update their limit, you have to run an `ALTER TABLE ...` command. + +While a limit is added, the `ALTER TABLE ...` command requires an `EXCLUSIVE LOCK` on the table, which +is held throughout the process of updating the column and while validating all existing records, a +process that can take a while for large tables. + +On the other hand, texts are [more or less equivalent to strings](https://www.depesz.com/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/) in PostgreSQL, +while having the additional advantage that adding a limit on an existing column or updating their +limit does not require the very costly `EXCLUSIVE LOCK` to be held throughout the validation phase. +We can start by updating the constraint with the valid option off, which requires an `EXCLUSIVE LOCK` +but only for updating the declaration of the columns. We can then validate it at a later step using +`VALIDATE CONSTRAINT`, which requires only a `SHARE UPDATE EXCLUSIVE LOCK` (only conflicts with other +validations and index creation while it allows reads and writes). + +## Create a new table with text columns + +When adding a new table, the limits for all text columns should be added in the same migration as +the table creation. + +For example, consider a migration that creates a table with two text columns, +**db/migrate/20200401000001_create_db_guides.rb**: + +```ruby +class CreateDbGuides < ActiveRecord::Migration[6.0] + DOWNTIME = false + + disable_ddl_transaction! + + def up + unless table_exists?(:db_guides) + create_table :db_guides do |t| + t.bigint :stars, default: 0, null: false + t.text :title + t.text :notes + end + end + + # The following add the constraints and validate them immediately (no data in the table) + add_text_limit :db_guides, :title, 128 + add_text_limit :db_guides, :notes, 1024 + end + + def down + # No need to drop the constraints, drop_table takes care of everything + drop_table :db_guides + end +end +``` + +Adding a check constraint requires an exclusive lock while the `ALTER TABLE` that adds is running. +As we don't want the exclusive lock to be held for the duration of a transaction, `add_text_limit` +must always run in a migration with `disable_ddl_transaction!`. + +Also, note that we have to add a check that the table exists so that the migration can be repeated +in case of a failure. + +## Add a text column to an existing table + +Adding a column to an existing table requires an exclusive lock for that table. Even though that lock +is held for a brief amount of time, the time `add_column` needs to complete its execution can vary +depending on how frequently the table is accessed. For example, acquiring an exclusive lock for a very +frequently accessed table may take minutes in GitLab.com and requires the use of `with_lock_retries`. + +For these reasons, it is advised to add the text limit on a separate migration than the `add_column` one. + +For example, consider a migration that adds a new text column `extended_title` to table `sprints`, +**db/migrate/20200501000001_add_extended_title_to_sprints.rb**: + +```ruby +class AddExtendedTitleToSprints < ActiveRecord::Migration[6.0] + DOWNTIME = false + + # rubocop:disable Migration/AddLimitToTextColumns + # limit is added in 20200501000002_add_text_limit_to_sprints_extended_title + def change + add_column :sprints, :extended_title, :text + end + # rubocop:enable Migration/AddLimitToTextColumns +end +``` + +A second migration should follow the first one with a limit added to `extended_title`, +**db/migrate/20200501000002_add_text_limit_to_sprints_extended_title.rb**: + +```ruby +class AddTextLimitToSprintsExtendedTitle < ActiveRecord::Migration[6.0] + include Gitlab::Database::MigrationHelpers + DOWNTIME = false + + disable_ddl_transaction! + + def up + add_text_limit :sprints, :extended_title, 512 + end + + def down + # Down is required as `add_text_limit` is not reversible + remove_text_limit :sprints, :extended_title + end +end +``` + +## Add a text limit constraint to an existing column + +Adding text limits to existing database columns requires multiple steps split into at least two different releases: + +1. Release `N.M` (current release) + + - Add a post-deployment migration to add the limit to the text column with `validate: false`. + - Add a post-deployment migration to fix the existing records. + + NOTE: **Note:** + Depending on the size of the table, a background migration for cleanup could be required in the next release. + See [text limit constraints on large tables](strings_and_the_text_data_type.md#text-limit-constraints-on-large-tables) for more information. + + - Create an issue for the next milestone to validate the text limit. + +1. Release `N.M+1` (next release) + + - Validate the text limit using a post-deployment migration. + +### Example + +Let's assume we want to add a `1024` limit to `issues.title_html` for a given release milestone, +such as 13.0. + +Issues is a pretty busy and large table with more than 25 million rows, so we don't want to lock all +other processes that try to access it while running the update. + +Also, after checking our production database, we know that there are `issues` with more characters in +their title than the 1024 character limit, so we can not add and validate the constraint in one step. + +NOTE: **Note:** +Even if we did not have any record with a title larger than the provided limit, another +instance of GitLab could have such records, so we would follow the same process either way. + +#### Prevent new invalid records (current release) + +We first add the limit as a `NOT VALID` check constraint to the table, which enforces consistency when +new records are inserted or current records are updated. + +In the example above, the existing issues with more than 1024 characters in their title will not be +affected and you'll be still able to update records in the `issues` table. However, when you'd try +to update the `title_html` with a title that has more than 1024 characters, the constraint causes +a database error. + +Adding or removing a constraint to an existing attribute requires that any application changes are +deployed _first_, [otherwise servers still in the old version of the application may try to update the +attribute with invalid values](../multi_version_compatibility.md#ci-artifact-uploads-were-failing). +For these reasons, `add_text_limit` should run in a post-deployment migration. + +Still in our example, for the 13.0 milestone (current), consider that the following validation +has been added to model `Issue`: + +```ruby +validates :title_html, length: { maximum: 1024 } +``` + +We can also update the database in the same milestone by adding the text limit with `validate: false` +in a post-deployment migration, +**db/post_migrate/20200501000001_add_text_limit_migration.rb**: + +```ruby +class AddTextLimitMigration < ActiveRecord::Migration[6.0] + include Gitlab::Database::MigrationHelpers + DOWNTIME = false + + disable_ddl_transaction! + + def up + # This will add the constraint WITHOUT validating it + add_text_limit :issues, :title_html, 1024, validate: false + end + + def down + # Down is required as `add_text_limit` is not reversible + remove_text_limit :issues, :title_html + end +end +``` + +#### Data migration to fix existing records (current release) + +The approach here depends on the data volume and the cleanup strategy. The number of records that must +be fixed on GitLab.com is a nice indicator that will help us decide whether to use a post-deployment +migration or a background data migration: + +- If the data volume is less than `1,000` records, then the data migration can be executed within the post-migration. +- If the data volume is higher than `1,000` records, it's advised to create a background migration. + +When unsure about which option to use, please contact the Database team for advice. + +Back to our example, the issues table is considerably large and frequently accessed, so we are going +to add a background migration for the 13.0 milestone (current), +**db/post_migrate/20200501000002_schedule_cap_title_length_on_issues.rb**: + +```ruby +class ScheduleCapTitleLengthOnIssues < ActiveRecord::Migration[6.0] + include Gitlab::Database::MigrationHelpers + + # Info on how many records will be affected on GitLab.com + # time each batch needs to run on average, etc ... + BATCH_SIZE = 5000 + DELAY_INTERVAL = 2.minutes.to_i + + # Background migration will update issues whose title is longer than 1024 limit + ISSUES_BACKGROUND_MIGRATION = 'CapTitleLengthOnIssues'.freeze + + disable_ddl_transaction! + + class Issue < ActiveRecord::Base + include EachBatch + + self.table_name = 'issues' + end + + def up + queue_background_migration_jobs_by_range_at_intervals( + Issue.where('char_length(title_html) > 1024'), + ISSUES_MIGRATION, + DELAY_INTERVAL, + batch_size: BATCH_SIZE + ) + end + + def down + # no-op : the part of the title_html after the limit is lost forever + end +end +``` + +To keep this guide short, we skipped the definition of the background migration and only +provided a high level example of the post-deployment migration that is used to schedule the batches. +You can find more info on the guide about [background migrations](../background_migrations.md) + +#### Validate the text limit (next release) + +Validating the text limit will scan the whole table and make sure that each record is correct. + +Still in our example, for the 13.1 milestone (next), we run the `validate_text_limit` migration +helper in a final post-deployment migration, +**db/post_migrate/20200601000001_validate_text_limit_migration.rb**: + +```ruby +class ValidateTextLimitMigration < ActiveRecord::Migration[6.0] + include Gitlab::Database::MigrationHelpers + DOWNTIME = false + + disable_ddl_transaction! + + def up + validate_text_limit :issues, :title_html + end + + def down + # no-op + end +end +``` + +## Text limit constraints on large tables + +If you have to clean up a text column for a really [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/migration_helpers.rb#L12) +(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and +it will need an additional [background migration cleaning up](../background_migrations.md#cleaning-up) +in the release after adding the data migration. + +In that rare case you will need 3 releases end-to-end: + +1. Release `N.M` - Add the text limit and the background migration to fix the existing records. +1. Release `N.M+1` - Cleanup the background migration. +1. Release `N.M+2` - Validate the text limit. diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md index 0db214e1401..46fc94a4b2d 100644 --- a/doc/development/migration_style_guide.md +++ b/doc/development/migration_style_guide.md @@ -727,6 +727,12 @@ Rails migration example: add_column(:projects, :foo, :integer, default: 10, limit: 8) ``` +## Strings and the Text data type + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0. + +See the [text data type](database/strings_and_the_text_data_type.md) style guide for more information. + ## Timestamp column type By default, Rails uses the `timestamp` data type that stores timestamp data |