gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/development')
-rw-r--r--  doc/development/adding_database_indexes.md | 317
-rw-r--r--  doc/development/adding_service_component.md | 4
-rw-r--r--  doc/development/api_graphql_styleguide.md | 162
-rw-r--r--  doc/development/api_styleguide.md | 16
-rw-r--r--  doc/development/application_limits.md | 3
-rw-r--r--  doc/development/application_slis/index.md | 13
-rw-r--r--  doc/development/application_slis/rails_request_apdex.md | 3
-rw-r--r--  doc/development/architecture.md | 2
-rw-r--r--  doc/development/audit_event_guide/index.md | 23
-rw-r--r--  doc/development/auto_devops.md | 4
-rw-r--r--  doc/development/backend/create_source_code_be/index.md | 2
-rw-r--r--  doc/development/backend/ruby_style_guide.md | 2
-rw-r--r--  doc/development/build_test_package.md | 4
-rw-r--r--  doc/development/cached_queries.md | 2
-rw-r--r--  doc/development/caching.md | 8
-rw-r--r--  doc/development/changelog.md | 4
-rw-r--r--  doc/development/chatops_on_gitlabcom.md | 2
-rw-r--r--  doc/development/code_intelligence/index.md | 4
-rw-r--r--  doc/development/code_review.md | 116
-rw-r--r--  doc/development/contributing/design.md | 6
-rw-r--r--  doc/development/contributing/index.md | 14
-rw-r--r--  doc/development/contributing/issue_workflow.md | 4
-rw-r--r--  doc/development/contributing/merge_request_workflow.md | 152
-rw-r--r--  doc/development/creating_enums.md | 157
-rw-r--r--  doc/development/database/add_foreign_key_to_existing_column.md | 4
-rw-r--r--  doc/development/database/adding_database_indexes.md | 410
-rw-r--r--  doc/development/database/avoiding_downtime_in_migrations.md | 40
-rw-r--r--  doc/development/database/background_migrations.md | 6
-rw-r--r--  doc/development/database/batched_background_migrations.md | 198
-rw-r--r--  doc/development/database/ci_mirrored_tables.md | 156
-rw-r--r--  doc/development/database/client_side_connection_pool.md | 13
-rw-r--r--  doc/development/database/creating_enums.md | 154
-rw-r--r--  doc/development/database/database_debugging.md | 177
-rw-r--r--  doc/development/database/database_dictionary.md | 51
-rw-r--r--  doc/development/database/database_lab.md | 2
-rw-r--r--  doc/development/database/database_query_comments.md | 62
-rw-r--r--  doc/development/database/database_reviewer_guidelines.md | 10
-rw-r--r--  doc/development/database/db_dump.md | 56
-rw-r--r--  doc/development/database/filtering_by_label.md | 179
-rw-r--r--  doc/development/database/foreign_keys.md | 199
-rw-r--r--  doc/development/database/hash_indexes.md | 26
-rw-r--r--  doc/development/database/index.md | 52
-rw-r--r--  doc/development/database/insert_into_tables_in_batches.md | 196
-rw-r--r--  doc/development/database/iterating_tables_in_batches.md | 598
-rw-r--r--  doc/development/database/loose_foreign_keys.md | 11
-rw-r--r--  doc/development/database/multiple_databases.md | 9
-rw-r--r--  doc/development/database/namespaces_storage_statistics.md | 193
-rw-r--r--  doc/development/database/not_null_constraints.md | 4
-rw-r--r--  doc/development/database/ordering_table_columns.md | 152
-rw-r--r--  doc/development/database/pagination_guidelines.md | 2
-rw-r--r--  doc/development/database/pagination_performance_guidelines.md | 10
-rw-r--r--  doc/development/database/polymorphic_associations.md | 152
-rw-r--r--  doc/development/database/post_deployment_migrations.md | 4
-rw-r--r--  doc/development/database/query_count_limits.md | 70
-rw-r--r--  doc/development/database/query_performance.md | 74
-rw-r--r--  doc/development/database/query_recorder.md | 145
-rw-r--r--  doc/development/database/rename_database_tables.md | 6
-rw-r--r--  doc/development/database/serializing_data.md | 90
-rw-r--r--  doc/development/database/sha1_as_binary.md | 42
-rw-r--r--  doc/development/database/single_table_inheritance.md | 63
-rw-r--r--  doc/development/database/strings_and_the_text_data_type.md | 7
-rw-r--r--  doc/development/database/swapping_tables.md | 51
-rw-r--r--  doc/development/database/transaction_guidelines.md | 2
-rw-r--r--  doc/development/database/understanding_explain_plans.md | 829
-rw-r--r--  doc/development/database/verifying_database_capabilities.md | 38
-rw-r--r--  doc/development/database_debugging.md | 180
-rw-r--r--  doc/development/database_query_comments.md | 65
-rw-r--r--  doc/development/database_review.md | 23
-rw-r--r--  doc/development/db_dump.md | 59
-rw-r--r--  doc/development/deprecation_guidelines/index.md | 22
-rw-r--r--  doc/development/distributed_tracing.md | 16
-rw-r--r--  doc/development/documentation/restful_api_styleguide.md | 5
-rw-r--r--  doc/development/documentation/site_architecture/deployment_process.md | 22
-rw-r--r--  doc/development/documentation/site_architecture/folder_structure.md | 2
-rw-r--r--  doc/development/documentation/site_architecture/global_nav.md | 36
-rw-r--r--  doc/development/documentation/site_architecture/index.md | 257
-rw-r--r--  doc/development/documentation/structure.md | 61
-rw-r--r--  doc/development/documentation/styleguide/index.md | 117
-rw-r--r--  doc/development/documentation/styleguide/word_list.md | 20
-rw-r--r--  doc/development/documentation/testing.md | 68
-rw-r--r--  doc/development/documentation/versions.md | 6
-rw-r--r--  doc/development/ee_features.md | 339
-rw-r--r--  doc/development/elasticsearch.md | 34
-rw-r--r--  doc/development/emails.md | 9
-rw-r--r--  doc/development/event_store.md | 11
-rw-r--r--  doc/development/fe_guide/accessibility.md | 2
-rw-r--r--  doc/development/fe_guide/architecture.md | 4
-rw-r--r--  doc/development/fe_guide/content_editor.md | 1
-rw-r--r--  doc/development/fe_guide/design_anti_patterns.md | 6
-rw-r--r--  doc/development/fe_guide/development_process.md | 8
-rw-r--r--  doc/development/fe_guide/frontend_faq.md | 6
-rw-r--r--  doc/development/fe_guide/graphql.md | 68
-rw-r--r--  doc/development/fe_guide/icons.md | 2
-rw-r--r--  doc/development/fe_guide/index.md | 2
-rw-r--r--  doc/development/fe_guide/merge_request_widget_extensions.md | 437
-rw-r--r--  doc/development/fe_guide/performance.md | 25
-rw-r--r--  doc/development/fe_guide/source_editor.md | 2
-rw-r--r--  doc/development/fe_guide/storybook.md | 2
-rw-r--r--  doc/development/fe_guide/style/javascript.md | 3
-rw-r--r--  doc/development/fe_guide/style/scss.md | 2
-rw-r--r--  doc/development/fe_guide/style/vue.md | 18
-rw-r--r--  doc/development/fe_guide/tooling.md | 2
-rw-r--r--  doc/development/fe_guide/troubleshooting.md | 2
-rw-r--r--  doc/development/fe_guide/view_component.md | 45
-rw-r--r--  doc/development/fe_guide/vue.md | 10
-rw-r--r--  doc/development/fe_guide/vuex.md | 6
-rw-r--r--  doc/development/fe_guide/widgets.md | 4
-rw-r--r--  doc/development/feature_categorization/index.md | 6
-rw-r--r--  doc/development/feature_development.md | 4
-rw-r--r--  doc/development/feature_flags/controls.md | 2
-rw-r--r--  doc/development/feature_flags/index.md | 20
-rw-r--r--  doc/development/features_inside_dot_gitlab.md | 4
-rw-r--r--  doc/development/filtering_by_label.md | 182
-rw-r--r--  doc/development/fips_compliance.md | 105
-rw-r--r--  doc/development/foreign_keys.md | 203
-rw-r--r--  doc/development/gemfile.md | 17
-rw-r--r--  doc/development/geo.md | 2
-rw-r--r--  doc/development/geo/proxying.md | 4
-rw-r--r--  doc/development/git_object_deduplication.md | 13
-rw-r--r--  doc/development/gitaly.md | 3
-rw-r--r--  doc/development/github_importer.md | 4
-rw-r--r--  doc/development/gitlab_flavored_markdown/specification_guide/index.md | 49
-rw-r--r--  doc/development/go_guide/dependencies.md | 6
-rw-r--r--  doc/development/go_guide/index.md | 71
-rw-r--r--  doc/development/gotchas.md | 3
-rw-r--r--  doc/development/hash_indexes.md | 29
-rw-r--r--  doc/development/i18n/externalization.md | 12
-rw-r--r--  doc/development/i18n/proofreader.md | 2
-rw-r--r--  doc/development/image_scaling.md | 2
-rw-r--r--  doc/development/import_export.md | 4
-rw-r--r--  doc/development/import_project.md | 2
-rw-r--r--  doc/development/index.md | 4
-rw-r--r--  doc/development/insert_into_tables_in_batches.md | 199
-rw-r--r--  doc/development/integrations/index.md | 2
-rw-r--r--  doc/development/integrations/secure.md | 17
-rw-r--r--  doc/development/internal_api/index.md | 51
-rw-r--r--  doc/development/issue_types.md | 6
-rw-r--r--  doc/development/iterating_tables_in_batches.md | 601
-rw-r--r--  doc/development/jh_features_review.md | 3
-rw-r--r--  doc/development/kubernetes.md | 2
-rw-r--r--  doc/development/lfs.md | 9
-rw-r--r--  doc/development/licensed_feature_availability.md | 75
-rw-r--r--  doc/development/logging.md | 19
-rw-r--r--  doc/development/merge_request_concepts/index.md | 31
-rw-r--r--  doc/development/merge_request_concepts/widget_extensions.md | 11
-rw-r--r--  doc/development/merge_request_performance_guidelines.md | 9
-rw-r--r--  doc/development/migration_style_guide.md | 56
-rw-r--r--  doc/development/module_with_instance_variables.md | 6
-rw-r--r--  doc/development/multi_version_compatibility.md | 2
-rw-r--r--  doc/development/namespaces_storage_statistics.md | 196
-rw-r--r--  doc/development/new_fe_guide/development/accessibility.md | 55
-rw-r--r--  doc/development/new_fe_guide/development/components.md | 30
-rw-r--r--  doc/development/new_fe_guide/development/index.md | 26
-rw-r--r--  doc/development/new_fe_guide/development/performance.md | 25
-rw-r--r--  doc/development/new_fe_guide/index.md | 25
-rw-r--r--  doc/development/new_fe_guide/modules/dirty_submit.md | 31
-rw-r--r--  doc/development/new_fe_guide/modules/index.md | 18
-rw-r--r--  doc/development/new_fe_guide/modules/widget_extensions.md | 358
-rw-r--r--  doc/development/new_fe_guide/tips.md | 38
-rw-r--r--  doc/development/ordering_table_columns.md | 155
-rw-r--r--  doc/development/pages/index.md | 48
-rw-r--r--  doc/development/performance.md | 10
-rw-r--r--  doc/development/permissions.md | 2
-rw-r--r--  doc/development/pipelines.md | 14
-rw-r--r--  doc/development/policies.md | 3
-rw-r--r--  doc/development/polymorphic_associations.md | 155
-rw-r--r--  doc/development/query_count_limits.md | 73
-rw-r--r--  doc/development/query_performance.md | 77
-rw-r--r--  doc/development/query_recorder.md | 148
-rw-r--r--  doc/development/rails_update.md | 4
-rw-r--r--  doc/development/real_time.md | 4
-rw-r--r--  doc/development/redis/new_redis_instance.md | 6
-rw-r--r--  doc/development/reusing_abstractions.md | 55
-rw-r--r--  doc/development/routing.md | 4
-rw-r--r--  doc/development/scalability.md | 39
-rw-r--r--  doc/development/secure_coding_guidelines.md | 2
-rw-r--r--  doc/development/serializing_data.md | 93
-rw-r--r--  doc/development/service_measurement.md | 2
-rw-r--r--  doc/development/service_ping/implement.md | 27
-rw-r--r--  doc/development/service_ping/index.md | 6
-rw-r--r--  doc/development/service_ping/metrics_dictionary.md | 6
-rw-r--r--  doc/development/service_ping/metrics_instrumentation.md | 12
-rw-r--r--  doc/development/service_ping/performance_indicator_metrics.md | 5
-rw-r--r--  doc/development/service_ping/review_guidelines.md | 6
-rw-r--r--  doc/development/service_ping/usage_data.md | 2
-rw-r--r--  doc/development/sha1_as_binary.md | 45
-rw-r--r--  doc/development/shell_commands.md | 2
-rw-r--r--  doc/development/sidekiq/compatibility_across_updates.md | 13
-rw-r--r--  doc/development/sidekiq/idempotent_jobs.md | 5
-rw-r--r--  doc/development/sidekiq/index.md | 16
-rw-r--r--  doc/development/sidekiq/logging.md | 5
-rw-r--r--  doc/development/sidekiq/worker_attributes.md | 6
-rw-r--r--  doc/development/single_table_inheritance.md | 66
-rw-r--r--  doc/development/snowplow/implementation.md | 2
-rw-r--r--  doc/development/snowplow/index.md | 7
-rw-r--r--  doc/development/snowplow/infrastructure.md | 6
-rw-r--r--  doc/development/snowplow/review_guidelines.md | 2
-rw-r--r--  doc/development/sql.md | 5
-rw-r--r--  doc/development/stage_group_observability/dashboards/stage_group_dashboard.md | 2
-rw-r--r--  doc/development/swapping_tables.md | 54
-rw-r--r--  doc/development/testing_guide/best_practices.md | 11
-rw-r--r--  doc/development/testing_guide/contract/consumer_tests.md | 113
-rw-r--r--  doc/development/testing_guide/contract/index.md | 14
-rw-r--r--  doc/development/testing_guide/end_to_end/best_practices.md | 4
-rw-r--r--  doc/development/testing_guide/end_to_end/feature_flags.md | 4
-rw-r--r--  doc/development/testing_guide/end_to_end/index.md | 23
-rw-r--r--  doc/development/testing_guide/end_to_end/rspec_metadata_tests.md | 1
-rw-r--r--  doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md | 83
-rw-r--r--  doc/development/testing_guide/frontend_testing.md | 9
-rw-r--r--  doc/development/testing_guide/index.md | 4
-rw-r--r--  doc/development/testing_guide/review_apps.md | 2
-rw-r--r--  doc/development/testing_guide/testing_levels.md | 4
-rw-r--r--  doc/development/testing_guide/testing_migrations_guide.md | 4
-rw-r--r--  doc/development/understanding_explain_plans.md | 832
-rw-r--r--  doc/development/uploads/working_with_uploads.md | 375
-rw-r--r--  doc/development/utilities.md | 18
-rw-r--r--  doc/development/verifying_database_capabilities.md | 41
-rw-r--r--  doc/development/windows.md | 3
-rw-r--r--  doc/development/work_items.md | 4
-rw-r--r--  doc/development/workhorse/configuration.md | 8
-rw-r--r--  doc/development/workhorse/gitlab_features.md | 2
-rw-r--r--  doc/development/workhorse/index.md | 8
222 files changed, 7042 insertions(+), 5949 deletions(-)
diff --git a/doc/development/adding_database_indexes.md b/doc/development/adding_database_indexes.md
index e80bffe7c18..7ab846cce3e 100644
--- a/doc/development/adding_database_indexes.md
+++ b/doc/development/adding_database_indexes.md
@@ -1,314 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/adding_database_indexes.md'
+remove_date: '2022-11-05'
---
-# Adding Database Indexes
+This document was moved to [another location](database/adding_database_indexes.md).
-Indexes can be used to speed up database queries, but when should you add a new
-index? Traditionally the answer to this question has been to add an index for
-every column used for filtering or joining data. For example, consider the
-following query:
-
-```sql
-SELECT *
-FROM projects
-WHERE user_id = 2;
-```
-
-Here we are filtering by the `user_id` column and as such a developer may decide
-to index this column.
-
-While in certain cases indexing columns using the above approach may make sense,
-it can actually have a negative impact. Whenever you write data to a table, any
-existing indexes must also be updated. The more indexes there are, the slower this
-can potentially become. Indexes can also take up significant disk space, depending
-on the amount of data indexed and the index type. For example, PostgreSQL offers
-`GIN` indexes which can be used to index certain data types that cannot be
-indexed by regular B-tree indexes. These indexes, however, generally take up more
-data and are slower to update compared to B-tree indexes.
-
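As a rough illustration of that trade-off (the table and column names here are hypothetical), a `GIN` index serves containment queries that a B-tree index cannot:

```sql
-- B-tree: suited to equality and range filters on scalar columns.
CREATE INDEX index_projects_on_user_id ON projects (user_id);

-- GIN: needed for containment queries on array or jsonb columns,
-- at a higher storage and update cost.
CREATE INDEX index_projects_on_topics ON projects USING GIN (topics);
SELECT * FROM projects WHERE topics @> ARRAY['database'];
```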
-Because of all this, it's important to make the following considerations
-when adding a new index:
-
-1. Do the new queries re-use as many existing indexes as possible?
-1. Is there enough data that using an index is faster than iterating over
- rows in the table?
-1. Is the overhead of maintaining the index worth the reduction in query
- timings?
-
-## Re-using Queries
-
-The first step is to make sure your query re-uses as many existing indexes as
-possible. For example, consider the following query:
-
-```sql
-SELECT *
-FROM todos
-WHERE user_id = 123
-AND state = 'open';
-```
-
-Now imagine we already have an index on the `user_id` column but not on the
-`state` column. One may think this query performs badly due to `state` being
-unindexed. In reality the query may perform just fine given the index on
-`user_id` can filter out enough rows.
-
-The best way to determine if indexes are re-used is to run your query using
-`EXPLAIN ANALYZE`. Depending on the joined tables and the columns being used for filtering,
-you may find an extra index doesn't make much, if any, difference.
-
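A minimal sketch of such a check, using the query above (the plan shown depends entirely on your data):

```sql
EXPLAIN ANALYZE
SELECT *
FROM todos
WHERE user_id = 123
AND state = 'open';
-- If the output shows an Index Scan on the user_id index with few rows
-- filtered out afterwards, an extra index on state adds write overhead
-- without meaningfully improving the plan.
```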
-In short:
-
-1. Try to write your query in such a way that it re-uses as many existing
- indexes as possible.
-1. Run the query using `EXPLAIN ANALYZE` and study the output to find the most
- ideal query.
-
-## Data Size
-
-A database may not use an index when a regular sequence scan
-(iterating over all rows) is faster, especially for small tables.
-
-Consider adding an index if a table is expected to grow, and your query has to filter a lot of rows.
-You may _not_ want to add an index if the table size is small (<`1,000` records),
-or if existing indexes already filter out enough rows.
-
-## Maintenance Overhead
-
-Indexes have to be updated on every table write. In the case of PostgreSQL, _all_
-existing indexes are updated whenever data is written to a table. As a
-result, having many indexes on the same table slows down writes. It's therefore important
-to balance query performance with the overhead of maintaining an extra index.
-
-Let's say that adding an index reduces SELECT timings by 5 milliseconds but increases
-INSERT/UPDATE/DELETE timings by 10 milliseconds. In this case, the new index may not be worth
-it. A new index is more valuable when SELECT timings are reduced and INSERT/UPDATE/DELETE
-timings are unaffected.
-
-## Finding Unused Indexes
-
-To see which indexes are unused you can run the following query:
-
-```sql
-SELECT relname as table_name, indexrelname as index_name, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelname::regclass))
-FROM pg_stat_all_indexes
-WHERE schemaname = 'public'
-AND idx_scan = 0
-AND idx_tup_read = 0
-AND idx_tup_fetch = 0
-ORDER BY pg_relation_size(indexrelname::regclass) desc;
-```
-
-This query outputs a list containing all indexes that are never used and sorts
-them by index size in descending order. This query helps in
-determining whether existing indexes are still required. More information on
-the meaning of the various columns can be found at
-<https://www.postgresql.org/docs/current/monitoring-stats.html>.
-
-To determine if an index is still being used on production, use the following
-Thanos query with your index name:
-
-```promql
-sum(rate(pg_stat_user_indexes_idx_tup_read{env="gprd", indexrelname="index_ci_name", type="patroni-ci"}[5m]))
-```
-
-Because the query output relies on the actual usage of your database, it
-may be affected by factors such as:
-
-- Certain queries never being executed, thus not being able to use certain
- indexes.
-- Certain tables having little data, resulting in PostgreSQL using sequence
- scans instead of index scans.
-
-This data is only reliable for a frequently used database with
-plenty of data, and using as many GitLab features as possible.
-
-## Requirements for naming indexes
-
-Indexes with complex definitions must be explicitly named rather than
-relying on the implicit naming behavior of migration methods. In short,
-that means you **must** provide an explicit name argument for an index
-created with one or more of the following options:
-
-- `where`
-- `using`
-- `order`
-- `length`
-- `type`
-- `opclass`
-
-### Considerations for index names
-
-Check our [Constraints naming conventions](database/constraint_naming_convention.md) page.
-
-### Why explicit names are required
-
-As Rails is database agnostic, it generates an index name only
-from the required options of all indexes: table name and column names.
-For example, imagine the following two indexes are created in a migration:
-
-```ruby
-def up
- add_index :my_table, :my_column
-
- add_index :my_table, :my_column, where: 'my_column IS NOT NULL'
-end
-```
-
-Creation of the second index would fail, because Rails would generate
-the same name for both indexes.
-
-This naming issue is further complicated by the behavior of the `index_exists?` method.
-It considers only the table name, column names, and uniqueness specification
-of the index when making a comparison. Consider:
-
-```ruby
-def up
- unless index_exists?(:my_table, :my_column, where: 'my_column IS NOT NULL')
- add_index :my_table, :my_column, where: 'my_column IS NOT NULL'
- end
-end
-```
-
-The call to `index_exists?` returns true if **any** index exists on
-`:my_table` and `:my_column`, and index creation is bypassed.
-
-The `add_concurrent_index` helper is a requirement for creating indexes
-on populated tables. Because it cannot be used inside a transactional
-migration, it has a built-in check that detects if the index already
-exists. In the event a match is found, index creation is skipped.
-Without an explicit name argument, Rails can return a false positive
-for `index_exists?`, causing a required index to not be created
-properly. By always requiring a name for certain types of indexes, the
-chance of error is greatly reduced.
-
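For instance, a sketch of the safer pattern using the helpers described above (hypothetical table, column, and index name):

```ruby
INDEX_NAME = 'index_my_table_on_my_column_not_null'

def up
  # An explicit name makes the existence check unambiguous.
  add_concurrent_index :my_table, :my_column,
    where: 'my_column IS NOT NULL', name: INDEX_NAME
end

def down
  remove_concurrent_index_by_name :my_table, INDEX_NAME
end
```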
-## Temporary indexes
-
-There may be times when an index is only needed temporarily.
-
-For example, in a migration, a column of a table might be conditionally
-updated. To find the rows that must be updated while staying within the
-[query performance guidelines](query_performance.md), an index is needed
-that would otherwise not be used.
-
-In these cases, consider a temporary index. To specify a
-temporary index:
-
-1. Prefix the index name with `tmp_` and follow the [naming conventions](database/constraint_naming_convention.md).
-1. Create a follow-up issue to remove the index in the next (or future) milestone.
-1. Add a comment in the migration mentioning the removal issue.
-
-A temporary migration would look like:
-
-```ruby
-INDEX_NAME = 'tmp_index_projects_on_owner_where_emails_disabled'
-
-def up
- # Temporary index to be removed in 13.9 https://gitlab.com/gitlab-org/gitlab/-/issues/1234
- add_concurrent_index :projects, :creator_id, where: 'emails_disabled = false', name: INDEX_NAME
-end
-
-def down
- remove_concurrent_index_by_name :projects, INDEX_NAME
-end
-```
-
-## Create indexes asynchronously
-
-For very large tables, index creation can be a challenge to manage.
-While `add_concurrent_index` creates indexes in a way that does not block
-normal traffic, it can still be problematic when index creation runs for
-many hours. Necessary database operations like `autovacuum` cannot run, and
-on GitLab.com, the deployment process is blocked waiting for index
-creation to finish.
-
-To limit impact on GitLab.com, a process exists to create indexes
-asynchronously during weekend hours. Due to generally lower traffic and fewer deployments,
-index creation can proceed at a lower level of risk.
-
-### Schedule index creation for a low-impact time
-
-1. [Schedule the index to be created](#schedule-the-index-to-be-created).
-1. [Verify the MR was deployed and the index exists in production](#verify-the-mr-was-deployed-and-the-index-exists-in-production).
-1. [Add a migration to create the index synchronously](#add-a-migration-to-create-the-index-synchronously).
-
-### Schedule the index to be created
-
-Create an MR with a post-deployment migration which prepares the index
-for asynchronous creation. An example of creating an index using
-the asynchronous index helpers can be seen in the block below. This migration
-enters the index name and definition into the `postgres_async_indexes`
-table. The process that runs on weekends pulls indexes from this
-table and attempts to create them.
-
-```ruby
-# in db/post_migrate/
-
-INDEX_NAME = 'index_ci_builds_on_some_column'
-
-def up
- prepare_async_index :ci_builds, :some_column, name: INDEX_NAME
-end
-
-def down
- unprepare_async_index :ci_builds, :some_column, name: INDEX_NAME
-end
-```
-
-### Verify the MR was deployed and the index exists in production
-
-You can verify if the MR was deployed to GitLab.com by executing
-`/chatops run auto_deploy status <merge_sha>`. To verify existence of
-the index, you can:
-
-- Use a meta-command in #database-lab, such as: `\d <index_name>`.
- - Ensure that the index is not [`invalid`](https://www.postgresql.org/docs/12/sql-createindex.html#:~:text=The%20psql%20%5Cd%20command%20will%20report%20such%20an%20index%20as%20INVALID).
-- Ask someone in #database to check if the index exists.
-- With proper access, you can also verify directly on production or in a
-production clone.
-
-### Add a migration to create the index synchronously
-
-After the index is verified to exist on the production database, create a second
-merge request that adds the index synchronously. The schema changes must be
-updated and committed to `structure.sql` in this second merge request.
-The synchronous migration results in a no-op on GitLab.com, but you should still add the
-migration as expected for other installations. The below block
-demonstrates how to create the second migration for the previous
-asynchronous example.
-
-**WARNING:**
-Verify that the index exists in production before merging a second migration with `add_concurrent_index`.
-If the second migration is deployed before the index has been created,
-the index is created synchronously when the second migration executes.
-
-```ruby
-# in db/post_migrate/
-
-INDEX_NAME = 'index_ci_builds_on_some_column'
-
-disable_ddl_transaction!
-
-def up
- add_concurrent_index :ci_builds, :some_column, name: INDEX_NAME
-end
-
-def down
- remove_concurrent_index_by_name :ci_builds, INDEX_NAME
-end
-```
-
-## Test database index changes locally
-
-You must test the database index changes locally before creating a merge request.
-
-### Verify indexes created asynchronously
-
-Use the asynchronous index helpers on your local environment to test changes for creating an index:
-
-1. Enable the feature flags by running `Feature.enable(:database_async_index_creation)` and `Feature.enable(:database_reindexing)` in the Rails console.
-1. Run `bundle exec rails db:migrate` so that it creates an entry in the `postgres_async_indexes` table.
-1. Run `bundle exec rails gitlab:db:reindex` so that the index is created asynchronously.
-1. To verify the index, open the PostgreSQL console using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/postgresql.md) command `gdk psql` and run the command `\d <index_name>` to check that your newly created index exists.
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/adding_service_component.md b/doc/development/adding_service_component.md
index 51c6e86bb49..c00d9da5d16 100644
--- a/doc/development/adding_service_component.md
+++ b/doc/development/adding_service_component.md
@@ -60,7 +60,7 @@ NOTE:
Code shipped with GitLab needs to use a license approved by the Legal team. See the list of [existing approved licenses](https://about.gitlab.com/handbook/engineering/open-source/#using-open-source-libraries).
-Notify the [Distribution team](https://about.gitlab.com/handbook/engineering/development/enablement/distribution/) when adding a new dependency that must be compiled. We must be able to compile the dependency on all supported platforms.
+Notify the [Distribution team](https://about.gitlab.com/handbook/engineering/development/enablement/systems/distribution/) when adding a new dependency that must be compiled. We must be able to compile the dependency on all supported platforms.
New services to be bundled with GitLab need to be available in the following environments.
@@ -83,7 +83,7 @@ In order for a service to be bundled for end-users or GitLab.com, it needs to be
Dependencies should be kept up to date and be tracked for security updates. For the Rails codebase, the JavaScript and Ruby dependencies are
scanned for vulnerabilities using GitLab [dependency scanning](../user/application_security/dependency_scanning/index.md).
-In addition, any system dependencies used in Omnibus packages or the Cloud Native images should be added to the [dependency update automation](https://about.gitlab.com/handbook/engineering/development/enablement/distribution/maintenance/dependencies.io.html#adding-new-dependencies).
+In addition, any system dependencies used in Omnibus packages or the Cloud Native images should be added to the [dependency update automation](https://about.gitlab.com/handbook/engineering/development/enablement/systems/distribution/maintenance/dependencies.io.html#adding-new-dependencies).
## Release management
diff --git a/doc/development/api_graphql_styleguide.md b/doc/development/api_graphql_styleguide.md
index 37de7044765..0b36b9b2f2f 100644
--- a/doc/development/api_graphql_styleguide.md
+++ b/doc/development/api_graphql_styleguide.md
@@ -98,11 +98,8 @@ See the [deprecating schema items](#deprecating-schema-items) section for how to
### Breaking change exemptions
-Two scenarios exist where schema items are exempt from the deprecation process,
-and can be removed or changed at any time without notice. These are schema items that either:
-
-- Use the [`feature_flag` property](#feature_flag-property) _and_ the flag is disabled by default.
-- Are [marked as alpha](#marking-schema-items-as-alpha).
+Schema items [marked as alpha](#mark-schema-items-as-alpha) are exempt from the deprecation process,
+and can be removed or changed at any time without notice.
## Global IDs
@@ -216,7 +213,6 @@ Further reading:
- [GraphQL Best Practices Guide](https://graphql.org/learn/best-practices/#nullability).
- GraphQL documentation on [Object types and fields](https://graphql.org/learn/schema/#object-types-and-fields).
-- [GraphQL Best Practices Guide](https://graphql.org/learn/best-practices/#nullability)
- [Using nullability in GraphQL](https://www.apollographql.com/blog/graphql/basics/using-nullability-in-graphql/)
### Exposing Global IDs
@@ -249,8 +245,7 @@ end
NOTE:
For specifics on implementation, see [Pagination implementation](#pagination-implementation).
-GraphQL uses [cursor based
-pagination](https://graphql.org/learn/pagination/#pagination-and-edges)
+GraphQL uses [cursor based pagination](https://graphql.org/learn/pagination/#pagination-and-edges)
to expose collections of items. This provides the clients with a lot
of flexibility while also allowing the backend to use different
pagination models.
@@ -472,93 +467,38 @@ end
## Feature flags
-Developers can add [feature flags](../development/feature_flags/index.md) to GraphQL
-fields in the following ways:
-
-- Add the [`feature_flag` property](#feature_flag-property) to a field. This allows the field to be _hidden_
- from the GraphQL schema when the flag is disabled.
-- [Toggle the return value](#toggle-the-value-of-a-field) when resolving the field.
-
-You can refer to these guidelines to decide which approach to use:
-
-- If your field is experimental, and its name or type is subject to
- change, use the [`feature_flag` property](#feature_flag-property).
-- If your field is stable and its definition doesn't change, even after the flag is
- removed, [toggle the return value](#toggle-the-value-of-a-field) of the field instead. Note that
- [all fields should be nullable](#nullable-fields) anyway.
-- If your field will be accessed from frontend using the `@include` or `@skip` directive, [do not use the `feature_flag` property](#frontend-and-backend-feature-flag-strategies).
-
-### `feature_flag` property
-
-The `feature_flag` property allows you to toggle the field's
-[visibility](https://graphql-ruby.org/authorization/visibility.html)
-in the GraphQL schema. This removes the field from the schema
-when the flag is disabled.
-
-A description is [appended](https://gitlab.com/gitlab-org/gitlab/-/blob/497b556/app/graphql/types/base_field.rb#L44-53)
-to the field indicating that it is behind a feature flag.
-
-WARNING:
-If a client queries for the field when the feature flag is disabled, the query
-fails. Consider this when toggling the visibility of the feature on or off on
-production.
-
-The `feature_flag` property does not allow the use of
-[feature gates based on actors](../development/feature_flags/index.md).
-This means that the feature flag cannot be toggled only for particular
-projects, groups, or users, but instead can only be toggled globally for
-everyone.
-
-Example:
-
-```ruby
-field :test_field, type: GraphQL::Types::String,
- null: true,
- description: 'Some test field.',
- feature_flag: :my_feature_flag
-```
-
-### Frontend and Backend feature flag strategies
-
-#### Directives
-
-When feature flags are used in the frontend to control the `@include` and `@skip` directives, do not use the `feature_flag`
-property on the server-side. For the accepted backend workaround, see [Toggle the value of a field](#toggle-the-value-of-a-field). It is recommended that the feature flag used in this approach be the same for frontend and backend.
-
-Even if the frontend directives evaluate to `@include:false` / `@skip:true`, the guarded entity is sent to the backend and matched
-against the GraphQL schema. We would then get an exception due to a schema mismatch. See the [frontend GraphQL guide](../development/fe_guide/graphql.md#the-include-directive) for more guidance.
+You can implement [feature flags](../development/feature_flags/index.md) in GraphQL to toggle:
-#### Different versions of a query
+- The return value of a field.
+- The behavior of an argument or mutation.
-See the guide frontend GraphQL guide for [different versions of a query](../development/fe_guide/graphql.md#different-versions-of-a-query), and [why it is not the preferred approach](../development/fe_guide/graphql.md#avoiding-multiple-query-versions)
-
-### Toggle the value of a field
-
-This method of using feature flags for fields is to toggle the
-return value of the field. This can be done in the resolver, in the
+This can be done in a resolver, in the
type, or even in a model method, depending on your preference and
situation.
-Consider also [marking the field as Alpha](#marking-schema-items-as-alpha)
-while the value of the field can be toggled. You can
-[change or remove Alpha fields at any time](#breaking-change-exemptions) without needing to deprecate them.
-This also signals to consumers of the public GraphQL API that the field is not
+NOTE:
+It's recommended that you also [mark the item as Alpha](#mark-schema-items-as-alpha) while it is behind a feature flag.
+This signals to consumers of the public GraphQL API that the field is not
meant to be used yet.
+You can also
+[change or remove Alpha items at any time](#breaking-change-exemptions) without needing to deprecate them. When the flag is removed, "release"
+the schema item by removing its Alpha property to make it public.
-When applying a feature flag to toggle the value of a field, the
-`description` of the field must:
+### Descriptions for feature flagged items
-- State that the value of the field can be toggled by a feature flag.
+When using a feature flag to toggle the value or behavior of a schema item, the
+`description` of the item must:
+
+- State that the value or behavior can be toggled by a feature flag.
- Name the feature flag.
-- State what the field returns when the feature flag is disabled (or
+- State what the field returns, or behavior is, when the feature flag is disabled (or
enabled, if more appropriate).
-Example:
+Example of a feature-flagged field:
```ruby
-field :foo, GraphQL::Types::String,
- null: true,
- deprecated: { reason: :alpha, milestone: '10.0' },
+field :foo, GraphQL::Types::String, null: true,
+ alpha: { milestone: '10.0' },
description: 'Some test field. Returns `null`' \
'if `my_feature_flag` feature flag is disabled.'
@@ -567,6 +507,26 @@ def foo
end
```
+Example of a feature-flagged argument:
+
+```ruby
+argument :foo, type: GraphQL::Types::String, required: false,
+ alpha: { milestone: '10.0' },
+ description: 'Some test argument. Is ignored if ' \
+ '`my_feature_flag` feature flag is disabled.'
+
+def resolve(args)
+ args.delete(:foo) unless Feature.enabled?(:my_feature_flag, object)
+ # ...
+end
+```
+
+### `feature_flag` property (deprecated)
+
+NOTE:
+This property is deprecated and should no longer be used. The property
+has been temporarily renamed to `_deprecated_feature_flag` and support for it will be removed in [#369202](https://gitlab.com/gitlab-org/gitlab/-/issues/369202).
+
## Deprecating schema items
The GitLab GraphQL API is versionless, which means we maintain backwards
@@ -586,6 +546,7 @@ To deprecate a schema item in GraphQL:
See also:
- [Aliasing and deprecating mutations](#aliasing-and-deprecating-mutations).
+- [Marking schema items as Alpha](#mark-schema-items-as-alpha).
- [How to filter Kibana for queries that used deprecated fields](graphql_guide/monitoring.md#queries-that-used-a-deprecated-field).
### Create a deprecation issue
@@ -715,7 +676,7 @@ To allow clients to continue to interact with the mutation unchanged, edit the `
```ruby
DEPRECATIONS = [
- Deprecation.new(old_model_name: 'PrometheusService', new_model_name: 'Integrations::Prometheus', milestone: '14.0')
+ Gitlab::Graphql::DeprecationsBase::NameDeprecation.new(old_name: 'PrometheusService', new_name: 'Integrations::Prometheus', milestone: '14.0')
].freeze
```
@@ -745,28 +706,37 @@ aware of the support.
The documentation will mention that the old Global ID style is now deprecated.
-## Marking schema items as Alpha
+## Mark schema items as Alpha
-Fields, arguments, enum values, and mutations can be marked as being in
-[alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga).
+You can mark GraphQL schema items (fields, arguments, enum values, and mutations) as
+[Alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga).
-An item marked as "alpha" is exempt from the deprecation process and can be removed
-at any time without notice.
+An item marked as Alpha is [exempt from the deprecation process](#breaking-change-exemptions) and can be removed
+at any time without notice. Mark an item as Alpha when it is
+subject to change and not ready for public use.
-This leverages GraphQL deprecations to cause the schema item to appear as deprecated,
-and will be described as being in "alpha" in our generated docs and its GraphQL description.
+NOTE:
+Only mark new items as Alpha. Never mark existing items
+as Alpha because they're already public.
-To mark a schema item as being in "alpha", use the `deprecated:` keyword with `reason: :alpha`.
-You must provide the `milestone:` that introduced the alpha item.
+To mark a schema item as Alpha, use the `alpha:` keyword.
+You must provide the `milestone:` that introduced the Alpha item.
For example:
```ruby
field :token, GraphQL::Types::String, null: true,
- deprecated: { reason: :alpha, milestone: '10.0' },
+ alpha: { milestone: '10.0' },
description: 'Token for login.'
```
+Alpha GraphQL items are a custom GitLab feature that leverages GraphQL deprecations. An Alpha item
+appears as deprecated in the GraphQL schema. Like all deprecated schema items, you can test an
+Alpha field in [GraphiQL](../api/graphql/index.md#graphiql). However, be aware that the GraphiQL
+autocomplete editor doesn't suggest deprecated fields.
+
+The item shows as Alpha in our generated GraphQL documentation and its GraphQL schema description.
+
## Enums
GitLab GraphQL enums are defined in `app/graphql/types`. When defining new enums, the
@@ -816,7 +786,7 @@ Enum values can be deprecated using the
### Defining GraphQL enums dynamically from Rails enums
-If your GraphQL enum is backed by a [Rails enum](creating_enums.md), then consider
+If your GraphQL enum is backed by a [Rails enum](database/creating_enums.md), then consider
using the Rails enum to dynamically define the GraphQL enum values. Doing so
binds the GraphQL enum values to the Rails enum definition, so if values are
ever added to the Rails enum then the GraphQL enum automatically reflects the change.
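A sketch of what that dynamic definition can look like (the model and enum names here are illustrative only):

```ruby
module Types
  class IssueSeverityEnum < BaseEnum
    graphql_name 'IssueSeverity'

    # One GraphQL value per Rails enum key; keys added to the Rails
    # enum later appear in the GraphQL enum automatically.
    ::Issue.severities.each_key do |severity|
      value severity.upcase, value: severity, description: "#{severity.titleize} severity."
    end
  end
end
```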
@@ -1641,8 +1611,8 @@ correctly rendered to the clients.
### Errors in mutations
-We encourage following the practice of [errors as
-data](https://graphql-ruby.org/mutations/mutation_errors) for mutations, which
+We encourage following the practice of
+[errors as data](https://graphql-ruby.org/mutations/mutation_errors) for mutations, which
distinguishes errors by who they are relevant to, defined by who can deal with
them.
diff --git a/doc/development/api_styleguide.md b/doc/development/api_styleguide.md
index df2f3c337cd..b72ef1bffc4 100644
--- a/doc/development/api_styleguide.md
+++ b/doc/development/api_styleguide.md
@@ -59,7 +59,7 @@ end
## Declared parameters
-> Grape allows you to access only the parameters that have been declared by your
+Grape allows you to access only the parameters that have been declared by your
`params` block. It filters out the parameters that have been passed, but are not
allowed.
@@ -67,7 +67,7 @@ allowed.
### Exclude parameters from parent namespaces
-> By default `declared(params)`includes parameters that were defined in all
+By default `declared(params)`includes parameters that were defined in all
parent namespaces.
– <https://github.com/ruby-grape/grape#include-parent-namespaces>
@@ -110,15 +110,15 @@ Model.create(foo: params[:foo])
With Grape v1.3+, Array types must be defined with a `coerce_with`
block, or parameters, fails to validate when passed a string from an
-API request. See the [Grape upgrading
-documentation](https://github.com/ruby-grape/grape/blob/master/UPGRADING.md#ensure-that-array-types-have-explicit-coercions)
+API request. See the
+[Grape upgrading documentation](https://github.com/ruby-grape/grape/blob/master/UPGRADING.md#ensure-that-array-types-have-explicit-coercions)
for more details.
### Automatic coercion of nil inputs
Prior to Grape v1.3.3, Array parameters with `nil` values would
-automatically be coerced to an empty Array. However, due to [this pull
-request in v1.3.3](https://github.com/ruby-grape/grape/pull/2040), this
+automatically be coerced to an empty Array. However, due to
+[this pull request in v1.3.3](https://github.com/ruby-grape/grape/pull/2040), this
is no longer the case. For example, suppose you define a PUT `/test`
request that has an optional parameter:
@@ -259,8 +259,8 @@ In situations where the same model has multiple entities in the API
discretion with applying this scope. It may be that you optimize for the
most basic entity, with successive entities building upon that scope.
-The `with_api_entity_associations` scope also [automatically preloads
-data](https://gitlab.com/gitlab-org/gitlab/-/blob/19f74903240e209736c7668132e6a5a735954e7c/app%2Fmodels%2Ftodo.rb#L34)
+The `with_api_entity_associations` scope also
+[automatically preloads data](https://gitlab.com/gitlab-org/gitlab/-/blob/19f74903240e209736c7668132e6a5a735954e7c/app%2Fmodels%2Ftodo.rb#L34)
for `Todo` _targets_ when returned in the [to-dos API](../api/todos.md).
For more context and discussion about preloading see
diff --git a/doc/development/application_limits.md b/doc/development/application_limits.md
index 2826b8a3bc4..ceb3c124d1a 100644
--- a/doc/development/application_limits.md
+++ b/doc/development/application_limits.md
@@ -15,8 +15,7 @@ First of all, you have to gather information and decide which are the different
limits that are set for the different GitLab tiers. Coordinate with others to [document](../administration/instance_limits.md)
and communicate those limits.
-There is a guide about [introducing application
-limits](https://about.gitlab.com/handbook/product/product-processes/#introducing-application-limits).
+There is a guide about [introducing application limits](https://about.gitlab.com/handbook/product/product-processes/#introducing-application-limits).
## Implement plan limits
diff --git a/doc/development/application_slis/index.md b/doc/development/application_slis/index.md
index 8d7941865e1..27e69ff3445 100644
--- a/doc/development/application_slis/index.md
+++ b/doc/development/application_slis/index.md
@@ -8,16 +8,14 @@ info: To determine the technical writer assigned to the Stage/Group associated w
> [Introduced](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525) in GitLab 14.4
-It is possible to define [Service Level Indicators
-(SLIs)](https://en.wikipedia.org/wiki/Service_level_indicator)
+It is possible to define [Service Level Indicators (SLIs)](https://en.wikipedia.org/wiki/Service_level_indicator)
directly in the Ruby codebase. This keeps the definition of operations
and their success close to the implementation and allows the people
building features to easily define how these features should be
monitored.
Defining an SLI causes 2
-[Prometheus
-counters](https://prometheus.io/docs/concepts/metric_types/#counter)
+[Prometheus counters](https://prometheus.io/docs/concepts/metric_types/#counter)
to be emitted from the Rails application:
- `gitlab_sli:<sli name>:total`: incremented for each operation.
@@ -47,10 +45,9 @@ for clarity, they define different metric names:
As shown in this example, they can share a base name (`foo` in this example). We
recommend this when they refer to the same operation.
-Before the first scrape, it is important to have [initialized the SLI
-with all possible
-label-combinations](https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics). This
-avoid confusing results when using these counters in calculations.
+Before the first scrape, it is important to have
+[initialized the SLI with all possible label-combinations](https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics).
+This avoids confusing results when using these counters in calculations.
To initialize an SLI, use the `.initialize_sli` class method, for
example:
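A minimal sketch of that call, assuming the Apdex SLI class (the SLI name and label values are illustrative):

```ruby
# Declare every label combination up front so counters exist before the
# first scrape.
Gitlab::Metrics::Sli::Apdex.initialize_sli(:foo, [
  { feature_category: :team_planning, endpoint_id: 'foo_endpoint' },
  { feature_category: :code_review, endpoint_id: 'bar_endpoint' }
])
```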
diff --git a/doc/development/application_slis/rails_request_apdex.md b/doc/development/application_slis/rails_request_apdex.md
index 3e3cd100183..033bffbf608 100644
--- a/doc/development/application_slis/rails_request_apdex.md
+++ b/doc/development/application_slis/rails_request_apdex.md
@@ -231,8 +231,7 @@ end
### Error budget attribution and ownership
This SLI is used for service level monitoring. It feeds into the
-[error budget for stage
-groups](../stage_group_observability/index.md#error-budget). For this
+[error budget for stage groups](../stage_group_observability/index.md#error-budget). For this
particular SLI, we have opted everyone out by default to give time to
set the correct urgencies on endpoints before it affects a group's
error budget.
diff --git a/doc/development/architecture.md b/doc/development/architecture.md
index a61a891b096..10d6c0ae9c9 100644
--- a/doc/development/architecture.md
+++ b/doc/development/architecture.md
@@ -89,7 +89,7 @@ new features and services must be written to consider Kubernetes compatibility *
The simplest way to ensure this, is to add support for your feature or service to
[the official GitLab Helm chart](https://docs.gitlab.com/charts/) or reach out to
-[the Distribution team](https://about.gitlab.com/handbook/engineering/development/enablement/distribution/#how-to-work-with-distribution).
+[the Distribution team](https://about.gitlab.com/handbook/engineering/development/enablement/systems/distribution/#how-to-work-with-distribution).
Refer to the [process for adding new service components](adding_service_component.md) for more details.
diff --git a/doc/development/audit_event_guide/index.md b/doc/development/audit_event_guide/index.md
index 14cd2fd1dc3..0c66189a6f6 100644
--- a/doc/development/audit_event_guide/index.md
+++ b/doc/development/audit_event_guide/index.md
@@ -29,13 +29,13 @@ If you have any questions, please reach out to `@gitlab-org/manage/compliance` t
To instrument an audit event, the following attributes should be provided:
-| Attribute | Type | Required? | Description |
-|:-------------|:---------------------|:----------|:-----------------------------------------------------------------|
-| `name` | String | false | Action name to be audited. Used for error tracking |
-| `author` | User | true | User who authors the change |
-| `scope` | User, Project, Group | true | Scope which the audit event belongs to |
-| `target` | Object | true | Target object being audited |
-| `message` | String | true | Message describing the action |
+| Attribute | Type | Required? | Description |
+|:-------------|:---------------------|:----------|:------------------------------------------------------------------|
+| `name` | String | false | Action name to be audited. Used for error tracking |
+| `author` | User | true | User who authors the change |
+| `scope` | User, Project, Group | true | Scope which the audit event belongs to |
+| `target` | Object | true | Target object being audited |
+| `message` | String | true | Message describing the action ([not translated](#i18n-and-the-audit-event-message-attribute)) |
| `created_at` | DateTime | false | The time when the action occurred. Defaults to `DateTime.current` |
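Taken together, a call supplying these attributes might look like this sketch (the event name and objects are illustrative, not from a real event):

```ruby
audit_context = {
  name: 'update_approval_rules',   # optional; used for error tracking
  author: current_user,            # the User authoring the change
  scope: project,                  # a User, Project, or Group
  target: approval_rule,           # the object being audited
  message: 'Attempted to update an approval rule' # not translated
}

::Gitlab::Audit::Auditor.audit(audit_context)
```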
## How to instrument new Audit Events
@@ -195,7 +195,7 @@ deactivate B
```
In addition to recording to the database, we also write these events to
-[a log file](../../administration/logs.md#audit_jsonlog).
+[a log file](../../administration/logs/index.md#audit_jsonlog).
## Event streaming
@@ -210,3 +210,10 @@ a large amount of data. See [this merge request](https://gitlab.com/gitlab-org/g
for an example.
This feature is under heavy development. Follow the [parent epic](https://gitlab.com/groups/gitlab-org/-/epics/5925) for updates on feature
development.
+
+### I18N and the Audit Event `:message` attribute
+
+We intentionally do not translate audit event messages because translated messages would be saved in the database and served to users, regardless of their locale settings.
+
+This could mean, for example, that we use the locale for the currently logged in user to record an audit event message and stream the message to an external streaming
+destination in the wrong language for that destination. Users could find that confusing.
diff --git a/doc/development/auto_devops.md b/doc/development/auto_devops.md
index 2989e10a124..55ab234cc68 100644
--- a/doc/development/auto_devops.md
+++ b/doc/development/auto_devops.md
@@ -20,8 +20,8 @@ based on your project contents. When Auto DevOps is enabled for a
project, the user does not need to explicitly include any pipeline configuration
through a [`.gitlab-ci.yml` file](../ci/yaml/index.md).
-In the absence of a `.gitlab-ci.yml` file, the [Auto DevOps CI
-template](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Auto-DevOps.gitlab-ci.yml)
+In the absence of a `.gitlab-ci.yml` file, the
+[Auto DevOps CI/CD template](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Auto-DevOps.gitlab-ci.yml)
is used implicitly to configure the pipeline for the project. This
template is a top-level template that includes other sub-templates,
which then defines jobs.
diff --git a/doc/development/backend/create_source_code_be/index.md b/doc/development/backend/create_source_code_be/index.md
index ad4e25dc815..e1ee78731de 100644
--- a/doc/development/backend/create_source_code_be/index.md
+++ b/doc/development/backend/create_source_code_be/index.md
@@ -13,7 +13,7 @@ of the [Create stage](https://about.gitlab.com/handbook/product/categories/#crea
of the [DevOps lifecycle](https://about.gitlab.com/handbook/product/categories/#devops-stages).
We interface with the Gitaly and Code Review teams, and work closely with the
-[Create:Source Code Frontend team](https://about.gitlab.com/handbook/engineering/development/dev/create-source-code-fe). The features
+[Create:Source Code Frontend team](https://about.gitlab.com/handbook/engineering/development/dev/create/create-source-code-fe/). The features
we work with are listed on the
[Features by Group Page](https://about.gitlab.com/handbook/product/categories/features/#createsource-code-group).
diff --git a/doc/development/backend/ruby_style_guide.md b/doc/development/backend/ruby_style_guide.md
index c86f21d4bac..a9fee02a15a 100644
--- a/doc/development/backend/ruby_style_guide.md
+++ b/doc/development/backend/ruby_style_guide.md
@@ -16,6 +16,8 @@ document the new rule. For every new guideline, add it in a new section and link
[version history note](../documentation/versions.md#add-a-version-history-item)
to provide context and serve as a reference.
+See also [guidelines for reusing abstractions](../reusing_abstractions.md).
+
Everything listed here can be [reopened for discussion](https://about.gitlab.com/handbook/values/#disagree-commit-and-disagree).
## Instance variable access using `attr_reader`
diff --git a/doc/development/build_test_package.md b/doc/development/build_test_package.md
index 89b13efc1aa..4645bd02d9e 100644
--- a/doc/development/build_test_package.md
+++ b/doc/development/build_test_package.md
@@ -13,8 +13,8 @@ pipeline that can be used to trigger a pipeline in the Omnibus GitLab repository
that will create:
- A deb package for Ubuntu 16.04, available as a build artifact, and
-- A Docker image, which is pushed to the [Omnibus GitLab container
- registry](https://gitlab.com/gitlab-org/omnibus-gitlab/container_registry)
+- A Docker image, which is pushed to the
+ [Omnibus GitLab container registry](https://gitlab.com/gitlab-org/omnibus-gitlab/container_registry)
(images titled `gitlab-ce` and `gitlab-ee` respectively and image tag is the
commit which triggered the pipeline).
diff --git a/doc/development/cached_queries.md b/doc/development/cached_queries.md
index b0bf7c7b6f5..7af4c302e93 100644
--- a/doc/development/cached_queries.md
+++ b/doc/development/cached_queries.md
@@ -1,6 +1,6 @@
---
stage: Data Stores
-group: Memory
+group: Application Performance
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/caching.md b/doc/development/caching.md
index 7c51bd595f7..5ae6484436e 100644
--- a/doc/development/caching.md
+++ b/doc/development/caching.md
@@ -118,8 +118,8 @@ Is the cache being added "worthy"? This can be hard to measure, but you can cons
- `tail -f log/development.log | grep "Rendered "`
- After you're looking in the right place:
- Remove or comment out sections of code until you find the cause.
- - Use `binding.pry` to poke about in live requests. This requires a foreground
- web process like [Thin](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/pry.md).
+ - Use `binding.pry` to poke about in live requests. This requires a
+ [foreground web process](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/pry.md).
#### Verification
@@ -301,10 +301,10 @@ it's time to look at a custom solution:
In short: the oldest stuff is replaced with new stuff:
-- A [useful article](https://redis.io/topics/lru-cache) about configuring Redis as an LRU cache.
+- A [useful article](https://redis.io/docs/manual/eviction/) about configuring Redis as an LRU cache.
- Lots of options for different cache eviction strategies.
- You probably want `allkeys-lru`, which is functionally similar to Memcached.
-- In Redis 4.0 and later, [allkeys-lfu is available](https://redis.io/topics/lru-cache#the-new-lfu-mode),
+- In Redis 4.0 and later, [allkeys-lfu is available](https://redis.io/docs/manual/eviction/#the-new-lfu-mode),
which is similar but different.
- We handle all explicit deletes using UNLINK instead of DEL now, which allows Redis to
reclaim memory in its own time, rather than immediately.
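As one concrete illustration, an LRU eviction policy is set in `redis.conf` along these lines (the values are illustrative, not our production settings):

```
maxmemory 2gb
maxmemory-policy allkeys-lru
```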
diff --git a/doc/development/changelog.md b/doc/development/changelog.md
index 83919bab671..c5b234069e3 100644
--- a/doc/development/changelog.md
+++ b/doc/development/changelog.md
@@ -190,8 +190,8 @@ editor. Once closed, Git presents you with a new text editor instance to edit
the commit message of commit B. Add the trailer, then save and quit the editor.
If all went well, commit B is now updated.
-For more information about interactive rebases, take a look at [the Git
-documentation](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
+For more information about interactive rebases, take a look at
+[the Git documentation](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
---
diff --git a/doc/development/chatops_on_gitlabcom.md b/doc/development/chatops_on_gitlabcom.md
index 2065021c61b..7309b92c702 100644
--- a/doc/development/chatops_on_gitlabcom.md
+++ b/doc/development/chatops_on_gitlabcom.md
@@ -59,5 +59,5 @@ To request access to ChatOps on GitLab.com:
## See also
- [ChatOps Usage](../ci/chatops/index.md)
-- [Understanding EXPLAIN plans](understanding_explain_plans.md)
+- [Understanding EXPLAIN plans](database/understanding_explain_plans.md)
- [Feature Groups](feature_flags/index.md#feature-groups)
diff --git a/doc/development/code_intelligence/index.md b/doc/development/code_intelligence/index.md
index 3a8845084c3..a89730383e4 100644
--- a/doc/development/code_intelligence/index.md
+++ b/doc/development/code_intelligence/index.md
@@ -35,8 +35,8 @@ sequenceDiagram
Workhorse-->>-Runner: request results
```
-1. The CI/CD job generates a document in an LSIF format (usually `dump.lsif`) using [an
- indexer](https://lsif.dev) for the language of a project. The format
+1. The CI/CD job generates a document in an LSIF format (usually `dump.lsif`) using
+ [an indexer](https://lsif.dev) for the language of a project. The format
[describes](https://github.com/sourcegraph/sourcegraph/blob/main/doc/code_intelligence/explanations/writing_an_indexer.md)
interactions between a method or function and its definitions or references. The
document is marked to be stored as an LSIF report artifact.
diff --git a/doc/development/code_review.md b/doc/development/code_review.md
index 1225260e600..e9e546c6f9b 100644
--- a/doc/development/code_review.md
+++ b/doc/development/code_review.md
@@ -62,6 +62,9 @@ Team members' domain expertise can be viewed on the [engineering projects](https
### Reviewer roulette
+NOTE:
+Reviewer roulette is an internal tool for use on GitLab.com, and is not available on customer installations.
+
The [Danger bot](dangerbot.md) randomly picks a reviewer and a maintainer for
each area of the codebase that your merge request seems to touch. It only makes
**recommendations** and you should override it if you think someone else is a better
@@ -110,6 +113,12 @@ The [Roulette dashboard](https://gitlab-org.gitlab.io/gitlab-roulette) contains:
For more information, review [the roulette README](https://gitlab.com/gitlab-org/gitlab-roulette).
+As an experiment, we want to introduce a `local` reviewer status for database reviews. Local reviewers
+focus on work from their own team or stage, but not outside of it. This focus helps them build deep domain
+knowledge. We are not changing the reviewer roulette until we evaluate the impact of, and feedback from, this
+experiment. Please respect reviewers who decline reviews because they focus on `local` reviews. For tracking
+purposes, use `- reviewer database local` instead of `- reviewer database` in your personal YAML file entry.
+
### Approval guidelines
As described in the section on the responsibility of the maintainer below, you
@@ -135,7 +144,7 @@ with [domain expertise](#domain-experts).
More information about license compatibility can be found in our
[GitLab Licensing and Compatibility documentation](licensing.md).
1. If your merge request includes a new dependency or a file system change, it must be
- **approved by a [Distribution team member](https://about.gitlab.com/company/team/)**. See how to work with the [Distribution team](https://about.gitlab.com/handbook/engineering/development/enablement/distribution/#how-to-work-with-distribution) for more details.
+ **approved by a [Distribution team member](https://about.gitlab.com/company/team/)**. See how to work with the [Distribution team](https://about.gitlab.com/handbook/engineering/development/enablement/systems/distribution/#how-to-work-with-distribution) for more details.
1. If your merge request includes documentation changes, it must be **approved
by a [Technical writer](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments)**,
based on assignments in the appropriate [DevOps stage group](https://about.gitlab.com/handbook/product/categories/#devops-stages).
@@ -144,7 +153,7 @@ with [domain expertise](#domain-experts).
by a [Software Engineer in Test](https://about.gitlab.com/handbook/engineering/quality/#individual-contributors)**.
1. If your merge request only includes end-to-end changes (*4*) **or** if the MR author is a [Software Engineer in Test](https://about.gitlab.com/handbook/engineering/quality/#individual-contributors), it must be **approved by a [Quality maintainer](https://about.gitlab.com/handbook/engineering/projects/#gitlab_maintainers_qa)**
1. If your merge request includes a new or updated [application limit](https://about.gitlab.com/handbook/product/product-processes/#introducing-application-limits), it must be **approved by a [product manager](https://about.gitlab.com/company/team/)**.
-1. If your merge request includes Product Intelligence (telemetry or analytics) changes, it should be reviewed and approved by a [Product Intelligence engineer](https://gitlab.com/gitlab-org/growth/product-intelligence/engineers).
+1. If your merge request includes Product Intelligence (telemetry or analytics) changes, it should be reviewed and approved by a [Product Intelligence engineer](https://gitlab.com/gitlab-org/analytics-section/product-intelligence/engineers).
1. If your merge request includes an addition of, or changes to a [Feature spec](testing_guide/testing_levels.md#frontend-feature-tests), it must be **approved by a [Quality maintainer](https://about.gitlab.com/handbook/engineering/projects/#gitlab_maintainers_qa) or [Quality reviewer](https://about.gitlab.com/handbook/engineering/projects/#gitlab_reviewers_qa)**.
1. If your merge request introduces a new service to GitLab (Puma, Sidekiq, Gitaly are examples), it must be **approved by a [product manager](https://about.gitlab.com/company/team/)**. See the [process for adding a service component to GitLab](adding_service_component.md) for details.
1. If your merge request includes changes related to authentication or authorization, it must be **approved by a [Manage:Authentication and Authorization team member](https://about.gitlab.com/company/team/)**. Check the [code review section on the group page](https://about.gitlab.com/handbook/engineering/development/dev/manage/authentication-and-authorization/#additional-considerations) for more details. Patterns for files known to require review from the team are listed in the in the `Authentication and Authorization` section of the [`CODEOWNERS`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/CODEOWNERS) file, and the team will be listed in the approvers section of all merge requests that modify these files.
@@ -176,7 +185,7 @@ See the [test engineering process](https://about.gitlab.com/handbook/engineering
1. I have tested this MR in [all supported browsers](../install/requirements.md#supported-web-browsers), or determined that this testing is not needed.
1. I have confirmed that this change is [backwards compatible across updates](multi_version_compatibility.md), or I have decided that this does not apply.
1. I have properly separated EE content from FOSS, or this MR is FOSS only.
- - [Where should EE code go?](ee_features.md#separation-of-ee-code)
+ - [Where should EE code go?](ee_features.md)
1. I have considered that existing data may be surprisingly varied. For example, a new model validation can break existing records. Consider making validation on existing data optional rather than required if you haven't confirmed that existing data will pass validation.
##### Performance, reliability, and availability
@@ -278,13 +287,29 @@ This saves reviewers time and helps authors catch mistakes earlier.
### The responsibility of the reviewer
+Reviewers are responsible for reviewing the specifics of the chosen solution.
+
[Review the merge request](#reviewing-a-merge-request) thoroughly.
Verify that the merge request meets all [contribution acceptance criteria](contributing/merge_request_workflow.md#contribution-acceptance-criteria).
-If a merge request is too large, fixes more than one issue, or implements more
-than one feature, you should guide the author towards splitting the merge request
-into smaller merge requests.
+Some merge requests may require domain experts to help with the specifics.
+If reviewers are not domain experts in the area, they can do any of the following:
+
+- Review the merge request and loop in a domain expert for another review. This expert
+ can either be another reviewer or a maintainer.
+- Pass the review to another reviewer they deem more suitable.
+- If no domain experts are available, review on a best-effort basis.
+
+You should guide the author towards splitting the merge request into smaller merge requests if it:
+
+- Is too large.
+- Fixes more than one issue.
+- Implements more than one feature.
+- Has high complexity that results in additional risk.
+
+The author may choose to request that the current maintainers and reviewers review the split MRs
+or request a new group of maintainers and reviewers.
When you are confident
that it meets all requirements, you should:
@@ -308,19 +333,6 @@ Because a maintainer's job only depends on their knowledge of the overall GitLab
codebase, and not that of any specific domain, they can review, approve, and merge
merge requests from any team and in any product area.
-A maintainer should ask the author to make a merge request smaller if it is:
-
-- Too large.
-- Fixes more than one issue.
-- Implements more than one feature.
-- Has a high complexity resulting in additional risk.
-
-The maintainer, any of the
-reviewers, or a merge request coach can step up to help the author to divide work
-into smaller iterations, and guide the author on how to split the merge request.
-The author may choose to request that the current maintainers and reviewers review the split MRs
-or request a new group of maintainers and reviewers.
-
Maintainers do their best to also review the specifics of the chosen solution
before merging, but as they are not necessarily [domain experts](#domain-experts), they may be poorly
placed to do so without an unreasonable investment of time. In those cases, they
@@ -401,7 +413,7 @@ first time.
of your shiny new branch, read through the entire diff. Does it make sense?
Did you include something unrelated to the overall purpose of the changes? Did
you forget to remove any debugging code?
-- Write a detailed description as outlined in the [merge request guidelines](contributing/merge_request_workflow.md#merge-request-guidelines).
+- Write a detailed description as outlined in the [merge request guidelines](contributing/merge_request_workflow.md#merge-request-guidelines-for-contributors).
Some reviewers may not be familiar with the product feature or area of the
codebase. Thorough descriptions help all reviewers understand your request
and test effectively.
@@ -445,7 +457,7 @@ You can also use `workflow::ready for review` label. That means that your merge
When your merge request receives an approval from the first reviewer it can be passed to a maintainer. You should default to choosing a maintainer with [domain expertise](#domain-experts), and otherwise follow the Reviewer Roulette recommendation or use the label `ready for merge`.
-Sometimes, a maintainer may not be available for review. They could be out of the office or [at capacity](#review-response-slo).
+Sometimes, a maintainer may not be available for review. They could be out of the office or [at capacity](https://about.gitlab.com/handbook/engineering/workflow/code-review/#review-response-slo).
You can and should check the maintainer's availability in their profile. If the maintainer recommended by
the roulette is not available, choose someone else from that list.
@@ -466,6 +478,8 @@ experience, refactors the existing code). Then:
- Offer alternative implementations, but assume the author already considered
them. ("What do you think about using a custom validator here?")
- Seek to understand the author's perspective.
+- Check out the branch, and test the changes locally. You can decide how much manual testing you want to perform.
+ Your testing might result in opportunities to add automated tests.
- If you don't understand a piece of code, _say so_. There's a good chance
someone else would be confused by it as well.
- Ensure the author is clear on what is required from them to address/resolve the suggestion.
@@ -494,34 +508,29 @@ Before taking the decision to merge:
- If the MR contains both Quality and non-Quality-related changes, the MR should be merged by the relevant maintainer for user-facing changes (backend, frontend, or database) after the Quality related changes are approved by a Software Engineer in Test.
If a merge request is fundamentally ready, but needs only trivial fixes (such as
-typos), consider demonstrating a [bias for
-action](https://about.gitlab.com/handbook/values/#bias-for-action) by making
-those changes directly without going back to the author. You can do this by
+typos), consider demonstrating a [bias for action](https://about.gitlab.com/handbook/values/#bias-for-action)
+by making those changes directly without going back to the author. You can do this by
using the [suggest changes](../user/project/merge_requests/reviews/suggestions.md) feature to apply
your own suggestions to the merge request. Note that:
- If the changes are not straightforward, please prefer allowing the author to make the change.
- **Before applying suggestions**, edit the merge request to make sure
- [squash and
- merge](../user/project/merge_requests/squash_and_merge.md#squash-and-merge)
+ [squash and merge](../user/project/merge_requests/squash_and_merge.md#squash-and-merge)
is enabled, otherwise, the pipeline's Danger job fails.
- If a merge request does not have squash and merge enabled, and it
has more than one commit, then see the note below about rewriting
commit history.
-As a maintainer, if a merge request that you authored has received all required approvals, it is acceptable to show a [bias for action](https://about.gitlab.com/handbook/values/#bias-for-action) and merge your own MR, if:
-
-- The last maintainer to review intended to start the merge and did not, OR
-- The last maintainer to review started the merge, but some trivial chore caused the pipeline to break. For example, the MR might need a rebase first because of unrelated pipeline issues, or some files might need to be regenerated (like `gitlab.pot`).
- - "Trivial" is a subjective measure but we expect project maintainers to exercise their judgement carefully and cautiously.
+Authors are not authorized to merge their own merge requests and must find another maintainer to perform the merge.
+This policy is in place to satisfy the CHG-04 control of the GitLab
+[Change Management Controls](https://about.gitlab.com/handbook/engineering/security/security-assurance/security-compliance/guidance/change-management.html).
When ready to merge:
WARNING:
**If the merge request is from a fork, also check the [additional guidelines for community contributions](#community-contributions).**
-- Consider using the [Squash and
- merge](../user/project/merge_requests/squash_and_merge.md#squash-and-merge)
+- Consider using the [Squash and merge](../user/project/merge_requests/squash_and_merge.md#squash-and-merge)
feature when the merge request has a lot of commits.
When merging code, a maintainer should only use the squash feature if the
author has already set this option, or if the merge request clearly contains a
@@ -541,8 +550,7 @@ WARNING:
enough to `main`.
- When you set the MR to "Merge When Pipeline Succeeds", you should take over
subsequent revisions for anything that would be spotted after that.
-- For merge requests that have had [Squash and
- merge](../user/project/merge_requests/squash_and_merge.md#squash-and-merge) set,
+- For merge requests that have had [Squash and merge](../user/project/merge_requests/squash_and_merge.md#squash-and-merge) set,
the squashed commit's default commit message is taken from the merge request title.
You're encouraged to [select a commit with a more informative commit message](../user/project/merge_requests/squash_and_merge.md) before merging.
@@ -586,7 +594,7 @@ If the MR source branch is more than 1,000 commits behind the target branch:
When an MR needs further changes but the author is not responding for a long period of time,
or is unable to finish the MR, GitLab can take it over in accordance with our
-[Closing policy for issues and merge requests](contributing/#closing-policy-for-issues-and-merge-requests).
+[Closing policy for issues and merge requests](contributing/index.md#closing-policy-for-issues-and-merge-requests).
A GitLab engineer (generally the merge request coach) will:
1. Add a comment to their MR saying you'll take it over to be able to get it merged.
@@ -683,42 +691,6 @@ Enterprise Edition instance. This has some implications:
Ensure that we support object storage for any file storage we need to perform. For more
information, see the [uploads documentation](uploads/index.md).
-### Review turnaround time
-
-Because [unblocking others is always a top priority](https://about.gitlab.com/handbook/values/#global-optimization),
-reviewers are expected to review merge requests in a timely manner,
-even when this may negatively impact their other tasks and priorities.
-
-Doing so allows everyone involved in the merge request to iterate faster as the
-context is fresh in memory, and improves contributors' experience significantly.
-
-#### Review-response SLO
-
-To ensure swift feedback to ready-to-review code, we maintain a `Review-response` Service-level Objective (SLO). The SLO is defined as:
-
-> Review-response SLO = (time when first review is provided) - (time MR is assigned to reviewer) < 2 business days
-
-If you don't think you can review a merge request in the `Review-response` SLO
-time frame, let the author know as soon as possible in the comments
-(no later than 36 hours after first receiving the review request)
-and try to help them find another reviewer or maintainer who is able to, so that they can be unblocked
-and get on with their work quickly. Remove yourself as a reviewer.
-
-If you think you are at capacity and are unable to accept any more reviews until
-some have been completed, communicate this through your GitLab status by setting
-the 🔴 `:red_circle:` emoji and mentioning that you are at capacity in the status
-text. This guides contributors to pick a different reviewer, helping us to
-meet the SLO.
-
-Of course, if you are out of office and have
-[communicated](https://about.gitlab.com/handbook/paid-time-off/#communicating-your-time-off)
-this through your GitLab.com Status, authors are expected to realize this and
-find a different reviewer themselves.
-
-When a merge request author has been blocked for longer than
-the `Review-response` SLO, they are free to remind the reviewer through Slack or add
-another reviewer.
-
### Customer critical merge requests
A merge request may benefit from being considered a customer critical priority because there is a significant benefit to the business in doing so.
diff --git a/doc/development/contributing/design.md b/doc/development/contributing/design.md
index ce013a9254b..04915985dc8 100644
--- a/doc/development/contributing/design.md
+++ b/doc/development/contributing/design.md
@@ -68,9 +68,9 @@ Check states using your browser's _styles inspector_ to toggle CSS pseudo-classe
like `:hover` and others ([Chrome](https://developer.chrome.com/docs/devtools/css/reference/#pseudo-class),
[Firefox](https://firefox-source-docs.mozilla.org/devtools-user/page_inspector/how_to/examine_and_edit_css/index.html#viewing-common-pseudo-classes)).
-- Account for all applicable states ([error](https://design.gitlab.com/content/error-messages),
+- Account for all applicable states ([error](https://design.gitlab.com/content/error-messages/),
rest, loading, focus, hover, selected, disabled).
-- Account for states dependent on data size ([empty](https://design.gitlab.com/regions/empty-states),
+- Account for states dependent on data size ([empty](https://design.gitlab.com/regions/empty-states/),
some data, and lots of data).
- Account for states dependent on user role, user preferences, and subscription.
- Consider animations and transitions, and follow their [guidelines](https://design.gitlab.com/product-foundations/motion/).
@@ -113,7 +113,7 @@ When the design is ready, _before_ starting its implementation:
At any moment, but usually _during_ or _after_ the design's implementation:
-- Contribute [issues to Pajamas](https://design.gitlab.com/get-started/contribute#contribute-an-issue)
+- Contribute [issues to Pajamas](https://design.gitlab.com/get-started/contribute/#contribute-an-issue)
for additions or enhancements to the design system.
- Create issues with the [`~UX debt`](issue_workflow.md#technical-and-ux-debt)
label for intentional deviations from the agreed-upon UX requirements due to
diff --git a/doc/development/contributing/index.md b/doc/development/contributing/index.md
index 12fd7c3dc12..6999ffe810e 100644
--- a/doc/development/contributing/index.md
+++ b/doc/development/contributing/index.md
@@ -186,21 +186,11 @@ reasons for including it.
Mention a maintainer in merge requests that contain:
- More than 500 changes.
-- Any major [breaking changes](#breaking-changes).
+- Any major [breaking changes](../deprecation_guidelines/index.md).
- External libraries.
If you are not sure who to mention, the reviewer will do this for you early in the merge request process.
-#### Breaking changes
-
-A "breaking change" is any change that requires users to make a corresponding change to their code, settings, or workflow. "Users" might be humans, API clients, or even code classes that "use" another class. Examples of breaking changes include:
-
-- Removing a user-facing feature without a replacement/workaround.
-- Changing the definition of an existing API (by doing things like re-naming query parameters or changing routes).
-- Removing a public method from a code class.
-
-A breaking change can be considered "major" if it affects many users, or represents a significant change in behavior.
-
#### Issues workflow
This [documentation](issue_workflow.md) outlines the current issue workflow:
@@ -218,7 +208,7 @@ This [documentation](issue_workflow.md) outlines the current issue workflow:
This [documentation](merge_request_workflow.md) outlines the current merge request process.
-- [Merge request guidelines](merge_request_workflow.md#merge-request-guidelines)
+- [Merge request guidelines](merge_request_workflow.md#merge-request-guidelines-for-contributors)
- [Contribution acceptance criteria](merge_request_workflow.md#contribution-acceptance-criteria)
- [Definition of done](merge_request_workflow.md#definition-of-done)
- [Dependencies](merge_request_workflow.md#dependencies)
diff --git a/doc/development/contributing/issue_workflow.md b/doc/development/contributing/issue_workflow.md
index c6d977cf5ad..d557319b41f 100644
--- a/doc/development/contributing/issue_workflow.md
+++ b/doc/development/contributing/issue_workflow.md
@@ -53,7 +53,7 @@ Most issues will have labels for at least one of the following:
- Priority: `~"priority::1"`, `~"priority::2"`, `~"priority::3"`, `~"priority::4"`
- Severity: ~`"severity::1"`, `~"severity::2"`, `~"severity::3"`, `~"severity::4"`
-Please add `~"breaking change"` label if the issue can be considered as a [breaking change](index.md#breaking-changes).
+Please add the `~"breaking change"` label if the issue can be considered a [breaking change](../deprecation_guidelines/index.md).
Please add `~security` label if the issue is related to application security.
@@ -161,7 +161,7 @@ For instance, the "DevOps Reports" category is represented by the
`devops_reports.name` value is "DevOps Reports".
If a category's label doesn't respect this naming convention, it should be specified
-with [the `label` attribute](https://about.gitlab.com/handbook/marketing/inbound-marketing/digital-experience/website/#category-attributes)
+with [the `label` attribute](https://about.gitlab.com/handbook/marketing/digital-experience/website/#category-attributes)
in <https://gitlab.com/gitlab-com/www-gitlab-com/blob/master/data/categories.yml>.
### Feature labels
diff --git a/doc/development/contributing/merge_request_workflow.md b/doc/development/contributing/merge_request_workflow.md
index eff1d2e671d..faa1642d50a 100644
--- a/doc/development/contributing/merge_request_workflow.md
+++ b/doc/development/contributing/merge_request_workflow.md
@@ -33,7 +33,7 @@ some potentially easy issues.
To start developing GitLab, download the [GitLab Development Kit](https://gitlab.com/gitlab-org/gitlab-development-kit)
and see the [Development section](../../index.md) for the required guidelines.
-## Merge request guidelines
+## Merge request guidelines for contributors
If you find an issue, please submit a merge request with a fix or improvement, if
you can, and include tests. If you don't know how to fix the issue but can write a test
@@ -45,20 +45,10 @@ request is as follows:
1. [Fork](../../user/project/repository/forking_workflow.md) the project into
your personal namespace (or group) on GitLab.com.
1. Create a feature branch in your fork (don't work off your [default branch](../../user/project/repository/branches/default.md)).
-1. Write [tests](../rake_tasks.md#run-tests) and code.
-1. [Ensure a changelog is created](../changelog.md).
-1. If you are writing documentation, make sure to follow the
- [documentation guidelines](../documentation/index.md).
1. Follow the [commit messages guidelines](#commit-messages-guidelines).
-1. If you have multiple commits, combine them into a few logically organized
- commits by [squashing them](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#_squashing),
- but do not change the commit history if you're working on shared branches though.
+1. If you have multiple commits, combine them into a few logically organized commits.
1. Push the commits to your working branch in your fork.
-1. Submit a merge request (MR) to the `main` branch in the main GitLab project.
- 1. Your merge request needs at least 1 approval, but depending on your changes
- you might need additional approvals. Refer to the [Approval guidelines](../code_review.md#approval-guidelines).
- 1. You don't have to select any specific approvers, but you can if you really want
- specific people to approve your merge request.
+1. Submit a merge request (MR) against the default branch of the upstream project.
1. The MR title should describe the change you want to make.
1. The MR description should give a reason for your change.
1. If you are contributing code, fill in the description according to the default
@@ -68,58 +58,15 @@ request is as follows:
1. Use the syntax `Solves #XXX`, `Closes #XXX`, or `Refs #XXX` to mention the issues your merge
request addresses. Referenced issues do not [close automatically](../../user/project/issues/managing_issues.md#closing-issues-automatically).
You must close them manually once the merge request is merged.
- 1. The MR must include *Before* and *After* screenshots if UI changes are made.
- 1. Include any steps or setup required to ensure reviewers can view the changes you've made (for example, include any information about feature flags).
1. If you're allowed to, set a relevant milestone and [labels](issue_workflow.md).
MR labels should generally match the corresponding issue (if there is one).
The group label should reflect the group that executed or coached the work,
not necessarily the group that owns the feature.
-1. UI changes should use available components from the GitLab Design System,
- [Pajamas](https://design.gitlab.com/).
-1. If the MR changes CSS classes, please include the list of affected pages, which
- can be found by running `grep css-class ./app -R`.
-1. If your MR touches code that executes shell commands, reads or opens files, or
- handles paths to files on disk, make sure it adheres to the
- [shell command guidelines](../shell_commands.md)
-1. [Code changes should include observability instrumentation](../code_review.md#observability-instrumentation).
-1. If your code needs to handle file storage, see the [uploads documentation](../uploads/index.md).
-1. If your merge request adds one or more migrations, make sure to execute all
- migrations on a fresh database before the MR is reviewed. If the review leads
- to large changes in the MR, execute the migrations again once the review is complete.
-1. Write tests for more complex migrations.
-1. If your merge request adds new validations to existing models, to make sure the
- data processing is backwards compatible:
-
- - Ask in the [`#database`](https://gitlab.slack.com/archives/CNZ8E900G) Slack channel
- for assistance to execute the database query that checks the existing rows to
- ensure existing rows aren't impacted by the change.
- - Add the necessary validation with a feature flag to be gradually rolled out
- following [the rollout steps](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/#rollout).
-
- If this merge request is urgent, the code owners should make the final call on
- whether reviewing existing rows should be included as an immediate follow-up task
- to the merge request.
-
- NOTE:
- There isn't a way to know anything about our customers' data on their
- [self-managed instances](../../subscriptions/self_managed/index.md), so keep
- that in mind for any data implications with your merge request.
-
-1. Merge requests **must** adhere to the [merge request performance guidelines](../merge_request_performance_guidelines.md).
-1. For tests that use Capybara, read
- [how to write reliable, asynchronous integration tests](https://thoughtbot.com/blog/write-reliable-asynchronous-integration-tests-with-capybara).
-1. If your merge request introduces changes that require additional steps when
- installing GitLab from source, add them to `doc/install/installation.md` in
- the same merge request.
-1. If your merge request introduces changes that require additional steps when
- upgrading GitLab from source, add them to
- `doc/update/upgrading_from_source.md` in the same merge request. If these
- instructions are specific to a version, add them to the "Version specific
- upgrading instructions" section.
1. Read and adhere to
[The responsibility of the merge request author](../code_review.md#the-responsibility-of-the-merge-request-author).
1. Read and follow
[Having your merge request reviewed](../code_review.md#having-your-merge-request-reviewed).
+1. Make sure the merge request meets the [Definition of done](#definition-of-done).
If you would like quick feedback on your merge request feel free to mention someone
from the [core team](https://about.gitlab.com/community/core-team/) or one of the
@@ -172,7 +119,7 @@ Commit messages should follow the guidelines below, for reasons explained by Chr
#### Why these standards matter
1. Consistent commit messages that follow these guidelines make the history more readable.
-1. Concise standard commit messages helps to identify [breaking changes](index.md#breaking-changes) for a deployment or ~"master:broken" quicker when
+1. Concise, standard commit messages help to identify [breaking changes](../deprecation_guidelines/index.md) for a deployment or ~"master:broken" more quickly when
reviewing commits between two points in time.
#### Commit message template
@@ -218,12 +165,12 @@ the contribution acceptance criteria below:
exposing a bug in existing code). Every new class should have corresponding
unit tests, even if the class is exercised at a higher level, such as a feature test.
- If a failing CI build seems to be unrelated to your contribution, you can try
- restarting the failing CI job, rebasing from `main` to bring in updates that
+ restarting the failing CI job, rebasing on top of the target branch to bring in updates that
may resolve the failure, or if it has not been fixed yet, ask a developer to
help you fix the test.
-1. The MR initially contains a few logically organized commits.
+1. The MR contains a few logically organized commits, or has [squashing commits enabled](../../user/project/merge_requests/squash_and_merge.md#squash-and-merge).
1. The changes can merge without problems. If not, you should rebase if you're the
- only one working on your feature branch, otherwise merge `main`.
+ only one working on your feature branch, otherwise merge the default branch into the MR branch.
1. Only one specific issue is fixed or one specific feature is implemented. Do not
combine things; send separate merge requests for each issue or feature.
1. Migrations should do only one thing (for example, create a table, move data to a new
@@ -258,13 +205,60 @@ requirements.
### MR Merge
-1. Clear description explaining the relevancy of the contribution.
+1. Clear title and description explaining the relevancy of the contribution.
1. Working and clean code that is commented where needed.
-1. [Unit, integration, and system tests](../testing_guide/index.md) that all pass
- on the CI server.
-1. Peer member testing is optional but recommended when the risk of a change is high. This includes when the changes are [far-reaching](https://about.gitlab.com/handbook/engineering/development/#reducing-the-impact-of-far-reaching-work) or are for [components critical for security](../code_review.md#security).
-1. Regressions and bugs are covered with tests that reduce the risk of the issue happening
- again.
+1. The change is evaluated to [limit the impact of far-reaching work](https://about.gitlab.com/handbook/engineering/development/#reducing-the-impact-of-far-reaching-work).
+1. Testing:
+
+ - [Unit, integration, and system tests](../testing_guide/index.md) that all pass
+ on the CI server.
+ - Peer member testing is optional but recommended when the risk of a change is high.
+ This includes when the changes are [far-reaching](https://about.gitlab.com/handbook/engineering/development/#reducing-the-impact-of-far-reaching-work)
+ or are for [components critical for security](../code_review.md#security).
+ - The description includes any steps or setup required to ensure reviewers can view the changes you've made (for example, include any information about feature flags).
+ - Regressions and bugs are covered with tests that reduce the risk of the issue happening
+ again.
+ - For tests that use Capybara, read
+ [how to write reliable, asynchronous integration tests](https://thoughtbot.com/blog/write-reliable-asynchronous-integration-tests-with-capybara).
+ - [Black-box tests/end-to-end tests](../testing_guide/testing_levels.md#black-box-tests-at-the-system-level-aka-end-to-end-tests)
+ added if required. Please contact [the quality team](https://about.gitlab.com/handbook/engineering/quality/#teams)
+ with any questions.
+ - The change is tested in a review app where possible and if appropriate.
+1. In case of UI changes:
+
+ - Use available components from the GitLab Design System,
+ [Pajamas](https://design.gitlab.com/).
+ - Include *Before* and *After* screenshots in the MR description.
+ - If the MR changes CSS classes, please include the list of affected pages, which
+ can be found by running `grep css-class ./app -R`.
+1. If your MR touches code that executes shell commands, reads or opens files, or
+ handles paths to files on disk, make sure it adheres to the
+ [shell command guidelines](../shell_commands.md).
+1. [Code changes should include observability instrumentation](../code_review.md#observability-instrumentation).
+1. If your code needs to handle file storage, see the [uploads documentation](../uploads/index.md).
+1. If your merge request adds one or more migrations:
+ - Make sure to execute all migrations on a fresh database before the MR is reviewed.
+ If the review leads to large changes in the MR, execute the migrations again
+ after the review is complete.
+ - Write tests for more complex migrations.
+1. If your merge request adds new validations to existing models, to make sure the
+ data processing is backwards compatible:
+
+ - Ask in the [`#database`](https://gitlab.slack.com/archives/CNZ8E900G) Slack channel
+ for assistance to execute the database query that checks the existing rows to
+ ensure existing rows aren't impacted by the change.
+ - Add the necessary validation with a feature flag to be gradually rolled out
+ following [the rollout steps](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/#rollout).
+
+ If this merge request is urgent, the code owners should make the final call on
+ whether reviewing existing rows should be included as an immediate follow-up task
+ to the merge request.
+
+ NOTE:
+ There isn't a way to know anything about our customers' data on their
+ [self-managed instances](../../subscriptions/self_managed/index.md), so keep
+ that in mind for any data implications with your merge request.
+
1. Code affected by a feature flag is covered by [automated tests with the feature flag enabled and disabled](../feature_flags/index.md#feature-flags-in-tests), or both
states are tested as part of peer member testing or as part of the rollout plan.
1. [Performance guidelines](../merge_request_performance_guidelines.md) have been followed.
@@ -272,16 +266,22 @@ requirements.
1. [Application and rate limit guidelines](../merge_request_application_and_rate_limit_guidelines.md) have been followed.
1. [Documented](../documentation/index.md) in the `/doc` directory.
1. [Changelog entry added](../changelog.md), if necessary.
+1. If your merge request introduces changes that require additional steps when
+ installing GitLab from source, add them to `doc/install/installation.md` in
+ the same merge request.
+1. If your merge request introduces changes that require additional steps when
+ upgrading GitLab from source, add them to
+ `doc/update/upgrading_from_source.md` in the same merge request. If these
+ instructions are specific to a version, add them to the "Version specific
+ upgrading instructions" section.
1. Reviewed by relevant reviewers, and all concerns are addressed for Availability, Regressions, and Security. Documentation reviews should take place as soon as possible, but they should not block a merge request.
1. The [MR acceptance checklist](../code_review.md#acceptance-checklist) has been checked as confirmed in the MR.
-1. Create an issue in the [infrastructure issue tracker](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues) to inform the Infrastructure department when your contribution is changing default settings or introduces a new setting, if relevant.
-1. [Black-box tests/end-to-end tests](../testing_guide/testing_levels.md#black-box-tests-at-the-system-level-aka-end-to-end-tests)
- added if required. Please contact [the quality team](https://about.gitlab.com/handbook/engineering/quality/#teams)
- with any questions.
-1. The change is tested in a review app where possible and if appropriate.
-1. The new feature does not degrade the user experience of the product.
-1. The change is evaluated to [limit the impact of far-reaching work](https://about.gitlab.com/handbook/engineering/development/#reducing-the-impact-of-far-reaching-work).
+1. Create an issue in the [infrastructure issue tracker](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues) to inform the Infrastructure department when your contribution changes default settings or introduces a new setting, if relevant.
1. An agreed-upon [rollout plan](https://about.gitlab.com/handbook/engineering/development/processes/rollout-plans/).
+1. Your merge request has at least 1 approval, but depending on your changes
+ you might need additional approvals. Refer to the [Approval guidelines](../code_review.md#approval-guidelines).
+ - You don't have to select any specific approvers, but you can if you really want
+ specific people to approve your merge request.
1. Merged by a project maintainer.
### Production use
@@ -319,8 +319,8 @@ request:
We allow engineering time to fix small problems (with or without an
issue) that are incremental improvements, such as:
-1. Unprioritized bug fixes (for example, [Banner alerting of project move is
-showing up everywhere](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/18985))
+1. Unprioritized bug fixes (for example,
+ [Banner alerting of project move is showing up everywhere](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/18985))
1. Documentation improvements
1. RuboCop or Code Quality improvements
diff --git a/doc/development/creating_enums.md b/doc/development/creating_enums.md
index 450cb97d978..d3892c4c44e 100644
--- a/doc/development/creating_enums.md
+++ b/doc/development/creating_enums.md
@@ -1,154 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/creating_enums.md'
+remove_date: '2022-11-06'
---
-# Creating enums
+This document was moved to [another location](database/creating_enums.md).
-When creating a new enum, it should use the database type `SMALLINT`.
-The `SMALLINT` type size is 2 bytes, which is sufficient for an enum.
-This would help to save space in the database.
-
-To use this type, add `limit: 2` to the migration that creates the column.
-
-Example:
-
-```ruby
-def change
- add_column :ci_job_artifacts, :file_format, :integer, limit: 2
-end
-```
-
-## All of the key/value pairs should be defined in FOSS
-
-**Summary:** All enums needs to be defined in FOSS, if a model is also part of the FOSS.
-
-```ruby
-class Model < ApplicationRecord
- enum platform: {
- aws: 0,
- gcp: 1 # EE-only
- }
-end
-```
-
-When you add a new key/value pair to a `enum` and if it's EE-specific, you might be
-tempted to organize the `enum` as the following:
-
-```ruby
-# Define `failure_reason` enum in `Pipeline` model:
-class Pipeline < ApplicationRecord
- enum failure_reason: Enums::Pipeline.failure_reasons
-end
-```
-
-```ruby
-# Define key/value pairs that used in FOSS and EE:
-module Enums
- module Pipeline
- def self.failure_reasons
- { unknown_failure: 0, config_error: 1 }
- end
- end
-end
-
-Enums::Pipeline.prepend_mod_with('Enums::Pipeline')
-```
-
-```ruby
-# Define key/value pairs that used in EE only:
-module EE
- module Enums
- module Pipeline
- override :failure_reasons
- def failure_reasons
- super.merge(activity_limit_exceeded: 2)
- end
- end
- end
-end
-```
-
-This works as-is, however, it has a couple of downside that:
-
-- Someone could define a key/value pair in EE that is **conflicted** with a value defined in FOSS.
- For example, define `activity_limit_exceeded: 1` in `EE::Enums::Pipeline`.
-- When it happens, the feature works totally different.
- For example, we cannot figure out `failure_reason` is either `config_error` or `activity_limit_exceeded`.
-- When it happens, we have to ship a database migration to fix the data integrity,
- which might be impossible if you cannot recover the original value.
-
-Also, you might observe a workaround for this concern by setting an offset in EE's values.
-For example, this example sets `1000` as the offset:
-
-```ruby
-module EE
- module Enums
- module Pipeline
- override :failure_reasons
- def failure_reasons
- super.merge(activity_limit_exceeded: 1_000, size_limit_exceeded: 1_001)
- end
- end
- end
-end
-```
-
-This looks working as a workaround, however, this approach has some downsides that:
-
-- Features could move from EE to FOSS or vice versa. Therefore, the offset might be mixed between FOSS and EE in the future.
- For example, when you move `activity_limit_exceeded` to FOSS, you see `{ unknown_failure: 0, config_error: 1, activity_limit_exceeded: 1_000 }`.
-- The integer column for the `enum` is likely created [as `SMALLINT`](#creating-enums).
- Therefore, you need to be careful of that the offset doesn't exceed the maximum value of 2 bytes integer.
-
-As a conclusion, you should define all of the key/value pairs in FOSS.
-For example, you can simply write the following code in the above case:
-
-```ruby
-class Pipeline < ApplicationRecord
- enum failure_reason: {
- unknown_failure: 0,
- config_error: 1,
- activity_limit_exceeded: 2
- }
-end
-```
-
-## Add new values in the gap
-
-After merging some EE and FOSS enums, there might be a gap between the two groups of values:
-
-```ruby
-module Enums
- module Ci
- module CommitStatus
- def self.failure_reasons
- {
- # ...
- data_integrity_failure: 12,
- forward_deployment_failure: 13,
- insufficient_bridge_permissions: 1_001,
- downstream_bridge_project_not_found: 1_002,
- # ...
- }
- end
- end
- end
-end
-```
-
-To add new values, you should fill the gap first.
-In the example above add `14` instead of `1_003`:
-
-```ruby
-{
- # ...
- data_integrity_failure: 12,
- forward_deployment_failure: 13,
- a_new_value: 14,
- insufficient_bridge_permissions: 1_001,
- downstream_bridge_project_not_found: 1_002,
- # ...
-}
-```
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/database/add_foreign_key_to_existing_column.md b/doc/development/database/add_foreign_key_to_existing_column.md
index 7a18da2223f..8a8fe3c0a1e 100644
--- a/doc/development/database/add_foreign_key_to_existing_column.md
+++ b/doc/development/database/add_foreign_key_to_existing_column.md
@@ -6,7 +6,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Add a foreign key constraint to an existing column
-Foreign keys ensure consistency between related database tables. The current database review process **always** encourages you to add [foreign keys](../foreign_keys.md) when creating tables that reference records from other tables.
+Foreign keys ensure consistency between related database tables. The current database review process **always** encourages you to add [foreign keys](foreign_keys.md) when creating tables that reference records from other tables.
Starting with Rails version 4, Rails includes migration helpers to add foreign key constraints
to database tables. Before Rails 4, the only way for ensuring some level of consistency was the
@@ -95,7 +95,7 @@ The approach here depends on the data volume and the cleanup strategy. If we can
records by doing a database query and the record count is not high, then the data migration can
be executed in a Rails migration.
-In case the data volume is higher (>1000 records), it's better to create a background migration. If unsure, please contact the database team for advice.
+In case the data volume is higher (>1000 records), it's better to create a background migration. If unsure, contact the database team for advice.
Example for cleaning up records in the `emails` table in a database migration:
diff --git a/doc/development/database/adding_database_indexes.md b/doc/development/database/adding_database_indexes.md
new file mode 100644
index 00000000000..8abd7c8298e
--- /dev/null
+++ b/doc/development/database/adding_database_indexes.md
@@ -0,0 +1,410 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Adding Database Indexes
+
+Indexes can be used to speed up database queries, but when should you add a new
+index? Traditionally the answer to this question has been to add an index for
+every column used for filtering or joining data. For example, consider the
+following query:
+
+```sql
+SELECT *
+FROM projects
+WHERE user_id = 2;
+```
+
+Here we are filtering by the `user_id` column, and as such a developer may decide
+to index this column.
+
+While in certain cases indexing columns using the above approach may make sense,
+it can actually have a negative impact. Whenever you write data to a table, any
+existing indexes must also be updated. The more indexes there are, the slower this
+can potentially become. Indexes can also take up significant disk space, depending
+on the amount of data indexed and the index type. For example, PostgreSQL offers
+`GIN` indexes which can be used to index certain data types that cannot be
+indexed by regular B-tree indexes. These indexes, however, generally take up more
+space and are slower to update than B-tree indexes.
+
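+For instance, a `GIN` index is a common choice for querying inside a `jsonb`
+column. A minimal sketch (the `settings` column here is hypothetical):
+
+```sql
+-- A B-tree index cannot serve jsonb containment queries; a GIN index can,
+-- at the cost of extra space and slower writes.
+CREATE INDEX index_projects_on_settings ON projects USING GIN (settings);
+```
+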
+Because of all this, it's important to consider the following questions
+when adding a new index:
+
+1. Do the new queries re-use as many existing indexes as possible?
+1. Is there enough data that using an index is faster than iterating over
+ rows in the table?
+1. Is the overhead of maintaining the index worth the reduction in query
+ timings?
+
+## Re-using Queries
+
+The first step is to make sure your query re-uses as many existing indexes as
+possible. For example, consider the following query:
+
+```sql
+SELECT *
+FROM todos
+WHERE user_id = 123
+AND state = 'open';
+```
+
+Now imagine we already have an index on the `user_id` column but not on the
+`state` column. One may think this query performs badly due to `state` being
+unindexed. In reality, the query may perform just fine, given that the index on
+`user_id` can filter out enough rows.
+
+The best way to determine if indexes are re-used is to run your query using
+`EXPLAIN ANALYZE`. Depending on the joined tables and the columns being used for filtering,
+you may find an extra index doesn't make much, if any, difference.
+
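+For example, to check whether the query above uses the existing index on
+`user_id` (the exact plan depends on your data and statistics):
+
+```sql
+EXPLAIN ANALYZE
+SELECT *
+FROM todos
+WHERE user_id = 123
+AND state = 'open';
+```
+
+An `Index Scan` node in the output means an index was used; a `Seq Scan` node
+means PostgreSQL read the whole table instead.
+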
+In short:
+
+1. Try to write your query in such a way that it re-uses as many existing
+ indexes as possible.
+1. Run the query using `EXPLAIN ANALYZE` and study the output to find the most
+ ideal query.
+
+## Data Size
+
+A database may not use an index when a regular sequence scan
+(iterating over all rows) is faster, which is often the case for small tables.
+
+Consider adding an index if a table is expected to grow, and your query has to filter a lot of rows.
+You may _not_ want to add an index if the table size is small (<`1,000` records),
+or if existing indexes already filter out enough rows.
+
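+To get a rough idea of a table's current size locally, you can query
+PostgreSQL's statistics views (estimates are refreshed by `ANALYZE` and `autovacuum`):
+
+```sql
+-- Estimated live row count for a single table
+SELECT n_live_tup
+FROM pg_stat_user_tables
+WHERE relname = 'todos';
+```
+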
+## Maintenance Overhead
+
+Indexes have to be updated on every table write. In the case of PostgreSQL, _all_
+existing indexes are updated whenever data is written to a table. As a
+result, having many indexes on the same table slows down writes. It's therefore important
+to balance query performance with the overhead of maintaining an extra index.
+
+Let's say that adding an index reduces SELECT timings by 5 milliseconds but increases
+INSERT/UPDATE/DELETE timings by 10 milliseconds. In this case, the new index may not be worth
+it. A new index is more valuable when SELECT timings are reduced and INSERT/UPDATE/DELETE
+timings are unaffected.
+
+## Finding Unused Indexes
+
+To see which indexes are unused, you can run the following query:
+
+```sql
+SELECT relname as table_name, indexrelname as index_name, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelname::regclass))
+FROM pg_stat_all_indexes
+WHERE schemaname = 'public'
+AND idx_scan = 0
+AND idx_tup_read = 0
+AND idx_tup_fetch = 0
+ORDER BY pg_relation_size(indexrelname::regclass) desc;
+```
+
+This query outputs a list containing all indexes that are never used and sorts
+them by index size in descending order. This query helps in
+determining whether existing indexes are still required. More information on
+the meaning of the various columns can be found at
+<https://www.postgresql.org/docs/current/monitoring-stats.html>.
+
+To determine if an index is still being used on production, use the following
+Thanos query with your index name:
+
+```plaintext
+sum(rate(pg_stat_user_indexes_idx_tup_read{env="gprd", indexrelname="index_ci_name", type="patroni-ci"}[5m]))
+```
+
+Because the query output relies on the actual usage of your database, it
+may be affected by factors such as:
+
+- Certain queries never being executed, thus not being able to use certain
+ indexes.
+- Certain tables having little data, resulting in PostgreSQL using sequence
+ scans instead of index scans.
+
+This data is only reliable for a frequently used database with
+plenty of data, and one that uses as many GitLab features as possible.
+
+## Requirements for naming indexes
+
+Indexes with complex definitions must be explicitly named rather than
+relying on the implicit naming behavior of migration methods. In short,
+that means you **must** provide an explicit name argument for an index
+created with one or more of the following options:
+
+- `where`
+- `using`
+- `order`
+- `length`
+- `type`
+- `opclass`
+
+### Considerations for index names
+
+Check our [Constraints naming conventions](constraint_naming_convention.md) page.
+
+### Why explicit names are required
+
+As Rails is database agnostic, it generates an index name only
+from the required options of all indexes: table name and column names.
+For example, imagine the following two indexes are created in a migration:
+
+```ruby
+def up
+ add_index :my_table, :my_column
+
+ add_index :my_table, :my_column, where: 'my_column IS NOT NULL'
+end
+```
+
+Creation of the second index would fail, because Rails would generate
+the same name for both indexes.
+
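+Giving each index an explicit name avoids the collision. A minimal sketch (the
+index names here are illustrative):
+
+```ruby
+def up
+  add_index :my_table, :my_column, name: 'index_my_table_on_my_column'
+
+  add_index :my_table, :my_column,
+    where: 'my_column IS NOT NULL',
+    name: 'index_my_table_on_my_column_not_null'
+end
+```
+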
+This naming issue is further complicated by the behavior of the `index_exists?` method.
+It considers only the table name, column names, and uniqueness specification
+of the index when making a comparison. Consider:
+
+```ruby
+def up
+ unless index_exists?(:my_table, :my_column, where: 'my_column IS NOT NULL')
+ add_index :my_table, :my_column, where: 'my_column IS NOT NULL'
+ end
+end
+```
+
+The call to `index_exists?` returns true if **any** index exists on
+`:my_table` and `:my_column`, and index creation is bypassed.
+
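+To make the check unambiguous, compare by name instead. A sketch assuming the
+`index_exists_by_name?` migration helper (verify the current helpers before
+relying on this):
+
+```ruby
+INDEX_NAME = 'index_my_table_on_my_column_not_null'
+
+def up
+  # Checks for an index with this exact name, not just the same columns
+  unless index_exists_by_name?(:my_table, INDEX_NAME)
+    add_index :my_table, :my_column, where: 'my_column IS NOT NULL', name: INDEX_NAME
+  end
+end
+```
+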
+The `add_concurrent_index` helper is a requirement for creating indexes
+on populated tables. Because it cannot be used inside a transactional
+migration, it has a built-in check that detects if the index already
+exists. In the event a match is found, index creation is skipped.
+Without an explicit name argument, Rails can return a false positive
+for `index_exists?`, causing a required index to not be created
+properly. By always requiring a name for certain types of indexes, the
+chance of error is greatly reduced.
+
+## Temporary indexes
+
+There may be times when an index is only needed temporarily.
+
+For example, in a migration, a column of a table might be conditionally
+updated. To query which rows must be updated while staying within the
+[query performance guidelines](query_performance.md), an index is needed
+that would otherwise not be used.
+
+In these cases, consider a temporary index. To specify a
+temporary index:
+
+1. Prefix the index name with `tmp_` and follow the [naming conventions](constraint_naming_convention.md).
+1. Create a follow-up issue to remove the index in the next (or future) milestone.
+1. Add a comment in the migration mentioning the removal issue.
+
+A migration adding a temporary index would look like:
+
+```ruby
+INDEX_NAME = 'tmp_index_projects_on_owner_where_emails_disabled'
+
+def up
+ # Temporary index to be removed in 13.9 https://gitlab.com/gitlab-org/gitlab/-/issues/1234
+ add_concurrent_index :projects, :creator_id, where: 'emails_disabled = false', name: INDEX_NAME
+end
+
+def down
+ remove_concurrent_index_by_name :projects, INDEX_NAME
+end
+```
+
+## Create indexes asynchronously
+
+For very large tables, index creation can be a challenge to manage.
+While `add_concurrent_index` creates indexes in a way that does not block
+normal traffic, it can still be problematic when index creation runs for
+many hours. Necessary database operations like `autovacuum` cannot run, and
+on GitLab.com, the deployment process is blocked waiting for index
+creation to finish.
+
+To limit impact on GitLab.com, a process exists to create indexes
+asynchronously during weekend hours. Due to generally lower traffic and fewer deployments,
+index creation can proceed at a lower level of risk.
+
+### Schedule index creation for a low-impact time
+
+1. [Schedule the index to be created](#schedule-the-index-to-be-created).
+1. [Verify the MR was deployed and the index exists in production](#verify-the-mr-was-deployed-and-the-index-exists-in-production).
+1. [Add a migration to create the index synchronously](#add-a-migration-to-create-the-index-synchronously).
+
+### Schedule the index to be created
+
+Create an MR with a post-deployment migration which prepares the index
+for asynchronous creation. An example of creating an index using
+the asynchronous index helpers can be seen in the block below. This migration
+enters the index name and definition into the `postgres_async_indexes`
+table. The process that runs on weekends pulls indexes from this
+table and attempts to create them.
+
+```ruby
+# in db/post_migrate/
+
+INDEX_NAME = 'index_ci_builds_on_some_column'
+
+def up
+ prepare_async_index :ci_builds, :some_column, name: INDEX_NAME
+end
+
+def down
+ unprepare_async_index :ci_builds, :some_column, name: INDEX_NAME
+end
+```
+
+### Verify the MR was deployed and the index exists in production
+
+You can verify if the post-deploy migration was executed on GitLab.com by:
+
+- Execute `/chatops run auto_deploy status <merge_sha>`. If the output returns `db/gprd`,
+  the post-deploy migration has been executed in the production database. For more details, see
+  [this guide](https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/post_deploy_migration/readme.md#how-to-determine-if-a-post-deploy-migration-has-been-executed-on-gitlabcom).
+- Use a meta-command in `#database-lab`, such as: `\d <index_name>`.
+  - Ensure that the index is not [`invalid`](https://www.postgresql.org/docs/12/sql-createindex.html#:~:text=The%20psql%20%5Cd%20command%20will%20report%20such%20an%20index%20as%20INVALID).
+- Ask someone in `#database` to check if the index exists.
+- With proper access, you can also verify directly on production or in a
+  production clone.
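+
+If you prefer checking from SQL, one way is to query the `pg_index` catalog, where
+`indisvalid` is `false` while an index is `INVALID`. A sketch, using the example index
+name from this page:
+
+```sql
+-- Returns the index and whether it is valid; no rows means the index is absent.
+SELECT pg_class.relname, pg_index.indisvalid
+FROM pg_index
+JOIN pg_class ON pg_class.oid = pg_index.indexrelid
+WHERE pg_class.relname = 'index_ci_builds_on_some_column';
+```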
+
+### Add a migration to create the index synchronously
+
+After the index is verified to exist on the production database, create a second
+merge request that adds the index synchronously. The schema changes must be
+updated and committed to `structure.sql` in this second merge request.
+The synchronous migration results in a no-op on GitLab.com, but you should still add the
+migration as expected for other installations. The block below
+demonstrates how to create the second migration for the previous
+asynchronous example.
+
+**WARNING:**
+Verify that the index exists in production before merging a second migration with `add_concurrent_index`.
+If the second migration is deployed before the index has been created,
+the index is created synchronously when the second migration executes.
+
+```ruby
+# in db/post_migrate/
+
+INDEX_NAME = 'index_ci_builds_on_some_column'
+
+disable_ddl_transaction!
+
+def up
+ add_concurrent_index :ci_builds, :some_column, name: INDEX_NAME
+end
+
+def down
+ remove_concurrent_index_by_name :ci_builds, INDEX_NAME
+end
+```
+
+## Test database index changes locally
+
+You must test the database index changes locally before creating a merge request.
+
+### Verify indexes created asynchronously
+
+Use the asynchronous index helpers on your local environment to test changes for creating an index:
+
+1. Enable the feature flags by running `Feature.enable(:database_async_index_creation)` and `Feature.enable(:database_reindexing)` in the Rails console.
+1. Run `bundle exec rails db:migrate` so that it creates an entry in the `postgres_async_indexes` table.
+1. Run `bundle exec rails gitlab:db:reindex` so that the index is created asynchronously.
+1. To verify the index, open the PostgreSQL console using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/postgresql.md) command `gdk psql` and run the command `\d <index_name>` to check that your newly created index exists.
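+
+The sequence above can also be run from the command line. A possible sketch, assuming a
+GDK setup and the example index name used earlier on this page:
+
+```shell
+# Enable the required feature flags without opening a Rails console session.
+bundle exec rails runner "Feature.enable(:database_async_index_creation); Feature.enable(:database_reindexing)"
+
+# Record the index in postgres_async_indexes, then create it asynchronously.
+bundle exec rails db:migrate
+bundle exec rails gitlab:db:reindex
+
+# Confirm the index now exists.
+gdk psql -c '\d index_ci_builds_on_some_column'
+```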
+
+## Drop indexes asynchronously
+
+For very large tables, index destruction can be a challenge to manage.
+While `remove_concurrent_index` removes indexes in a way that does not block
+normal traffic, it can still be problematic when index destruction runs for
+many hours. Necessary database operations like `autovacuum` cannot run, and
+the deployment process on GitLab.com is blocked while waiting for index
+destruction to finish.
+
+To limit the impact on GitLab.com, use the following process to remove indexes
+asynchronously during weekend hours. Due to generally lower traffic and fewer deployments,
+index destruction can proceed at a lower level of risk.
+
+1. [Schedule the index to be removed](#schedule-the-index-to-be-removed).
+1. [Verify the MR was deployed and the index exists in production](#verify-the-mr-was-deployed-and-the-index-exists-in-production).
+1. [Add a migration to destroy the index synchronously](#add-a-migration-to-destroy-the-index-synchronously).
+
+### Schedule the index to be removed
+
+Create an MR with a post-deployment migration which prepares the index
+for asynchronous destruction. For example, to destroy an index using
+the asynchronous index helpers:
+
+```ruby
+# in db/post_migrate/
+
+INDEX_NAME = 'index_ci_builds_on_some_column'
+
+def up
+ prepare_async_index_removal :ci_builds, :some_column, name: INDEX_NAME
+end
+
+def down
+ unprepare_async_index :ci_builds, :some_column, name: INDEX_NAME
+end
+```
+
+This migration enters the index name and definition into the `postgres_async_indexes`
+table. The process that runs on weekends pulls indexes from this table and attempts
+to remove them.
+
+You must test the database index changes locally before creating a merge request.
+
+### Verify the MR was deployed and the index exists in production
+
+You can verify if the MR was deployed to GitLab.com with
+`/chatops run auto_deploy status <merge_sha>`. To verify the existence of
+the index, you can:
+
+- Use a meta-command in `#database-lab`, for example: `\d <index_name>`.
+ - Make sure the index is not [`invalid`](https://www.postgresql.org/docs/12/sql-createindex.html#:~:text=The%20psql%20%5Cd%20command%20will%20report%20such%20an%20index%20as%20INVALID).
+- Ask someone in `#database` to check if the index exists.
+- If you have access, you can verify directly on production or in a
+ production clone.
+
+### Add a migration to destroy the index synchronously
+
+After you verify the index exists in the production database, create a second
+merge request that removes the index synchronously. The schema changes must be
+updated and committed to `structure.sql` in this second merge request.
+The synchronous migration results in a no-op on GitLab.com, but you should still add the
+migration as expected for other installations. For example, to
+create the second migration for the previous asynchronous example:
+
+**WARNING:**
+Verify that the index no longer exists in production before merging a second migration with `remove_concurrent_index_by_name`.
+If the second migration is deployed before the index has been destroyed,
+the index is destroyed synchronously when the second migration executes.
+
+```ruby
+# in db/post_migrate/
+
+INDEX_NAME = 'index_ci_builds_on_some_column'
+
+disable_ddl_transaction!
+
+def up
+ remove_concurrent_index_by_name :ci_builds, name: INDEX_NAME
+end
+
+def down
+  add_concurrent_index :ci_builds, :some_column, name: INDEX_NAME
+end
+```
+
+### Verify indexes removed asynchronously
+
+To test changes for removing an index, use the asynchronous index helpers on your local environment:
+
+1. Enable the feature flags by running `Feature.enable(:database_async_index_destruction)` and `Feature.enable(:database_reindexing)` in the Rails console.
+1. Run `bundle exec rails db:migrate` which should create an entry in the `postgres_async_indexes` table.
+1. Run `bundle exec rails gitlab:db:reindex` to destroy the index asynchronously.
+1. To verify the index, open the PostgreSQL console by using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/postgresql.md)
+ command `gdk psql` and run `\d <index_name>` to check that the destroyed index no longer exists.
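+
+As with creation, the removal test can be run from the command line. A possible sketch,
+assuming a GDK setup and the example index name used earlier on this page:
+
+```shell
+# Enable the required feature flags without opening a Rails console session.
+bundle exec rails runner "Feature.enable(:database_async_index_destruction); Feature.enable(:database_reindexing)"
+
+# Record the removal in postgres_async_indexes, then destroy the index asynchronously.
+bundle exec rails db:migrate
+bundle exec rails gitlab:db:reindex
+
+# The index should no longer be reported.
+gdk psql -c '\d index_ci_builds_on_some_column'
+```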
diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md
index 2d079656e23..79c76b351c8 100644
--- a/doc/development/database/avoiding_downtime_in_migrations.md
+++ b/doc/development/database/avoiding_downtime_in_migrations.md
@@ -93,9 +93,8 @@ class RemoveUsersUpdatedAtColumn < Gitlab::Database::Migration[2.0]
end
```
-You can consider [enabling lock retries](
-https://docs.gitlab.com/ee/development/migration_style_guide.html#usage-with-transactional-migrations
-) when you run a migration on big tables, because it might take some time to
+You can consider [enabling lock retries](../migration_style_guide.md#usage-with-transactional-migrations)
+when you run a migration on big tables, because it might take some time to
acquire a lock on this table.
#### B. The removed column has an index or constraint that belongs to it
@@ -104,7 +103,7 @@ If the `down` method requires adding back any dropped indexes or constraints, th
be done within a transactional migration, then the migration would look like this:
```ruby
-class RemoveUsersUpdatedAtColumn < Gitlab::Database::Migration[1.0]
+class RemoveUsersUpdatedAtColumn < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
def up
@@ -126,13 +125,11 @@ end
In the `down` method, we check to see if the column already exists before adding it again.
We do this because the migration is non-transactional and might have failed while it was running.
-The [`disable_ddl_transaction!`](
-https://docs.gitlab.com/ee/development/migration_style_guide.html#usage-with-non-transactional-migrations-disable_ddl_transaction
-) is used to disable the transaction that wraps the whole migration.
+The [`disable_ddl_transaction!`](../migration_style_guide.md#usage-with-non-transactional-migrations-disable_ddl_transaction)
+is used to disable the transaction that wraps the whole migration.
-You can refer to the page [Migration Style Guide](
-https://docs.gitlab.com/ee/development/migration_style_guide.html
-) for more information about database migrations.
+You can refer to the page [Migration Style Guide](../migration_style_guide.md)
+for more information about database migrations.
### Step 3: Removing the ignore rule (release M+2)
@@ -145,9 +142,13 @@ the `remove_after` date has passed.
## Renaming Columns
Renaming columns the normal way requires downtime as an application may continue
-using the old column name during/after a database migration. To rename a column
-without requiring downtime we need two migrations: a regular migration, and a
-post-deployment migration. Both these migration can go in the same release.
+to use the old column names during or after a database migration. To rename a column
+without requiring downtime, we need two migrations: a regular migration and a
+post-deployment migration. Both these migrations can go in the same release.
+
+NOTE:
+It's not possible to rename columns with default values. For more details, see
+[this merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/52032#default-values).
### Step 1: Add The Regular Migration
@@ -157,7 +158,7 @@ renaming. For example
```ruby
# A regular migration in db/migrate
-class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0]
+class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
def up
@@ -185,7 +186,7 @@ We can perform this cleanup using
```ruby
# A post-deployment migration in db/post_migrate
-class CleanupUsersUpdatedAtRename < Gitlab::Database::Migration[1.0]
+class CleanupUsersUpdatedAtRename < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
def up
@@ -198,7 +199,7 @@ class CleanupUsersUpdatedAtRename < Gitlab::Database::Migration[1.0]
end
```
-If you're renaming a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3), please carefully consider the state when the first migration has run but the second cleanup migration hasn't been run yet.
+If you're renaming a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3), carefully consider the state when the first migration has run but the second cleanup migration hasn't been run yet.
With [Canary](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/) it is possible that the system runs in this state for a significant amount of time.
## Changing Column Constraints
@@ -232,7 +233,7 @@ as follows:
```ruby
# A regular migration in db/migrate
-class ChangeUsersUsernameStringToText < Gitlab::Database::Migration[1.0]
+class ChangeUsersUsernameStringToText < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
def up
@@ -251,7 +252,7 @@ Next we need to clean up our changes using a post-deployment migration:
```ruby
# A post-deployment migration in db/post_migrate
-class ChangeUsersUsernameStringToTextCleanup < Gitlab::Database::Migration[1.0]
+class ChangeUsersUsernameStringToTextCleanup < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
def up
@@ -295,8 +296,7 @@ when migrating a column in a large table (for example, `issues`). Background
migrations spread the work / load over a longer time period, without slowing
down deployments.
-For more information, see [the documentation on cleaning up background
-migrations](background_migrations.md#cleaning-up).
+For more information, see [the documentation on cleaning up background migrations](background_migrations.md#cleaning-up).
## Adding Indexes
diff --git a/doc/development/database/background_migrations.md b/doc/development/database/background_migrations.md
index 0124dbae51f..9b596eb7379 100644
--- a/doc/development/database/background_migrations.md
+++ b/doc/development/database/background_migrations.md
@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
WARNING:
Background migrations are strongly discouraged in favor of the new [batched background migrations framework](batched_background_migrations.md).
-Please check that documentation and determine if that framework suits your needs and fall back
+Check that documentation and determine if that framework suits your needs and fall back
to these only if required.
Background migrations should be used to perform data migrations whenever a
@@ -368,9 +368,9 @@ A strategy to make the migration run faster is to schedule larger batches, and t
within the background migration to perform multiple statements.
The background migration helpers that queue multiple jobs such as
-`queue_background_migration_jobs_by_range_at_intervals` use [`EachBatch`](../iterating_tables_in_batches.md).
+`queue_background_migration_jobs_by_range_at_intervals` use [`EachBatch`](iterating_tables_in_batches.md).
The example above has batches of 1000, where each queued job takes two seconds. If the query has been optimized
-to make the time for the delete statement within the [query performance guidelines](../query_performance.md),
+to make the time for the delete statement within the [query performance guidelines](query_performance.md),
1000 may be the largest number of records that can be deleted in a reasonable amount of time.
The minimum and most common interval for delaying jobs is two minutes. This results in two seconds
diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md
index f3ea82b5c61..edb22fcf436 100644
--- a/doc/development/database/batched_background_migrations.md
+++ b/doc/development/database/batched_background_migrations.md
@@ -105,11 +105,16 @@ for more details.
## Batched background migrations for EE-only features
-All the background migration classes for EE-only features should be present in GitLab CE.
-For this purpose, create an empty class for GitLab CE, and extend it for GitLab EE
+All the background migration classes for EE-only features should be present in GitLab FOSS.
+For this purpose, create an empty class for GitLab FOSS, and extend it for GitLab EE
as explained in the guidelines for
[implementing Enterprise Edition features](../ee_features.md#code-in-libgitlabbackground_migration).
+NOTE:
+Background migration classes for EE-only features that use job arguments should define them
+in the GitLab FOSS class. This is required to prevent job arguments validation from failing when
+the migration is scheduled in the GitLab FOSS context.
+
Batched Background migrations are simple classes that define a `perform` method. A
Sidekiq worker then executes such a class, passing any arguments to it. All
migration classes must be defined in the namespace
@@ -132,6 +137,10 @@ queue_batched_background_migration(
)
```
+NOTE:
+This helper raises an error if the number of provided job arguments does not match
+the number of [job arguments](#job-arguments) defined in `JOB_CLASS_NAME`.
+
Make sure the newly-created data is either migrated, or
saved in both the old and new version upon creation. Removals in
turn can be handled by defining foreign keys with cascading deletes.
@@ -186,6 +195,115 @@ Bump to the [import/export version](../../user/project/settings/import_export.md
be required, if importing a project from a prior version of GitLab requires the
data to be in the new format.
+## Job arguments
+
+`BatchedMigrationJob` provides the `job_arguments` helper method for job classes to define the job arguments they need.
+
+Batched migrations scheduled with `queue_batched_background_migration` **must** use the helper to define the job arguments:
+
+```ruby
+queue_batched_background_migration(
+ 'CopyColumnUsingBackgroundMigrationJob',
+ TABLE_NAME,
+ 'name', 'name_convert_to_text',
+ job_interval: DELAY_INTERVAL
+)
+```
+
+NOTE:
+If the number of defined job arguments does not match the number of job arguments provided when
+scheduling the migration, `queue_batched_background_migration` raises an error.
+
+In this example, `copy_from` returns `name`, and `copy_to` returns `name_convert_to_text`:
+
+```ruby
+class CopyColumnUsingBackgroundMigrationJob < BatchedMigrationJob
+ job_arguments :copy_from, :copy_to
+
+ def perform
+ from_column = connection.quote_column_name(copy_from)
+ to_column = connection.quote_column_name(copy_to)
+
+ assignment_clause = "#{to_column} = #{from_column}"
+
+ each_sub_batch(operation_name: :update_all) do |relation|
+ relation.update_all(assignment_clause)
+ end
+ end
+end
+```
+
+### Additional filters
+
+By default, when creating background jobs to perform the migration, batched background migrations
+iterate over the full specified table. This iteration is done using the
+[`PrimaryKeyBatchingStrategy`](https://gitlab.com/gitlab-org/gitlab/-/blob/c9dabd1f4b8058eece6d8cb4af95e9560da9a2ee/lib/gitlab/database/migrations/batched_background_migration_helpers.rb#L17). If the table has 1000 records
+and the batch size is 100, the work is batched into 10 jobs. For illustrative purposes,
+`EachBatch` is used like this:
+
+```ruby
+# PrimaryKeyBatchingStrategy
+Namespace.each_batch(of: 100) do |relation|
+ relation.where(type: nil).update_all(type: 'User') # this happens in each background job
+end
+```
+
+In some cases, only a subset of records must be examined. If only 10% of the 1000 records
+need examination, apply a filter to the initial relation when the jobs are created:
+
+```ruby
+Namespace.where(type: nil).each_batch(of: 100) do |relation|
+ relation.update_all(type: 'User')
+end
+```
+
+In the first example, we don't know how many records will be updated in each batch.
+In the second (filtered) example, we know exactly 100 will be updated with each batch.
+
+`BatchedMigrationJob` provides a `scope_to` helper method to apply additional filters and achieve this:
+
+1. Create a new migration job class that inherits from `BatchedMigrationJob` and defines the additional filter:
+
+ ```ruby
+ class BackfillNamespaceType < BatchedMigrationJob
+ scope_to ->(relation) { relation.where(type: nil) }
+
+ def perform
+ each_sub_batch(operation_name: :update_all) do |sub_batch|
+ sub_batch.update_all(type: 'User')
+ end
+ end
+ end
+ ```
+
+1. In the post-deployment migration, enqueue the batched background migration:
+
+ ```ruby
+ class BackfillNamespaceType < Gitlab::Database::Migration[2.0]
+ MIGRATION = 'BackfillNamespaceType'
+ DELAY_INTERVAL = 2.minutes
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ def up
+ queue_batched_background_migration(
+ MIGRATION,
+ :namespaces,
+ :id,
+ job_interval: DELAY_INTERVAL
+ )
+ end
+
+ def down
+ delete_batched_background_migration(MIGRATION, :namespaces, :id, [])
+ end
+ end
+ ```
+
+NOTE:
+When applying additional filters, it is important to ensure they are properly covered by an index to optimize `EachBatch` performance.
+In the example above we need an index on `(type, id)` to support the filters. See [the `EachBatch` docs for more information](../iterating_tables_in_batches.md).
+
## Example
The `routes` table has a `source_type` field that's used for a polymorphic relationship.
@@ -221,8 +339,6 @@ background migration.
correctly handled by the batched migration framework. Any subclass of
`BatchedMigrationJob` is initialized with necessary arguments to
execute the batch, as well as a connection to the tracking database.
- Additional `job_arguments` set on the migration are passed to the
- job's `perform` method.
1. Add a new trigger to the database to update newly created and updated routes,
similar to this example:
@@ -320,7 +436,7 @@ The default batching strategy provides an efficient way to iterate over primary
However, if you need to iterate over columns where values are not unique, you must use a
different batching strategy.
-The `LooseIndexScanBatchingStrategy` batching strategy uses a special version of [`EachBatch`](../iterating_tables_in_batches.md#loose-index-scan-with-distinct_each_batch)
+The `LooseIndexScanBatchingStrategy` batching strategy uses a special version of [`EachBatch`](iterating_tables_in_batches.md#loose-index-scan-with-distinct_each_batch)
to provide efficient and stable iteration over the distinct column values.
This example shows a batched background migration where the `issues.project_id` column is used as
@@ -374,76 +490,8 @@ module Gitlab
end
```
-### Adding filters to the initial batching
-
-By default, when creating background jobs to perform the migration, batched background migrations will iterate over the full specified table. This is done using the [`PrimaryKeyBatchingStrategy`](https://gitlab.com/gitlab-org/gitlab/-/blob/c9dabd1f4b8058eece6d8cb4af95e9560da9a2ee/lib/gitlab/database/migrations/batched_background_migration_helpers.rb#L17). This means if there are 1000 records in the table and the batch size is 100, there will be 10 jobs. For illustrative purposes, `EachBatch` is used like this:
-
-```ruby
-# PrimaryKeyBatchingStrategy
-Projects.all.each_batch(of: 100) do |relation|
- relation.where(foo: nil).update_all(foo: 'bar') # this happens in each background job
-end
-```
-
-There are cases where we only need to look at a subset of records. Perhaps we only need to update 1 out of every 10 of those 1000 records. It would be best if we could apply a filter to the initial relation when the jobs are created:
-
-```ruby
-Projects.where(foo: nil).each_batch(of: 100) do |relation|
- relation.update_all(foo: 'bar')
-end
-```
-
-In the `PrimaryKeyBatchingStrategy` example, we do not know how many records will be updated in each batch. In the filtered example, we know exactly 100 will be updated with each batch.
-
-The `PrimaryKeyBatchingStrategy` contains [a method that can be overwritten](https://gitlab.com/gitlab-org/gitlab/-/blob/dd1e70d3676891025534dc4a1e89ca9383178fe7/lib/gitlab/background_migration/batching_strategies/primary_key_batching_strategy.rb#L38-52) to apply additional filtering on the initial `EachBatch`.
-
-We can accomplish this by:
-
-1. Create a new class that inherits from `PrimaryKeyBatchingStrategy` and overrides the method using the desired filter (this may be the same filter used in the sub-batch):
-
- ```ruby
- # frozen_string_literal: true
-
- module GitLab
- module BackgroundMigration
- module BatchingStrategies
- class FooStrategy < PrimaryKeyBatchingStrategy
- def apply_additional_filters(relation, job_arguments: [], job_class: nil)
- relation.where(foo: nil)
- end
- end
- end
- end
- end
- ```
-
-1. In the post-deployment migration that queues the batched background migration, specify the new batching strategy using the `batch_class_name` parameter:
-
- ```ruby
- class BackfillProjectsFoo < Gitlab::Database::Migration[2.0]
- MIGRATION = 'BackfillProjectsFoo'
- DELAY_INTERVAL = 2.minutes
- BATCH_CLASS_NAME = 'FooStrategy'
-
- restrict_gitlab_migration gitlab_schema: :gitlab_main
-
- def up
- queue_batched_background_migration(
- MIGRATION,
- :routes,
- :id,
- job_interval: DELAY_INTERVAL,
- batch_class_name: BATCH_CLASS_NAME
- )
- end
-
- def down
- delete_batched_background_migration(MIGRATION, :routes, :id, [])
- end
- end
- ```
-
-When applying a batching strategy, it is important to ensure the filter properly covered by an index to optimize `EachBatch` performance. See [the `EachBatch` docs for more information](../iterating_tables_in_batches.md).
+NOTE:
+[Additional filters](#additional-filters) defined with `scope_to` will be ignored by `LooseIndexScanBatchingStrategy` and `distinct_each_batch`.
## Testing
diff --git a/doc/development/database/ci_mirrored_tables.md b/doc/development/database/ci_mirrored_tables.md
new file mode 100644
index 00000000000..06f0087fafe
--- /dev/null
+++ b/doc/development/database/ci_mirrored_tables.md
@@ -0,0 +1,156 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# CI mirrored tables
+
+## Problem statement
+
+As part of the database [decomposition work](https://gitlab.com/groups/gitlab-org/-/epics/6168),
+which had the goal of splitting the single GitLab database into two databases (`main` and
+`ci`), came the big challenge of
+[removing all joins between the `main` and the `ci` tables](multiple_databases.md#removing-joins-between-ci-and-non-ci-tables).
+That is because PostgreSQL doesn't support joins between tables that belong to different databases.
+However, some core application models in the main database are queried very often by the CI side.
+For example:
+
+- `Namespace`, in the `namespaces` table.
+- `Project`, in the `projects` table.
+
+Not being able to join with these tables posed a significant challenge. The team chose to perform logical
+replication of those tables from the main database to the CI database, in the new tables:
+
+- `ci_namespace_mirrors`, as a mirror of the `namespaces` table
+- `ci_project_mirrors`, as a mirror of the `projects` table
+
+This logical replication means two things:
+
+1. The `main` database tables can be queried and joined to the `namespaces` and `projects` tables.
+1. The `ci` database tables can be joined with the `ci_namespace_mirrors` and `ci_project_mirrors` tables.
+
+```mermaid
+graph LR
+
+ subgraph "Main database (tables)"
+ A[namespaces] -->|updates| B[namespaces_sync_events]
+ A -->|deletes| C[loose_foreign_keys_deleted_records]
+ D[projects] -->|deletes| C
+ D -->|updates| E[projects_sync_events]
+ end
+
+ B --> F
+ C --> G
+ E --> H
+
+ subgraph "Sidekiq worker jobs"
+ F[Namespaces::ProcessSyncEventsWorker]
+ G[LooseForeignKeys::CleanupWorker]
+ H[Projects::ProcessSyncEventsWorker]
+ end
+
+ F -->|do update| I
+ G -->|delete records| I
+ G -->|delete records| J
+ H -->|do update| J
+
+ subgraph "CI database (tables)"
+ I[ci_namespace_mirrors]
+ J[ci_project_mirrors]
+ end
+```
+
+This replication was restricted only to a few attributes that are needed from each model:
+
+- From `Namespace` we replicate `traversal_ids`.
+- From `Project` we replicate only the `namespace_id`, which represents the group to which the project belongs.
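+
+For illustration, a query on the `ci` database can then join CI tables with the mirrored
+copies instead of the `main` database tables. A hypothetical sketch (the exact columns
+selected and filtered are assumptions for illustration):
+
+```sql
+-- Join a CI table to the mirrored projects table rather than to main.projects.
+SELECT ci_builds.*
+FROM ci_builds
+INNER JOIN ci_project_mirrors ON ci_project_mirrors.project_id = ci_builds.project_id
+WHERE ci_project_mirrors.namespace_id = 42;
+```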
+
+## Keeping the CI mirrored tables in sync with the source tables
+
+We must take care of three types of events to keep
+the source and the target tables in sync:
+
+1. Creating new namespaces or projects.
+1. Updating namespaces or projects.
+1. Deleting namespaces or projects.
+
+```mermaid
+graph TD
+
+ subgraph "CI database (tables)"
+ E[other CI tables]
+ F{queries with joins allowed}
+ G[ci_project_mirrors]
+ H[ci_namespace_mirrors]
+
+ E---F
+ F---G
+ F---H
+ end
+
+ A---B
+ B---C
+ B---D
+
+L["⛔ ← Joins are not allowed → ⛔"]
+
+ subgraph "Main database (tables)"
+ A[other main tables]
+ B{queries with joins allowed}
+ C[projects]
+ D[namespaces]
+ end
+```
+
+### Create and update
+
+Syncing the data of newly created or updated namespaces or projects happens in this
+order:
+
+1. **On the `main` database**: Any `INSERT` or `UPDATE` on the `namespaces` or `projects` tables
+   adds an entry to the corresponding table: `namespaces_sync_events` or `projects_sync_events`. These tables
+   also exist on the `main` database. These entries are added by triggers on both of the tables.
+1. **On the model level**: After a commit happens on either of the source models `Namespace` or
+ `Project`, it schedules the corresponding Sidekiq jobs `Namespaces::ProcessSyncEventsWorker`
+ or `Projects::ProcessSyncEventsWorker` to run.
+1. These workers then:
+ 1. Read the entries from the tables `(namespaces/project)_sync_events`
+ from the `main` database, to check which namespaces or projects to sync.
+ 1. Copy the data for any updated records into the target
+ tables `ci_namespace_mirrors`, `ci_project_mirrors`.
+
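+A conceptual sketch of the trigger in the first step (hypothetical, not the exact
+production trigger definition):
+
+```sql
+-- Hypothetical trigger function: record the namespace ID for later syncing.
+CREATE FUNCTION insert_namespaces_sync_event() RETURNS trigger AS $$
+BEGIN
+  INSERT INTO namespaces_sync_events (namespace_id) VALUES (NEW.id);
+  RETURN NULL;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE TRIGGER trigger_namespaces_sync_event
+AFTER INSERT OR UPDATE ON namespaces
+FOR EACH ROW EXECUTE FUNCTION insert_namespaces_sync_event();
+```
+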
+### Delete
+
+When any of `namespaces` or `projects` are deleted, the target records on the mirrored
+CI tables are deleted using the [loose foreign keys](loose_foreign_keys.md) (LFK) mechanism.
+
+By having these items in the `config/gitlab_loose_foreign_keys.yml`, the LFK mechanism
+was already working as expected. It deleted any records on the CI mirrored
+tables that mapped to deleted `namespaces` or `projects` in the `main` database.
+
+```yaml
+ci_namespace_mirrors:
+ - table: namespaces
+ column: namespace_id
+ on_delete: async_delete
+ci_project_mirrors:
+ - table: projects
+ column: project_id
+ on_delete: async_delete
+```
+
+## Consistency Checking
+
+To make sure that both syncing mechanisms work as expected, we deploy
+two extra worker jobs, triggered by cron jobs every few minutes:
+
+1. `Database::CiNamespaceMirrorsConsistencyCheckWorker`
+1. `Database::CiProjectMirrorsConsistencyCheckWorker`
+
+These jobs:
+
+1. Scan both of the source tables on the `main` database, using a cursor.
+1. Compare the items in the `namespaces` and `projects` with the target tables on the `ci` database.
+1. Report the items that are not in sync to Kibana and Prometheus.
+1. Correct any discrepancies.
diff --git a/doc/development/database/client_side_connection_pool.md b/doc/development/database/client_side_connection_pool.md
index dc52a551407..3cd0e836a8d 100644
--- a/doc/development/database/client_side_connection_pool.md
+++ b/doc/development/database/client_side_connection_pool.md
@@ -10,8 +10,8 @@ Ruby processes accessing the database through
ActiveRecord, automatically calculate the connection-pool size for the
process based on the concurrency.
-Because of the way [Ruby on Rails manages database
-connections](#connection-lifecycle), it is important that we have at
+Because of the way [Ruby on Rails manages database connections](#connection-lifecycle),
+it is important that we have at
least as many connections as we have threads. While there is a 'pool'
setting in [`database.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/database.yml.postgresql), it is not very practical because you need to
maintain it in tandem with the number of application threads. For this
@@ -28,9 +28,8 @@ because connections are instantiated lazily.
## Troubleshooting connection-pool issues
-The connection-pool usage can be seen per environment in the [connection-pool
-saturation
-dashboard](https://dashboards.gitlab.net/d/alerts-sat_rails_db_connection_pool/alerts-rails_db_connection_pool-saturation-detail?orgId=1).
+The connection-pool usage can be seen per environment in the
+[connection-pool saturation dashboard](https://dashboards.gitlab.net/d/alerts-sat_rails_db_connection_pool/alerts-rails_db_connection_pool-saturation-detail?orgId=1).
If the connection-pool is too small, this would manifest in
`ActiveRecord::ConnectionTimeoutError`s from the application. Because we alert
@@ -41,8 +40,8 @@ hardcoded value (10).
At this point, we need to investigate what is using more connections
than we anticipated. To do that, we can use the
-`gitlab_ruby_threads_running_threads` metric. For example, [this
-graph](https://thanos.gitlab.net/graph?g0.range_input=1h&g0.max_source_resolution=0s&g0.expr=sum%20by%20(thread_name)%20(%20gitlab_ruby_threads_running_threads%7Buses_db_connection%3D%22yes%22%7D%20)&g0.tab=0)
+`gitlab_ruby_threads_running_threads` metric. For example,
+[this graph](https://thanos.gitlab.net/graph?g0.range_input=1h&g0.max_source_resolution=0s&g0.expr=sum%20by%20(thread_name)%20(%20gitlab_ruby_threads_running_threads%7Buses_db_connection%3D%22yes%22%7D%20)&g0.tab=0)
shows all running threads that connect to the database by their
name. Threads labeled `puma worker` or `sidekiq_worker_thread` are
the threads that define `Gitlab::Runtime.max_threads` so those are
diff --git a/doc/development/database/creating_enums.md b/doc/development/database/creating_enums.md
new file mode 100644
index 00000000000..450cb97d978
--- /dev/null
+++ b/doc/development/database/creating_enums.md
@@ -0,0 +1,154 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Creating enums
+
+When creating a new enum, use the database type `SMALLINT`.
+The `SMALLINT` type is 2 bytes in size, which is sufficient for an enum
+and helps to save space in the database.
+
+To use this type, add `limit: 2` to the migration that creates the column.
+
+Example:
+
+```ruby
+def change
+ add_column :ci_job_artifacts, :file_format, :integer, limit: 2
+end
+```
+
+## All of the key/value pairs should be defined in FOSS
+
+**Summary:** All enums need to be defined in FOSS if the model is also part of FOSS.
+
+```ruby
+class Model < ApplicationRecord
+ enum platform: {
+ aws: 0,
+ gcp: 1 # EE-only
+ }
+end
+```
+
+When you add a new key/value pair to an `enum` and it's EE-specific, you might be
+tempted to organize the `enum` as follows:
+
+```ruby
+# Define `failure_reason` enum in `Pipeline` model:
+class Pipeline < ApplicationRecord
+ enum failure_reason: Enums::Pipeline.failure_reasons
+end
+```
+
+```ruby
+# Define key/value pairs that are used in FOSS and EE:
+module Enums
+ module Pipeline
+ def self.failure_reasons
+ { unknown_failure: 0, config_error: 1 }
+ end
+ end
+end
+
+Enums::Pipeline.prepend_mod_with('Enums::Pipeline')
+```
+
+```ruby
+# Define key/value pairs that are used in EE only:
+module EE
+ module Enums
+ module Pipeline
+ override :failure_reasons
+ def failure_reasons
+ super.merge(activity_limit_exceeded: 2)
+ end
+ end
+ end
+end
+```
+
+This works as-is; however, it has a couple of downsides:
+
+- Someone could define a key/value pair in EE that **conflicts** with a value defined in FOSS.
+  For example, defining `activity_limit_exceeded: 1` in `EE::Enums::Pipeline`.
+- When that happens, the feature behaves entirely differently.
+  For example, we cannot tell whether `failure_reason` is `config_error` or `activity_limit_exceeded`.
+- When that happens, we have to ship a database migration to fix the data integrity,
+  which might be impossible if you cannot recover the original value.
+
+You might also consider working around this concern by setting an offset for EE's values.
+For example, the following sets `1_000` as the offset:
+
+```ruby
+module EE
+ module Enums
+ module Pipeline
+ override :failure_reasons
+ def failure_reasons
+ super.merge(activity_limit_exceeded: 1_000, size_limit_exceeded: 1_001)
+ end
+ end
+ end
+end
+```
+
+This may look like a viable workaround; however, this approach has some downsides:
+
+- Features could move from EE to FOSS or vice versa. Therefore, the offset might be mixed between FOSS and EE in the future.
+ For example, when you move `activity_limit_exceeded` to FOSS, you see `{ unknown_failure: 0, config_error: 1, activity_limit_exceeded: 1_000 }`.
+- The integer column for the `enum` is likely created [as `SMALLINT`](#creating-enums).
+  Therefore, you need to be careful that the offset doesn't exceed the maximum value of a 2-byte integer (32,767).
+
+In conclusion, you should define all of the key/value pairs in FOSS.
+For example, in the case above you can simply write:
+
+```ruby
+class Pipeline < ApplicationRecord
+ enum failure_reason: {
+ unknown_failure: 0,
+ config_error: 1,
+ activity_limit_exceeded: 2
+ }
+end
+```
+
+## Add new values in the gap
+
+After merging some EE and FOSS enums, there might be a gap between the two groups of values:
+
+```ruby
+module Enums
+ module Ci
+ module CommitStatus
+ def self.failure_reasons
+ {
+ # ...
+ data_integrity_failure: 12,
+ forward_deployment_failure: 13,
+ insufficient_bridge_permissions: 1_001,
+ downstream_bridge_project_not_found: 1_002,
+ # ...
+ }
+ end
+ end
+ end
+end
+```
+
+To add new values, you should fill the gap first.
+In the example above, add `14` instead of `1_003`:
+
+```ruby
+{
+ # ...
+ data_integrity_failure: 12,
+ forward_deployment_failure: 13,
+ a_new_value: 14,
+ insufficient_bridge_permissions: 1_001,
+ downstream_bridge_project_not_found: 1_002,
+ # ...
+}
+```
diff --git a/doc/development/database/database_debugging.md b/doc/development/database/database_debugging.md
new file mode 100644
index 00000000000..5921dc942f2
--- /dev/null
+++ b/doc/development/database/database_debugging.md
@@ -0,0 +1,177 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Troubleshooting and Debugging Database
+
+This section provides some copy-paste commands and tips you can use as a reference when you
+run into database problems.
+
+A first step is to search for your error in Slack, or search for `GitLab <my error>` with Google.
+
+Available `RAILS_ENV`:
+
+- `production` (generally not for your main GDK database, but you may need this for other installations such as Omnibus).
+- `development` (this is your main GDK db).
+- `test` (used for tests like RSpec).
+
+## Delete everything and start over
+
+If you just want to delete everything and start over with an empty DB (approximately 1 minute):
+
+```shell
+bundle exec rake db:reset RAILS_ENV=development
+```
+
+If you want to seed the empty DB with sample data (approximately 4 minutes):
+
+```shell
+bundle exec rake dev:setup
+```
+
+If you just want to delete everything and start over with sample data (approximately 4 minutes),
+run the following command. It also does `db:reset` and runs DB-specific migrations:
+
+```shell
+bundle exec rake db:setup RAILS_ENV=development
+```
+
+If your test DB is giving you problems, it is safe to delete everything because it doesn't contain important
+data:
+
+```shell
+bundle exec rake db:reset RAILS_ENV=test
+```
+
+## Migration wrangling
+
+- `bundle exec rake db:migrate RAILS_ENV=development`: Execute any pending migrations that you may have picked up from a MR
+- `bundle exec rake db:migrate:status RAILS_ENV=development`: Check if all migrations are `up` or `down`
+- `bundle exec rake db:migrate:down VERSION=20170926203418 RAILS_ENV=development`: Tear down a migration
+- `bundle exec rake db:migrate:up VERSION=20170926203418 RAILS_ENV=development`: Set up a migration
+- `bundle exec rake db:migrate:redo VERSION=20170926203418 RAILS_ENV=development`: Re-run a specific migration
+
+## Manually access the database
+
+Access the database via one of these commands (they all get you to the same place):
+
+```shell
+gdk psql -d gitlabhq_development
+bundle exec rails dbconsole -e development
+bundle exec rails db -e development
+```
+
+- `\q`: Quit/exit
+- `\dt`: List all tables
+- `\d+ issues`: List columns for `issues` table
+- `CREATE TABLE board_labels();`: Create a table called `board_labels`
+- `SELECT * FROM schema_migrations WHERE version = '20170926203418';`: Check if a migration was run
+- `DELETE FROM schema_migrations WHERE version = '20170926203418';`: Manually remove a migration
+
+## Access the database with a GUI
+
+Most GUIs (DataGrip, RubyMine, DBeaver) require a TCP connection to the database, but by default
+the database runs on a UNIX socket. To be able to access the database from these tools, some steps
+are needed:
+
+1. On the GDK root directory, run:
+
+ ```shell
+ gdk config set postgresql.host localhost
+ ```
+
+1. Open your `gdk.yml`, and confirm that it has the following lines:
+
+ ```yaml
+ postgresql:
+ host: localhost
+ ```
+
+1. Reconfigure GDK:
+
+ ```shell
+ gdk reconfigure
+ ```
+
+1. On your database GUI, select `localhost` as host, `5432` as port and `gitlabhq_development` as database.
+ Alternatively, you can use the connection string `postgresql://localhost:5432/gitlabhq_development`.
+
+The new connection should be working now.
+
+## Access the GDK database with Visual Studio Code
+
+Use these instructions for exploring the GitLab database while developing with the GDK:
+
+1. Install or open [Visual Studio Code](https://code.visualstudio.com/download).
+1. Install the [PostgreSQL VSCode Extension](https://marketplace.visualstudio.com/items?itemName=ckolkman.vscode-postgres).
+1. In Visual Studio Code select **PostgreSQL Explorer** in the left toolbar.
+1. In the top bar of the new window, select `+` to **Add Database Connection**, and follow the prompts to fill in the details:
+ 1. **Hostname**: the path to the PostgreSQL folder in your GDK directory (for example `/dev/gitlab-development-kit/postgresql`).
+ 1. **PostgreSQL user to authenticate as**: usually your local username, unless otherwise specified during PostgreSQL installation.
+ 1. **Password of the PostgreSQL user**: the password you set when installing PostgreSQL.
+ 1. **Port number to connect to**: `5432` (default).
+ 1. **Use an SSL connection?** This depends on your installation. Options are:
+ - **Use Secure Connection**
+ - **Standard Connection** (default)
+ 1. **Optional. The database to connect to**: `gitlabhq_development`.
+ 1. **The display name for the database connection**: `gitlabhq_development`.
+
+Your database connection should now be displayed in the PostgreSQL Explorer pane and
+you can explore the `gitlabhq_development` database. If you cannot connect, ensure
+that GDK is running. For further instructions on how to use the PostgreSQL Explorer
+Extension for Visual Studio Code, read the [usage section](https://marketplace.visualstudio.com/items?itemName=ckolkman.vscode-postgres#usage)
+of the extension documentation.
+
+## FAQ
+
+### `ActiveRecord::PendingMigrationError` with Spring
+
+When running specs with the [Spring pre-loader](../rake_tasks.md#speed-up-tests-rake-tasks-and-migrations),
+the test database can get into a corrupted state. Trying to run the migration or
+dropping/resetting the test database has no effect.
+
+```shell
+$ bundle exec spring rspec some_spec.rb
+...
+Failure/Error: ActiveRecord::Migration.maintain_test_schema!
+
+ActiveRecord::PendingMigrationError:
+
+
+ Migrations are pending. To resolve this issue, run:
+
+ bin/rake db:migrate RAILS_ENV=test
+# ~/.rvm/gems/ruby-2.3.3/gems/activerecord-4.2.10/lib/active_record/migration.rb:392:in `check_pending!'
+...
+0 examples, 0 failures, 1 error occurred outside of examples
+```
+
+To resolve the issue, kill the Spring server and app processes that live between spec runs.
+
+```shell
+$ ps aux | grep spring
+eric 87304 1.3 2.9 3080836 482596 ?? Ss 10:12AM 4:08.36 spring app | gitlab | started 6 hours ago | test mode
+eric 37709 0.0 0.0 2518640 7524 s006 S Wed11AM 0:00.79 spring server | gitlab | started 29 hours ago
+$ kill 87304
+$ kill 37709
+```
+
+### db:migrate `database version is too old to be migrated` error
+
+Users receive this error when `db:migrate` detects that the current schema version
+is older than the `MIN_SCHEMA_VERSION` defined in the `Gitlab::Database` library
+module.
+
+Over time we clean up and combine old migrations in the codebase, so it is not always
+possible to migrate GitLab from every previous version.
+
+In some cases you may want to bypass this check. For example, if you were on a
+GitLab schema version later than the `MIN_SCHEMA_VERSION` and then rolled back to
+an older migration, you can set the `SKIP_SCHEMA_VERSION_CHECK` environment
+variable to migrate forward again:
+
+```shell
+bundle exec rake db:migrate SKIP_SCHEMA_VERSION_CHECK=true
+```
diff --git a/doc/development/database/database_dictionary.md b/doc/development/database/database_dictionary.md
new file mode 100644
index 00000000000..c330c5e67bd
--- /dev/null
+++ b/doc/development/database/database_dictionary.md
@@ -0,0 +1,51 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Database Dictionary
+
+This page documents the database schema for GitLab, so data analysts and other groups can
+locate the feature categories responsible for specific database tables.
+
+## Location
+
+Database dictionary metadata files are stored in the `gitlab` project under `db/docs/`.
+
+## Example dictionary file
+
+```yaml
+---
+table_name: terraform_states
+classes:
+- Terraform::State
+feature_categories:
+- infrastructure_as_code
+description: Represents a Terraform state backend
+introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26619
+milestone: '13.0'
+```
+
+## Schema
+
+| Attribute | Type | Required | Description |
+|----------------------|---------------|----------|--------------------------------------------------------------------------|
+| `table_name` | String | yes | Database table name |
+| `classes` | Array(String) | no | List of classes that respond to `.table_name` with the `table_name` |
+| `feature_categories` | Array(String) | yes | List of feature categories using this table |
+| `description`        | String        | no       | Text description of the information stored in the table and its purpose   |
+| `introduced_by_url` | URL | no | URL to the merge request or commit which introduced this table |
+| `milestone` | String | no | The milestone that introduced this table |
+
+## Adding tables
+
+When adding a new table, create a new file under `db/docs/` named
+`<table_name>.yml` containing as much information as you know about the table.
+
+Include this file in the commit with the migration that creates the table.
+
+## Dropping tables
+
+When dropping a table, you must remove the metadata file from `db/docs/`
+in the same commit with the migration that drops the table.
diff --git a/doc/development/database/database_lab.md b/doc/development/database/database_lab.md
index 5346df2690d..1d584a4ec6f 100644
--- a/doc/development/database/database_lab.md
+++ b/doc/development/database/database_lab.md
@@ -95,7 +95,7 @@ To connect to a clone using `psql`:
1. In the **Clone details** page of the Postgres.ai web interface, copy and run
the command to start SSH port forwarding for the clone.
1. In the **Clone details** page of the Postgres.ai web interface, copy and run the `psql` connection string.
- Use the password provided at setup.
+   Use the password provided at setup and set the `dbname` to `gitlabhq_dblab` (or check what databases are available by using `psql -l` with the same connection string but `dbname=postgres`).
After you connect, use clone like you would any `psql` console in production, but with
the added benefit and safety of an isolated writeable environment.
diff --git a/doc/development/database/database_query_comments.md b/doc/development/database/database_query_comments.md
new file mode 100644
index 00000000000..2798071bc06
--- /dev/null
+++ b/doc/development/database/database_query_comments.md
@@ -0,0 +1,62 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Database query comments with Marginalia
+
+The [Marginalia gem](https://github.com/basecamp/marginalia) is used to add
+query comments containing application related context information to PostgreSQL
+queries generated by ActiveRecord.
+
+It is very useful for tracing problematic queries back to the application source.
+
+During an on-call incident, an engineer has the full context of a query
+and its application source from the comments.
+
+## Metadata information in comments
+
+Queries generated from **Rails** include the following metadata in comments:
+
+- `application`
+- `correlation_id`
+- `endpoint_id`
+- `line`
+
+Queries generated from **Sidekiq** workers include the following metadata
+in comments:
+
+- `application`
+- `jid`
+- `correlation_id`
+- `endpoint_id`
+- `line`
+
+`endpoint_id` is a single field that can represent any endpoint in the application:
+
+- For Rails controllers, it's the controller and action. For example, `Projects::BlobController#show`.
+- For Grape API endpoints, it's the route. For example, `/api/:version/users/:id`.
+- For Sidekiq workers, it's the worker class name. For example, `UserStatusCleanup::BatchWorker`.
+
+`line` is not present in production logs due to the additional overhead required.
+
+Examples of queries with comments:
+
+- Rails:
+
+ ```sql
+ /*application:web,controller:blob,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,endpoint_id:Projects::BlobController#show*/ SELECT "routes".* FROM "routes" WHERE "routes"."source_id" = 75 AND "routes"."source_type" = 'Namespace' LIMIT 1
+ ```
+
+- Grape:
+
+ ```sql
+ /*application:web,correlation_id:01EZVN0DAYGJF5XHG9N4VX8FAH,endpoint_id:/api/:version/users/:id*/ SELECT COUNT(*) FROM "users" INNER JOIN "user_follow_users" ON "users"."id" = "user_follow_users"."followee_id" WHERE "user_follow_users"."follower_id" = 1
+ ```
+
+- Sidekiq:
+
+ ```sql
+ /*application:sidekiq,correlation_id:df643992563683313bc0a0288fb55e23,jid:15fbc506590c625d7664b074,endpoint_id:UserStatusCleanup::BatchWorker,line:/app/workers/user_status_cleanup/batch_worker.rb:19:in `perform'*/ SELECT $1 AS one FROM "user_statuses" WHERE "user_statuses"."clear_status_at" <= $2 LIMIT $3
+ ```
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index b6bbfe690c1..a85f399a447 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -36,7 +36,7 @@ projects:
Create the merge request [using the "Database reviewer" template](https://gitlab.com/gitlab-com/www-gitlab-com/-/blob/master/.gitlab/merge_request_templates/Database%20reviewer.md),
adding your expertise your profile YAML file. Assign to a database maintainer or the
-[Database Team's Engineering Manager](https://about.gitlab.com/handbook/engineering/development/enablement/database/).
+[Database Team's Engineering Manager](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/database/).
After the `team.yml` update is merged, the [Reviewer roulette](../code_review.md#reviewer-roulette)
may recommend you as a database reviewer.
@@ -53,17 +53,17 @@ that require a more in-depth discussion between the database reviewers and maint
- [Database Office Hours Agenda](https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit).
- <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [YouTube playlist with past recordings](https://www.youtube.com/playlist?list=PL05JrBw4t0Kp-kqXeiF7fF7cFYaKtdqXM).
-You should also join the [#database-lab](../understanding_explain_plans.md#database-lab-engine)
+You should also join the [#database-lab](understanding_explain_plans.md#database-lab-engine)
Slack channel and get familiar with how to use Joe, the Slackbot that provides developers
with their own clone of the production database.
Understanding and efficiently using `EXPLAIN` plans is at the core of the database review process.
The following guides provide a quick introduction and links to follow on more advanced topics:
-- Guide on [understanding EXPLAIN plans](../understanding_explain_plans.md).
+- Guide on [understanding EXPLAIN plans](understanding_explain_plans.md).
- [Explaining the unexplainable series in `depesz`](https://www.depesz.com/tag/unexplainable/).
-We also have licensed access to The Art of PostgreSQL available, if you are interested in getting access please check out the
+We also have licensed access to The Art of PostgreSQL. If you are interested in getting access, check out the
[issue (confidential)](https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/23).
Finally, you can find various guides in the [Database guides](index.md) page that cover more specific
@@ -95,7 +95,7 @@ are three times as likely to be picked by the [Danger bot](../dangerbot.md) as o
## What to do if you feel overwhelmed
Similar to all types of reviews, [unblocking others is always a top priority](https://about.gitlab.com/handbook/values/#global-optimization).
-Database reviewers are expected to [review assigned merge requests in a timely manner](../code_review.md#review-turnaround-time)
+Database reviewers are expected to [review assigned merge requests in a timely manner](https://about.gitlab.com/handbook/engineering/workflow/code-review/#review-turnaround-time)
or let the author know as soon as possible and help them find another reviewer or maintainer.
We are doing reviews to help the rest of the GitLab team and, at the same time, get exposed
diff --git a/doc/development/database/db_dump.md b/doc/development/database/db_dump.md
new file mode 100644
index 00000000000..f2076cbc410
--- /dev/null
+++ b/doc/development/database/db_dump.md
@@ -0,0 +1,56 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Importing a database dump into a staging environment
+
+Sometimes it is useful to import the database from a production environment
+into a staging environment for testing. The procedure below assumes you have
+SSH and `sudo` access to both the production environment and the staging VM.
+
+**Destroy your staging VM** when you are done with it. It is important to avoid
+data leaks.
+
+On the staging VM, run the following commands. They add a setting to `/etc/gitlab/gitlab.rb`
+to speed up large database imports, and stop services that would interfere with the import.
+
+```shell
+# On STAGING
+echo "postgresql['checkpoint_segments'] = 64" | sudo tee -a /etc/gitlab/gitlab.rb
+sudo touch /etc/gitlab/skip-auto-reconfigure
+sudo gitlab-ctl reconfigure
+sudo gitlab-ctl stop puma
+sudo gitlab-ctl stop sidekiq
+```
+
+Next, we let the production environment stream a compressed SQL dump to our
+local machine via SSH, and redirect this stream to a `psql` client on the staging
+VM.
+
+```shell
+# On LOCAL MACHINE
+ssh -C gitlab.example.com sudo -u gitlab-psql /opt/gitlab/embedded/bin/pg_dump -Cc gitlabhq_production |\
+ ssh -C staging-vm sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -d template1
+```
+
+## Recreating directory structure
+
+If you need to re-create some directory structure on the staging server, you can
+use this procedure.
+
+First, on the production server, create a list of directories you want to
+re-create.
+
+```shell
+# On PRODUCTION
+(umask 077; sudo find /var/opt/gitlab/git-data/repositories -maxdepth 1 -type d -print0 > directories.txt)
+```
+
+Copy `directories.txt` to the staging server and create the directories there.
+
+```shell
+# On STAGING
+sudo -u git xargs -0 mkdir -p < directories.txt
+```
diff --git a/doc/development/database/filtering_by_label.md b/doc/development/database/filtering_by_label.md
new file mode 100644
index 00000000000..29b0c75298e
--- /dev/null
+++ b/doc/development/database/filtering_by_label.md
@@ -0,0 +1,179 @@
+---
+stage: Plan
+group: Project Management
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+# Filtering by label
+
+## Introduction
+
+GitLab has [labels](../../user/project/labels.md) that can be assigned to issues,
+merge requests, and epics. Labels on those objects are a many-to-many relation
+through the polymorphic `label_links` table.
+
+To filter these objects by multiple labels - for instance, 'all open
+issues with the label ~Plan and the label ~backend' - we generate a
+query containing a `GROUP BY` clause. In a simple form, this looks like:
+
+```sql
+SELECT
+ issues.*
+FROM
+ issues
+ INNER JOIN label_links ON label_links.target_id = issues.id
+ AND label_links.target_type = 'Issue'
+ INNER JOIN labels ON labels.id = label_links.label_id
+WHERE
+ issues.project_id = 13083
+ AND (issues.state IN ('opened'))
+ AND labels.title IN ('Plan',
+ 'backend')
+GROUP BY
+ issues.id
+HAVING (COUNT(DISTINCT labels.title) = 2)
+ORDER BY
+ issues.updated_at DESC,
+ issues.id DESC
+LIMIT 20 OFFSET 0
+```
+
+In particular, note that:
+
+1. We `GROUP BY issues.id` so that we can ...
+1. Use the `HAVING (COUNT(DISTINCT labels.title) = 2)` condition to ensure that
+ all matched issues have both labels.
+
+This is more complicated than is ideal. It makes the query construction more
+prone to errors (such as
+[issue #15557](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/15557)).
+
+## Attempt A: `WHERE EXISTS`
+
+### Attempt A1: use multiple subqueries with `WHERE EXISTS`
+
+In [issue #37137](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/37137)
+and its associated [merge request](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/14022),
+we tried to replace the `GROUP BY` with multiple uses of `WHERE EXISTS`. For the
+example above, this would give:
+
+```sql
+WHERE (EXISTS (
+ SELECT
+ TRUE
+ FROM
+ label_links
+ INNER JOIN labels ON labels.id = label_links.label_id
+ WHERE
+ labels.title = 'Plan'
+ AND target_type = 'Issue'
+ AND target_id = issues.id))
+AND (EXISTS (
+ SELECT
+ TRUE
+ FROM
+ label_links
+ INNER JOIN labels ON labels.id = label_links.label_id
+ WHERE
+ labels.title = 'backend'
+ AND target_type = 'Issue'
+ AND target_id = issues.id))
+```
+
+While this worked without schema changes, and did improve readability somewhat,
+it did not improve query performance.
+
+### Attempt A2: use label IDs in the `WHERE EXISTS` clause
+
+In [merge request #34503](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34503), we followed a similar approach to A1. But this time, we
+did a separate query to fetch the IDs of the labels used in the filter so that we avoid the `JOIN` in the `EXISTS` clause and filter directly by
+`label_links.label_id`. We also added a new index on `label_links` for the `target_id`, `label_id`, and `target_type` columns to speed up this query.
+
+Finding the label IDs wasn't straightforward because there could be multiple labels with the same title within a single root namespace. We solved
+this by grouping the label IDs by title and then using the array of IDs in the `EXISTS` clauses.
+
+This resulted in a significant performance improvement. However, this optimization could not be applied to the dashboard pages
+where we do not have a project or group context. We could not easily search for the label IDs here because that would mean searching across all
+projects and groups that the user has access to.
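+
+For the initial example, the resulting condition would look roughly like the
+following sketch (the label IDs are illustrative):
+
+```sql
+WHERE (EXISTS (
+    SELECT
+      TRUE
+    FROM
+      label_links
+    WHERE
+      -- hypothetical IDs of the labels titled 'Plan'
+      label_links.label_id IN (11, 21, 31)
+      AND target_type = 'Issue'
+      AND target_id = issues.id))
+AND (EXISTS (
+    SELECT
+      TRUE
+    FROM
+      label_links
+    WHERE
+      -- hypothetical IDs of the labels titled 'backend'
+      label_links.label_id IN (12, 22, 32)
+      AND target_type = 'Issue'
+      AND target_id = issues.id))
+```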
+
+## Attempt B: Denormalize using an array column
+
+Having [removed MySQL support in GitLab 12.1](https://about.gitlab.com/blog/2019/06/27/removing-mysql-support/),
+using [PostgreSQL's arrays](https://www.postgresql.org/docs/11/arrays.html) became more
+tractable as we didn't have to support two databases. We discussed denormalizing
+the `label_links` table for querying in
+[issue #49651](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/49651),
+with two options: label IDs and titles.
+
+We can think of both of those as array columns on `issues`, `merge_requests`,
+and `epics`: `issues.label_ids` would be an array column of label IDs, and
+`issues.label_titles` would be an array of label titles.
+
+These array columns can be complemented with
+[GIN indexes](https://www.postgresql.org/docs/11/gin-intro.html) to improve
+matching.
+
+### Attempt B1: store label IDs for each object
+
+This has some strong advantages over titles:
+
+1. Unless a label is deleted, or a project is moved, we never need to
+ bulk-update the denormalized column.
+1. It uses less storage than the titles.
+
+Unfortunately, our application design makes this hard. If we were able to query
+just by label ID easily, we wouldn't need the `INNER JOIN labels` in the initial
+query at the start of this document. GitLab allows users to filter by label
+title across projects and even across groups, so a filter by the label ~Plan may
+include labels with multiple distinct IDs.
+
+We do not want users to have to know about the different IDs, which means that
+given this data set:
+
+| Project | ~Plan label ID | ~backend label ID |
+| ------- | -------------- | ----------------- |
+| A | 11 | 12 |
+| B | 21 | 22 |
+| C | 31 | 32 |
+
+We would need something like:
+
+```sql
+WHERE
+ label_ids @> ARRAY[11, 12]
+ OR label_ids @> ARRAY[21, 22]
+ OR label_ids @> ARRAY[31, 32]
+```
+
+This can get even more complicated when we consider that in some cases, there
+might be two ~backend labels - with different IDs - that could apply to the same
+object, so the number of combinations would balloon further.
+
+### Attempt B2: store label titles for each object
+
+From the perspective of updating the object, this is the worst
+option. We have to bulk update the objects when:
+
+1. The objects are moved from one project to another.
+1. The project is moved from one group to another.
+1. The label is renamed.
+1. The label is deleted.
+
+It also uses much more storage. Querying is simple, though:
+
+```sql
+WHERE
+ label_titles @> ARRAY['Plan', 'backend']
+```
+
+And our [tests in issue #49651](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/49651#note_188777346)
+showed that this could be fast.
+
+However, at present, the disadvantages outweigh the advantages.
+
+## Conclusion
+
+We found that method A2 does not need denormalization and improves the query performance significantly. It
+did not apply to all cases, but we were able to apply method A1 to the rest of the cases, so that we remove the
+`GROUP BY` and `HAVING` clauses in all scenarios.
+
+This simplified the query and improved the performance in the most common cases.
diff --git a/doc/development/database/foreign_keys.md b/doc/development/database/foreign_keys.md
new file mode 100644
index 00000000000..7834e7d53c3
--- /dev/null
+++ b/doc/development/database/foreign_keys.md
@@ -0,0 +1,199 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Foreign Keys & Associations
+
+When adding an association to a model you must also add a foreign key. For
+example, say you have the following model:
+
+```ruby
+class User < ActiveRecord::Base
+ has_many :posts
+end
+```
+
+Add a foreign key on the `posts.user_id` column. This ensures
+that data consistency is enforced at the database level. Foreign keys also mean that
+the database can remove associated data very quickly (for example, when removing a
+user), instead of Rails having to do this.
+
+## Adding Foreign Keys In Migrations
+
+Foreign keys can be added concurrently using `add_concurrent_foreign_key` as
+defined in `Gitlab::Database::MigrationHelpers`. See the
+[Migration Style Guide](../migration_style_guide.md) for more information.
+
+Keep in mind that you can only safely add foreign keys to existing tables after
+you have removed any orphaned rows. The method `add_concurrent_foreign_key`
+does not take care of this so you must do so manually. See
+[adding foreign key constraint to an existing column](add_foreign_key_to_existing_column.md).
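+
+For example, a migration adding such a foreign key concurrently could look like
+the following sketch (the `posts.user_id` column is the hypothetical example
+from above):
+
+```ruby
+class AddForeignKeyToPostsUserId < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  def up
+    # Adds the constraint without blocking concurrent writes on the table
+    add_concurrent_foreign_key :posts, :users, column: :user_id, on_delete: :cascade
+  end
+
+  def down
+    remove_foreign_key_if_exists :posts, column: :user_id
+  end
+end
+```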
+
+## Updating Foreign Keys In Migrations
+
+Sometimes a foreign key constraint must be changed, preserving the column
+but updating the constraint condition. For example, moving from
+`ON DELETE CASCADE` to `ON DELETE SET NULL` or vice-versa.
+
+PostgreSQL does not prevent you from adding overlapping foreign keys. It
+honors the most recently added constraint. This allows us to replace foreign keys without
+ever losing foreign key protection on a column.
+
+To replace a foreign key:
+
+1. [Add the new foreign key without validation](add_foreign_key_to_existing_column.md#prevent-invalid-records)
+
+ The name of the foreign key constraint must be changed to add a new
+ foreign key before removing the old one.
+
+ ```ruby
+ class ReplaceFkOnPackagesPackagesProjectId < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ NEW_CONSTRAINT_NAME = 'fk_new'
+
+ def up
+ add_concurrent_foreign_key(:packages_packages, :projects, column: :project_id, on_delete: :nullify, validate: false, name: NEW_CONSTRAINT_NAME)
+ end
+
+ def down
+ with_lock_retries do
+ remove_foreign_key_if_exists(:packages_packages, column: :project_id, on_delete: :nullify, name: NEW_CONSTRAINT_NAME)
+ end
+ end
+ end
+ ```
+
+1. [Validate the new foreign key](add_foreign_key_to_existing_column.md#validate-the-foreign-key)
+
+ ```ruby
+ class ValidateFkNew < Gitlab::Database::Migration[2.0]
+ NEW_CONSTRAINT_NAME = 'fk_new'
+
+ # foreign key added in <link to MR or path to migration adding new FK>
+ def up
+ validate_foreign_key(:packages_packages, name: NEW_CONSTRAINT_NAME)
+ end
+
+ def down
+ # no-op
+ end
+ end
+ ```
+
+1. Remove the old foreign key:
+
+ ```ruby
+ class RemoveFkOld < Gitlab::Database::Migration[2.0]
+ OLD_CONSTRAINT_NAME = 'fk_old'
+
+ # new foreign key added in <link to MR or path to migration adding new FK>
+ # and validated in <link to MR or path to migration validating new FK>
+ def up
+ remove_foreign_key_if_exists(:packages_packages, column: :project_id, on_delete: :cascade, name: OLD_CONSTRAINT_NAME)
+ end
+
+ def down
+ # Validation is skipped here, so if rolled back, this will need to be revalidated in a separate migration
+ add_concurrent_foreign_key(:packages_packages, :projects, column: :project_id, on_delete: :cascade, validate: false, name: OLD_CONSTRAINT_NAME)
+ end
+ end
+ ```
+
+## Cascading Deletes
+
+Every foreign key must define an `ON DELETE` clause, and in 99% of the cases
+this should be set to `CASCADE`.
+
+## Indexes
+
+When adding a foreign key in PostgreSQL, the column is not indexed automatically,
+so you must also add a concurrent index. Not doing so makes cascading
+deletes very slow.
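+
+A sketch of such an index migration, reusing the hypothetical `posts.user_id`
+column from above:
+
+```ruby
+class AddIndexOnPostsUserId < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  INDEX_NAME = 'index_posts_on_user_id'
+
+  def up
+    add_concurrent_index :posts, :user_id, name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :posts, INDEX_NAME
+  end
+end
+```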
+
+## Naming foreign keys
+
+By default, Ruby on Rails uses the `_id` suffix for foreign keys, so you should
+use this suffix only for associations between two tables. If you want to
+reference an ID on a third-party platform, the `_xid` suffix is recommended.
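+
+For example (the table and columns are hypothetical):
+
+```ruby
+create_table :integrations do |t|
+  t.bigint :user_id      # references users.id in this database: needs a foreign key
+  t.text :jira_issue_xid # identifier on a third-party platform: no foreign key possible
+end
+```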
+
+The spec `spec/db/schema_spec.rb` tests if all columns with the `_id` suffix
+have a foreign key constraint. So if that spec fails, don't add the column to
+`IGNORED_FK_COLUMNS`, but instead add the FK constraint, or consider naming it
+differently.
+
+## Dependent Removals
+
+Don't define options such as `dependent: :destroy` or `dependent: :delete` when
+defining an association. Defining these options means Rails handles the
+removal of data, instead of letting the database handle this in the most
+efficient way possible.
+
+In other words, this is bad and should be avoided at all costs:
+
+```ruby
+class User < ActiveRecord::Base
+ has_many :posts, dependent: :destroy
+end
+```
+
+Should you truly have a need for this it should be approved by a database
+specialist first.
+
+You should also not define any `before_destroy` or `after_destroy` callbacks on
+your models _unless_ absolutely required and only when approved by database
+specialists. For example, if each row in a table has a corresponding file on a
+file system, it may be tempting to add an `after_destroy` hook. This, however,
+introduces non-database logic into a model, and means we can no longer rely on
+foreign keys to remove the data, as this would result in the file system data
+being left behind. In such a case, you should instead use a service class that
+takes care of removing the non-database data.
+
+In cases where the relation spans multiple databases, you have even
+more problems using `dependent: :destroy` or the above hooks. You can
+read more about alternatives at
+[Avoid `dependent: :nullify` and `dependent: :destroy` across databases](multiple_databases.md#avoid-dependent-nullify-and-dependent-destroy-across-databases).
+
+## Alternative primary keys with `has_one` associations
+
+Sometimes a `has_one` association is used to create a one-to-one relationship:
+
+```ruby
+class User < ActiveRecord::Base
+ has_one :user_config
+end
+
+class UserConfig < ActiveRecord::Base
+ belongs_to :user
+end
+```
+
+In these cases, there may be an opportunity to remove the unnecessary `id`
+column on the associated table, `user_config.id` in this example. Instead,
+the originating table ID can be used as the primary key for the associated
+table:
+
+```ruby
+create_table :user_configs, id: false do |t|
+ t.references :users, primary_key: true, default: nil, index: false, foreign_key: { on_delete: :cascade }
+ ...
+end
+```
+
+Setting `default: nil` ensures a primary key sequence is not created, and since the primary key
+automatically gets an index, we set `index: false` to avoid creating a duplicate.
+You also need to add the new primary key to the model:
+
+```ruby
+class UserConfig < ActiveRecord::Base
+ self.primary_key = :user_id
+
+ belongs_to :user
+end
+```
+
+Using a foreign key as primary key saves space but can make
+[batch counting](../service_ping/implement.md#batch-counters) in [Service Ping](../service_ping/index.md) less efficient.
+Consider using a regular `id` column if the table is relevant for Service Ping.
diff --git a/doc/development/database/hash_indexes.md b/doc/development/database/hash_indexes.md
new file mode 100644
index 00000000000..731639b6f06
--- /dev/null
+++ b/doc/development/database/hash_indexes.md
@@ -0,0 +1,26 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Hash Indexes
+
+PostgreSQL supports hash indexes in addition to the regular B-tree
+indexes. However, hash indexes are to be avoided at all costs. While they may
+_sometimes_ provide better performance, the cost of rehashing can be very high.
+More importantly, until PostgreSQL 10.0 hash indexes were not
+WAL-logged, meaning they were not replicated to any replicas. From the PostgreSQL
+documentation:
+
+> Hash index operations are not presently WAL-logged, so hash indexes might need
+> to be rebuilt with REINDEX after a database crash if there were unwritten
+> changes. Also, changes to hash indexes are not replicated over streaming or
+> file-based replication after the initial base backup, so they give wrong
+> answers to queries that subsequently use them. For these reasons, hash index
+> use is presently discouraged.
+
+RuboCop is configured to register an offense when it detects the use of a hash
+index.
+
+Instead of using hash indexes you should use regular B-tree indexes.
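+
+For example (illustrative table and column):
+
+```sql
+-- Avoid:
+CREATE INDEX index_users_on_email ON users USING hash (email);
+
+-- Prefer (B-tree is the default index type):
+CREATE INDEX index_users_on_email ON users (email);
+```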
diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index b427f54ff3c..8cf9a2eec04 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -16,52 +16,62 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Tooling
-- [Understanding EXPLAIN plans](../understanding_explain_plans.md)
+- [Understanding EXPLAIN plans](understanding_explain_plans.md)
- [explain.depesz.com](https://explain.depesz.com/) or [explain.dalibo.com](https://explain.dalibo.com/) for visualizing the output of `EXPLAIN`
- [pgFormatter](https://sqlformat.darold.net/) a PostgreSQL SQL syntax beautifier
- [db:check-migrations job](dbcheck-migrations-job.md)
## Migrations
+- [Different types of migrations](../migration_style_guide.md#choose-an-appropriate-migration-type)
+- [Create a regular migration](../migration_style_guide.md#create-a-regular-schema-migration), including creating new models
+- [Post-deployment migrations guidelines](post_deployment_migrations.md) and [how to create one](post_deployment_migrations.md#creating-migrations)
+- [Background migrations guidelines](background_migrations.md)
+- [Batched background migrations guidelines](batched_background_migrations.md)
+- [Deleting migrations](deleting_migrations.md)
+- [Running database migrations](database_debugging.md#migration-wrangling)
- [Migrations for multiple databases](migrations_for_multiple_databases.md)
- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md)
-- [SQL guidelines](../sql.md) for working with SQL queries
+- [When and how to write Rails migrations tests](../testing_guide/testing_migrations_guide.md)
- [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations
- [Testing Rails migrations](../testing_guide/testing_migrations_guide.md) guide
- [Post deployment migrations](post_deployment_migrations.md)
- [Background migrations](background_migrations.md)
-- [Swapping tables](../swapping_tables.md)
+- [Swapping tables](swapping_tables.md)
- [Deleting migrations](deleting_migrations.md)
+- [SQL guidelines](../sql.md) for working with SQL queries
- [Partitioning tables](table_partitioning.md)
## Debugging
-- Tracing the source of an SQL query using query comments with [Marginalia](../database_query_comments.md)
+- [Resetting the database](database_debugging.md#delete-everything-and-start-over)
+- [Accessing the database](database_debugging.md#manually-access-the-database)
+- [Troubleshooting and debugging the database](database_debugging.md)
+- Tracing the source of an SQL query using query comments with [Marginalia](database_query_comments.md)
- Tracing the source of an SQL query in Rails console using [Verbose Query Logs](https://guides.rubyonrails.org/debugging_rails_applications.html#verbose-query-logs)
## Best practices
-- [Adding database indexes](../adding_database_indexes.md)
-- [Foreign keys & associations](../foreign_keys.md)
+- [Adding database indexes](adding_database_indexes.md)
+- [Foreign keys & associations](foreign_keys.md)
- [Adding a foreign key constraint to an existing column](add_foreign_key_to_existing_column.md)
- [`NOT NULL` constraints](not_null_constraints.md)
- [Strings and the Text data type](strings_and_the_text_data_type.md)
-- [Single table inheritance](../single_table_inheritance.md)
-- [Polymorphic associations](../polymorphic_associations.md)
-- [Serializing data](../serializing_data.md)
-- [Hash indexes](../hash_indexes.md)
-- [Storing SHA1 hashes as binary](../sha1_as_binary.md)
-- [Iterating tables in batches](../iterating_tables_in_batches.md)
-- [Insert into tables in batches](../insert_into_tables_in_batches.md)
-- [Ordering table columns](../ordering_table_columns.md)
-- [Verifying database capabilities](../verifying_database_capabilities.md)
-- [Database Debugging and Troubleshooting](../database_debugging.md)
-- [Query Count Limits](../query_count_limits.md)
-- [Creating enums](../creating_enums.md)
+- [Single table inheritance](single_table_inheritance.md)
+- [Polymorphic associations](polymorphic_associations.md)
+- [Serializing data](serializing_data.md)
+- [Hash indexes](hash_indexes.md)
+- [Storing SHA1 hashes as binary](sha1_as_binary.md)
+- [Iterating tables in batches](iterating_tables_in_batches.md)
+- [Insert into tables in batches](insert_into_tables_in_batches.md)
+- [Ordering table columns](ordering_table_columns.md)
+- [Verifying database capabilities](verifying_database_capabilities.md)
+- [Query Count Limits](query_count_limits.md)
+- [Creating enums](creating_enums.md)
- [Client-side connection-pool](client_side_connection_pool.md)
- [Updating multiple values](setting_multiple_values.md)
- [Constraints naming conventions](constraint_naming_convention.md)
-- [Query performance guidelines](../query_performance.md)
+- [Query performance guidelines](query_performance.md)
- [Pagination guidelines](pagination_guidelines.md)
- [Pagination performance guidelines](pagination_performance_guidelines.md)
- [Efficient `IN` operator queries](efficient_in_operator_queries.md)
@@ -69,8 +79,8 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Case studies
-- [Database case study: Filtering by label](../filtering_by_label.md)
-- [Database case study: Namespaces storage statistics](../namespaces_storage_statistics.md)
+- [Database case study: Filtering by label](filtering_by_label.md)
+- [Database case study: Namespaces storage statistics](namespaces_storage_statistics.md)
## Miscellaneous
diff --git a/doc/development/database/insert_into_tables_in_batches.md b/doc/development/database/insert_into_tables_in_batches.md
new file mode 100644
index 00000000000..ebed3d16319
--- /dev/null
+++ b/doc/development/database/insert_into_tables_in_batches.md
@@ -0,0 +1,196 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+description: "Sometimes it is necessary to store large amounts of records at once, which can be inefficient
+when iterating collections and performing individual `save`s. With the arrival of `insert_all`
+in Rails 6, which operates at the row level (that is, using `Hash`es), GitLab has added a set
+of APIs that make it safe and simple to insert ActiveRecord objects in bulk."
+---
+
+# Insert into tables in batches
+
+Sometimes it is necessary to store large amounts of records at once, which can be inefficient
+when iterating collections and saving each record individually. With the arrival of
+[`insert_all`](https://apidock.com/rails/ActiveRecord/Persistence/ClassMethods/insert_all)
+in Rails 6, which operates at the row level (that is, using `Hash` objects), GitLab has added a set
+of APIs that make it safe and simple to insert `ActiveRecord` objects in bulk.
+
+## Prepare `ApplicationRecord`s for bulk insertion
+
+In order for a model class to take advantage of the bulk insertion API, it has to include the
+`BulkInsertSafe` concern first:
+
+```ruby
+class MyModel < ApplicationRecord
+ # other includes here
+ # ...
+ include BulkInsertSafe # include this last
+
+ # ...
+end
+```
+
+The `BulkInsertSafe` concern has two functions:
+
+- It performs checks against your model class to ensure that it does not use ActiveRecord
+ APIs that are not safe to use with respect to bulk insertions (more on that below).
+- It adds new class methods `bulk_insert!` and `bulk_upsert!`, which you can use to insert many records at once.
+
+## Insert records with `bulk_insert!` and `bulk_upsert!`
+
+If the target class passes the checks performed by `BulkInsertSafe`, you can insert an array of
+ActiveRecord model objects as follows:
+
+```ruby
+records = [MyModel.new, ...]
+
+MyModel.bulk_insert!(records)
+```
+
+Calls to `bulk_insert!` always attempt to insert _new records_. If instead
+you would like to replace existing records with new values, while still inserting those
+that do not already exist, then you can use `bulk_upsert!`:
+
+```ruby
+records = [MyModel.new, existing_model, ...]
+
+MyModel.bulk_upsert!(records, unique_by: [:name])
+```
+
+In this example, `unique_by` specifies the columns by which records are considered to be
+unique and as such are updated if they existed prior to insertion. For example, if
+`existing_model` has a `name` attribute, and if a record with the same `name` value already
+exists, its fields are updated with those of `existing_model`.
+
+The `unique_by` parameter can also be passed as a `Symbol`, in which case it specifies
+a database index by which a column is considered unique:
+
+```ruby
+MyModel.bulk_insert!(records, unique_by: :index_on_name)
+```
+
+### Record validation
+
+The `bulk_insert!` method guarantees that `records` are inserted transactionally, and
+runs validations on each record prior to insertion. If any record fails to validate,
+an error is raised and the transaction is rolled back. You can turn off validations via
+the `:validate` option:
+
+```ruby
+MyModel.bulk_insert!(records, validate: false)
+```
+
+### Batch size configuration
+
+In those cases where the number of `records` is above a given threshold, insertions
+occur in multiple batches. The default batch size is defined in
+[`BulkInsertSafe::DEFAULT_BATCH_SIZE`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
+Assuming a default threshold of 500, inserting 950 records
+would result in two batches being written sequentially (of size 500 and 450 respectively.)
+You can override the default batch size via the `:batch_size` option:
+
+```ruby
+MyModel.bulk_insert!(records, batch_size: 100)
+```
+
+Assuming the same number of 950 records, this would result in 10 batches being written instead.
+Since this also affects the number of `INSERT` statements that occur, make sure you measure the
+performance impact this might have on your code. There is a trade-off between the number of
+`INSERT` statements the database has to process and the size and cost of each `INSERT`.
+
+### Handling duplicate records
+
+NOTE:
+The `skip_duplicates` parameter described below applies only to `bulk_insert!`. If you intend
+to update existing records, use `bulk_upsert!` instead.
+
+It may happen that some records you are trying to insert already exist, which would result in
+primary key conflicts. There are two ways to address this problem: failing fast by raising an
+error or skipping duplicate records. The default behavior of `bulk_insert!` is to fail fast
+and raise an `ActiveRecord::RecordNotUnique` error.
+
+If this is undesirable, you can instead skip duplicate records with the `skip_duplicates` flag:
+
+```ruby
+MyModel.bulk_insert!(records, skip_duplicates: true)
+```
+
+### Requirements for safe bulk insertions
+
+Large parts of ActiveRecord's persistence API are built around the notion of callbacks. Many
+of these callbacks fire in response to model life cycle events such as `save` or `create`.
+These callbacks cannot be used with bulk insertions, since they are meant to be called for
+every instance that is saved or created. Because these events do not fire when
+records are inserted in bulk, we currently prevent their use.
+
+The specifics around which callbacks are explicitly allowed are defined in
+[`BulkInsertSafe`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
+Consult the module source code for details. If your class uses callbacks that are not explicitly designated
+safe and you `include BulkInsertSafe`, the application fails with an error.
+
+### `BulkInsertSafe` versus `InsertAll`
+
+Internally, `BulkInsertSafe` is based on `InsertAll`, and you may wonder when to choose
+the former over the latter. To help you make the decision,
+the key differences between these classes are listed in the table below.
+
+| | Input type | Validates input | Specify batch size | Can bypass callbacks | Transactional |
+|--------------- | -------------------- | --------------- | ------------------ | --------------------------------- | ------------- |
+| `bulk_insert!` | ActiveRecord objects | Yes (optional) | Yes (optional) | No (prevents unsafe callback use) | Yes |
+| `insert_all!` | Attribute hashes | No | No | Yes | Yes |
+
+To summarize, `BulkInsertSafe` moves bulk inserts closer to how ActiveRecord objects
+and inserts would normally behave. However, if all you need is to insert raw data in bulk, then
+`insert_all` is more efficient.
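+
+For example, a minimal sketch of inserting raw attribute hashes with Rails'
+`insert_all`, which bypasses validations and callbacks entirely:
+
+```ruby
+MyModel.insert_all([
+  { name: 'record-1' },
+  { name: 'record-2' }
+])
+```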
+
+## Insert `has_many` associations in bulk
+
+A common use case is to save collections of associated relations through the owner side of the relation,
+where the owned relation is associated to the owner through the `has_many` class method:
+
+```ruby
+owner = OwnerModel.new(owned_relations: array_of_owned_relations)
+# saves all `owned_relations` one-by-one
+owner.save!
+```
+
+This issues a separate `INSERT` statement, and transaction, for every record in `owned_relations`, which is inefficient if
+`array_of_owned_relations` is large. To remedy this, the `BulkInsertableAssociations` concern can be
+used to declare that the owner defines associations that are safe for bulk insertion:
+
+```ruby
+class OwnerModel < ApplicationRecord
+ # other includes here
+ # ...
+ include BulkInsertableAssociations # include this last
+
+ has_many :my_models
+end
+```
+
+Here `my_models` must be declared `BulkInsertSafe` (as described previously) for bulk insertions
+to happen. You can then insert any not-yet-saved records as follows:
+
+```ruby
+BulkInsertableAssociations.with_bulk_insert do
+ owner = OwnerModel.new(my_models: array_of_my_model_instances)
+ # saves `my_models` using a single bulk insert (possibly via multiple batches)
+ owner.save!
+end
+```
+
+You can still save relations that are not `BulkInsertSafe` in this block; they
+simply are treated as if you had invoked `save` from outside the block.
+
+## Known limitations
+
+There are a few restrictions to how these APIs can be used:
+
+- `BulkInsertableAssociations`:
+ - It is currently only compatible with `has_many` relations.
+ - It does not yet support `has_many through: ...` relations.
+
+Moreover, input data should either be limited to around 1000 records at most,
+or already batched prior to calling bulk insert. The `INSERT` statement runs in a single
+transaction, so for large amounts of records it may negatively affect database stability.
diff --git a/doc/development/database/iterating_tables_in_batches.md b/doc/development/database/iterating_tables_in_batches.md
new file mode 100644
index 00000000000..6d7a57ecacb
--- /dev/null
+++ b/doc/development/database/iterating_tables_in_batches.md
@@ -0,0 +1,598 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Iterating tables in batches
+
+Rails provides a method called `in_batches` that can be used to iterate over
+rows in batches. For example:
+
+```ruby
+User.in_batches(of: 10) do |relation|
+ relation.update_all(updated_at: Time.now)
+end
+```
+
+Unfortunately, this method is implemented in a way that is not very efficient,
+in terms of both query and memory usage.
+
+To work around this you can include the `EachBatch` module into your models,
+then use the `each_batch` class method. For example:
+
+```ruby
+class User < ActiveRecord::Base
+ include EachBatch
+end
+
+User.each_batch(of: 10) do |relation|
+ relation.update_all(updated_at: Time.now)
+end
+```
+
+This produces queries such as:
+
+```plaintext
+User Load (0.7ms) SELECT "users"."id" FROM "users" WHERE ("users"."id" >= 41654) ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000
+ (0.7ms) SELECT COUNT(*) FROM "users" WHERE ("users"."id" >= 41654) AND ("users"."id" < 42687)
+```
+
+The API of this method is similar to `in_batches`, though it doesn't support
+all of the arguments that `in_batches` supports. You should always use
+`each_batch` _unless_ you have a specific need for `in_batches`.
+
+## Iterating over non-unique columns
+
+Proceed with extra caution when iterating over an attribute that is not unique:
+even with a maximum batch size applied, there is no guarantee that the resulting
+batches do not surpass it. The following snippet demonstrates this situation:
+when selecting `Ci::Build` entries for users with `id` between `1` and `10,000`,
+the database returns `1,215,178` matching rows.
+
+```ruby
+[ gstg ] production> Ci::Build.where(user_id: (1..10_000)).size
+=> 1215178
+```
+
+This happens because the built relation is translated into the following query:
+
+```ruby
+[ gstg ] production> puts Ci::Build.where(user_id: (1..10_000)).to_sql
+SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."user_id" BETWEEN 1 AND 10000
+=> nil
+```
+
+In queries that filter a non-unique column by a range, such as
+`WHERE "ci_builds"."user_id" BETWEEN ? AND ?`, the range size is limited to a
+certain threshold (`10,000` in the previous example), but this threshold does
+not translate to the size of the returned dataset. That happens because, given
+`n` possible values of an attribute, one can't tell for sure that the number of
+records containing them is at most `n`.
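+
+In other words, batching `ci_builds` by `user_id` gives no upper bound on the
+number of rows in each batch. A sketch of the pattern to avoid:
+
+```ruby
+# Avoid: each batch spans at most 1,000 distinct user_id values,
+# but may contain far more than 1,000 ci_builds rows.
+Ci::Build.each_batch(of: 1000, column: :user_id) do |relation|
+  # do something
+end
+```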
+
+### Loose-index scan with `distinct_each_batch`
+
+When iterating over a non-unique column is necessary, use the `distinct_each_batch` helper
+method. The helper uses the [loose-index scan technique](https://wiki.postgresql.org/wiki/Loose_indexscan)
+(skip-index scan) to skip duplicated values within a database index.
+
+Example: iterating over distinct `author_id` in the Issue model
+
+```ruby
+Issue.distinct_each_batch(column: :author_id, of: 1000) do |relation|
+ users = User.where(id: relation.select(:author_id)).to_a
+end
+```
+
+The technique provides stable performance between the batches regardless of the data distribution.
+The yielded `relation` object is an ActiveRecord scope where only the given `column` is available.
+Other columns are not loaded.
+
+The underlying database queries use recursive CTEs, which adds extra overhead. We therefore advise using
+smaller batch sizes than those used for a standard `each_batch` iteration.
+
+## Column definition
+
+`EachBatch` uses the primary key of the model by default for the iteration. This works in most
+cases; however, sometimes you might want to use a different column for the iteration.
+
+```ruby
+Project.distinct.each_batch(column: :creator_id, of: 10) do |relation|
+ puts User.where(id: relation.select(:creator_id)).map(&:id)
+end
+```
+
+The query above iterates over the project creators and prints them out without duplications.
+
+NOTE:
+If the column is not unique (it has no unique index definition), calling the `distinct` method on
+the relation is necessary. Using a non-unique column without `distinct` may result in `each_batch`
+falling into an endless loop, as described in the following
+[issue](https://gitlab.com/gitlab-org/gitlab/-/issues/285097).
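+
+A sketch of the pattern to avoid:
+
+```ruby
+# Avoid: creator_id is not unique; without `distinct` the iteration
+# can get stuck on a value that repeats more often than the batch size.
+Project.each_batch(column: :creator_id, of: 10) do |relation|
+  # do something
+end
+```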
+
+## `EachBatch` in data migrations
+
+When dealing with data migrations the preferred way to iterate over a large volume of data is using
+`EachBatch`.
+
+A special case of data migration is a [background migration](background_migrations.md#scheduling)
+where the actual data modification is executed in a background job. The migration code that
+determines the data ranges (slices) and schedules the background jobs uses `each_batch`.
+
+## Efficient usage of `each_batch`
+
+`EachBatch` helps to iterate over large tables. It's important to highlight that `EachBatch`
+does not magically solve all iteration-related performance problems, and it might not help at
+all in some scenarios. From the database point of view, correctly configured database indexes are
+also necessary to make `EachBatch` perform well.
+
+### Example 1: Simple iteration
+
+Let's consider that we want to iterate over the `users` table and print the `User` records to the
+standard output. The `users` table contains millions of records, thus running one query to fetch
+the users likely times out.
+
+![Users table overview](../img/each_batch_users_table_v13_7.png)
+
+This is a simplified version of the `users` table which contains several rows. We have a few
+smaller gaps in the `id` column to make the example a bit more realistic (a few records were
+already deleted). Currently, we have one index on the `id` field.
+
+Loading all users into memory (avoid):
+
+```ruby
+users = User.all
+
+users.each { |user| puts user.inspect }
+```
+
+Use `each_batch`:
+
+```ruby
+# Note: for this example I picked 5 as the batch size, the default is 1_000
+User.each_batch(of: 5) do |relation|
+ relation.each { |user| puts user.inspect }
+end
+```
+
+#### How `each_batch` works
+
+As the first step, it finds the lowest `id` (start `id`) in the table by executing the following
+database query:
+
+```sql
+SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT 1
+```
+
+![Reading the start ID value](../img/each_batch_users_table_iteration_1_v13_7.png)
+
+Notice that the query only reads data from the index (`INDEX ONLY SCAN`); the table is not
+accessed. Database indexes are sorted, so taking out the first item is a very cheap operation.
+
+The next step is to find the next `id` (end `id`) which should respect the batch size
+configuration. In this example we used a batch size of 5. `EachBatch` uses the `OFFSET` clause
+to get a "shifted" `id` value.
+
+```sql
+SELECT "users"."id" FROM "users" WHERE "users"."id" >= 1 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 5
+```
+
+![Reading the end ID value](../img/each_batch_users_table_iteration_2_v13_7.png)
+
+Again, the query only looks into the index. The `OFFSET 5` takes out the sixth `id` value: this
+query reads a maximum of six items from the index regardless of the table size or the iteration
+count.
+
+At this point, we know the `id` range for the first batch. Now it's time to construct the query
+for the `relation` block.
+
+```sql
+SELECT "users".* FROM "users" WHERE "users"."id" >= 1 AND "users"."id" < 302
+```
+
+![Reading the rows from the `users` table](../img/each_batch_users_table_iteration_3_v13_7.png)
+
+Notice the `<` sign. Previously six items were read from the index and in this query, the last
+value is "excluded". The query looks at the index to get the location of the five `user`
+rows on the disk and reads the rows from the table. The returned array is processed in Ruby.
+
+The first iteration is done. For the next iteration, the last `id` value is reused from the
+previous iteration in order to find out the next end `id` value.
+
+```sql
+SELECT "users"."id" FROM "users" WHERE "users"."id" >= 302 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 5
+```
+
+![Reading the second end ID value](../img/each_batch_users_table_iteration_4_v13_7.png)
+
+Now we can easily construct the `users` query for the second iteration.
+
+```sql
+SELECT "users".* FROM "users" WHERE "users"."id" >= 302 AND "users"."id" < 353
+```
+
+![Reading the rows for the second iteration from the users table](../img/each_batch_users_table_iteration_5_v13_7.png)
+
+### Example 2: Iteration with filters
+
+Building on top of the previous example, we want to print users with zero sign-in count. We keep
+track of the number of sign-ins in the `sign_in_count` column so we write the following code:
+
+```ruby
+users = User.where(sign_in_count: 0)
+
+users.each_batch(of: 5) do |relation|
+ relation.each { |user| puts user.inspect }
+end
+```
+
+`each_batch` produces the following SQL query for the start `id` value:
+
+```sql
+SELECT "users"."id" FROM "users" WHERE "users"."sign_in_count" = 0 ORDER BY "users"."id" ASC LIMIT 1
+```
+
+Selecting only the `id` column and ordering by `id` forces the database to use the
+index on the `id` (primary key index) column. However, we also have an extra condition on the
+`sign_in_count` column. The column is not part of the index, so the database needs to look into
+the actual table to find the first matching row.
+
+![Reading the index with extra filter](../img/each_batch_users_table_filter_v13_7.png)
+
+NOTE:
+The number of scanned rows depends on the data distribution in the table.
+
+- Best case scenario: the first user has never logged in. The database reads only one row.
+- Worst case scenario: all users have logged in at least once. The database reads all rows.
+
+In this particular example, the database had to read 10 rows (regardless of our batch size setting)
+to determine the first `id` value. In a "real-world" application it's hard to predict whether the
+filtering causes problems or not. In the case of GitLab, verifying the data on a
+production replica is a good start, but keep in mind that data distribution on GitLab.com can be
+different from self-managed instances.
+
+#### Improve filtering with `each_batch`
+
+##### Specialized conditional index
+
+The first option is to create a specialized conditional (partial) index that
+covers only the rows matching the filter:
+
+```sql
+CREATE INDEX index_on_users_never_logged_in ON users (id) WHERE sign_in_count = 0
+```
+
+This is what our table and the newly created index look like:
+
+![Reading the specialized index](../img/each_batch_users_table_filtered_index_v13_7.png)
+
+This index definition covers the conditions on the `id` and `sign_in_count` columns, and thus makes the
+`each_batch` queries very effective (similar to the simple iteration example).
+
+It's rare for a user to have never signed in, so we anticipate a small index size. Including only the
+`id` in the index definition also helps to keep the index size small.
+
+##### Index on columns
+
+Later on, we might want to iterate over the table filtering for different `sign_in_count` values. In
+those cases, we cannot use the previously suggested conditional index, because the `WHERE` condition
+does not match our new filter (`sign_in_count > 10`).
+
+To address this problem, we have two options:
+
+- Create another, conditional index to cover the new query.
+- Replace the index with a more generalized configuration.
+
+NOTE:
+Having multiple indexes on the same table and on the same columns could be a performance bottleneck
+when writing data.
+
+Let's consider the following index (avoid):
+
+```sql
+CREATE INDEX index_on_users_never_logged_in ON users (id, sign_in_count)
+```
+
+The index definition starts with the `id` column, which makes the index very inefficient from a data
+selectivity point of view.
+
+```sql
+SELECT "users"."id" FROM "users" WHERE "users"."sign_in_count" = 0 ORDER BY "users"."id" ASC LIMIT 1
+```
+
+Executing the query above results in an `INDEX ONLY SCAN`. However, the query still needs to
+iterate over an unknown number of entries in the index, and then find the first item where the
+`sign_in_count` is `0`.
+
+![Reading an ineffective index](../img/each_batch_users_table_bad_index_v13_7.png)
+
+We can improve the query significantly by swapping the columns in the index definition (prefer).
+
+```sql
+CREATE INDEX index_on_users_never_logged_in ON users (sign_in_count, id)
+```
+
+![Reading a good index](../img/each_batch_users_table_good_index_v13_7.png)
+
+The following index definition does not work well with `each_batch` (avoid).
+
+```sql
+CREATE INDEX index_on_users_never_logged_in ON users (sign_in_count)
+```
+
+Since `each_batch` builds range queries based on the `id` column, this index cannot be used
+efficiently. The DB reads the rows from the table or uses a bitmap search where the primary
+key index is also read.
+
+##### "Slow" iteration
+
+Slow iteration means that we use a good index configuration to iterate over the table and
+apply filtering on the yielded relation.
+
+```ruby
+User.each_batch(of: 5) do |relation|
+  relation.where(sign_in_count: 0).each { |user| puts user.inspect }
+end
+```
+
+The iteration uses the primary key index (on the `id` column) which makes it safe from statement
+timeouts. The filter (`sign_in_count: 0`) is applied on the `relation` where the `id` is already
+constrained (range). The number of rows is limited.
+
+Slow iteration generally takes more time to finish. The iteration count is higher and
+one iteration could yield fewer records than the batch size. Iterations may even yield
+0 records. This is not an optimal solution; however, in some cases (especially when
+dealing with large tables) this is the only viable option.
+
+### Using Subqueries
+
+Using subqueries in your `each_batch` query does not work well in most cases. Consider the following example:
+
+```ruby
+projects = Project.where(creator_id: Issue.where(confidential: true).select(:author_id))
+
+projects.each_batch do |relation|
+ # do something
+end
+```
+
+The iteration uses the `id` column of the `projects` table. The batching does not affect the
+subquery. This means for each iteration, the subquery is executed by the database. This adds a
+constant "load" on the query, which often results in statement timeouts. We have an unknown number
+of [confidential issues](../../user/project/issues/confidential_issues.md); the execution time
+and the number of accessed database rows depend on the data distribution in the `issues` table.
+
+NOTE:
+Using subqueries works only when the subquery returns a small number of rows.
+
+#### Improving Subqueries
+
+When dealing with subqueries, a slow iteration approach could work: the filter on `creator_id`
+can be part of the generated `relation` object.
+
+```ruby
+projects = Project.all
+
+projects.each_batch do |relation|
+ relation.where(creator_id: Issue.where(confidential: true).select(:author_id))
+end
+```
+
+If the query on the `issues` table itself is not performant enough, a nested loop could be
+constructed. Try to avoid it when possible.
+
+```ruby
+projects = Project.all
+
+projects.each_batch do |relation|
+ issues = Issue.where(confidential: true)
+
+ issues.each_batch do |issues_relation|
+ relation.where(creator_id: issues_relation.select(:author_id))
+ end
+end
+```
+
+If we know that the `issues` table has many more rows than `projects`, it would make sense to flip
+the queries, where the `issues` table is batched first.
+
+### Using `JOIN` and `EXISTS`
+
+When to use `JOIN`s:
+
+- When there's a 1:1 or 1:N relationship between the tables where we know that the joined record
+(almost) always exists. This works well for "extension-like" tables:
+ - `projects` - `project_settings`
+ - `users` - `user_details`
+ - `users` - `user_statuses`
+- `LEFT JOIN` works well in this case. Conditions on the joined table need to go to the yielded
+relation so the iteration is not affected by the data distribution in the joined table.
+
+Example:
+
+```ruby
+users = User.joins("LEFT JOIN personal_access_tokens on personal_access_tokens.user_id = users.id")
+
+users.each_batch do |relation|
+ relation.where("personal_access_tokens.name = 'name'")
+end
+```
+
+`EXISTS` queries should be added only to the inner `relation` of the `each_batch` query:
+
+```ruby
+User.each_batch do |relation|
+ relation.where("EXISTS (SELECT 1 FROM ...")
+end
+```
+
+### Complex queries on the relation object
+
+When the `relation` object has several extra conditions, the execution plans might become
+"unstable".
+
+Example:
+
+```ruby
+Issue.each_batch do |relation|
+ relation
+ .joins(:metrics)
+ .joins(:merge_requests_closing_issues)
+ .where("id IN (SELECT ...)")
+ .where(confidential: true)
+end
+```
+
+Here, we expect that the `relation` query reads `BATCH_SIZE` issue records and then
+filters down the results according to the provided queries. The planner might decide that
+using a bitmap index lookup with the index on the `confidential` column is a better way to
+execute the query. This can cause an unexpectedly high amount of rows to be read and the
+query could time out.
+
+Problem: we know for sure that the relation returns at most `BATCH_SIZE` records;
+however, the planner does not know this.
+
+A common table expression (CTE) trick can force the range query to execute first:
+
+```ruby
+Issue.each_batch(of: 1000) do |relation|
+ cte = Gitlab::SQL::CTE.new(:batched_relation, relation.limit(1000))
+
+ scope = cte
+ .apply_to(Issue.all)
+ .joins(:metrics)
+ .joins(:merge_requests_closing_issues)
+ .where("id IN (SELECT ...)")
+ .where(confidential: true)
+
+ puts scope.to_a
+end
+```
+
+### `EachBatch` vs `BatchCount`
+
+When adding new counters for Service Ping, the preferred way to count records is using the
+`Gitlab::Database::BatchCount` class. The iteration logic implemented in `BatchCount`
+has performance characteristics similar to `EachBatch`. Most of the tips and suggestions
+for improving `EachBatch` mentioned above apply to `BatchCount` as well.
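+
+A minimal usage sketch, assuming the `batch_count` and `batch_distinct_count`
+entry points:
+
+```ruby
+# Count all users in batches
+count = Gitlab::Database::BatchCount.batch_count(User)
+
+# Count distinct project creators in batches
+distinct_count = Gitlab::Database::BatchCount.batch_distinct_count(Project, :creator_id)
+```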
+
+## Iterate with keyset pagination
+
+There are a few special cases where iterating with `EachBatch` does not work. `EachBatch`
+requires one distinct column (usually the primary key), which makes the iteration impossible
+for timestamp columns and tables with composite primary keys.
+
+Where `EachBatch` does not work, you can use
+[keyset pagination](pagination_guidelines.md#keyset-pagination) to iterate over the
+table or a range of rows. The scaling and performance characteristics are very similar to
+`EachBatch`.
+
+Examples:
+
+- Iterate over the table in a specific order (timestamp columns) in combination with a tie-breaker
+if the column used to sort by does not contain unique values.
+- Iterate over the table with composite primary keys.
+
+### Iterate over the issues in a project by creation date
+
+You can use keyset pagination to iterate over any database column in a specific order (for example,
+`created_at DESC`). To ensure consistent order of the returned records with the same values for
+`created_at`, use a tie-breaker column with unique values (for example, `id`).
+
+Assume you have the following index in the `issues` table:
+
+```sql
+"idx_issues_on_project_id_and_created_at_and_id" btree (project_id, created_at, id)
+```
+
+### Fetching records for further processing
+
+The following snippet iterates over issue records within the project using the specified order
+(`created_at, id`).
+
+```ruby
+scope = Issue.where(project_id: 278964).order(:created_at, :id) # id is the tie-breaker
+
+iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
+
+iterator.each_batch(of: 100) do |records|
+ puts records.map(&:id)
+end
+```
+
+You can add extra filters to the query. This example only lists the issue IDs created in the last
+30 days:
+
+```ruby
+scope = Issue.where(project_id: 278964).where('created_at > ?', 30.days.ago).order(:created_at, :id) # id is the tie-breaker
+
+iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
+
+iterator.each_batch(of: 100) do |records|
+ puts records.map(&:id)
+end
+```
+
+### Updating records in the batch
+
+For complex `ActiveRecord` queries, the `.update_all` method does not work well, because it
+generates an incorrect `UPDATE` statement.
+You can use raw SQL for updating records in batches:
+
+```ruby
+scope = Issue.where(project_id: 278964).order(:created_at, :id) # id is the tie-breaker
+
+iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
+
+iterator.each_batch(of: 100) do |records|
+ ApplicationRecord.connection.execute("UPDATE issues SET updated_at=NOW() WHERE issues.id in (#{records.dup.reselect(:id).to_sql})")
+end
+```
+
+NOTE:
+To keep the iteration stable and predictable, avoid updating the columns in the `ORDER BY` clause.
+
+### Iterate over the `merge_request_diff_commits` table
+
+The `merge_request_diff_commits` table uses a composite primary key (`merge_request_diff_id,
+relative_order`), which makes `EachBatch` impossible to use efficiently.
+
+To paginate over the `merge_request_diff_commits` table, you can use the following snippet:
+
+```ruby
+# Custom order object configuration:
+order = Gitlab::Pagination::Keyset::Order.build([
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'merge_request_diff_id',
+ order_expression: MergeRequestDiffCommit.arel_table[:merge_request_diff_id].asc,
+ nullable: :not_nullable,
+ distinct: false,
+ ),
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'relative_order',
+ order_expression: MergeRequestDiffCommit.arel_table[:relative_order].asc,
+ nullable: :not_nullable,
+ distinct: false,
+ )
+])
+MergeRequestDiffCommit.include(FromUnion) # keyset pagination generates UNION queries
+
+scope = MergeRequestDiffCommit.order(order)
+
+iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
+
+iterator.each_batch(of: 100) do |records|
+ puts records.map { |record| [record.merge_request_diff_id, record.relative_order] }.inspect
+end
+```
+
+### Order object configuration
+
+Keyset pagination works well with simple `ActiveRecord` `order` scopes
+([first example](#iterate-over-the-issues-in-a-project-by-creation-date)).
+However, in special cases, you need to describe the columns in the `ORDER BY` clause (second example)
+for the underlying keyset pagination library. When the `ORDER BY` configuration cannot be
+automatically determined by the keyset pagination library, an error is raised.
+
+The code comments of the
+[`Gitlab::Pagination::Keyset::Order`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/pagination/keyset/order.rb)
+and [`Gitlab::Pagination::Keyset::ColumnOrderDefinition`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/pagination/keyset/column_order_definition.rb)
+classes give an overview of the possible options for configuring the `ORDER BY` clause. You can
+also find a few code examples in the
+[keyset pagination](keyset_pagination.md#complex-order-configuration) documentation.
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index 6aa1b9c40ff..8dbccf048d7 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -10,7 +10,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
In relational databases (including PostgreSQL), foreign keys provide a way to link
two database tables together, and ensure data-consistency between them. In GitLab,
-[foreign keys](../foreign_keys.md) are vital part of the database design process.
+[foreign keys](foreign_keys.md) are a vital part of the database design process.
Most of our database tables have foreign keys.
With the ongoing database [decomposition work](https://gitlab.com/groups/gitlab-org/-/epics/6168),
@@ -221,8 +221,8 @@ ON DELETE CASCADE;
```
The migration must run after the `DELETE` trigger is installed and the loose
-foreign key definition is deployed. As such, it must be a [post-deployment
-migration](post_deployment_migrations.md) dated after the migration for the
+foreign key definition is deployed. As such, it must be a
+[post-deployment migration](post_deployment_migrations.md) dated after the migration for the
trigger. If the foreign key is deleted earlier, there is a good chance of
introducing data inconsistency which needs manual cleanup:
@@ -251,8 +251,8 @@ When the loose foreign key definition is no longer needed (parent table is remov
we need to remove the definition from the YAML file and ensure that we don't leave pending deleted
records in the database.
-1. Remove the loose foreign key definition from the configuration (`config/gitlab_loose_foreign_keys.yml`).
1. Remove the deletion tracking trigger from the parent table (if the parent table is still there).
+1. Remove the loose foreign key definition from the configuration (`config/gitlab_loose_foreign_keys.yml`).
1. Remove leftover deleted records from the `loose_foreign_keys_deleted_records` table.
Migration for removing the trigger:
@@ -395,8 +395,7 @@ We considered using these Rails features as an alternative to foreign keys but t
For non-trivial objects that need to clean up data outside the
database (for example, object storage) where you might wish to use `dependent: :destroy`,
see alternatives in
-[Avoid `dependent: :nullify` and `dependent: :destroy` across
-databases](./multiple_databases.md#avoid-dependent-nullify-and-dependent-destroy-across-databases).
+[Avoid `dependent: :nullify` and `dependent: :destroy` across databases](multiple_databases.md#avoid-dependent-nullify-and-dependent-destroy-across-databases).
## Risks of loose foreign keys and possible mitigations
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index 9641ea37002..31fc454f8a7 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -1,15 +1,14 @@
---
stage: Data Stores
-group: Sharding
+group: Pods
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Multiple Databases
To allow GitLab to scale further we
-[decomposed the GitLab application database into multiple
-databases](https://gitlab.com/groups/gitlab-org/-/epics/6168). The two databases
-are `main` and `ci`. GitLab supports being run with either one database or two databases.
+[decomposed the GitLab application database into multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
+The two databases are `main` and `ci`. GitLab supports being run with either one database or two databases.
On GitLab.com we are using two separate databases.
## GitLab Schema
@@ -246,7 +245,7 @@ where projects_with_ci_feature_usage.ci_feature = 'code_coverage'
```
The above example uses as a text column for simplicity but we should probably
-use an [enum](../creating_enums.md) to save space.
+use an [enum](creating_enums.md) to save space.
The downside of this new design is that this may need to be
updated (removed if the `ci_daily_build_group_report_results` is deleted).
diff --git a/doc/development/database/namespaces_storage_statistics.md b/doc/development/database/namespaces_storage_statistics.md
new file mode 100644
index 00000000000..702129b9ddb
--- /dev/null
+++ b/doc/development/database/namespaces_storage_statistics.md
@@ -0,0 +1,193 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Database case study: Namespaces storage statistics
+
+## Introduction
+
+On [Storage and limits management for groups](https://gitlab.com/groups/gitlab-org/-/epics/886),
+we want to facilitate a method for easily viewing the amount of
+storage consumed by a group, and allow easy management.
+
+## Proposal
+
+1. Create a new ActiveRecord model to hold the namespaces' statistics in an aggregated form (only for root [namespaces](../../user/namespace/index.md)).
+1. Refresh the statistics in this model every time a project belonging to this namespace is changed.
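+
+A minimal sketch of such a model, assuming a `namespace_storage_statistics`
+table like the one used in the attempts below:
+
+```ruby
+# One aggregated row per root namespace.
+class NamespaceStorageStatistics < ActiveRecord::Base
+  belongs_to :namespace
+
+  # Columns: storage_size, repository_size, wiki_size, lfs_objects_size,
+  # build_artifacts_size, packages_size, and so on.
+end
+```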
+
+## Problem
+
+In GitLab, we update the project storage statistics through a
+[callback](https://gitlab.com/gitlab-org/gitlab/-/blob/4ab54c2233e91f60a80e5b6fa2181e6899fdcc3e/app/models/project.rb#L97)
+every time the project is saved.
+
+The summary of those statistics per namespace is then retrieved
+by the [`Namespaces#with_statistics`](https://gitlab.com/gitlab-org/gitlab/-/blob/4ab54c2233e91f60a80e5b6fa2181e6899fdcc3e/app/models/namespace.rb#L70) scope. Analyzing this query, we noticed that:
+
+- It takes up to `1.2` seconds for namespaces with over `15k` projects.
+- It can't be analyzed with [ChatOps](../chatops_on_gitlabcom.md), as it times out.
+
+Additionally, the pattern that is currently used to update the project statistics
+(the callback) doesn't scale adequately. It is currently one of the largest
+[database transactions on production](https://gitlab.com/gitlab-org/gitlab/-/issues/29070),
+taking the most time overall. We can't add one more query to it, as doing so
+increases the transaction's length.
+
+Because of all of the above, we can't apply the same pattern to store
+and update the namespaces statistics, as the `namespaces` table is one
+of the largest tables on GitLab.com. Therefore, we needed to find a
+performant alternative method.
+
+## Attempts
+
+### Attempt A: PostgreSQL materialized view
+
+The model could be updated through a refresh strategy based on a project routes SQL query and a [materialized view](https://www.postgresql.org/docs/11/rules-materializedviews.html):
+
+```sql
+SELECT split_part("rs".path, '/', 1) as root_path,
+ COALESCE(SUM(ps.storage_size), 0) AS storage_size,
+ COALESCE(SUM(ps.repository_size), 0) AS repository_size,
+ COALESCE(SUM(ps.wiki_size), 0) AS wiki_size,
+ COALESCE(SUM(ps.lfs_objects_size), 0) AS lfs_objects_size,
+ COALESCE(SUM(ps.build_artifacts_size), 0) AS build_artifacts_size,
+ COALESCE(SUM(ps.pipeline_artifacts_size), 0) AS pipeline_artifacts_size,
+ COALESCE(SUM(ps.packages_size), 0) AS packages_size,
+ COALESCE(SUM(ps.snippets_size), 0) AS snippets_size,
+ COALESCE(SUM(ps.uploads_size), 0) AS uploads_size
+FROM "projects"
+ INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project'
+ INNER JOIN project_statistics ps ON ps.project_id = projects.id
+GROUP BY root_path
+```
+
+We could then refresh the materialized view with:
+
+```sql
+REFRESH MATERIALIZED VIEW root_namespace_storage_statistics;
+```
+
+While this implies a single-query update (and probably a fast one), it has some downsides:
+
+- The materialized view syntax differs between PostgreSQL and MySQL. While this feature was being worked on, MySQL was still supported by GitLab.
+- Rails does not have native support for materialized views. We'd need to use a specialized gem to take care of the management of the database views, which implies additional work.
+
+### Attempt B: An update through a CTE
+
+Similar to Attempt A: the model is updated through a refresh strategy, this time with a [Common Table Expression](https://www.postgresql.org/docs/9.1/queries-with.html):
+
+```sql
+WITH refresh AS (
+ SELECT split_part("rs".path, '/', 1) as root_path,
+ COALESCE(SUM(ps.storage_size), 0) AS storage_size,
+ COALESCE(SUM(ps.repository_size), 0) AS repository_size,
+ COALESCE(SUM(ps.wiki_size), 0) AS wiki_size,
+ COALESCE(SUM(ps.lfs_objects_size), 0) AS lfs_objects_size,
+ COALESCE(SUM(ps.build_artifacts_size), 0) AS build_artifacts_size,
+ COALESCE(SUM(ps.pipeline_artifacts_size), 0) AS pipeline_artifacts_size,
+ COALESCE(SUM(ps.packages_size), 0) AS packages_size,
+ COALESCE(SUM(ps.snippets_size), 0) AS snippets_size,
+ COALESCE(SUM(ps.uploads_size), 0) AS uploads_size
+ FROM "projects"
+ INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project'
+ INNER JOIN project_statistics ps ON ps.project_id = projects.id
+ GROUP BY root_path)
+UPDATE namespace_storage_statistics
+SET storage_size = refresh.storage_size,
+ repository_size = refresh.repository_size,
+ wiki_size = refresh.wiki_size,
+ lfs_objects_size = refresh.lfs_objects_size,
+ build_artifacts_size = refresh.build_artifacts_size,
+ pipeline_artifacts_size = refresh.pipeline_artifacts_size,
+ packages_size = refresh.packages_size,
+ snippets_size = refresh.snippets_size,
+ uploads_size = refresh.uploads_size
+FROM refresh
+ INNER JOIN routes rs ON rs.path = refresh.root_path AND rs.source_type = 'Namespace'
+WHERE namespace_storage_statistics.namespace_id = rs.source_id
+```
+
+This has the same benefits and downsides as Attempt A.
+
+### Attempt C: Get rid of the model and store the statistics on Redis
+
+We could get rid of the model that stores the statistics in aggregated form and instead use a Redis Set.
+This would be the [boring solution](https://about.gitlab.com/handbook/values/#boring-solutions) and the fastest one
+to implement, as GitLab already includes Redis as part of its [Architecture](../architecture.md#redis).
+
+The downside of this approach is that Redis does not provide the same persistence and consistency guarantees as PostgreSQL,
+and this is information we can't afford to lose in the event of a Redis failure.
+
+### Attempt D: Tag the root namespace and its child namespaces
+
+Directly relate the root namespace to its child namespaces, so that
+whenever a namespace is created, it is tagged with the ID of its root
+namespace:
+
+| ID | root ID | parent ID |
+|:---|:--------|:----------|
+| 1 | 1 | NULL |
+| 2 | 1 | 1 |
+| 3 | 1 | 2 |
+
+To aggregate the statistics inside a namespace we'd execute something like:
+
+```sql
+SELECT COUNT(...)
+FROM projects
+WHERE namespace_id IN (
+ SELECT id
+ FROM namespaces
+ WHERE root_id = X
+)
+```
+
+Even though this approach would make aggregating much easier, it has some major downsides:
+
+- We'd have to migrate **all namespaces** by adding and filling a new column. Because of the size of the table, the time and cost of doing so would be significant. The background migration would take approximately `153h`; see <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/29772>.
+- The background migration would have to ship one release earlier, delaying the functionality by another milestone.
+
+### Attempt E (final): Update the namespace storage statistics asynchronously
+
+This approach consists of continuing to use the incremental statistics updates we already have,
+but we refresh them through Sidekiq jobs and in different transactions:
+
+1. Create a second table (`namespace_aggregation_schedules`) with two columns `id` and `namespace_id`.
+1. Whenever the statistics of a project change, insert a row into `namespace_aggregation_schedules`.
+   - We don't insert a new row if there's already one related to the root namespace.
+   - Keeping in mind the length of the transaction that updates `project_statistics` (<https://gitlab.com/gitlab-org/gitlab/-/issues/29070>), the insertion should be done in a different transaction and through a Sidekiq job.
+1. After inserting the row, we schedule another worker to be executed asynchronously at two different moments:
+   - One is enqueued for immediate execution and another one is scheduled to run in `1.5` hours.
+   - We only schedule the jobs if we can obtain a `1.5h` lease on Redis, on a key based on the root namespace ID (see the sketch after this list).
+   - If we can't obtain the lease, it indicates that another aggregation is already in progress, or scheduled to run within `1.5h`.
+1. This worker will:
+   - Update the root namespace storage statistics by querying all the namespaces through a service.
+   - Delete the related `namespace_aggregation_schedules` after the update.
+1. Another Sidekiq job is also included to traverse any remaining rows on the `namespace_aggregation_schedules` table and schedule jobs for every pending row.
+   - This job is scheduled with cron to run every night (UTC).
+
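+A minimal sketch of the lease check described above, assuming hypothetical
+worker names and lease key (the actual implementation differs in its details):
+
+```ruby
+class Namespaces::ScheduleAggregationWorker
+  include ApplicationWorker
+
+  LEASE_TIMEOUT = 1.5.hours
+
+  def perform(root_namespace_id)
+    lease_key = "namespace_aggregation:#{root_namespace_id}"
+
+    # If the lease is taken, an aggregation is already in progress or
+    # scheduled to run within the next 1.5 hours, so we do nothing.
+    return unless Gitlab::ExclusiveLease.new(lease_key, timeout: LEASE_TIMEOUT.to_i).try_obtain
+
+    Namespaces::AggregationWorker.perform_async(root_namespace_id)
+    Namespaces::AggregationWorker.perform_in(LEASE_TIMEOUT, root_namespace_id)
+  end
+end
+```
+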
+This implementation has the following benefits:
+
+- All the updates are done asynchronously, so we're not increasing the length of the transactions for `project_statistics`.
+- We're doing the update in a single SQL query.
+- It is compatible with PostgreSQL and MySQL.
+- No background migration required.
+
+The only downside of this approach is that namespaces' statistics are updated up to `1.5` hours after the change is done,
+which means there's a time window in which the statistics are inaccurate. Because we're still not
+[enforcing storage limits](https://gitlab.com/gitlab-org/gitlab/-/issues/17664), this is not a major problem.
+
+## Conclusion
+
+Updating the storage statistics asynchronously was the least problematic and
+most performant approach to aggregating the root namespaces' statistics.
+
+All the details regarding this use case can be found on:
+
+- <https://gitlab.com/gitlab-org/gitlab-foss/-/issues/62214>
+- Merge Request with the implementation: <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/28996>
+
+The performance of the namespace storage statistics was measured in staging and production (GitLab.com). All results were posted
+on <https://gitlab.com/gitlab-org/gitlab-foss/-/issues/64092>; no problems have been reported so far.
diff --git a/doc/development/database/not_null_constraints.md b/doc/development/database/not_null_constraints.md
index 3962307f80d..9b3d017b09f 100644
--- a/doc/development/database/not_null_constraints.md
+++ b/doc/development/database/not_null_constraints.md
@@ -135,7 +135,7 @@ post-deployment migration or a background data migration:
- If the data volume is less than `1000` records, then the data migration can be executed within the post-migration.
- If the data volume is higher than `1000` records, it's advised to create a background migration.
-When unsure about which option to use, please contact the Database team for advice.
+When unsure about which option to use, contact the Database team for advice.
Back to our example, the epics table is not considerably large nor frequently accessed,
so we add a post-deployment migration for the 13.0 milestone (current),
@@ -206,6 +206,6 @@ In that rare case you need 3 releases end-to-end:
1. Release `N.M+1` - Cleanup the background migration.
1. Release `N.M+2` - Validate the `NOT NULL` constraint.
-For these cases, please consult the database team early in the update cycle. The `NOT NULL`
+For these cases, consult the database team early in the update cycle. The `NOT NULL`
constraint may not be required or other options could exist that do not affect really large
or frequently accessed tables.
diff --git a/doc/development/database/ordering_table_columns.md b/doc/development/database/ordering_table_columns.md
new file mode 100644
index 00000000000..7cd3d4fb208
--- /dev/null
+++ b/doc/development/database/ordering_table_columns.md
@@ -0,0 +1,152 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Ordering Table Columns in PostgreSQL
+
+For GitLab we require that columns of new tables are ordered to use the
+least amount of space. An easy way of doing this is to order them based on the
+type size in descending order with variable sizes (`text`, `varchar`, arrays,
+`json`, `jsonb`, and so on) at the end.
+
+Similar to C structures, the space used by a table is influenced by the order
+of its columns. This is because the size of columns is aligned depending on the
+type of the following column. Let's consider an example:
+
+- `id` (integer, 4 bytes)
+- `name` (text, variable)
+- `user_id` (integer, 4 bytes)
+
+The first column is a 4-byte integer. The next is text of variable length. The
+`text` data type requires 1-word alignment, and on a 64-bit platform, 1 word is 8
+bytes. To meet the alignment requirement, four zero bytes are added right
+after the first column: `id` occupies 4 bytes, followed by 4 bytes of alignment
+padding, and only then is `name` stored. Therefore, in this case, 8 bytes
+are spent storing a 4-byte integer.
+
+The space between rows is also subject to alignment padding. The `user_id`
+column takes only 4 bytes, and on a 64-bit platform, 4 bytes of alignment
+padding are added after it, so that the next row begins on a word boundary.
+
+As a result, the actual size of each column would be (omitting variable length
+data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that
+each row requires at least 16 bytes for the two 4-byte integers. If a table
+has a few rows this is not an issue. However, once you start storing millions of
+rows you can save space by using a different order. For the above example, the
+ideal column order would be the following:
+
+- `id` (integer, 4 bytes)
+- `user_id` (integer, 4 bytes)
+- `name` (text, variable)
+
+or
+
+- `name` (text, variable)
+- `id` (integer, 4 bytes)
+- `user_id` (integer, 4 bytes)
+
+In these examples, the `id` and `user_id` columns are packed together, which
+means we only need 8 bytes to store _both_ of them. This in turn means each row
+requires 8 bytes less space.
+
+Since Ruby on Rails 5.1, the default data type for IDs is `bigint`, which uses 8 bytes.
+We are using `integer` in the examples to showcase a more realistic reordering scenario.
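+
+As a hypothetical example, a migration for a new table could order fixed-size
+columns first and variable-size columns last (the table and column names here
+are illustrative):
+
+```ruby
+class CreateAuditEvents < ActiveRecord::Migration[6.1]
+  def change
+    create_table :audit_events do |t|
+      t.timestamps null: false         # 8 bytes each
+      t.bigint :author_id, null: false # 8 bytes
+      t.integer :action                # 4-byte integers packed together
+      t.integer :target_id
+      t.text :details                  # variable size, stored last
+    end
+  end
+end
+```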
+
+## Type Sizes
+
+While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty
+of information, we list the sizes of common types here so it's easier to
+look them up. Here "word" refers to the word size, which is 4 bytes on a 32-bit
+platform and 8 bytes on a 64-bit platform.
+
+| Type | Size | Alignment needed |
+|:-----------------|:-------------------------------------|:-----------|
+| `smallint` | 2 bytes | 1 word |
+| `integer` | 4 bytes | 1 word |
+| `bigint` | 8 bytes | 8 bytes |
+| `real` | 4 bytes | 1 word |
+| `double precision` | 8 bytes | 8 bytes |
+| `boolean` | 1 byte | not needed |
+| `text` / `string` | variable, 1 byte plus the data | 1 word |
+| `bytea` | variable, 1 or 4 bytes plus the data | 1 word |
+| `timestamp` | 8 bytes | 8 bytes |
+| `timestamptz` | 8 bytes | 8 bytes |
+| `date` | 4 bytes | 1 word |
+
+A "variable" size means the actual size depends on the value being stored. If
+PostgreSQL determines this can be embedded directly into a row it may do so, but
+for very large values it stores the data externally and store a pointer (of
+1 word in size) in the column. Because of this variable sized columns should
+always be at the end of a table.
+
+## Real Example
+
+Let's use the `events` table as an example, which currently has the following
+layout:
+
+| Column | Type | Size |
+|:--------------|:----------------------------|:---------|
+| `id` | integer | 4 bytes |
+| `target_type` | character varying | variable |
+| `target_id` | integer | 4 bytes |
+| `title` | character varying | variable |
+| `data` | text | variable |
+| `project_id` | integer | 4 bytes |
+| `created_at` | timestamp without time zone | 8 bytes |
+| `updated_at` | timestamp without time zone | 8 bytes |
+| `action` | integer | 4 bytes |
+| `author_id` | integer | 4 bytes |
+
+After adding padding to align the columns this would translate to columns being
+divided into fixed size chunks as follows:
+
+| Chunk Size | Columns |
+|:-----------|:----------------------|
+| 8 bytes | `id` |
+| variable | `target_type` |
+| 8 bytes | `target_id` |
+| variable | `title` |
+| variable | `data` |
+| 8 bytes | `project_id` |
+| 8 bytes | `created_at` |
+| 8 bytes | `updated_at` |
+| 8 bytes | `action`, `author_id` |
+
+This means that excluding the variable sized data and tuple header, we need at
+least 8 * 6 = 48 bytes per row.
+
+We can optimize this by using the following column order instead:
+
+| Column | Type | Size |
+|:--------------|:----------------------------|:---------|
+| `created_at` | timestamp without time zone | 8 bytes |
+| `updated_at` | timestamp without time zone | 8 bytes |
+| `id` | integer | 4 bytes |
+| `target_id` | integer | 4 bytes |
+| `project_id` | integer | 4 bytes |
+| `action` | integer | 4 bytes |
+| `author_id` | integer | 4 bytes |
+| `target_type` | character varying | variable |
+| `title` | character varying | variable |
+| `data` | text | variable |
+
+This would produce the following chunks:
+
+| Chunk Size | Columns |
+|:-----------|:-----------------------|
+| 8 bytes | `created_at` |
+| 8 bytes | `updated_at` |
+| 8 bytes | `id`, `target_id` |
+| 8 bytes | `project_id`, `action` |
+| 8 bytes | `author_id` |
+| variable | `target_type` |
+| variable | `title` |
+| variable | `data` |
+
+Here we only need 40 bytes per row, excluding the variable sized data and the
+24-byte tuple header. Saving 8 bytes per row may not sound like much, but for
+tables as large as the `events` table it does begin to matter. For example,
+when storing 80,000,000 rows this translates to a space saving of at least
+610 MB (80,000,000 rows × 8 bytes ≈ 610 MiB), all by just changing the order of a few columns.
diff --git a/doc/development/database/pagination_guidelines.md b/doc/development/database/pagination_guidelines.md
index 1641708ce01..fe2e3b46939 100644
--- a/doc/development/database/pagination_guidelines.md
+++ b/doc/development/database/pagination_guidelines.md
@@ -192,7 +192,7 @@ The query execution plan shows that this query is efficient, the database only r
(6 rows)
```
-See the [Understanding EXPLAIN plans](../understanding_explain_plans.md) to find more information about reading execution plans.
+See the [Understanding EXPLAIN plans](understanding_explain_plans.md) to find more information about reading execution plans.
Let's visit the 50_000th page:
diff --git a/doc/development/database/pagination_performance_guidelines.md b/doc/development/database/pagination_performance_guidelines.md
index b5040e499e4..0fef246f133 100644
--- a/doc/development/database/pagination_performance_guidelines.md
+++ b/doc/development/database/pagination_performance_guidelines.md
@@ -12,11 +12,11 @@ The following document gives a few ideas for improving the pagination (sorting)
When ordering the columns it's advised to order by distinct columns only. Consider the following example:
-|`id`|`created_at`|
-|-|-|
-|1|2021-01-04 14:13:43|
-|2|2021-01-05 19:03:12|
-|3|2021-01-05 19:03:12|
+| `id` | `created_at` |
+|------|-----------------------|
+| `1` | `2021-01-04 14:13:43` |
+| `2` | `2021-01-05 19:03:12` |
+| `3` | `2021-01-05 19:03:12` |
If we order by `created_at`, the result would likely depend on how the records are located on the disk.
diff --git a/doc/development/database/polymorphic_associations.md b/doc/development/database/polymorphic_associations.md
new file mode 100644
index 00000000000..ac4dc7720a5
--- /dev/null
+++ b/doc/development/database/polymorphic_associations.md
@@ -0,0 +1,152 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Polymorphic Associations
+
+**Summary:** always use separate tables instead of polymorphic associations.
+
+Rails makes it possible to define so-called "polymorphic associations". This
+usually works by adding two columns to a table: a target type column, and a
+target ID. For example, at the time of writing we have such a setup for
+`members` with the following columns:
+
+- `source_type`: a string defining the model to use, can be either `Project` or
+ `Namespace`.
+- `source_id`: the ID of the row to retrieve based on `source_type`. For
+ example, when `source_type` is `Project` then `source_id` contains a
+ project ID.
+
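+In Rails, such an association is typically declared as follows (a minimal
+sketch of the pattern, not GitLab's actual model):
+
+```ruby
+class Member < ActiveRecord::Base
+  # Rails resolves this association through the source_type and
+  # source_id columns.
+  belongs_to :source, polymorphic: true
+end
+```
+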
+While such a setup may appear to be useful, it comes with many drawbacks; enough
+that you should avoid this at all costs.
+
+## Space Wasted
+
+Because this setup relies on string values to determine the model to use, it
+wastes a lot of space. For example, for `Project` and `Namespace` the
+maximum size is 9 bytes, plus 1 extra byte for every string when using
+PostgreSQL. While this may only be 10 bytes per row, given enough tables and
+rows using such a setup we can end up wasting quite a bit of disk space and
+memory (for any indexes).
+
+## Indexes
+
+Because our associations are broken up into two columns, this may result in
+requiring composite indexes for queries to be performed efficiently. While
+composite indexes are not wrong at all, they can be tricky to set up as the
+ordering of columns in these indexes is important to ensure optimal performance.
+
+## Consistency
+
+A major problem with polymorphic associations is that data consistency cannot
+be enforced at the database level using foreign keys. To enforce consistency at
+the database level, one would have to write their own foreign key logic to
+support polymorphic associations.
+
+Enforcing consistency at the database level is absolutely crucial for
+maintaining a healthy environment, and thus is another reason to avoid
+polymorphic associations.
+
+## Query Overhead
+
+When using polymorphic associations you always need to filter using both
+columns. For example, you may end up writing a query like this:
+
+```sql
+SELECT *
+FROM members
+WHERE source_type = 'Project'
+AND source_id = 13083;
+```
+
+Here PostgreSQL can perform the query quite efficiently if both columns are
+indexed. As the query gets more complex, it may not be able to use these
+indexes effectively.
+
+## Mixed Responsibilities
+
+Similar to functions and classes, a table should have a single responsibility:
+storing data with a certain set of pre-defined columns. When using polymorphic
+associations, you are storing different types of data (possibly with
+different columns set) in the same table.
+
+## The Solution
+
+Fortunately, there is a solution to these problems: use a
+separate table for every type you would otherwise store in the same table. Using
+a separate table allows you to use everything a database may provide to ensure
+consistency and query data efficiently, without any additional application logic
+being necessary.
+
+Let's say you have a `members` table storing both approved and pending members,
+for both projects and groups, and the pending state is determined by the column
+`requested_at` being set or not. Schema-wise, such a setup can lead to various
+columns only being set for certain rows, wasting space. It's also possible that
+certain indexes only cover certain rows, again wasting space. Finally,
+querying such a table requires less than ideal queries. For example:
+
+```sql
+SELECT *
+FROM members
+WHERE requested_at IS NULL
+AND source_type = 'GroupMember'
+AND source_id = 4
+```
+
+Instead, such a table should be broken up into separate tables. For example, you
+may end up with 4 tables in this case:
+
+- `project_members`
+- `group_members`
+- `pending_project_members`
+- `pending_group_members`
+
+This makes querying data trivial. For example, to get the members of a group
+you'd run:
+
+```sql
+SELECT *
+FROM group_members
+WHERE group_id = 4
+```
+
+To get all the pending members of a group in turn you'd run:
+
+```sql
+SELECT *
+FROM pending_group_members
+WHERE group_id = 4
+```
+
+If you want to get both you can use a `UNION`, though you need to be explicit
+about which columns you want to `SELECT`, as otherwise the result set uses the
+columns of the first query. For example:
+
+```sql
+SELECT id, 'Group' AS target_type, group_id AS target_id
+FROM group_members
+
+UNION ALL
+
+SELECT id, 'Project' AS target_type, project_id AS target_id
+FROM project_members
+```
+
+The above example is perhaps a bit silly, but it shows that there's nothing
+stopping you from merging the data together and presenting it on the same page.
+Selecting columns explicitly can also speed up queries as the database has to do
+less work to get the data (compared to selecting all columns, even ones you're
+not using).
+
+Our schema also becomes simpler. We no longer need to both store and index the
+`source_type` column, we can define foreign keys easily, and we don't need to
+filter rows using the `IS NULL` condition.
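+
+As a sketch, the separate tables map to plain models with real foreign keys
+(the models shown here are illustrative):
+
+```ruby
+class GroupMember < ActiveRecord::Base
+  belongs_to :group   # group_id can have a real foreign key to groups(id)
+end
+
+class ProjectMember < ActiveRecord::Base
+  belongs_to :project # project_id can have a real foreign key to projects(id)
+end
+```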
+
+To summarize: using separate tables allows us to use foreign keys effectively,
+create indexes only where necessary, conserve space, query data more
+efficiently, and scale these tables more easily (for example, by storing them on
+separate disks). A nice side effect of this is that code can also become
+simpler, as a single model isn't responsible for handling different kinds of
+data.
diff --git a/doc/development/database/post_deployment_migrations.md b/doc/development/database/post_deployment_migrations.md
index a49c77ca047..8166fcc8905 100644
--- a/doc/development/database/post_deployment_migrations.md
+++ b/doc/development/database/post_deployment_migrations.md
@@ -25,6 +25,10 @@ This however skips post deployment migrations:
SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate
```
+For GitLab.com, these migrations are executed on a daily basis at the discretion of
+release managers through the
+[post-deploy migration pipeline](https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/post_deploy_migration/readme.md).
+
## Deployment Integration
Say you're using Chef for deploying new versions of GitLab and you'd like to run
diff --git a/doc/development/database/query_count_limits.md b/doc/development/database/query_count_limits.md
new file mode 100644
index 00000000000..a888bbfc6e7
--- /dev/null
+++ b/doc/development/database/query_count_limits.md
@@ -0,0 +1,70 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Query Count Limits
+
+Each controller or API endpoint is allowed to execute up to 100 SQL queries, and
+in test environments we raise an error when this threshold is exceeded.
+
+## Solving Failing Tests
+
+When a test fails because it executes more than 100 SQL queries, there are two
+solutions to this problem:
+
+- Reduce the number of SQL queries that are executed.
+- Disable query limiting for the controller or API endpoint.
+
+You should only resort to disabling query limits when an existing controller or endpoint
+is to blame, as in this case reducing the number of SQL queries can take a lot of
+effort. Newly added controllers and endpoints are not allowed to execute more
+than 100 SQL queries, and no exceptions are made for this rule. _If_ a large
+number of SQL queries is necessary to perform certain work, it's best to have
+this work performed by Sidekiq instead of doing it directly in a web request.
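+
+A minimal sketch of offloading such work to Sidekiq (the worker name and its
+contents are hypothetical):
+
+```ruby
+class ExpensiveReportWorker
+  include ApplicationWorker
+
+  def perform(project_id)
+    # Run the many SQL queries here, outside the web request and its
+    # 100-query budget. For example:
+    Project.find(project_id).issues.find_each do |issue|
+      # ... per-issue work ...
+    end
+  end
+end
+```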
+
+## Disable query limiting
+
+In the event that you _have_ to disable query limits for a controller, you must first
+create an issue. This issue should (preferably in the title) mention the
+controller or endpoint and include the appropriate labels (`database`,
+`performance`, and at least a team-specific label such as `Discussion`).
+
+After the issue has been created, you can disable query limits on the code in question. For
+Rails controllers it's best to create a `before_action` hook that runs as early
+as possible. The called method in turn should call
+`Gitlab::QueryLimiting.disable!('issue URL here')`. For example:
+
+```ruby
+class MyController < ApplicationController
+ before_action :disable_query_limiting, only: [:show]
+
+ def index
+ # ...
+ end
+
+ def show
+ # ...
+ end
+
+ def disable_query_limiting
+ Gitlab::QueryLimiting.disable!('https://gitlab.com/gitlab-org/...')
+ end
+end
+```
+
+By using a `before_action` you don't have to modify the controller method in
+question, reducing the likelihood of merge conflicts.
+
+For Grape API endpoints, there is unfortunately no reliable way of running a
+hook before a specific endpoint. This means that you have to add the allowlist
+call directly into the endpoint, like so:
+
+```ruby
+get '/projects/:id/foo' do
+ Gitlab::QueryLimiting.disable!('...')
+
+ # ...
+end
+```
diff --git a/doc/development/database/query_performance.md b/doc/development/database/query_performance.md
new file mode 100644
index 00000000000..41dbd08d726
--- /dev/null
+++ b/doc/development/database/query_performance.md
@@ -0,0 +1,74 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Query performance guidelines
+
+This document describes various guidelines to follow when optimizing SQL queries.
+
+When you are optimizing your SQL queries, there are two dimensions to pay attention to:
+
+1. The query execution time. This is paramount, as it reflects how the user experiences GitLab.
+1. The query plan. Optimizing the query plan is important in allowing queries to independently scale over time. Verifying that an index keeps a query performing well as the table grows, before the query degrades, is an example of why we analyze these plans.
+
+## Timing guidelines for queries
+
+| Query Type | Maximum Query Time | Notes |
+|----|----|---|
+| General queries | `100ms` | This is not a hard limit, but if a query is getting above it, it is important to spend time understanding why it can or cannot be optimized. |
+| Queries in a migration | `100ms` | This is different from the total [migration time](../migration_style_guide.md#how-long-a-migration-should-take). |
+| Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. |
+| Background migrations | `1s` | |
+| Service Ping | `1s` | See the [Service Ping docs](../service_ping/implement.md) for more details. |
+
+- When analyzing your query's performance, pay attention to whether the time you are seeing is on a [cold or warm cache](#cold-and-warm-cache). These guidelines apply to both cache types.
+- When working with batched queries, change the range and batch size to see how it affects the query timing and caching.
+- If an existing query is not performing well, make an effort to improve it. If it is too complex or would stall development, create a follow-up issue so it can be addressed in a timely manner. You can always ask the database reviewer or maintainer for help and guidance.
+
+## Cold and warm cache
+
+When evaluating query performance it is important to understand the difference between
+cold and warm cached queries.
+
+The first time a query is made, it runs on a "cold cache", meaning it needs
+to read from disk. If you run the query again, the data can be read from the
+cache, or what PostgreSQL calls shared buffers. This is a "warm cache" query.
+
+When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see
+the difference not only in the timing, but also in the `Buffers` output,
+by running your explain with `EXPLAIN (ANALYZE, BUFFERS)`. [Database Lab](understanding_explain_plans.md#database-lab-engine)
+automatically includes these options.
+
+If you are making a warm cache query, you see only the `shared hits`.
+
+For example in #database-lab:
+
+```plaintext
+Shared buffers:
+ - hits: 36467 (~284.90 MiB) from the buffer pool
+ - reads: 0 from the OS file cache, including disk I/O
+```
+
+Or in the explain plan from `psql`:
+
+```sql
+Buffers: shared hit=7323
+```
+
+If the cache is cold, you also see `reads`.
+
+In #database-lab:
+
+```plaintext
+Shared buffers:
+ - hits: 17204 (~134.40 MiB) from the buffer pool
+ - reads: 15229 (~119.00 MiB) from the OS file cache, including disk I/O
+```
+
+In `psql`:
+
+```sql
+Buffers: shared hit=7202 read=121
+```
diff --git a/doc/development/database/query_recorder.md b/doc/development/database/query_recorder.md
new file mode 100644
index 00000000000..da5c6c8e6cb
--- /dev/null
+++ b/doc/development/database/query_recorder.md
@@ -0,0 +1,145 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# QueryRecorder
+
+QueryRecorder is a tool for detecting the [N+1 queries problem](https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations) from tests.
+
+> Implemented in [spec/support/query_recorder.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/query_recorder.rb) via [9c623e3e](https://gitlab.com/gitlab-org/gitlab-foss/commit/9c623e3e5d7434f2e30f7c389d13e5af4ede770a)
+
+As a rule, merge requests [should not increase query counts](../merge_request_performance_guidelines.md#query-counts). If you find yourself adding something like `.includes(:author, :assignee)` to avoid having `N+1` queries, consider using QueryRecorder to enforce this with a test. Without this, a new feature which causes an additional model to be accessed can silently reintroduce the problem.
+
+## How it works
+
+This style of test works by counting the number of SQL queries executed by ActiveRecord. First, a control count is taken, then you add new records to the database and rerun the count. If the number of queries has significantly increased, an `N+1` queries problem exists.
+
+```ruby
+it "avoids N+1 database queries" do
+ control = ActiveRecord::QueryRecorder.new { visit_some_page }
+ create_list(:issue, 5)
+ expect { visit_some_page }.not_to exceed_query_limit(control)
+end
+```
+
+You can, if you wish, have both the expectation and the control as
+`QueryRecorder` instances:
+
+```ruby
+it "avoids N+1 database queries" do
+ control = ActiveRecord::QueryRecorder.new { visit_some_page }
+ create_list(:issue, 5)
+ action = ActiveRecord::QueryRecorder.new { visit_some_page }
+
+ expect(action).not_to exceed_query_limit(control)
+end
+```
+
+As an example, you might create 5 issues in between counts, which would cause the query count to increase by 5 if an N+1 problem exists.
+
+In some cases the query count might change slightly between runs for unrelated reasons. In this case you might need to test `exceed_query_limit(control_count + acceptable_change)`, but this should be avoided if possible.
+
+If this test fails, and the control was passed as a `QueryRecorder`, then the
+failure message indicates where the extra queries are by matching queries on
+the longest common prefix, grouping similar queries together.
+
+## Cached queries
+
+By default, QueryRecorder ignores [cached queries](../merge_request_performance_guidelines.md#cached-queries) in the count. However, it may be better to count
+all queries to avoid introducing an N+1 query that may be masked by the statement cache.
+To do this, set the `:use_sql_query_cache` flag, pass `skip_cached: false` to
+`QueryRecorder`, and use the `exceed_all_query_limit` matcher:
+
+```ruby
+it "avoids N+1 database queries", :use_sql_query_cache do
+ control = ActiveRecord::QueryRecorder.new(skip_cached: false) { visit_some_page }
+ create_list(:issue, 5)
+ expect { visit_some_page }.not_to exceed_all_query_limit(control)
+end
+```
+
+## Use request specs instead of controller specs
+
+Use a [request spec](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/spec/requests) when writing an N+1 test at the controller level.
+
+Controller specs should not be used to write N+1 tests, as the controller is only initialized once per example.
+This could lead to false successes where subsequent "requests" have fewer queries (for example, because of memoization).
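+
+For example, a hypothetical request spec (the path, factory, and trait are
+illustrative):
+
+```ruby
+RSpec.describe 'GET /api/v4/projects', :use_sql_query_cache do
+  it 'avoids N+1 database queries' do
+    control = ActiveRecord::QueryRecorder.new(skip_cached: false) { get api('/projects') }
+
+    create_list(:project, 5, :public)
+
+    expect { get api('/projects') }.not_to exceed_all_query_limit(control)
+  end
+end
+```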
+
+## Finding the source of the query
+
+There are multiple ways to find the source of queries.
+
+- Inspect the `QueryRecorder` `data` attribute. It stores queries by `file_name:line_number:method_name`.
+ Each entry is a `hash` with the following fields:
+
+ - `count`: the number of times a query from this `file_name:line_number:method_name` was called
+ - `occurrences`: the actual `SQL` of each call
+ - `backtrace`: the stack trace of each call (if either of the two following options were enabled)
+
+ `QueryRecorder#find_query` allows filtering queries by their `file_name:line_number:method_name` and
+ `count` attributes. For example:
+
+ ```ruby
+ control = ActiveRecord::QueryRecorder.new(skip_cached: false) { visit_some_page }
+ control.find_query(/.*note.rb.*/, 0, first_only: true)
+ ```
+
+ `QueryRecorder#occurrences_by_line_method` returns a sorted array based on `data`, sorted by `count`.
+
+- View the call backtrace for the specific `QueryRecorder` instance you want
+ by using `ActiveRecord::QueryRecorder.new(query_recorder_debug: true)`. The output
+ is stored in file `test.log`.
+
+- Enable the call backtrace for all tests using the `QUERY_RECORDER_DEBUG` environment variable.
+
+ To enable this, run the specs with the `QUERY_RECORDER_DEBUG` environment variable set. For example:
+
+ ```shell
+ QUERY_RECORDER_DEBUG=1 bundle exec rspec spec/requests/api/projects_spec.rb
+ ```
+
+ This logs calls to QueryRecorder into the `test.log` file. For example:
+
+ ```sql
+ QueryRecorder SQL: SELECT COUNT(*) FROM "issues" WHERE "issues"."deleted_at" IS NULL AND "issues"."project_id" = $1 AND ("issues"."state" IN ('opened')) AND "issues"."confidential" = $2
+ --> /home/user/gitlab/gdk/gitlab/spec/support/query_recorder.rb:19:in `callback'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:127:in `finish'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:46:in `block in finish'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:46:in `each'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:46:in `finish'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/instrumenter.rb:36:in `finish'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/instrumenter.rb:25:in `instrument'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract_adapter.rb:478:in `log'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql_adapter.rb:601:in `exec_cache'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql_adapter.rb:585:in `execute_and_clear'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec_query'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/database_statements.rb:356:in `select'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/query_cache.rb:68:in `block in select_all'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/query_cache.rb:83:in `cache_sql'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/query_cache.rb:68:in `select_all'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:270:in `execute_simple_calculation'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:227:in `perform_calculation'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:133:in `calculate'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:48:in `count'
+ --> /home/user/gitlab/gdk/gitlab/app/services/base_count_service.rb:20:in `uncached_count'
+ --> /home/user/gitlab/gdk/gitlab/app/services/base_count_service.rb:12:in `block in count'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:299:in `block in fetch'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:585:in `block in save_block_result_to_cache'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:547:in `block in instrument'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications.rb:166:in `instrument'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:547:in `instrument'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:584:in `save_block_result_to_cache'
+ --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:299:in `fetch'
+ --> /home/user/gitlab/gdk/gitlab/app/services/base_count_service.rb:12:in `count'
+ --> /home/user/gitlab/gdk/gitlab/app/models/project.rb:1296:in `open_issues_count'
+ ```
+
+## See also
+
+- [Bullet](../profiling.md#bullet), for finding `N+1` query problems
+- [Performance guidelines](../performance.md)
+- [Merge request performance guidelines - Query counts](../merge_request_performance_guidelines.md#query-counts)
+- [Merge request performance guidelines - Cached queries](../merge_request_performance_guidelines.md#cached-queries)
diff --git a/doc/development/database/rename_database_tables.md b/doc/development/database/rename_database_tables.md
index cbcbd507204..d6827cb9e03 100644
--- a/doc/development/database/rename_database_tables.md
+++ b/doc/development/database/rename_database_tables.md
@@ -81,10 +81,10 @@ Execute a standard migration (not a post-migration):
when naming indexes, so there is a possibility that not all indexes are properly renamed. After running
the migration locally, check if there are inconsistently named indexes (`db/structure.sql`). Those can be
renamed manually in a separate migration, which can be also part of the release M.N+1.
-- Foreign key columns might still contain the old table name. For smaller tables, follow our [standard column
-rename process](avoiding_downtime_in_migrations.md#renaming-columns)
+- Foreign key columns might still contain the old table name. For smaller tables, follow our
+ [standard column rename process](avoiding_downtime_in_migrations.md#renaming-columns)
- Avoid renaming database tables which are used with triggers.
-- Table modifications (add or remove columns) are not allowed during the rename process, please make sure that all changes to the table happen before the rename migration is started (or in the next release).
+- Table modifications (add or remove columns) are not allowed during the rename process. Make sure that all changes to the table happen before the rename migration is started (or in the next release).
- As the index names might change, verify that the model does not use bulk insert
(for example, `insert_all` and `upsert_all`) with the `unique_by: index_name` option.
Renaming an index while using these methods may break functionality.
diff --git a/doc/development/database/serializing_data.md b/doc/development/database/serializing_data.md
new file mode 100644
index 00000000000..97e6f665484
--- /dev/null
+++ b/doc/development/database/serializing_data.md
@@ -0,0 +1,90 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Serializing Data
+
+**Summary:** don't store serialized data in the database, use separate columns
+and/or tables instead. This includes storing comma-separated values as a
+string.
+
+Rails makes it possible to store serialized data in JSON, YAML or other formats.
+Such a field can be defined as follows:
+
+```ruby
+class Issue < ActiveRecord::Base
+ serialize :custom_fields
+end
+```
+
+While it may be tempting to store serialized data in the database, there are many
+problems with this. This document outlines these problems and provides an
+alternative.
+
+## Serialized Data Is Less Powerful
+
+When using a relational database you have the ability to query individual
+fields, change the schema, index data, and so forth. When you use serialized data
+all of that becomes either very difficult or downright impossible. While
+PostgreSQL does offer the ability to query JSON fields, it is mostly meant for
+very specialized use cases, and not for more general use. If you use YAML, in
+turn, there's no way to query the data at all.
+
+## Waste Of Space
+
+Storing serialized data such as JSON or YAML ends up wasting a lot of space.
+This is because these formats often include additional characters (for example, double
+quotes or newlines) besides the data that you are storing.
+
+## Difficult To Manage
+
+There comes a time when you must add a new field to the serialized
+data, or change an existing one. With serialized data this becomes difficult
+and very time-consuming, as the only way of doing so is to rewrite all the
+stored values. To do so you would have to:
+
+1. Retrieve the data
+1. Parse it into a Ruby structure
+1. Mutate it
+1. Serialize it back to a String
+1. Store it in the database
+
+On the other hand, if one were to use regular columns, adding a column would be as simple as:
+
+```sql
+ALTER TABLE table_name ADD COLUMN column_name type;
+```
+
+Such a query would take very little to no time and would immediately apply to
+all rows, without having to re-write large JSON or YAML structures.
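+
+Or the equivalent Rails migration (a sketch with illustrative names):
+
+```ruby
+class AddColumnNameToTableName < ActiveRecord::Migration[6.1]
+  def change
+    add_column :table_name, :column_name, :integer
+  end
+end
+```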
+
+Finally, there comes a time when the JSON or YAML structure is no longer
+sufficient and you must migrate away from it. When storing only a few rows
+this may not be a problem, but when storing millions of rows such a migration
+can take hours or even days to complete.
+
+## Relational Databases Are Not Document Stores
+
+When storing data as JSON or YAML you're essentially using your database as if
+it were a document store (for example, MongoDB), except you're not using any of the
+powerful features provided by a typical RDBMS _nor_ are you using any of the
+features provided by a typical document store (for example, the ability to index fields
+of documents with variable fields). In other words, it's a waste.
+
+## Consistent Fields
+
+One argument sometimes made in favor of serialized data is the need to store
+widely varying fields and values. Sometimes this is truly the case, and then
+perhaps it might make sense to use serialized data. However, in 99% of cases
+the fields and types stored tend to be the same for every row. Even if there is
+a slight difference, you can still use separate columns and just not set the ones
+you don't need.
+
+## The Solution
+
+The solution is to use separate columns and/or separate tables.
+This allows you to use all the features provided by your database, makes it
+easier to manage and migrate the data, conserves space, and lets you index
+the data efficiently.
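+
+A minimal sketch of this for the `Issue` example above (the `custom_fields`
+table and its columns are illustrative):
+
+```ruby
+class Issue < ActiveRecord::Base
+  has_many :custom_fields
+end
+
+# Table: custom_fields(id, issue_id, name, value)
+class CustomField < ActiveRecord::Base
+  belongs_to :issue
+end
+```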
diff --git a/doc/development/database/sha1_as_binary.md b/doc/development/database/sha1_as_binary.md
new file mode 100644
index 00000000000..dab9b0fe72e
--- /dev/null
+++ b/doc/development/database/sha1_as_binary.md
@@ -0,0 +1,42 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Storing SHA1 Hashes As Binary
+
+Storing SHA1 hashes as strings is not very space-efficient. A SHA1 as a string
+requires at least 40 bytes, an additional byte to store the encoding, and
+perhaps more space depending on the internals of PostgreSQL.
+
+On the other hand, if one were to store a SHA1 as binary one would only need 20
+bytes for the actual SHA1, and 1 or 4 bytes of additional space (again depending
+on database internals). This means that in the best-case scenario we can reduce
+the space usage by 50%.
+
+To make this easier to work with you can include the concern `ShaAttribute` into
+a model and define a SHA attribute using the `sha_attribute` class method. For
+example:
+
+```ruby
+class Commit < ActiveRecord::Base
+ include ShaAttribute
+
+ sha_attribute :sha
+end
+```
+
+This allows you to use the value of the `sha` attribute as if it were a string,
+while storing it as binary. This means that you can do something like this,
+without having to worry about converting data to the right binary format:
+
+```ruby
+commit = Commit.find_by(sha: '88c60307bd1f215095834f09a1a5cb18701ac8ad')
+commit.sha = '971604de4cfa324d91c41650fabc129420c8d1cc'
+commit.save
+```
+
+There is, however, one requirement: the column used to store the SHA hash _must_ be
+a binary type. For Rails this means you need to use the `:binary` type instead
+of `:text` or `:string`.
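+
+For example, a hypothetical migration adding such a column could look like this:
+
+```ruby
+class AddShaToCommits < ActiveRecord::Migration[6.1]
+  def change
+    add_column :commits, :sha, :binary
+  end
+end
+```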
diff --git a/doc/development/database/single_table_inheritance.md b/doc/development/database/single_table_inheritance.md
new file mode 100644
index 00000000000..c8d082e8a67
--- /dev/null
+++ b/doc/development/database/single_table_inheritance.md
@@ -0,0 +1,63 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Single Table Inheritance
+
+**Summary:** don't use Single Table Inheritance (STI), use separate tables
+instead.
+
+Rails makes it possible to have multiple models stored in the same table and map
+these rows to the correct models using a `type` column. This can be used to,
+for example, store two different types of SSH keys in the same table.
+
+While tempting to use, one should avoid this at all costs, for the same reasons
+as outlined in the document ["Polymorphic Associations"](polymorphic_associations.md).
+
+## Solution
+
+The solution is very simple: just use a separate table for every type you'd
+otherwise store in the same table. For example, instead of having a `keys` table
+with `type` set to either `Key` or `DeployKey` you'd have two separate tables:
+`keys` and `deploy_keys`.
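+
+As a sketch, the two tables are then backed by two unrelated models, without a
+`type` column. Shared behavior can live in a module instead of a superclass:
+
+```ruby
+module SshKeyConcerns
+  extend ActiveSupport::Concern
+
+  included do
+    validates :key, presence: true
+  end
+end
+
+class Key < ActiveRecord::Base        # backed by the keys table
+  include SshKeyConcerns
+end
+
+class DeployKey < ActiveRecord::Base  # backed by the deploy_keys table
+  include SshKeyConcerns
+end
+```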
+
+## In migrations
+
+Whenever a model is used in a migration, single table inheritance should be disabled.
+Due to the way Rails loads associations (even in migrations), failing to disable STI
+could result in loading unexpected code or associations which may cause unintended
+side effects or failures during upgrades.
+
+```ruby
+class SomeMigration < Gitlab::Database::Migration[2.0]
+ class Services < MigrationRecord
+ self.table_name = 'services'
+ self.inheritance_column = :_type_disabled
+ end
+
+ def up
+ ...
+```
+
+If nothing needs to be added to the model other than disabling STI or `EachBatch`,
+use the helper `define_batchable_model` instead of defining the class.
+This ensures that the migration loads the columns for the migration in isolation,
+and the helper disables STI by default.
+
+```ruby
+class EnqueueSomeBackgroundMigration < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ def up
+ define_batchable_model('services').select(:id).in_batches do |relation|
+ jobs = relation.pluck(:id).map do |id|
+ ['ExtractServicesUrl', [id]]
+ end
+
+ BackgroundMigrationWorker.bulk_perform_async(jobs)
+ end
+ end
+ ...
+```
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index 73e023f8d45..e2e1191018b 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -148,8 +148,9 @@ to update the `title_html` with a title that has more than 1024 characters, the
a database error.
Adding or removing a constraint to an existing attribute requires that any application changes are
-deployed _first_, [otherwise servers still in the old version of the application may try to update the
-attribute with invalid values](../multi_version_compatibility.md#ci-artifact-uploads-were-failing).
+deployed _first_,
+otherwise servers still in the old version of the application
+[may try to update the attribute with invalid values](../multi_version_compatibility.md#ci-artifact-uploads-were-failing).
For these reasons, `add_text_limit` should run in a post-deployment migration.
Still in our example, for the 13.0 milestone (current), consider that the following validation
@@ -188,7 +189,7 @@ migration or a background data migration:
- If the data volume is less than `1,000` records, then the data migration can be executed within the post-migration.
- If the data volume is higher than `1,000` records, it's advised to create a background migration.
-When unsure about which option to use, please contact the Database team for advice.
+When unsure about which option to use, contact the Database team for advice.
Back to our example, the issues table is considerably large and frequently accessed, so we are going
to add a background migration for the 13.0 milestone (current),
diff --git a/doc/development/database/swapping_tables.md b/doc/development/database/swapping_tables.md
new file mode 100644
index 00000000000..efb481ccf35
--- /dev/null
+++ b/doc/development/database/swapping_tables.md
@@ -0,0 +1,51 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Swapping Tables
+
+Sometimes you need to replace one table with another. For example, when
+migrating data in a very large table it's often better to create a copy of the
+table and insert & migrate the data into this new table in the background.
+
+Let's say you want to swap the table `events` with `events_for_migration`. In
+this case you need to follow three steps:
+
+1. Rename `events` to `events_temporary`
+1. Rename `events_for_migration` to `events`
+1. Rename `events_temporary` to `events_for_migration`
+
+Rails allows you to do this using the `rename_table` method:
+
+```ruby
+rename_table :events, :events_temporary
+rename_table :events_for_migration, :events
+rename_table :events_temporary, :events_for_migration
+```
+
+This does not require any downtime as long as the three `rename_table` calls are
+executed in the _same_ database transaction. Rails uses database transactions
+for migrations by default, but if it doesn't you need to start one
+manually:
+
+```ruby
+Event.transaction do
+ rename_table :events, :events_temporary
+ rename_table :events_for_migration, :events
+ rename_table :events_temporary, :events_for_migration
+end
+```
+
+Once swapped, you _have to_ reset the primary key of the new table. For
+PostgreSQL you can use the `reset_pk_sequence!` method like so:
+
+```ruby
+reset_pk_sequence!('events')
+```
+
+Failure to reset the primary key results in newly created rows starting
+with an ID value of 1. Depending on the existing data, this can then lead to
+duplicate key constraint errors, preventing users from creating new
+data.
diff --git a/doc/development/database/transaction_guidelines.md b/doc/development/database/transaction_guidelines.md
index 255de19a420..1583bbc02c2 100644
--- a/doc/development/database/transaction_guidelines.md
+++ b/doc/development/database/transaction_guidelines.md
@@ -12,7 +12,7 @@ For further reference, check PostgreSQL documentation about [transactions](https
## Database decomposition and sharding
-The [sharding group](https://about.gitlab.com/handbook/engineering/development/enablement/sharding/) plans
+The [Pods Group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/pods/) plans
to split the main GitLab database and move some of the database tables to other database servers.
We start decomposing the `ci_*`-related database tables first. To maintain the current application
diff --git a/doc/development/database/understanding_explain_plans.md b/doc/development/database/understanding_explain_plans.md
new file mode 100644
index 00000000000..446a84d5232
--- /dev/null
+++ b/doc/development/database/understanding_explain_plans.md
@@ -0,0 +1,829 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Understanding EXPLAIN plans
+
+PostgreSQL allows you to obtain query plans using the `EXPLAIN` command. This
+command can be invaluable when trying to determine how a query performs.
+You can use this command directly in your SQL query, as long as the query starts
+with it:
+
+```sql
+EXPLAIN
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+When running this on GitLab.com, we are presented with the following output:
+
+```sql
+Aggregate (cost=922411.76..922411.77 rows=1 width=8)
+ -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+```
+
+When using _just_ `EXPLAIN`, PostgreSQL does not actually execute our query.
+Instead, it produces an _estimated_ execution plan based on the available
+statistics. This means the actual plan can differ quite a bit. Fortunately,
+PostgreSQL provides us with the option to execute the query as well. To do so,
+we need to use `EXPLAIN ANALYZE` instead of just `EXPLAIN`:
+
+```sql
+EXPLAIN ANALYZE
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+This produces:
+
+```sql
+Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
+ -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+ Rows Removed by Filter: 65677
+Planning time: 2.861 ms
+Execution time: 3428.596 ms
+```
+
+As we can see, this plan is quite different and includes a lot more data. Let's
+discuss it step by step.
+
+Because `EXPLAIN ANALYZE` executes the query, care should be taken when using a
+query that writes data or might time out. If the query modifies data,
+consider wrapping it in a transaction that rolls back automatically like so:
+
+```sql
+BEGIN;
+EXPLAIN ANALYZE
+DELETE FROM users WHERE id = 1;
+ROLLBACK;
+```
+
+The `EXPLAIN` command also takes additional options, such as `BUFFERS`:
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+This then produces:
+
+```sql
+Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
+ Buffers: shared hit=208846
+ -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+ Rows Removed by Filter: 65677
+ Buffers: shared hit=208846
+Planning time: 2.861 ms
+Execution time: 3428.596 ms
+```
+
+For more information, refer to the official
+[`EXPLAIN` documentation](https://www.postgresql.org/docs/current/sql-explain.html)
+and [using `EXPLAIN` guide](https://www.postgresql.org/docs/current/using-explain.html).
+
+## Nodes
+
+Every query plan consists of nodes. Nodes can be nested, and are executed from
+the inside out. This means that the innermost node is executed before an outer
+node. This can be best thought of as nested function calls, returning their
+results as they unwind. For example, a plan starting with an `Aggregate`
+followed by a `Nested Loop`, followed by an `Index Only scan` can be thought of
+as the following Ruby code:
+
+```ruby
+aggregate(
+  nested_loop(
+    index_only_scan(),
+    index_only_scan()
+  )
+)
+```
+
+Nodes are indicated using a `->`, followed by the type of the node. For
+example:
+
+```sql
+Aggregate (cost=922411.76..922411.77 rows=1 width=8)
+ -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+```
+
+Here the first node executed is `Seq Scan on projects`. The `Filter:` is an
+additional filter applied to the results of the node. A filter is very similar
+to Ruby's `Array#select`: it takes the input rows, applies the filter, and
+produces a new list of rows. After the node is done, we perform the `Aggregate`
+above it.
+
+Nested nodes look like this:
+
+```sql
+Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1)
+ Buffers: shared hit=155
+ -> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
+ Buffers: shared hit=155
+ -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
+ Index Cond: (id < 100)
+ Heap Fetches: 0
+ -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
+ Index Cond: (id = users_1.id)
+ Heap Fetches: 0
+Planning time: 2.585 ms
+Execution time: 0.310 ms
+```
+
+Here we first perform two separate "Index Only" scans, followed by performing a
+"Nested Loop" on the result of these two scans.
+
+## Node statistics
+
+Each node in a plan has a set of associated statistics, such as the cost, the
+number of rows produced, the number of loops performed, and more. For example:
+
+```sql
+Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
+```
+
+Here we can see that our cost ranges from `0.00..908044.47` (we cover this in
+a moment), and we estimate (since we're using `EXPLAIN` and not `EXPLAIN
+ANALYZE`) a total of 5,746,914 rows to be produced by this node. The `width`
+statistic describes the estimated width of each row, in bytes.
+
+The `costs` field specifies how expensive a node was. The cost is measured in
+arbitrary units determined by the query planner's cost parameters. What
+influences the costs depends on a variety of settings, such as `seq_page_cost`,
+`cpu_tuple_cost`, and various others.
+The format of the costs field is as follows:
+
+```sql
+STARTUP COST..TOTAL COST
+```
+
+The startup cost states how expensive it was to start the node, with the total
+cost describing how expensive the entire node was. In general: the greater the
+values, the more expensive the node.
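+
+You can inspect the planner's cost parameters yourself. For example:
+
+```sql
+SELECT name, setting
+FROM pg_settings
+WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost');
+```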
+
+When using `EXPLAIN ANALYZE`, these statistics also include the actual time
+(in milliseconds) spent, and other runtime statistics (for example, the actual number of
+produced rows):
+
+```sql
+Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+```
+
+Here we can see we estimated 5,746,969 rows to be returned, but in reality we
+returned 5,746,940 rows. We can also see that _just_ this sequential scan took
+2.98 seconds to run.
+
+Using `EXPLAIN (ANALYZE, BUFFERS)` also gives us information about the
+number of rows removed by a filter, the number of buffers used, and more. For
+example:
+
+```sql
+Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+ Rows Removed by Filter: 65677
+ Buffers: shared hit=208846
+```
+
+Here we can see that our filter has to remove 65,677 rows, and that we use
+208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our
+above node uses *1.6 GB of buffers*. That's a lot!
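+
+You can let PostgreSQL do this arithmetic for you:
+
+```sql
+-- 208,846 buffers multiplied by 8 KB (8192 bytes) per buffer
+SELECT pg_size_pretty(208846 * 8192::bigint);
+-- => 1632 MB
+```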
+
+Keep in mind that some statistics are per-loop averages, while others are total values:
+
+| Field name | Value type |
+| --- | --- |
+| Actual Total Time | per-loop average |
+| Actual Rows | per-loop average |
+| Buffers Shared Hit | total value |
+| Buffers Shared Read | total value |
+| Buffers Shared Dirtied | total value |
+| Buffers Shared Written | total value |
+| I/O Read Time | total value |
+| I/O Write Time | total value |
+
+For example:
+
+```sql
+ -> Index Scan using users_pkey on public.users (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888)
+ Index Cond: (users.id = issues.author_id)
+ Buffers: shared hit=3543 read=9
+ I/O Timings: read=17.760 write=0.000
+```
+
+Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`).
+17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache.
+
+## Node types
+
+There are quite a few different types of nodes, so we only cover some of the
+more common ones here.
+
+A full list of all the available nodes and their descriptions can be found in
+the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h).
+pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer a detailed look into nodes and their fields.
+
+### Seq Scan
+
+A sequential scan over (a chunk of) a database table. This is like using
+`Array#each`, but on a database table. Sequential scans can be quite slow when
+retrieving lots of rows, so it's best to avoid these for large tables.
+
+### Index Only Scan
+
+A scan on an index that did not require fetching anything from the table. In
+certain cases an index only scan may still fetch data from the table; in that
+case, the node includes a `Heap Fetches:` statistic.
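+
+Heap fetches typically happen when the table's visibility map is not up to
+date. Running `VACUUM` refreshes the visibility map, which can reduce the
+number of heap fetches on subsequent scans:
+
+```sql
+VACUUM users;
+```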
+
+### Index Scan
+
+A scan on an index that required retrieving some data from the table.
+
+### Bitmap Index Scan and Bitmap Heap scan
+
+Bitmap scans fall between sequential scans and index scans. These are typically
+used when we would read too much data for an index scan to be efficient, but
+too little to justify a sequential scan. A bitmap scan uses what is known as a
+[bitmap index](https://en.wikipedia.org/wiki/Bitmap_index) to perform its work.
+
+The [source code of PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)
+states the following on bitmap scans:
+
+> Bitmap Index Scan delivers a bitmap of potential tuple locations; it does not
+> access the heap itself. The bitmap is used by an ancestor Bitmap Heap Scan
+> node, possibly after passing through intermediate Bitmap And and/or Bitmap Or
+> nodes to combine it with the results of other Bitmap Index Scans.
+
+### Limit
+
+Applies a `LIMIT` on the input rows.
+
+### Sort
+
+Sorts the input rows as specified using an `ORDER BY` statement.
+
+### Nested Loop
+
+A nested loop executes its child nodes for every row produced by a node that
+precedes it. For example:
+
+```sql
+-> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
+ Buffers: shared hit=155
+ -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
+ Index Cond: (id < 100)
+ Heap Fetches: 0
+ -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
+ Index Cond: (id = users_1.id)
+ Heap Fetches: 0
+```
+
+Here the first child node (`Index Only Scan using users_pkey on users users_1`)
+produces 36 rows, and is executed once (`rows=36 loops=1`). The next node
+produces 1 row (`rows=1`), but is repeated 36 times (`loops=36`). This is
+because the previous node produced 36 rows.
+
+This means that nested loops can quickly slow the query down if the various
+child nodes keep producing many rows.
+
+## Optimising queries
+
+With that out of the way, let's see how we can optimise a query. Let's use the
+following query as an example:
+
+```sql
+SELECT COUNT(*)
+FROM users
+WHERE twitter != '';
+```
+
+This query counts the number of users that have a Twitter profile set.
+Let's run this using `EXPLAIN (ANALYZE, BUFFERS)`:
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT COUNT(*)
+FROM users
+WHERE twitter != '';
+```
+
+This produces the following plan:
+
+```sql
+Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1)
+ Buffers: shared hit=202662
+ -> Seq Scan on users (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1)
+ Filter: ((twitter)::text <> ''::text)
+ Rows Removed by Filter: 2487813
+ Buffers: shared hit=202662
+Planning time: 0.390 ms
+Execution time: 1271.180 ms
+```
+
+From this query plan we can see the following:
+
+1. We need to perform a sequential scan on the `users` table.
+1. This sequential scan filters out 2,487,813 rows using a `Filter`.
+1. We use 202,662 buffers, which equals 1.58 GB of memory.
+1. It takes us 1.2 seconds to do all of this.
+
+Considering we are just counting users, that's quite expensive!
+
+Before we start making any changes, let's see if there are any existing indexes
+on the `users` table that we might be able to use. We can obtain this
+information by running `\d users` in a `psql` console, then scrolling down to
+the `Indexes:` section:
+
+```sql
+Indexes:
+ "users_pkey" PRIMARY KEY, btree (id)
+ "index_users_on_confirmation_token" UNIQUE, btree (confirmation_token)
+ "index_users_on_email" UNIQUE, btree (email)
+ "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token)
+ "index_users_on_static_object_token" UNIQUE, btree (static_object_token)
+ "index_users_on_unlock_token" UNIQUE, btree (unlock_token)
+ "index_on_users_name_lower" btree (lower(name::text))
+ "index_users_on_accepted_term_id" btree (accepted_term_id)
+ "index_users_on_admin" btree (admin)
+ "index_users_on_created_at" btree (created_at)
+ "index_users_on_email_trigram" gin (email gin_trgm_ops)
+ "index_users_on_feed_token" btree (feed_token)
+ "index_users_on_group_view" btree (group_view)
+ "index_users_on_incoming_email_token" btree (incoming_email_token)
+ "index_users_on_managing_group_id" btree (managing_group_id)
+ "index_users_on_name" btree (name)
+ "index_users_on_name_trigram" gin (name gin_trgm_ops)
+ "index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text
+ "index_users_on_state" btree (state)
+ "index_users_on_state_and_user_type" btree (state, user_type)
+ "index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL
+ "index_users_on_user_type" btree (user_type)
+ "index_users_on_username" btree (username)
+ "index_users_on_username_trigram" gin (username gin_trgm_ops)
+ "tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text
+```
+
+Here we can see there is no index on the `twitter` column, which means
+PostgreSQL has to perform a sequential scan in this case. Let's try to fix this
+by adding the following index:
+
+```sql
+CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
+```
+
+If we now re-run our query using `EXPLAIN (ANALYZE, BUFFERS)` we get the
+following plan:
+
+```sql
+Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1)
+ Buffers: shared hit=51854 dirtied=19
+ -> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
+ Filter: ((twitter)::text <> ''::text)
+ Rows Removed by Filter: 2487830
+ Heap Fetches: 26037
+ Buffers: shared hit=51854 dirtied=19
+Planning time: 0.191 ms
+Execution time: 297.334 ms
+```
+
+Now it takes just under 300 milliseconds to get our data, instead of 1.2
+seconds. However, we still use 51,854 buffers, which is about 400 MB of memory.
+300 milliseconds is also quite slow for such a simple query. To understand why
+this query is still expensive, let's take a look at the following:
+
+```sql
+Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
+ Filter: ((twitter)::text <> ''::text)
+ Rows Removed by Filter: 2487830
+```
+
+We start with an index only scan on our index, but we somehow still apply a
+`Filter` that filters out 2,487,830 rows. Why is that? Well, let's look at how
+we created the index:
+
+```sql
+CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
+```
+
+We told PostgreSQL to index all possible values of the `twitter` column,
+even empty strings. Our query in turn uses `WHERE twitter != ''`. This means
+that the index does improve things, as we don't need to do a sequential scan,
+but we may still encounter empty strings. As a result, PostgreSQL _has_ to
+apply a `Filter` on the index results to get rid of those values.
+
+Fortunately, we can improve this even further using "partial indexes". Partial
+indexes are indexes with a `WHERE` condition that is applied when indexing data.
+For example:
+
+```sql
+CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100;
+```
+
+This index would only index the `email` value of rows that match `WHERE id <
+100`. We can use partial indexes to change our Twitter index to the following:
+
+```sql
+CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
+```
+
+Once the index is created, running our query again produces the following plan:
+
+```sql
+Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
+ Buffers: shared hit=44036
+ -> Index Only Scan using twitter_test on users (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1)
+ Heap Fetches: 1208
+ Buffers: shared hit=44036
+Planning time: 0.123 ms
+Execution time: 19.848 ms
+```
+
+That's _a lot_ better! Now it only takes 20 milliseconds to get the data, and we
+only use about 344 MB of buffers (instead of the original 1.58 GB). The reason
+this works is that now PostgreSQL no longer needs to apply a `Filter`, as the
+index only contains `twitter` values that are not empty.
+
+Keep in mind that you shouldn't just add partial indexes every time you want to
+optimise a query. Every index has to be updated for every write, and they may
+require quite a bit of space, depending on the amount of indexed data. As a
+result, first check if there are any existing indexes you may be able to reuse.
+If there aren't any, check if you can perhaps slightly change an existing one to
+fit both the existing and new queries. Only add a new index if none of the
+existing indexes can be used in any way.
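+
+If you don't have a `psql` console at hand, the `pg_indexes` view gives you the
+same overview of existing indexes with a regular query:
+
+```sql
+SELECT indexname, indexdef
+FROM pg_indexes
+WHERE tablename = 'users';
+```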
+
+When comparing execution plans, don't take timing as the only important metric.
+Good timing is the main goal of any optimization, but it can be too volatile to
+be used for comparison (for example, it depends a lot on the state of the cache).
+When optimizing a query, we usually need to reduce the amount of data we're
+dealing with. Indexes are the way to work with fewer pages (buffers) to get the
+result, so, during optimization, look at the number of buffers used (read and hit),
+and work on reducing these numbers. Reduced timing is the consequence of reduced
+buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
+identical to production (and overall number of buffers is the same as on production),
+but differences in cache state and I/O speed may lead to different timings.
+
+## Queries that can't be optimised
+
+Now that we have seen how to optimise a query, let's look at another query that
+we might not be able to optimise:
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+The output of `EXPLAIN (ANALYZE, BUFFERS)` is as follows:
+
+```sql
+Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
+ Buffers: shared hit=208846
+ -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+ Rows Removed by Filter: 65677
+ Buffers: shared hit=208846
+Planning time: 2.861 ms
+Execution time: 3428.596 ms
+```
+
+Looking at the output we see the following Filter:
+
+```sql
+Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+Rows Removed by Filter: 65677
+```
+
+Looking at the number of rows removed by the filter, we may be tempted to add an
+index on `projects.visibility_level` to somehow turn this Sequential scan +
+filter into an index-only scan.
+
+Unfortunately, doing so is unlikely to improve anything. Contrary to what some
+might believe, an index being present _does not guarantee_ that PostgreSQL
+actually uses it. For example, when doing a `SELECT * FROM projects` it is much
+cheaper to just scan the entire table, instead of using an index and then
+fetching data from the table. In such cases PostgreSQL may decide to not use an
+index.
+
+Second, let's think for a moment about what our query does: it gets all
+projects with visibility level 0 or 20. In the above plan we can see this
+produces quite a lot of rows (5,746,940), but how much is that relative to the
+total? Let's find out by running the following query:
+
+```sql
+SELECT visibility_level, count(*) AS amount
+FROM projects
+GROUP BY visibility_level
+ORDER BY visibility_level ASC;
+```
+
+For GitLab.com this produces:
+
+```sql
+ visibility_level | amount
+------------------+---------
+ 0 | 5071325
+ 10 | 65678
+ 20 | 674801
+```
+
+Here the total number of projects is 5,811,804, and 5,746,126 of those are of
+level 0 or 20. That's 98% of the entire table!
+
+So no matter what we do, this query retrieves 98% of the entire table. Since
+most time is spent doing exactly that, there isn't really much we can do to
+improve this query, other than _not_ running it at all.
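+
+If an exact number is not required, a common workaround is to rely on the
+planner's statistics instead of counting rows. The result is only an estimate,
+refreshed by `ANALYZE` and autovacuum, but it's nearly free to obtain:
+
+```sql
+SELECT reltuples::bigint AS estimated_total
+FROM pg_class
+WHERE relname = 'projects';
+```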
+
+What is important here is that while some may recommend adding an index the
+moment you see a sequential scan, it is _much more important_ to first
+understand what your query does, how much data it retrieves, and so on. After
+all, you cannot optimise something you do not understand.
+
+### Cardinality and selectivity
+
+Earlier we saw that our query had to retrieve 98% of the rows in the table.
+Two terms are commonly used in databases: cardinality and selectivity.
+Cardinality refers to the number of unique values in a particular column in a
+table.
+
+Selectivity is the number of unique values produced by an operation (for example, an
+index scan or filter), relative to the total number of rows. The higher the
+selectivity, the more likely PostgreSQL is able to use an index.
+
+In the above example, there are only 3 unique values: 0, 10, and 20. This means
+the cardinality is 3. The selectivity in turn is also very low: roughly
+0.0000003 (2 / 5,811,804), because our `Filter` only filters using two values
+(`0` and `20`). With such a low selectivity value, it's not surprising that
+PostgreSQL decides using an index is not worth it, because it would produce
+almost no unique rows.
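+
+You can measure both for a column with a regular query. A sketch for the
+example above:
+
+```sql
+SELECT COUNT(DISTINCT visibility_level) AS cardinality,
+       COUNT(DISTINCT visibility_level)::numeric / COUNT(*) AS selectivity
+FROM projects;
+```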
+
+## Rewriting queries
+
+So the above query can't really be optimised as-is, or at least not much. But
+what if we slightly change the purpose of it? What if instead of retrieving all
+projects with `visibility_level` 0 or 20, we retrieve those that a user
+interacted with somehow?
+
+Fortunately, GitLab has an answer for this, and it's a table called
+`user_interacted_projects`. This table has the following schema:
+
+```sql
+Table "public.user_interacted_projects"
+ Column | Type | Modifiers
+------------+---------+-----------
+ user_id | integer | not null
+ project_id | integer | not null
+Indexes:
+ "index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id)
+ "index_user_interacted_projects_on_user_id" btree (user_id)
+Foreign-key constraints:
+ "fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
+ "fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
+```
+
+Let's rewrite our query to `JOIN` this table onto our projects, and get the
+projects for a specific user:
+
+```sql
+EXPLAIN ANALYZE
+SELECT COUNT(*)
+FROM projects
+INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
+WHERE projects.visibility_level IN (0, 20)
+AND user_interacted_projects.user_id = 1;
+```
+
+What we do here is the following:
+
+1. Get our projects.
+1. `INNER JOIN` `user_interacted_projects`, meaning we're only left with rows in
+ `projects` that have a corresponding row in `user_interacted_projects`.
+1. Limit this to the projects with `visibility_level` of 0 or 20, and to
+ projects that the user with ID 1 interacted with.
+
+If we run this query we get the following plan:
+
+```sql
+ Aggregate (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1)
+ -> Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
+ -> Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
+ Index Cond: (user_id = 1)
+ -> Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
+ Index Cond: (id = user_interacted_projects.project_id)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+ Rows Removed by Filter: 0
+ Planning time: 2.614 ms
+ Execution time: 9.809 ms
+```
+
+Here it took just under 10 milliseconds to get the data. We can also see
+we're retrieving far fewer projects:
+
+```sql
+Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
+ Index Cond: (id = user_interacted_projects.project_id)
+ Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+ Rows Removed by Filter: 0
+```
+
+Here we see we perform 145 loops (`loops=145`), with every loop producing 1 row
+(`rows=1`). This is much less than before, and our query performs much better!
+
+If we look at the plan we also see our costs are very low:
+
+```sql
+Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
+```
+
+Here our cost is only 3.45, and it takes 7.25 milliseconds (0.05 * 145) to
+execute. The next index scan is a bit more expensive:
+
+```sql
+Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
+```
+
+Here the cost is 160.71 (`cost=0.43..160.71`), taking about 2.5 milliseconds
+(based on the output of `actual time=....`).
+
+The most expensive part here is the "Nested Loop" that acts upon the result of
+these two index scans:
+
+```sql
+Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
+```
+
+Here the estimated cost is 870.52 for an estimated 203 rows, while the node
+actually produced 143 rows in a single loop, taking 9.748 milliseconds.
+
+The key takeaway here is that sometimes you have to rewrite (parts of) a query
+to make it better. Sometimes that means having to slightly change your feature
+to accommodate better performance.
+
+## What makes a bad plan
+
+This is a bit of a difficult question to answer, because the definition of "bad"
+is relative to the problem you are trying to solve. However, some patterns are
+best avoided in most cases, such as:
+
+- Sequential scans on large tables
+- Filters that remove a lot of rows
+- Performing a certain step that requires _a lot_ of
+ buffers (for example, an index scan for GitLab.com that requires more than 512 MB).
+
+As a general guideline, aim for a query that:
+
+1. Takes no more than 10 milliseconds. Our target time spent in SQL per request
+ is around 100 milliseconds, so every query should be as fast as possible.
+1. Does not use an excessive number of buffers, relative to the workload. For
+ example, retrieving ten rows shouldn't require 1 GB of buffers.
+1. Does not spend a long time performing disk IO operations. The setting
+   `track_io_timing` must be enabled for this data to be included in the
+   output of `EXPLAIN ANALYZE` (see the example after this list).
+1. Applies a `LIMIT` when retrieving rows without aggregating them, such as
+ `SELECT * FROM users`.
+1. Doesn't use a `Filter` to filter out too many rows, especially if the query
+ does not use a `LIMIT` to limit the number of returned rows. Filters can
+ usually be removed by adding a (partial) index.
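+
+As an example of the third point, you can check and enable I/O timing
+measurement for your session like so (changing this setting requires superuser
+privileges; on GitLab.com it is already enabled):
+
+```sql
+SHOW track_io_timing;
+SET track_io_timing = on;
+```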
+
+These are _guidelines_ and not hard requirements, as different needs may require
+different queries. The only _rule_ is that you _must always measure_ your query
+(preferably using a production-like database) using `EXPLAIN (ANALYZE, BUFFERS)`
+and related tools such as:
+
+- [`explain.depesz.com`](https://explain.depesz.com/).
+- [`explain.dalibo.com/`](https://explain.dalibo.com/).
+
+## Producing query plans
+
+There are a few ways to get the output of a query plan. Of course you
+can directly run the `EXPLAIN` query in the `psql` console, or you can
+follow one of the other options below.
+
+### Database Lab Engine
+
+GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab), and the companion
+SQL optimization tool - [Joe Bot](https://gitlab.com/postgres-ai/joe).
+
+Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
+
+Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
+and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
+
+With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements.
+
+For example, to test a new index on a column that does not exist on production yet, you can do the following:
+
+Create the column:
+
+```sql
+exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
+```
+
+Create the index:
+
+```sql
+exec CREATE INDEX index_projects_last_at ON projects (last_at) WHERE last_at IS NOT NULL
+```
+
+Analyze the table to update its statistics:
+
+```sql
+exec ANALYZE projects
+```
+
+Get the query plan:
+
+```sql
+explain SELECT * FROM projects WHERE last_at < CURRENT_DATE
+```
+
+Once done, you can roll back your changes:
+
+```sql
+reset
+```
+
+For more information about the available options, run:
+
+```sql
+help
+```
+
+The web interface comes with the following execution plan visualizers included:
+
+- [Depesz](https://explain.depesz.com/)
+- [PEV2](https://github.com/dalibo/pev2)
+- [FlameGraph](https://github.com/mgartner/pg_flame)
+
+#### Tips & Tricks
+
+The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings are applied to all subsequent commands until you reset them. For example, you can disable parallel queries with:
+
+```sql
+exec SET max_parallel_workers_per_gather = 0
+```
+
+### Rails console
+
+Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
+gem, you can directly generate the query plan from the Rails console:
+
+```ruby
+pry(main)> require 'activerecord-explain-analyze'
+=> true
+pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
+ Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
+ ↳ (pry):12
+=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
+Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
+ Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
+ Filter: (projects.build_timeout > 3600)
+ Rows Removed by Filter: 14
+ Buffers: shared hit=2
+Planning time: 0.411 ms
+Execution time: 0.113 ms
+```
+
+### ChatOps
+
+GitLab team members can also use our ChatOps solution, available in Slack
+using the [`/chatops` slash command](../chatops_on_gitlabcom.md).
+
+NOTE:
+While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
+
+You can use ChatOps to get a query plan by running the following:
+
+```sql
+/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+```
+
+Visualising the plan using <https://explain.depesz.com/> is also supported:
+
+```sql
+/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+```
+
+Quoting the query is not necessary.
+
+For more information about the available options, run:
+
+```sql
+/chatops run explain --help
+```
+
+## Further reading
+
+A more extensive guide on understanding query plans can be found in
+the [presentation](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)
+from [Dalibo.org](https://www.dalibo.com/en/).
+
+Depesz's blog also has a good [section](https://www.depesz.com/tag/unexplainable/) dedicated to query plans.
diff --git a/doc/development/database/verifying_database_capabilities.md b/doc/development/database/verifying_database_capabilities.md
new file mode 100644
index 00000000000..55347edf4ec
--- /dev/null
+++ b/doc/development/database/verifying_database_capabilities.md
@@ -0,0 +1,38 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Verifying Database Capabilities
+
+Sometimes certain bits of code may only work on a certain database
+version. While we try to avoid such code as much as possible, sometimes it is
+necessary to add database (version) specific behavior.
+
+To facilitate this, we have the following method that you can use:
+
+- `ApplicationRecord.database.version`: returns the PostgreSQL version number as a string
+ in the format `X.Y.Z`.
+
+This allows you to write code such as:
+
+```ruby
+if ApplicationRecord.database.version.to_f >= 11.7
+ run_really_fast_query
+else
+ run_fast_query
+end
+```
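+
+For reference, the same information is available directly in `psql`:
+
+```sql
+SHOW server_version;
+```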
+
+## Read-only database
+
+The database can be used in read-only mode. In this case, we have to
+make sure all GET requests don't attempt any write operations to the
+database. If one of those requests wants to write to the database, it needs
+to be wrapped in a `Gitlab::Database.read_only?` or `Gitlab::Database.read_write?`
+guard, to make sure it doesn't write when the database is read-only.
+
+We have a Rails middleware that filters any potentially writing
+operations (the `CUD` operations of CRUD) and prevents the user from trying
+to update the database and getting a 500 error (see `Gitlab::Middleware::ReadOnly`).
diff --git a/doc/development/database_debugging.md b/doc/development/database_debugging.md
index 5d46ade98bb..f18830ee7ca 100644
--- a/doc/development/database_debugging.md
+++ b/doc/development/database_debugging.md
@@ -1,177 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/database_debugging.md'
+remove_date: '2022-11-06'
---
-# Troubleshooting and Debugging Database
+This document was moved to [another location](database/database_debugging.md).
-This section is to help give some copy-pasta you can use as a reference when you
-run into some head-banging database problems.
-
-A first step is to search for your error in Slack, or search for `GitLab <my error>` with Google.
-
-Available `RAILS_ENV`:
-
-- `production` (generally not for your main GDK database, but you may need this for other installations such as Omnibus).
-- `development` (this is your main GDK db).
-- `test` (used for tests like RSpec).
-
-## Delete everything and start over
-
-If you just want to delete everything and start over with an empty DB (approximately 1 minute):
-
-```shell
-bundle exec rake db:reset RAILS_ENV=development
-```
-
-If you want to seed the empty DB with sample data (approximately 4 minutes):
-
-```shell
-bundle exec rake dev:setup
-```
-
-If you just want to delete everything and start over with sample data (approximately 4 minutes). This
-also does `db:reset` and runs DB-specific migrations:
-
-```shell
-bundle exec rake db:setup RAILS_ENV=development
-```
-
-If your test DB is giving you problems, it is safe to delete everything because it doesn't contain important
-data:
-
-```shell
-bundle exec rake db:reset RAILS_ENV=test
-```
-
-## Migration wrangling
-
-- `bundle exec rake db:migrate RAILS_ENV=development`: Execute any pending migrations that you may have picked up from a MR
-- `bundle exec rake db:migrate:status RAILS_ENV=development`: Check if all migrations are `up` or `down`
-- `bundle exec rake db:migrate:down VERSION=20170926203418 RAILS_ENV=development`: Tear down a migration
-- `bundle exec rake db:migrate:up VERSION=20170926203418 RAILS_ENV=development`: Set up a migration
-- `bundle exec rake db:migrate:redo VERSION=20170926203418 RAILS_ENV=development`: Re-run a specific migration
-
-## Manually access the database
-
-Access the database via one of these commands (they all get you to the same place)
-
-```shell
-gdk psql -d gitlabhq_development
-bundle exec rails dbconsole -e development
-bundle exec rails db -e development
-```
-
-- `\q`: Quit/exit
-- `\dt`: List all tables
-- `\d+ issues`: List columns for `issues` table
-- `CREATE TABLE board_labels();`: Create a table called `board_labels`
-- `SELECT * FROM schema_migrations WHERE version = '20170926203418';`: Check if a migration was run
-- `DELETE FROM schema_migrations WHERE version = '20170926203418';`: Manually remove a migration
-
-## Access the database with a GUI
-
-Most GUIs (DataGrip, RubyMine, DBeaver) require a TCP connection to the database, but by default
-the database runs on a UNIX socket. To be able to access the database from these tools, some steps
-are needed:
-
-1. On the GDK root directory, run:
-
- ```shell
- gdk config set postgresql.host localhost
- ```
-
-1. Open your `gdk.yml`, and confirm that it has the following lines:
-
- ```yaml
- postgresql:
- host: localhost
- ```
-
-1. Reconfigure GDK:
-
- ```shell
- gdk reconfigure
- ```
-
-1. On your database GUI, select `localhost` as host, `5432` as port and `gitlabhq_development` as database.
- Alternatively, you can use the connection string `postgresql://localhost:5432/gitlabhq_development`.
-
-The new connection should be working now.
-
-## Access the GDK database with Visual Studio Code
-
-Use these instructions for exploring the GitLab database while developing with the GDK:
-
-1. Install or open [Visual Studio Code](https://code.visualstudio.com/download).
-1. Install the [PostgreSQL VSCode Extension](https://marketplace.visualstudio.com/items?itemName=ckolkman.vscode-postgres).
-1. In Visual Studio Code select **PostgreSQL Explorer** in the left toolbar.
-1. In the top bar of the new window, select `+` to **Add Database Connection**, and follow the prompts to fill in the details:
- 1. **Hostname**: the path to the PostgreSQL folder in your GDK directory (for example `/dev/gitlab-development-kit/postgresql`).
- 1. **PostgreSQL user to authenticate as**: usually your local username, unless otherwise specified during PostgreSQL installation.
- 1. **Password of the PostgreSQL user**: the password you set when installing PostgreSQL.
- 1. **Port number to connect to**: `5432` (default).
- 1. **Use an SSL connection?** This depends on your installation. Options are:
- - **Use Secure Connection**
- - **Standard Connection** (default)
- 1. **Optional. The database to connect to**: `gitlabhq_development`.
- 1. **The display name for the database connection**: `gitlabhq_development`.
-
-Your database connection should now be displayed in the PostgreSQL Explorer pane and
-you can explore the `gitlabhq_development` database. If you cannot connect, ensure
-that GDK is running. For further instructions on how to use the PostgreSQL Explorer
-Extension for Visual Studio Code, read the [usage section](https://marketplace.visualstudio.com/items?itemName=ckolkman.vscode-postgres#usage)
-of the extension documentation.
-
-## FAQ
-
-### `ActiveRecord::PendingMigrationError` with Spring
-
-When running specs with the [Spring pre-loader](rake_tasks.md#speed-up-tests-rake-tasks-and-migrations),
-the test database can get into a corrupted state. Trying to run the migration or
-dropping/resetting the test database has no effect.
-
-```shell
-$ bundle exec spring rspec some_spec.rb
-...
-Failure/Error: ActiveRecord::Migration.maintain_test_schema!
-
-ActiveRecord::PendingMigrationError:
-
-
- Migrations are pending. To resolve this issue, run:
-
- bin/rake db:migrate RAILS_ENV=test
-# ~/.rvm/gems/ruby-2.3.3/gems/activerecord-4.2.10/lib/active_record/migration.rb:392:in `check_pending!'
-...
-0 examples, 0 failures, 1 error occurred outside of examples
-```
-
-To resolve, you can kill the spring server and app that lives between spec runs.
-
-```shell
-$ ps aux | grep spring
-eric 87304 1.3 2.9 3080836 482596 ?? Ss 10:12AM 4:08.36 spring app | gitlab | started 6 hours ago | test mode
-eric 37709 0.0 0.0 2518640 7524 s006 S Wed11AM 0:00.79 spring server | gitlab | started 29 hours ago
-$ kill 87304
-$ kill 37709
-```
-
-### db:migrate `database version is too old to be migrated` error
-
-Users receive this error when `db:migrate` detects that the current schema version
-is older than the `MIN_SCHEMA_VERSION` defined in the `Gitlab::Database` library
-module.
-
-Over time we cleanup/combine old migrations in the codebase, so it is not always
-possible to migrate GitLab from every previous version.
-
-In some cases you may want to bypass this check. For example, if you were on a version
-of GitLab schema later than the `MIN_SCHEMA_VERSION`, and then rolled back the
-to an older migration, from before. In this case, to migrate forward again,
-you should set the `SKIP_SCHEMA_VERSION_CHECK` environment variable.
-
-```shell
-bundle exec rake db:migrate SKIP_SCHEMA_VERSION_CHECK=true
-```
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/database_query_comments.md b/doc/development/database_query_comments.md
index 2798071bc06..7f9def7e567 100644
--- a/doc/development/database_query_comments.md
+++ b/doc/development/database_query_comments.md
@@ -1,62 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/database_query_comments.md'
+remove_date: '2022-11-05'
---
-# Database query comments with Marginalia
+This document was moved to [another location](database/database_query_comments.md).
-The [Marginalia gem](https://github.com/basecamp/marginalia) is used to add
-query comments containing application related context information to PostgreSQL
-queries generated by ActiveRecord.
-
-It is very useful for tracing problematic queries back to the application source.
-
-An engineer during an on-call incident has the full context of a query
-and its application source from the comments.
-
-## Metadata information in comments
-
-Queries generated from **Rails** include the following metadata in comments:
-
-- `application`
-- `correlation_id`
-- `endpoint_id`
-- `line`
-
-Queries generated from **Sidekiq** workers include the following metadata
-in comments:
-
-- `application`
-- `jid`
-- `correlation_id`
-- `endpoint_id`
-- `line`
-
-`endpoint_id` is a single field that can represent any endpoint in the application:
-
-- For Rails controllers, it's the controller and action. For example, `Projects::BlobController#show`.
-- For Grape API endpoints, it's the route. For example, `/api/:version/users/:id`.
-- For Sidekiq workers, it's the worker class name. For example, `UserStatusCleanup::BatchWorker`.
-
-`line` is not present in production logs due to the additional overhead required.
-
-Examples of queries with comments:
-
-- Rails:
-
- ```sql
- /*application:web,controller:blob,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,endpoint_id:Projects::BlobController#show*/ SELECT "routes".* FROM "routes" WHERE "routes"."source_id" = 75 AND "routes"."source_type" = 'Namespace' LIMIT 1
- ```
-
-- Grape:
-
- ```sql
- /*application:web,correlation_id:01EZVN0DAYGJF5XHG9N4VX8FAH,endpoint_id:/api/:version/users/:id*/ SELECT COUNT(*) FROM "users" INNER JOIN "user_follow_users" ON "users"."id" = "user_follow_users"."followee_id" WHERE "user_follow_users"."follower_id" = 1
- ```
-
-- Sidekiq:
-
- ```sql
- /*application:sidekiq,correlation_id:df643992563683313bc0a0288fb55e23,jid:15fbc506590c625d7664b074,endpoint_id:UserStatusCleanup::BatchWorker,line:/app/workers/user_status_cleanup/batch_worker.rb:19:in `perform'*/ SELECT $1 AS one FROM "user_statuses" WHERE "user_statuses"."clear_status_at" <= $2 LIMIT $3
- ```
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/database_review.md b/doc/development/database_review.md
index 2b215190e6d..2decd304103 100644
--- a/doc/development/database_review.md
+++ b/doc/development/database_review.md
@@ -113,6 +113,7 @@ the following preparations into account.
- Ensure `db/structure.sql` is updated as [documented](migration_style_guide.md#schema-changes), and additionally ensure that the relevant version files under
`db/schema_migrations` were added or removed.
+- Ensure that the Database Dictionary is updated as [documented](database/database_dictionary.md).
- Make migrations reversible by using the `change` method or include a `down` method when using `up`.
- Include either a rollback procedure or describe how to rollback changes.
- Add the output of both migrating (`db:migrate`) and rolling back (`db:rollback`) for all migrations into the MR description.
@@ -179,7 +180,7 @@ Include in the MR description:
- [explain.depesz.com](https://explain.depesz.com) or [explain.dalibo.com](https://explain.dalibo.com): Paste both the plan and the query used in the form.
- When providing query plans, make sure it hits enough data:
- You can use a GitLab production replica to test your queries on a large scale,
- through the `#database-lab` Slack channel or through [ChatOps](understanding_explain_plans.md#chatops).
+ through the `#database-lab` Slack channel or through [ChatOps](database/understanding_explain_plans.md#chatops).
- Usually, the `gitlab-org` namespace (`namespace_id = 9970`) and the
`gitlab-org/gitlab-foss` (`project_id = 13083`) or the `gitlab-org/gitlab` (`project_id = 278964`)
projects provide enough data to serve as a good example.
@@ -187,7 +188,7 @@ Include in the MR description:
- If your queries belong to a new feature in GitLab.com and thus they don't return data in production:
- You may analyze the query and to provide the plan from a local environment.
- `#database-lab` and [postgres.ai](https://postgres.ai/) both allow updates to data (`exec UPDATE issues SET ...`) and creation of new tables and columns (`exec ALTER TABLE issues ADD COLUMN ...`).
- - More information on how to find the number of actual returned records in [Understanding EXPLAIN plans](understanding_explain_plans.md)
+ - More information on how to find the number of actual returned records in [Understanding EXPLAIN plans](database/understanding_explain_plans.md)
- For query changes, it is best to provide both the SQL queries along with the
plan _before_ and _after_ the change. This helps spot differences quickly.
- Include data that shows the performance improvement, preferably in
@@ -200,7 +201,7 @@ Include in the MR description:
#### Preparation when adding tables
-- Order columns based on the [Ordering Table Columns](ordering_table_columns.md) guidelines.
+- Order columns based on the [Ordering Table Columns](database/ordering_table_columns.md) guidelines.
- Add foreign keys to any columns pointing to data in other tables, including [an index](migration_style_guide.md#adding-foreign-key-constraints).
- Add indexes for fields that are used in statements such as `WHERE`, `ORDER BY`, `GROUP BY`, and `JOIN`s.
- New tables and columns are not necessarily risky, but over time some access patterns are inherently
@@ -225,7 +226,7 @@ Include in the MR description:
- Consider [access patterns and data layout](database/layout_and_access_patterns.md) if new tables or columns are added.
- Review migrations follow [database migration style guide](migration_style_guide.md),
for example
- - [Check ordering of columns](ordering_table_columns.md)
+ - [Check ordering of columns](database/ordering_table_columns.md)
- [Check indexes are present for foreign keys](migration_style_guide.md#adding-foreign-key-constraints)
- Ensure that migrations execute in a transaction or only contain
concurrent index/foreign key helpers (with transactions disabled)
@@ -247,16 +248,16 @@ Include in the MR description:
- Making numerous SQL queries per record in a dataset.
- Review queries (for example, make sure batch sizes are fine)
- Because execution time can be longer than for a regular migration,
- it's suggested to treat background migrations as post migrations:
- place them in `db/post_migrate` instead of `db/migrate`. Keep in mind
- that post migrations are executed post-deployment in production.
+ it's suggested to treat background migrations as
+ [post migrations](migration_style_guide.md#choose-an-appropriate-migration-type):
+ place them in `db/post_migrate` instead of `db/migrate`.
- If a migration [has tracking enabled](database/background_migrations.md#background-jobs-tracking),
ensure `mark_all_as_succeeded` is called even if no work is done.
- Check [timing guidelines for migrations](migration_style_guide.md#how-long-a-migration-should-take)
- Check migrations are reversible and implement a `#down` method
- Check new table migrations:
- Are the stated access patterns and volume reasonable? Do the assumptions they're based on seem sound? Do these patterns pose risks to stability?
- - Are the columns [ordered to conserve space](ordering_table_columns.md)?
+ - Are the columns [ordered to conserve space](database/ordering_table_columns.md)?
- Are there foreign keys for references to other tables?
- Check data migrations:
- Establish a time estimate for execution on GitLab.com.
@@ -267,10 +268,10 @@ Include in the MR description:
- Check for any overly complex queries and queries the author specifically
points out for review (if any)
- If not present, ask the author to provide SQL queries and query plans
- (for example, by using [ChatOps](understanding_explain_plans.md#chatops) or direct
+ (for example, by using [ChatOps](database/understanding_explain_plans.md#chatops) or direct
database access)
- For given queries, review parameters regarding data distribution
- - [Check query plans](understanding_explain_plans.md) and suggest improvements
+ - [Check query plans](database/understanding_explain_plans.md) and suggest improvements
to queries (changing the query, schema or adding indexes and similar)
- - General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries)
+ - General guideline is for queries to come in below [100ms execution time](database/query_performance.md#timing-guidelines-for-queries)
- Avoid N+1 problems and minimize the [query count](merge_request_performance_guidelines.md#query-counts).
diff --git a/doc/development/db_dump.md b/doc/development/db_dump.md
index f2076cbc410..c632302329a 100644
--- a/doc/development/db_dump.md
+++ b/doc/development/db_dump.md
@@ -1,56 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/db_dump.md'
+remove_date: '2022-11-06'
---
-# Importing a database dump into a staging environment
+This document was moved to [another location](database/db_dump.md).
-Sometimes it is useful to import the database from a production environment
-into a staging environment for testing. The procedure below assumes you have
-SSH and `sudo` access to both the production environment and the staging VM.
-
-**Destroy your staging VM** when you are done with it. It is important to avoid
-data leaks.
-
-On the staging VM, add the following line to `/etc/gitlab/gitlab.rb` to speed up
-large database imports.
-
-```shell
-# On STAGING
-echo "postgresql['checkpoint_segments'] = 64" | sudo tee -a /etc/gitlab/gitlab.rb
-sudo touch /etc/gitlab/skip-auto-reconfigure
-sudo gitlab-ctl reconfigure
-sudo gitlab-ctl stop puma
-sudo gitlab-ctl stop sidekiq
-```
-
-Next, we let the production environment stream a compressed SQL dump to our
-local machine via SSH, and redirect this stream to a `psql` client on the staging
-VM.
-
-```shell
-# On LOCAL MACHINE
-ssh -C gitlab.example.com sudo -u gitlab-psql /opt/gitlab/embedded/bin/pg_dump -Cc gitlabhq_production |\
- ssh -C staging-vm sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -d template1
-```
-
-## Recreating directory structure
-
-If you need to re-create some directory structure on the staging server you can
-use this procedure.
-
-First, on the production server, create a list of directories you want to
-re-create.
-
-```shell
-# On PRODUCTION
-(umask 077; sudo find /var/opt/gitlab/git-data/repositories -maxdepth 1 -type d -print0 > directories.txt)
-```
-
-Copy `directories.txt` to the staging server and create the directories there.
-
-```shell
-# On STAGING
-sudo -u git xargs -0 mkdir -p < directories.txt
-```
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/deprecation_guidelines/index.md b/doc/development/deprecation_guidelines/index.md
index 4e1d2e22e78..f0364f60d38 100644
--- a/doc/development/deprecation_guidelines/index.md
+++ b/doc/development/deprecation_guidelines/index.md
@@ -4,10 +4,10 @@ group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
-# Deprecation guidelines
+# Deprecating GitLab features
-This page includes information about how and when to remove or make [breaking
-changes](../contributing/index.md#breaking-changes) to GitLab features.
+This page includes information about how and when to remove or make breaking changes
+to GitLab features.
## Terminology
@@ -37,6 +37,16 @@ changes](../contributing/index.md#breaking-changes) to GitLab features.
![Deprecation, End of Support, Removal process](img/deprecation_removal_process.png)
+**Breaking change**:
+
+A "breaking change" is any change that requires users to make a corresponding change to their code, settings, or workflow. "Users" might be humans, API clients, or even code classes that "use" another class. Examples of breaking changes include:
+
+- Removing a user-facing feature without a replacement/workaround.
+- Changing the definition of an existing API (by doing things like re-naming query parameters or changing routes).
+- Removing a public method from a code class.
+
+A breaking change can be considered major if it affects many users, or represents a significant change in behavior.
+
## When can a feature be deprecated?
Deprecations should be announced on the [Deprecated feature removal schedule](../../update/deprecations.md).
@@ -45,6 +55,12 @@ Do not include the deprecation announcement in the merge request that introduces
Use a separate MR to create a deprecation entry. For steps to create a deprecation entry, see
[Deprecations](https://about.gitlab.com/handbook/marketing/blog/release-posts/#deprecations).
+## How are community contributions to a deprecated feature handled?
+
+Development on deprecated features is restricted to Priority 1 / Severity 1 bug fixes. Any community contributions to deprecated features are unlikely to be prioritized during milestone planning.
+
+However, at GitLab, we [give agency](https://about.gitlab.com/handbook/values/#give-agency) to our team members. So, a member of the team associated with the contribution may decide to review and merge it at their discretion.
+
## When can a feature be removed/changed?
Generally, a feature or configuration can be removed or changed only in a major release.
diff --git a/doc/development/distributed_tracing.md b/doc/development/distributed_tracing.md
index 116071cdfd9..f49d024095d 100644
--- a/doc/development/distributed_tracing.md
+++ b/doc/development/distributed_tracing.md
@@ -73,14 +73,14 @@ In this example, we have the following hypothetical values:
- `driver`: the driver, such as Jaeger.
- `param_name`, `param_value`: these are driver specific configuration values. Configuration
- parameters for Jaeger are documented [further on in this
- document](#2-configure-the-gitlab_tracing-environment-variable) they should be URL encoded.
+   parameters for Jaeger are documented [further on in this document](#2-configure-the-gitlab_tracing-environment-variable),
+   and they should be URL encoded.
Multiple values should be separated by `&` characters like a URL.
## Using Jaeger in the GitLab Development Kit
-The first tracing implementation that GitLab supports is Jaeger, and the [GitLab Development
-Kit](https://gitlab.com/gitlab-org/gitlab-development-kit/) supports distributed tracing with
+The first tracing implementation that GitLab supports is Jaeger, and the
+[GitLab Development Kit](https://gitlab.com/gitlab-org/gitlab-development-kit/) supports distributed tracing with
Jaeger out-of-the-box.
The easiest way to access tracing from a GDK environment is through the
@@ -116,8 +116,8 @@ Jaeger has many configuration options, but is very easy to start in an "all-in-o
memory for trace storage (and is therefore non-persistent). The main advantage of "all-in-one" mode
is its ease of use.
-For more detailed configuration options, refer to the [Jaeger
-documentation](https://www.jaegertracing.io/docs/1.9/getting-started/).
+For more detailed configuration options, refer to the
+[Jaeger documentation](https://www.jaegertracing.io/docs/1.9/getting-started/).
#### Using Docker
@@ -201,8 +201,8 @@ If `GITLAB_TRACING` is not configured correctly, this issue is logged:
```
By default, GitLab ships with the Jaeger tracer, but other tracers can be included at compile time.
-Details of how this can be done are included in the [LabKit tracing
-documentation](https://pkg.go.dev/gitlab.com/gitlab-org/labkit/tracing).
+Details of how this can be done are included in the
+[LabKit tracing documentation](https://pkg.go.dev/gitlab.com/gitlab-org/labkit/tracing).
If no log messages about tracing are emitted, the `GITLAB_TRACING` environment variable is likely
not set.
diff --git a/doc/development/documentation/restful_api_styleguide.md b/doc/development/documentation/restful_api_styleguide.md
index 92c34c01e5d..bf1461a810d 100644
--- a/doc/development/documentation/restful_api_styleguide.md
+++ b/doc/development/documentation/restful_api_styleguide.md
@@ -5,7 +5,7 @@ group: unassigned
description: 'Writing styles, markup, formatting, and other standards for the GitLab RESTful APIs.'
---
-# RESTful API
+# Documenting REST API resources
REST API resources are documented in Markdown under
[`/doc/api`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/doc/api). Each
@@ -104,6 +104,9 @@ for the section. For example:
> `widget_message` [introduced](<link-to-issue>) in GitLab 14.3.
```
+If the API or attribute is deployed behind a feature flag,
+[include the feature flag information](feature_flags.md) in the version history.
+
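+For example, a hypothetical version history entry with flag information might look like this
+(the attribute name, issue link, and flag name are placeholders):
+
+```markdown
+> - `widget_message` [introduced](<link-to-issue>) in GitLab 14.3 [with a flag](feature_flags.md) named `widget_message_flag`. Disabled by default.
+```
+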
## Deprecations
To document the deprecation of an API endpoint, follow the steps to
diff --git a/doc/development/documentation/site_architecture/deployment_process.md b/doc/development/documentation/site_architecture/deployment_process.md
index 5203ca52922..5f6076f3195 100644
--- a/doc/development/documentation/site_architecture/deployment_process.md
+++ b/doc/development/documentation/site_architecture/deployment_process.md
@@ -142,6 +142,20 @@ graph LR
B--"Unpacked documentation uploaded"-->C
```
+### Manually deploy to production
+
+GitLab Docs is deployed to production whenever the `Build docs.gitlab.com every 4 hours` scheduled pipeline runs. By
+default, this pipeline runs every four hours.
+
+Maintainers can [manually](../../../ci/pipelines/schedules.md#run-manually) run this pipeline to force a deployment to
+production:
+
+1. Go to the [scheduled pipelines](https://gitlab.com/gitlab-org/gitlab-docs/-/pipeline_schedules) for `gitlab-docs`.
+1. Next to `Build docs.gitlab.com every 4 hours`, select **Play** (**{play}**).
+
+The updated documentation is available in production after the `pages` and `pages:deploy` jobs
+complete in the new pipeline.
+
## Docker files
The [`dockerfiles` directory](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/) contains all needed
@@ -150,10 +164,10 @@ Dockerfiles to build and deploy <https://docs.gitlab.com>. It is heavily inspire
| Dockerfile | Docker image | Description |
|:---------------------------------------------------------------------------------------------------------------------------|:------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [`Dockerfile.bootstrap`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/Dockerfile.bootstrap) | `gitlab-docs:bootstrap` | Contains all the dependencies that are needed to build the website. If the gems are updated and `Gemfile{,.lock}` changes, the image must be rebuilt. |
-| [`Dockerfile.builder.onbuild`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/Dockerfile.builder.onbuild) | `gitlab-docs:builder-onbuild` | Base image to build the docs website. It uses `ONBUILD` to perform all steps and depends on `gitlab-docs:bootstrap`. |
-| [`Dockerfile.nginx.onbuild`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/Dockerfile.nginx.onbuild) | `gitlab-docs:nginx-onbuild` | Base image to use for building documentation archives. It uses `ONBUILD` to perform all required steps to copy the archive, and relies upon its parent `Dockerfile.builder.onbuild` that is invoked when building single documentation archives (see the `Dockerfile` of each branch) |
-| [`Dockerfile.archives`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/Dockerfile.archives) | `gitlab-docs:archives` | Contains all the versions of the website in one archive. It copies all generated HTML files from every version in one location. |
+| [`bootstrap.Dockerfile`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/bootstrap.Dockerfile) | `gitlab-docs:bootstrap` | Contains all the dependencies that are needed to build the website. If the gems are updated and `Gemfile{,.lock}` changes, the image must be rebuilt. |
+| [`builder.onbuild.Dockerfile`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/builder.onbuild.Dockerfile) | `gitlab-docs:builder-onbuild` | Base image to build the docs website. It uses `ONBUILD` to perform all steps and depends on `gitlab-docs:bootstrap`. |
+| [`nginx.onbuild.Dockerfile`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/nginx.onbuild.Dockerfile)     | `gitlab-docs:nginx-onbuild`   | Base image to use for building documentation archives. It uses `ONBUILD` to perform all required steps to copy the archive, and relies upon its parent `builder.onbuild.Dockerfile` that is invoked when building single documentation archives (see the `Dockerfile` of each branch). |
+| [`archives.Dockerfile`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/dockerfiles/archives.Dockerfile) | `gitlab-docs:archives` | Contains all the versions of the website in one archive. It copies all generated HTML files from every version in one location. |
### How to build the images
diff --git a/doc/development/documentation/site_architecture/folder_structure.md b/doc/development/documentation/site_architecture/folder_structure.md
index 0e8065d794f..7f29d3fba9e 100644
--- a/doc/development/documentation/site_architecture/folder_structure.md
+++ b/doc/development/documentation/site_architecture/folder_structure.md
@@ -85,7 +85,7 @@ place for it.
Do not include the same information in multiple places.
[Link to a single source of truth instead.](../styleguide/index.md#link-instead-of-repeating-text)
-For example, if you have code in a repository other than the [primary repositories](index.md#architecture),
+For example, if you have code in a repository other than the [primary repositories](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/architecture.md),
and documentation in the same repository, you can keep the documentation in that repository.
Then you can either:
diff --git a/doc/development/documentation/site_architecture/global_nav.md b/doc/development/documentation/site_architecture/global_nav.md
index e1e0da03abc..05e697869b9 100644
--- a/doc/development/documentation/site_architecture/global_nav.md
+++ b/doc/development/documentation/site_architecture/global_nav.md
@@ -22,7 +22,7 @@ At the highest level, our global nav is workflow-based. Navigation needs to help
The levels under each of the higher workflow-based topics are the names of features.
For example:
-**Use GitLab** (_workflow_) **> Build your application** (_workflow_) **> CI/CD** (_feature_) **> Pipelines** (_feature)
+**Use GitLab** (_workflow_) **> Build your application** (_workflow_) **> CI/CD** (_feature_) **> Pipelines** (_feature_)
## Choose the right words for your navigation entry
@@ -39,20 +39,36 @@ as helpful as **Get started with runners**.
## Add a navigation entry
-All topics should be included in the left nav.
-
To add a topic to the global nav, edit
[`navigation.yaml`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/content/_data/navigation.yaml)
and add your item.
-All new pages need a navigation item. Without a navigation, the page becomes "orphaned." That
-is:
+Without a navigation entry:
+
+- The navigation closes when the page is opened, and the reader loses their place.
+- The page isn't visible in a group with other pages.
+
+### Pages you don't need to add
+
+Exclude these pages from the global nav:
+
+- Legal notices.
+- Pages in the `architecture/blueprints` directory.
+- Pages in the `user/application_security/dast/checks/` directory.
+
+The following pages should probably be in the global nav, but the technical writers
+do not actively work to add them:
+
+- Pages in the `/development` directory.
+- Pages authored by the support team, which are under the `doc/administration/troubleshooting` directory.
+
+Sometimes pages for deprecated features are not in the global nav, depending on how long ago the feature was deprecated.
-- The navigation shuts when the page is opened, and the reader loses their place.
-- The page doesn't belong in a group with other pages.
+All other pages should be in the global nav.
-This means the decision to create a new page is a decision to create new navigation item and vice
-versa.
+The technical writing team runs a report to determine which pages are not in the nav.
+For now, this report is manual, but [an issue exists](https://gitlab.com/gitlab-org/gitlab-docs/-/issues/1212)
+to automate it.
### Where to add
@@ -283,7 +299,7 @@ The [layout](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/layouts/global_
is fed by the [data file](#data-file), builds the global nav, and is rendered by the
[default](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/layouts/default.html) layout.
-The global nav contains links from all [four upstream projects](index.md#architecture).
+The global nav contains links from all [four upstream projects](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/architecture.md).
The [global nav URL](#urls) has a different prefix depending on the documentation file you change.
| Repository | Link prefix | Final URL |
diff --git a/doc/development/documentation/site_architecture/index.md b/doc/development/documentation/site_architecture/index.md
index af24fbe303b..2864bbe7404 100644
--- a/doc/development/documentation/site_architecture/index.md
+++ b/doc/development/documentation/site_architecture/index.md
@@ -11,247 +11,30 @@ the repository which is used to generate the GitLab documentation website and
is deployed to <https://docs.gitlab.com>. It uses the [Nanoc](https://nanoc.app/)
static site generator.
-## Architecture
+View the [`gitlab-docs` architecture page](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/architecture.md)
+for more information.
-While the source of the documentation content is stored in the repositories for
-each GitLab product, the source that is used to build the documentation
-site _from that content_ is located at <https://gitlab.com/gitlab-org/gitlab-docs>.
+## Documentation in other repositories
-The following diagram illustrates the relationship between the repositories
-from where content is sourced, the `gitlab-docs` project, and the published output.
-
-```mermaid
- graph LR
- A[gitlab-org/gitlab/doc]
- B[gitlab-org/gitlab-runner/docs]
- C[gitlab-org/omnibus-gitlab/doc]
- D[gitlab-org/charts/gitlab/doc]
- E[gitlab-org/cloud-native/gitlab-operator/doc]
- Y[gitlab-org/gitlab-docs]
- A --> Y
- B --> Y
- C --> Y
- D --> Y
- E --> Y
- Y -- Build pipeline --> Z
- Z[docs.gitlab.com]
- M[//ee/]
- N[//runner/]
- O[//omnibus/]
- P[//charts/]
- Q[//operator/]
- Z --> M
- Z --> N
- Z --> O
- Z --> P
- Z --> Q
-```
-
-GitLab docs content isn't kept in the `gitlab-docs` repository.
-All documentation files are hosted in the respective repository of each
-product, and all together are pulled to generate the docs website:
-
-- [GitLab](https://gitlab.com/gitlab-org/gitlab/-/tree/master/doc)
-- [Omnibus GitLab](https://gitlab.com/gitlab-org/omnibus-gitlab/-/tree/master/doc)
-- [GitLab Runner](https://gitlab.com/gitlab-org/gitlab-runner/-/tree/main/docs)
-- [GitLab Chart](https://gitlab.com/gitlab-org/charts/gitlab/-/tree/master/doc)
-- [GitLab Operator](https://gitlab.com/gitlab-org/cloud-native/gitlab-operator/-/tree/master/doc)
-
-Learn more about [the docs folder structure](folder_structure.md).
-
-### Documentation in other repositories
-
-If you have code and documentation in a repository other than the [primary repositories](#architecture),
+If you have code and documentation in a repository other than the [primary repositories](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/architecture.md),
you should keep the documentation with the code in that repository.
-Then you can either:
+Then you can use one of these approaches:
-- [Add the repository to the list of products](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/development.md#add-a-new-product)
- published at <https://docs.gitlab.com>.
-- [Add an entry in the global navigation](global_nav.md#add-a-navigation-entry) for
- <https://docs.gitlab.com> that links to the documentation in that repository.
+- Recommended. [Add the repository to the list of products](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/development.md#add-a-new-product)
+ published at <https://docs.gitlab.com>. The source of the documentation pages remains
+ in the external repository, but the resulting pages are indexed and searchable on <https://docs.gitlab.com>.
+- Recommended. [Add an entry in the global navigation](global_nav.md#add-a-navigation-entry) for
+ <https://docs.gitlab.com> that links directly to the documentation in that external repository.
+ The documentation pages are not indexed or searchable on <https://docs.gitlab.com>.
View [an example](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/fedb6378a3c92274ba3b6031df0d34455594e4cc/content/_data/navigation.yaml#L2944-L2946).
-
-## Assets
-
-To provide an optimized site structure, design, and a search-engine friendly
-website, along with a discoverable documentation, we use a few assets for
-the GitLab Documentation website.
-
-### External libraries
-
-GitLab Docs is built with a combination of external:
-
-- [JavaScript libraries](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/package.json).
-- [Ruby libraries](https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/Gemfile).
-
-### SEO
-
-- [Schema.org](https://schema.org/)
-- [Google Analytics](https://marketingplatform.google.com/about/analytics/)
-- [Google Tag Manager](https://developers.google.com/tag-platform/tag-manager)
-
-## Global navigation
-
-Read through [the global navigation documentation](global_nav.md) to understand:
-
-- How the global navigation is built.
-- How to add new navigation items.
-
-<!--
-## Helpers
-
-TBA
--->
-
-## Pipelines
-
-The pipeline in the `gitlab-docs` project:
-
-- Tests changes to the docs site code.
-- Builds the Docker images used in various pipeline jobs.
-- Builds and deploys the docs site itself.
-- Generates the review apps when the `review-docs-deploy` job is triggered.
-
-### Rebuild the docs site Docker images
-
-Once a week on Mondays, a scheduled pipeline runs and rebuilds the Docker images
-used in various pipeline jobs, like `docs-lint`. The Docker image configuration files are
-located in the [Dockerfiles directory](https://gitlab.com/gitlab-org/gitlab-docs/-/tree/main/dockerfiles).
-
-If you need to rebuild the Docker images immediately (must have maintainer level permissions):
-
-WARNING:
-If you change the Dockerfile configuration and rebuild the images, you can break the main
-pipeline in the main `gitlab` repository as well as in `gitlab-docs`. Create an image with
-a different name first and test it to ensure you do not break the pipelines.
-
-1. In [`gitlab-docs`](https://gitlab.com/gitlab-org/gitlab-docs), go to **{rocket}** **CI/CD > Pipelines**.
-1. Select **Run pipeline**.
-1. See that a new pipeline is running. The jobs that build the images are in the first
- stage, `build-images`. You can select the pipeline number to see the larger pipeline
- graph, or select the first (`build-images`) stage in the mini pipeline graph to
- expose the jobs that build the images.
-1. Select the **play** (**{play}**) button next to the images you want to rebuild.
- - Normally, you do not need to rebuild the `image:gitlab-docs-base` image, as it
- rarely changes. If it does need to be rebuilt, be sure to only run `image:docs-lint`
- after it is finished rebuilding.
-
-### Deploy the docs site
-
-Every four hours a scheduled pipeline builds and deploys the docs site. The pipeline
-fetches the current docs from the main project's main branch, builds it with Nanoc
-and deploys it to <https://docs.gitlab.com>.
-
-To build and deploy the site immediately (must have the Maintainer role):
-
-1. In [`gitlab-docs`](https://gitlab.com/gitlab-org/gitlab-docs), go to **{rocket}** **CI/CD > Schedules**.
-1. For the `Build docs.gitlab.com every 4 hours` scheduled pipeline, select the **play** (**{play}**) button.
-
-Read more about [documentation deployments](deployment_process.md).
-
-## Using YAML data files
-
-The easiest way to achieve something similar to
-[Jekyll's data files](https://jekyllrb.com/docs/datafiles/) in Nanoc is by
-using the [`@items`](https://nanoc.app/doc/reference/variables/#items-and-layouts)
-variable.
-
-The data file must be placed inside the `content/` directory and then it can
-be referenced in an ERB template.
-
-Suppose we have the `content/_data/versions.yaml` file with the content:
-
-```yaml
-versions:
- - 10.6
- - 10.5
- - 10.4
-```
-
-We can then loop over the `versions` array with something like:
-
-```erb
-<% @items['/_data/versions.yaml'][:versions].each do | version | %>
-
-<h3><%= version %></h3>
-
-<% end &>
-```
-
-Note that the data file must have the `yaml` extension (not `yml`) and that
-we reference the array with a symbol (`:versions`).
-
-## Archived documentation banner
-
-A banner is displayed on archived documentation pages with the text `This is archived documentation for
-GitLab. Go to the latest.` when either:
-
-- The version of the documentation displayed is not the first version entry in `online` in
- `content/_data/versions.yaml`.
-- The documentation was built from the default branch (`main`).
-
-For example, if the `online` entries for `content/_data/versions.yaml` are:
-
-```yaml
-online:
- - "14.4"
- - "14.3"
- - "14.2"
-```
-
-In this case, the archived documentation banner isn't displayed:
-
-- For 14.4, the docs built from the `14.4` branch. The branch name is the first entry in `online`.
-- For 14.5-pre, the docs built from the default project branch (`main`).
-
-The archived documentation banner is displayed:
-
-- For 14.3.
-- For 14.2.
-- For any other version.
-
-## Bumping versions of CSS and JavaScript
-
-Whenever the custom CSS and JavaScript files under `content/assets/` change,
-make sure to bump their version in the front matter. This method guarantees that
-your changes take effect by clearing the cache of previous files.
-
-Always use Nanoc's way of including those files, do not hardcode them in the
-layouts. For example use:
-
-```erb
-<script async type="application/javascript" src="<%= @items['/assets/javascripts/badges.*'].path %>"></script>
-
-<link rel="stylesheet" href="<%= @items['/assets/stylesheets/toc.*'].path %>">
-```
-
-The links pointing to the files should be similar to:
-
-```erb
-<%= @items['/path/to/assets/file.*'].path %>
-```
-
-Nanoc then builds and renders those links correctly according with what's
-defined in [`Rules`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/Rules).
-
-## Linking to source files
-
-A helper called [`edit_on_gitlab`](https://gitlab.com/gitlab-org/gitlab-docs/blob/main/lib/helpers/edit_on_gitlab.rb) can be used
-to link to a page's source file. We can link to both the simple editor and the
-web IDE. Here's how you can use it in a Nanoc layout:
-
-- Default editor: `<a href="<%= edit_on_gitlab(@item, editor: :simple) %>">Simple editor</a>`
-- Web IDE: `<a href="<%= edit_on_gitlab(@item, editor: :webide) %>">Web IDE</a>`
-
-If you don't specify `editor:`, the simple one is used by default.
-
-## Algolia search engine
-
-The docs site uses [Algolia DocSearch](https://community.algolia.com/docsearch/)
-for its search function.
-
-Learn more in <https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/doc/docsearch.md>.
+- Create a landing page for the product in the `gitlab` repository, and add the landing page
+ [to the global navigation](global_nav.md#add-a-navigation-entry), but keep the rest
+ of the documentation in the external repository. The landing page is indexed and
+ searchable on <https://docs.gitlab.com>, but the rest of the documentation is not.
+ For example, the [GitLab Workflow extension for VS Code](../../../user/project/repository/vscode.md).
+ We do not encourage the use of [pages with lists of links](../structure.md#topics-and-resources-pages),
+ so only use this option if the recommended options are not feasible.
## Monthly release process (versions)
@@ -260,5 +43,5 @@ For more information, read about the [monthly release process](https://gitlab.co
## Review Apps for documentation merge requests
-If you are contributing to GitLab docs read how to [create a Review App with each
-merge request](../index.md#previewing-the-changes-live).
+If you are contributing to GitLab docs, read how to
+[create a Review App with each merge request](../index.md#previewing-the-changes-live).
diff --git a/doc/development/documentation/structure.md b/doc/development/documentation/structure.md
index a02046d4466..a5d1290a17a 100644
--- a/doc/development/documentation/structure.md
+++ b/doc/development/documentation/structure.md
@@ -148,35 +148,60 @@ Avoid these heading titles:
## Troubleshooting
-Troubleshooting can be one of two categories:
+Troubleshooting topics should be the last topics on a page.
-- **Troubleshooting task.** This information is written the same way as a [standard task](#task).
+Troubleshooting can be one of three categories:
+
+- **An introductory topic.** This topic introduces the troubleshooting section of a page.
+ For example:
+
+ ```markdown
+ ## Troubleshooting
+
+ When working with <x feature>, you might encounter the following issues.
+ ```
+
+- **Troubleshooting task.** The title should be similar to a [standard task](#task).
For example, "Run debug tools" or "Verify syntax."
-- **Troubleshooting reference.** This information has a specific format.
-Troubleshooting reference information should be in this format:
+- **Troubleshooting reference.** This information includes the error message. For example:
-```markdown
-# Title (the error message or a description of it)
+ ```markdown
+ ### The error message or a description of it
-You might get an error that states <error message>.
+ You might get an error that states <error message>.
-This issue occurs when...
+ This issue occurs when...
-The workaround is...
-```
+ The workaround is...
+ ```
-If multiple causes or workarounds exist, consider putting them into a table format.
+ If multiple causes or workarounds exist, consider putting them into a table format.
+ If you use the exact error message, surround it in backticks so it's styled as code.
+
+If a page has more than five troubleshooting topics, put the content on a separate page that has troubleshooting information exclusively. Name the page `Troubleshooting <featurename>`.
### Troubleshooting headings
-For the heading:
+For the heading of a **Troubleshooting reference** topic:
- Consider including at least a partial error message.
- Use fewer than 70 characters.
If you do not put the full error in the title, include it in the body text.
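+
+For example, a **Troubleshooting reference** heading might look like this (the error text is a placeholder):
+
+```markdown
+### `Failed to connect to the internal GitLab API`
+```
+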
+### Related topics
+
+If inline links are not sufficient, you can create a topic called **Related topics**
+and include an unordered list of related topics. This topic should be above the Troubleshooting section.
+
+```markdown
+## Related topics
+
+- [Configure your pipeline](link-to-topic)
+- [Trigger a pipeline manually](link-to-topic)
+```
+
## General heading text guidelines
In general, for heading text:
@@ -272,18 +297,6 @@ If you need to add more than one task,
consider using subsections for each distinct task.
```
-### Related topics
-
-If inline links are not sufficient, you can create a topic called **Related topics**
-and include an unordered list of related topics. This topic should be above the Troubleshooting section.
-
-```markdown
-# Related topics
-
-- [Configure your pipeline](link-to-topic)
-- [Trigger a pipeline manually](link-to-topic)
-```
-
### Topics and resources pages
This page has a list of links that point to important sections
diff --git a/doc/development/documentation/styleguide/index.md b/doc/development/documentation/styleguide/index.md
index 1af0cb72055..709e6b2d0d9 100644
--- a/doc/development/documentation/styleguide/index.md
+++ b/doc/development/documentation/styleguide/index.md
@@ -150,6 +150,8 @@ the page is rendered to HTML. There can be only **one** level 1 heading per page
- For each subsection, increment the heading level. In other words, increment the number of `#` characters
in front of the heading.
+- Avoid headings greater than `H5` (`#####`). If you need more than five heading levels, move the topics to a new page instead.
+ Headings greater than `H5` do not display in the right sidebar navigation.
- Do not skip a level. For example: `##` > `####`.
- Leave one blank line before and after the heading.
@@ -162,6 +164,13 @@ Also, do not use links as part of heading text.
See also [heading guidelines for specific topic types](../structure.md).
+### Backticks in Markdown
+
+Use backticks for:
+
+- [Code blocks](#code-blocks).
+- Error messages.
+
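+For example, an error message in backticks might look like this (the message text is a placeholder):
+
+```markdown
+You might get an error that states `connection refused`.
+```
+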
### Markdown Rules
GitLab ensures that the Markdown used across all documentation is consistent, as
@@ -722,10 +731,12 @@ We include guidance for links in these categories:
- Use inline link Markdown markup `[Text](https://example.com)`.
It's easier to read, review, and maintain. Do not use `[Text][identifier]` reference-style links.
-
- Use meaningful anchor text.
For example, instead of writing something like `Read more about merge requests [here](LINK)`,
write `Read more about [merge requests](LINK)`.
+- Put the entire link on a single line. Some of our [linters](../testing.md) do not
+ validate links when split over multiple lines, and incorrect or broken links could
+ slip through.
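+
+  For example, instead of:
+
+  ```markdown
+  Read more about [merge
+  requests](LINK).
+  ```
+
+  write:
+
+  ```markdown
+  Read more about [merge requests](LINK).
+  ```
+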
### Links to internal documentation
@@ -787,45 +798,15 @@ section of GitLab.
### Links to external documentation
-When describing interactions with external software, it's often helpful to
-include links to external documentation. When possible, make sure that you're
-linking to an [**authoritative** source](#authoritative-sources). For example,
-if you're describing a feature in Microsoft's Active Directory, include a link
-to official Microsoft documentation.
-
-### Authoritative sources
-
-When citing external information, use sources that are written by the people who
-created the item or product in question. These sources are the most likely to be
-accurate and remain up to date.
-
-Examples of authoritative sources include:
-
-- Specifications, such as a [Request for Comments](https://www.ietf.org/standards/rfcs/)
- document from the Internet Engineering Task Force.
-- Official documentation for a product. For example, if you're setting up an
- interface with the Google OAuth 2 authorization server, include a link to
- Google's documentation.
-- Official documentation for a project. For example, if you're citing NodeJS
- functionality, refer directly to [NodeJS documentation](https://nodejs.org/en/docs/).
-- Books from an authoritative publisher.
+When possible, avoid links to external documentation, because these links:
-Examples of sources to avoid include:
+
+- [Lead to link rot](https://en.wikipedia.org/wiki/Link_rot).
+- [Create maintenance issues](https://gitlab.com/gitlab-org/gitlab/-/issues/368300).
-- Personal blog posts.
-- Wikipedia.
-- Non-trustworthy articles.
-- Discussions on forums such as Stack Overflow.
-- Documentation from a company that describes another company's product.
+Sometimes links are required. They might clarify troubleshooting steps or help prevent duplication of content.
+Sometimes they are more precise and more actively maintained.
-While many of these sources to avoid can help you learn skills and or features,
-they can become obsolete quickly. Nobody is obliged to maintain any of these
-sites. Therefore, we should avoid using them as reference literature.
-
-NOTE:
-Non-authoritative sources are acceptable only if there is no equivalent
-authoritative source. Even then, focus on non-authoritative sources that are
-extensively cited or peer-reviewed.
+For each external link you add, weigh the customer benefit against the maintenance burden.
### Links requiring permissions
@@ -950,6 +931,16 @@ For example:
1. Optional. Enter a description for the job.
```
+### Recommended steps
+
+If a step is recommended, start the step with the word `Recommended` followed by a period.
+
+For example:
+
+```markdown
+1. Recommended. Enter a description for the job.
+```
+
### Documenting multiple fields at once
If the UI text sufficiently explains the fields in a section, do not include a task step for every field.
@@ -1106,6 +1097,36 @@ include a visual representation to help readers understand it, you can:
an area of the screen.
- Create a short video of the interaction and link to it.
+## Emojis
+
+Don't use the Markdown emoji format, for example `:smile:`, for any purpose. Use
+[GitLab SVG icons](#gitlab-svg-icons) instead.
+
+Use of emoji in Markdown requires GitLab Flavored Markdown, which is not supported by Kramdown,
+the Markdown rendering engine used for GitLab documentation.
+
+## GitLab SVG icons
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab-docs/-/issues/384) in GitLab 12.7.
+
+You can use icons from the [GitLab SVG library](https://gitlab-org.gitlab.io/gitlab-svgs/)
+directly in the documentation. For example, `**{tanuki}**` renders as: **{tanuki}**.
+
+In most cases, you should avoid using the icons in text.
+However, you can use an icon when hover text is the only
+available way to describe a UI element. For example, **Delete** or **Edit** buttons
+often have hover text only.
+
+When you do use an icon, start with the hover text and follow it with the SVG reference in parentheses.
+
+- Avoid: `Select **{pencil}** **Edit**.` This generates as: Select **{pencil}** **Edit**.
+- Use instead: `Select **Edit** (**{pencil}**).` This generates as: Select **Edit** (**{pencil}**).
+
+Do not use words to describe the icon:
+
+- Avoid: `Select **Erase job log** (the trash icon).`
+- Use instead: `Select **Erase job log** (**{remove}**).` This generates as: Select **Erase job log** (**{remove}**).
+
## Videos
Adding GitLab YouTube video tutorials to the documentation is highly
@@ -1187,28 +1208,6 @@ different mobile devices.
`/help`, because the GitLab Markdown processor doesn't support iframes. It's
hidden on the documentation site, but is displayed by `/help`.
-## GitLab SVG icons
-
-> [Introduced](https://gitlab.com/gitlab-org/gitlab-docs/-/issues/384) in GitLab 12.7.
-
-You can use icons from the [GitLab SVG library](https://gitlab-org.gitlab.io/gitlab-svgs/)
-directly in the documentation. For example, `**{tanuki}**` renders as: **{tanuki}**.
-
-In most cases, you should avoid using the icons in text.
-However, you can use an icon when hover text is the only
-available way to describe a UI element. For example, **Delete** or **Edit** buttons
-often have hover text only.
-
-When you do use an icon, start with the hover text and follow it with the SVG reference in parentheses.
-
-- Avoid: `Select **{pencil}** **Edit**.` This generates as: Select **{pencil}** **Edit**.
-- Use instead: `Select **Edit** (**{pencil}**).` This generates as: Select **Edit** (**{pencil}**).
-
-Do not use words to describe the icon:
-
-- Avoid: `Select **Erase job log** (the trash icon).`
-- Use instead: `Select **Erase job log** (**{remove}**).` This generates as: Select **Erase job log** (**{remove}**).
-
## Alert boxes
Use alert boxes to call attention to information. Use them sparingly, and never have an alert box immediately follow another alert box.
diff --git a/doc/development/documentation/styleguide/word_list.md b/doc/development/documentation/styleguide/word_list.md
index c753c39b727..1976caefc8e 100644
--- a/doc/development/documentation/styleguide/word_list.md
+++ b/doc/development/documentation/styleguide/word_list.md
@@ -239,6 +239,13 @@ Use **CI/CD minutes** instead of **CI minutes**, **pipeline minutes**, **pipelin
Do not use **click**. Instead, use **select** with buttons, links, menu items, and lists.
**Select** applies to more devices, while **click** is more specific to a mouse.
+## cloud native
+
+When you're talking about using a Kubernetes cluster to host GitLab, you're talking about a **cloud-native version of GitLab**.
+This version is different from the larger, more monolithic **Omnibus package** that is used to deploy GitLab.
+
+You can also use **cloud-native GitLab** for short. It should be hyphenated and lowercase.
+
## collapse
Use **collapse** instead of **close** when you are talking about expanding or collapsing a section in the UI.
@@ -434,6 +441,17 @@ Do not make **GitLab** possessive (GitLab's). This guidance follows [GitLab Trad
**GitLab.com** refers to the GitLab instance managed by GitLab itself.
+## GitLab Helm chart, GitLab chart
+
+To deploy a cloud-native version of GitLab, use:
+
+- The GitLab Helm chart (long version)
+- The GitLab chart (short version)
+
+Do not use **the `gitlab` chart**, **the GitLab Chart**, or **the cloud-native chart**.
+
+You use the **GitLab Helm chart** to deploy **cloud-native GitLab** in a Kubernetes cluster.
+
## GitLab Flavored Markdown
When possible, spell out [**GitLab Flavored Markdown**](../../../user/markdown.md).
@@ -1127,7 +1145,7 @@ in present tense, active voice.
## you, your, yours
Use **you**, **your**, and **yours** instead of [**the user** and **the user's**](#user-users).
-Documentation should be from the [point of view](https://design.gitlab.com/content/voice-tone#point-of-view) of the reader.
+Documentation should be from the [point of view](https://design.gitlab.com/content/voice-tone/#point-of-view) of the reader.
Use:
diff --git a/doc/development/documentation/testing.md b/doc/development/documentation/testing.md
index d55cbe28d9b..428a57a11fb 100644
--- a/doc/development/documentation/testing.md
+++ b/doc/development/documentation/testing.md
@@ -81,6 +81,36 @@ This requires you to either:
### Documentation link tests
+Merge requests containing changes to Markdown (`.md`) files run a `docs-lint links`
+job, which runs two types of link checks. In both cases, links with destinations
+that begin with `http` or `https` are considered external links, and skipped:
+
+- `bundle exec nanoc check internal_links`: Tests links to internal pages.
+- `bundle exec nanoc check internal_anchors`: Tests links to subheadings (anchors) on internal pages.
+
+Failures from these tests are displayed at the end of the test results in the **Issues found!** area.
+For example, failures in the `internal_anchors` test follow this format:
+
+```plaintext
+[ ERROR ] internal_anchors - Broken anchor detected!
+ - source file `/tmp/gitlab-docs/public/ee/user/application_security/api_fuzzing/index.html`
+ - destination `/tmp/gitlab-docs/public/ee/development/code_review.html`
+ - link `../../../development/code_review.html#review-response-slo`
+ - anchor `#review-response-slo`
+```
+
+- **Source file**: The full path to the file containing the error. To find the
+ file in the `gitlab` repository, replace `/tmp/gitlab-docs/public/ee` with `doc`, and `.html` with `.md`.
+- **Destination**: The full path to the file not found by the test. To find the
+ file in the `gitlab` repository, replace `/tmp/gitlab-docs/public/ee` with `doc`, and `.html` with `.md`.
+- **Link**: The actual link the script attempted to find.
+- **Anchor**: If present, the subheading (anchor) the script attempted to find.
+
+Check for multiple instances of the same broken link on each page reporting an error.
+Even if a specific broken link appears multiple times on a page, the test reports it only once.
+
+#### Run documentation link tests locally
+
To execute documentation link tests locally:
1. Navigate to the [`gitlab-docs`](https://gitlab.com/gitlab-org/gitlab-docs) directory.
@@ -219,12 +249,12 @@ You can use markdownlint:
### Vale
-[Vale](https://docs.errata.ai/vale/about/) is a grammar, style, and word usage linter for the
+[Vale](https://vale.sh/) is a grammar, style, and word usage linter for the
English language. Vale's configuration is stored in the
[`.vale.ini`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.vale.ini) file located in the root
directory of projects.
-Vale supports creating [custom tests](https://docs.errata.ai/vale/styles) that extend any of
+Vale supports creating [custom tests](https://vale.sh/docs/topics/styles/) that extend any of
several types of checks, which we store in the `.linting/vale/styles/gitlab` directory in the
documentation directory of projects.
@@ -241,7 +271,7 @@ This configuration is also used in build pipelines, where
You can use Vale:
-- [On the command line](https://docs.errata.ai/vale/cli).
+- [On the command line](https://vale.sh/docs/vale-cli/structure/).
- [In a code editor](#configure-editors).
- [In a Git hook](#configure-pre-push-hooks). Vale only reports errors in the Git hook (the same
configuration as the CI/CD pipelines), and does not report suggestions or warnings.
@@ -250,7 +280,8 @@ You can use Vale:
Vale returns three types of results:
-- **Error** - For branding and trademark issues, and words or phrases with ambiguous meanings.
+- **Error** - For branding guidelines, trademark guidelines, and anything that causes content on
+ the docs site to render incorrectly.
- **Warning** - For Technical Writing team style preferences.
- **Suggestion** - For basic technical writing tenets and best practices.
@@ -304,7 +335,30 @@ For example, a page that scores `12` before a set of changes, and `9` after, ind
general complexity level of the page.
The readability score is calculated based on the number of words per sentence, and the number
-of syllables per word. For more information, see [the Vale documentation](https://docs.errata.ai/vale/styles#metric).
+of syllables per word. For more information, see [the Vale documentation](https://vale.sh/docs/topics/styles/#metric).
+
+#### When to add a new Vale rule
+
+It's tempting to add a Vale rule for every style guide rule. However, we should be
+mindful of the effort to create and enforce a Vale rule, and the noise it creates.
+
+In general, follow these guidelines:
+
+- If you add an [error-level Vale rule](#vale-result-types), you must fix
+ the existing occurrences of the issue in the documentation before you can add the rule.
+
+ If there are too many issues to fix in a single merge request, add the rule at a
+ `warning` level. Then, fix the existing issues in follow-up merge requests.
+ When the issues are fixed, promote the rule to an `error`.
+
+- If you add a warning-level or suggestion-level rule, consider:
+
+ - How many more warnings or suggestions it creates in the Vale output. If the
+ number of additional warnings is significant, the rule might be too broad.
+
+ - How often an author might ignore it because it's acceptable in the context.
+ If the rule is too subjective, it cannot be adequately enforced and creates
+ unnecessary additional warnings.
### Install linters
@@ -399,8 +453,6 @@ To configure Vale in your editor, install one of the following as appropriate:
In this setup the `markdownlint` checker is set as a "next" checker from the defined `vale` checker.
Enabling this custom Vale checker provides error linting from both Vale and markdownlint.
-We don't use [Vale Server](https://docs.errata.ai/vale-server/install).
-
### Configure pre-push hooks
Git [pre-push hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) allow Git users to:
@@ -479,7 +531,7 @@ document:
Whenever possible, exclude only the problematic rule and lines.
For more information, see
-[Vale's documentation](https://docs.errata.ai/vale/scoping#markup-based-configuration).
+[Vale's documentation](https://vale.sh/docs/topics/scoping/).
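+
+For example, a sketch of excluding one rule for a few lines (the rule name is illustrative):
+
+```markdown
+<!-- vale gitlab.FutureTense = NO -->
+
+This line is not checked against the `gitlab.FutureTense` rule.
+
+<!-- vale gitlab.FutureTense = YES -->
+```
+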
### Disable markdownlint tests
diff --git a/doc/development/documentation/versions.md b/doc/development/documentation/versions.md
index 067c37d30aa..3679c731a77 100644
--- a/doc/development/documentation/versions.md
+++ b/doc/development/documentation/versions.md
@@ -182,7 +182,7 @@ GitLab supports the current major version and two previous major versions.
For example, if 15.0 is the current major version, all major and minor releases of
GitLab 15.0, 14.0, and 13.0 are supported.
-[View the list of supported versions](https://about.gitlab.com/support/statement-of-support.html#version-support).
+[View the list of supported versions](https://about.gitlab.com/support/statement-of-support/#version-support).
If you see version history items or inline text that refers to unsupported versions, you can remove it.
@@ -198,8 +198,8 @@ We cannot guarantee future feature work, and promises
like these can raise legal issues. Instead, say that an issue exists.
For example:
-- Support for improvements is tracked `[in this issue](LINK)`.
-- You cannot do this thing, but `[an issue exists](LINK)` to change this behavior.
+- Support for improvements is proposed in issue `[issue-number](LINK-TO-ISSUE)`.
+- You cannot do this thing, but issue `[issue-number](LINK-TO-ISSUE)` proposes to change this behavior.
You can say that we plan to remove a feature.
diff --git a/doc/development/ee_features.md b/doc/development/ee_features.md
index 28cf6d4e1e3..777bc77875e 100644
--- a/doc/development/ee_features.md
+++ b/doc/development/ee_features.md
@@ -6,8 +6,10 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Guidelines for implementing Enterprise Edition features
-- **Write the code and the tests.**: As with any code, EE features should have
- good test coverage to prevent regressions.
+- **Place code in `ee/`**: Put all Enterprise Edition (EE) code inside the `ee/` top-level directory. The
+ rest of the code must be as close to the Community Edition (CE) files as possible.
+- **Write tests**: As with any code, EE features must have good test coverage to prevent
+ regressions. All `ee/` code must have corresponding tests in `ee/`.
- **Write documentation**: Add documentation to the `doc/` directory. Describe
the feature and include screenshots, if applicable. Indicate [what editions](documentation/styleguide/index.md#product-tier-badges)
the feature applies to.
@@ -16,54 +18,72 @@ info: To determine the technical writer assigned to the Stage/Group associated w
[EE features list](https://about.gitlab.com/features/).
<!-- markdownlint-enable MD044 -->
-## Act as SaaS
+## Implement a new EE feature
-When developing locally, there are times when you need your instance to act like the SaaS version of the product.
-In those instances, you can simulate SaaS by exporting an environment variable as seen below:
+If you're developing a GitLab Starter, GitLab Premium, or GitLab Ultimate licensed feature, use these steps to
+add your new feature or extend it.
-```shell
-export GITLAB_SIMULATE_SAAS=1
-```
+GitLab license features are added to [`ee/app/models/gitlab_subscriptions/features.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/gitlab_subscriptions/features.rb). To determine how
+to modify this file, first discuss how your feature fits into our licensing with your Product Manager.
-There are many ways to pass an environment variable to your local GitLab instance.
-For example, you can create a `env.runit` file in the root of your GDK with the above snippet.
+Use the following questions to guide you:
-## Act as CE when unlicensed
+1. Is this a new feature, or are you extending an existing licensed feature?
+ - If your feature already exists, you don't have to modify `features.rb`, but you
+ must locate the existing feature identifier to [guard it](#guard-your-ee-feature).
+ - If this is a new feature, decide on an identifier, such as `my_feature_name`, to add to the
+ `features.rb` file.
+1. Is this a **GitLab Starter**, **GitLab Premium**, or **GitLab Ultimate** feature?
+   - Based on the tier the feature belongs to, add the feature identifier to `STARTER_FEATURES`,
+ `PREMIUM_FEATURES`, or `ULTIMATE_FEATURES`.
+1. Will this feature be available globally (system-wide at the GitLab instance level)?
+ - Features such as [Geo](../administration/geo/index.md) and
+ [Database Load Balancing](../administration/postgresql/database_load_balancing.md) are used by the entire instance
+ and cannot be restricted to individual user namespaces. These features are defined in the instance license.
+ Add these features to `GLOBAL_FEATURES`.
-Since the implementation of
-[GitLab CE features to work with unlicensed EE instance](https://gitlab.com/gitlab-org/gitlab/-/issues/2500)
-GitLab Enterprise Edition should work like GitLab Community Edition
-when no license is active. So EE features always should be guarded by
-`project.feature_available?` or `group.licensed_feature_available?` (or
-`License.feature_available?` if it is a system-wide feature).
+### Guard your EE feature
-Frontend features should be guarded by pushing a flag from the backend by [using `push_licensed_feature`](licensed_feature_availability.md#restricting-frontend-features), and checked using `this.glFeatures.someFeature` in the frontend. For example:
+A licensed feature can only be available to licensed users. You must add a check or guard
+to determine if users have access to the feature.
-```html
-<script>
-import glFeatureFlagMixin from '~/vue_shared/mixins/gl_feature_flags_mixin';
+To guard your licensed feature:
-export default {
- mixins: [glFeatureFlagMixin()],
- components: {
- EEComponent: () => import('ee_component/components/test.vue'),
- },
- computed: {
- shouldRenderComponent() {
- return this.glFeatures.myEEFeature;
- }
- },
-};
-</script>
+1. Locate your feature identifier in `ee/app/models/gitlab_subscriptions/features.rb`.
+1. Use the following methods, where `my_feature_name` is your feature
+ identifier:
-<template>
- <div>
- <ee-component v-if="shouldRenderComponent"/>
- </div>
-</template>
-```
+ - In a project context:
+
+ ```ruby
+ my_project.licensed_feature_available?(:my_feature_name) # true if available for my_project
+ ```
+
+ - In a group or user namespace context:
+
+ ```ruby
+ my_group.licensed_feature_available?(:my_feature_name) # true if available for my_group
+ ```
-Look in `ee/app/models/license.rb` for the names of the licensed features.
+ - For a global (system-wide) feature:
+
+ ```ruby
+ License.feature_available?(:my_feature_name) # true if available in this instance
+ ```
+
+1. Optional. If your global feature is also available to namespaces with a paid plan, combine two
+   feature identifiers to allow both administrators and group users. For example:
+
+ ```ruby
+ License.feature_available?(:my_feature_name) || group.licensed_feature_available?(:my_feature_name_for_namespace) # Both admins and group members can see this EE feature
+ ```
+
+### Simulate a CE instance when unlicensed
+
+After the implementation of
+[GitLab CE features to work with unlicensed EE instance](https://gitlab.com/gitlab-org/gitlab/-/issues/2500),
+GitLab Enterprise Edition works like GitLab Community Edition
+when no license is active.
CE specs should remain untouched as much as possible and extra specs
should be added for EE. Licensed features can be stubbed using the
@@ -74,7 +94,7 @@ setting the [`FOSS_ONLY` environment variable](https://gitlab.com/gitlab-org/git
to something that evaluates as `true`. The same works for running tests
(for example `FOSS_ONLY=1 yarn jest`).
-### Running feature specs as CE
+#### Run feature specs as CE
When running [feature specs](testing_guide/best_practices.md#system--feature-tests)
as CE, you should ensure that the edition of backend and frontend match.
@@ -98,7 +118,28 @@ To do so:
bin/rspec spec/features/<path_to_your_spec>
```
-## CI pipelines in a FOSS context
+### Simulate a SaaS instance
+
+If you're developing locally and need your instance to act like the SaaS version of the product,
+you can simulate SaaS by exporting an environment variable:
+
+```shell
+export GITLAB_SIMULATE_SAAS=1
+```
+
+There are many ways to pass an environment variable to your local GitLab instance.
+For example, you can create an `env.runit` file in the root of your GDK with the above snippet.
+
+#### Allow use of licensed EE feature
+
+To enable plans per namespace, turn on the **Allow use of licensed EE features** option from the settings page.
+This makes licensed EE features available to projects only if the project namespace's plan includes the feature
+or if the project is public. To enable it:
+
+1. If you are developing locally, follow the steps in [Simulate a SaaS instance](#simulate-a-saas-instance) to make the option available.
+1. Go to **Admin > Settings > General > Account and limit**, and enable **Allow use of licensed EE features**.
+
+### Run CI pipelines in a FOSS context
By default, merge request pipelines for development run in an EE-context only. If you are
developing features that differ between FOSS and EE, you may wish to run pipelines in a
@@ -108,10 +149,7 @@ To run pipelines in both contexts, add the `~"pipeline:run-as-if-foss"` label to
See the [As-if-FOSS jobs](pipelines.md#as-if-foss-jobs) pipelines documentation for more information.
-## Separation of EE code
-
-All EE code should be put inside the `ee/` top-level directory. The
-rest of the code should be as close to the CE files as possible.
+## Separation of EE code in the backend
### EE-only features
@@ -144,7 +182,7 @@ To test an EE class that doesn't exist in CE, create the spec file as you normal
would in the `ee/spec` directory, but without the second `ee/` subdirectory.
For example, a class `ee/app/models/vulnerability.rb` would have its tests in `ee/spec/models/vulnerability_spec.rb`.
-### EE features based on CE features
+### Extend CE features with EE backend code
For features that build on existing CE features, write a module in the `EE`
namespace and inject it in the CE class, on the last line of the file that the
@@ -243,8 +281,8 @@ There are a few gotchas with it:
overriding the method, because we can't know when the overridden method
(that is, calling `super` in the overriding method) would want to stop early.
In this case, we shouldn't just override it, but update the original method
- to make it call the other method we want to extend, like a [template method
- pattern](https://en.wikipedia.org/wiki/Template_method_pattern).
+ to make it call the other method we want to extend, like a
+ [template method pattern](https://en.wikipedia.org/wiki/Template_method_pattern).
For example, given this base:
```ruby
@@ -633,7 +671,7 @@ might need different strategies to extend it. To apply different strategies
easily, we would use `extend ActiveSupport::Concern` in the EE module.
Put the EE module files following
-[EE features based on CE features](#ee-features-based-on-ce-features).
+[Extend CE features with EE backend code](#extend-ce-features-with-ee-backend-code).
#### EE API routes
@@ -1009,9 +1047,9 @@ FactoryBot.define do
end
```
-## JavaScript code in `assets/javascripts/`
+## Separation of EE code in the frontend
-To separate EE-specific JS-files we should also move the files into an `ee` folder.
+To separate EE-specific JS files, move the files into an `ee` folder.
For example, there can be an
`app/assets/javascripts/protected_branches/protected_branches_bundle.js` and an
@@ -1032,40 +1070,123 @@ import bundle from 'ee/protected_branches/protected_branches_bundle.js';
import bundle from 'ee_else_ce/protected_branches/protected_branches_bundle.js';
```
-See the frontend guide [performance section](fe_guide/performance.md) for
-information on managing page-specific JavaScript within EE.
+### Add new EE-only features in the frontend
+
+If the feature being developed is not present in CE, add your entry point in
+`ee/`. For example:
+
+```shell
+# Add HTML element to mount
+ee/app/views/admin/geo/designs/index.html.haml
+
+# Init the application
+ee/app/assets/javascripts/pages/ee_only_feature/index.js
+
+# Mount the feature
+ee/app/assets/javascripts/ee_only_feature/index.js
+```
+
+Feature guarding with `licensed_feature_available?` and `License.feature_available?` typically
+occurs in the controller, as described in the [backend guide](#ee-only-features).
+
+#### Test EE-only features
+
+Add your EE tests to `ee/spec/frontend/` following the same directory structure you use for CE.
+
+### Extend CE features with EE frontend code
+
+Use the [`push_licensed_feature`](#guard-your-ee-feature) to guard frontend features that extend
+existing views:
+
+```ruby
+# ee/app/controllers/ee/admin/my_controller.rb
+before_action do
+ push_licensed_feature(:my_feature_name) # for global features
+end
+```
+
+```ruby
+# ee/app/controllers/ee/group/my_controller.rb
+before_action do
+ push_licensed_feature(:my_feature_name, @group) # for group pages
+end
+```
+
+```ruby
+# ee/app/controllers/ee/project/my_controller.rb
+before_action do
+ push_licensed_feature(:my_feature_name, @group) # for group pages
+ push_licensed_feature(:my_feature_name, @project) # for project pages
+end
+```
+
+Verify your feature appears in `gon.licensed_features` in the browser console.
-## Vue code in `assets/javascript`
+#### Extend Vue applications with EE Vue components
-### script tag
+EE licensed features that enhance existing functionality in the UI add new
+elements or interactions to your Vue application as components.
-#### Child Component only used in EE
+To separate Vue template differences, use a child EE component.
+You must import the EE component [asynchronously](https://vuejs.org/v2/guide/components-dynamic-async.html#Async-Components).
-To separate Vue template differences we should [import the components asynchronously](https://vuejs.org/v2/guide/components-dynamic-async.html#Async-Components).
+This allows GitLab to load the correct component in EE, while in CE GitLab loads an empty component
+that renders nothing. This code **must** exist in the CE repository, in addition to the EE repository.
-Doing this allows for us to load the correct component in EE while in CE
-we can load a empty component that renders nothing. This code **should**
-exist in the CE repository as well as the EE repository.
+A CE component acts as the entry point to your EE feature. To add an EE component,
+place it in the `ee/` directory and import it with `import('ee_component/...')`:
```html
<script>
+// app/assets/javascripts/feature/components/form.vue
+
+import glFeatureFlagMixin from '~/vue_shared/mixins/gl_feature_flags_mixin';
+
export default {
+ mixins: [glFeatureFlagMixin()],
components: {
- EEComponent: () => import('ee_component/components/test.vue'),
+ // Import an EE component from CE
+ MyEeComponent: () => import('ee_component/components/my_ee_component.vue'),
},
};
</script>
<template>
<div>
- <ee-component />
+ <!-- ... -->
+ <my-ee-component />
+ <!-- ... -->
</div>
</template>
```
-#### For JS code that is EE only, like props, computed properties, methods, etc
+Guard the Vue components by checking `glFeatures`, so that they render only when
+the license is present:
-- Please do not use mixins unless ABSOLUTELY NECESSARY. Please try to find an alternative pattern.
+```html
+<script>
+// ee/app/assets/javascripts/feature/components/special_component.vue
+
+import glFeatureFlagMixin from '~/vue_shared/mixins/gl_feature_flags_mixin';
+
+export default {
+ mixins: [glFeatureFlagMixin()],
+ computed: {
+ shouldRenderComponent() {
+ // Comes from gon.licensed_features as a camel-case version of `my_feature_name`
+ return this.glFeatures.myFeatureName;
+ }
+ },
+};
+</script>
+
+<template>
+ <div v-if="shouldRenderComponent">
+ <!-- EE licensed feature UI -->
+ </div>
+</template>
+```
+
+NOTE:
+Do not use mixins unless ABSOLUTELY NECESSARY. Try to find an alternative pattern.
##### Recommended alternative approach (named/scoped slots)
@@ -1138,11 +1259,65 @@ export default {
**For EE components that need different results for the same computed values, we can pass in props to the CE wrapper as seen in the example.**
- **EE Child components**
- - Since we are using the asynchronous loading to check which component to load, we'd still use the component's name, check [this example](#child-component-only-used-in-ee).
+ - Since we are using the asynchronous loading to check which component to load, we'd still use the component's name, check [this example](#extend-vue-applications-with-ee-vue-components).
- **EE extra HTML**
- For the templates that have extra HTML in EE we should move it into a new component and use the `ee_else_ce` dynamic import
+#### Extend other JS code
+
+To extend JS files, complete the following steps:
+
+1. Use the `ee_else_ce` helper; EE-only code must be inside the `ee/` folder.
+ 1. Create an EE file that contains only the EE code, and extend the CE counterpart.
+ 1. For code inside functions that can't be extended, move the code to a new file and use the `ee_else_ce` helper:
+
+```javascript
+ import eeCode from 'ee_else_ce/ee_code';
+
+ function test() {
+ const test = 'a';
+
+ eeCode();
+
+ return test;
+ }
+```
+
+In some cases, you might need to extend other logic in your application. To extend your JS
+modules, create an EE version of the file and extend it with your custom logic:
+
+```javascript
+// app/assets/javascripts/feature/utils.js
+
+export const myFunction = () => {
+ // ...
+};
+
+// ... other CE functions ...
+```
+
+```javascript
+// ee/app/assets/javascripts/feature/utils.js
+import {
+ myFunction as ceMyFunction,
+} from '~/feature/utils';
+
+/* eslint-disable import/export */
+
+// Export same utils as CE
+export * from '~/feature/utils';
+
+// Only override `myFunction`
+export const myFunction = () => {
+ const result = ceMyFunction();
+ // add EE feature logic
+ return result;
+};
+
+/* eslint-enable import/export */
+```
+
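+Callers can then import the module through the `ee_else_ce` alias, so EE
+installations pick up the overridden function automatically:
+
+```javascript
+import { myFunction } from 'ee_else_ce/feature/utils';
+```
+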
#### Testing modules using EE/CE aliases
When writing Frontend tests, if the module under test imports other modules with `ee_else_ce/...` and these modules are also needed by the relevant test, then the relevant test **must** import these modules with `ee_else_ce/...`. This avoids unexpected EE or FOSS failures, and helps ensure the EE behaves like CE when it is unlicensed.
@@ -1185,29 +1360,7 @@ describe('ComponentUnderTest', () => {
```
-### Non Vue Files
-
-For regular JS files, the approach is similar.
-
-1. We keep using the [`ee_else_ce`](../development/ee_features.md#javascript-code-in-assetsjavascripts) helper, this means that EE only code should be inside the `ee/` folder.
- 1. An EE file should be created with the EE only code, and it should extend the CE counterpart.
- 1. For code inside functions that can't be extended, the code should be moved into a new file and we should use `ee_else_ce` helper:
-
-#### Example
-
-```javascript
- import eeCode from 'ee_else_ce/ee_code';
-
- function test() {
- const test = 'a';
-
- eeCode();
-
- return test;
- }
-```
-
-## SCSS code in `assets/stylesheets`
+#### SCSS code in `assets/stylesheets`
If a component you're adding styles for is limited to EE, it is better to have a
separate SCSS file in an appropriate directory within `app/assets/stylesheets`.
@@ -1218,9 +1371,8 @@ styles are usually kept in a stylesheet that is common for both CE and EE, and i
to isolate such ruleset from rest of CE rules (along with adding comment describing the same)
to avoid conflicts during CE to EE merge.
-### Bad
-
```scss
+// Bad
.section-body {
.section-title {
background: $gl-header-color;
@@ -1234,9 +1386,8 @@ to avoid conflicts during CE to EE merge.
}
```
-### Good
-
```scss
+// Good
.section-body {
.section-title {
background: $gl-header-color;
@@ -1252,7 +1403,7 @@ to avoid conflicts during CE to EE merge.
// EE-specific end
```
-## GitLab-svgs
+### GitLab-svgs
Conflicts in `app/assets/images/icons.json` or `app/assets/images/icons.svg` can
be resolved simply by regenerating those assets with
diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md
index d32ceb43ce9..47942817790 100644
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -38,7 +38,7 @@ Additionally, if you need large repositories or multiple forks for testing, plea
The Elasticsearch integration depends on an external indexer. We ship an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a Rake task but, after this is done, GitLab itself will trigger reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [`/ee/app/models/concerns/elastic/application_versioned_search.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/elastic/application_versioned_search.rb).
-After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/-/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/topics/data-types#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
+After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/-/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/docs/manual/data-types/#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
Search queries are generated by the concerns found in [`ee/app/models/concerns/elastic`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control, and have been a historic source of security bugs so please pay close attention to them!
@@ -277,8 +277,8 @@ These Advanced Search migrations, like any other GitLab changes, need to support
Depending on the order of deployment, it's possible that the migration
has started or finished and there's still a server running the application code from before the
-migration. We need to take this into consideration until we can [ensure all Advanced Search migrations
-start after the deployment has finished](https://gitlab.com/gitlab-org/gitlab/-/issues/321619).
+migration. We need to take this into consideration until we can
+[ensure all Advanced Search migrations start after the deployment has finished](https://gitlab.com/gitlab-org/gitlab/-/issues/321619).
### Reverting a migration
@@ -317,9 +317,8 @@ safely can.
We choose to use GitLab major version upgrades as a safe time to remove
backwards compatibility for indices that have not been fully migrated. We
-[document this in our upgrade
-documentation](../update/index.md#upgrading-to-a-new-major-version). We also
-choose to replace the migration code with the halted migration
+[document this in our upgrade documentation](../update/index.md#upgrading-to-a-new-major-version).
+We also choose to replace the migration code with the halted migration
and remove tests so that:
- We don't need to maintain any code that is called from our Advanced Search
@@ -381,7 +380,7 @@ the volume of updates.
All of the indexing happens in Sidekiq, so much of the relevant logs for the
Elasticsearch integration can be found in
-[`sidekiq.log`](../administration/logs.md#sidekiqlog). In particular, all
+[`sidekiq.log`](../administration/logs/index.md#sidekiqlog). In particular, all
Sidekiq workers that make requests to Elasticsearch in any way will log the
number of requests and time taken querying/writing to Elasticsearch. This can
be useful to understand whether or not your cluster is keeping up with
@@ -390,26 +389,25 @@ indexing.
Searching Elasticsearch is done via ordinary web workers handling requests. Any
requests to load a page or make an API request, which then make requests to
Elasticsearch, will log the number of requests and the time taken to
-[`production_json.log`](../administration/logs.md#production_jsonlog). These
+[`production_json.log`](../administration/logs/index.md#production_jsonlog). These
logs will also include the time spent on Database and Gitaly requests, which
may help to diagnose which part of the search is performing poorly.
There are additional logs specific to Elasticsearch that are sent to
-[`elasticsearch.log`](../administration/logs.md#elasticsearchlog)
+[`elasticsearch.log`](../administration/logs/index.md#elasticsearchlog)
that may contain information to help diagnose performance issues.
### Performance Bar
-Elasticsearch requests will be displayed in the [`Performance
-Bar`](../administration/monitoring/performance/performance_bar.md), which can
+Elasticsearch requests will be displayed in the
+[`Performance Bar`](../administration/monitoring/performance/performance_bar.md), which can
be used both locally in development and on any deployed GitLab instance to
diagnose poor search performance. This will show the exact queries being made,
which is useful to diagnose why a search might be slow.
### Correlation ID and `X-Opaque-Id`
-Our [correlation
-ID](distributed_tracing.md#developer-guidelines-for-working-with-correlation-ids)
+Our [correlation ID](distributed_tracing.md#developer-guidelines-for-working-with-correlation-ids)
is forwarded by all requests from Rails to Elasticsearch as the
[`X-Opaque-Id`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html#_identifying_running_tasks)
header which allows us to track any
@@ -477,13 +475,13 @@ documented here in case it is useful for others. The relevant logs that could
theoretically be used to figure out what needs to be replayed are:
1. All non-repository updates that were synced can be found in
- [`elasticsearch.log`](../administration/logs.md#elasticsearchlog) by
+ [`elasticsearch.log`](../administration/logs/index.md#elasticsearchlog) by
searching for
[`track_items`](https://gitlab.com/gitlab-org/gitlab/-/blob/1e60ea99bd8110a97d8fc481e2f41cab14e63d31/ee/app/services/elastic/process_bookkeeping_service.rb#L25)
and these can be replayed by sending these items again through
`::Elastic::ProcessBookkeepingService.track!`
1. All repository updates that occurred can be found in
- [`elasticsearch.log`](../administration/logs.md#elasticsearchlog) by
+ [`elasticsearch.log`](../administration/logs/index.md#elasticsearchlog) by
searching for
[`indexing_commit_range`](https://gitlab.com/gitlab-org/gitlab/-/blob/6f9d75dd3898536b9ec2fb206e0bd677ab59bd6d/ee/lib/gitlab/elastic/indexer.rb#L41).
Replaying these requires resetting the
@@ -492,13 +490,13 @@ theoretically be used to figure out what needs to be replayed are:
the project using
[`ElasticCommitIndexerWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic_commit_indexer_worker.rb)
1. All project deletes that occurred can be found in
- [`sidekiq.log`](../administration/logs.md#sidekiqlog) by searching for
+ [`sidekiq.log`](../administration/logs/index.md#sidekiqlog) by searching for
[`ElasticDeleteProjectWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic_delete_project_worker.rb).
These updates can be replayed by triggering another
`ElasticDeleteProjectWorker`.
-With the above methods and taking regular [Elasticsearch
-snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html)
+With the above methods and taking regular
+[Elasticsearch snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html)
we should be able to recover from different kinds of data loss issues in a
relatively short period of time compared to indexing everything from
scratch.
diff --git a/doc/development/emails.md b/doc/development/emails.md
index a5c2789a3ea..1b3c9226dd8 100644
--- a/doc/development/emails.md
+++ b/doc/development/emails.md
@@ -60,7 +60,7 @@ See the [Rails guides](https://guides.rubyonrails.org/action_mailer_basics.html#
# The email address including the %{key} placeholder that will be replaced to reference the
# item being replied to. This %{key} should be included in its entirety within the email
# address and not replaced by another value.
- # For example: emailadress+%{key}@gmail.com.
+ # For example: emailaddress+%{key}@gmail.com.
# The placeholder must appear in the "user" part of the address (before the `@`). It can be omitted but some features,
# including Service Desk, may not work properly.
address: "gitlab-incoming+%{key}@gmail.com"
@@ -160,9 +160,10 @@ and Helm Chart configuration (see [example merge request](https://gitlab.com/git
#### Rationale
This was done because to avoid [thread deadlocks](https://github.com/ruby/net-imap/issues/14), `MailRoom` needs
-an updated version of the `net-imap` gem. However, this [version of the net-imap cannot be installed by an unprivileged
-user](https://github.com/ruby/net-imap/issues/14) due to [an error installing the digest
-gem](https://github.com/ruby/digest/issues/14). [This bug in the Ruby interpreter](https://bugs.ruby-lang.org/issues/17761) was fixed in Ruby
+an updated version of the `net-imap` gem. However, this
+[version of the net-imap cannot be installed by an unprivileged user](https://github.com/ruby/net-imap/issues/14) due to
+[an error installing the digest gem](https://github.com/ruby/digest/issues/14).
+[This bug in the Ruby interpreter](https://bugs.ruby-lang.org/issues/17761) was fixed in Ruby
3.0.2.
Updating the gem directly in the GitLab Rails `Gemfile` caused a [production incident](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4053)
diff --git a/doc/development/event_store.md b/doc/development/event_store.md
index ffde51216cf..37035083e23 100644
--- a/doc/development/event_store.md
+++ b/doc/development/event_store.md
@@ -223,6 +223,15 @@ Gitlab::EventStore.publish(
)
```
+Events should be dispatched from the relevant Service class whenever possible. Some
+exceptions exist where we may allow models to publish events, like in state machine transitions.
+For example, instead of scheduling `Ci::BuildFinishedWorker`, which runs a collection of side effects,
+we could publish a `Ci::BuildFinishedEvent` and let other domains react asynchronously.
+
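+A minimal sketch of a service publishing `Ci::BuildFinishedEvent`; the service
+class shown is hypothetical:
+
+```ruby
+class Ci::FinishBuildService
+  def execute(build)
+    # ... domain logic that finishes the build ...
+
+    # Publish the event instead of scheduling a side-effect worker directly
+    Gitlab::EventStore.publish(
+      Ci::BuildFinishedEvent.new(data: { build_id: build.id })
+    )
+  end
+end
+```
+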
+`ActiveRecord` callbacks are too low-level to represent a domain event; they track database
+record changes rather than domain actions. There might be cases where publishing from a
+callback makes sense, but we should treat those as exceptions.
+
## Create a subscriber
A subscriber is a Sidekiq worker that includes the `Gitlab::EventStore::Subscriber` module.
@@ -320,7 +329,7 @@ it 'publishes a ProjectCreatedEvent with project id and namespace id' do
# The project ID will only be generated when the `create_project`
# is called in the expect block.
expected_data = { project_id: kind_of(Numeric), namespace_id: group_id }
-
+
expect { create_project(user, name: 'Project', path: 'project', namespace_id: group_id) }
.to publish_event(Projects::ProjectCreatedEvent)
.with(expected_data)
diff --git a/doc/development/fe_guide/accessibility.md b/doc/development/fe_guide/accessibility.md
index 2a1083d031f..bdd6c5d6e84 100644
--- a/doc/development/fe_guide/accessibility.md
+++ b/doc/development/fe_guide/accessibility.md
@@ -13,7 +13,7 @@ This page contains guidelines we should follow.
## Quick summary
-Since [no ARIA is better than bad ARIA](https://www.w3.org/TR/wai-aria-practices/#no_aria_better_bad_aria),
+Since [no ARIA is better than bad ARIA](https://w3c.github.io/aria-practices/#no_aria_better_bad_aria),
review the following recommendations before using `aria-*`, `role`, and `tabindex`.
Use semantic HTML, which has accessibility semantics baked in, and ideally test with
[relevant combinations of screen readers and browsers](https://www.accessibility-developer-guide.com/knowledge/screen-readers/relevant-combinations/).
diff --git a/doc/development/fe_guide/architecture.md b/doc/development/fe_guide/architecture.md
index afaf6df8f8a..1d08296eafc 100644
--- a/doc/development/fe_guide/architecture.md
+++ b/doc/development/fe_guide/architecture.md
@@ -11,7 +11,7 @@ When developing a feature that requires architectural design, or changing the fu
A Frontend Architect is an expert who makes high-level Frontend design decisions
and decides on technical standards, including coding standards and frameworks.
-Architectural decisions should be accessible to everyone, so please document
+Architectural decisions should be accessible to everyone, so document
them in the relevant Merge Request discussion or by updating our documentation
when appropriate.
@@ -19,7 +19,7 @@ You can find the Frontend Architecture experts on the [team page](https://about.
## Widget Architecture
-The [Plan stage](https://about.gitlab.com/handbook/engineering/development/dev/fe-plan/)
+The [Plan stage](https://about.gitlab.com/handbook/engineering/development/dev/plan-project-management/)
is refactoring the right sidebar to consist of **widgets**. They have a specific architecture to be
reusable and to expose an interface that can be used by external Vue applications on the page.
Learn more about the [widget architecture](widgets.md).
diff --git a/doc/development/fe_guide/content_editor.md b/doc/development/fe_guide/content_editor.md
index d4c29cb8a24..f262e48b6da 100644
--- a/doc/development/fe_guide/content_editor.md
+++ b/doc/development/fe_guide/content_editor.md
@@ -296,6 +296,7 @@ const builtInContentEditorExtensions = [
Dropcursor,
Emoji,
// Other extensions
+]
```
### The Markdown serializer
diff --git a/doc/development/fe_guide/design_anti_patterns.md b/doc/development/fe_guide/design_anti_patterns.md
index b7238bb2813..580f488bd33 100644
--- a/doc/development/fe_guide/design_anti_patterns.md
+++ b/doc/development/fe_guide/design_anti_patterns.md
@@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Anti-patterns may seem like good approaches at first, but it has been shown that they bring more ills than benefits. These should
generally be avoided.
-Throughout the GitLab codebase, there may be historic uses of these anti-patterns. Please [use discretion](https://about.gitlab.com/handbook/engineering/development/principles/#balance-refactoring-and-velocity)
+Throughout the GitLab codebase, there may be historic uses of these anti-patterns. [Use discretion](https://about.gitlab.com/handbook/engineering/development/principles/#balance-refactoring-and-velocity)
when figuring out whether or not to refactor, when touching code that uses one of these legacy patterns.
NOTE:
@@ -62,7 +62,7 @@ could be appropriate:
- When a responsibility is truly global and should be referenced across the application
(for example, an application-wide Event Bus).
-Even in these scenarios, please consider avoiding the Shared Global Object pattern because the
+Even in these scenarios, consider avoiding the Shared Global Object pattern because the
side-effects can be notoriously difficult to reason with.
### References
@@ -140,7 +140,7 @@ that a Singleton could be appropriate in the following rare cases:
- We need to manage some resource that **MUST** have just 1 instance (that is, some hardware restriction).
- There is a real [cross-cutting concern](https://en.wikipedia.org/wiki/Cross-cutting_concern) (for example, logging) and a Singleton provides the simplest API.
-Even in these scenarios, please consider avoiding the Singleton pattern.
+Even in these scenarios, consider avoiding the Singleton pattern.
### What alternatives are there to the Singleton pattern?
diff --git a/doc/development/fe_guide/development_process.md b/doc/development/fe_guide/development_process.md
index b4893fd4ef9..3273263de3b 100644
--- a/doc/development/fe_guide/development_process.md
+++ b/doc/development/fe_guide/development_process.md
@@ -16,7 +16,7 @@ Copy the content over to your issue or merge request and if something doesn't ap
This checklist is intended to help us during development of bigger features/refactorings. It is not a "use it always and every point always matches" list.
-Please use your best judgment when to use it and please contribute new points through merge requests if something comes to your mind.
+Use your best judgment when to use it and contribute new points through merge requests if something comes to your mind.
```markdown
### Frontend development
@@ -39,7 +39,7 @@ Please use your best judgment when to use it and please contribute new points th
- [ ] **Cookie Mode** Think about hiding the feature behind a cookie flag if the implementation is on top of existing features
- [ ] **New route** Are you refactoring something big then you might consider adding a new route where you implement the new feature and when finished delete the current route and rename the new one. (for example 'merge_request' and 'new_merge_request')
- [ ] **Setup** Is there any specific setup needed for your implementation (for example a kubernetes cluster)? Then let everyone know if it is not already mentioned where they can find documentation (if it doesn't exist - create it)
-- [ ] **Security** Are there any new security relevant implementations? Then please contact the security team for an app security review. If you are not sure ask our [domain expert](https://about.gitlab.com/handbook/engineering/frontend/#frontend-domain-experts)
+- [ ] **Security** Are there any new security relevant implementations? Then contact the security team for an app security review. If you are not sure ask our [domain expert](https://about.gitlab.com/handbook/engineering/frontend/#frontend-domain-experts)
#### During development
@@ -90,7 +90,7 @@ code that is unused:
### Merge Request Review
-With the purpose of being [respectful of others' time](https://about.gitlab.com/handbook/values/#be-respectful-of-others-time) please follow these guidelines when asking for a review:
+With the purpose of being [respectful of others' time](https://about.gitlab.com/handbook/values/#be-respectful-of-others-time), follow these guidelines when asking for a review:
- Make sure your Merge Request:
- milestone is set
@@ -101,7 +101,7 @@ With the purpose of being [respectful of others' time](https://about.gitlab.com/
- includes tests
- includes a changelog entry (when necessary)
- Before assigning to a maintainer, assign to a reviewer.
-- If you assigned a merge request or pinged someone directly, be patient because we work in different timezones and asynchronously. Unless the merge request is urgent (like fixing a broken default branch), please don't DM or reassign the merge request before waiting for a 24-hour window.
+- If you assigned a merge request or pinged someone directly, be patient because we work in different timezones and asynchronously. Unless the merge request is urgent (like fixing a broken default branch), don't DM or reassign the merge request before waiting for a 24-hour window.
- If you have a question regarding your merge request/issue, make it on the merge request/issue. When we DM each other, we no longer have a SSOT and [no one else is able to contribute](https://about.gitlab.com/handbook/values/#public-by-default).
- When you have a big **Draft** merge request with many changes, you're advised to get the review started before adding/removing significant code. Make sure it is assigned well before the release cut-off, as the reviewers/maintainers would always prioritize reviewing finished MRs before the **Draft** ones.
- Make sure to remove the `Draft:` title before the last round of review.
diff --git a/doc/development/fe_guide/frontend_faq.md b/doc/development/fe_guide/frontend_faq.md
index 39c39894dac..6a645416c0a 100644
--- a/doc/development/fe_guide/frontend_faq.md
+++ b/doc/development/fe_guide/frontend_faq.md
@@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Rules of Frontend FAQ
1. **You talk about Frontend FAQ.**
- Please share links to it whenever applicable, so more eyes catch when content
+ Share links to it whenever applicable, so more eyes catch when content
gets outdated.
1. **Keep it short and simple.**
Whenever an answer needs more than two sentences it does not belong here.
@@ -17,7 +17,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Linking to relevant source code, issue / epic, or other documentation helps
to understand the answer.
1. **If you see something, do something.**
- Please remove or update any content that is outdated as soon as you see it.
+ Remove or update any content that is outdated as soon as you see it.
## FAQ
@@ -101,7 +101,7 @@ axios.get(joinPaths(gon.gitlab_url, '-', 'foo'))
axios.get(joinPaths(gon.relative_url_root, '-', 'foo'))
```
-Also, please try not to hardcode paths in the Frontend, but instead receive them from the Backend (see next section).
+Also, try not to hardcode paths in the Frontend, but instead receive them from the Backend (see next section).
When referencing Backend rails paths, avoid using `*_url`, and use `*_path` instead.
Example:
diff --git a/doc/development/fe_guide/graphql.md b/doc/development/fe_guide/graphql.md
index 10db332d64c..442dda20d23 100644
--- a/doc/development/fe_guide/graphql.md
+++ b/doc/development/fe_guide/graphql.md
@@ -14,7 +14,7 @@ info: "See the Technical Writers assigned to Development Guidelines: https://abo
**General resources**:
- [📚 Official Introduction to GraphQL](https://graphql.org/learn/)
-- [📚 Official Introduction to Apollo](https://www.apollographql.com/docs/tutorial/introduction/)
+- [📚 Official Introduction to Apollo](https://www.apollographql.com/tutorials/fullstack-quickstart/introduction)
**GraphQL at GitLab**:
@@ -109,7 +109,7 @@ Default client accepts two parameters: `resolvers` and `config`.
If you are making multiple queries to the same Apollo client object you might encounter the following error: `Cache data may be lost when replacing the someProperty field of a Query object. To address this problem, either ensure all objects of SomeEntityhave an id or a custom merge function`. We are already checking `ID` presence for every GraphQL type that has an `ID`, so this shouldn't be the case. Most likely, the `SomeEntity` type doesn't have an `ID` property, and to fix this warning we need to define a custom merge function.
-We have some client-wide types with `merge: true` defined in the default client as [typePolicies](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/lib/graphql.js) (this means that Apollo will merge existing and incoming responses in the case of subsequent queries). Please consider adding `SomeEntity` there or defining a custom merge function for it.
+We have some client-wide types with `merge: true` defined in the default client as [typePolicies](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/lib/graphql.js) (this means that Apollo will merge existing and incoming responses in the case of subsequent queries). Consider adding `SomeEntity` there or defining a custom merge function for it.
## GraphQL Queries
@@ -212,7 +212,7 @@ with a **new and updated** object.
To facilitate the process of updating the cache and returning the new object we
use the library [Immer](https://immerjs.github.io/immer/).
-Please, follow these conventions:
+Follow these conventions:
- The updated cache is named `data`.
- The original cache data is named `sourceData`.
@@ -597,7 +597,7 @@ export default {
Note that, even if the directive evaluates to `false`, the guarded entity is sent to the backend and
matched against the GraphQL schema. So this approach requires that the feature-flagged entity
exists in the schema, even if the feature flag is disabled. When the feature flag is turned off, it
-is recommended that the resolver returns `null` at the very least using the same feature flag as the frontend. See the [API GraphQL guide](../api_graphql_styleguide.md#frontend-and-backend-feature-flag-strategies).
+is recommended that the resolver returns `null` at the very least using the same feature flag as the frontend. See the [API GraphQL guide](../api_graphql_styleguide.md#feature-flags).
##### Different versions of a query
@@ -729,8 +729,9 @@ In this case, we can either:
- Skip passing a cursor.
- Pass `null` explicitly to `after`.
-After data is fetched, we can use the `update`-hook as an opportunity [to customize
-the data that is set in the Vue component property](https://apollo.vuejs.org/api/smart-query.html#options). This allows us to get a hold of the `pageInfo` object among other data.
+After data is fetched, we can use the `update`-hook as an opportunity
+[to customize the data that is set in the Vue component property](https://apollo.vuejs.org/api/smart-query.html#options).
+This allows us to get a hold of the `pageInfo` object among other data.
In the `result`-hook, we can inspect the `pageInfo` object to see if we need to fetch
the next page. Note that we also keep a `requestCount` to ensure that the application
@@ -895,6 +896,51 @@ export default new VueApollo({
This is similar to the `DesignCollection` example above as new page results are appended to the
previous ones.
+In some cases, it's hard to define the correct `keyArgs` for a field because all
+the fields are updated. Here, we can set `keyArgs` to `false`. This instructs
+Apollo Client to skip any automatic merge and rely fully on the logic we
+put into the `merge` function.
+
+For example, suppose we have a query like this:
+
+```javascript
+query searchGroupsWhereUserCanTransfer {
+ currentUser {
+ id
+ groups {
+ nodes {
+ id
+ fullName
+ }
+ pageInfo {
+ ...PageInfo
+ }
+ }
+ }
+}
+```
+
+Here, the `groups` field doesn't have a good candidate for `keyArgs`: both
+`nodes` and `pageInfo` will be updated when we're fetching a second page.
+Setting `keyArgs` to `false` makes the update work as intended:
+
+```javascript
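+// concatPagination is exported from '@apollo/client/utilities'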
+typePolicies: {
+ UserCore: {
+ fields: {
+ groups: {
+ keyArgs: false,
+ },
+ },
+ },
+ GroupConnection: {
+ fields: {
+ nodes: concatPagination(),
+ },
+ },
+}
+```
+
#### Using a recursive query in components
When it is necessary to fetch all paginated data initially an Apollo query can do the trick for us.
@@ -1444,7 +1490,7 @@ describe('Some component', () => {
When mocking resolved values, ensure the structure of the response is the same
as the actual API response. For example, root property should be `data`.
-When testing queries, please keep in mind they are promises, so they need to be _resolved_ to render a result. Without resolving, we can check the `loading` state of the query:
+When testing queries, keep in mind they are promises, so they need to be _resolved_ to render a result. Without resolving, we can check the `loading` state of the query:
```javascript
it('renders a loading state', () => {
@@ -2001,11 +2047,15 @@ relative to `app/graphql/queries` folder: for example, if we need a
### Mocked client returns empty objects instead of mock response
-If your unit test is failing because response contains empty objects instead of mock data, you would need to add `__typename` field to the mocked response. This happens because mocked client (unlike the real one) does not populate the response with typenames and in some cases we need to do it manually so the client is able to recognize a GraphQL type.
+If your unit test is failing because the response contains empty objects instead of mock data, add
+a `__typename` field to the mocked responses.
+
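+For example, a sketch of a mocked response with `__typename` included (field
+names are illustrative):
+
+```javascript
+const mockResponse = {
+  data: {
+    project: {
+      // Without `__typename`, the mocked client may return an empty object here
+      __typename: 'Project',
+      id: 'gid://gitlab/Project/1',
+    },
+  },
+};
+```
+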
+Alternatively, [GraphQL query fixtures](../testing_guide/frontend_testing.md#graphql-query-fixtures)
+automatically add the `__typename` field for you upon generation.
### Warning about losing cache data
-Sometimes you can see a warning in the console: `Cache data may be lost when replacing the someProperty field of a Query object. To address this problem, either ensure all objects of SomeEntityhave an id or a custom merge function`. Please check section about [multiple queries](#multiple-client-queries-for-the-same-object) to resolve an issue.
+Sometimes you can see a warning in the console: `Cache data may be lost when replacing the someProperty field of a Query object. To address this problem, either ensure all objects of SomeEntityhave an id or a custom merge function`. Check the section about [multiple queries](#multiple-client-queries-for-the-same-object) to resolve this issue.
```yaml
- current_route_path = request.fullpath.match(/-\/tree\/[^\/]+\/(.+$)/).to_a[1]
diff --git a/doc/development/fe_guide/icons.md b/doc/development/fe_guide/icons.md
index d107af156db..73f196ef51f 100644
--- a/doc/development/fe_guide/icons.md
+++ b/doc/development/fe_guide/icons.md
@@ -81,7 +81,7 @@ export default {
### Usage in HTML/JS
-Please use the following function inside JS to render an icon:
+Use the following function inside JS to render an icon:
`gl.utils.spriteIcon(iconName)`
## Loading icon
diff --git a/doc/development/fe_guide/index.md b/doc/development/fe_guide/index.md
index 544985d7edc..02086ec5f1b 100644
--- a/doc/development/fe_guide/index.md
+++ b/doc/development/fe_guide/index.md
@@ -147,7 +147,7 @@ Best practices for [client-side logging](logging.md) for GitLab frontend develop
## [Internationalization (i18n) and Translations](../i18n/externalization.md)
-Frontend internationalization support is described in [this document](../i18n/).
+Frontend internationalization support is described in [this document](../i18n/index.md).
The [externalization part of the guide](../i18n/externalization.md) explains the helpers/methods available.
## [Troubleshooting](troubleshooting.md)
diff --git a/doc/development/fe_guide/merge_request_widget_extensions.md b/doc/development/fe_guide/merge_request_widget_extensions.md
new file mode 100644
index 00000000000..a2ff10cc57f
--- /dev/null
+++ b/doc/development/fe_guide/merge_request_widget_extensions.md
@@ -0,0 +1,437 @@
+---
+stage: Create
+group: Code Review
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Merge request widget extensions **(FREE)**
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44616) in GitLab 13.6.
+
+Extensions enable you to add new features to the merge request widget
+that match the design framework.
+With extensions, we get many benefits out of the box with little effort required, such as:
+
+- A consistent look and feel.
+- Tracking when the extension is opened.
+- Virtual scrolling for performance.
+
+## Usage
+
+To use extensions, you must first create a new extension object to fetch the
+data to render in the extension. For a working example, refer to the example file in
+`app/assets/javascripts/vue_merge_request_widget/extensions/issues.js`.
+
+The basic object structure:
+
+```javascript
+export default {
+ name: '', // Required: This helps identify the widget
+ props: [], // Required: Props passed from the widget state
+ i18n: { // Required: Object to hold i18n text
+ label: '', // Required: Used for tooltips and aria-labels
+ loading: '', // Required: Loading text for when data is loading
+ },
+ expandEvent: '', // Optional: RedisHLL event name to track expanding content
+ enablePolling: false, // Optional: Tells extension to poll for data
+ modalComponent: null, // Optional: The component to use for the modal
+ telemetry: true, // Optional: Reports basic telemetry for the extension. Set to false to disable telemetry
+ computed: {
+ summary(data) {}, // Required: Level 1 summary text
+ statusIcon(data) {}, // Required: Level 1 status icon
+ tertiaryButtons() {}, // Optional: Level 1 action buttons
+ shouldCollapse() {}, // Optional: Add logic to determine if the widget can expand or not
+ },
+ methods: {
+ fetchCollapsedData(props) {}, // Required: Fetches data required for collapsed state
+ fetchFullData(props) {}, // Required: Fetches data for the full expanded content
+ fetchMultiData() {}, // Optional: Works in conjunction with `enablePolling` and allows polling multiple endpoints
+ },
+};
+```
+
+Because every extension follows the same data structure, they can all be registered in the
+same way, while each extension manages its own data sources.
+
+After creating this structure, you must register it. You can register the extension at any
+point _after_ the widget has been created. To register an extension:
+
+```javascript
+// Import the register method
+import { registerExtension } from '~/vue_merge_request_widget/components/extensions';
+
+// Import the new extension
+import issueExtension from '~/vue_merge_request_widget/extensions/issues';
+
+// Register the imported extension
+registerExtension(issueExtension);
+```
+
+## Data fetching
+
+Each extension must fetch data. Fetching is handled when registering the extension,
+not by the core component itself. This approach allows various
+data-fetching methods to be used, such as GraphQL or REST API calls.
+
+### API calls
+
+For performance reasons, it is best if the collapsed state fetches only the data required to
+render the collapsed state. This fetching happens in the `fetchCollapsedData` method.
+This method is called with the props as an argument, so you can easily access
+any paths set in the state.
+
+To allow the extension to set the data, this method **must** return the data. No
+special formatting is required. When the extension receives this data,
+it is set to `collapsedData`. You can access `collapsedData` in any computed property or
+method.
+
+When the user clicks **Expand**, the `fetchFullData` method is called. This method
+also gets called with the props as an argument. This method **must** also return
+the full data. However, this data must be correctly formatted to match the format
+mentioned in the data structure section.
+
+#### Technical debt
+
+For some of the current extensions, there is no split in data fetching. All the data
+is fetched through the `fetchCollapsedData` method. While less performant,
+it allows for faster iteration.
+
+To handle this, `fetchFullData` returns the data already set through
+the `fetchCollapsedData` method call. In these cases, `fetchFullData` must
+return a promise:
+
+```javascript
+fetchCollapsedData() {
+ return ['Some data'];
+},
+fetchFullData() {
+ return Promise.resolve(this.collapsedData)
+},
+```
+
+### Data structure
+
+The data returned from `fetchFullData` must match the format below. This format
+allows the core component to render the data in a way that matches
+the design framework. Any text properties can use the styling placeholders
+mentioned below:
+
+```javascript
+{
+ id: data.id, // Required: ID used as a key for each row
+ header: 'Header' || ['Header', 'sub-header'], // Required: String or array can be used for the header text
+ text: '', // Required: Main text for the row
+ subtext: '', // Optional: Smaller sub-text to be displayed below the main text
+ icon: { // Optional: Icon object
+ name: EXTENSION_ICONS.success, // Required: The icon name for the row
+ },
+ badge: { // Optional: Badge displayed after text
+ text: '', // Required: Text to be displayed inside badge
+ variant: '', // Optional: GitLab UI badge variant, defaults to info
+ },
+ link: { // Optional: Link to a URL displayed after text
+ text: '', // Required: Text of the link
+ href: '', // Optional: URL for the link
+ },
+ modal: { // Optional: Link to open a modal displayed after text
+ text: '', // Required: Text of the link
+ onClick: () => {} // Optional: Function to run when link is clicked, i.e. to set this.modalData
+  },
+ actions: [], // Optional: Action button for row
+ children: [], // Optional: Child content to render, structure matches the same structure
+}
+```
+
+### Polling
+
+To enable polling for an extension, an options flag must be present in the extension:
+
+```javascript
+export default {
+ //...
+ enablePolling: true
+};
+```
+
+This flag tells the base component to poll the `fetchCollapsedData()` method
+defined in the extension. Polling stops if the response has data, or if an error is present.
+
+When writing the logic for `fetchCollapsedData()`, a complete Axios response must be returned
+from the method. The polling utility needs data like polling headers to work correctly:
+
+```javascript
+export default {
+ //...
+  enablePolling: true,
+ methods: {
+ fetchCollapsedData() {
+ return axios.get(this.reportPath)
+ },
+ },
+};
+```
+
+Most of the time the data returned from the extension's endpoint is not in the format
+the UI needs. We must format the data before setting the collapsed data in the base component.
+
+If the computed property `summary` can rely on `collapsedData`, you can format the data
+when `fetchFullData` is invoked:
+
+```javascript
+export default {
+ //...
+  enablePolling: true,
+ methods: {
+ fetchCollapsedData() {
+ return axios.get(this.reportPath)
+ },
+ fetchFullData() {
+ return Promise.resolve(this.prepareReports());
+ },
+ // custom method
+ prepareReports() {
+ // unpack values from collapsedData
+      const { new_errors: newErrors, existing_errors: existingErrors, resolved_errors: resolvedErrors } = this.collapsedData;
+
+ // perform data formatting
+
+ return [...newErrors, ...existingErrors, ...resolvedErrors]
+ }
+ },
+};
+```
+
+If the extension relies on `collapsedData` being formatted before invoking `fetchFullData()`,
+then `fetchCollapsedData()` must return the Axios response as well as the formatted data:
+
+```javascript
+export default {
+ //...
+  enablePolling: true,
+ methods: {
+ fetchCollapsedData() {
+ return axios.get(this.reportPath).then(res => {
+ const formattedData = this.prepareReports(res.data)
+
+ return {
+ ...res,
+ data: formattedData,
+ }
+ })
+ },
+ // Custom method
+ prepareReports() {
+ // Unpack values from collapsedData
+      const { new_errors: newErrors, existing_errors: existingErrors, resolved_errors: resolvedErrors } = this.collapsedData;
+
+ // Perform data formatting
+
+ return [...newErrors, ...existingErrors, ...resolvedErrors]
+ }
+ },
+};
+```
+
+If the extension must poll multiple endpoints at the same time, then `fetchMultiData`
+can be used to return an array of functions. A new `poll` object is created for each
+endpoint and they are polled separately. After all endpoints are resolved, polling is
+stopped and `setCollapsedData` is called with an array of `response.data`.
+
+```javascript
+export default {
+ //...
+  enablePolling: true,
+ methods: {
+ fetchMultiData() {
+ return [
+ () => axios.get(this.reportPath1),
+ () => axios.get(this.reportPath2),
+        () => axios.get(this.reportPath3),
+      ];
+    },
+ },
+};
+```
+
+WARNING:
+The function must return a `Promise` that resolves the `response` object.
+The implementation relies on the `POLL-INTERVAL` header to keep polling, therefore it is
+important not to alter the status code and headers.
+
+### Errors
+
+If `fetchCollapsedData()` or `fetchFullData()` methods throw an error:
+
+- The loading state of the extension is updated to `LOADING_STATES.collapsedError`
+ and `LOADING_STATES.expandedError` respectively.
+- The extension's header displays an error icon and updates the text to be either:
+ - The text defined in `$options.i18n.error`.
+ - "Failed to load" if `$options.i18n.error` is not defined.
+- The error is sent to Sentry to log that it occurred.
+
+To customize the error text, add it to the `i18n` object in your extension:
+
+```javascript
+export default {
+ //...
+ i18n: {
+ //...
+ error: __('Your error text'),
+ },
+};
+```
+
+## Telemetry
+
+The base implementation of the widget extension framework includes some telemetry events.
+Each widget reports:
+
+- `view`: When it is rendered to the screen.
+- `expand`: When it is expanded.
+- `full_report_clicked`: When an (optional) input is clicked to view the full report.
+- Outcome (`expand_success`, `expand_warning`, or `expand_failed`): One of three
+ additional events relating to the status of the widget when it was expanded.
+
+### Add new widgets
+
+When adding new widgets, the above events must be marked as `known` and have metrics
+created to be reportable.
+
+NOTE:
+Events that are only for EE should include `--ee` at the end of both shell commands below.
+
+To generate these known events for a single widget:
+
+1. Widgets should be named `Widget${CamelName}`.
+ - For example: a widget for **Test Reports** should be `WidgetTestReports`.
+1. Compute the widget name slug by converting the `${CamelName}` to lowercase snake case.
+ - The previous example would be `test_reports`.
+1. Add the new widget name slug to `lib/gitlab/usage_data_counters/merge_request_widget_extension_counter.rb`
+ in the `WIDGETS` list.
+1. Ensure the GDK is running (`gdk start`).
+1. Generate known events on the command line with the following command.
+ Replace `test_reports` with your appropriate name slug:
+
+ ```shell
+ bundle exec rails generate gitlab:usage_metric_definition \
+ counts.i_code_review_merge_request_widget_test_reports_count_view \
+ counts.i_code_review_merge_request_widget_test_reports_count_full_report_clicked \
+ counts.i_code_review_merge_request_widget_test_reports_count_expand \
+ counts.i_code_review_merge_request_widget_test_reports_count_expand_success \
+ counts.i_code_review_merge_request_widget_test_reports_count_expand_warning \
+ counts.i_code_review_merge_request_widget_test_reports_count_expand_failed \
+ --dir=all
+ ```
+
+1. Modify each newly generated file to match the existing files for the merge request widget extension telemetry.
+ - Find existing examples by doing a glob search, like: `metrics/**/*_i_code_review_merge_request_widget_*`
+ - Roughly speaking, each file should have these values:
+ 1. `description` = A plain English description of this value. Review existing widget extension telemetry files for examples.
+ 1. `product_section` = `dev`
+ 1. `product_stage` = `create`
+ 1. `product_group` = `code_review`
+ 1. `product_category` = `code_review`
+ 1. `introduced_by_url` = `'[your MR]'`
+ 1. `options.events` = (the event in the command from above that generated this file, like `i_code_review_merge_request_widget_test_reports_count_view`)
+ - This value is how the telemetry events are linked to "metrics" so this is probably one of the more important values.
+ 1. `data_source` = `redis`
+ 1. `data_category` = `optional`
+1. Generate known HLL events on the command line with the following command.
+ Replace `test_reports` with your appropriate name slug.
+
+ ```shell
+ bundle exec rails generate gitlab:usage_metric_definition:redis_hll code_review \
+ i_code_review_merge_request_widget_test_reports_view \
+ i_code_review_merge_request_widget_test_reports_full_report_clicked \
+ i_code_review_merge_request_widget_test_reports_expand \
+ i_code_review_merge_request_widget_test_reports_expand_success \
+ i_code_review_merge_request_widget_test_reports_expand_warning \
+ i_code_review_merge_request_widget_test_reports_expand_failed \
+ --class_name=RedisHLLMetric
+ ```
+
+1. Repeat step 6, but change the `data_source` to `redis_hll`.
+1. Add each of the HLL metrics to `lib/gitlab/usage_data_counters/known_events/code_review_events.yml`:
+ 1. `name` = (the event)
+ 1. `redis_slot` = `code_review`
+ 1. `category` = `code_review`
+ 1. `aggregation` = `weekly`
+1. Add each event to the appropriate aggregates in `config/metrics/aggregates/code_review.yml`
+
+### Add new events
+
+If you are adding a new event to our known events, include the new event in the
+`KNOWN_EVENTS` list in `lib/gitlab/usage_data_counters/merge_request_widget_extension_counter.rb`.
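+
+A sketch of that addition; the existing list contents shown here are
+illustrative, not verbatim:
+
+```ruby
+# lib/gitlab/usage_data_counters/merge_request_widget_extension_counter.rb
+KNOWN_EVENTS = %w[view expand full_report_clicked my_new_event].freeze
+```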
+
+## Icons
+
+Level 1 and all subsequent levels can have their own status icons. To keep with
+the design framework, import the `EXTENSION_ICONS` constant
+from the `constants.js` file:
+
+```javascript
+import { EXTENSION_ICONS } from '~/vue_merge_request_widget/constants.js';
+```
+
+This constant makes the following icons available for use. Per the design framework,
+only some of these icons should be used on level 1:
+
+- `failed`
+- `warning`
+- `success`
+- `neutral`
+- `error`
+- `notice`
+- `severityCritical`
+- `severityHigh`
+- `severityMedium`
+- `severityLow`
+- `severityInfo`
+- `severityUnknown`
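+
+For example, a sketch of picking one of these icons in the required `statusIcon`
+computed property (the data shape is hypothetical):
+
+```javascript
+computed: {
+  // A failed icon when the fetched data reports errors (hypothetical data shape)
+  statusIcon(data) {
+    return data.errors.length > 0 ? EXTENSION_ICONS.failed : EXTENSION_ICONS.success;
+  },
+},
+```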
+
+## Text styling
+
+Any area that has text can be styled with the placeholders below. This
+follows the same technique as `sprintf`. However, instead of specifying
+placeholders through `sprintf`, the extension applies them automatically.
+
+Every placeholder contains starting and ending tags. For example, `success` uses
+`Hello %{success_start}world%{success_end}`. The extension then
+adds the start and end tags with the correct styling classes.
+
+| Placeholder | Style |
+|-------------|-----------------------------------------|
+| success | `gl-font-weight-bold gl-text-green-500` |
+| danger | `gl-font-weight-bold gl-text-red-500` |
+| critical | `gl-font-weight-bold gl-text-red-800` |
+| same | `gl-font-weight-bold gl-text-gray-700` |
+| strong | `gl-font-weight-bold` |
+| small | `gl-font-sm` |
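+
+For example, a row `text` property using these placeholders (a sketch; the
+extension injects the styled tags automatically):
+
+```javascript
+{
+  id: 1,
+  header: 'Scan summary',
+  text: 'Found %{danger_start}2 new issues%{danger_end} and %{success_start}1 fixed issue%{success_end}',
+}
+```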
+
+## Action buttons
+
+You can add action buttons at levels 1 and 2 of each extension. These buttons
+provide links or actions for each row:
+
+- Action buttons for level 1 can be set through the `tertiaryButtons` computed property.
+ This property should return an array of objects for each action button.
+- Action buttons for level 2 can be set by adding the `actions` key to the level 2 rows object.
+ The value for this key must also be an array of objects for each action button.
+
+Links must follow this structure:
+
+```javascript
+{
+ text: 'Click me',
+ href: this.someLinkHref,
+ target: '_blank', // Optional
+}
+```
+
+For internal action buttons, follow this structure:
+
+```javascript
+{
+ text: 'Click me',
+ onClick() {}
+}
+```
diff --git a/doc/development/fe_guide/performance.md b/doc/development/fe_guide/performance.md
index bcdc49a1070..2e1fabd739c 100644
--- a/doc/development/fe_guide/performance.md
+++ b/doc/development/fe_guide/performance.md
@@ -8,6 +8,21 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Performance is an essential part and one of the main areas of concern for any modern application.
+## Monitoring
+
+We have a performance dashboard available in one of our [Grafana instances](https://dashboards.gitlab.net/d/000000043/sitespeed-page-summary?orgId=1). This dashboard automatically aggregates metric data from [sitespeed.io](https://www.sitespeed.io/) every 4 hours. The results are displayed after a set number of pages have been aggregated.
+
+These pages are listed in text files called [`gitlab`](https://gitlab.com/gitlab-org/frontend/sitespeed-measurement-setup/-/tree/master/gitlab) in the [`sitespeed-measurement-setup` repository](https://gitlab.com/gitlab-org/frontend/sitespeed-measurement-setup).
+Any frontend engineer can contribute to this dashboard by adding or removing page URLs in these text files. The changes are pushed live on the next scheduled run after they are merged into `main`.
+
+There are three recommended high-impact metrics (Core Web Vitals) to review on each page:
+
+- [Largest Contentful Paint](https://web.dev/lcp/)
+- [First Input Delay](https://web.dev/fid/)
+- [Cumulative Layout Shift](https://web.dev/cls/)
+
+For these metrics, lower numbers are better because they indicate a more performant website.
+
## User Timing API
[User Timing API](https://developer.mozilla.org/en-US/docs/Web/API/User_Timing_API) is a web API
@@ -77,9 +92,9 @@ performance.getEntriesByType('mark');
performance.getEntriesByType('measure');
```
-Using `getEntriesByName()` or `getEntriesByType()` returns an Array of [the PerformanceMeasure
-objects](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceMeasure) which contain
-information about the measurement's start time and duration.
+Using `getEntriesByName()` or `getEntriesByType()` returns an Array of
+[the PerformanceMeasure objects](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceMeasure)
+which contain information about the measurement's start time and duration.
### User Timing API utility
@@ -220,7 +235,7 @@ Use the following rules when creating real-time solutions.
A `Poll-Interval: -1` means you should disable polling, and this must be implemented.
1. A response with HTTP status different from 2XX should disable polling as well.
1. Use a common library for polling.
-1. Poll on active tabs only. Please use [Visibility](https://github.com/ai/visibilityjs).
+1. Poll on active tabs only. Use [Visibility](https://github.com/ai/visibilityjs).
1. Use regular polling intervals, do not use backoff polling or jitter, as the interval is
controlled by the server.
1. The backend code is likely to be using ETags. You do not and should not check for status
@@ -434,7 +449,7 @@ Use `webpackChunkName` when generating dynamic imports as
it provides a deterministic filename for the chunk which can then be cached
in the browser across GitLab versions.
-More information is available in [webpack's code splitting documentation](https://webpack.js.org/guides/code-splitting/#dynamic-imports) and [vue's dynamic component documentation](https://vuejs.org/v2/guide/components-dynamic-async.html).
+More information is available in [webpack's code splitting documentation](https://webpack.js.org/guides/code-splitting/#dynamic-imports) and [vue's dynamic component documentation](https://v2.vuejs.org/v2/guide/components-dynamic-async.html).
### Minimizing page size
diff --git a/doc/development/fe_guide/source_editor.md b/doc/development/fe_guide/source_editor.md
index b06e341630f..88508e94380 100644
--- a/doc/development/fe_guide/source_editor.md
+++ b/doc/development/fe_guide/source_editor.md
@@ -129,7 +129,7 @@ with additional functions on the instance level:
Source Editor provides a universal, extensible editing tool to the whole product,
and doesn't depend on any particular group. Even though the Source Editor's core is owned by
-[Create::Editor FE Team](https://about.gitlab.com/handbook/engineering/development/dev/create-editor/),
+[Create::Editor FE Team](https://about.gitlab.com/handbook/engineering/development/dev/create/editor/),
any group can own the extensions—the main functional elements. The goal of
Source Editor extensions is to keep the editor's core slim and stable. Any
needed features can be added as extensions to this core. Any group can
diff --git a/doc/development/fe_guide/storybook.md b/doc/development/fe_guide/storybook.md
index 4c0e7b2612b..45342eb6d72 100644
--- a/doc/development/fe_guide/storybook.md
+++ b/doc/development/fe_guide/storybook.md
@@ -6,7 +6,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Storybook
-The Storybook for the `gitlab-org/gitlab` project is available on our [GitLab Pages site](https://gitlab-org.gitlab.io/gitlab/storybook).
+The Storybook for the `gitlab-org/gitlab` project is available on our [GitLab Pages site](https://gitlab-org.gitlab.io/gitlab/storybook/).
## Storybook in local development
diff --git a/doc/development/fe_guide/style/javascript.md b/doc/development/fe_guide/style/javascript.md
index d93dc8292d4..b86bdfafa21 100644
--- a/doc/development/fe_guide/style/javascript.md
+++ b/doc/development/fe_guide/style/javascript.md
@@ -123,7 +123,8 @@ things.map(parseInt);
things.map(Number);
```
-**PLEASE NOTE:** If the String could represent a non-integer (i.e., it includes a decimal), **do not** use `parseInt`. Consider `Number` or `parseFloat` instead.
+NOTE:
+If the String could represent a non-integer (i.e., it includes a decimal), **do not** use `parseInt`. Consider `Number` or `parseFloat` instead.
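+
+For example:
+
+```javascript
+parseInt('10.5', 10); // 10   - the decimal part is silently dropped
+Number('10.5');       // 10.5
+parseFloat('10.5');   // 10.5
+```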
## CSS Selectors - Use `js-` prefix
diff --git a/doc/development/fe_guide/style/scss.md b/doc/development/fe_guide/style/scss.md
index 451b0c8a4c6..17e80762a38 100644
--- a/doc/development/fe_guide/style/scss.md
+++ b/doc/development/fe_guide/style/scss.md
@@ -12,7 +12,7 @@ easy to maintain, and performant for the end-user.
## Rules
-Our CSS is a mixture of current and legacy approaches. That means sometimes it may be difficult to follow this guide to the letter; it means you are likely to run into exceptions, where following the guide is difficult to impossible without major effort. In those cases, you may work with your reviewers and maintainers to identify an approach that does not fit these rules. Please endeavor to limit these cases.
+Our CSS is a mixture of current and legacy approaches. That means it may sometimes be difficult to follow this guide to the letter, and you are likely to run into exceptions where following the guide is difficult or impossible without major effort. In those cases, you may work with your reviewers and maintainers to identify an approach that does not fit these rules. Try to limit these cases.
### Utility Classes
diff --git a/doc/development/fe_guide/style/vue.md b/doc/development/fe_guide/style/vue.md
index 5c79d47e7b0..c9bd0e1b35a 100644
--- a/doc/development/fe_guide/style/vue.md
+++ b/doc/development/fe_guide/style/vue.md
@@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Linting
We default to [eslint-plugin-vue](https://github.com/vuejs/eslint-plugin-vue), with the `plugin:vue/recommended` configuration.
-Please check this [rules](https://github.com/vuejs/eslint-plugin-vue#bulb-rules) for more documentation.
+Check the [rules](https://github.com/vuejs/eslint-plugin-vue#bulb-rules) for more documentation.
## Basic Rules
@@ -448,9 +448,9 @@ Typically, when testing a Vue component, the component should be "re-mounted" in
To achieve this:
1. Create a mutable `wrapper` variable inside the top-level `describe` block.
-1. Mount the component using [`mount`](https://vue-test-utils.vuejs.org/api/#mount)/
-[`shallowMount`](https://vue-test-utils.vuejs.org/api/#shallowMount).
-1. Reassign the resulting [`Wrapper`](https://vue-test-utils.vuejs.org/api/wrapper/#wrapper)
+1. Mount the component using [`mount`](https://v1.test-utils.vuejs.org/api/#mount)/
+[`shallowMount`](https://v1.test-utils.vuejs.org/api/#shallowMount).
+1. Reassign the resulting [`Wrapper`](https://v1.test-utils.vuejs.org/api/wrapper/#wrapper)
instance to our `wrapper` variable.
Creating a global, mutable wrapper provides a number of advantages, including the ability to:
@@ -476,8 +476,8 @@ Creating a global, mutable wrapper provides a number of advantages, including th
To avoid duplicating our mounting logic, it's useful to define a `createComponent` factory function
that we can reuse in each test block. This is a closure which should reassign our `wrapper` variable
-to the result of [`mount`](https://vue-test-utils.vuejs.org/api/#mount) and
-[`shallowMount`](https://vue-test-utils.vuejs.org/api/#shallowMount):
+to the result of [`mount`](https://v1.test-utils.vuejs.org/api/#mount) and
+[`shallowMount`](https://v1.test-utils.vuejs.org/api/#shallowMount):
```javascript
import MyComponent from '~/path/to/my_component.vue';
@@ -579,9 +579,9 @@ the mounting function (`mount` or `shallowMount`) to be used to mount the compon
### Setting component state
-1. Avoid using [`setProps`](https://vue-test-utils.vuejs.org/api/wrapper/#setprops) to set
+1. Avoid using [`setProps`](https://v1.test-utils.vuejs.org/api/wrapper/#setprops) to set
component state wherever possible. Instead, set the component's
-[`propsData`](https://vue-test-utils.vuejs.org/api/options.html#propsdata) when mounting the component:
+[`propsData`](https://v1.test-utils.vuejs.org/api/options.html#propsdata) when mounting the component:
```javascript
// bad
@@ -659,7 +659,7 @@ The goal of this accord is to make sure we are all on the same page.
1. If an outside jQuery Event needs to be listened to inside the Vue application, you may use
jQuery event listeners (see the sketch after this list).
1. We avoid adding new jQuery events when they are not required. Instead of adding new jQuery
- events take a look at [different methods to do the same task](https://vuejs.org/v2/api/#vm-emit).
+ events, take a look at [different methods to do the same task](https://v2.vuejs.org/v2/api/#vm-emit).
1. You may query the `window` object one time, while bootstrapping your application for application
specific data (for example, `scrollTo` is ok to access anytime). Do this access during the
bootstrapping of your application.
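+
+As an illustrative sketch of the first item above (the event name and payload
+are assumptions, not an existing GitLab event):
+
+```javascript
+import $ from 'jquery';
+
+export default {
+  data() {
+    return { count: 0 };
+  },
+  mounted() {
+    // React to an outside jQuery event inside the Vue application.
+    $(document).on('todo:toggle', (event, count) => {
+      this.count = count;
+    });
+  },
+  beforeDestroy() {
+    // Clean up the listener when the component is destroyed.
+    $(document).off('todo:toggle');
+  },
+};
+```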
diff --git a/doc/development/fe_guide/tooling.md b/doc/development/fe_guide/tooling.md
index 1c32647eefd..2bb6cbfaf7a 100644
--- a/doc/development/fe_guide/tooling.md
+++ b/doc/development/fe_guide/tooling.md
@@ -175,7 +175,7 @@ preferred editor (all major editors are supported) accordingly. We suggest
setting up Prettier to run when each file is saved. For instructions about using
Prettier in your preferred editor, see the [Prettier documentation](https://prettier.io/docs/en/editors.html).
-Please take care that you only let Prettier format the same file types as the global Yarn script does (`.js`, `.vue`, `.graphql`, and `.scss`). For example, you can exclude file formats in your Visual Studio Code settings file:
+Take care that you only let Prettier format the same file types as the global Yarn script does (`.js`, `.vue`, `.graphql`, and `.scss`). For example, you can exclude file formats in your Visual Studio Code settings file:
```json
"prettier.disableLanguages": [
diff --git a/doc/development/fe_guide/troubleshooting.md b/doc/development/fe_guide/troubleshooting.md
index 14943cca3ac..c0894621ed1 100644
--- a/doc/development/fe_guide/troubleshooting.md
+++ b/doc/development/fe_guide/troubleshooting.md
@@ -12,7 +12,7 @@ Running into a problem? Maybe this will help ¯\_(ツ)_/¯.
### This guide doesn't contain the issue I ran into
-If you run into a Frontend development issue that is not in this guide, please consider updating this guide with your issue and possible remedies. This way future adventurers can face these dragons with more success, being armed with your experience and knowledge.
+If you run into a Frontend development issue that is not in this guide, consider updating this guide with your issue and possible remedies. This way future adventurers can face these dragons with more success, being armed with your experience and knowledge.
## Testing issues
diff --git a/doc/development/fe_guide/view_component.md b/doc/development/fe_guide/view_component.md
index f4bb7ac3a2e..2e373e6933b 100644
--- a/doc/development/fe_guide/view_component.md
+++ b/doc/development/fe_guide/view_component.md
@@ -13,18 +13,24 @@ They are rendered server-side and can be seamlessly used with template languages
Refer to the official [documentation](https://viewcomponent.org/) to learn more or
watch this [introduction video](https://youtu.be/akRhUbvtnmo).
+## Browse components with Lookbook
+
+We have a [Lookbook](https://github.com/allmarkedup/lookbook) at [http://gdk.test:3000/rails/lookbook](http://gdk.test:3000/rails/lookbook) (only available in development mode) to browse and interact with ViewComponent previews.
+
## Pajamas components
Some of the components of our [Pajamas](https://design.gitlab.com) design system are
available as a ViewComponent in `app/components/pajamas`.
NOTE:
-We have a small but growing number of Pajamas components. Reach out to the
-[Foundations team](https://about.gitlab.com/handbook/engineering/development/dev/ecosystem/foundations/)
+We are still in the process of creating these components, so not every Pajamas component is available as a ViewComponent.
+Reach out to the [Foundations team](https://about.gitlab.com/handbook/engineering/development/dev/ecosystem/foundations/)
if the component you are looking for is not yet available.
### Available components
+Consider this list a best effort. The full list can be found in [`app/components/pajamas`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/app/components/pajamas). Also see [our Lookbook](http://gdk.test:3000/rails/lookbook) for a more interactive way to browse our components.
+
#### Alert
The `Pajamas::AlertComponent` follows the [Pajamas Alert](https://design.gitlab.com/components/alert) specification.
@@ -147,6 +153,39 @@ If you want to add custom attributes to any of these or the card itself, use the
For the full list of options, see its
[source](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/components/pajamas/card_component.rb).
+#### Checkbox tag
+
+The `Pajamas::CheckboxTagComponent` follows the [Pajamas Checkbox](https://design.gitlab.com/components/checkbox) specification.
+
+The `name` argument and `label` slot are required.
+
+For example:
+
+```haml
+= render Pajamas::CheckboxTagComponent.new(name: 'project[initialize_with_sast]',
+ checkbox_options: { data: { qa_selector: 'initialize_with_sast_checkbox', track_label: track_label, track_action: 'activate_form_input', track_property: 'init_with_sast' } }) do |c|
+ = c.label do
+ = s_('ProjectsNew|Enable Static Application Security Testing (SAST)')
+ = c.help_text do
+ = s_('ProjectsNew|Analyze your source code for known security vulnerabilities.')
+ = link_to _('Learn more.'), help_page_path('user/application_security/sast/index'), target: '_blank', rel: 'noopener noreferrer', data: { track_action: 'followed' }
+```
+
+For the full list of options, see its
+[source](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/components/pajamas/checkbox_tag_component.rb).
+
+#### Checkbox
+
+The `Pajamas::CheckboxComponent` follows the [Pajamas Checkbox](https://design.gitlab.com/components/checkbox) specification.
+
+NOTE:
+`Pajamas::CheckboxComponent` is used internally by the [GitLab UI form builder](haml.md#use-the-gitlab-ui-form-builder) and requires an instance of [ActionView::Helpers::FormBuilder](https://api.rubyonrails.org/v6.1.0/classes/ActionView/Helpers/FormBuilder.html) to be passed as the `form` argument.
+It is preferred to use the [gitlab_ui_checkbox_component](haml.md#gitlab_ui_checkbox_component) method to render this ViewComponent.
+To use a checkbox without an instance of [ActionView::Helpers::FormBuilder](https://api.rubyonrails.org/v6.1.0/classes/ActionView/Helpers/FormBuilder.html), use [CheckboxTagComponent](#checkbox-tag).
+
+For the full list of options, see its
+[source](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/components/pajamas/checkbox_component.rb).
+
#### Toggle
The `Pajamas::ToggleComponent` follows the [Pajamas Toggle](https://design.gitlab.com/components/toggle) specification.
@@ -172,3 +211,5 @@ For the full list of options, see its
over creating plain Haml tags with CSS classes.
- If you are making changes to an existing Haml view and see, for example, a
button that is still implemented with plain Haml, consider migrating it to use a ViewComponent.
+- If you decide to create a new component, consider creating [previews](https://viewcomponent.org/guide/previews.html) for it, too.
+ This helps others discover your component with Lookbook, and also makes it much easier to test its different states.
diff --git a/doc/development/fe_guide/vue.md b/doc/development/fe_guide/vue.md
index 7943ae119be..27660c0f5f7 100644
--- a/doc/development/fe_guide/vue.md
+++ b/doc/development/fe_guide/vue.md
@@ -71,7 +71,7 @@ component, is that you avoid creating a fixture or an HTML element in the unit t
##### `provide` and `inject`
-Vue supports dependency injection through [`provide` and `inject`](https://vuejs.org/v2/api/#provide-inject).
+Vue supports dependency injection through [`provide` and `inject`](https://v2.vuejs.org/v2/api/#provide-inject).
In the component the `inject` configuration accesses the values `provide` passes down.
This example of a Vue app initialization shows how the `provide` configuration passes a value from HAML to the component:
@@ -266,7 +266,7 @@ return new Vue({
#### Accessing feature flags
-Use the [`provide` and `inject`](https://vuejs.org/v2/api/#provide-inject) mechanisms
+Use the [`provide` and `inject`](https://v2.vuejs.org/v2/api/#provide-inject) mechanisms
in Vue to make feature flags available to any descendant components in a Vue
application. The `glFeatures` object is already provided in `commons/vue.js`, so
only the mixin is required to use the flags:
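+
+A sketch for illustration (the mixin import path and flag name are assumptions):
+
+```javascript
+import glFeatureFlagsMixin from '~/vue_shared/mixins/gl_feature_flags_mixin';
+
+export default {
+  mixins: [glFeatureFlagsMixin()],
+  computed: {
+    shouldShowNewThing() {
+      // `myNewFlag` is a hypothetical flag name.
+      return this.glFeatures.myNewFlag;
+    },
+  },
+};
+```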
@@ -339,7 +339,7 @@ Check this [page](vuex.md) for more details.
### Mixing Vue and JavaScript classes (in the data function)
-In the [Vue documentation](https://vuejs.org/v2/api/#Options-Data) the Data function/object is defined as follows:
+In the [Vue documentation](https://v2.vuejs.org/v2/api/#Options-Data) the Data function/object is defined as follows:
> The data object for the Vue instance. Vue recursively converts its properties into getter/setters
to make it "reactive". The object must be plain: native objects such as browser API objects and
@@ -348,7 +348,7 @@ recommended to observe objects with their own stateful behavior.
Based on the Vue guidance:
-- **Do not** use or create a JavaScript class in your [data function](https://vuejs.org/v2/api/#data),
+- **Do not** use or create a JavaScript class in your [data function](https://v2.vuejs.org/v2/api/#data),
such as `user: new User()`.
- **Do not** add new JavaScript class implementations.
- **Do** use [GraphQL](../api_graphql_styleguide.md), [Vuex](vuex.md) or a set of components if
@@ -531,7 +531,7 @@ Each Vue component has a unique output. This output is always present in the ren
Although each method of a Vue component can be tested individually, our goal is to test the output
of the render function, which represents the state at all times.
-Visit the [Vue testing guide](https://vuejs.org/v2/guide/testing.html#Unit-Testing) for help
+Visit the [Vue testing guide](https://v2.vuejs.org/v2/guide/testing.html#Unit-Testing) for help
testing the rendered output.
Here's an example of a well structured unit test for [this Vue component](#appendix---vue-component-subject-under-test):
diff --git a/doc/development/fe_guide/vuex.md b/doc/development/fe_guide/vuex.md
index 064f01c8195..8bfb912161a 100644
--- a/doc/development/fe_guide/vuex.md
+++ b/doc/development/fe_guide/vuex.md
@@ -165,7 +165,7 @@ Instead of creating an mutation to toggle the loading state, we should:
As a result, we can dispatch the `fetchNamespace` action from the component and it is responsible for committing `REQUEST_NAMESPACE`, `RECEIVE_NAMESPACE_SUCCESS` and `RECEIVE_NAMESPACE_ERROR` mutations.
-> Previously, we were dispatching actions from the `fetchNamespace` action instead of committing mutation, so please don't be confused if you find a different pattern in the older parts of the codebase. However, we encourage leveraging a new pattern whenever you write new Vuex stores.
+> Previously, we were dispatching actions from the `fetchNamespace` action instead of committing mutations, so don't be confused if you find a different pattern in the older parts of the codebase. However, we encourage leveraging the new pattern whenever you write new Vuex stores.
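+
+For illustration, a minimal action following this pattern could look like the
+sketch below (the axios utility path, endpoint, and mutation-type constants are
+assumptions):
+
+```javascript
+import axios from '~/lib/utils/axios_utils';
+import * as types from './mutation_types';
+
+export const fetchNamespace = ({ commit }) => {
+  commit(types.REQUEST_NAMESPACE);
+
+  return axios
+    .get('/api/v4/namespaces/my-namespace') // hypothetical endpoint
+    .then(({ data }) => commit(types.RECEIVE_NAMESPACE_SUCCESS, data))
+    .catch((error) => commit(types.RECEIVE_NAMESPACE_ERROR, error));
+};
+```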
By following this pattern we guarantee:
@@ -364,8 +364,8 @@ export default initialState => ({
We made the conscious decision to avoid this pattern to improve the ability to
discover and search our frontend codebase. The same applies
-when [providing data to a Vue app](vue.md#providing-data-from-haml-to-javascript). The reasoning for this is described in [this
-discussion](https://gitlab.com/gitlab-org/frontend/rfcs/-/issues/56#note_302514865):
+when [providing data to a Vue app](vue.md#providing-data-from-haml-to-javascript). The reasoning for this is described in
+[this discussion](https://gitlab.com/gitlab-org/frontend/rfcs/-/issues/56#note_302514865):
> Consider a `someStateKey` is being used in the store state. You _may_ not be
> able to grep for it directly if it was provided only by `el.dataset`. Instead,
diff --git a/doc/development/fe_guide/widgets.md b/doc/development/fe_guide/widgets.md
index 02876afe597..b54f9add97d 100644
--- a/doc/development/fe_guide/widgets.md
+++ b/doc/development/fe_guide/widgets.md
@@ -141,3 +141,7 @@ methods: {
```
[View an example of such a component.](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/notes/components/sidebar_subscription.vue)
+
+## Merge request widgets
+
+Refer to the documentation specific to the [merge request widget extension framework](merge_request_widget_extensions.md).
diff --git a/doc/development/feature_categorization/index.md b/doc/development/feature_categorization/index.md
index b2d141798fa..a93ed58336d 100644
--- a/doc/development/feature_categorization/index.md
+++ b/doc/development/feature_categorization/index.md
@@ -10,8 +10,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Each Sidekiq worker, controller action, or API endpoint
must declare a `feature_category` attribute. This attribute maps each
-of these to a [feature
-category](https://about.gitlab.com/handbook/product/categories/). This
+of these to a [feature category](https://about.gitlab.com/handbook/product/categories/). This
is done for error budgeting, alert routing, and team attribution.
The list of feature categories can be found in the file `config/feature_categories.yml`.
@@ -29,8 +28,7 @@ product categories. When this occurs, you can automatically update
and generate a new version of the file, which needs to be committed to
the repository.
-The [Scalability
-team](https://about.gitlab.com/handbook/engineering/infrastructure/team/scalability/)
+The [Scalability team](https://about.gitlab.com/handbook/engineering/infrastructure/team/scalability/)
currently maintains the `feature_categories.yml` file. They will automatically be
notified on Slack when the file becomes outdated.
diff --git a/doc/development/feature_development.md b/doc/development/feature_development.md
index a5d74a0bfd9..e50c1edd282 100644
--- a/doc/development/feature_development.md
+++ b/doc/development/feature_development.md
@@ -71,9 +71,9 @@ Consult these topics for information on contributing to specific GitLab features
- [Developing against interacting components or features](interacting_components.md)
- [Manage feature flags](feature_flags/index.md)
-- [Licensed feature availability](licensed_feature_availability.md)
+- [Implementing Enterprise Edition features](ee_features.md)
- [Accessing session data](session.md)
-- [How to dump production data to staging](db_dump.md)
+- [How to dump production data to staging](database/db_dump.md)
- [Geo development](geo.md)
- [Redis guidelines](redis.md)
- [Adding a new Redis instance](redis/new_redis_instance.md)
diff --git a/doc/development/feature_flags/controls.md b/doc/development/feature_flags/controls.md
index 07c3c83912a..8a862e5f7cd 100644
--- a/doc/development/feature_flags/controls.md
+++ b/doc/development/feature_flags/controls.md
@@ -310,7 +310,7 @@ Changes to the issue format can be submitted in the
#### Instance level
Any feature flag change that affects any GitLab instance is automatically logged in
-[features_json.log](../../administration/logs.md#features_jsonlog).
+[features_json.log](../../administration/logs/index.md#features_jsonlog).
You can search the change history in [Kibana](https://about.gitlab.com/handbook/support/workflows/kibana.html).
You can also access the feature flag change history for GitLab.com [in Kibana](https://log.gprd.gitlab.net/goto/d060337c017723084c6d97e09e591fc6).
diff --git a/doc/development/feature_flags/index.md b/doc/development/feature_flags/index.md
index 140d5f826cf..502a028f089 100644
--- a/doc/development/feature_flags/index.md
+++ b/doc/development/feature_flags/index.md
@@ -170,7 +170,7 @@ Each feature flag is defined in a separate YAML file consisting of a number of f
| `default_enabled` | yes | The default state of the feature flag. |
| `introduced_by_url` | no | The URL to the merge request that introduced the feature flag. |
| `rollout_issue_url` | no | The URL to the Issue covering the feature flag rollout. |
-| `milestone` | no | Milestone in which the feature was added. |
+| `milestone` | no | Milestone in which the feature flag was created. |
| `group` | no | The [group](https://about.gitlab.com/handbook/product/categories/#devops-stages) that owns the feature flag. |
NOTE:
@@ -178,6 +178,9 @@ All validations are skipped when running in `RAILS_ENV=production`.
## Create a new feature flag
+NOTE:
+GitLab Pages uses [a different process](../pages/index.md#feature-flags) for feature flags.
+
The GitLab codebase provides [`bin/feature-flag`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/bin/feature-flag),
a dedicated tool to create new feature flag definitions.
The tool asks various questions about the new feature flag, then creates
@@ -423,6 +426,21 @@ Feature.enabled?(:a_feature, project) && Feature.disabled?(:a_feature_override,
/chatops run feature set --project=gitlab-org/gitlab a_feature_override true
```
+#### Percentage-based actor selection
+
+When using the percentage rollout of actors on multiple feature flags, the actors for each feature flag are selected separately.
+
+For example, the following feature flags are enabled for a certain percentage of actors:
+
+```plaintext
+/chatops run feature set feature-set-1 25 --actors
+/chatops run feature set feature-set-2 25 --actors
+```
+
+If project A has `:feature-set-1` enabled, there is no guarantee that it also has `:feature-set-2` enabled.
+
+For more detail, see [This is how percentages work in Flipper](https://www.hackwithpassion.com/this-is-how-percentages-work-in-flipper).
+
#### Use actors for verifying in production
WARNING:
diff --git a/doc/development/features_inside_dot_gitlab.md b/doc/development/features_inside_dot_gitlab.md
index ca7dbd6adde..f30a041931e 100644
--- a/doc/development/features_inside_dot_gitlab.md
+++ b/doc/development/features_inside_dot_gitlab.md
@@ -14,8 +14,8 @@ When implementing new features, please refer to these existing features to avoid
- [Merge request Templates](../user/project/description_templates.md#create-a-merge-request-template): `.gitlab/merge_request_templates/`.
- [GitLab agent](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/configuration_repository.md#layout): `.gitlab/agents/`.
- [CODEOWNERS](../user/project/code_owners.md#set-up-code-owners): `.gitlab/CODEOWNERS`.
-- [Route Maps](../ci/review_apps/#route-maps): `.gitlab/route-map.yml`.
+- [Route Maps](../ci/review_apps/index.md#route-maps): `.gitlab/route-map.yml`.
- [Customize Auto DevOps Helm Values](../topics/autodevops/customize.md#customize-values-for-helm-chart): `.gitlab/auto-deploy-values.yaml`.
- [Insights](../user/project/insights/index.md#configure-your-insights): `.gitlab/insights.yml`.
- [Service Desk Templates](../user/project/service_desk.md#using-customized-email-templates): `.gitlab/service_desk_templates/`.
-- [Web IDE](../user/project/web_ide/#web-ide-configuration-file): `.gitlab/.gitlab-webide.yml`.
+- [Web IDE](../user/project/web_ide/index.md#web-ide-configuration-file): `.gitlab/.gitlab-webide.yml`.
diff --git a/doc/development/filtering_by_label.md b/doc/development/filtering_by_label.md
index 9e759744a1a..675fe004c22 100644
--- a/doc/development/filtering_by_label.md
+++ b/doc/development/filtering_by_label.md
@@ -1,179 +1,11 @@
---
-stage: Plan
-group: Project Management
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/filtering_by_label.md'
+remove_date: '2022-11-05'
---
-# Filtering by label
-## Introduction
+This document was moved to [another location](database/filtering_by_label.md).
-GitLab has [labels](../user/project/labels.md) that can be assigned to issues,
-merge requests, and epics. Labels on those objects are a many-to-many relation
-through the polymorphic `label_links` table.
-
-To filter these objects by multiple labels - for instance, 'all open
-issues with the label ~Plan and the label ~backend' - we generate a
-query containing a `GROUP BY` clause. In a simple form, this looks like:
-
-```sql
-SELECT
- issues.*
-FROM
- issues
- INNER JOIN label_links ON label_links.target_id = issues.id
- AND label_links.target_type = 'Issue'
- INNER JOIN labels ON labels.id = label_links.label_id
-WHERE
- issues.project_id = 13083
- AND (issues.state IN ('opened'))
- AND labels.title IN ('Plan',
- 'backend')
-GROUP BY
- issues.id
-HAVING (COUNT(DISTINCT labels.title) = 2)
-ORDER BY
- issues.updated_at DESC,
- issues.id DESC
-LIMIT 20 OFFSET 0
-```
-
-In particular, note that:
-
-1. We `GROUP BY issues.id` so that we can ...
-1. Use the `HAVING (COUNT(DISTINCT labels.title) = 2)` condition to ensure that
- all matched issues have both labels.
-
-This is more complicated than is ideal. It makes the query construction more
-prone to errors (such as
-[issue #15557](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/15557)).
-
-## Attempt A: `WHERE EXISTS`
-
-### Attempt A1: use multiple subqueries with `WHERE EXISTS`
-
-In [issue #37137](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/37137)
-and its associated [merge request](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/14022),
-we tried to replace the `GROUP BY` with multiple uses of `WHERE EXISTS`. For the
-example above, this would give:
-
-```sql
-WHERE (EXISTS (
- SELECT
- TRUE
- FROM
- label_links
- INNER JOIN labels ON labels.id = label_links.label_id
- WHERE
- labels.title = 'Plan'
- AND target_type = 'Issue'
- AND target_id = issues.id))
-AND (EXISTS (
- SELECT
- TRUE
- FROM
- label_links
- INNER JOIN labels ON labels.id = label_links.label_id
- WHERE
- labels.title = 'backend'
- AND target_type = 'Issue'
- AND target_id = issues.id))
-```
-
-While this worked without schema changes, and did improve readability somewhat,
-it did not improve query performance.
-
-### Attempt A2: use label IDs in the `WHERE EXISTS` clause
-
-In [merge request #34503](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34503), we followed a similar approach to A1. But this time, we
-did a separate query to fetch the IDs of the labels used in the filter so that we avoid the `JOIN` in the `EXISTS` clause and filter directly by
-`label_links.label_id`. We also added a new index on `label_links` for the `target_id`, `label_id`, and `target_type` columns to speed up this query.
-
-Finding the label IDs wasn't straightforward because there could be multiple labels with the same title within a single root namespace. We solved
-this by grouping the label IDs by title and then using the array of IDs in the `EXISTS` clauses.
-
-This resulted in a significant performance improvement. However, this optimization could not be applied to the dashboard pages
-where we do not have a project or group context. We could not easily search for the label IDs here because that would mean searching across all
-projects and groups that the user has access to.
-
-## Attempt B: Denormalize using an array column
-
-Having [removed MySQL support in GitLab 12.1](https://about.gitlab.com/blog/2019/06/27/removing-mysql-support/),
-using [PostgreSQL's arrays](https://www.postgresql.org/docs/11/arrays.html) became more
-tractable as we didn't have to support two databases. We discussed denormalizing
-the `label_links` table for querying in
-[issue #49651](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/49651),
-with two options: label IDs and titles.
-
-We can think of both of those as array columns on `issues`, `merge_requests`,
-and `epics`: `issues.label_ids` would be an array column of label IDs, and
-`issues.label_titles` would be an array of label titles.
-
-These array columns can be complemented with [GIN
-indexes](https://www.postgresql.org/docs/11/gin-intro.html) to improve
-matching.
-
-### Attempt B1: store label IDs for each object
-
-This has some strong advantages over titles:
-
-1. Unless a label is deleted, or a project is moved, we never need to
- bulk-update the denormalized column.
-1. It uses less storage than the titles.
-
-Unfortunately, our application design makes this hard. If we were able to query
-just by label ID easily, we wouldn't need the `INNER JOIN labels` in the initial
-query at the start of this document. GitLab allows users to filter by label
-title across projects and even across groups, so a filter by the label ~Plan may
-include labels with multiple distinct IDs.
-
-We do not want users to have to know about the different IDs, which means that
-given this data set:
-
-| Project | ~Plan label ID | ~backend label ID |
-| ------- | -------------- | ----------------- |
-| A | 11 | 12 |
-| B | 21 | 22 |
-| C | 31 | 32 |
-
-We would need something like:
-
-```sql
-WHERE
- label_ids @> ARRAY[11, 12]
- OR label_ids @> ARRAY[21, 22]
- OR label_ids @> ARRAY[31, 32]
-```
-
-This can get even more complicated when we consider that in some cases, there
-might be two ~backend labels - with different IDs - that could apply to the same
-object, so the number of combinations would balloon further.
-
-### Attempt B2: store label titles for each object
-
-From the perspective of updating the object, this is the worst
-option. We have to bulk update the objects when:
-
-1. The objects are moved from one project to another.
-1. The project is moved from one group to another.
-1. The label is renamed.
-1. The label is deleted.
-
-It also uses much more storage. Querying is simple, though:
-
-```sql
-WHERE
- label_titles @> ARRAY['Plan', 'backend']
-```
-
-And our [tests in issue #49651](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/49651#note_188777346)
-showed that this could be fast.
-
-However, at present, the disadvantages outweigh the advantages.
-
-## Conclusion
-
-We found a method A2 that does not need denormalization and improves the query performance significantly. This
-did not apply to all cases, but we were able to apply method A1 to the rest of the cases so that we remove the
-`GROUP BY` and `HAVING` clauses in all scenarios.
-
-This simplified the query and improved the performance in the most common cases.
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/fips_compliance.md b/doc/development/fips_compliance.md
index 6261b2fda6f..c690408ee60 100644
--- a/doc/development/fips_compliance.md
+++ b/doc/development/fips_compliance.md
@@ -12,12 +12,9 @@ to ensure a certain security floor is met by vendors selling products to U.S.
Federal institutions.
WARNING:
-GitLab is not FIPS compliant, even when built and run on a FIPS-enforcing
-system. Large parts of the build are broken, and many features use forbidden
-cryptographic primitives. Running GitLab on a FIPS-enforcing system is not
-supported and may result in data loss. This document is intended to help
-engineers looking to develop FIPS-related fixes. It is not intended to be used
-to run a production GitLab instance.
+You can build a FIPS-compliant instance of GitLab, but [not all features are included](#unsupported-features-in-fips-mode).
+A FIPS-compliant instance must be configured following the [FIPS install instructions](#install-gitlab-with-fips-compliance)
+exactly.
There are two current FIPS standards: [140-2](https://en.wikipedia.org/wiki/FIPS_140-2)
and [140-3](https://en.wikipedia.org/wiki/FIPS_140-3). At GitLab we usually
@@ -25,10 +22,7 @@ mean FIPS 140-2.
## Current status
-Read [Epic &5104](https://gitlab.com/groups/gitlab-org/-/epics/5104) for more
-information on the status of the investigation.
-
-GitLab is actively working towards FIPS compliance.
+GitLab is actively working towards FIPS compliance. Progress on this initiative can be tracked in the [FIPS compliance Epic](https://gitlab.com/groups/gitlab-org/-/epics/6452).
## FIPS compliance at GitLab
@@ -46,15 +40,40 @@ when FIPS mode is enabled.
| Ubuntu 20.04 Libgcrypt Cryptographic Module | [3902](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3902) | EC2 instances | `gpg`, `sshd` |
| Amazon Linux 2 Kernel Crypto API Cryptographic Module | [3709](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3709) | EKS nodes | Linux kernel |
| Amazon Linux 2 OpenSSL Cryptographic Module | [3553](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3553) | EKS nodes | NGINX |
-| RedHat Enterprise Linux 8 OpenSSL Cryptographic Module | [3852](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3852) | EKS nodes | UBI containers: Workhorse, Pages, Container Registry, Rails (Puma/Sidekiq), Security Analyzers |
+| RedHat Enterprise Linux 8 OpenSSL Cryptographic Module | [4271](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4271) | EKS nodes | UBI containers: Workhorse, Pages, Container Registry, Rails (Puma/Sidekiq), Security Analyzers |
| RedHat Enterprise Linux 8 Libgcrypt Cryptographic Module | [3784](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3784) | EKS nodes | UBI containers: GitLab Shell, `gpg` |
### Supported Operating Systems
-The supported hybrid environments are:
+The supported hybrid platforms are:
+
+- Omnibus GitLab: Ubuntu 20.04 LTS
+- Cloud Native GitLab: Amazon Linux 2 (EKS)
+
+### Unsupported features in FIPS mode
+
+Some GitLab features may not work when FIPS mode is enabled. The following features
+are known not to work in FIPS mode, and there may be additional features not
+listed here that also do not work properly:
+
+- [Container Scanning](../user/application_security/container_scanning/index.md) support for scanning images in repositories that require authentication.
+- [Code Quality](../ci/testing/code_quality.md) does not support operating in FIPS-compliant mode.
+- [Dependency scanning](../user/application_security/dependency_scanning/index.md) support for Gradle.
+- [Dynamic Application Security Testing (DAST)](../user/application_security/dast/index.md)
+ does not support operating in FIPS-compliant mode.
+- [License compliance](../user/compliance/license_compliance/index.md).
+- [Solutions for vulnerabilities](../user/application_security/vulnerabilities/index.md#resolve-a-vulnerability)
+ for yarn projects.
+- [Static Application Security Testing (SAST)](../user/application_security/sast/index.md)
+ supports a reduced set of [analyzers](../user/application_security/sast/index.md#fips-enabled-images)
+ when operating in FIPS-compliant mode.
+- Advanced Search is currently not included in FIPS mode. It must be disabled for the instance to remain FIPS-compliant.
+- [Gravatar or Libravatar-based profile images](../administration/libravatar.md) are not FIPS-compliant.
+
+Additionally, these package repositories are disabled in FIPS mode:
-- Omnibus: Ubuntu 20.04 FIPS
-- EKS: Amazon Linux 2
+- [Conan package repository](../user/packages/conan_repository/index.md).
+- [Debian package repository](../user/packages/debian_repository/index.md).
## FIPS validation at GitLab
@@ -281,6 +300,9 @@ gitlab:
gitlab-mailroom:
image:
tag: master-fips
+ gitlab-pages:
+ image:
+ tag: master-fips
migrations:
image:
tag: master-fips
@@ -299,18 +321,17 @@ gitlab:
nginx-ingress:
controller:
image:
- repository: registry.gitlab.com/stanhu/gitlab-test-images/k8s-staging-ingress-nginx/controller
- tag: v1.2.0-beta.1
+ repository: registry.gitlab.com/gitlab-org/cloud-native/charts/gitlab-ingress-nginx/controller
+ tag: v1.2.1-fips
pullPolicy: Always
- digest: sha256:ace38833689ad34db4a46bc1e099242696eb800def88f02200a8615530734116
+ digest: sha256:c4222b7ab3836b9be2a7649cff4b2e6ead34286dfdf3a7b04eb34fdd3abb0334
```
The above example shows a FIPS-enabled [`nginx-ingress`](https://github.com/kubernetes/ingress-nginx) image.
-See [this issue](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/3153#note_917782207) for more details on
-how to build NGINX and the Ingress Controller.
+See our [Charts documentation on FIPS](https://docs.gitlab.com/charts/advanced/fips/index.html) for more details.
You can also use release tags, but the versioning is tricky because each
-component may use its own versioning scheme. For example, for GitLab v15.1:
+component may use its own versioning scheme. For example, for GitLab v15.2:
```yaml
global:
@@ -324,30 +345,33 @@ global:
gitlab:
gitaly:
image:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
gitlab-exporter:
image:
- tag: 11.15.2-fips
+ tag: 11.17.1-fips
gitlab-shell:
image:
- tag: v15.1.0-fips
+ tag: v14.9.0-fips
gitlab-mailroom:
image:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
+ gitlab-pages:
+ image:
+ tag: v1.61.0-fips
migrations:
image:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
sidekiq:
image:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
toolbox:
image:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
webservice:
image:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
workhorse:
- tag: v15.1.0-fips
+ tag: v15.2.0-fips
```
## FIPS Performance Benchmarking
@@ -417,29 +441,6 @@ def default_min_key_size(name)
end
```
-#### Unsupported features in FIPS mode
-
-Some GitLab features may not work when FIPS mode is enabled. The following features
-are known to not work in FIPS mode. However, there may be additional features not
-listed here that also do not work properly in FIPS mode:
-
-- [Container Scanning](../user/application_security/container_scanning/index.md) support for scanning images in repositories that require authentication.
-- [Code Quality](../ci/testing/code_quality.md) does not support operating in FIPS-compliant mode.
-- [Dependency scanning](../user/application_security/dependency_scanning/index.md) support for Gradle.
-- [Dynamic Application Security Testing (DAST)](../user/application_security/dast/index.md)
- does not support operating in FIPS-compliant mode.
-- [License compliance](../user/compliance/license_compliance/index.md).
-- [Solutions for vulnerabilities](../user/application_security/vulnerabilities/index.md#resolve-a-vulnerability)
- for yarn projects.
-- [Static Application Security Testing (SAST)](../user/application_security/sast/index.md)
- supports a reduced set of [analyzers](../user/application_security/sast/#fips-enabled-images)
- when operating in FIPS-compliant mode.
-
-Additionally, these package repositories are disabled in FIPS mode:
-
-- [Conan package repository](../user/packages/conan_repository/index.md).
-- [Debian package repository](../user/packages/debian_repository/index.md).
-
## Nightly Omnibus FIPS builds
The Distribution team has created [nightly FIPS Omnibus builds](https://packages.gitlab.com/gitlab/nightly-fips-builds). These
diff --git a/doc/development/foreign_keys.md b/doc/development/foreign_keys.md
index e0dd0fe8e7c..cdf655bf0bf 100644
--- a/doc/development/foreign_keys.md
+++ b/doc/development/foreign_keys.md
@@ -1,200 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/foreign_keys.md'
+remove_date: '2022-11-05'
---
-# Foreign Keys & Associations
+This document was moved to [another location](database/foreign_keys.md).
-When adding an association to a model you must also add a foreign key. For
-example, say you have the following model:
-
-```ruby
-class User < ActiveRecord::Base
- has_many :posts
-end
-```
-
-Add a foreign key here on column `posts.user_id`. This ensures
-that data consistency is enforced on database level. Foreign keys also mean that
-the database can very quickly remove associated data (for example, when removing a
-user), instead of Rails having to do this.
-
-## Adding Foreign Keys In Migrations
-
-Foreign keys can be added concurrently using `add_concurrent_foreign_key` as
-defined in `Gitlab::Database::MigrationHelpers`. See the [Migration Style
-Guide](migration_style_guide.md) for more information.
-
-Keep in mind that you can only safely add foreign keys to existing tables after
-you have removed any orphaned rows. The method `add_concurrent_foreign_key`
-does not take care of this so you must do so manually. See
-[adding foreign key constraint to an existing column](database/add_foreign_key_to_existing_column.md).
-
-## Updating Foreign Keys In Migrations
-
-Sometimes a foreign key constraint must be changed, preserving the column
-but updating the constraint condition. For example, moving from
-`ON DELETE CASCADE` to `ON DELETE SET NULL` or vice-versa.
-
-PostgreSQL does not prevent you from adding overlapping foreign keys. It
-honors the most recently added constraint. This allows us to replace foreign keys without
-ever losing foreign key protection on a column.
-
-To replace a foreign key:
-
-1. [Add the new foreign key without validation](database/add_foreign_key_to_existing_column.md#prevent-invalid-records)
-
- The name of the foreign key constraint must be changed to add a new
- foreign key before removing the old one.
-
- ```ruby
- class ReplaceFkOnPackagesPackagesProjectId < Gitlab::Database::Migration[2.0]
- disable_ddl_transaction!
-
- NEW_CONSTRAINT_NAME = 'fk_new'
-
- def up
- add_concurrent_foreign_key(:packages_packages, :projects, column: :project_id, on_delete: :nullify, validate: false, name: NEW_CONSTRAINT_NAME)
- end
-
- def down
- with_lock_retries do
- remove_foreign_key_if_exists(:packages_packages, column: :project_id, on_delete: :nullify, name: NEW_CONSTRAINT_NAME)
- end
- end
- end
- ```
-
-1. [Validate the new foreign key](database/add_foreign_key_to_existing_column.md#validate-the-foreign-key)
-
- ```ruby
- class ValidateFkNew < Gitlab::Database::Migration[2.0]
- NEW_CONSTRAINT_NAME = 'fk_new'
-
- # foreign key added in <link to MR or path to migration adding new FK>
- def up
- validate_foreign_key(:packages_packages, name: NEW_CONSTRAINT_NAME)
- end
-
- def down
- # no-op
- end
- end
- ```
-
-1. Remove the old foreign key:
-
- ```ruby
- class RemoveFkOld < Gitlab::Database::Migration[2.0]
- OLD_CONSTRAINT_NAME = 'fk_old'
-
- # new foreign key added in <link to MR or path to migration adding new FK>
- # and validated in <link to MR or path to migration validating new FK>
- def up
- remove_foreign_key_if_exists(:packages_packages, column: :project_id, on_delete: :cascade, name: OLD_CONSTRAINT_NAME)
- end
-
- def down
- # Validation is skipped here, so if rolled back, this will need to be revalidated in a separate migration
- add_concurrent_foreign_key(:packages_packages, :projects, column: :project_id, on_delete: :cascade, validate: false, name: OLD_CONSTRAINT_NAME)
- end
- end
- ```
-
-## Cascading Deletes
-
-Every foreign key must define an `ON DELETE` clause, and in 99% of the cases
-this should be set to `CASCADE`.
-
-## Indexes
-
-When adding a foreign key in PostgreSQL the column is not indexed automatically,
-thus you must also add a concurrent index. Not doing so results in cascading
-deletes being very slow.
-
-## Naming foreign keys
-
-By default Ruby on Rails uses the `_id` suffix for foreign keys. So we should
-only use this suffix for associations between two tables. If you want to
-reference an ID on a third party platform the `_xid` suffix is recommended.
-
-The spec `spec/db/schema_spec.rb` tests if all columns with the `_id` suffix
-have a foreign key constraint. So if that spec fails, don't add the column to
-`IGNORED_FK_COLUMNS`, but instead add the FK constraint, or consider naming it
-differently.
-
-## Dependent Removals
-
-Don't define options such as `dependent: :destroy` or `dependent: :delete` when
-defining an association. Defining these options means Rails handles the
-removal of data, instead of letting the database handle this in the most
-efficient way possible.
-
-In other words, this is bad and should be avoided at all costs:
-
-```ruby
-class User < ActiveRecord::Base
- has_many :posts, dependent: :destroy
-end
-```
-
-Should you truly have a need for this it should be approved by a database
-specialist first.
-
-You should also not define any `before_destroy` or `after_destroy` callbacks on
-your models _unless_ absolutely required and only when approved by database
-specialists. For example, if each row in a table has a corresponding file on a
-file system it may be tempting to add a `after_destroy` hook. This however
-introduces non database logic to a model, and means we can no longer rely on
-foreign keys to remove the data as this would result in the file system data
-being left behind. In such a case you should use a service class instead that
-takes care of removing non database data.
-
-In cases where the relation spans multiple databases you have even
-further problems using `dependent: :destroy` or the above hooks. You can
-read more about alternatives at [Avoid `dependent: :nullify` and
-`dependent: :destroy` across
-databases](database/multiple_databases.md#avoid-dependent-nullify-and-dependent-destroy-across-databases).
-
-## Alternative primary keys with `has_one` associations
-
-Sometimes a `has_one` association is used to create a one-to-one relationship:
-
-```ruby
-class User < ActiveRecord::Base
- has_one :user_config
-end
-
-class UserConfig < ActiveRecord::Base
- belongs_to :user
-end
-```
-
-In these cases, there may be an opportunity to remove the unnecessary `id`
-column on the associated table, `user_config.id` in this example. Instead,
-the originating table ID can be used as the primary key for the associated
-table:
-
-```ruby
-create_table :user_configs, id: false do |t|
- t.references :users, primary_key: true, default: nil, index: false, foreign_key: { on_delete: :cascade }
- ...
-end
-```
-
-Setting `default: nil` ensures a primary key sequence is not created, and since the primary key
-automatically gets an index, we set `index: false` to avoid creating a duplicate.
-You also need to add the new primary key to the model:
-
-```ruby
-class UserConfig < ActiveRecord::Base
- self.primary_key = :user_id
-
- belongs_to :user
-end
-```
-
-Using a foreign key as primary key saves space but can make
-[batch counting](service_ping/implement.md#batch-counters) in [Service Ping](service_ping/index.md) less efficient.
-Consider using a regular `id` column if the table is relevant for Service Ping.
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/gemfile.md b/doc/development/gemfile.md
index 0fcfb88c9cd..f9cf69020bb 100644
--- a/doc/development/gemfile.md
+++ b/doc/development/gemfile.md
@@ -61,8 +61,7 @@ to a gem, go through these steps:
1. Follow the [instructions for new projects](https://about.gitlab.com/handbook/engineering/gitlab-repositories/#creating-a-new-project).
1. Follow the instructions for setting up a [CI/CD configuration](https://about.gitlab.com/handbook/engineering/gitlab-repositories/#cicd-configuration).
1. Follow the instructions for [publishing a project](https://about.gitlab.com/handbook/engineering/gitlab-repositories/#publishing-a-project).
- - See [issue
- #325463](https://gitlab.com/gitlab-org/gitlab/-/issues/325463)
+ - See [issue #325463](https://gitlab.com/gitlab-org/gitlab/-/issues/325463)
for an example.
- In some cases we may want to move a gem to its own namespace. Some
examples might be that it will naturally have more than one project
@@ -74,8 +73,8 @@ to a gem, go through these steps:
apply if someone who currently works at GitLab wants to maintain
the gem beyond their time working at GitLab.
-When publishing a gem to RubyGems.org, also note the section on [gem
-owners](https://about.gitlab.com/handbook/developer-onboarding/#ruby-gems)
+When publishing a gem to RubyGems.org, also note the section on
+[gem owners](https://about.gitlab.com/handbook/developer-onboarding/#ruby-gems)
in the handbook.
## Upgrade Rails
@@ -113,8 +112,7 @@ gem 'thor', '>= 1.1.1'
```
Here we're using the operator `>=` (greater than or equal to) rather
-than `~>` ([pessimistic
-operator](https://thoughtbot.com/blog/rubys-pessimistic-operator))
+than `~>` ([pessimistic operator](https://thoughtbot.com/blog/rubys-pessimistic-operator))
making it possible to upgrade `license_finder` or any other gem to a
version that depends on `thor 1.2`.
@@ -134,15 +132,14 @@ that also relied on `thor` but had its version pinned to a vulnerable
one. These changes are easy to miss in the `Gemfile.lock`. Pinning the
version would result in a conflict that would need to be solved.
-To avoid upgrading indirect dependencies, we can use [`bundle update
---conservative`](https://bundler.io/man/bundle-update.1.html#OPTIONS).
+To avoid upgrading indirect dependencies, we can use
+[`bundle update --conservative`](https://bundler.io/man/bundle-update.1.html#OPTIONS).
When submitting a merge request including a dependency update,
include a link to the Gem diff between the 2 versions in the merge request
description. You can find this link on `rubygems.org`, select
**Review Changes** to generate a comparison
between the versions on `diffend.io`. For example, this is the gem
-diff for [`thor` 1.0.0 vs
-1.0.1](https://my.diffend.io/gems/thor/1.0.0/1.0.1). Use the
+diff for [`thor` 1.0.0 vs 1.0.1](https://my.diffend.io/gems/thor/1.0.0/1.0.1). Use the
links directly generated from RubyGems, since the links from GitLab or other code-hosting
platforms might not reflect the code that's actually published.
diff --git a/doc/development/geo.md b/doc/development/geo.md
index 9e9bd85ecd8..f042af42de5 100644
--- a/doc/development/geo.md
+++ b/doc/development/geo.md
@@ -576,7 +576,7 @@ See `Gitlab::Geo.enabled?` and `Gitlab::Geo.license_allows?` methods.
All Geo **secondary** sites are read-only.
-The general principle of a [read-only database](verifying_database_capabilities.md#read-only-database)
+The general principle of a [read-only database](database/verifying_database_capabilities.md#read-only-database)
applies to all Geo **secondary** sites. So the
`Gitlab::Database.read_only?` method will always return `true` on a
**secondary** site.
diff --git a/doc/development/geo/proxying.md b/doc/development/geo/proxying.md
index 41c7f426c6f..2f0226c489c 100644
--- a/doc/development/geo/proxying.md
+++ b/doc/development/geo/proxying.md
@@ -128,8 +128,8 @@ Secondary-->>Client: admin/geo/replication/projects logged in response (session
## Git pull
-For historical reasons, the `push_from_secondary` path is used to forward a Git pull. There is [an issue proposing to
-rename this route](https://gitlab.com/gitlab-org/gitlab/-/issues/292690) to avoid confusion.
+For historical reasons, the `push_from_secondary` path is used to forward a Git pull. There is
+[an issue proposing to rename this route](https://gitlab.com/gitlab-org/gitlab/-/issues/292690) to avoid confusion.
### Git pull over HTTP(s)
diff --git a/doc/development/git_object_deduplication.md b/doc/development/git_object_deduplication.md
index 1a864ef81f0..a6b359769f8 100644
--- a/doc/development/git_object_deduplication.md
+++ b/doc/development/git_object_deduplication.md
@@ -18,8 +18,8 @@ GitLab implements Git object deduplication.
### Understanding Git alternates
-At the Git level, we achieve deduplication by using [Git
-alternates](https://git-scm.com/docs/gitrepository-layout#gitrepository-layout-objects).
+At the Git level, we achieve deduplication by using
+[Git alternates](https://git-scm.com/docs/gitrepository-layout#gitrepository-layout-objects).
Git alternates is a mechanism that lets a repository borrow objects from
another repository on the same machine.
@@ -44,8 +44,8 @@ reliable decide if an object is no longer needed.
### Git alternates in GitLab: pool repositories
-GitLab organizes this object borrowing by [creating special **pool
-repositories**](../administration/repository_storage_types.md) which are hidden from the user. We then use Git
+GitLab organizes this object borrowing by [creating special **pool repositories**](../administration/repository_storage_types.md)
+which are hidden from the user. We then use Git
alternates to let a collection of project repositories borrow from a
single pool repository. We call such a collection of project
repositories a pool. Pools form star-shaped networks of repositories
@@ -99,9 +99,8 @@ are as follows:
### Assumptions
-- All repositories in a pool must use [hashed
- storage](../administration/repository_storage_types.md). This is so
- that we don't have to ever worry about updating paths in
+- All repositories in a pool must use [hashed storage](../administration/repository_storage_types.md).
+ This is so that we don't have to ever worry about updating paths in
`object/info/alternates` files.
- All repositories in a pool must be on the same Gitaly storage shard.
The Git alternates mechanism relies on direct disk access across
diff --git a/doc/development/gitaly.md b/doc/development/gitaly.md
index 8a0cf8e7717..66b535682f5 100644
--- a/doc/development/gitaly.md
+++ b/doc/development/gitaly.md
@@ -79,8 +79,7 @@ During RSpec tests, the Gitaly instance writes logs to `gitlab/log/gitaly-test.l
While Gitaly can handle all Git access, many GitLab customers still
run Gitaly atop NFS. The legacy Rugged implementation for Git calls may
be faster than the Gitaly RPC due to N+1 Gitaly calls and other
-reasons. See [the
-issue](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/57317) for more
+reasons. See [the issue](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/57317) for more
details.
Until GitLab has eliminated most of these inefficiencies or the use of
diff --git a/doc/development/github_importer.md b/doc/development/github_importer.md
index 57cb74a6159..0aa1bad711d 100644
--- a/doc/development/github_importer.md
+++ b/doc/development/github_importer.md
@@ -71,8 +71,8 @@ This worker imports all pull requests. For every pull request a job for the
### 5. Stage::ImportPullRequestsMergedByWorker
-This worker imports the pull requests' _merged-by_ user information. The [_List pull
-requests_](https://docs.github.com/en/rest/reference/pulls#list-pull-requests)
+This worker imports the pull requests' _merged-by_ user information. The
+[_List pull requests_](https://docs.github.com/en/rest/pulls#list-pull-requests)
API doesn't provide this information. Therefore, this stage must fetch each merged pull request
individually to import this information. A
`Gitlab::GithubImport::ImportPullRequestMergedByWorker` job is scheduled for each fetched pull
diff --git a/doc/development/gitlab_flavored_markdown/specification_guide/index.md b/doc/development/gitlab_flavored_markdown/specification_guide/index.md
index 80837506037..756b87cd407 100644
--- a/doc/development/gitlab_flavored_markdown/specification_guide/index.md
+++ b/doc/development/gitlab_flavored_markdown/specification_guide/index.md
@@ -54,7 +54,7 @@ simultaneous in the same areas of logic. In these situations,
_GitHub_ Flavored Markdown may be referred to with variable or constant names like
`ghfm_` to avoid confusion. For example, we use the `ghfm` acronym for the
[`ghfm_spec_v_0.29.txt` GitHub Flavored Markdown specification file](#github-flavored-markdown-specification)
-which is committed to the `gitlab` repo and used as input to the
+which is committed to the `gitlab` repository and used as input to the
[`update_specification.rb` script](#update-specificationrb-script).
The original CommonMark specification is referred to as _CommonMark_ (no acronym).
@@ -86,6 +86,51 @@ it does not have a static, hardcoded, manually updated `spec.txt`. Instead, the
GLFM `spec.txt` is automatically generated based on other input files. This process
is explained in detail in the [Implementation](#implementation) sections below.
+#### Official specifications vs internal extensions
+
+Within GFM and GLFM respectively, GitHub and GitLab each support two "sets" of Markdown:
+
+- Official specification
+- Internal extensions
+
+The following chart shows the taxonomy and terminology of the various specifications:
+
+```mermaid
+graph TD
+CM[CommonMark - spec.txt - e.g. headings] --- GFMS[GFM Specification - spec.txt - e.g. strikethrough extension]
+GFMS --- GLFM[GLFM Specification - e.g. color chips]
+GFMS --- GFMI[GFM internal extensions - e.g. GitHub-specific references]
+GLFM --- GLFS[GLFM internal extensions - e.g. GitLab-specific references]
+```
+
+##### Official specifications
+
+GFM and GLFM each have an official specification, which includes both:
+
+1. The CommonMark standard.
+1. Generic extensions to the CommonMark standard.
+
+For example, GFM adds the
+[`strikethrough` extension](https://github.github.com/gfm/#strikethrough-extension-),
+and GLFM adds the
+[`color chips` extension](../../../user/markdown.md#colors).
+These extensions in the official specifications are not dependent upon any specific
+implementation or environment. They can be implemented in any third-party Markdown rendering engine.
+
+##### Internal extensions
+
+GFM and GLFM each also have a set of internal extensions. These extensions are not part of the GFM or GLFM
+official specifications, but are part of the GitHub and GitLab internal Markdown renderer and parser
+implementations. These internal extensions are often dependent upon the GitHub or GitLab
+implementations or environments, and may depend upon metadata which is only available via
+interacting with those environments. For example,
+[GitHub supports GitHub-specific autolinked references](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/autolinked-references-and-urls),
+and
+[GitLab also supports GitLab-specific references](../../../user/markdown.md#gitlab-specific-references).
+These may also be implemented by third-party Markdown rendering engines which integrate with
+GitHub or GitLab, such as editor or IDE plugins that enable the user to directly edit
+Markdown for issues, pull requests, or merge requests within the editor or IDE.
+
### Markdown examples
Everywhere in the context of the specification and this guide, the term
@@ -136,7 +181,7 @@ NOTE:
#### Markdown snapshot testing
_Markdown snapshot testing_ refers to the automated testing performed in
-the GitLab codebase, which is driven by "example_snapshots" fixture data derived from all of
+the GitLab codebase, which is driven by `example_snapshots` fixture data derived from all of
the examples in the GLFM specification. It consists of both backend RSpec tests and frontend Jest
tests which use the fixture data. This fixture data is contained in YAML files. These files
are generated and updated based on the Markdown examples in the specification,
diff --git a/doc/development/go_guide/dependencies.md b/doc/development/go_guide/dependencies.md
index 0c2ce4f2b48..2a53fa590e3 100644
--- a/doc/development/go_guide/dependencies.md
+++ b/doc/development/go_guide/dependencies.md
@@ -44,9 +44,9 @@ end with a timestamp and the first 12 characters of the commit identifier:
If a VCS tag matches one of these patterns, it is ignored.
-For a complete understanding of Go modules and versioning, see [this series of
-blog posts](https://go.dev/blog/using-go-modules) on the official Go
-website.
+For a complete understanding of Go modules and versioning, see
+[this series of blog posts](https://go.dev/blog/using-go-modules)
+on the official Go website.
## 'Module' vs 'Package'
diff --git a/doc/development/go_guide/index.md b/doc/development/go_guide/index.md
index 1a11321b70f..711b0662a8c 100644
--- a/doc/development/go_guide/index.md
+++ b/doc/development/go_guide/index.md
@@ -145,18 +145,16 @@ Go GitLab linter plugins are maintained in the [`gitlab-org/language-tools/go/li
## Dependencies
Dependencies should be kept to a minimum. The introduction of a new
-dependency should be argued in the merge request, as per our [Approval
-Guidelines](../code_review.md#approval-guidelines). Both [License
-Scanning](../../user/compliance/license_compliance/index.md)
-**(ULTIMATE)** and [Dependency
-Scanning](../../user/application_security/dependency_scanning/index.md)
-**(ULTIMATE)** should be activated on all projects to ensure new dependencies
+dependency should be argued in the merge request, as per our [Approval Guidelines](../code_review.md#approval-guidelines).
+Both [License Scanning](../../user/compliance/license_compliance/index.md)
+and [Dependency Scanning](../../user/application_security/dependency_scanning/index.md)
+should be activated on all projects to ensure new dependencies'
security status and license compatibility.
### Modules
-In Go 1.11 and later, a standard dependency system is available behind the name [Go
-Modules](https://github.com/golang/go/wiki/Modules). It provides a way to
+In Go 1.11 and later, a standard dependency system is available behind the name
+[Go Modules](https://github.com/golang/go/wiki/Modules). It provides a way to
define and lock dependencies for reproducible builds. It should be used
whenever possible.
@@ -168,8 +166,8 @@ projects, and makes merge requests easier to review.
In some cases, such as building a Go project for it to act as a dependency of a
CI run for another project, removing the `vendor/` directory means the code must
be downloaded repeatedly, which can lead to intermittent problems due to rate
-limiting or network failures. In these circumstances, you should [cache the
-downloaded code between](../../ci/caching/index.md#cache-go-dependencies).
+limiting or network failures. In these circumstances, you should
+[cache the downloaded code between runs](../../ci/caching/index.md#cache-go-dependencies).
There was a
[bug on modules checksums](https://github.com/golang/go/issues/29278) in Go versions earlier than v1.11.4, so make
@@ -330,18 +328,15 @@ A few things to keep in mind when adding context:
### References for working with errors
- [Go 1.13 errors](https://go.dev/blog/go1.13-errors).
-- [Programing with
- errors](https://peter.bourgon.org/blog/2019/09/11/programming-with-errors.html).
-- [Don't just check errors, handle them
- gracefully](https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully).
+- [Programming with errors](https://peter.bourgon.org/blog/2019/09/11/programming-with-errors.html).
+- [Don't just check errors, handle them gracefully](https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully).
## CLIs
Every Go program is launched from the command line.
[`cli`](https://github.com/urfave/cli) is a convenient package to create command
line apps. It should be used whether the project is a daemon or a simple CLI
-tool. Flags can be mapped to [environment
-variables](https://github.com/urfave/cli#values-from-the-environment) directly,
+tool. Flags can be mapped to [environment variables](https://github.com/urfave/cli#values-from-the-environment) directly,
+which simultaneously documents and centralizes all the possible command line
interactions with the program. Don't use `os.Getenv`; it hides variables deep
in the code.
@@ -362,8 +357,7 @@ Every binary ideally must have structured (JSON) logging in place as it helps
with searching and filtering the logs. At GitLab we use structured logging in
JSON format, as all our infrastructure assumes that. When using
[Logrus](https://github.com/sirupsen/logrus) you can turn on structured
-logging simply by using the build in [JSON
-formatter](https://github.com/sirupsen/logrus#formatters). This follows the
+logging simply by using the built-in [JSON formatter](https://github.com/sirupsen/logrus#formatters). This follows the
same logging type we use in our [Ruby applications](../logging.md#use-structured-json-logging).
#### How to use Logrus
@@ -414,8 +408,7 @@ should be used in functions that can block and passed as the first parameter.
Every project should have a `Dockerfile` at the root of their repository, to
build and run the project. Since Go programs are static binaries, they should
not require any external dependency, and shells in the final image are useless.
-We encourage [Multistage
-builds](https://docs.docker.com/develop/develop-images/multistage-build/):
+We encourage [Multistage builds](https://docs.docker.com/develop/develop-images/multistage-build/):
- They let the user build the project with the right Go version and
dependencies.
@@ -448,36 +441,28 @@ up to run `goimports -local gitlab.com/gitlab-org` so that it's applied to every
If initializing a slice, provide a capacity where possible to avoid extra
allocations.
-<table>
-<tr><th>:white_check_mark: Do</th><th>:x: Don't</th></tr>
-<tr>
- <td>
+**Don't:**
- ```golang
- s2 := make([]string, 0, size)
- for _, val := range s1 {
- s2 = append(s2, val)
- }
- ```
+```golang
+var s2 []string
+for _, val := range s1 {
+ s2 = append(s2, val)
+}
+```
- </td>
- <td>
+**Do:**
- ```golang
- var s2 []string
- for _, val := range s1 {
- s2 = append(s2, val)
- }
- ```
-
- </td>
-</tr>
-</table>
+```golang
+s2 := make([]string, 0, size)
+for _, val := range s1 {
+ s2 = append(s2, val)
+}
+```
If no capacity is passed to `make` when creating a new slice, `append`
will continuously resize the slice's backing array if it cannot hold
the values. Providing the capacity ensures that allocations are kept
-to a minimum. It is recommended that the [`prealloc`](https://github.com/alexkohler/prealloc)
+to a minimum. It's recommended that the [`prealloc`](https://github.com/alexkohler/prealloc)
golangci-lint rule automatically check for this.
### Analyzer Tests
diff --git a/doc/development/gotchas.md b/doc/development/gotchas.md
index d89dbbcf904..af11340737f 100644
--- a/doc/development/gotchas.md
+++ b/doc/development/gotchas.md
@@ -200,8 +200,7 @@ refresh_service.execute(oldrev, newrev, ref)
See ["Why is it bad style to `rescue Exception => e` in Ruby?"](https://stackoverflow.com/questions/10048173/why-is-it-bad-style-to-rescue-exception-e-in-ruby).
-This rule is [enforced automatically by
-RuboCop](https://gitlab.com/gitlab-org/gitlab-foss/blob/8-4-stable/.rubocop.yml#L911-914)._
+This rule is [enforced automatically by RuboCop](https://gitlab.com/gitlab-org/gitlab-foss/blob/8-4-stable/.rubocop.yml#L911-914)._
## Do not use inline JavaScript in views
diff --git a/doc/development/hash_indexes.md b/doc/development/hash_indexes.md
index 731639b6f06..2a9f7e5a25d 100644
--- a/doc/development/hash_indexes.md
+++ b/doc/development/hash_indexes.md
@@ -1,26 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/hash_indexes.md'
+remove_date: '2022-11-06'
---
-# Hash Indexes
+This document was moved to [another location](database/hash_indexes.md).
-PostgreSQL supports hash indexes besides the regular B-tree
-indexes. Hash indexes, however, are to be avoided at all costs. While they may
-_sometimes_ provide better performance, the cost of rehashing can be very high.
-More importantly: at least until PostgreSQL 10.0 hash indexes are not
-WAL-logged, meaning they are not replicated to any replicas. From the PostgreSQL
-documentation:
-
-> Hash index operations are not presently WAL-logged, so hash indexes might need
-> to be rebuilt with REINDEX after a database crash if there were unwritten
-> changes. Also, changes to hash indexes are not replicated over streaming or
-> file-based replication after the initial base backup, so they give wrong
-> answers to queries that subsequently use them. For these reasons, hash index
-> use is presently discouraged.
-
-RuboCop is configured to register an offense when it detects the use of a hash
-index.
-
-Instead of using hash indexes you should use regular B-tree indexes.
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/i18n/externalization.md b/doc/development/i18n/externalization.md
index 18704fc2b60..ea22b33f1bc 100644
--- a/doc/development/i18n/externalization.md
+++ b/doc/development/i18n/externalization.md
@@ -85,7 +85,7 @@ Or:
hello = _("Hello world!")
```
-Be careful when translating strings at the class or module level since these are only evaluated once
+Be careful when translating strings at the class or module level because these are only evaluated once
at class load time. For example:
```ruby
@@ -299,16 +299,16 @@ use `%{created_at}` in Ruby but `%{createdAt}` in JavaScript. Make sure to
- In Ruby/HAML:
```ruby
- _("Hello %{name}") % { name: 'Joe' } => 'Hello Joe'
+ format(_("Hello %{name}"), name: 'Joe') => 'Hello Joe'
```
- In Vue:
Use the [`GlSprintf`](https://gitlab-org.gitlab.io/gitlab-ui/?path=/docs/utilities-sprintf--sentence-with-link) component if:
- - You need to include child components in the translation string.
- - You need to include HTML in your translation string.
- - You're using `sprintf` and need to pass `false` as the third argument to
+ - You are including child components in the translation string.
+ - You are including HTML in your translation string.
+ - You are using `sprintf` and are passing `false` as the third argument to
prevent it from escaping placeholder values.
For example:
@@ -482,7 +482,7 @@ Instead of this:
```ruby
# incorrect usage example
-n_("%{project_name}", "%d projects selected", count) % { project_name: 'GitLab' }
+format(n_("%{project_name}", "%d projects selected", count), project_name: 'GitLab')
```
### Namespaces
diff --git a/doc/development/i18n/proofreader.md b/doc/development/i18n/proofreader.md
index cee078ca891..f986a852567 100644
--- a/doc/development/i18n/proofreader.md
+++ b/doc/development/i18n/proofreader.md
@@ -34,6 +34,7 @@ are very appreciative of the work done by translators and proofreaders!
- Weizhe Ding - [GitLab](https://gitlab.com/d.weizhe), [Crowdin](https://crowdin.com/profile/d.weizhe)
- Yi-Jyun Pan - [GitLab](https://gitlab.com/pan93412), [Crowdin](https://crowdin.com/profile/pan93412)
- Victor Wu - [GitLab](https://gitlab.com/_victorwu_), [Crowdin](https://crowdin.com/profile/victorwu)
+ - Hansel Wang - [GitLab](https://gitlab.com/airness), [Crowdin](https://crowdin.com/profile/airness)
- Chinese Traditional, Hong Kong 繁體中文 (香港)
- Victor Wu - [GitLab](https://gitlab.com/_victorwu_), [Crowdin](https://crowdin.com/profile/victorwu)
- Ivan Ip - [GitLab](https://gitlab.com/lifehome), [Crowdin](https://crowdin.com/profile/lifehome)
@@ -63,7 +64,6 @@ are very appreciative of the work done by translators and proofreaders!
- German
- Michael Hahnle - [GitLab](https://gitlab.com/mhah), [Crowdin](https://crowdin.com/profile/mhah)
- Katrin Leinweber - [GitLab](https://gitlab.com/katrinleinweber), [Crowdin](https://crowdin.com/profile/katrinleinweber)
- - Justman10000 - [GitLab](https://gitlab.com/Justman10000), [Crowdin](https://crowdin.com/profile/Justman10000)
- Vladislav Wanner - [GitLab](https://gitlab.com/RumBugen), [Crowdin](https://crowdin.com/profile/RumBugen)
- Greek
- Proofreaders needed.
diff --git a/doc/development/image_scaling.md b/doc/development/image_scaling.md
index 93575429369..2078db8294c 100644
--- a/doc/development/image_scaling.md
+++ b/doc/development/image_scaling.md
@@ -1,6 +1,6 @@
---
stage: Data Stores
-group: Memory
+group: Application Performance
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/import_export.md b/doc/development/import_export.md
index 2a29df380de..6cbbb6bf716 100644
--- a/doc/development/import_export.md
+++ b/doc/development/import_export.md
@@ -113,8 +113,8 @@ Marked stuck import jobs as failed. JIDs: xyz
While the performance problems are not tackled, there is a process to work around
importing big projects, using a foreground import:
-[Foreground import](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/5384) of big projects for customers.
-(Using the import template in the [infrastructure tracker](https://gitlab.com/gitlab-com/gl-infra/infrastructure/))
+[Foreground import](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/5384) of big projects for customers.
+(Using the import template in the [infrastructure tracker](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues))
## Security
diff --git a/doc/development/import_project.md b/doc/development/import_project.md
index c63ba229921..7c55d2e2668 100644
--- a/doc/development/import_project.md
+++ b/doc/development/import_project.md
@@ -144,7 +144,7 @@ project files on disk.
##### Import is successful, but with a `Total number of not imported relations: XX` message, and issues are not created during the import
If you receive a `Total number of not imported relations: XX` message, and issues
-aren't created during the import, check [exceptions_json.log](../administration/logs.md#exceptions_jsonlog).
+aren't created during the import, check [exceptions_json.log](../administration/logs/index.md#exceptions_jsonlog).
You might see an error like `N is out of range for ActiveModel::Type::Integer with limit 4 bytes`,
where `N` is the integer exceeding the 4-byte integer limit. If that's the case, you
are likely hitting the issue with rebalancing of `relative_position` field of the issues.
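To locate the relevant entries quickly, a `grep` over the exceptions log can help. This is a
sketch only: the path below assumes an Omnibus installation, and the error text is abbreviated.

```shell
# Sketch: find 4-byte integer overflow errors in the exceptions log.
# Path assumes an Omnibus install; for source installs, check log/exceptions_json.log instead.
grep 'out of range for ActiveModel::Type::Integer' \
  /var/log/gitlab/gitlab-rails/exceptions_json.log
```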
diff --git a/doc/development/index.md b/doc/development/index.md
index 1b897db5097..34e6f466664 100644
--- a/doc/development/index.md
+++ b/doc/development/index.md
@@ -7,9 +7,9 @@ info: "See the Technical Writers assigned to Development Guidelines: https://abo
description: "Development Guidelines: learn how to contribute to GitLab."
---
-# Contributor and Development Docs
+# Contribute to the development of GitLab
-Learn the processes and technical information needed for contributing to GitLab.
+Learn how to contribute to the development of the GitLab product.
This content is intended for members of the GitLab Team as well as community
contributors. Content specific to the GitLab Team should instead be included in
diff --git a/doc/development/insert_into_tables_in_batches.md b/doc/development/insert_into_tables_in_batches.md
index ebed3d16319..ced5332e880 100644
--- a/doc/development/insert_into_tables_in_batches.md
+++ b/doc/development/insert_into_tables_in_batches.md
@@ -1,196 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-description: "Sometimes it is necessary to store large amounts of records at once, which can be inefficient
-when iterating collections and performing individual `save`s. With the arrival of `insert_all`
-in Rails 6, which operates at the row level (that is, using `Hash`es), GitLab has added a set
-of APIs that make it safe and simple to insert ActiveRecord objects in bulk."
+redirect_to: 'database/insert_into_tables_in_batches.md'
+remove_date: '2022-11-05'
---
-# Insert into tables in batches
+This document was moved to [another location](database/insert_into_tables_in_batches.md).
-Sometimes it is necessary to store large amounts of records at once, which can be inefficient
-when iterating collections and saving each record individually. With the arrival of
-[`insert_all`](https://apidock.com/rails/ActiveRecord/Persistence/ClassMethods/insert_all)
-in Rails 6, which operates at the row level (that is, using `Hash` objects), GitLab has added a set
-of APIs that make it safe and simple to insert `ActiveRecord` objects in bulk.
-
-## Prepare `ApplicationRecord`s for bulk insertion
-
-In order for a model class to take advantage of the bulk insertion API, it has to include the
-`BulkInsertSafe` concern first:
-
-```ruby
-class MyModel < ApplicationRecord
- # other includes here
- # ...
- include BulkInsertSafe # include this last
-
- # ...
-end
-```
-
-The `BulkInsertSafe` concern has two functions:
-
-- It performs checks against your model class to ensure that it does not use ActiveRecord
- APIs that are not safe to use with respect to bulk insertions (more on that below).
-- It adds new class methods `bulk_insert!` and `bulk_upsert!`, which you can use to insert many records at once.
-
-## Insert records with `bulk_insert!` and `bulk_upsert!`
-
-If the target class passes the checks performed by `BulkInsertSafe`, you can insert an array of
-ActiveRecord model objects as follows:
-
-```ruby
-records = [MyModel.new, ...]
-
-MyModel.bulk_insert!(records)
-```
-
-Calls to `bulk_insert!` always attempt to insert _new records_. If instead
-you would like to replace existing records with new values, while still inserting those
-that do not already exist, then you can use `bulk_upsert!`:
-
-```ruby
-records = [MyModel.new, existing_model, ...]
-
-MyModel.bulk_upsert!(records, unique_by: [:name])
-```
-
-In this example, `unique_by` specifies the columns by which records are considered to be
-unique and as such are updated if they existed prior to insertion. For example, if
-`existing_model` has a `name` attribute, and if a record with the same `name` value already
-exists, its fields are updated with those of `existing_model`.
-
-The `unique_by` parameter can also be passed as a `Symbol`, in which case it specifies
-a database index by which a column is considered unique:
-
-```ruby
-MyModel.bulk_insert!(records, unique_by: :index_on_name)
-```
-
-### Record validation
-
-The `bulk_insert!` method guarantees that `records` are inserted transactionally, and
-runs validations on each record prior to insertion. If any record fails to validate,
-an error is raised and the transaction is rolled back. You can turn off validations via
-the `:validate` option:
-
-```ruby
-MyModel.bulk_insert!(records, validate: false)
-```
-
-### Batch size configuration
-
-In those cases where the number of `records` is above a given threshold, insertions
-occur in multiple batches. The default batch size is defined in
-[`BulkInsertSafe::DEFAULT_BATCH_SIZE`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
-Assuming a default threshold of 500, inserting 950 records
-would result in two batches being written sequentially (of size 500 and 450, respectively).
-You can override the default batch size via the `:batch_size` option:
-
-```ruby
-MyModel.bulk_insert!(records, batch_size: 100)
-```
-
-Assuming the same number of 950 records, this would result in 10 batches being written instead.
-Since this also affects the number of `INSERT` statements that occur, make sure you measure the
-performance impact this might have on your code. There is a trade-off between the number of
-`INSERT` statements the database has to process and the size and cost of each `INSERT`.
-
-### Handling duplicate records
-
-NOTE:
-This parameter applies only to `bulk_insert!`. If you intend to update existing
-records, use `bulk_upsert!` instead.
-
-It may happen that some records you are trying to insert already exist, which would result in
-primary key conflicts. There are two ways to address this problem: failing fast by raising an
-error or skipping duplicate records. The default behavior of `bulk_insert!` is to fail fast
-and raise an `ActiveRecord::RecordNotUnique` error.
-
-If this is undesirable, you can instead skip duplicate records with the `skip_duplicates` flag:
-
-```ruby
-MyModel.bulk_insert!(records, skip_duplicates: true)
-```
-
-### Requirements for safe bulk insertions
-
-Large parts of ActiveRecord's persistence API are built around the notion of callbacks. Many
-of these callbacks fire in response to model life cycle events such as `save` or `create`.
-These callbacks cannot be used with bulk insertions, since they are meant to be called for
-every instance that is saved or created. Since these events do not fire when
-records are inserted in bulk, we currently prevent their use.
-
-The specifics around which callbacks are explicitly allowed are defined in
-[`BulkInsertSafe`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
-Consult the module source code for details. If your class uses callbacks that are not explicitly designated
-safe and you `include BulkInsertSafe`, the application fails with an error.
-
-### `BulkInsertSafe` versus `InsertAll`
-
-Internally, `BulkInsertSafe` is based on `InsertAll`, and you may wonder when to choose
-the former over the latter. To help you make the decision,
-the key differences between these classes are listed in the table below.
-
-| | Input type | Validates input | Specify batch size | Can bypass callbacks | Transactional |
-|--------------- | -------------------- | --------------- | ------------------ | --------------------------------- | ------------- |
-| `bulk_insert!` | ActiveRecord objects | Yes (optional) | Yes (optional) | No (prevents unsafe callback use) | Yes |
-| `insert_all!` | Attribute hashes | No | No | Yes | Yes |
-
-To summarize, `BulkInsertSafe` moves bulk inserts closer to how ActiveRecord objects
-and inserts would normally behave. However, if all you need is to insert raw data in bulk, then
-`insert_all` is more efficient.
-
-## Insert `has_many` associations in bulk
-
-A common use case is to save collections of associated relations through the owner side of the relation,
-where the owned relation is associated to the owner through the `has_many` class method:
-
-```ruby
-owner = OwnerModel.new(owned_relations: array_of_owned_relations)
-# saves all `owned_relations` one-by-one
-owner.save!
-```
-
-This issues a separate `INSERT` statement and transaction for every record in `owned_relations`, which is inefficient if
-`array_of_owned_relations` is large. To remedy this, the `BulkInsertableAssociations` concern can be
-used to declare that the owner defines associations that are safe for bulk insertion:
-
-```ruby
-class OwnerModel < ApplicationRecord
- # other includes here
- # ...
- include BulkInsertableAssociations # include this last
-
- has_many :my_models
-end
-```
-
-Here `my_models` must be declared `BulkInsertSafe` (as described previously) for bulk insertions
-to happen. You can now insert any yet unsaved records as follows:
-
-```ruby
-BulkInsertableAssociations.with_bulk_insert do
- owner = OwnerModel.new(my_models: array_of_my_model_instances)
- # saves `my_models` using a single bulk insert (possibly via multiple batches)
- owner.save!
-end
-```
-
-You can still save relations that are not `BulkInsertSafe` in this block; they
-simply are treated as if you had invoked `save` from outside the block.
-
-## Known limitations
-
-There are a few restrictions to how these APIs can be used:
-
-- `BulkInsertableAssociations`:
- - It is currently only compatible with `has_many` relations.
- - It does not yet support `has_many through: ...` relations.
-
-Moreover, input data should either be limited to around 1000 records at most,
-or already batched prior to calling bulk insert. The `INSERT` statement runs in a single
-transaction, so for large amounts of records it may negatively affect database stability.
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/integrations/index.md b/doc/development/integrations/index.md
index 5d1bd5ad61c..0387ba2e4dd 100644
--- a/doc/development/integrations/index.md
+++ b/doc/development/integrations/index.md
@@ -273,7 +273,7 @@ When developing a new integration, we also recommend you gate the availability b
You can provide help text in the integration form, including links to off-site documentation,
as described above in [Customize the frontend form](#customize-the-frontend-form). Refer to
-our [usability guidelines](https://design.gitlab.com/usability/helping-users) for help text.
+our [usability guidelines](https://design.gitlab.com/usability/helping-users/) for help text.
For more detailed documentation, provide a page in `doc/user/project/integrations`,
and link it from the [Integrations overview](../../user/project/integrations/index.md).
diff --git a/doc/development/integrations/secure.md b/doc/development/integrations/secure.md
index 0a0c5e4d2a6..55e57a3c2ee 100644
--- a/doc/development/integrations/secure.md
+++ b/doc/development/integrations/secure.md
@@ -151,7 +151,7 @@ Depending on the CI infrastructure,
the CI may have to fetch the Docker image every time the job runs.
For the scanning job to run fast and avoid wasting bandwidth, Docker images should be as small as
possible. You should aim for 50MB or smaller. If that isn't possible, try to keep it below 1.46 GB,
-which is the size of a CD-ROM.
+which is the size of a DVD-ROM.
If the scanner requires a fully functional Linux environment,
it is recommended to use a [Debian](https://www.debian.org/intro/about) "slim" distribution or [Alpine Linux](https://www.alpinelinux.org/).
@@ -253,6 +253,10 @@ then `artifacts:reports:dependency_scanning` must be set to `depscan.json`.
Following the POSIX exit code standard, the scanner exits with 0 for success and any number from 1 to 255 for anything else.
Success also includes the case when vulnerabilities are found.
+When a CI job fails, security report results are not ingested by GitLab, even if the job
+[allows failure](../../ci/yaml/index.md#allow_failure). The report artifacts are still uploaded to GitLab and available
+for [download in the pipeline security tab](../../user/application_security/vulnerability_report/pipeline.md#download-security-scan-outputs).
+
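+As a hedged sketch, those uploaded artifacts can also be retrieved through the standard job
+artifacts API; the host, project ID, and job ID below are placeholders:
+
+```shell
+# Sketch: download the artifacts of the failed scan job, which include the report.
+curl --header "PRIVATE-TOKEN: <your_access_token>" \
+  "https://gitlab.example.com/api/v4/projects/<project_id>/jobs/<job_id>/artifacts"
+```
+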
When executing a scanning job using the [Docker-in-Docker privileged mode](../../user/application_security/sast/index.md#requirements),
we reserve the following standard exit codes.
@@ -310,7 +314,7 @@ This documentation gives an overview of the report JSON format,
as well as recommendations and examples to help integrators set its fields.
The format is extensively described in the documentation of
[SAST](../../user/application_security/sast/index.md#reports-json-format),
-[DAST](../../user/application_security/dast/#reports),
+[DAST](../../user/application_security/dast/index.md#reports),
[Dependency Scanning](../../user/application_security/dependency_scanning/index.md#reports-json-format),
and [Container Scanning](../../user/application_security/container_scanning/index.md#reports-json-format)
@@ -493,19 +497,20 @@ We recommend that you use the identifiers the GitLab scanners already define:
|------------|------|---------------|
| [CVE](https://cve.mitre.org/cve/) | `cve` | CVE-2019-10086 |
| [CWE](https://cwe.mitre.org/data/index.html) | `cwe` | CWE-1026 |
+| [ELSA](https://linux.oracle.com/security/) | `elsa` | ELSA-2020-0085 |
| [OSVD](https://cve.mitre.org/data/refs/refmap/source-OSVDB.html) | `osvdb` | OSVDB-113928 |
+| [OWASP](https://owasp.org/Top10/) | `owasp` | A01:2021–Broken Access Control Design |
+| [RHSA](https://access.redhat.com/errata-search/#/) | `rhsa` | RHSA-2020:0111 |
| [USN](https://ubuntu.com/security/notices) | `usn` | USN-4234-1 |
| [WASC](http://projects.webappsec.org/Threat-Classification-Reference-Grid) | `wasc` | WASC-19 |
-| [RHSA](https://access.redhat.com/errata/#/) | `rhsa` | RHSA-2020:0111 |
-| [ELSA](https://linux.oracle.com/security/) | `elsa` | ELSA-2020-0085 |
The generic identifiers listed above are defined in the [common library](https://gitlab.com/gitlab-org/security-products/analyzers/common),
which is shared by some of the analyzers that GitLab maintains. You can [contribute](https://gitlab.com/gitlab-org/security-products/analyzers/common/blob/master/issue/identifier.go)
new generic identifiers if needed. Analyzers may also produce vendor-specific or product-specific
identifiers, which don't belong in the [common library](https://gitlab.com/gitlab-org/security-products/analyzers/common).
-The first item of the `identifiers` array is called the [primary
-identifier](../../user/application_security/terminology/#primary-identifier).
+The first item of the `identifiers` array is called the
+[primary identifier](../../user/application_security/terminology/index.md#primary-identifier).
The primary identifier is particularly important, because it is used to
[track vulnerabilities](#tracking-and-merging-vulnerabilities) as new commits are pushed to the repository.
Identifiers are also used to [merge duplicate vulnerabilities](#tracking-and-merging-vulnerabilities)
diff --git a/doc/development/internal_api/index.md b/doc/development/internal_api/index.md
index 13e095b4a83..9b29af3e433 100644
--- a/doc/development/internal_api/index.md
+++ b/doc/development/internal_api/index.md
@@ -148,8 +148,8 @@ curl --request POST --header "Gitlab-Shared-Secret: <Base64 encoded token>" \
## Authorized Keys Check
This endpoint is called by the GitLab Shell authorized keys
-check. Which is called by OpenSSH for [fast SSH key
-lookup](../../administration/operations/fast_ssh_key_lookup.md).
+check, which is called by OpenSSH for
+[fast SSH key lookup](../../administration/operations/fast_ssh_key_lookup.md).
| Attribute | Type | Required | Description |
|:----------|:-------|:---------|:------------|
@@ -494,10 +494,15 @@ curl --request GET --header "Gitlab-Kas-Api-Request: <JWT token>" \
Called from GitLab agent server (`kas`) to increase the usage
metric counters.
-| Attribute | Type | Required | Description |
-|:----------|:-------|:---------|:------------|
-| `gitops_sync_count` | integer| no | The number to increase the `gitops_sync_count` counter by |
-| `k8s_api_proxy_request_count` | integer| no | The number to increase the `k8s_api_proxy_request_count` counter by |
+| Attribute | Type | Required | Description |
+|:---------------------------------------------------------------------------|:--------------|:---------|:-----------------------------------------------------------------------------------------------------------------|
+| `gitops_sync_count` (DEPRECATED) | integer | no | The number to increase the `gitops_sync` counter by |
+| `k8s_api_proxy_request_count` (DEPRECATED) | integer | no | The number to increase the `k8s_api_proxy_request` counter by |
+| `counters` | hash | no | Hash of counter names to the numbers to increase them by |
+| `counters["k8s_api_proxy_request"]` | integer | no | The number to increase the `k8s_api_proxy_request` counter by |
+| `counters["gitops_sync"]` | integer | no | The number to increase the `gitops_sync` counter by |
+| `unique_counters` | hash | no | Hash of unique counter names to arrays of unique values to track |
+| `unique_counters["agent_users_using_ci_tunnel"]` | integer array | no | The set of unique user IDs that have interacted with a CI Tunnel, to track the `agent_users_using_ci_tunnel` metric event |
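+
+For illustration, a request to the endpoint below that combines the newer `counters` and
+`unique_counters` attributes might look like this sketch; the host and values are placeholders,
+and the payload shape is inferred from the table above:
+
+```shell
+curl --request POST --header "Gitlab-Kas-Api-Request: <JWT token>" \
+  --header "Content-Type: application/json" \
+  --data '{"counters": {"gitops_sync": 1}, "unique_counters": {"agent_users_using_ci_tunnel": [42, 43]}}' \
+  "http://localhost:3000/api/v4/internal/kubernetes/usage_metrics"
+```
+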
```plaintext
POST /internal/kubernetes/usage_metrics
@@ -624,6 +629,40 @@ Example response:
}
```
+### Policy Configuration
+
+Called from GitLab agent server (`kas`) to retrieve `policies_configuration`
+configured for the project belonging to the agent token. GitLab `kas` uses
+this to configure the agent to scan images in the Kubernetes cluster based on the configuration.
+
+```plaintext
+GET /internal/kubernetes/modules/starboard_vulnerability/policies_configuration
+```
+
+Example request:
+
+```shell
+curl --request GET --header "Gitlab-Kas-Api-Request: <JWT token>" \
+ --header "Authorization: Bearer <agent token>" "http://localhost:3000/api/v4/internal/kubernetes/modules/starboard_vulnerability/policies_configuration"
+```
+
+Example response:
+
+```json
+{
+ "configurations": [
+ {
+ "cadence": "30 2 * * *",
+ "namespaces": [
+ "namespace-a",
+ "namespace-b"
+ ],
+ "updated_at": "2022-06-02T05:36:26+00:00"
+ }
+ ]
+}
+```
+
## Subscriptions
The subscriptions endpoint is used by [CustomersDot](https://gitlab.com/gitlab-org/customers-gitlab-com) (`customers.gitlab.com`)
diff --git a/doc/development/issue_types.md b/doc/development/issue_types.md
index e6047c62827..820c37aeb14 100644
--- a/doc/development/issue_types.md
+++ b/doc/development/issue_types.md
@@ -13,10 +13,8 @@ Sometimes when a new resource type is added it's not clear if it should be only
"extension" of Issue (Issue Type) or if it should be a new first-class resource type
(similar to issue, epic, merge request, snippet).
-The idea of Issue Types was first proposed in [this
-issue](https://gitlab.com/gitlab-org/gitlab/-/issues/8767) and its usage was
-discussed few times since then, for example in [incident
-management](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/55532).
+The idea of Issue Types was first proposed in [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/8767) and its usage was
+discussed a few times since then, for example in [incident management](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/55532).
## What is an Issue Type
diff --git a/doc/development/iterating_tables_in_batches.md b/doc/development/iterating_tables_in_batches.md
index 1159e3755e5..589e38a5cb0 100644
--- a/doc/development/iterating_tables_in_batches.md
+++ b/doc/development/iterating_tables_in_batches.md
@@ -1,598 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/iterating_tables_in_batches.md'
+remove_date: '2022-11-06'
---
-# Iterating tables in batches
+This document was moved to [another location](database/iterating_tables_in_batches.md).
-Rails provides a method called `in_batches` that can be used to iterate over
-rows in batches. For example:
-
-```ruby
-User.in_batches(of: 10) do |relation|
- relation.update_all(updated_at: Time.now)
-end
-```
-
-Unfortunately, this method is implemented in a way that is not very efficient,
-in terms of both query and memory usage.
-
-To work around this you can include the `EachBatch` module into your models,
-then use the `each_batch` class method. For example:
-
-```ruby
-class User < ActiveRecord::Base
- include EachBatch
-end
-
-User.each_batch(of: 10) do |relation|
- relation.update_all(updated_at: Time.now)
-end
-```
-
-This produces queries such as:
-
-```plaintext
-User Load (0.7ms) SELECT "users"."id" FROM "users" WHERE ("users"."id" >= 41654) ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000
- (0.7ms) SELECT COUNT(*) FROM "users" WHERE ("users"."id" >= 41654) AND ("users"."id" < 42687)
-```
-
-The API of this method is similar to `in_batches`, though it doesn't support
-all of the arguments that `in_batches` supports. You should always use
-`each_batch` _unless_ you have a specific need for `in_batches`.
-
-## Iterating over non-unique columns
-
-One should proceed with extra caution. When you iterate over an attribute that is not unique,
-even with the applied max batch size, there is no guarantee that the resulting batches do not
-surpass it. The following snippet demonstrates this situation: when one attempts to select
-`Ci::Build` entries for users with `id` between `1` and `10,000`, the database returns
-`1 215 178` matching rows.
-
-```ruby
-[ gstg ] production> Ci::Build.where(user_id: (1..10_000)).size
-=> 1215178
-```
-
-This happens because the built relation is translated into the following query:
-
-```ruby
-[ gstg ] production> puts Ci::Build.where(user_id: (1..10_000)).to_sql
-SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."user_id" BETWEEN 1 AND 10000
-=> nil
-```
-
-Such queries filter a non-unique column by range (`WHERE "ci_builds"."user_id" BETWEEN ? AND ?`):
-even though the range size is limited to a certain threshold (`10,000` in the previous example), this
-threshold does not translate to the size of the returned dataset. That happens because when taking
-`n` possible values of an attribute, one can't tell for sure that the number of records that contain
-them is less than `n`.
-
-### Loose-index scan with `distinct_each_batch`
-
-When iterating over a non-unique column is necessary, use the `distinct_each_batch` helper
-method. The helper uses the [loose-index scan technique](https://wiki.postgresql.org/wiki/Loose_indexscan)
-(skip-index scan) to skip duplicated values within a database index.
-
-Example: iterating over distinct `author_id` in the Issue model
-
-```ruby
-Issue.distinct_each_batch(column: :author_id, of: 1000) do |relation|
- users = User.where(id: relation.select(:author_id)).to_a
-end
-```
-
-The technique provides stable performance between the batches regardless of the data distribution.
-The `relation` object returns an ActiveRecord scope where only the given `column` is available.
-Other columns are not loaded.
-
-The underlying database queries use recursive CTEs, which adds extra overhead. We therefore advise using
-smaller batch sizes than those used for a standard `each_batch` iteration.
-
-## Column definition
-
-`EachBatch` uses the primary key of the model by default for the iteration. This works in most
-cases; however, in some cases you might want to use a different column for the iteration.
-
-```ruby
-Project.distinct.each_batch(column: :creator_id, of: 10) do |relation|
- puts User.where(id: relation.select(:creator_id)).map(&:id)
-end
-```
-
-The query above iterates over the project creators and prints them out without duplications.
-
-NOTE:
-In case the column is not unique (no unique index definition), calling the `distinct` method on
-the relation is necessary. Using a non-unique column without `distinct` may result in `each_batch`
-falling into an endless loop, as described in the following
-[issue](https://gitlab.com/gitlab-org/gitlab/-/issues/285097).
-
-## `EachBatch` in data migrations
-
-When dealing with data migrations the preferred way to iterate over a large volume of data is using
-`EachBatch`.
-
-A special case of data migration is a [background migration](database/background_migrations.md#scheduling)
-where the actual data modification is executed in a background job. The migration code that
-determines the data ranges (slices) and schedules the background jobs uses `each_batch`.
-
-## Efficient usage of `each_batch`
-
-`EachBatch` helps to iterate over large tables. It's important to highlight that `EachBatch`
-does not magically solve all iteration-related performance problems, and it might not help at
-all in some scenarios. From the database point of view, correctly configured database indexes are
-also necessary to make `EachBatch` perform well.
-
-### Example 1: Simple iteration
-
-Let's consider that we want to iterate over the `users` table and print the `User` records to the
-standard output. The `users` table contains millions of records, thus running one query to fetch
-the users likely times out.
-
-![Users table overview](img/each_batch_users_table_v13_7.png)
-
-This is a simplified version of the `users` table which contains several rows. We have a few
-smaller gaps in the `id` column to make the example a bit more realistic (a few records were
-already deleted). Currently, we have one index on the `id` field.
-
-Loading all users into memory (avoid):
-
-```ruby
-users = User.all
-
-users.each { |user| puts user.inspect }
-```
-
-Use `each_batch`:
-
-```ruby
-# Note: for this example I picked 5 as the batch size, the default is 1_000
-User.each_batch(of: 5) do |relation|
- relation.each { |user| puts user.inspect }
-end
-```
-
-#### How `each_batch` works
-
-As the first step, it finds the lowest `id` (start `id`) in the table by executing the following
-database query:
-
-```sql
-SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT 1
-```
-
-![Reading the start ID value](img/each_batch_users_table_iteration_1_v13_7.png)
-
-Notice that the query only reads data from the index (`INDEX ONLY SCAN`); the table is not
-accessed. Database indexes are sorted, so taking out the first item is a very cheap operation.
-
-The next step is to find the next `id` (end `id`) which should respect the batch size
-configuration. In this example we used a batch size of 5. `EachBatch` uses the `OFFSET` clause
-to get a "shifted" `id` value.
-
-```sql
-SELECT "users"."id" FROM "users" WHERE "users"."id" >= 1 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 5
-```
-
-![Reading the end ID value](img/each_batch_users_table_iteration_2_v13_7.png)
-
-Again, the query only looks into the index. The `OFFSET 5` takes out the sixth `id` value: this
-query reads a maximum of six items from the index regardless of the table size or the iteration
-count.
-
-At this point, we know the `id` range for the first batch. Now it's time to construct the query
-for the `relation` block.
-
-```sql
-SELECT "users".* FROM "users" WHERE "users"."id" >= 1 AND "users"."id" < 302
-```
-
-![Reading the rows from the `users` table](img/each_batch_users_table_iteration_3_v13_7.png)
-
-Notice the `<` sign. Previously six items were read from the index and in this query, the last
-value is "excluded". The query looks at the index to get the location of the five `user`
-rows on the disk and reads the rows from the table. The returned array is processed in Ruby.
-
-The first iteration is done. For the next iteration, the last `id` value is reused from the
-previous iteration in order to find out the next end `id` value.
-
-```sql
-SELECT "users"."id" FROM "users" WHERE "users"."id" >= 302 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 5
-```
-
-![Reading the second end ID value](img/each_batch_users_table_iteration_4_v13_7.png)
-
-Now we can easily construct the `users` query for the second iteration.
-
-```sql
-SELECT "users".* FROM "users" WHERE "users"."id" >= 302 AND "users"."id" < 353
-```
-
-![Reading the rows for the second iteration from the users table](img/each_batch_users_table_iteration_5_v13_7.png)
-
-### Example 2: Iteration with filters
-
-Building on top of the previous example, we want to print users with zero sign-in count. We keep
-track of the number of sign-ins in the `sign_in_count` column so we write the following code:
-
-```ruby
-users = User.where(sign_in_count: 0)
-
-users.each_batch(of: 5) do |relation|
- relation.each { |user| puts user.inspect }
-end
-```
-
-`each_batch` produces the following SQL query for the start `id` value:
-
-```sql
-SELECT "users"."id" FROM "users" WHERE "users"."sign_in_count" = 0 ORDER BY "users"."id" ASC LIMIT 1
-```
-
-Selecting only the `id` column and ordering by `id` forces the database to use the
-index on the `id` (primary key index) column. However, we also have an extra condition on the
-`sign_in_count` column. The column is not part of the index, so the database needs to look into
-the actual table to find the first matching row.
-
-![Reading the index with extra filter](img/each_batch_users_table_filter_v13_7.png)
-
-NOTE:
-The number of scanned rows depends on the data distribution in the table.
-
-- Best case scenario: the first user has never logged in. The database reads only one row.
-- Worst case scenario: all users have logged in at least once. The database reads all rows.
-
-In this particular example, the database had to read 10 rows (regardless of our batch size setting)
-to determine the first `id` value. In a "real-world" application it's hard to predict whether the
-filtering causes problems or not. In the case of GitLab, verifying the data on a
-production replica is a good start, but keep in mind that data distribution on GitLab.com can be
-different from self-managed instances.
-
-#### Improve filtering with `each_batch`
-
-##### Specialized conditional index
-
-```sql
-CREATE INDEX index_on_users_never_logged_in ON users (id) WHERE sign_in_count = 0
-```
-
-This is what our table and the newly created index look like:
-
-![Reading the specialized index](img/each_batch_users_table_filtered_index_v13_7.png)
-
-This index definition covers the conditions on the `id` and `sign_in_count` columns and thus makes the
-`each_batch` queries very effective (similar to the simple iteration example).
-
-It's rare for a user to have never signed in, so we anticipate a small index size. Including only the
-`id` in the index definition also helps to keep the index size small.
-
-##### Index on columns
-
-Later on, we might want to iterate over the table filtering for different `sign_in_count` values. In
-those cases, we cannot use the previously suggested conditional index because the `WHERE` condition
-does not match with our new filter (`sign_in_count > 10`).
-
-To address this problem, we have two options:
-
-- Create another, conditional index to cover the new query.
-- Replace the index with a more generalized configuration.
-
-NOTE:
-Having multiple indexes on the same table and on the same columns could be a performance bottleneck
-when writing data.
-
-Let's consider the following index (avoid):
-
-```sql
-CREATE INDEX index_on_users_never_logged_in ON users (id, sign_in_count)
-```
-
-The index definition starts with the `id` column, which makes the index very inefficient from a data
-selectivity point of view.
-
-```sql
-SELECT "users"."id" FROM "users" WHERE "users"."sign_in_count" = 0 ORDER BY "users"."id" ASC LIMIT 1
-```
-
-Executing the query above results in an `INDEX ONLY SCAN`. However, the query still needs to
-iterate over an unknown number of entries in the index, and then find the first item where the
-`sign_in_count` is `0`.
-
-![Reading an ineffective index](img/each_batch_users_table_bad_index_v13_7.png)
-
-We can improve the query significantly by swapping the columns in the index definition (prefer).
-
-```sql
-CREATE INDEX index_on_users_never_logged_in ON users (sign_in_count, id)
-```
-
-![Reading a good index](img/each_batch_users_table_good_index_v13_7.png)
-
-The following index definition does not work well with `each_batch` (avoid).
-
-```sql
-CREATE INDEX index_on_users_never_logged_in ON users (sign_in_count)
-```
-
-Since `each_batch` builds range queries based on the `id` column, this index cannot be used
-efficiently. The DB reads the rows from the table or uses a bitmap search where the primary
-key index is also read.
-
-##### "Slow" iteration
-
-Slow iteration means that we use a good index configuration to iterate over the table and
-apply filtering on the yielded relation.
-
-```ruby
-User.each_batch(of: 5) do |relation|
- relation.where(sign_in_count: 0).each { |user| puts user.inspect }
-end
-```
-
-The iteration uses the primary key index (on the `id` column) which makes it safe from statement
-timeouts. The filter (`sign_in_count: 0`) is applied on the `relation` where the `id` is already
-constrained (range). The number of rows is limited.
-
-Slow iteration generally takes more time to finish. The iteration count is higher and
-one iteration could yield fewer records than the batch size. Iterations may even yield
-0 records. This is not an optimal solution; however, in some cases (especially when
-dealing with large tables) this is the only viable option.
-
-### Using Subqueries
-
-Using subqueries in your `each_batch` query does not work well in most cases. Consider the following example:
-
-```ruby
-projects = Project.where(creator_id: Issue.where(confidential: true).select(:author_id))
-
-projects.each_batch do |relation|
- # do something
-end
-```
-
-The iteration uses the `id` column of the `projects` table. The batching does not affect the
-subquery. This means for each iteration, the subquery is executed by the database. This adds a
-constant "load" on the query which often ends up in statement timeouts. We have an unknown number
-of [confidential issues](../user/project/issues/confidential_issues.md), the execution time
-and the accessed database rows depend on the data distribution in the `issues` table.
-
-NOTE:
-Using subqueries works only when the subquery returns a small number of rows.
-
-#### Improving Subqueries
-
-When dealing with subqueries, a slow iteration approach could work: the filter on `creator_id`
-can be part of the generated `relation` object.
-
-```ruby
-projects = Project.all
-
-projects.each_batch do |relation|
- relation.where(creator_id: Issue.where(confidential: true).select(:author_id))
-end
-```
-
-If the query on the `issues` table itself is not performant enough, a nested loop could be
-constructed. Try to avoid it when possible.
-
-```ruby
-projects = Project.all
-
-projects.each_batch do |relation|
- issues = Issue.where(confidential: true)
-
- issues.each_batch do |issues_relation|
- relation.where(creator_id: issues_relation.select(:author_id))
- end
-end
-```
-
-If we know that the `issues` table has many more rows than `projects`, it would make sense to flip
-the queries, where the `issues` table is batched first.
-
-### Using `JOIN` and `EXISTS`
-
-When to use `JOIN`s:
-
-- When there's a 1:1 or 1:N relationship between the tables where we know that the joined record
-(almost) always exists. This works well for "extension-like" tables:
- - `projects` - `project_settings`
- - `users` - `user_details`
- - `users` - `user_statuses`
-- `LEFT JOIN` works well in this case. Conditions on the joined table need to go to the yielded
-relation so the iteration is not affected by the data distribution in the joined table.
-
-Example:
-
-```ruby
-users = User.joins("LEFT JOIN personal_access_tokens on personal_access_tokens.user_id = users.id")
-
-users.each_batch do |relation|
- relation.where("personal_access_tokens.name = 'name'")
-end
-```
-
-`EXISTS` queries should be added only to the inner `relation` of the `each_batch` query:
-
-```ruby
-User.each_batch do |relation|
- relation.where("EXISTS (SELECT 1 FROM ...")
-end
-```
-
-### Complex queries on the relation object
-
-When the `relation` object has several extra conditions, the execution plans might become
-"unstable".
-
-Example:
-
-```ruby
-Issue.each_batch do |relation|
- relation
- .joins(:metrics)
- .joins(:merge_requests_closing_issues)
- .where("id IN (SELECT ...)")
- .where(confidential: true)
-end
-```
-
-Here, we expect that the `relation` query reads the `BATCH_SIZE` of issue records and then
-filters down the results according to the provided queries. The planner might decide that
-using a bitmap index lookup with the index on the `confidential` column is a better way to
-execute the query. This can cause an unexpectedly high amount of rows to be read and the
-query could time out.
-
-Problem: we know for sure that the relation returns at most `BATCH_SIZE` records;
-however, the planner does not know this.
-
-Common table expression (CTE) trick to force the range query to execute first:
-
-```ruby
-Issue.each_batch(of: 1000) do |relation|
- cte = Gitlab::SQL::CTE.new(:batched_relation, relation.limit(1000))
-
- scope = cte
- .apply_to(Issue.all)
- .joins(:metrics)
- .joins(:merge_requests_closing_issues)
- .where("id IN (SELECT ...)")
- .where(confidential: true)
-
- puts scope.to_a
-end
-```
-
-### `EachBatch` vs `BatchCount`
-
-When adding new counters for Service Ping, the preferred way to count records is using the
-`Gitlab::Database::BatchCount` class. The iteration logic implemented in `BatchCount`
-has performance characteristics similar to `EachBatch`. Most of the tips and suggestions
-for improving `EachBatch` mentioned above apply to `BatchCount` as well.
-
-## Iterate with keyset pagination
-
-There are a few special cases where iterating with `EachBatch` does not work. `EachBatch`
-requires one distinct column (usually the primary key), which makes the iteration impossible
-for timestamp columns and tables with composite primary keys.
-
-Where `EachBatch` does not work, you can use
-[keyset pagination](database/pagination_guidelines.md#keyset-pagination) to iterate over the
-table or a range of rows. The scaling and performance characteristics are very similar to
-`EachBatch`.
-
-Examples:
-
-- Iterate over the table in a specific order (timestamp columns) in combination with a tie-breaker
-if the column used for sorting does not contain unique values.
-- Iterate over the table with composite primary keys.
-
-### Iterate over the issues in a project by creation date
-
-You can use keyset pagination to iterate over any database column in a specific order (for example,
-`created_at DESC`). To ensure consistent order of the returned records with the same values for
-`created_at`, use a tie-breaker column with unique values (for example, `id`).
-
-Assume you have the following index in the `issues` table:
-
-```sql
-"idx_issues_on_project_id_and_created_at_and_id" btree (project_id, created_at, id)
-```
-
-### Fetching records for further processing
-
-The following snippet iterates over issue records within the project using the specified order
-(`created_at, id`).
-
-```ruby
-scope = Issue.where(project_id: 278964).order(:created_at, :id) # id is the tie-breaker
-
-iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
-
-iterator.each_batch(of: 100) do |records|
- puts records.map(&:id)
-end
-```
-
-You can add extra filters to the query. This example only lists the issue IDs created in the last
-30 days:
-
-```ruby
-scope = Issue.where(project_id: 278964).where('created_at > ?', 30.days.ago).order(:created_at, :id) # id is the tie-breaker
-
-iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
-
-iterator.each_batch(of: 100) do |records|
- puts records.map(&:id)
-end
-```
-
-### Updating records in the batch
-
-For complex `ActiveRecord` queries, the `.update_all` method does not work well because it
-generates an incorrect `UPDATE` statement.
-You can use raw SQL for updating records in batches:
-
-```ruby
-scope = Issue.where(project_id: 278964).order(:created_at, :id) # id is the tie-breaker
-
-iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
-
-iterator.each_batch(of: 100) do |records|
- ApplicationRecord.connection.execute("UPDATE issues SET updated_at=NOW() WHERE issues.id in (#{records.dup.reselect(:id).to_sql})")
-end
-```
-
-NOTE:
-To keep the iteration stable and predictable, avoid updating the columns in the `ORDER BY` clause.
-
-### Iterate over the `merge_request_diff_commits` table
-
-The `merge_request_diff_commits` table uses a composite primary key (`merge_request_diff_id,
-relative_order`), which makes `EachBatch` impossible to use efficiently.
-
-To paginate over the `merge_request_diff_commits` table, you can use the following snippet:
-
-```ruby
-# Custom order object configuration:
-order = Gitlab::Pagination::Keyset::Order.build([
- Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
- attribute_name: 'merge_request_diff_id',
- order_expression: MergeRequestDiffCommit.arel_table[:merge_request_diff_id].asc,
- nullable: :not_nullable,
- distinct: false,
- ),
- Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
- attribute_name: 'relative_order',
- order_expression: MergeRequestDiffCommit.arel_table[:relative_order].asc,
- nullable: :not_nullable,
- distinct: false,
- )
-])
-MergeRequestDiffCommit.include(FromUnion) # keyset pagination generates UNION queries
-
-scope = MergeRequestDiffCommit.order(order)
-
-iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
-
-iterator.each_batch(of: 100) do |records|
- puts records.map { |record| [record.merge_request_diff_id, record.relative_order] }.inspect
-end
-```
-
-### Order object configuration
-
-Keyset pagination works well with simple `ActiveRecord` `order` scopes
-([first example](iterating_tables_in_batches.md#iterate-over-the-issues-in-a-project-by-creation-date)).
-However, in special cases, you need to describe the columns in the `ORDER BY` clause (second example)
-for the underlying keyset pagination library. When the `ORDER BY` configuration cannot be
-automatically determined by the keyset pagination library, an error is raised.
-
-The code comments of the
-[`Gitlab::Pagination::Keyset::Order`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/pagination/keyset/order.rb)
-and [`Gitlab::Pagination::Keyset::ColumnOrderDefinition`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/pagination/keyset/column_order_definition.rb)
-classes give an overview of the possible options for configuring the `ORDER BY` clause. You can
-also find a few code examples in the
-[keyset pagination](database/keyset_pagination.md#complex-order-configuration) documentation.
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/jh_features_review.md b/doc/development/jh_features_review.md
index 88830a80bf1..7b81ecfe8f5 100644
--- a/doc/development/jh_features_review.md
+++ b/doc/development/jh_features_review.md
@@ -62,7 +62,8 @@ For features that build on existing CE/EE features, a module in the `JH`
namespace injected in the CE/EE class/module is needed. This aligns with
what we're doing with EE features.
-See [EE features based on CE features](ee_features.md#ee-features-based-on-ce-features) for more details.
+See [Extend CE features with EE backend code](ee_features.md#extend-ce-features-with-ee-backend-code)
+for more details.
For example, to prepend a module into the `User` class you would use
the following approach:
diff --git a/doc/development/kubernetes.md b/doc/development/kubernetes.md
index ee261769d82..1c83bef5620 100644
--- a/doc/development/kubernetes.md
+++ b/doc/development/kubernetes.md
@@ -155,7 +155,7 @@ Mitigation strategies include:
## Debugging Kubernetes integrations
Logs related to the Kubernetes integration can be found in
-[`kubernetes.log`](../administration/logs.md#kuberneteslog). On a local
+[`kubernetes.log`](../administration/logs/index.md#kuberneteslog). On a local
GDK install, these logs are present in `log/kubernetes.log`.
Some services such as
diff --git a/doc/development/lfs.md b/doc/development/lfs.md
index 9b78c8869b1..5900eb68294 100644
--- a/doc/development/lfs.md
+++ b/doc/development/lfs.md
@@ -76,14 +76,13 @@ process, which writes the contents to the standard output.
1. The archive data is sent back to the client.
In step 7, the `gitaly-lfs-smudge` filter must talk to Workhorse, not to
-Rails, or an invalid LFS blob is saved. To support this, GitLab
-13.5 [changed the default Omnibus configuration to have Gitaly talk to
-the Workhorse](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4592)
+Rails, or an invalid LFS blob is saved. To support this, GitLab 13.5
+[changed the default Omnibus configuration to have Gitaly talk to the Workhorse](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4592)
instead of Rails.
One side effect of this change: the correlation ID of the original
request is not preserved for the internal API requests made by Gitaly
(or `gitaly-lfs-smudge`), such as the one made in step 8. The
-correlation IDs for those API requests are random values until [this
-Workhorse issue](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/309) is
+correlation IDs for those API requests are random values until
+[this Workhorse issue](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/309) is
resolved.
diff --git a/doc/development/licensed_feature_availability.md b/doc/development/licensed_feature_availability.md
index 21b07ae89b5..b007df8f1da 100644
--- a/doc/development/licensed_feature_availability.md
+++ b/doc/development/licensed_feature_availability.md
@@ -1,72 +1,11 @@
---
-stage: Fulfillment
-group: Provision
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'ee_features.md'
+remove_date: '2022-10-08'
---
-# Licensed feature availability
+This document was moved to [another location](ee_features.md).
-As of GitLab 9.4, we've been supporting a simplified version of licensed
-feature availability checks via `ee/app/models/license.rb`, both for
-on-premise and GitLab.com plans and features.
-
-## Restricting features scoped by namespaces or projects
-
-GitLab.com plans are persisted on user groups and namespaces. Therefore, if you're adding a
-feature such as [Related issues](../user/project/issues/related_issues.md) or
-[Service Desk](../user/project/service_desk.md),
-it should be restricted on namespace scope.
-
-1. Add the feature symbol on `STARTER_FEATURES`, `PREMIUM_FEATURES`, or `ULTIMATE_FEATURES` constants in
- `ee/app/models/gitlab_subscriptions/features.rb`.
-1. Check using:
-
-```ruby
-project.licensed_feature_available?(:feature_symbol)
-```
-
-or
-
-```ruby
-group.licensed_feature_available?(:feature_symbol)
-```
-
-For projects, `licensed_feature_available` delegates to its associated `namespace`.
-
-## Restricting global features (instance)
-
-However, for features such as [Geo](../administration/geo/index.md) and
-[Database Load Balancing](../administration/postgresql/database_load_balancing.md), which cannot be restricted
-to only a subset of projects or namespaces, the check is made directly in
-the instance license.
-
-1. Add the feature symbol to `STARTER_FEATURES`, `PREMIUM_FEATURES` or `ULTIMATE_FEATURES` constants in
- `ee/app/models/gitlab_subscriptions/features.rb`.
-1. Add the same feature symbol to `GLOBAL_FEATURES`.
-1. Check using:
-
-```ruby
-License.feature_available?(:feature_symbol)
-```
-
-## Restricting frontend features
-
-To restrict frontend features based on the license, use `push_licensed_feature`.
-The frontend can then access this via `this.glFeatures`:
-
-```ruby
-before_action do
- push_licensed_feature(:feature_symbol)
- # or by project/namespace
- push_licensed_feature(:feature_symbol, project)
-end
-```
-
-## Allow use of licensed EE features
-
-To enable plans per namespace, turn on the `Allow use of licensed EE features` option from the settings page.
-This makes licensed EE features available to projects only if the project namespace's plan includes the feature
-or if the project is public. To enable it:
-
-1. If you are developing locally, follow the steps in [simulate SaaS](ee_features.md#act-as-saas) to make the option available.
-1. Select Admin > Settings > General > "Account and limit" and enable "Allow use of licensed EE features".
+<!-- This redirect file can be deleted after <2022-10-08>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/logging.md b/doc/development/logging.md
index 749f85c9e2d..f1fa7f4c8c9 100644
--- a/doc/development/logging.md
+++ b/doc/development/logging.md
@@ -6,7 +6,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# GitLab Developers Guide to Logging **(FREE)**
-[GitLab Logs](../administration/logs.md) play a critical role for both
+[GitLab Logs](../administration/logs/index.md) play a critical role for both
administrators and GitLab team members to diagnose problems in the field.
## Don't use `Rails.logger`
@@ -44,8 +44,7 @@ These logs suffer from a number of problems:
Note that currently on GitLab.com, any messages in `production.log` aren't
indexed by Elasticsearch due to the sheer volume and noise. They
do end up in Google Stackdriver, but it is still harder to search for
-logs there. See the [GitLab.com logging
-documentation](https://gitlab.com/gitlab-com/runbooks/-/tree/master/docs/logging)
+logs there. See the [GitLab.com logging documentation](https://gitlab.com/gitlab-com/runbooks/-/tree/master/docs/logging)
for more details.
## Use structured (JSON) logging
@@ -66,7 +65,7 @@ Suppose you want to log the events that happen in a project
importer. You want to log issues created, merge requests, and so on, as the
importer progresses. Here's what to do:
-1. Look at [the list of GitLab Logs](../administration/logs.md) to see
+1. Look at [the list of GitLab Logs](../administration/logs/index.md) to see
if your log message might belong with one of the existing log files.
1. If there isn't a good place, consider creating a new filename, but
check with a maintainer if it makes sense to do so. A log file should
@@ -386,18 +385,18 @@ end
## Additional steps with new log files
1. Consider log retention settings. By default, Omnibus rotates any
- logs in `/var/log/gitlab/gitlab-rails/*.log` every hour and [keep at
- most 30 compressed files](https://docs.gitlab.com/omnibus/settings/logs.html#logrotate).
+ logs in `/var/log/gitlab/gitlab-rails/*.log` every hour and
+ [keeps at most 30 compressed files](https://docs.gitlab.com/omnibus/settings/logs.html#logrotate).
On GitLab.com, that setting is only 6 compressed files. These settings should suffice
for most users, but you may need to tweak them in [Omnibus GitLab](https://gitlab.com/gitlab-org/omnibus-gitlab).
-1. If you add a new file, submit an issue to the [production
- tracker](https://gitlab.com/gitlab-com/gl-infra/production/-/issues) or
+1. If you add a new file, submit an issue to the
+ [production tracker](https://gitlab.com/gitlab-com/gl-infra/production/-/issues) or
a merge request to the [`gitlab_fluentd`](https://gitlab.com/gitlab-cookbooks/gitlab_fluentd)
project. See [this example](https://gitlab.com/gitlab-cookbooks/gitlab_fluentd/-/merge_requests/51/diffs).
-1. Be sure to update the [GitLab CE/EE documentation](../administration/logs.md) and the [GitLab.com
- runbooks](https://gitlab.com/gitlab-com/runbooks/blob/master/docs/logging/README.md).
+1. Be sure to update the [GitLab CE/EE documentation](../administration/logs/index.md) and the
+ [GitLab.com runbooks](https://gitlab.com/gitlab-com/runbooks/blob/master/docs/logging/README.md).
## Control logging visibility
diff --git a/doc/development/merge_request_concepts/index.md b/doc/development/merge_request_concepts/index.md
index 8df0da5123e..331f0e01579 100644
--- a/doc/development/merge_request_concepts/index.md
+++ b/doc/development/merge_request_concepts/index.md
@@ -5,13 +5,11 @@ group: Code Review
info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines"
---
-# Merge Request Concepts
+# Merge request concepts
-**NOTE**:
+NOTE:
The documentation below is the single source of truth for the merge request terminology and functionality.
-## Overview
-
The merge request is made up of several key components and ideas that encompass the overall merge request experience. These concepts sometimes have competing and confusing terminology, or overlap with other concepts. The concepts covered here are:
1. Merge widget
@@ -19,7 +17,12 @@ The merge request is made up of several different key components and ideas that
1. Merge checks
1. Approval rules
-### Merge widget
+When developing new merge request widgets, read the
+[merge request widget extension framework](../fe_guide/merge_request_widget_extensions.md)
+documentation. All new widgets should use this framework, and older widgets should
+be ported to use it.
+
+## Merge widget
The merge widget is the component of the merge request where the `merge` button exists:
@@ -27,27 +30,27 @@ The merge widget is the component of the merge request where the `merge` button
This area of the merge request is where all of the options and commit messages are defined prior to merging. It also contains information about what is in the merge request, what issues may be closed, and other important information to the merging process.
-### Report widgets
+## Report widgets
Reports are widgets within the merge request that report information about changes within the merge request. These widgets provide information to better help the author understand the changes and further improvements to the proposed changes.
-[Design Documentation](https://design.gitlab.com/regions/merge-request-reports)
+[Design Documentation](https://design.gitlab.com/regions/merge-request-reports/)
![merge request reports](../img/merge_request_reports_v14_7.png)
-### Merge checks
+## Merge checks
Merge checks are statuses that can either pass or fail and conditionally control whether the merge button is available within a merge request. The key distinguishing factor in a merge check is that users **do not** interact with merge checks inside of the merge request, but they can influence whether the check passes or fails. Results from the check are processed as true/false to determine whether a merge request can be merged. Examples include:
-1. merge conflicts
-1. pipeline success
-1. threads resolution
-1. [external status checks](../../user/project/merge_requests/status_checks.md)
-1. required approvals
+- Merge conflicts.
+- Pipeline success.
+- Threads resolution.
+- [External status checks](../../user/project/merge_requests/status_checks.md).
+- Required approvals.
When all of the required merge checks are satisfied, a merge request becomes mergeable.
-### Approvals
+## Approvals
Approval rules specify users who are required to approve, or can optionally approve, a merge request based on some kind of organizational policy. When approvals are required, they effectively become a required merge check. The key differentiator between merge checks and approval rules is that users **do** interact with approval rules, by deciding to approve the merge request.
diff --git a/doc/development/merge_request_concepts/widget_extensions.md b/doc/development/merge_request_concepts/widget_extensions.md
new file mode 100644
index 00000000000..097e9155f2b
--- /dev/null
+++ b/doc/development/merge_request_concepts/widget_extensions.md
@@ -0,0 +1,11 @@
+---
+redirect_to: '../fe_guide/merge_request_widget_extensions.md'
+remove_date: '2022-10-27'
+---
+
+This document was moved to [another location](../fe_guide/merge_request_widget_extensions.md).
+
+<!-- This redirect file can be deleted after <2022-10-27>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/merge_request_performance_guidelines.md b/doc/development/merge_request_performance_guidelines.md
index 5e7fe9cc8fb..7ff25705ae6 100644
--- a/doc/development/merge_request_performance_guidelines.md
+++ b/doc/development/merge_request_performance_guidelines.md
@@ -193,7 +193,7 @@ costly, time-consuming query to the replicas.
## Use CTEs wisely
-Read about [complex queries on the relation object](iterating_tables_in_batches.md#complex-queries-on-the-relation-object) for considerations on how to use CTEs. We have found in some situations that CTEs can become problematic in use (similar to the n+1 problem above). In particular, hierarchical recursive CTE queries such as the CTE in [AuthorizedProjectsWorker](https://gitlab.com/gitlab-org/gitlab/-/issues/325688) are very difficult to optimize and don't scale. We should avoid them when implementing new features that require any kind of hierarchical structure.
+Read about [complex queries on the relation object](database/iterating_tables_in_batches.md#complex-queries-on-the-relation-object) for considerations on how to use CTEs. We have found in some situations that CTEs can become problematic in use (similar to the n+1 problem above). In particular, hierarchical recursive CTE queries such as the CTE in [AuthorizedProjectsWorker](https://gitlab.com/gitlab-org/gitlab/-/issues/325688) are very difficult to optimize and don't scale. We should avoid them when implementing new features that require any kind of hierarchical structure.
CTEs have been effectively used as an optimization fence in many simpler cases,
such as this [example](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/43242#note_61416277).
@@ -394,8 +394,8 @@ query for every mention of `@alice`.
Caching data per transaction can be done using
[RequestStore](https://github.com/steveklabnik/request_store) (use
`Gitlab::SafeRequestStore` to avoid having to remember to check
-`RequestStore.active?`). Caching data in Redis can be done using [Rails' caching
-system](https://guides.rubyonrails.org/caching_with_rails.html).
+`RequestStore.active?`). Caching data in Redis can be done using
+[Rails' caching system](https://guides.rubyonrails.org/caching_with_rails.html).
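+
+As a sketch, memoizing a per-request lookup with `Gitlab::SafeRequestStore`
+might look like the following (the method name and cache key are hypothetical):
+
+```ruby
+def user_for_mention(username)
+  # Computed once per request; when no request store is active,
+  # the block simply runs every time.
+  Gitlab::SafeRequestStore.fetch([:user_by_username, username]) do
+    User.find_by_username(username)
+  end
+end
+```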
## Pagination
@@ -414,8 +414,7 @@ The main styles of pagination are:
The ultimately scalable solution for pagination is to use Keyset-based pagination.
However, we don't have support for that at GitLab at the moment. You
-can follow the progress looking at [API: Keyset Pagination
-](https://gitlab.com/groups/gitlab-org/-/epics/2039).
+can follow the progress looking at [API: Keyset Pagination](https://gitlab.com/groups/gitlab-org/-/epics/2039).
Take into consideration the following when choosing a pagination strategy:
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md
index e0e21319f47..64d8b22f1b8 100644
--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
@@ -45,9 +45,12 @@ work it needs to perform and how long it takes to complete:
One exception is a migration that takes longer but is absolutely critical for the application to operate correctly.
For example, you might have indices that enforce unique tuples, or that are needed for query performance in critical parts of the application. In cases where the migration would be unacceptably slow, however, a better option might be to guard the feature with a [feature flag](feature_flags/index.md)
and perform a post-deployment migration instead. The feature can then be turned on after the migration finishes.
+
+ Migrations used to add new models are also part of these regular schema migrations. The only differences are the Rails command used to generate the migrations and the additional generated files, one for the model and one for the model's spec.
1. [**Post-deployment migrations.**](database/post_deployment_migrations.md) These are Rails migrations in `db/post_migrate` and
- run _after_ new application code has been deployed (for GitLab.com after the production deployment has finished).
- They can be used for schema changes that aren't critical for the application to operate, or data migrations that take at most a few minutes.
+ are run independently of the GitLab.com deployments. Pending post-deployment migrations are executed daily at the discretion
+ of the release manager through the [post-deploy migration pipeline](https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/post_deploy_migration/readme.md#how-to-determine-if-a-post-deploy-migration-has-been-executed-on-gitlabcom).
+ These migrations can be used for schema changes that aren't critical for the application to operate, or data migrations that take at most a few minutes.
Common examples for schema changes that should run post-deploy include:
- Clean-ups, like removing unused columns.
- Adding non-critical indices on high-traffic tables.
@@ -88,7 +91,7 @@ Keep in mind that all durations should be measured against GitLab.com.
|----|----|---|
| Regular migrations | `<= 3 minutes` | A valid exception is a change without which application functionality or performance would be severely degraded and which cannot be delayed. |
| Post-deployment migrations | `<= 10 minutes` | A valid exception is a schema change, because schema changes must not happen in background migrations. |
-| Background migrations | `> 10 minutes` | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below [`1 second` execution time](query_performance.md#timing-guidelines-for-queries) with cold caches. |
+| Background migrations | `> 10 minutes` | Since these are suitable for larger tables, it's not possible to set a precise timing guideline; however, any single query must stay below [`1 second` execution time](database/query_performance.md#timing-guidelines-for-queries) with cold caches. |
## Decide which database to target
@@ -108,6 +111,20 @@ bundle exec rails g migration migration_name_here
This generates the migration file in `db/migrate`.
+### Regular schema migrations to add new models
+
+To create a new model you can use the following Rails generator:
+
+```shell
+bundle exec rails g model model_name_here
+```
+
+This generates:
+
+- the migration file in `db/migrate`
+- the model file in `app/models`
+- the spec file in `spec/models`
+
## Schema Changes
Changes to the schema should be committed to `db/structure.sql`. This
@@ -119,7 +136,7 @@ columns manually for existing tables as this causes confusion to
other people using `db/structure.sql` generated by Rails.
NOTE:
-[Creating an index asynchronously requires two merge requests.](adding_database_indexes.md#add-a-migration-to-create-the-index-synchronously)
+[Creating an index asynchronously requires two merge requests.](database/adding_database_indexes.md#add-a-migration-to-create-the-index-synchronously)
When done, commit the schema change in the merge request
that adds the index with `add_concurrent_index`.
@@ -245,7 +262,7 @@ When using a single-transaction migration, a transaction holds a database connec
for the duration of the migration, so you must make sure the actions in the migration
do not take too much time: GitLab.com's production database has a `15s` timeout, so
in general, the cumulative execution time in a migration should aim to fit comfortably
-in that limit. Singular query timings should fit within the [standard limit](query_performance.md#timing-guidelines-for-queries)
+in that limit. Singular query timings should fit within the [standard limit](database/query_performance.md#timing-guidelines-for-queries)
In case you need to insert, update, or delete a significant amount of data, you:
@@ -268,7 +285,7 @@ which is a "versioned" class. For new migrations, the latest version should be u
can be looked up in `Gitlab::Database::Migration::MIGRATION_CLASSES`) to use the latest version
of migration helpers.
-In this example, we use version 1.0 of the migration class:
+In this example, we use version 2.0 of the migration class:
```ruby
class TestMigration < Gitlab::Database::Migration[2.0]
@@ -580,7 +597,7 @@ end
Verify the index is not being used anymore with this Thanos query:
```sql
-sum(rate(pg_stat_user_indexes_idx_tup_read{env="gprd", indexrelname="index_ci_name", type="patroni-ci"}[5m]))
+sum by (type)(rate(pg_stat_user_indexes_idx_scan{env="gprd", indexrelname="index_groups_on_parent_id_id"}[5m]))
```
Note that it is not necessary to check if the index exists prior to
@@ -611,7 +628,7 @@ might not be required, like:
Additionally, wide indexes are not required to match all filter criteria of queries; we just need
to cover enough columns so that the index lookup has a small enough selectivity. Please review our
-[Adding Database indexes](adding_database_indexes.md) guide for more details.
+[Adding Database indexes](database/adding_database_indexes.md) guide for more details.
When adding an index to a non-empty table make sure to use the method
`add_concurrent_index` instead of the regular `add_index` method.
@@ -640,7 +657,7 @@ end
You must explicitly name indexes that are created with more complex
definitions beyond table name, column names, and uniqueness constraint.
-Consult the [Adding Database Indexes](adding_database_indexes.md#requirements-for-naming-indexes)
+Consult the [Adding Database Indexes](database/adding_database_indexes.md#requirements-for-naming-indexes)
guide for more details.
If you need to add a unique index, please keep in mind there is the possibility
@@ -658,7 +675,7 @@ If a migration requires conditional logic based on the absence or
presence of an index, you must test for existence of that index using
its name. This helps avoid problems with how Rails compares index definitions,
which can lead to unexpected results. For more details, review the
-[Adding Database Indexes](adding_database_indexes.md#why-explicit-names-are-required)
+[Adding Database Indexes](database/adding_database_indexes.md#why-explicit-names-are-required)
guide.
The easiest way to test for existence of an index by name is to use the
@@ -739,6 +756,21 @@ If a backport adding a column with a default value is needed for %12.9 or earlie
it should use the `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
is involved, backporting to %12.9 is contraindicated.
+## Removing the column default for non-nullable columns
+
+If you have added a non-nullable column, and used the default value to populate
+existing data, you need to keep that default value around until at least after
+the application code is updated. You cannot remove the default value in the
+same migration, as the migrations run before the model code is updated and
+models will have an old schema cache, meaning they won't know about this column
+and won't be able to set it. In this case it's recommended to:
+
+1. Add the column with default value in a normal migration.
+1. Remove the default in a post-deployment migration.
+
+The post-deployment migration happens after the application restarts,
+ensuring the new column has been discovered.
+
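+A minimal sketch of the two migrations; the table, column, and default value
+here are hypothetical:
+
+```ruby
+# db/migrate/ - regular migration: add the column with a default.
+class AddSomeFlagToIssues < Gitlab::Database::Migration[2.0]
+  def change
+    add_column :issues, :some_flag, :boolean, null: false, default: false
+  end
+end
+
+# db/post_migrate/ - post-deployment migration: drop the default.
+class RemoveSomeFlagDefaultFromIssues < Gitlab::Database::Migration[2.0]
+  def change
+    change_column_default :issues, :some_flag, from: false, to: nil
+  end
+end
+```
+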
## Changing the column default
One might think that changing a default column with `change_column_default` is an
@@ -1196,8 +1228,8 @@ If using a model in the migrations, you should first
[clear the column cache](https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-reset_column_information)
using `reset_column_information`.
-If using a model that leverages single table inheritance (STI), there are [special
-considerations](single_table_inheritance.md#in-migrations).
+If using a model that leverages single table inheritance (STI), there are
+[special considerations](database/single_table_inheritance.md#in-migrations).
This avoids problems where a column that you are using was altered and cached
in a previous migration.
diff --git a/doc/development/module_with_instance_variables.md b/doc/development/module_with_instance_variables.md
index 0f910f20534..8e39186d396 100644
--- a/doc/development/module_with_instance_variables.md
+++ b/doc/development/module_with_instance_variables.md
@@ -35,12 +35,10 @@ one of the variables. Everything could touch anything.
People are saying multiple inheritance is bad. Mixing multiple modules with
multiple instance variables scattered everywhere suffers from the same issue.
The same applies to `ActiveSupport::Concern`. See:
-[Consider replacing concerns with dedicated classes & composition](
-https://gitlab.com/gitlab-org/gitlab/-/issues/16270)
+[Consider replacing concerns with dedicated classes & composition](https://gitlab.com/gitlab-org/gitlab/-/issues/16270)
There's also a similar idea:
-[Use decorators and interface segregation to solve overgrowing models problem](
-https://gitlab.com/gitlab-org/gitlab/-/issues/14235)
+[Use decorators and interface segregation to solve overgrowing models problem](https://gitlab.com/gitlab-org/gitlab/-/issues/14235)
Note that `included` doesn't solve the whole issue. They define the
dependencies, but they still allow each module to talk implicitly via the
diff --git a/doc/development/multi_version_compatibility.md b/doc/development/multi_version_compatibility.md
index bdab92f5185..0f3531cf5dd 100644
--- a/doc/development/multi_version_compatibility.md
+++ b/doc/development/multi_version_compatibility.md
@@ -270,7 +270,7 @@ and set this column to `false`. The old servers were still updating the old colu
that updated the new column from the old one. For the new servers though, they were only updating the new column and that same trigger
was now working against us and setting it back to the wrong value.
-For more information, see [the relevant issue](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9176).
+For more information, see [the relevant issue](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/9176).
### Sidebar wasn't loading for some users
diff --git a/doc/development/namespaces_storage_statistics.md b/doc/development/namespaces_storage_statistics.md
index e5263288210..75e79d1f693 100644
--- a/doc/development/namespaces_storage_statistics.md
+++ b/doc/development/namespaces_storage_statistics.md
@@ -1,193 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/namespaces_storage_statistics.md'
+remove_date: '2022-11-05'
---
-# Database case study: Namespaces storage statistics
+This document was moved to [another location](database/namespaces_storage_statistics.md).
-## Introduction
-
-On [Storage and limits management for groups](https://gitlab.com/groups/gitlab-org/-/epics/886),
-we want to facilitate a method for easily viewing the amount of
-storage consumed by a group, and allow easy management.
-
-## Proposal
-
-1. Create a new ActiveRecord model to hold the namespaces' statistics in an aggregated form (only for root [namespaces](../user/group/index.md#namespaces)).
-1. Refresh the statistics in this model every time a project belonging to this namespace is changed.
-
-## Problem
-
-In GitLab, we update the project storage statistics through a
-[callback](https://gitlab.com/gitlab-org/gitlab/-/blob/4ab54c2233e91f60a80e5b6fa2181e6899fdcc3e/app/models/project.rb#L97)
-every time the project is saved.
-
-The summary of those statistics per namespace is then retrieved
-by the [`Namespaces#with_statistics`](https://gitlab.com/gitlab-org/gitlab/-/blob/4ab54c2233e91f60a80e5b6fa2181e6899fdcc3e/app/models/namespace.rb#L70) scope. Analyzing this query, we noticed that:
-
-- It takes up to `1.2` seconds for namespaces with over `15k` projects.
-- It can't be analyzed with [ChatOps](chatops_on_gitlabcom.md), as it times out.
-
-Additionally, the pattern that is currently used to update the project statistics
-(the callback) doesn't scale adequately. It is currently one of the largest
-[database transactions on production](https://gitlab.com/gitlab-org/gitlab/-/issues/29070),
-taking the most time overall. We can't add one more query to it, as
-that increases the transaction's length.
-
-Because of all of the above, we can't apply the same pattern to store
-and update the namespaces statistics, as the `namespaces` table is one
-of the largest tables on GitLab.com. Therefore we needed to find a
-performant alternative method.
-
-## Attempts
-
-### Attempt A: PostgreSQL materialized view
-
-The model can be updated through a refresh strategy based on a project routes SQL query and a [materialized view](https://www.postgresql.org/docs/11/rules-materializedviews.html):
-
-```sql
-SELECT split_part("rs".path, '/', 1) as root_path,
- COALESCE(SUM(ps.storage_size), 0) AS storage_size,
- COALESCE(SUM(ps.repository_size), 0) AS repository_size,
- COALESCE(SUM(ps.wiki_size), 0) AS wiki_size,
- COALESCE(SUM(ps.lfs_objects_size), 0) AS lfs_objects_size,
- COALESCE(SUM(ps.build_artifacts_size), 0) AS build_artifacts_size,
- COALESCE(SUM(ps.pipeline_artifacts_size), 0) AS pipeline_artifacts_size,
- COALESCE(SUM(ps.packages_size), 0) AS packages_size,
- COALESCE(SUM(ps.snippets_size), 0) AS snippets_size,
- COALESCE(SUM(ps.uploads_size), 0) AS uploads_size
-FROM "projects"
- INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project'
- INNER JOIN project_statistics ps ON ps.project_id = projects.id
-GROUP BY root_path
-```
-
-We could then execute the query with:
-
-```sql
-REFRESH MATERIALIZED VIEW root_namespace_storage_statistics;
-```
-
-While this implied a single-query update (and probably a fast one), it had some downsides:
-
-- Materialized view syntax varies between PostgreSQL and MySQL. While this feature was worked on, MySQL was still supported by GitLab.
-- Rails does not have native support for materialized views. We'd need to use a specialized gem to take care of the management of the database views, which implies additional work.
-
-### Attempt B: An update through a CTE
-
-Similar to attempt A, the model update is done through a refresh strategy with a [Common Table Expression](https://www.postgresql.org/docs/9.1/queries-with.html):
-
-```sql
-WITH refresh AS (
- SELECT split_part("rs".path, '/', 1) as root_path,
- COALESCE(SUM(ps.storage_size), 0) AS storage_size,
- COALESCE(SUM(ps.repository_size), 0) AS repository_size,
- COALESCE(SUM(ps.wiki_size), 0) AS wiki_size,
- COALESCE(SUM(ps.lfs_objects_size), 0) AS lfs_objects_size,
- COALESCE(SUM(ps.build_artifacts_size), 0) AS build_artifacts_size,
- COALESCE(SUM(ps.pipeline_artifacts_size), 0) AS pipeline_artifacts_size,
- COALESCE(SUM(ps.packages_size), 0) AS packages_size,
- COALESCE(SUM(ps.snippets_size), 0) AS snippets_size,
- COALESCE(SUM(ps.uploads_size), 0) AS uploads_size
- FROM "projects"
- INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project'
- INNER JOIN project_statistics ps ON ps.project_id = projects.id
- GROUP BY root_path)
-UPDATE namespace_storage_statistics
-SET storage_size = refresh.storage_size,
- repository_size = refresh.repository_size,
- wiki_size = refresh.wiki_size,
- lfs_objects_size = refresh.lfs_objects_size,
- build_artifacts_size = refresh.build_artifacts_size,
- pipeline_artifacts_size = refresh.pipeline_artifacts_size,
- packages_size = refresh.packages_size,
- snippets_size = refresh.snippets_size,
- uploads_size = refresh.uploads_size
-FROM refresh
- INNER JOIN routes rs ON rs.path = refresh.root_path AND rs.source_type = 'Namespace'
-WHERE namespace_storage_statistics.namespace_id = rs.source_id
-```
-
-Same benefits and downsides as attempt A.
-
-### Attempt C: Get rid of the model and store the statistics on Redis
-
-We could get rid of the model that stores the statistics in aggregated form and instead use a Redis Set.
-This would be the [boring solution](https://about.gitlab.com/handbook/values/#boring-solutions) and the fastest one
-to implement, as GitLab already includes Redis as part of its [Architecture](architecture.md#redis).
-
-The downside of this approach is that Redis does not provide the same persistence/consistency guarantees as PostgreSQL,
-and this is information we can't afford to lose in a Redis failure.
-
-### Attempt D: Tag the root namespace and its child namespaces
-
-Directly relate the root namespace to its child namespaces, so
-whenever a namespace is created without a parent, it is tagged
-with the root namespace ID:
-
-| ID | root ID | parent ID |
-|:---|:--------|:----------|
-| 1 | 1 | NULL |
-| 2 | 1 | 1 |
-| 3 | 1 | 2 |
-
-To aggregate the statistics inside a namespace, we'd execute something like:
-
-```sql
-SELECT COUNT(...)
-FROM projects
-WHERE namespace_id IN (
- SELECT id
- FROM namespaces
- WHERE root_id = X
-)
-```
-
-Even though this approach would make aggregating much easier, it has some major downsides:
-
-- We'd have to migrate **all namespaces** by adding and filling a new column. Because of the size of the table, the time and cost of doing so would be significant. The background migration would take approximately `153h`, see <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/29772>.
-- The background migration has to be shipped one release earlier, delaying the functionality by another milestone.
-
-### Attempt E (final): Update the namespace storage statistics asynchronously
-
-This approach consists of continuing to use the incremental statistics updates we already have,
-but we refresh them through Sidekiq jobs and in different transactions:
-
-1. Create a second table (`namespace_aggregation_schedules`) with two columns `id` and `namespace_id`.
-1. Whenever the statistics of a project change, insert a row into `namespace_aggregation_schedules`.
- - We don't insert a new row if there's already one related to the root namespace.
- - Keeping in mind the length of the transaction that involves updating `project_statistics` (<https://gitlab.com/gitlab-org/gitlab/-/issues/29070>), the insertion should be done in a different transaction and through a Sidekiq job.
-1. After inserting the row, we schedule another worker to be executed asynchronously at two different moments:
- - One job is enqueued for immediate execution and another one is scheduled to run in `1.5` hours.
- - We only schedule the jobs if we can obtain a `1.5h` lease on Redis on a key based on the root namespace ID (see the sketch after this list).
- - If we can't obtain the lease, it indicates there's another aggregation already in progress, or one scheduled within the next `1.5h`.
-1. This worker will:
- - Update the root namespace storage statistics by querying all the namespaces through a service.
- - Delete the related `namespace_aggregation_schedules` after the update.
-1. Another Sidekiq job is also included to traverse any remaining rows on the `namespace_aggregation_schedules` table and schedule jobs for every pending row.
- - This job is scheduled with cron to run every night (UTC).
-
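-A rough sketch of the lease check from the scheduling step above, using
-`Gitlab::ExclusiveLease`; the key name and worker class are illustrative:
-
-```ruby
-LEASE_TIMEOUT = 1.5.hours.to_i
-
-def schedule_aggregation_workers(root_namespace_id)
-  lease_key = "namespace_aggregation:#{root_namespace_id}"
-
-  # try_obtain returns a UUID when the lease is acquired, or false when an
-  # aggregation is already in progress or scheduled within the timeout window.
-  return unless Gitlab::ExclusiveLease.new(lease_key, timeout: LEASE_TIMEOUT).try_obtain
-
-  AggregationWorker.perform_async(root_namespace_id)
-  AggregationWorker.perform_in(LEASE_TIMEOUT, root_namespace_id)
-end
-```
-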
-This implementation has the following benefits:
-
-- All the updates are done asynchronously, so we're not increasing the length of the transactions for `project_statistics`.
-- We're doing the update in a single SQL query.
-- It is compatible with PostgreSQL and MySQL.
-- No background migration required.
-
-The only downside of this approach is that namespaces' statistics are updated up to `1.5` hours after the change is done,
-which means there's a time window in which the statistics are inaccurate. Because we're still not
-[enforcing storage limits](https://gitlab.com/gitlab-org/gitlab/-/issues/17664), this is not a major problem.
-
-## Conclusion
-
-Updating the storage statistics asynchronously was the least problematic and
-most performant approach to aggregating the root namespaces.
-
-All the details regarding this use case can be found on:
-
-- <https://gitlab.com/gitlab-org/gitlab-foss/-/issues/62214>
-- Merge Request with the implementation: <https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/28996>
-
-The performance of the namespace storage statistics was measured in staging and production (GitLab.com). All results were posted
-on <https://gitlab.com/gitlab-org/gitlab-foss/-/issues/64092>: no problems have been reported so far.
+<!-- This redirect file can be deleted after <2022-11-05>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/development/accessibility.md b/doc/development/new_fe_guide/development/accessibility.md
index 65485104efe..9575acd20c7 100644
--- a/doc/development/new_fe_guide/development/accessibility.md
+++ b/doc/development/new_fe_guide/development/accessibility.md
@@ -1,52 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/accessibility.md'
+remove_date: '2022-11-15'
---
-# Accessibility
+This document was moved to [another location](../../fe_guide/accessibility.md).
-Using semantic HTML plays a key role when it comes to accessibility.
-
-## Accessible Rich Internet Applications - ARIA
-
-WAI-ARIA (the Accessible Rich Internet Applications specification) defines a way to make Web content and Web applications more accessible to people with disabilities.
-
-The W3C recommends [using semantic elements](https://www.w3.org/TR/using-aria/#notes2) as the primary method to achieve accessibility rather than adding aria attributes. Adding aria attributes should be seen as a secondary method for creating accessible elements.
-
-### Role
-
-The `role` attribute describes the role the element plays in the context of the document.
-
-Review the list of [WAI-ARIA roles](https://www.w3.org/TR/wai-aria-1.1/#landmark_roles).
-
-## Icons
-
-When using icons or images that aren't absolutely needed to understand the context, we should use `aria-hidden="true"`.
-
-On the other hand, if an icon is crucial to understand the context, we should do one of the following:
-
-1. Use `aria-label` in the element with a meaningful description
-1. Use `aria-labelledby` to point to an element that contains the explanation for that icon
-
-## Form inputs
-
-In forms, we should use the `for` attribute in the label element:
-
-```html
-<div>
- <label for="name">Fill in your name:</label>
- <input type="text" id="name" name="name">
-</div>
-```
-
-## Testing
-
-1. On macOS you can use [VoiceOver](https://www.apple.com/accessibility/vision/) by pressing `cmd+F5`.
-1. On Windows you can use [Narrator](https://www.microsoft.com/en-us/accessibility/windows) by pressing Windows logo key + Control + Enter.
-
-## Online resources
-
-- [Chrome Accessibility Developer Tools](https://github.com/GoogleChrome/accessibility-developer-tools) for testing accessibility
-- [Audit Rules Page](https://github.com/GoogleChrome/accessibility-developer-tools/wiki/Audit-Rules) for best practices
-- [Lighthouse Accessibility Score](https://web.dev/performance-scoring/) for accessibility audits
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/development/components.md b/doc/development/new_fe_guide/development/components.md
index ec714c9c26f..9ad742272d1 100644
--- a/doc/development/new_fe_guide/development/components.md
+++ b/doc/development/new_fe_guide/development/components.md
@@ -1,27 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/index.md'
+remove_date: '2022-11-15'
---
-# Components
+This document was moved to [another location](../../fe_guide/index.md).
-## Graphs
-
-We have a lot of graphing libraries in our codebase to render graphs. In an effort to improve maintainability, new graphs should use [D3.js](https://d3js.org/). If a new graph is fairly simple, consider implementing it in SVGs or HTML5 canvas.
-
-We chose D3 as our library going forward because of the following features:
-
-- [Tree shaking webpack capabilities](https://github.com/d3/d3/blob/master/CHANGES.md#changes-in-d3-40).
-- [Compatible with vue.js as well as vanilla JavaScript](https://github.com/d3/d3/blob/master/CHANGES.md#changes-in-d3-40).
-
-D3 is very popular across many projects outside of GitLab:
-
-- [The New York Times](https://archive.nytimes.com/www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html)
-- [plot.ly](https://plotly.com/)
-- [Ayoa](https://www.ayoa.com/previously-droptask/)
-
-Within GitLab, D3 has been used for the following notable features:
-
-- [Prometheus graphs](../../../user/project/integrations/prometheus.md)
-- Contribution calendars
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/development/index.md b/doc/development/new_fe_guide/development/index.md
index 5922c3aeeed..9ad742272d1 100644
--- a/doc/development/new_fe_guide/development/index.md
+++ b/doc/development/new_fe_guide/development/index.md
@@ -1,23 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/index.md'
+remove_date: '2022-11-15'
---
-# Development
+This document was moved to [another location](../../fe_guide/index.md).
-## [Components](components.md)
-
-Documentation on existing components and how to best create a new component.
-
-## [Accessibility](accessibility.md)
-
-Learn how to implement an accessible frontend.
-
-## [Performance](performance.md)
-
-Learn how to keep our frontend performant.
-
-## [Testing](../../testing_guide/frontend_testing.md)
-
-Learn how to keep our frontend tested.
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/development/performance.md b/doc/development/new_fe_guide/development/performance.md
index ee853942cb9..c72f3ded896 100644
--- a/doc/development/new_fe_guide/development/performance.md
+++ b/doc/development/new_fe_guide/development/performance.md
@@ -1,22 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/performance.md'
+remove_date: '2022-11-15'
---
-# Performance
+This document was moved to [another location](../../fe_guide/performance.md).
-## Monitoring
-
-We have a performance dashboard available in one of our [Grafana instances](https://dashboards.gitlab.net/d/000000043/sitespeed-page-summary?orgId=1). This dashboard automatically aggregates metric data from [sitespeed.io](https://www.sitespeed.io/) every 4 hours. These changes are displayed after a set number of pages are aggregated.
-
-These pages can be found inside text files in the [`sitespeed-measurement-setup` repository](https://gitlab.com/gitlab-org/frontend/sitespeed-measurement-setup) called [`gitlab`](https://gitlab.com/gitlab-org/frontend/sitespeed-measurement-setup/-/tree/master/gitlab).
-Any frontend engineer can contribute to this dashboard by adding or removing URLs of pages in the text files. The changes are pushed live on the next scheduled run after the changes are merged into `main`.
-
-There are 3 recommended high impact metrics (core web vitals) to review on each page:
-
-- [Largest Contentful Paint](https://web.dev/lcp/)
-- [First Input Delay](https://web.dev/fid/)
-- [Cumulative Layout Shift](https://web.dev/cls/)
-
-For these metrics, lower numbers are better, as they mean the website is more performant.
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/index.md b/doc/development/new_fe_guide/index.md
index 4d4098844b2..83c1db696b4 100644
--- a/doc/development/new_fe_guide/index.md
+++ b/doc/development/new_fe_guide/index.md
@@ -1,22 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../fe_guide/index.md'
+remove_date: '2022-11-15'
---
-# Frontend Development Guidelines
+This document was moved to [another location](../fe_guide/index.md).
-This guide contains all the information to successfully contribute to the GitLab frontend.
-This is a living document, and we welcome contributions, feedback, and suggestions.
-
-## [Development](development/index.md)
-
-Guidance on topics related to development.
-
-## [Modules](modules/index.md)
-
-Learn about all the internal JavaScript modules that make up our frontend.
-
-## [Tips](tips.md)
-
-Tips from our frontend team to develop more efficiently and effectively.
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/modules/dirty_submit.md b/doc/development/new_fe_guide/modules/dirty_submit.md
index 6e1062aa72e..9ad742272d1 100644
--- a/doc/development/new_fe_guide/modules/dirty_submit.md
+++ b/doc/development/new_fe_guide/modules/dirty_submit.md
@@ -1,28 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/index.md'
+remove_date: '2022-11-15'
---
-# Dirty Submit
+This document was moved to [another location](../../fe_guide/index.md).
-> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/21115) in GitLab 11.3.
-
-## Summary
-
-Prevent submitting forms with no changes.
-
-Currently handles `input`, `textarea` and `select` elements.
-
-Also, see [the code](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/dirty_submit/)
-within the GitLab project.
-
-## Usage
-
-```javascript
-import dirtySubmitFactory from './dirty_submit/dirty_submit_form';
-
-new DirtySubmitForm(document.querySelector('form'));
-// or
-new DirtySubmitForm(document.querySelectorAll('form'));
-```
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/modules/index.md b/doc/development/new_fe_guide/modules/index.md
index a9bdcda4a2d..9ad742272d1 100644
--- a/doc/development/new_fe_guide/modules/index.md
+++ b/doc/development/new_fe_guide/modules/index.md
@@ -1,15 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/index.md'
+remove_date: '2022-11-15'
---
-# Modules
+This document was moved to [another location](../../fe_guide/index.md).
-- [DirtySubmit](dirty_submit.md)
-
- Disable form submits until there are unsaved changes.
-
-- [Merge Request widget extensions](widget_extensions.md)
-
- Easily add extensions into the merge request widget
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/modules/widget_extensions.md b/doc/development/new_fe_guide/modules/widget_extensions.md
index 4bae0ac70c4..3741ee8c38a 100644
--- a/doc/development/new_fe_guide/modules/widget_extensions.md
+++ b/doc/development/new_fe_guide/modules/widget_extensions.md
@@ -1,355 +1,11 @@
---
-stage: Create
-group: Code Review
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../../fe_guide/merge_request_widget_extensions.md'
+remove_date: '2022-11-15'
---
-# Merge request widget extensions **(FREE)**
+This document was moved to [another location](../../fe_guide/merge_request_widget_extensions.md).
-> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/44616) in GitLab 13.6.
-
-## Summary
-
-Extensions in the merge request widget enable you to add new features
-to the widget that match the design framework.
-With extensions, we get many benefits out of the box with little effort, such as:
-
-- A consistent look and feel.
-- Tracking when the extension is opened.
-- Virtual scrolling for performance.
-
-## Usage
-
-To use extensions, you must first create a new extension object that fetches the
-data to render in the extension. For a working example, refer to the example file in
-`app/assets/javascripts/vue_merge_request_widget/extensions/issues.js`.
-
-The basic object structure:
-
-```javascript
-export default {
- name: '', // Required: This helps identify the widget
- props: [], // Required: Props passed from the widget state
- i18n: { // Required: Object to hold i18n text
- label: '', // Required: Used for tooltips and aria-labels
- loading: '', // Required: Loading text for when data is loading
- },
- expandEvent: '', // Optional: RedisHLL event name to track expanding content
- enablePolling: false, // Optional: Tells extension to poll for data
- modalComponent: null, // Optional: The component to use for the modal
- computed: {
- summary(data) {}, // Required: Level 1 summary text
- statusIcon(data) {}, // Required: Level 1 status icon
- tertiaryButtons() {}, // Optional: Level 1 action buttons
- shouldCollapse() {}, // Optional: Add logic to determine if the widget can expand or not
- },
- methods: {
- fetchCollapsedData(props) {}, // Required: Fetches data required for collapsed state
- fetchFullData(props) {}, // Required: Fetches data for the full expanded content
- fetchMultiData() {}, // Optional: Works in conjunction with `enablePolling` and allows polling multiple endpoints
- },
-};
-```
-
-By following the same data structure, each extension can follow the same
-registration process, while still managing its own data sources.
-
-After creating this structure, you must register it. You can register the extension at any
-point _after_ the widget has been created. To register an extension:
-
-```javascript
-// Import the register method
-import { registerExtension } from '~/vue_merge_request_widget/components/extensions';
-
-// Import the new extension
-import issueExtension from '~/vue_merge_request_widget/extensions/issues';
-
-// Register the imported extension
-registerExtension(issueExtension);
-```
-
-## Data fetching
-
-Each extension must fetch data. Fetching is handled when registering the extension,
-not by the core component itself. This approach allows various data
-fetching methods to be used, such as GraphQL or REST API calls.
-
-### API calls
-
-For performance reasons, it is best to fetch only the data required to render
-the collapsed state. This fetching happens within the `fetchCollapsedData` method.
-This method is called with the props as an argument, so you can easily access
-any paths set in the state.
-
-To allow the extension to set the data, this method **must** return the data. No
-special formatting is required. When the extension receives this data,
-it is set to `collapsedData`. You can access `collapsedData` in any computed property or
-method.
-
-When the user clicks **Expand**, the `fetchFullData` method is called. This method
-also gets called with the props as an argument. This method **must** also return
-the full data. However, this data needs to be correctly formatted to match the format
-mentioned in the data structure section.
-
-#### Technical debt
-
-For some of the current extensions, there is no split in data fetching. All the data
-is fetched through the `fetchCollapsedData` method. While less performant,
-it allows for faster iteration.
-
-To handle this, `fetchFullData` returns the data already set through
-the `fetchCollapsedData` method call. In these cases, `fetchFullData` must
-return a promise:
-
-```javascript
-fetchCollapsedData() {
- return ['Some data'];
-},
-fetchFullData() {
- return Promise.resolve(this.collapsedData)
-},
-```
-
-### Data structure
-
-The data returned from `fetchFullData` must match the format below. This format
-allows the core component to render the data in a way that matches
-the design framework. Any text properties can use the styling placeholders
-mentioned below:
-
-```javascript
-{
- id: data.id, // Required: ID used as a key for each row
- header: 'Header' || ['Header', 'sub-header'], // Required: String or array can be used for the header text
- text: '', // Required: Main text for the row
- subtext: '', // Optional: Smaller sub-text to be displayed below the main text
- icon: { // Optional: Icon object
- name: EXTENSION_ICONS.success, // Required: The icon name for the row
- },
- badge: { // Optional: Badge displayed after text
- text: '', // Required: Text to be displayed inside badge
- variant: '', // Optional: GitLab UI badge variant, defaults to info
- },
- link: { // Optional: Link to a URL displayed after text
- text: '', // Required: Text of the link
- href: '', // Optional: URL for the link
- },
- modal: { // Optional: Link to open a modal displayed after text
- text: '', // Required: Text of the link
- onClick: () => {} // Optional: Function to run when link is clicked, i.e. to set this.modalData
-  },
- actions: [], // Optional: Action button for row
-  children: [], // Optional: Child content to render; child rows use this same structure
-}
-```
-
-### Polling
-
-To enable polling for an extension, an options flag must be present in the extension:
-
-```javascript
-export default {
- //...
- enablePolling: true
-};
-```
-
-This flag tells the base component to poll the `fetchCollapsedData()` method
-defined in the extension. Polling stops if the response has data, or if an error is present.
-
-When writing the logic for `fetchCollapsedData()`, a complete Axios response must be returned
-from the method. The polling utility needs data like polling headers to work correctly:
-
-```javascript
-export default {
- //...
-  enablePolling: true,
- methods: {
- fetchCollapsedData() {
- return axios.get(this.reportPath)
- },
- },
-};
-```
-
-Most of the time the data returned from the extension's endpoint is not in the format
-the UI needs. We must format the data before setting the collapsed data in the base component.
-
-If the computed property `summary` can rely on `collapsedData`, you can format the data
-when `fetchFullData` is invoked:
-
-```javascript
-export default {
- //...
-  enablePolling: true,
- methods: {
- fetchCollapsedData() {
- return axios.get(this.reportPath)
- },
- fetchFullData() {
- return Promise.resolve(this.prepareReports());
- },
- // custom method
- prepareReports() {
-      // unpack values from collapsedData, renaming the snake_case keys
-      const {
-        new_errors: newErrors,
-        existing_errors: existingErrors,
-        resolved_errors: resolvedErrors,
-      } = this.collapsedData;
-
- // perform data formatting
-
- return [...newErrors, ...existingErrors, ...resolvedErrors]
- }
- },
-};
-```
-
-If the extension relies on `collapsedData` being formatted before invoking `fetchFullData()`,
-then `fetchCollapsedData()` must return the Axios response as well as the formatted data:
-
-```javascript
-export default {
- //...
-  enablePolling: true,
- methods: {
- fetchCollapsedData() {
- return axios.get(this.reportPath).then(res => {
- const formattedData = this.prepareReports(res.data)
-
- return {
- ...res,
- data: formattedData,
- }
- })
- },
- // Custom method
-    prepareReports(data) {
-      // Unpack values from the raw response data, renaming the snake_case keys
-      const {
-        new_errors: newErrors,
-        existing_errors: existingErrors,
-        resolved_errors: resolvedErrors,
-      } = data;
-
- // Perform data formatting
-
- return [...newErrors, ...existingErrors, ...resolvedErrors]
- }
- },
-};
-```
-
-If the extension needs to poll multiple endpoints at the same time, then `fetchMultiData`
-can be used to return an array of functions. A new `poll` object is created for each
-endpoint and they are polled separately. After all endpoints are resolved, polling is
-stopped and `setCollapsedData` is called with an array of `response.data`.
-
-```javascript
-export default {
- //...
-  enablePolling: true,
- methods: {
- fetchMultiData() {
- return [
- () => axios.get(this.reportPath1),
- () => axios.get(this.reportPath2),
- () => axios.get(this.reportPath3)
-      ];
-    },
- },
-};
-```
-
-**Important:** Each function must return a `Promise` that resolves with the `response` object.
-The implementation relies on the `POLL-INTERVAL` header to keep polling, so it is
-important not to alter the response status code or headers.
-
-### Errors
-
-If `fetchCollapsedData()` or `fetchFullData()` methods throw an error:
-
-- The loading state of the extension is updated to `LOADING_STATES.collapsedError`
- and `LOADING_STATES.expandedError` respectively.
-- The extension's header displays an error icon and updates the text to be either:
- - The text defined in `$options.i18n.error`.
- - "Failed to load" if `$options.i18n.error` is not defined.
-- The error is sent to Sentry to log that it occurred.
-
-To customize the error text, add it to the `i18n` object in your extension:
-
-```javascript
-export default {
- //...
- i18n: {
- //...
- error: __('Your error text'),
- },
-};
-```
-
-## Icons
-
-Level 1 and all subsequent levels can have their own status icons. To stay
-consistent with the design framework, import the `EXTENSION_ICONS` constant
-from the `constants.js` file:
-
-```javascript
-import { EXTENSION_ICONS } from '~/vue_merge_request_widget/constants.js';
-```
-
-This constant makes the following icons available for use. Per the design framework,
-only some of these icons should be used at level 1:
-
-- `failed`
-- `warning`
-- `success`
-- `neutral`
-- `error`
-- `notice`
-- `severityCritical`
-- `severityHigh`
-- `severityMedium`
-- `severityLow`
-- `severityInfo`
-- `severityUnknown`
-
-## Text styling
-
-Any area that has text can be styled with the placeholders below. This
-approach follows the same technique as `sprintf`. However, instead of specifying
-these through `sprintf`, the extension adds the styling automatically.
-
-Every placeholder contains starting and ending tags. For example, `success` uses
-`Hello %{success_start}world%{success_end}`. The extension then
-adds the start and end tags with the correct styling classes.
-
-| Placeholder | Style |
-|---|---|
-| success | `gl-font-weight-bold gl-text-green-500` |
-| danger | `gl-font-weight-bold gl-text-red-500` |
-| critical | `gl-font-weight-bold gl-text-red-800` |
-| same | `gl-font-weight-bold gl-text-gray-700` |
-| strong | `gl-font-weight-bold` |
-| small | `gl-font-sm` |
-
-## Action buttons
-
-You can add action buttons to level 1 and level 2 rows in each extension. These buttons
-are meant as a way to provide links or actions for each row:
-
-- Action buttons for level 1 can be set through the `tertiaryButtons` computed property.
- This property should return an array of objects for each action button.
-- Action buttons for level 2 can be set by adding the `actions` key to the level 2 rows object.
- The value for this key must also be an array of objects for each action button.
-
-Links must follow this structure:
-
-```javascript
-{
- text: 'Click me',
- href: this.someLinkHref,
- target: '_blank', // Optional
-}
-```
-
-For internal action buttons, follow this structure:
-
-```javascript
-{
- text: 'Click me',
- onClick() {}
-}
-```
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/new_fe_guide/tips.md b/doc/development/new_fe_guide/tips.md
index 5d4c0fc019f..83c1db696b4 100644
--- a/doc/development/new_fe_guide/tips.md
+++ b/doc/development/new_fe_guide/tips.md
@@ -1,35 +1,11 @@
---
-stage: none
-group: Development
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: '../fe_guide/index.md'
+remove_date: '2022-11-15'
---
-# Tips
+This document was moved to [another location](../fe_guide/index.md).
-## Clearing production compiled assets
-
-To clear production compiled assets created with `yarn webpack-prod`, you can run:
-
-```shell
-yarn clean
-```
-
-## Creating feature flags in development
-
-The process for creating a feature flag is the same as [enabling a feature flag in development](../feature_flags/index.md#enabling-a-feature-flag-locally-in-development).
-
-Your feature flag can now be:
-
-- [Made available to the frontend](../feature_flags/index.md#frontend) via the `gon`
-- Queried in [tests](../feature_flags/index.md#feature-flags-in-tests)
-- Queried in HAML templates and Ruby files via the `Feature.enabled?(:my_shiny_new_feature_flag)` method
-
-### More on feature flags
-
-- [Deleting a feature flag](../../api/features.md#delete-a-feature)
-- [Manage feature flags](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/)
-- [Feature flags API](../../api/features.md)
-
-## Running tests locally
-
-This can be done as outlined in the [frontend testing guide](../testing_guide/frontend_testing.md#running-frontend-tests).
+<!-- This redirect file can be deleted after <2022-11-15>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/ordering_table_columns.md b/doc/development/ordering_table_columns.md
index 7cd3d4fb208..b665cb0d4c7 100644
--- a/doc/development/ordering_table_columns.md
+++ b/doc/development/ordering_table_columns.md
@@ -1,152 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/ordering_table_columns.md'
+remove_date: '2022-11-04'
---
-# Ordering Table Columns in PostgreSQL
+This document was moved to [another location](database/ordering_table_columns.md).
-For GitLab we require that columns of new tables are ordered to use the
-least amount of space. An easy way of doing this is to order them based on the
-type size in descending order with variable sizes (`text`, `varchar`, arrays,
-`json`, `jsonb`, and so on) at the end.
-
-Similar to C structures, the space of a table is influenced by the order of
-columns. This is because the size of columns is aligned depending on the type of
-the following column. Let's consider an example:
-
-- `id` (integer, 4 bytes)
-- `name` (text, variable)
-- `user_id` (integer, 4 bytes)
-
-The first column is a 4-byte integer. The next is text of variable length. The
-`text` data type requires 1-word alignment, and on a 64-bit platform, 1 word is 8
-bytes. To meet the alignment requirement, four bytes of padding are added right
-after the first column: `id` occupies 4 bytes, followed by 4 bytes of alignment
-padding, and only then is `name` stored. Therefore, in this case, 8 bytes
-are spent storing a 4-byte integer.
-
-The space between rows is also subject to alignment padding. The `user_id`
-column takes only 4 bytes, and on a 64-bit platform, 4 bytes of padding are
-added so that the next row begins on a word boundary.
-
-As a result, the actual size of each column would be (omitting variable length
-data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that
-each row requires at least 16 bytes for the two 4-byte integers. If a table
-has only a few rows, this is not an issue. However, once you start storing millions of
-rows, you can save space by using a different order. For the above example, the
-ideal column order would be the following:
-
-- `id` (integer, 4 bytes)
-- `user_id` (integer, 4 bytes)
-- `name` (text, variable)
-
-or
-
-- `name` (text, variable)
-- `id` (integer, 4 bytes)
-- `user_id` (integer, 4 bytes)
-
-In these examples, the `id` and `user_id` columns are packed together, which
-means we only need 8 bytes to store _both_ of them. This in turn means each row
-requires 8 bytes less space.
-
-Since Ruby on Rails 5.1, the default data type for IDs is `bigint`, which uses 8 bytes.
-We are using `integer` in the examples to showcase a more realistic reordering scenario.
-
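-As a sketch, a plain Rails migration for a hypothetical table whose columns
-follow this ordering (8-byte timestamps first, 4-byte integers packed together,
-and variable-length types last):
-
-```ruby
-class CreateAuditEntries < ActiveRecord::Migration[6.1]
-  def change
-    create_table :audit_entries do |t|
-      # 8-byte columns first: created_at and updated_at.
-      t.timestamps null: false
-      # 4-byte integer columns packed together next.
-      t.integer :user_id, null: false
-      t.integer :action, null: false
-      # Variable-length columns go last.
-      t.text :name
-    end
-  end
-end
-```
-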
-## Type Sizes
-
-While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty
-of information, we list the sizes of common types here so it's easier to
-look them up. Here "word" refers to the word size, which is 4 bytes on a 32-bit
-platform and 8 bytes on a 64-bit platform.
-
-| Type | Size | Alignment needed |
-|:-----------------|:-------------------------------------|:-----------|
-| `smallint` | 2 bytes | 1 word |
-| `integer` | 4 bytes | 1 word |
-| `bigint` | 8 bytes | 8 bytes |
-| `real` | 4 bytes | 1 word |
-| `double precision` | 8 bytes | 8 bytes |
-| `boolean` | 1 byte | not needed |
-| `text` / `string` | variable, 1 byte plus the data | 1 word |
-| `bytea` | variable, 1 or 4 bytes plus the data | 1 word |
-| `timestamp` | 8 bytes | 8 bytes |
-| `timestamptz` | 8 bytes | 8 bytes |
-| `date` | 4 bytes | 1 word |
-
-A "variable" size means the actual size depends on the value being stored. If
-PostgreSQL determines this can be embedded directly into a row it may do so, but
-for very large values it stores the data externally and store a pointer (of
-1 word in size) in the column. Because of this variable sized columns should
-always be at the end of a table.
-
-## Real Example
-
-Let's use the `events` table as an example, which currently has the following
-layout:
-
-| Column | Type | Size |
-|:--------------|:----------------------------|:---------|
-| `id` | integer | 4 bytes |
-| `target_type` | character varying | variable |
-| `target_id` | integer | 4 bytes |
-| `title` | character varying | variable |
-| `data` | text | variable |
-| `project_id` | integer | 4 bytes |
-| `created_at` | timestamp without time zone | 8 bytes |
-| `updated_at` | timestamp without time zone | 8 bytes |
-| `action` | integer | 4 bytes |
-| `author_id` | integer | 4 bytes |
-
-After adding padding to align the columns, this would translate to columns being
-divided into fixed-size chunks as follows:
-
-| Chunk Size | Columns |
-|:-----------|:----------------------|
-| 8 bytes | `id` |
-| variable | `target_type` |
-| 8 bytes | `target_id` |
-| variable | `title` |
-| variable | `data` |
-| 8 bytes | `project_id` |
-| 8 bytes | `created_at` |
-| 8 bytes | `updated_at` |
-| 8 bytes | `action`, `author_id` |
-
-This means that excluding the variable-sized data and tuple header, we need at
-least 8 * 6 = 48 bytes per row.
-
-We can optimize this by using the following column order instead:
-
-| Column | Type | Size |
-|:--------------|:----------------------------|:---------|
-| `created_at` | timestamp without time zone | 8 bytes |
-| `updated_at` | timestamp without time zone | 8 bytes |
-| `id` | integer | 4 bytes |
-| `target_id` | integer | 4 bytes |
-| `project_id` | integer | 4 bytes |
-| `action` | integer | 4 bytes |
-| `author_id` | integer | 4 bytes |
-| `target_type` | character varying | variable |
-| `title` | character varying | variable |
-| `data` | text | variable |
-
-This would produce the following chunks:
-
-| Chunk Size | Columns |
-|:-----------|:-----------------------|
-| 8 bytes | `created_at` |
-| 8 bytes | `updated_at` |
-| 8 bytes | `id`, `target_id` |
-| 8 bytes | `project_id`, `action` |
-| 8 bytes | `author_id` |
-| variable | `target_type` |
-| variable | `title` |
-| variable | `data` |
-
-Here we only need 40 bytes per row, excluding the variable-sized data and 24-byte
-tuple header. Saving 8 bytes per row may not sound like much, but for tables as
-large as the `events` table it does begin to matter. For example, when storing
-80,000,000 rows this translates to a space saving of at least 610 MB, all by
-just changing the order of a few columns.
+<!-- This redirect file can be deleted after <2022-11-04>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/pages/index.md b/doc/development/pages/index.md
index 02019db48ba..af9d4d33683 100644
--- a/doc/development/pages/index.md
+++ b/doc/development/pages/index.md
@@ -10,8 +10,8 @@ description: "GitLab's development guidelines for GitLab Pages"
## Configuring GitLab Pages hostname
-GitLab Pages needs a hostname or domain, as each different GitLab Pages site is accessed via a
-subdomain. GitLab Pages hostname can be set in different manners:
+GitLab Pages needs a hostname or domain, as each different GitLab Pages site is accessed via a
+subdomain. You can set the GitLab Pages hostname:
- [Without wildcard, editing your hosts file](#without-wildcard-editing-your-hosts-file).
- [With DNS wildcard alternatives](#with-dns-wildcard-alternatives).
@@ -96,7 +96,7 @@ it with commands like:
### Running GitLab Pages manually
-You can also build and start the app independent of GDK processes management.
+You can also build and start the app independently of GDK process management.
For any changes in the code, you must run `make` to build the app. It's best to just always run
it before you start the app. It's quick to build so don't worry!
@@ -114,9 +114,9 @@ FIPS_MODE=1 make && ./gitlab-pages -config=gitlab-pages.conf
### Creating GitLab Pages site
To build a GitLab Pages site locally you must
-[configure `gitlab-runner`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/runner.md)
+[configure `gitlab-runner`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/runner.md).
-Check the [user manual](../../user/project/pages/index.md).
+For more information, refer to the [user manual](../../user/project/pages/index.md).
### Enabling access control
@@ -125,8 +125,9 @@ who have access to your GitLab project.
GitLab Pages access control is disabled by default. To enable it:
-1. Enable the GitLab Pages access control in GitLab itself, which can be done by either:
- - If you're not using GDK, editing `gitlab.yml`:
+1. Enable the GitLab Pages access control in GitLab itself. You can do this in two ways:
+
+ - If you're not using GDK, edit `gitlab.yml`:
```yaml
# gitlab/config/gitlab.yml
@@ -134,7 +135,7 @@ GitLab Pages access control is disabled by default. To enable it:
access_control: true
```
- - Editing `gdk.yml` if you're using GDK:
+ - If you're using GDK, edit `gdk.yml`:
```yaml
# $GDK_ROOT/gdk.yml
@@ -149,8 +150,9 @@ GitLab Pages access control is disabled by default. To enable it:
1. Create an [Instance-wide OAuth application](../../integration/oauth_provider.md#instance-wide-applications)
with the `api` scope.
1. Set the value of your `redirect-uri` to the `pages-domain` authorization endpoint
- - `http://pages.gdk.test:3010/auth`, for example
- - The `redirect-uri` must not contain any GitLab Pages site domain.
+(for example, `http://pages.gdk.test:3010/auth`).
+The `redirect-uri` must not contain any GitLab Pages site domain.
+
1. Add the auth client configuration:
- With GDK, in `gdk.yml`:
@@ -236,3 +238,29 @@ make acceptance
# so we want to have the latest changes in the build that is tested
make && go test ./ -run TestRedirect
```
+
+## Contributing
+
+### Feature flags
+
+WARNING:
+All newly introduced feature flags should be [disabled by default](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/#feature-flags-in-gitlab-development).
+
+Consider adding a [feature flag](../feature_flags/index.md) for any non-trivial changes.
+Feature flags can make the release and rollback of these changes easier, avoiding
+incidents and downtime. To add a new feature flag to GitLab Pages:
+
+1. Create the feature flag in
+ [`internal/feature/feature.go`](https://gitlab.com/gitlab-org/gitlab-pages/-/blob/master/internal/feature/feature.go),
+ which must be **off** by default.
+1. Create an issue to track the feature flag using the `Feature Flag` template.
+1. Add the `~"feature flag"` label to any merge requests that handle feature flags.
+
+For GitLab Pages, the feature flags are controlled by environment variables at a global level.
+A deployment at the service level is required to change the state of a feature flag.
+For an example of a merge request enabling a GitLab Pages feature flag, see
+[Enforce GitLab Pages rate limits](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/merge_requests/1500).
+
+## Related topics
+
+- [Feature flags in the development of GitLab](../feature_flags/index.md)
diff --git a/doc/development/performance.md b/doc/development/performance.md
index d7cbef0a211..479782e0ccf 100644
--- a/doc/development/performance.md
+++ b/doc/development/performance.md
@@ -18,14 +18,13 @@ consistent performance of GitLab. Refer to the [Index](#performance-documentatio
- Backend:
- [Tooling](#tooling)
- Database:
- - [Query performance guidelines](../development/query_performance.md)
+ - [Query performance guidelines](database/query_performance.md)
- [Pagination performance guidelines](../development/database/pagination_performance_guidelines.md)
- [Keyset pagination performance](../development/database/keyset_pagination.md#performance)
- [Troubleshooting import/export performance issues](../development/import_export.md#troubleshooting-performance-issues)
- [Pipelines performance in the `gitlab` project](../development/pipelines.md#performance)
- Frontend:
- - [Performance guidelines](../development/fe_guide/performance.md)
- - [Performance dashboards and monitoring guidelines](../development/new_fe_guide/development/performance.md)
+ - [Performance guidelines and monitoring](../development/fe_guide/performance.md)
- [Browser performance testing guidelines](../ci/testing/browser_performance_testing.md)
- [`gdk measure` and `gdk measure-workflow`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/gdk_commands.md#measure-performance)
- QA:
@@ -927,12 +926,11 @@ SOME_CONSTANT = 'bar'
You might want millions of project rows in your local database, for example,
in order to compare relative query performance, or to reproduce a bug. You could
-do this by hand with SQL commands or using [Mass Inserting Rails
-Models](mass_insert.md) functionality.
+do this by hand with SQL commands or using [Mass Inserting Rails Models](mass_insert.md) functionality.
Assuming you are working with ActiveRecord models, you might also find these links helpful:
-- [Insert records in batches](insert_into_tables_in_batches.md)
+- [Insert records in batches](database/insert_into_tables_in_batches.md)
- [BulkInsert gem](https://github.com/jamis/bulk_insert)
- [ActiveRecord::PgGenerateSeries gem](https://github.com/ryu39/active_record-pg_generate_series)
diff --git a/doc/development/permissions.md b/doc/development/permissions.md
index ed95456c4f9..8e517b8577c 100644
--- a/doc/development/permissions.md
+++ b/doc/development/permissions.md
@@ -87,7 +87,7 @@ module):
- Owner (`50`)
If a user is the member of both a project and the project parent groups, the
-higher permission is taken into account for the project.
+highest permission is the applied access level for the project.
If a user is the member of a project, but not the parent groups, they
can still view the groups and their entities (like epics).
diff --git a/doc/development/pipelines.md b/doc/development/pipelines.md
index 2bf1e5a315a..d57e5bbeb26 100644
--- a/doc/development/pipelines.md
+++ b/doc/development/pipelines.md
@@ -221,8 +221,9 @@ that includes `rspec-profile` in their name.
### Logging
-- Rails logging to `log/test.log` is disabled by default in CI [for
- performance reasons](https://jtway.co/speed-up-your-rails-test-suite-by-6-in-1-line-13fedb869ec4). To override this setting, provide the
+- Rails logging to `log/test.log` is disabled by default in CI
+ [for performance reasons](https://jtway.co/speed-up-your-rails-test-suite-by-6-in-1-line-13fedb869ec4).
+ To override this setting, provide the
`RAILS_ENABLE_TEST_LOG` environment variable.
## Review app jobs
@@ -247,6 +248,9 @@ The intent is to ensure that a change doesn't introduce a failure after `gitlab-
## As-if-JH jobs
+NOTE:
+This is disabled for now.
+
The `* as-if-jh` jobs run the GitLab test suite "as if JiHu", meaning as if the jobs would run in the context
of [GitLab JH](jh_features_review.md). These jobs are only created in the following cases:
@@ -261,12 +265,18 @@ The intent is to ensure that a change doesn't introduce a failure after `gitlab-
### When to consider applying `pipeline:run-as-if-jh` label
+NOTE:
+This is disabled for now.
+
If a Ruby file is renamed and there's a corresponding [`prepend_mod` line](jh_features_review.md#jh-features-based-on-ce-or-ee-features),
it's likely that GitLab JH is relying on it and requires a corresponding
change to rename the module or class it's prepending.
### Corresponding JH branch
+NOTE:
+This is disabled for now.
+
You can create a corresponding JH branch on [GitLab JH](https://jihulab.com/gitlab-cn/gitlab) by
appending `-jh` to the branch name. If a corresponding JH branch is found,
`* as-if-jh` jobs grab the `jh` folder from the respective branch,
diff --git a/doc/development/policies.md b/doc/development/policies.md
index c9e4fdb4350..f0c9d0ec5f9 100644
--- a/doc/development/policies.md
+++ b/doc/development/policies.md
@@ -74,8 +74,7 @@ Do not use boolean operators such as `&&` and `||` within the rule DSL,
as conditions within rule blocks are objects, not booleans. The same
applies for ternary operators (`condition ? ... : ...`), and `if`
blocks. These operators cannot be overridden, and are hence banned via a
-[custom
-cop](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/49771).
+[custom cop](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/49771).
## Scores, Order, Performance
diff --git a/doc/development/polymorphic_associations.md b/doc/development/polymorphic_associations.md
index bbeaab40a90..6b9158b8408 100644
--- a/doc/development/polymorphic_associations.md
+++ b/doc/development/polymorphic_associations.md
@@ -1,152 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/polymorphic_associations.md'
+remove_date: '2022-11-04'
---
-# Polymorphic Associations
+This document was moved to [another location](database/polymorphic_associations.md).
-**Summary:** always use separate tables instead of polymorphic associations.
-
-Rails makes it possible to define so-called "polymorphic associations". This
-usually works by adding two columns to a table: a target type column, and a
-target ID. For example, at the time of writing we have such a setup for
-`members` with the following columns:
-
-- `source_type`: a string defining the model to use, can be either `Project` or
- `Namespace`.
-- `source_id`: the ID of the row to retrieve based on `source_type`. For
- example, when `source_type` is `Project` then `source_id` contains a
- project ID.
-
-While such a setup may appear to be useful, it comes with many drawbacks; enough
-that you should avoid this at all costs.
-
-## Space Wasted
-
-Because this setup relies on string values to determine the model to use, it
-wastes a lot of space. For example, for `Project` and `Namespace` the
-maximum size is 9 bytes, plus 1 extra byte for every string when using
-PostgreSQL. While this may only be 10 bytes per row, given enough tables and
-rows using such a setup we can end up wasting quite a bit of disk space and
-memory (for any indexes).
-
-## Indexes
-
-Because our associations are broken up into two columns, queries may require
-composite indexes to be performed efficiently. While composite indexes are not
-wrong at all, they can be tricky to set up, as the ordering of columns in these
-indexes is important to ensure optimal performance.
-
-## Consistency
-
-One really big problem with polymorphic associations is being unable to enforce
-data consistency at the database level using foreign keys. To enforce consistency
-at the database level, you would have to write your own foreign key logic to
-support polymorphic associations.
-
-Enforcing consistency at the database level is absolutely crucial for
-maintaining a healthy environment, and thus is another reason to avoid
-polymorphic associations.
-
-## Query Overhead
-
-When using polymorphic associations you always need to filter using both
-columns. For example, you may end up writing a query like this:
-
-```sql
-SELECT *
-FROM members
-WHERE source_type = 'Project'
-AND source_id = 13083;
-```
-
-Here PostgreSQL can perform the query quite efficiently if both columns are
-indexed. As the query gets more complex, it may not be able to use these
-indexes effectively.
-
-## Mixed Responsibilities
-
-Similar to functions and classes, a table should have a single responsibility:
-storing data with a certain set of pre-defined columns. When using polymorphic
-associations, you are storing different types of data (possibly with
-different columns set) in the same table.
-
-## The Solution
-
-Fortunately, there is a solution to these problems: use a
-separate table for every type you would otherwise store in the same table. Using
-a separate table allows you to use everything a database may provide to ensure
-consistency and query data efficiently, without any additional application logic
-being necessary.
-
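-As a minimal sketch (the model names here are illustrative), each type gets its
-own table and model with a real foreign key column, which the database can
-enforce directly:
-
-```ruby
-# One model per table; each table has its own foreign key column,
-# so a foreign key constraint can guarantee consistency.
-class ProjectMember < ApplicationRecord
-  belongs_to :project # backed by a `project_id` column
-end
-
-class GroupMember < ApplicationRecord
-  belongs_to :group # backed by a `group_id` column
-end
-```
-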
-Let's say you have a `members` table storing both approved and pending members,
-for both projects and groups, and the pending state is determined by the column
-`requested_at` being set or not. Schema-wise, such a setup can lead to various
-columns only being set for certain rows, wasting space. It's also possible that
-certain indexes only apply to certain rows, again wasting space. Finally,
-querying such a table requires less than ideal queries. For example:
-
-```sql
-SELECT *
-FROM members
-WHERE requested_at IS NULL
-AND source_type = 'GroupMember'
-AND source_id = 4
-```
-
-Instead, such a table should be broken up into separate tables. For example, you
-may end up with 4 tables in this case:
-
-- `project_members`
-- `group_members`
-- `pending_project_members`
-- `pending_group_members`
-
-This makes querying data trivial. For example, to get the members of a group
-you'd run:
-
-```sql
-SELECT *
-FROM group_members
-WHERE group_id = 4
-```
-
-To get all the pending members of a group in turn you'd run:
-
-```sql
-SELECT *
-FROM pending_group_members
-WHERE group_id = 4
-```
-
-If you want to get both, you can use a `UNION`, though you need to be explicit
-about which columns you want to `SELECT`, as otherwise the result set uses the
-columns of the first query. For example:
-
-```sql
-SELECT id, 'Group' AS target_type, group_id AS target_id
-FROM group_members
-
-UNION ALL
-
-SELECT id, 'Project' AS target_type, project_id AS target_id
-FROM project_members
-```
-
-The above example is perhaps a bit silly, but it shows that there's nothing
-stopping you from merging the data together and presenting it on the same page.
-Selecting columns explicitly can also speed up queries as the database has to do
-less work to get the data (compared to selecting all columns, even ones you're
-not using).
-
-Our schema also becomes simpler. We no longer need to both store and index the
-`source_type` column, we can define foreign keys easily, and we don't need to
-filter rows using the `IS NULL` condition.
-
-To summarize: using separate tables allows us to use foreign keys effectively,
-create indexes only where necessary, conserve space, query data more
-efficiently, and scale these tables more easily (for example, by storing them on
-separate disks). A nice side effect of this is that code can also become simpler,
-as a single model isn't responsible for handling different kinds of
-data.
+<!-- This redirect file can be deleted after <2022-11-04>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/query_count_limits.md b/doc/development/query_count_limits.md
index 49509727337..f16c8cfc6cd 100644
--- a/doc/development/query_count_limits.md
+++ b/doc/development/query_count_limits.md
@@ -1,70 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/query_count_limits.md'
+remove_date: '2022-11-06'
---
-# Query Count Limits
+This document was moved to [another location](database/query_count_limits.md).
-Each controller or API endpoint is allowed to execute up to 100 SQL queries, and
-in test environments we raise an error when this threshold is exceeded.
-
-## Solving Failing Tests
-
-When a test fails because it executes more than 100 SQL queries, there are two
-solutions to this problem:
-
-- Reduce the number of SQL queries that are executed.
-- Disable query limiting for the controller or API endpoint.
-
-You should only resort to disabling query limits when an existing controller or endpoint
-is to blame, as in this case reducing the number of SQL queries can take a lot of
-effort. Newly added controllers and endpoints are not allowed to execute more
-than 100 SQL queries, and no exceptions are made for this rule. _If_ a large
-number of SQL queries is necessary to perform certain work, it's best to have
-this work performed by Sidekiq instead of doing it directly in a web request.
-
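-As a minimal sketch (all names here are hypothetical), the web request only
-enqueues a job, while the query-heavy work runs in a Sidekiq worker:
-
-```ruby
-# A hypothetical worker that performs the query-heavy work in the
-# background, outside the 100-query budget of the web request.
-class ExpensiveRecalculationWorker
-  include ApplicationWorker
-
-  idempotent!
-
-  def perform(project_id)
-    project = Project.find_by_id(project_id)
-    return unless project
-
-    # The many SQL queries now run asynchronously.
-    ExpensiveRecalculationService.new(project).execute
-  end
-end
-
-# In the controller, the request itself executes only a single enqueue:
-ExpensiveRecalculationWorker.perform_async(project.id)
-```
-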
-## Disable query limiting
-
-In the event that you _have_ to disable query limits for a controller, you must first
-create an issue. This issue should (preferably in the title) mention the
-controller or endpoint and include the appropriate labels (`database`,
-`performance`, and at least a team specific label such as `Discussion`).
-
-After the issue has been created, you can disable query limits on the code in question. For
-Rails controllers it's best to create a `before_action` hook that runs as early
-as possible. The called method in turn should call
-`Gitlab::QueryLimiting.disable!('issue URL here')`. For example:
-
-```ruby
-class MyController < ApplicationController
- before_action :disable_query_limiting, only: [:show]
-
- def index
- # ...
- end
-
- def show
- # ...
- end
-
- def disable_query_limiting
- Gitlab::QueryLimiting.disable!('https://gitlab.com/gitlab-org/...')
- end
-end
-```
-
-By using a `before_action` you don't have to modify the controller method in
-question, reducing the likelihood of merge conflicts.
-
-For Grape API endpoints, there is unfortunately no reliable way of running a
-hook before a specific endpoint. This means that you have to add the allowlist
-call directly into the endpoint, like so:
-
-```ruby
-get '/projects/:id/foo' do
- Gitlab::QueryLimiting.disable!('...')
-
- # ...
-end
-```
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/query_performance.md b/doc/development/query_performance.md
index 4fe27d42c38..618d007f766 100644
--- a/doc/development/query_performance.md
+++ b/doc/development/query_performance.md
@@ -1,74 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/query_performance.md'
+remove_date: '2022-11-06'
---
-# Query performance guidelines
+This document was moved to [another location](database/query_performance.md).
-This document describes various guidelines to follow when optimizing SQL queries.
-
-When you are optimizing your SQL queries, there are two dimensions to pay attention to:
-
-1. The query execution time. This is paramount, as it reflects how the user experiences GitLab.
-1. The query plan. Optimizing the query plan is important in allowing queries to scale independently over time. For example, we analyze these plans to verify that an index keeps a query performing well as the table grows, rather than degrading.
-
-## Timing guidelines for queries
-
-| Query Type | Maximum Query Time | Notes |
-|----|----|---|
-| General queries | `100ms` | This is not a hard limit, but if a query is getting above it, it is important to spend time understanding why it can or cannot be optimized. |
-| Queries in a migration | `100ms` | This is different from the total [migration time](migration_style_guide.md#how-long-a-migration-should-take). |
-| Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. |
-| Background migrations | `1s` | |
-| Service Ping | `1s` | See the [Service Ping docs](service_ping/implement.md) for more details. |
-
-- When analyzing your query's performance, pay attention to whether the time you are seeing is on a [cold or warm cache](#cold-and-warm-cache). These guidelines apply to both cache types.
-- When working with batched queries, change the range and batch size to see how it affects the query timing and caching (see the sketch after this list).
-- If an existing query is not performing well, make an effort to improve it. If it is too complex or would stall development, create a follow-up so it can be addressed in a timely manner. You can always ask the database reviewer or maintainer for help and guidance.
-
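-A minimal sketch of such an experiment, assuming the GitLab `EachBatch` helper
-and an arbitrary model:
-
-```ruby
-require 'benchmark'
-
-# Compare per-batch timings across several batch sizes.
-[100, 1_000, 10_000].each do |batch_size|
-  Issue.each_batch(of: batch_size) do |relation, index|
-    elapsed = Benchmark.realtime { relation.to_a } # force the query to run
-    puts "batch_size=#{batch_size} batch=#{index} took #{elapsed.round(3)}s"
-  end
-end
-```
-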
-## Cold and warm cache
-
-When evaluating query performance it is important to understand the difference between
-cold and warm cached queries.
-
-The first time a query is made, it is made on a "cold cache", meaning it needs
-to read from disk. If you run the query again, the data can be read from the
-cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query.
-
-When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see
-the difference not only in the timing, but by looking at the output for `Buffers`
-by running your explain with `EXPLAIN(analyze, buffers)`. [Database Lab](understanding_explain_plans.md#database-lab-engine)
-automatically includes these options.
-
-If you are making a warm cache query, you see only the `shared hits`.
-
-For example in #database-lab:
-
-```plaintext
-Shared buffers:
- - hits: 36467 (~284.90 MiB) from the buffer pool
- - reads: 0 from the OS file cache, including disk I/O
-```
-
-Or in the explain plan from `psql`:
-
-```sql
-Buffers: shared hit=7323
-```
-
-If the cache is cold, you also see `reads`.
-
-In #database-lab:
-
-```plaintext
-Shared buffers:
- - hits: 17204 (~134.40 MiB) from the buffer pool
- - reads: 15229 (~119.00 MiB) from the OS file cache, including disk I/O
-```
-
-In `psql`:
-
-```sql
-Buffers: shared hit=7202 read=121
-```
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/query_recorder.md b/doc/development/query_recorder.md
index 371d6e0e49e..cb05bc604af 100644
--- a/doc/development/query_recorder.md
+++ b/doc/development/query_recorder.md
@@ -1,145 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/query_recorder.md'
+remove_date: '2022-11-06'
---
-# QueryRecorder
+This document was moved to [another location](database/query_recorder.md).
-QueryRecorder is a tool for detecting the [N+1 queries problem](https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations) from tests.
-
-> Implemented in [spec/support/query_recorder.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/query_recorder.rb) via [9c623e3e](https://gitlab.com/gitlab-org/gitlab-foss/commit/9c623e3e5d7434f2e30f7c389d13e5af4ede770a)
-
-As a rule, merge requests [should not increase query counts](merge_request_performance_guidelines.md#query-counts). If you find yourself adding something like `.includes(:author, :assignee)` to avoid having `N+1` queries, consider using QueryRecorder to enforce this with a test. Without this, a new feature which causes an additional model to be accessed can silently reintroduce the problem.
-
-## How it works
-
-This style of test works by counting the number of SQL queries executed by ActiveRecord. First a control count is taken, then you add new records to the database and rerun the count. If the number of queries has significantly increased then an `N+1` queries problem exists.
-
-```ruby
-it "avoids N+1 database queries" do
- control = ActiveRecord::QueryRecorder.new { visit_some_page }
- create_list(:issue, 5)
- expect { visit_some_page }.not_to exceed_query_limit(control)
-end
-```
-
-You can, if you wish, have both the expectation and the control as
-`QueryRecorder` instances:
-
-```ruby
-it "avoids N+1 database queries" do
- control = ActiveRecord::QueryRecorder.new { visit_some_page }
- create_list(:issue, 5)
- action = ActiveRecord::QueryRecorder.new { visit_some_page }
-
- expect(action).not_to exceed_query_limit(control)
-end
-```
-
-As an example, you might create 5 issues between the counts, which would cause the query count to increase by 5 if an N+1 problem exists.
-
-In some cases the query count might change slightly between runs for unrelated reasons. In this case you might need to test `exceed_query_limit(control_count + acceptable_change)`, but this should be avoided if possible.
-
-If this test fails, and the control was passed as a `QueryRecorder`, then the
-failure message indicates where the extra queries are by matching queries on
-the longest common prefix, grouping similar queries together.
-
-## Cached queries
-
-By default, QueryRecorder ignores [cached queries](merge_request_performance_guidelines.md#cached-queries) in the count. However, it may be better to count
-all queries to avoid introducing an N+1 query that may be masked by the statement cache.
-To do this, set the `:use_sql_query_cache` flag, pass `skip_cached: false` to
-`QueryRecorder`, and use the `exceed_all_query_limit` matcher:
-
-```ruby
-it "avoids N+1 database queries", :use_sql_query_cache do
- control = ActiveRecord::QueryRecorder.new(skip_cached: false) { visit_some_page }
- create_list(:issue, 5)
- expect { visit_some_page }.not_to exceed_all_query_limit(control)
-end
-```
-
-## Use request specs instead of controller specs
-
-Use a [request spec](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/spec/requests) when writing an N+1 test on the controller level.
-
-Controller specs should not be used to write N+1 tests, as the controller is only initialized once per example.
-This could lead to false successes where subsequent "requests" have reduced query counts (for example, because of memoization).
-
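-A minimal sketch of such a request spec, assuming a hypothetical `/projects`
-endpoint, standard factories, and a request-spec sign-in helper:
-
-```ruby
-RSpec.describe 'GET /projects', type: :request do
-  let(:user) { create(:user) }
-
-  before do
-    sign_in(user)
-  end
-
-  it 'avoids N+1 database queries' do
-    # Record the baseline query count with a full request cycle.
-    control = ActiveRecord::QueryRecorder.new { get '/projects' }
-
-    create_list(:project, 5, namespace: user.namespace)
-
-    # Each `get` runs the whole request again, so controller-level
-    # memoization cannot mask extra queries.
-    expect { get '/projects' }.not_to exceed_query_limit(control)
-  end
-end
-```
-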
-## Finding the source of the query
-
-There are multiple ways to find the source of queries.
-
-- Inspect the `QueryRecorder` `data` attribute. It stores queries by `file_name:line_number:method_name`.
- Each entry is a `hash` with the following fields:
-
- - `count`: the number of times a query from this `file_name:line_number:method_name` was called
- - `occurrences`: the actual `SQL` of each call
- - `backtrace`: the stack trace of each call (if either of the two following options were enabled)
-
- `QueryRecorder#find_query` allows filtering queries by their `file_name:line_number:method_name` and
- `count` attributes. For example:
-
- ```ruby
- control = ActiveRecord::QueryRecorder.new(skip_cached: false) { visit_some_page }
- control.find_query(/.*note.rb.*/, 0, first_only: true)
- ```
-
- `QueryRecorder#occurrences_by_line_method` returns a sorted array based on `data`, sorted by `count`.
-
-- View the call backtrace for the specific `QueryRecorder` instance you want
- by using `ActiveRecord::QueryRecorder.new(query_recorder_debug: true)`. The output
-  is stored in the `test.log` file.
-
-- Enable the call backtrace for all tests using the `QUERY_RECORDER_DEBUG` environment variable.
-
- To enable this, run the specs with the `QUERY_RECORDER_DEBUG` environment variable set. For example:
-
- ```shell
- QUERY_RECORDER_DEBUG=1 bundle exec rspec spec/requests/api/projects_spec.rb
- ```
-
- This logs calls to QueryRecorder into the `test.log` file. For example:
-
- ```sql
- QueryRecorder SQL: SELECT COUNT(*) FROM "issues" WHERE "issues"."deleted_at" IS NULL AND "issues"."project_id" = $1 AND ("issues"."state" IN ('opened')) AND "issues"."confidential" = $2
- --> /home/user/gitlab/gdk/gitlab/spec/support/query_recorder.rb:19:in `callback'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:127:in `finish'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:46:in `block in finish'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:46:in `each'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/fanout.rb:46:in `finish'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/instrumenter.rb:36:in `finish'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications/instrumenter.rb:25:in `instrument'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract_adapter.rb:478:in `log'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql_adapter.rb:601:in `exec_cache'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql_adapter.rb:585:in `execute_and_clear'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec_query'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/database_statements.rb:356:in `select'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/query_cache.rb:68:in `block in select_all'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/query_cache.rb:83:in `cache_sql'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/connection_adapters/abstract/query_cache.rb:68:in `select_all'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:270:in `execute_simple_calculation'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:227:in `perform_calculation'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:133:in `calculate'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activerecord-4.2.8/lib/active_record/relation/calculations.rb:48:in `count'
- --> /home/user/gitlab/gdk/gitlab/app/services/base_count_service.rb:20:in `uncached_count'
- --> /home/user/gitlab/gdk/gitlab/app/services/base_count_service.rb:12:in `block in count'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:299:in `block in fetch'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:585:in `block in save_block_result_to_cache'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:547:in `block in instrument'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/notifications.rb:166:in `instrument'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:547:in `instrument'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:584:in `save_block_result_to_cache'
- --> /home/user/.rbenv/versions/2.3.5/lib/ruby/gems/2.3.0/gems/activesupport-4.2.8/lib/active_support/cache.rb:299:in `fetch'
- --> /home/user/gitlab/gdk/gitlab/app/services/base_count_service.rb:12:in `count'
- --> /home/user/gitlab/gdk/gitlab/app/models/project.rb:1296:in `open_issues_count'
- ```
-
-## See also
-
-- [Bullet](profiling.md#bullet), for finding `N+1` query problems
-- [Performance guidelines](performance.md)
-- [Merge request performance guidelines - Query counts](merge_request_performance_guidelines.md#query-counts)
-- [Merge request performance guidelines - Cached queries](merge_request_performance_guidelines.md#cached-queries)
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/rails_update.md b/doc/development/rails_update.md
index 36ffae97377..9907a78421f 100644
--- a/doc/development/rails_update.md
+++ b/doc/development/rails_update.md
@@ -27,8 +27,8 @@ We strive to run GitLab using the latest Rails releases to benefit from performa
1. Run `yarn patch-package @rails/ujs` after updating this to ensure our local patch file version matches.
1. Create an MR with the `pipeline:run-all-rspec` label and see if pipeline breaks.
1. To resolve and debug spec failures use `git bisect` against the rails repository. See the [debugging section](#git-bisect-against-rails) below.
-1. Include links to the Gem diffs between the two versions in the merge request description. For example, this is the gem diff for [`activesupport` 6.1.3.2 to
-6.1.4.1](https://my.diffend.io/gems/activerecord/6.1.3.2/6.1.4.1).
+1. Include links to the Gem diffs between the two versions in the merge request description. For example, this is the gem diff for
+   [`activerecord` 6.1.3.2 to 6.1.4.1](https://my.diffend.io/gems/activerecord/6.1.3.2/6.1.4.1).
### Prepare an MR for Gitaly
diff --git a/doc/development/real_time.md b/doc/development/real_time.md
index df725a36a93..21f3ee1f3b2 100644
--- a/doc/development/real_time.md
+++ b/doc/development/real_time.md
@@ -60,8 +60,8 @@ downstream services.
To mitigate this, ensure that the code establishing the new WebSocket connection
is feature flagged and defaulted to `off`. A careful, percentage-based roll-out
-of the feature flag ensures that effects can be observed on the [WebSocket
-dashboard](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?orgId=1)
+of the feature flag ensures that effects can be observed on the
+[WebSocket dashboard](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?orgId=1).
1. Create a
[feature flag roll-out](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/issue_templates/Feature%20Flag%20Roll%20Out.md)
diff --git a/doc/development/redis/new_redis_instance.md b/doc/development/redis/new_redis_instance.md
index 4900755b58c..efaf1e5a6d0 100644
--- a/doc/development/redis/new_redis_instance.md
+++ b/doc/development/redis/new_redis_instance.md
@@ -169,7 +169,7 @@ MultiStore uses two feature flags to control the actual migration:
- `use_primary_and_secondary_stores_for_[store_name]`
- `use_primary_store_as_default_for_[store_name]`
-For example, if our new Redis instance is called `Gitlab::Redis::Foo`, we can [create](../../../ee/development/feature_flags/#create-a-new-feature-flag) two feature flags by executing:
+For example, if our new Redis instance is called `Gitlab::Redis::Foo`, we can [create](../feature_flags/index.md#create-a-new-feature-flag) two feature flags by executing:
```shell
bin/feature-flag use_primary_and_secondary_stores_for_foo
@@ -265,7 +265,7 @@ instances to cope without this functional partition.
If we decide to keep the migration code:
- We should document the migration steps.
-- If we used a feature flag, we should ensure it's an [ops type feature
- flag](../feature_flags/index.md#ops-type), as these are long-lived flags.
+- If we used a feature flag, we should ensure it's an
+ [ops type feature flag](../feature_flags/index.md#ops-type), as these are long-lived flags.
Otherwise, we can remove the flags and conclude the project.
diff --git a/doc/development/reusing_abstractions.md b/doc/development/reusing_abstractions.md
index f3eb1ebcc0c..ef4e8b0310f 100644
--- a/doc/development/reusing_abstractions.md
+++ b/doc/development/reusing_abstractions.md
@@ -109,7 +109,7 @@ the various abstractions and what they can (not) reuse:
| Abstraction | Service classes | Finders | Presenters | Serializers | Model instance method | Model class methods | Active Record | Worker
|:-----------------------|:-----------------|:---------|:------------|:--------------|:------------------------|:----------------------|:----------------|:--------
-| Controller | Yes | Yes | Yes | Yes | Yes | No | No | No
+| Controller/API endpoint | Yes             | Yes      | Yes         | Yes           | Yes                     | No                     | No              | No
| Service class | Yes | Yes | No | No | Yes | No | No | Yes
| Finder | No | No | No | No | Yes | Yes | No | No
| Presenter | No | Yes | No | No | Yes | Yes | No | No
@@ -125,9 +125,11 @@ Everything in `app/controllers`.
Controllers should not do much work on their own, instead they simply pass input
to other classes and present the results.
-### Grape endpoint
+### API endpoints
-Everything in `lib/api`.
+Everything in `lib/api` (the REST API) and `app/graphql` (the GraphQL API).
+
+API endpoints have the same abstraction level as controllers.
### Service classes
@@ -145,6 +147,27 @@ Legacy classes inherited from `BaseService` for historical reasons.
In Service classes the use of `execute` and `#execute` is preferred over `call` and `#call`.
+Model properties should be passed to the constructor in a `params` hash, where they are assigned directly.
+
+To pass extra parameters (which need to be processed, and are not model properties),
+include an `options` hash in the constructor and store it in an instance variable:
+
+```ruby
+# container: Project, or Group
+# current_user: Current user
+# params: Model properties from the controller, already allowlisted with strong parameters
+# options: Configuration for this service, can be any of the following:
+#   notify: Whether to send a notification to the current user
+# cc: Email address to copy when sending a notification
+def initialize(container:, current_user: nil, params: {}, options: {})
+ super(container, current_user, params)
+ @options = options
+end
+```
+
+View the [initial discussion](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/90008#note_988744060)
+and [further discussion](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/90853#note_1053425083).
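+
+A hypothetical call site for such a service (the service name, parameters, and
+options are invented for illustration):
+
+```ruby
+# `params` carries model properties; `options` is processed by the service itself.
+SomeResource::UpdateService.new(
+  container: project,
+  current_user: current_user,
+  params: { title: 'New title' },
+  options: { notify: true, cc: 'reviewer@example.com' }
+).execute
+```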
+
Classes that are not service objects should be [created elsewhere](directory_structure.md#use-namespaces-to-define-bounded-contexts), such as in `lib`.
#### ServiceResponse
@@ -206,7 +229,27 @@ See [the documentation](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/p
Everything in `app/serializers`, used for presenting the response to a request,
typically in JSON.
-### Model class methods
+### Models
+
+Classes and modules in `app/models` represent domain concepts that encapsulate both
+[data and behavior](https://en.wikipedia.org/wiki/Domain_model).
+
+These classes can interact directly with a data store (like ActiveRecord models) or
+can be thin wrappers (Plain Old Ruby Objects) on top of ActiveRecord models to express a
+richer domain concept.
+
+[Entities and Value Objects](https://martinfowler.com/bliki/EvansClassification.html)
+that represent domain concepts are considered domain models.
+
+Some examples:
+
+- [`DesignManagement::DesignAtVersion`](https://gitlab.com/gitlab-org/gitlab/-/blob/b62ce98cff8e0530210670f9cb0314221181b77f/app/models/design_management/design_at_version.rb)
+ is a model that leverages validations to combine designs and versions.
+- [`Ci::Minutes::Usage`](https://gitlab.com/gitlab-org/gitlab/-/blob/ec52f19f7325410177c00fef06379f55ab7cab67/ee/app/models/ci/minutes/usage.rb)
+ is a Value Object that provides [CI/CD minutes usage](../ci/pipelines/cicd_minutes.md)
+ for a given namespace.
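+
+A minimal sketch of such a thin wrapper (class and method names are
+hypothetical, for illustration only):
+
+```ruby
+# A Plain Old Ruby Object expressing a domain concept on top of
+# ActiveRecord models, without persisting any data itself.
+class ProjectMembership
+  def initialize(user, project)
+    @user = user
+    @project = project
+  end
+
+  # Answers a domain question by combining the underlying records.
+  def direct_member?
+    @project.members.exists?(user_id: @user.id)
+  end
+end
+```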
+
+#### Model class methods
These are class methods defined by _GitLab itself_, including the following
methods provided by Active Record:
@@ -220,7 +263,7 @@ methods provided by Active Record:
Any other methods such as `find_by(some_column: X)` are not included, and
instead fall under the "Active Record" abstraction.
-### Model instance methods
+#### Model instance methods
Instance methods defined on Active Record models by _GitLab itself_. Methods
provided by Active Record are not included, except for the following methods:
@@ -230,7 +273,7 @@ provided by Active Record are not included, except for the following methods:
- `destroy`
- `delete`
-### Active Record
+#### Active Record
The API provided by Active Record itself, such as the `where` method, `save`,
`delete_all`, and so on.
diff --git a/doc/development/routing.md b/doc/development/routing.md
index 2b3ecd8127b..3d5857b4237 100644
--- a/doc/development/routing.md
+++ b/doc/development/routing.md
@@ -6,8 +6,8 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Routing
-The GitLab backend is written primarily with Rails so it uses [Rails
-routing](https://guides.rubyonrails.org/routing.html). Beside Rails best
+The GitLab backend is written primarily with Rails so it uses
+[Rails routing](https://guides.rubyonrails.org/routing.html). Beside Rails best
practices, there are few rules unique to the GitLab application. To
support subgroups, GitLab project and group routes use the wildcard
character to match project and group routes. For example, we might have
diff --git a/doc/development/scalability.md b/doc/development/scalability.md
index 39cd0ecfcdd..b7ee0ca1167 100644
--- a/doc/development/scalability.md
+++ b/doc/development/scalability.md
@@ -35,8 +35,8 @@ The application has a tight coupling to the database schema. When the
application starts, Rails queries the database schema, caching the tables and
column types for the data requested. Because of this schema cache, dropping a
column or table while the application is running can produce 500 errors to the
-user. This is why we have a [process for dropping columns and other
-no-downtime changes](database/avoiding_downtime_in_migrations.md).
+user. This is why we have a
+[process for dropping columns and other no-downtime changes](database/avoiding_downtime_in_migrations.md).
#### Multi-tenancy
@@ -61,11 +61,11 @@ There are two ways to deal with this:
- Sharding. Distribute data across multiple databases.
Partitioning is a built-in PostgreSQL feature and requires minimal changes
-in the application. However, it [requires PostgreSQL
-11](https://www.2ndquadrant.com/en/blog/partitioning-evolution-postgresql-11/).
+in the application. However, it
+[requires PostgreSQL 11](https://www.2ndquadrant.com/en/blog/partitioning-evolution-postgresql-11/).
-For example, a natural way to partition is to [partition tables by
-dates](https://gitlab.com/groups/gitlab-org/-/epics/2023). For example,
+For example, a natural way to partition is to
+[partition tables by dates](https://gitlab.com/groups/gitlab-org/-/epics/2023). For example,
the `events` and `audit_events` table are natural candidates for this
kind of partitioning.
@@ -77,10 +77,10 @@ to abstract data access into API calls that abstract the database from
the application, but this is a significant amount of work.
There are solutions that may help abstract the sharding to some extent
-from the application. For example, we want to look at [Citus
-Data](https://www.citusdata.com/product/community) closely. Citus Data
-provides a Rails plugin that adds a [tenant ID to ActiveRecord
-models](https://www.citusdata.com/blog/2017/01/05/easily-scale-out-multi-tenant-apps/).
+from the application. For example, we want to look at
+[Citus Data](https://www.citusdata.com/product/community) closely. Citus Data
+provides a Rails plugin that adds a
+[tenant ID to ActiveRecord models](https://www.citusdata.com/blog/2017/01/05/easily-scale-out-multi-tenant-apps/).
Sharding can also be done based on feature verticals. This is the
microservice approach to sharding, where each service represents a
@@ -97,12 +97,12 @@ systems.
#### Database size
-A recent [database checkup shows a breakdown of the table sizes on
-GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/8022#master-1022016101-8).
+A recent
+[database checkup shows a breakdown of the table sizes on GitLab.com](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/8022#master-1022016101-8).
Since `merge_request_diff_files` contains over 1 TB of data, we want to
-reduce/eliminate this table first. GitLab has support for [storing diffs in
-object storage](../administration/merge_request_diffs.md), which we [want to do on
-GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7356).
+reduce/eliminate this table first. GitLab has support for
+[storing diffs in object storage](../administration/merge_request_diffs.md), which we
+[want to do on GitLab.com](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/7356).
#### High availability
@@ -128,8 +128,7 @@ some actions that aren't traditionally available in standard load balancers. For
example, the application considers a replica only if its replication lag is low
(for example, WAL data behind by less than 100 MB).
-More [details are in a blog
-post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/).
+More [details are in a blog post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/).
### PgBouncer
@@ -148,10 +147,10 @@ limitation:
- Run multiple PgBouncer instances.
- Use a multi-threaded connection pooler (for example,
- [Odyssey](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7776).
+ [Odyssey](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/7776).
-On some Linux systems, it's possible to run [multiple PgBouncer instances on
-the same port](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/4796).
+On some Linux systems, it's possible to run
+[multiple PgBouncer instances on the same port](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/4796).
On GitLab.com, we run multiple PgBouncer instances on different ports to
avoid saturating a single core.
diff --git a/doc/development/secure_coding_guidelines.md b/doc/development/secure_coding_guidelines.md
index 9048da77071..8053b4285e6 100644
--- a/doc/development/secure_coding_guidelines.md
+++ b/doc/development/secure_coding_guidelines.md
@@ -196,7 +196,7 @@ Go's [`regexp`](https://pkg.go.dev/regexp) package uses `re2` and isn't vulnerab
- [Rubular](https://rubular.com/) is a nice online tool to fiddle with Ruby Regexps.
- [Runaway Regular Expressions](https://www.regular-expressions.info/catastrophic.html)
-- [The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale](https://people.cs.vt.edu/~davisjam/downloads/publications/DavisCoghlanServantLee-EcosystemREDOS-ESECFSE18.pdf). This research paper discusses approaches to automatically detect ReDoS vulnerabilities.
+- [The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale](https://davisjam.github.io/files/publications/DavisCoghlanServantLee-EcosystemREDOS-ESECFSE18.pdf). This research paper discusses approaches to automatically detect ReDoS vulnerabilities.
- [Freezing the web: A study of ReDoS vulnerabilities in JavaScript-based web servers](https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-staicu.pdf). Another research paper about detecting ReDoS vulnerabilities.
## Server Side Request Forgery (SSRF)
diff --git a/doc/development/serializing_data.md b/doc/development/serializing_data.md
index 97e6f665484..aa8b20eded7 100644
--- a/doc/development/serializing_data.md
+++ b/doc/development/serializing_data.md
@@ -1,90 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/serializing_data.md'
+remove_date: '2022-11-06'
---
-# Serializing Data
+This document was moved to [another location](database/serializing_data.md).
-**Summary:** don't store serialized data in the database, use separate columns
-and/or tables instead. This includes storing of comma separated values as a
-string.
-
-Rails makes it possible to store serialized data in JSON, YAML or other formats.
-Such a field can be defined as follows:
-
-```ruby
-class Issue < ActiveRecord::Model
- serialize :custom_fields
-end
-```
-
-While it may be tempting to store serialized data in the database there are many
-problems with this. This document outlines these problems and provide an
-alternative.
-
-## Serialized Data Is Less Powerful
-
-When using a relational database you have the ability to query individual
-fields, change the schema, index data, and so forth. When you use serialized data
-all of that becomes either very difficult or downright impossible. While
-PostgreSQL does offer the ability to query JSON fields it is mostly meant for
-very specialized use cases, and not for more general use. If you use YAML in
-turn there's no way to query the data at all.
-
-## Waste Of Space
-
-Storing serialized data such as JSON or YAML ends up wasting a lot of space.
-This is because these formats often include additional characters (for example, double
-quotes or newlines) besides the data that you are storing.
-
-## Difficult To Manage
-
-There comes a time where you must add a new field to the serialized
-data, or change an existing one. Using serialized data this becomes difficult
-and very time consuming as the only way of doing so is to re-write all the
-stored values. To do so you would have to:
-
-1. Retrieve the data
-1. Parse it into a Ruby structure
-1. Mutate it
-1. Serialize it back to a String
-1. Store it in the database
-
-On the other hand, if one were to use regular columns adding a column would be:
-
-```sql
-ALTER TABLE table_name ADD COLUMN column_name type;
-```
-
-Such a query would take very little to no time and would immediately apply to
-all rows, without having to re-write large JSON or YAML structures.
-
-Finally, there comes a time when the JSON or YAML structure is no longer
-sufficient and you must migrate away from it. When storing only a few rows
-this may not be a problem, but when storing millions of rows such a migration
-can take hours or even days to complete.
-
-## Relational Databases Are Not Document Stores
-
-When storing data as JSON or YAML you're essentially using your database as if
-it were a document store (for example, MongoDB), except you're not using any of the
-powerful features provided by a typical RDBMS _nor_ are you using any of the
-features provided by a typical document store (for example, the ability to index fields
-of documents with variable fields). In other words, it's a waste.
-
-## Consistent Fields
-
-One argument sometimes made in favour of serialized data is having to store
-widely varying fields and values. Sometimes this is truly the case, and then
-perhaps it might make sense to use serialized data. However, in 99% of the cases
-the fields and types stored tend to be the same for every row. Even if there is
-a slight difference you can still use separate columns and just not set the ones
-you don't need.
-
-## The Solution
-
-The solution is to use separate columns and/or separate tables.
-This allows you to use all the features provided by your database, it
-makes it easier to manage and migrate the data, you conserve space, you can
-index the data efficiently and so forth.
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/service_measurement.md b/doc/development/service_measurement.md
index 82fb9731bcb..17e509672f2 100644
--- a/doc/development/service_measurement.md
+++ b/doc/development/service_measurement.md
@@ -19,7 +19,7 @@ The measuring module is a tool that allows to measure a service's execution, and
- RSS memory usage
- Server worker ID
-The measuring module logs these measurements into a structured log called [`service_measurement.log`](../administration/logs.md#service_measurementlog),
+The measuring module logs these measurements into a structured log called [`service_measurement.log`](../administration/logs/index.md#service_measurementlog),
as a single entry for each service execution.
For GitLab.com, `service_measurement.log` is ingested in Elasticsearch and Kibana as part of our monitoring solution.
diff --git a/doc/development/service_ping/implement.md b/doc/development/service_ping/implement.md
index 3263ba6458e..0ebc58dd669 100644
--- a/doc/development/service_ping/implement.md
+++ b/doc/development/service_ping/implement.md
@@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Service Ping consists of two kinds of data:
- **Counters**: Track how often a certain event happened over time, such as how many CI/CD pipelines have run.
- They are monotonic and always trend up.
+ They are monotonic and usually trend up.
- **Observations**: Facts collected from one or more GitLab instances and can carry arbitrary data.
There are no general guidelines for how to collect those, due to the individual nature of that data.
@@ -94,7 +94,7 @@ add_metric('CountUsersAssociatingMilestonesToReleasesMetric', time_frame: 'all')
```
WARNING:
-Counting over non-unique columns can lead to performance issues. For more information, see the [iterating tables in batches](../iterating_tables_in_batches.md) guide.
+Counting over non-unique columns can lead to performance issues. For more information, see the [iterating tables in batches](../database/iterating_tables_in_batches.md) guide.
Examples:
@@ -269,9 +269,15 @@ Arguments:
#### Ordinary Redis counters
-Example of implementation:
+Example of implementation: [`Gitlab::UsageDataCounters::WikiPageCounter`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/wiki_page_counter.rb), using Redis methods [`INCR`](https://redis.io/commands/incr) and [`GET`](https://redis.io/commands/get).
-Using Redis methods [`INCR`](https://redis.io/commands/incr), [`GET`](https://redis.io/commands/get), and [`Gitlab::UsageDataCounters::WikiPageCounter`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/wiki_page_counter.rb)
+Events are handled by counter classes in the `Gitlab::UsageDataCounters` namespace that inherit from `BaseCounter` and are either:
+
+1. Listed in [`Gitlab::UsageDataCounters::COUNTERS`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters.rb#L5) to then be included in `Gitlab::UsageData`.
+
+1. Specified in the metric definition using the `RedisMetric` instrumentation class as a `counter_class` option to be picked up using the [metric instrumentation](metrics_instrumentation.md) framework. Refer to the [Redis metrics](metrics_instrumentation.md#redis-metrics) documentation for an example implementation.
+
+Inheriting classes are expected to override the `KNOWN_EVENTS` and `PREFIX` constants to build event names and associated metrics. For example, for prefix `issues` and events array `%w[create update delete]`, three metrics are added to the Service Ping payload: `counts.issues_create`, `counts.issues_update`, and `counts.issues_delete`.
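+
+A minimal sketch of the `issues` example above (hypothetical `IssueCounter`,
+modeled on `WikiPageCounter`; the exact `BaseCounter` API may differ):
+
+```ruby
+module Gitlab
+  module UsageDataCounters
+    class IssueCounter < BaseCounter
+      # With this prefix and these events, the payload gains
+      # `counts.issues_create`, `counts.issues_update`, and `counts.issues_delete`.
+      KNOWN_EVENTS = %w[create update delete].freeze
+      PREFIX = 'issues'
+    end
+  end
+end
+```
+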
##### `UsageData` API
@@ -316,7 +322,7 @@ Enabled by default in GitLab 13.7 and later.
#### Redis HLL counters
WARNING:
-HyperLogLog (HLL) is a probabilistic algorithm and its **results always includes some small error**. According to [Redis documentation](https://redis.io/commands/pfcount), data from
+HyperLogLog (HLL) is a probabilistic algorithm and its **results always include some small error**. According to [Redis documentation](https://redis.io/commands/pfcount/), data from
used HLL implementation is "approximated with a standard error of 0.81%".
NOTE:
@@ -324,7 +330,7 @@ NOTE:
With `Gitlab::UsageDataCounters::HLLRedisCounter` we have available data structures used to count unique values.
-Implemented using Redis methods [PFADD](https://redis.io/commands/pfadd) and [PFCOUNT](https://redis.io/commands/pfcount).
+Implemented using Redis methods [PFADD](https://redis.io/commands/pfadd/) and [PFCOUNT](https://redis.io/commands/pfcount/).
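+
+A short sketch of the underlying Redis operations (the key name is
+illustrative; `HLLRedisCounter` manages real key names and expiry for you):
+
+```ruby
+require 'redis'
+
+redis = Redis.new
+
+# Adding the same member twice still counts it once.
+redis.pfadd('i_search_total-2022-40', 'user-1')
+redis.pfadd('i_search_total-2022-40', 'user-1')
+redis.pfadd('i_search_total-2022-40', 'user-2')
+
+redis.pfcount('i_search_total-2022-40') # => 2 (approximate, ±0.81%)
+```
+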
##### Add new events
@@ -371,14 +377,15 @@ Implemented using Redis methods [PFADD](https://redis.io/commands/pfadd) and [PF
- In the controller using the `RedisTracking` module and the following format:
```ruby
- track_redis_hll_event(*controller_actions, name:, if: nil, &block)
+ track_event(*controller_actions, name:, conditions: nil, destinations: [:redis_hll], &block)
```
Arguments:
- `controller_actions`: the controller actions to track.
- `name`: the event name.
- - `if`: optional custom conditions. Uses the same format as Rails callbacks.
+ - `conditions`: optional custom conditions. Uses the same format as Rails callbacks.
+   - `destinations`: optional list of destinations. Currently supports `:redis_hll` and `:snowplow`. Default: `[:redis_hll]`.
- `&block`: optional block that computes and returns the `custom_id` that we want to track. This overrides the `visitor_id`.
Example:
@@ -389,7 +396,7 @@ Implemented using Redis methods [PFADD](https://redis.io/commands/pfadd) and [PF
include RedisTracking
skip_before_action :authenticate_user!, only: :show
- track_redis_hll_event :index, :show, name: 'users_visiting_projects'
+ track_event :index, :show, name: 'users_visiting_projects'
def index
render html: 'index'
@@ -688,7 +695,7 @@ pry(main)> Gitlab::UsageData.count(User.active)
Paste the SQL query into `#database-lab` to see how the query performs at scale.
- GitLab.com's production database has a 15 second timeout.
-- Any single query must stay below the [1 second execution time](../query_performance.md#timing-guidelines-for-queries) with cold caches.
+- Any single query must stay below the [1 second execution time](../database/query_performance.md#timing-guidelines-for-queries) with cold caches.
- Add a specialized index on columns involved to reduce the execution time.
To understand the query's execution, we add the following information
diff --git a/doc/development/service_ping/index.md b/doc/development/service_ping/index.md
index cd8af3e9152..4481fe33bda 100644
--- a/doc/development/service_ping/index.md
+++ b/doc/development/service_ping/index.md
@@ -22,9 +22,7 @@ and sales teams understand how GitLab is used. The data helps to:
Service Ping information is not anonymous. It's linked to the instance's hostname, but does
not contain project names, usernames, or any other specific data.
-Sending a Service Ping payload is optional and you can [disable](../../user/admin_area/settings/usage_statistics.md#enable-or-disable-usage-statistics) it on any
-self-managed instance. When Service Ping is enabled, GitLab gathers data from the other instances
-and can show your instance's usage statistics to your users.
+Service Ping is enabled by default. However, you can [disable](../../user/admin_area/settings/usage_statistics.md#enable-or-disable-usage-statistics) it on any self-managed instance. When Service Ping is enabled, GitLab gathers data from the other instances and can show your instance's usage statistics to your users.
## Service Ping terminology
@@ -113,7 +111,7 @@ sequenceDiagram
1. Finally, the timing metadata information that is used for diagnostic purposes is submitted to the Versions application. It consists of a list of metric identifiers and the time it took to calculate the metrics:
- > - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/37911) in GitLab 15.0 [with a flag(../../user/feature_flags.md), enabled by default.
+ > - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/37911) in GitLab 15.0 [with a flag](../../user/feature_flags.md), enabled by default.
> - [Generally available](https://gitlab.com/gitlab-org/gitlab/-/issues/295289) in GitLab 15.2. [Feature flag `measure_service_ping_metric_collection`](https://gitlab.com/gitlab-org/gitlab/-/issues/358128) removed.
```ruby
diff --git a/doc/development/service_ping/metrics_dictionary.md b/doc/development/service_ping/metrics_dictionary.md
index 2adba5d8095..d063c4c7601 100644
--- a/doc/development/service_ping/metrics_dictionary.md
+++ b/doc/development/service_ping/metrics_dictionary.md
@@ -205,8 +205,8 @@ instance unique identifier.
key_path: uuid
description: GitLab instance unique identifier
product_category: collection
-product_section: growth
-product_stage: growth
+product_section: analytics
+product_stage: analytics
product_group: product_intelligence
value_type: string
status: active
@@ -301,7 +301,7 @@ bundle exec rails generate gitlab:usage_metric_definition:redis_hll issues users
## Metrics Dictionary
-[Metrics Dictionary is a separate application](https://gitlab.com/gitlab-org/growth/product-intelligence/metric-dictionary).
+[Metrics Dictionary is a separate application](https://gitlab.com/gitlab-org/analytics-section/product-intelligence/metric-dictionary).
All metrics available in Service Ping are in the [Metrics Dictionary](https://metrics.gitlab.com/).
diff --git a/doc/development/service_ping/metrics_instrumentation.md b/doc/development/service_ping/metrics_instrumentation.md
index e1c51713f3c..9dc37386111 100644
--- a/doc/development/service_ping/metrics_instrumentation.md
+++ b/doc/development/service_ping/metrics_instrumentation.md
@@ -29,7 +29,7 @@ A metric definition has the [`instrumentation_class`](metrics_dictionary.md) fie
The defined instrumentation class should inherit one of the existing metric classes: `DatabaseMetric`, `RedisMetric`, `RedisHLLMetric`, `NumbersMetric` or `GenericMetric`.
-The current convention is that a single instrumentation class corresponds to a single metric. On a rare occasions, there are exceptions to that convention like [Redis metrics](#redis-metrics). To use a single instrumentation class for more than one metric, please reach out to one of the `@gitlab-org/growth/product-intelligence/engineers` members to consult about your case.
+The current convention is that a single instrumentation class corresponds to a single metric. On rare occasions, there are exceptions to that convention like [Redis metrics](#redis-metrics). To use a single instrumentation class for more than one metric, please reach out to one of the `@gitlab-org/analytics-section/product-intelligence/engineers` members to consult about your case.
Using the instrumentation classes ensures that metrics can fail safe individually, without breaking the entire
process of Service Ping generation.
@@ -38,12 +38,15 @@ We have built a domain-specific language (DSL) to define the metrics instrumenta
## Database metrics
+You can use database metrics to track data kept in the database, for example, a count of issues that exist on a given instance.
+
- `operation`: Operations for the given `relation`, one of `count`, `distinct_count`, `sum`, and `average`.
- `relation`: `ActiveRecord::Relation` for the objects we want to perform the `operation`.
- `start`: Specifies the start value of the batch counting, by default is `relation.minimum(:id)`.
- `finish`: Specifies the end value of the batch counting, by default is `relation.maximum(:id)`.
- `cache_start_and_finish_as`: Specifies the cache key for `start` and `finish` values and sets up caching them. Use this call when `start` and `finish` are expensive queries that should be reused between different metric calculations.
- `available?`: Specifies whether the metric should be reported. The default is `true`.
+- `timestamp_column`: Optionally specifies the timestamp column used to filter records for time-constrained metrics. The default is `created_at`.
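+
+A minimal sketch of a database metric class using these options (hypothetical
+`CountIssuesMetric`; the exact DSL may differ):
+
+```ruby
+module Gitlab
+  module Usage
+    module Metrics
+      module Instrumentations
+        class CountIssuesMetric < DatabaseMetric
+          operation :count
+
+          # Batch-counted between `relation.minimum(:id)` and
+          # `relation.maximum(:id)` by default; time-constrained frames
+          # filter on `created_at`, the default `timestamp_column`.
+          relation { Issue }
+        end
+      end
+    end
+  end
+end
+```
+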
[Example of a merge request that adds a database metric](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/60022).
@@ -149,6 +152,8 @@ end
## Redis metrics
+You can use Redis metrics to track events not kept in the database, for example, a count of how many times the search bar has been used.
+
[Example of a merge request that adds a `Redis` metric](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66582).
Count unique values for `source_code_pushes` event.
@@ -199,6 +204,9 @@ options:
```
## Redis HyperLogLog metrics
+
+You can use Redis HyperLogLog metrics to track events not kept in the database and counted once per unique value, such as unique users,
+for example, a count of how many different users used the search bar.
[Example of a merge request that adds a `RedisHLL` metric](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/61685).
@@ -283,6 +291,8 @@ instrumentation_class: 'IssuesBoardsCountMetric'
## Generic metrics
+You can use generic metrics for other metrics, for example, an instance's database version. Observation-type data always uses the Generic metric counter type.
+
- `value`: Specifies the value of the metric.
- `available?`: Specifies whether the metric should be reported. The default is `true`.
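+
+A minimal sketch of a generic metric for the database version example above
+(the class name and the `Gitlab::Database.version` call are illustrative):
+
+```ruby
+module Gitlab
+  module Usage
+    module Metrics
+      module Instrumentations
+        class DatabaseVersionMetric < GenericMetric
+          # `value` returns the metric's payload directly; `available?`
+          # defaults to true, so this metric is always reported.
+          value do
+            Gitlab::Database.version
+          end
+        end
+      end
+    end
+  end
+end
+```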
diff --git a/doc/development/service_ping/performance_indicator_metrics.md b/doc/development/service_ping/performance_indicator_metrics.md
index bdd4c319d41..d2abc597a22 100644
--- a/doc/development/service_ping/performance_indicator_metrics.md
+++ b/doc/development/service_ping/performance_indicator_metrics.md
@@ -10,8 +10,7 @@ This guide describes how to use metrics definitions to define [performance indic
To use a metric definition to manage a performance indicator:
-1. Create a new issue and use the [Performance Indicator Metric issue template](https://gitlab.com/gitlab-org/gitlab/-/issues/new?issuable_template=Performance%20Indicator%20Metric).
+1. Create a merge request that includes related changes.
1. Use labels `~"product intelligence"`, `"~Data Warehouse::Impact Check"`.
-1. Create a merge request that includes changes related only to the metric performance indicator.
1. Update the metric definition `performance_indicator_type` [field](metrics_dictionary.md#metrics-definition-and-validation).
-1. Create an issue in GitLab Data Team project with the [Product Performance Indicator template](https://gitlab.com/gitlab-data/analytics/-/issues/new?issuable_template=Product%20Performance%20Indicator%20Template).
+1. Create an issue in the GitLab Product Data Insights project with the [PI Chart Help template](https://gitlab.com/gitlab-data/product-analytics/-/issues/new?issuable_template=PI%20Chart%20Help) to have the new metric visualized.
diff --git a/doc/development/service_ping/review_guidelines.md b/doc/development/service_ping/review_guidelines.md
index 4ce5b2d577c..1b00858be7e 100644
--- a/doc/development/service_ping/review_guidelines.md
+++ b/doc/development/service_ping/review_guidelines.md
@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Service Ping review guidelines
This page includes introductory material for a
-[Product Intelligence](https://about.gitlab.com/handbook/engineering/development/growth/product-intelligence/)
+[Product Intelligence](https://about.gitlab.com/handbook/engineering/development/analytics/product-intelligence/)
review, and is specific to Service Ping related reviews. For broader advice and
general best practices for code reviews, refer to our [code review guide](../code_review.md).
@@ -42,7 +42,7 @@ are regular backend changes.
- Assign both the `~backend` and `~product intelligence` reviews to another Product Intelligence team member.
- Assign the maintainer review to someone outside of the Product Intelligence group.
- Assign an
- [engineer](https://gitlab.com/groups/gitlab-org/growth/product-intelligence/engineers/-/group_members?with_inherited_permissions=exclude) from the Product Intelligence team for a review.
+ [engineer](https://gitlab.com/groups/gitlab-org/analytics-section/product-intelligence/engineers/-/group_members?with_inherited_permissions=exclude) from the Product Intelligence team for a review.
- Set the correct attributes in the metric's YAML definition:
- `product_section`, `product_stage`, `product_group`, `product_category`
- Provide a clear description of the metric.
@@ -76,7 +76,7 @@ are regular backend changes.
[Danger bot](../dangerbot.md) adds the list of changed Product Intelligence files
and pings the
-[`@gitlab-org/growth/product-intelligence/engineers`](https://gitlab.com/groups/gitlab-org/growth/product-intelligence/engineers/-/group_members?with_inherited_permissions=exclude) group for merge requests
+[`@gitlab-org/analytics-section/product-intelligence/engineers`](https://gitlab.com/groups/gitlab-org/analytics-section/product-intelligence/engineers/-/group_members?with_inherited_permissions=exclude) group for merge requests
that are not drafts.
Any of the Product Intelligence engineers can be assigned for the Product Intelligence review.
diff --git a/doc/development/service_ping/usage_data.md b/doc/development/service_ping/usage_data.md
index a659bbf2265..4181bd90a02 100644
--- a/doc/development/service_ping/usage_data.md
+++ b/doc/development/service_ping/usage_data.md
@@ -59,7 +59,7 @@ Arguments:
- `end`: custom end of the batch counting to avoid complex min calculations
WARNING:
-Counting over non-unique columns can lead to performance issues. For more information, see the [iterating tables in batches](../iterating_tables_in_batches.md) guide.
+Counting over non-unique columns can lead to performance issues. For more information, see the [iterating tables in batches](../database/iterating_tables_in_batches.md) guide.
Examples:
diff --git a/doc/development/sha1_as_binary.md b/doc/development/sha1_as_binary.md
index a7bb3001ddb..7f928d09470 100644
--- a/doc/development/sha1_as_binary.md
+++ b/doc/development/sha1_as_binary.md
@@ -1,42 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/sha1_as_binary.md'
+remove_date: '2022-11-06'
---
-# Storing SHA1 Hashes As Binary
+This document was moved to [another location](database/sha1_as_binary.md).
-Storing SHA1 hashes as strings is not very space efficient. A SHA1 as a string
-requires at least 40 bytes, an additional byte to store the encoding, and
-perhaps more space depending on the internals of PostgreSQL.
-
-On the other hand, if one were to store a SHA1 as binary one would only need 20
-bytes for the actual SHA1, and 1 or 4 bytes of additional space (again depending
-on database internals). This means that in the best case scenario we can reduce
-the space usage by 50%.
-
-To make this easier to work with you can include the concern `ShaAttribute` into
-a model and define a SHA attribute using the `sha_attribute` class method. For
-example:
-
-```ruby
-class Commit < ActiveRecord::Base
- include ShaAttribute
-
- sha_attribute :sha
-end
-```
-
-This allows you to use the value of the `sha` attribute as if it were a string,
-while storing it as binary. This means that you can do something like this,
-without having to worry about converting data to the right binary format:
-
-```ruby
-commit = Commit.find_by(sha: '88c60307bd1f215095834f09a1a5cb18701ac8ad')
-commit.sha = '971604de4cfa324d91c41650fabc129420c8d1cc'
-commit.save
-```
-
-There is however one requirement: the column used to store the SHA has _must_ be
-a binary type. For Rails this means you need to use the `:binary` type instead
-of `:text` or `:string`.
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/shell_commands.md b/doc/development/shell_commands.md
index fcb8c20bdd3..d8a3f86685e 100644
--- a/doc/development/shell_commands.md
+++ b/doc/development/shell_commands.md
@@ -17,7 +17,7 @@ These guidelines are meant to make your code more reliable _and_ secure.
## Use File and FileUtils instead of shell commands
-Sometimes we invoke basic Unix commands via the shell when there is also a Ruby API for doing it. Use the Ruby API if it exists. <https://www.ruby-doc.org/stdlib-2.0.0/libdoc/fileutils/rdoc/FileUtils.html#module-FileUtils-label-Module+Functions>
+Sometimes we invoke basic Unix commands via the shell when there is also a Ruby API for doing it. Use [the Ruby API](https://ruby-doc.org/stdlib-2.0.0/libdoc/fileutils/rdoc/FileUtils.html#module-FileUtils-label-Module+Functions) if it exists.
```ruby
# Wrong
diff --git a/doc/development/sidekiq/compatibility_across_updates.md b/doc/development/sidekiq/compatibility_across_updates.md
index 96a3573d11a..1d369b5a970 100644
--- a/doc/development/sidekiq/compatibility_across_updates.md
+++ b/doc/development/sidekiq/compatibility_across_updates.md
@@ -18,18 +18,17 @@ several possible situations:
## Adding new workers
-On GitLab.com, we [do not currently have a Sidekiq deployment in the
-canary stage](https://gitlab.com/gitlab-org/gitlab/-/issues/19239). This
-means that a new worker than can be scheduled from an HTTP endpoint may
+On GitLab.com, we
+[do not currently have a Sidekiq deployment in the canary stage](https://gitlab.com/gitlab-org/gitlab/-/issues/19239).
+This means that a new worker that can be scheduled from an HTTP endpoint may
be scheduled from canary but not run on Sidekiq until the full
production deployment is complete. This can be several hours later than
scheduling the job. For some workers, this will not be a problem. For
-others - particularly [latency-sensitive
-jobs](worker_attributes.md#latency-sensitive-jobs) - this will result in a poor user
-experience.
+others - particularly [latency-sensitive jobs](worker_attributes.md#latency-sensitive-jobs) -
+this will result in a poor user experience.
This only applies to new worker classes when they are first introduced.
-As we recommend [using feature flags](../feature_flags/) as a general
+As we recommend [using feature flags](../feature_flags/index.md) as a general
development process, it's best to control the entire change (including
scheduling of the new Sidekiq worker) with a feature flag.
diff --git a/doc/development/sidekiq/idempotent_jobs.md b/doc/development/sidekiq/idempotent_jobs.md
index a5ae8737ad1..5d1ebce763e 100644
--- a/doc/development/sidekiq/idempotent_jobs.md
+++ b/doc/development/sidekiq/idempotent_jobs.md
@@ -78,9 +78,8 @@ GitLab supports two deduplication strategies:
- `until_executing`, which is the default strategy
- `until_executed`
-More [deduplication strategies have been
-suggested](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/195). If
-you are implementing a worker that could benefit from a different
+More [deduplication strategies have been suggested](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/195).
+If you are implementing a worker that could benefit from a different
strategy, please comment in the issue.
#### Until Executing
diff --git a/doc/development/sidekiq/index.md b/doc/development/sidekiq/index.md
index c9906c4c768..003f54d48b5 100644
--- a/doc/development/sidekiq/index.md
+++ b/doc/development/sidekiq/index.md
@@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
We use [Sidekiq](https://github.com/mperham/sidekiq) as our background
job processor. These guides are for writing jobs that will work well on
GitLab.com and be consistent with our existing worker classes. For
-information on administering GitLab, see [configuring Sidekiq](../../administration/sidekiq.md).
+information on administering GitLab, see [configuring Sidekiq](../../administration/sidekiq/index.md).
There are pages with additional detail on the following topics:
@@ -27,12 +27,11 @@ There are pages with additional detail on the following topics:
All workers should include `ApplicationWorker` instead of `Sidekiq::Worker`,
which adds some convenience methods and automatically sets the queue based on
-the [routing rules](../../administration/operations/extra_sidekiq_routing.md#queue-routing-rules).
+the [routing rules](../../administration/sidekiq/extra_sidekiq_routing.md#queue-routing-rules).
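+
+A minimal sketch (the worker name and body are illustrative):
+
+```ruby
+class ProcessSomethingWorker
+  include ApplicationWorker
+
+  # The queue is set from the routing rules, not derived from the class name.
+  def perform(something_id)
+    # ...
+  end
+end
+```
+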
## Retries
-Sidekiq defaults to using [25
-retries](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry),
+Sidekiq defaults to using [25 retries](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry),
with back-off between each retry. 25 retries means that the last retry
would happen around three weeks after the first attempt (assuming all 24
prior retries failed).
@@ -64,7 +63,7 @@ error rate.
Previously, each worker had its own queue, which was automatically set based on the
worker class name. For a worker named `ProcessSomethingWorker`, the queue name
would be `process_something`. You can now route workers to a specific queue using
-[queue routing rules](../../administration/operations/extra_sidekiq_routing.md#queue-routing-rules).
+[queue routing rules](../../administration/sidekiq/extra_sidekiq_routing.md#queue-routing-rules).
In GDK, new workers are routed to a queue named `default`.
If you're not sure what queue a worker uses,
@@ -75,7 +74,7 @@ After adding a new worker, run `bin/rake
gitlab:sidekiq:all_queues_yml:generate` to regenerate
`app/workers/all_queues.yml` or `ee/app/workers/all_queues.yml` so that
it can be picked up by
-[`sidekiq-cluster`](../../administration/operations/extra_sidekiq_processes.md)
+[`sidekiq-cluster`](../../administration/sidekiq/extra_sidekiq_processes.md)
in installations that don't use routing rules. To learn more about potential changes,
read [Use routing rules by default and deprecate queue selectors for self-managed](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/596).
@@ -176,11 +175,10 @@ available in Sidekiq. There are possible workarounds such as:
Some jobs have a weight declared. This is only used when running Sidekiq
in the default execution mode - using
-[`sidekiq-cluster`](../../administration/operations/extra_sidekiq_processes.md)
+[`sidekiq-cluster`](../../administration/sidekiq/extra_sidekiq_processes.md)
does not account for weights.
-As we are [moving towards using `sidekiq-cluster` in
-Free](https://gitlab.com/gitlab-org/gitlab/-/issues/34396), newly-added
+As we are [moving towards using `sidekiq-cluster` in Free](https://gitlab.com/gitlab-org/gitlab/-/issues/34396), newly-added
workers do not need to have weights specified. They can use the
default weight, which is 1.
diff --git a/doc/development/sidekiq/logging.md b/doc/development/sidekiq/logging.md
index 474ea5de951..b461047ea47 100644
--- a/doc/development/sidekiq/logging.md
+++ b/doc/development/sidekiq/logging.md
@@ -11,8 +11,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
> [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/9) in GitLab 12.8.
To have some more information about workers in the logs, we add
-[metadata to the jobs in the form of an
-`ApplicationContext`](../logging.md#logging-context-metadata-through-rails-or-grape-requests).
+[metadata to the jobs in the form of an `ApplicationContext`](../logging.md#logging-context-metadata-through-rails-or-grape-requests).
In most cases, when scheduling a job from a request, this context is already
deducted from the request and added to the scheduled job.
@@ -128,7 +127,7 @@ blocks:
## Arguments logging
-As of GitLab 13.6, Sidekiq job arguments are logged by default, unless [`SIDEKIQ_LOG_ARGUMENTS`](../../administration/troubleshooting/sidekiq.md#log-arguments-to-sidekiq-jobs)
+As of GitLab 13.6, Sidekiq job arguments are logged by default, unless [`SIDEKIQ_LOG_ARGUMENTS`](../../administration/sidekiq/sidekiq_troubleshooting.md#log-arguments-to-sidekiq-jobs)
is disabled.
By default, the only arguments logged are numeric arguments, because
diff --git a/doc/development/sidekiq/worker_attributes.md b/doc/development/sidekiq/worker_attributes.md
index 6820627f761..a1d24d0c392 100644
--- a/doc/development/sidekiq/worker_attributes.md
+++ b/doc/development/sidekiq/worker_attributes.md
@@ -86,13 +86,11 @@ but that always reduces work.
To do this, we want to calculate the expected increase in total execution time
and RPS (throughput) for the new shard. We can get these values from:
-- The [Queue Detail
- dashboard](https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail)
+- The [Queue Detail dashboard](https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail)
has values for the queue itself. For a new queue, we can look for
queues that have similar patterns or are scheduled in similar
circumstances.
-- The [Shard Detail
- dashboard](https://dashboards.gitlab.net/d/sidekiq-shard-detail/sidekiq-shard-detail)
+- The [Shard Detail dashboard](https://dashboards.gitlab.net/d/sidekiq-shard-detail/sidekiq-shard-detail)
has Total Execution Time and Throughput (RPS). The Shard Utilization
panel displays if there is currently any excess capacity for this
shard.
diff --git a/doc/development/single_table_inheritance.md b/doc/development/single_table_inheritance.md
index c8d082e8a67..da8d48f2a42 100644
--- a/doc/development/single_table_inheritance.md
+++ b/doc/development/single_table_inheritance.md
@@ -1,63 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/single_table_inheritance.md'
+remove_date: '2022-11-06'
---
-# Single Table Inheritance
+This document was moved to [another location](database/single_table_inheritance.md).
-**Summary:** don't use Single Table Inheritance (STI), use separate tables
-instead.
-
-Rails makes it possible to have multiple models stored in the same table and map
-these rows to the correct models using a `type` column. This can be used to for
-example store two different types of SSH keys in the same table.
-
-While tempting to use one should avoid this at all costs for the same reasons as
-outlined in the document ["Polymorphic Associations"](polymorphic_associations.md).
-
-## Solution
-
-The solution is very simple: just use a separate table for every type you'd
-otherwise store in the same table. For example, instead of having a `keys` table
-with `type` set to either `Key` or `DeployKey` you'd have two separate tables:
-`keys` and `deploy_keys`.
-
-## In migrations
-
-Whenever a model is used in a migration, single table inheritance should be disabled.
-Due to the way Rails loads associations (even in migrations), failing to disable STI
-could result in loading unexpected code or associations which may cause unintended
-side effects or failures during upgrades.
-
-```ruby
-class SomeMigration < Gitlab::Database::Migration[2.0]
- class Services < MigrationRecord
- self.table_name = 'services'
- self.inheritance_column = :_type_disabled
- end
-
- def up
- ...
-```
-
-If nothing needs to be added to the model other than disabling STI or `EachBatch`,
-use the helper `define_batchable_model` instead of defining the class.
-This ensures that the migration loads the columns for the migration in isolation,
-and the helper disables STI by default.
-
-```ruby
-class EnqueueSomeBackgroundMigration < Gitlab::Database::Migration[1.0]
- disable_ddl_transaction!
-
- def up
- define_batchable_model('services').select(:id).in_batches do |relation|
- jobs = relation.pluck(:id).map do |id|
- ['ExtractServicesUrl', [id]]
- end
-
- BackgroundMigrationWorker.bulk_perform_async(jobs)
- end
- end
- ...
-```
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/snowplow/implementation.md b/doc/development/snowplow/implementation.md
index f8e37aee1e0..9a923b115a2 100644
--- a/doc/development/snowplow/implementation.md
+++ b/doc/development/snowplow/implementation.md
@@ -431,7 +431,7 @@ To test backend Snowplow events, use the `expect_snowplow_event` helper. For mor
### Performance
-We use the [AsyncEmitter](https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/ruby-tracker/emitters/#the-asyncemitter-class) when tracking events, which allows for instrumentation calls to be run in a background thread. This is still an active area of development.
+We use the [AsyncEmitter](https://snowplow.github.io/snowplow-ruby-tracker/SnowplowTracker/AsyncEmitter.html) when tracking events, which allows instrumentation calls to run in a background thread. This is still an active area of development.
## Develop and test Snowplow
diff --git a/doc/development/snowplow/index.md b/doc/development/snowplow/index.md
index 155ce87b8d9..24cd9093267 100644
--- a/doc/development/snowplow/index.md
+++ b/doc/development/snowplow/index.md
@@ -89,10 +89,10 @@ Each click event provides attributes that describe the event.
| Attribute | Type | Required | Description |
| --------- | ------- | -------- | ----------- |
-| category | text | true | The page or backend section of the application. Unless infeasible, use the Rails page attribute by default in the frontend, and namespace + class name on the backend. |
+| category | text | true | The page or backend section of the application. Unless infeasible, use the Rails page attribute by default in the frontend, and namespace + class name on the backend, for example, `Notes::CreateService`. |
| action | text | true | The action the user takes, or aspect that's being instrumented. The first word must describe the action or aspect. For example, clicks must be `click`, activations must be `activate`, creations must be `create`. Use underscores to describe what was acted on. For example, activating a form field is `activate_form_input`, an interface action like clicking on a dropdown is `click_dropdown`, a behavior like creating a project record from the backend is `create_project`. |
-| label | text | false | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. |
-| property | text | false | Any additional property of the element, or object being acted on. |
+| label | text | false | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. For Service Ping metrics adapted to Snowplow events, this should be the full metric [key path](../service_ping/metrics_dictionary.md#metric-key_path) taken from its definition file. |
+| property | text | false | Any additional property of the element, or object being acted on. For Service Ping metrics adapted to Snowplow events, this should be additional information or context that can help analyze the event. For example, in the case of `usage_activity_by_stage_monthly.create.merge_requests_users`, there are four different possible merge request actions: "create", "merge", "comment", and "close". Each of these would be a possible property value. |
| value | decimal | false | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. |
### Examples
@@ -106,6 +106,7 @@ Each click event provides attributes that describe the event.
| `[projects:blob:show]` | `congratulate_first_pipeline` | `click_button` | `[human_access]` | - |
| `[projects:clusters:new]` | `chart_options` | `generate_link` | `[chart_link]` | - |
| `[projects:clusters:new]` | `chart_options` | `click_add_label_button` | `[label_id]` | - |
+| `API::NpmPackages` | `counts.package_events_i_package_push_package_by_deploy_token` | `push_package` | `npm` | - |
_* If you choose to omit the category you can use the default._<br>
_** Use property for variable strings._
diff --git a/doc/development/snowplow/infrastructure.md b/doc/development/snowplow/infrastructure.md
index 758c850e89f..ea4653dc91d 100644
--- a/doc/development/snowplow/infrastructure.md
+++ b/doc/development/snowplow/infrastructure.md
@@ -50,7 +50,7 @@ See [Snowplow technology 101](https://github.com/snowplow/snowplow/#snowplow-tec
### Pseudonymization
-In contrast to a typical Snowplow pipeline, after enrichment, GitLab Snowplow events go through a [pseudonymization service](https://gitlab.com/gitlab-org/growth/product-intelligence/snowplow-pseudonymization) in the form of an AWS Lambda service before they are stored in S3 storage.
+In contrast to a typical Snowplow pipeline, after enrichment, GitLab Snowplow events go through a [pseudonymization service](https://gitlab.com/gitlab-org/analytics-section/product-intelligence/snowplow-pseudonymization) in the form of an AWS Lambda service before they are stored in S3 storage.
#### Why events need to be pseudonymized
@@ -85,7 +85,7 @@ There are several tools that monitor Snowplow events tracking in different stage
- The number of events that successfully reach Snowplow collectors.
- The number of events that failed to reach Snowplow collectors.
- The number of backend events that were sent.
-- [AWS CloudWatch dashboard](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=SnowPlow;start=P3D) monitors the state of the events in a processing pipeline. The pipeline starts from Snowplow collectors, goes through to enrichers and pseudonymization, and then up to persistence in an S3 bucket. From S3, the events are imported into the Snowflake Data Warehouse. You must have AWS access rights to view this dashboard. For more information, see [monitoring](https://gitlab.com/gitlab-org/growth/product-intelligence/snowplow-pseudonymization#monitoring) in the Snowplow Events pseudonymization service documentation.
+- [AWS CloudWatch dashboard](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=SnowPlow;start=P3D) monitors the state of the events in a processing pipeline. The pipeline starts from Snowplow collectors, goes through to enrichers and pseudonymization, and then up to persistence in an S3 bucket. From S3, the events are imported into the Snowflake Data Warehouse. You must have AWS access rights to view this dashboard. For more information, see [monitoring](https://gitlab.com/gitlab-org/analytics-section/product-intelligence/snowplow-pseudonymization#monitoring) in the Snowplow Events pseudonymization service documentation.
- [Sisense dashboard](https://app.periscopedata.com/app/gitlab/417669/Snowplow-Summary-Dashboard) provides information about the number of good and bad events imported into the Data Warehouse, in addition to the total number of imported Snowplow events.
For more information, see this [video walk-through](https://www.youtube.com/watch?v=NxPS0aKa_oU).
@@ -93,7 +93,7 @@ For more information, see this [video walk-through](https://www.youtube.com/watc
## Related topics
- [Snowplow technology 101](https://github.com/snowplow/snowplow/#snowplow-technology-101)
-- [Snowplow pseudonymization AWS Lambda project](https://gitlab.com/gitlab-org/growth/product-intelligence/snowplow-pseudonymization)
+- [Snowplow pseudonymization AWS Lambda project](https://gitlab.com/gitlab-org/analytics-section/product-intelligence/snowplow-pseudonymization)
- [Product Intelligence Guide](https://about.gitlab.com/handbook/product/product-intelligence-guide/)
- [Data Infrastructure](https://about.gitlab.com/handbook/business-technology/data-team/platform/infrastructure/)
- [Snowplow architecture overview (internal)](https://www.youtube.com/watch?v=eVYJjzspsLU)
diff --git a/doc/development/snowplow/review_guidelines.md b/doc/development/snowplow/review_guidelines.md
index 673166452b7..44de849792c 100644
--- a/doc/development/snowplow/review_guidelines.md
+++ b/doc/development/snowplow/review_guidelines.md
@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Snowplow review guidelines
This page includes introductory material for a
-[Product Intelligence](https://about.gitlab.com/handbook/engineering/development/growth/product-intelligence/)
+[Product Intelligence](https://about.gitlab.com/handbook/engineering/development/analytics/product-intelligence/)
review, and is specific to Snowplow related reviews. For broader advice and
general best practices for code reviews, refer to our [code review guide](../code_review.md).
diff --git a/doc/development/sql.md b/doc/development/sql.md
index 8553e2a5500..7101bf7fb4b 100644
--- a/doc/development/sql.md
+++ b/doc/development/sql.md
@@ -79,8 +79,9 @@ ON table_name
USING GIN(column_name gin_trgm_ops);
```
-The key here is the `GIN(column_name gin_trgm_ops)` part. This creates a [GIN
-index](https://www.postgresql.org/docs/current/gin.html) with the operator class set to `gin_trgm_ops`. These indexes
+The key here is the `GIN(column_name gin_trgm_ops)` part. This creates a
+[GIN index](https://www.postgresql.org/docs/current/gin.html)
+with the operator class set to `gin_trgm_ops`. These indexes
_can_ be used by `ILIKE` / `LIKE` and can lead to greatly improved performance.
One downside of these indexes is that they can easily get quite large (depending
on the amount of data indexed).
diff --git a/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md b/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md
index c1831cfce69..c8c18b93a8f 100644
--- a/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md
+++ b/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md
@@ -56,7 +56,7 @@ description, note the following:
To inspect the raw data of the panel for further calculation, select **Inspect** from the dropdown
list of a panel. Queries, raw data, and panel JSON structure are available.
-Read more at [Grafana panel inspection](https://grafana.com/docs/grafana/latest/panels/inspect-panel/).
+Read more at [Grafana panel inspection](https://grafana.com/docs/grafana/next/panels/query-a-data-source/).
All the dashboards are powered by [Grafana](https://grafana.com/), a frontend for displaying metrics.
Grafana consumes the data returned from queries to backend Prometheus data source, then presents it
diff --git a/doc/development/swapping_tables.md b/doc/development/swapping_tables.md
index efb481ccf35..eaa6568dc36 100644
--- a/doc/development/swapping_tables.md
+++ b/doc/development/swapping_tables.md
@@ -1,51 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/swapping_tables.md'
+remove_date: '2022-11-04'
---
-# Swapping Tables
+This document was moved to [another location](database/swapping_tables.md).
-Sometimes you need to replace one table with another. For example, when
-migrating data in a very large table it's often better to create a copy of the
-table and insert & migrate the data into this new table in the background.
-
-Let's say you want to swap the table `events` with `events_for_migration`. In
-this case you need to follow 3 steps:
-
-1. Rename `events` to `events_temporary`
-1. Rename `events_for_migration` to `events`
-1. Rename `events_temporary` to `events_for_migration`
-
-Rails allows you to do this using the `rename_table` method:
-
-```ruby
-rename_table :events, :events_temporary
-rename_table :events_for_migration, :events
-rename_table :events_temporary, :events_for_migration
-```
-
-This does not require any downtime as long as the 3 `rename_table` calls are
-executed in the _same_ database transaction. Rails by default uses database
-transactions for migrations, but if it doesn't you need to start one
-manually:
-
-```ruby
-Event.transaction do
- rename_table :events, :events_temporary
- rename_table :events_for_migration, :events
- rename_table :events_temporary, :events_for_migration
-end
-```
-
-Once swapped you _have to_ reset the primary key of the new table. For
-PostgreSQL you can use the `reset_pk_sequence!` method like so:
-
-```ruby
-reset_pk_sequence!('events')
-```
-
-Failure to reset the primary keys results in newly created rows starting
-with an ID value of 1. Depending on the existing data this can then lead to
-duplicate key constraints from popping up, preventing users from creating new
-data.
+<!-- This redirect file can be deleted after <2022-11-04>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/testing_guide/best_practices.md b/doc/development/testing_guide/best_practices.md
index ea36214f6b7..79a72981e3f 100644
--- a/doc/development/testing_guide/best_practices.md
+++ b/doc/development/testing_guide/best_practices.md
@@ -422,8 +422,8 @@ Use the coverage reports to ensure your tests cover 100% of your code.
### System / Feature tests
NOTE:
-Before writing a new system test, [please consider **not**
-writing one](testing_levels.md#consider-not-writing-a-system-test)!
+Before writing a new system test,
+[please consider **not** writing one](testing_levels.md#consider-not-writing-a-system-test)!
- Feature specs should be named `ROLE_ACTION_spec.rb`, such as
`user_changes_password_spec.rb`.
@@ -909,8 +909,8 @@ By default, Sidekiq jobs are enqueued into a jobs array and aren't processed.
If a test queues Sidekiq jobs and needs them to be processed, the
`:sidekiq_inline` trait can be used.
-The `:sidekiq_might_not_need_inline` trait was added when [Sidekiq inline mode was
-changed to fake mode](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/15479)
+The `:sidekiq_might_not_need_inline` trait was added when
+[Sidekiq inline mode was changed to fake mode](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/15479)
to all the tests that needed Sidekiq to actually process jobs. Tests with
this trait should be either fixed to not rely on Sidekiq processing jobs, or their
`:sidekiq_might_not_need_inline` trait should be updated to `:sidekiq_inline` if
@@ -1239,8 +1239,7 @@ The `match_schema` matcher allows validating that the subject matches a
a JSON string or a JSON-compatible data structure.
`match_response_schema` is a convenience matcher for using with a
-response object. from a [request
-spec](testing_levels.md#integration-tests).
+response object from a [request spec](testing_levels.md#integration-tests).
Examples:
diff --git a/doc/development/testing_guide/contract/consumer_tests.md b/doc/development/testing_guide/contract/consumer_tests.md
index df7c9ee0abd..46f4f446ad9 100644
--- a/doc/development/testing_guide/contract/consumer_tests.md
+++ b/doc/development/testing_guide/contract/consumer_tests.md
@@ -6,7 +6,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Writing consumer tests
-This tutorial guides you through writing a consumer test from scratch. To start, the consumer tests are written using [`jest-pact`](https://github.com/pact-foundation/jest-pact) that builds on top of [`pact-js`](https://github.com/pact-foundation/pact-js). This tutorial shows you how to write a consumer test for the `/discussions.json` endpoint, which is actually `/:namespace_name/:project_name/-/merge_requests/:id/discussions.json`.
+This tutorial guides you through writing a consumer test from scratch. To start, the consumer tests are written using [`jest-pact`](https://github.com/pact-foundation/jest-pact), which builds on top of [`pact-js`](https://github.com/pact-foundation/pact-js). This tutorial shows you how to write a consumer test for the `/discussions.json` REST API endpoint, which is actually `/:namespace_name/:project_name/-/merge_requests/:id/discussions.json`. For an example of a GraphQL consumer test, see [`spec/contracts/consumer/specs/project/pipeline/show.spec.js`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/spec/contracts/consumer/specs/project/pipeline/show.spec.js).
## Create the skeleton
@@ -24,7 +24,7 @@ To learn more about how the contract test directory is structured, see the contr
The Pact consumer test is defined through the `pactWith` function that takes `PactOptions` and the `PactFn`.
```javascript
-const { pactWith } = require('jest-pact');
+import { pactWith } from 'jest-pact';
pactWith(PactOptions, PactFn);
```
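For orientation, the overall shape these pieces combine into looks roughly like the following. This is a condensed sketch only; the option values are the ones this tutorial arrives at below, and the bodies are left as placeholders.

```javascript
import { pactWith } from 'jest-pact';

pactWith(
  {
    consumer: 'MergeRequest#show',
    provider: 'Merge Request Discussions Endpoint',
    log: '../logs/consumer.log',
    dir: '../contracts/project/merge_request/show',
  },
  (provider) => {
    describe('Merge Request Discussions Endpoint', () => {
      beforeEach(() => {
        // Register the expected interaction with the mock provider here.
      });

      it('return a successful body', async () => {
        // Request provider.mockService.baseUrl and assert on the response.
      });
    });
  },
);
```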
@@ -34,7 +34,7 @@ pactWith(PactOptions, PactFn);
`PactOptions` with `jest-pact` introduces [additional options](https://github.com/pact-foundation/jest-pact/blob/dce370c1ab4b7cb5dff12c4b62246dc229c53d0e/README.md#defaults) that build on top of the ones [provided in `pact-js`](https://github.com/pact-foundation/pact-js#constructor). In most cases, you define the `consumer`, `provider`, `log`, and `dir` options for these tests.
```javascript
-const { pactWith } = require('jest-pact');
+import { pactWith } from 'jest-pact';
pactWith(
{
@@ -54,7 +54,7 @@ To learn more about how to name the consumers and providers, see contract testin
The `PactFn` is where your tests are defined. This is where you set up the mock provider and where you can use the standard Jest methods like [`Jest.describe`](https://jestjs.io/docs/api#describename-fn), [`Jest.beforeEach`](https://jestjs.io/docs/api#beforeeachfn-timeout), and [`Jest.it`](https://jestjs.io/docs/api#testname-fn-timeout). For more information, see [https://jestjs.io/docs/api](https://jestjs.io/docs/api).
```javascript
-const { pactWith } = require('jest-pact');
+import { pactWith } from 'jest-pact';
pactWith(
{
@@ -70,7 +70,7 @@ pactWith(
});
- it('return a successful body', () => {
+ it('return a successful body', async () => {
});
});
@@ -92,8 +92,8 @@ For this tutorial, define four attributes for the `Interaction`:
After you define the `Interaction`, add that interaction to the mock provider by calling `addInteraction`.
```javascript
-const { pactWith } = require('jest-pact');
-const { Matchers } = require('@pact-foundation/pact');
+import { pactWith } from 'jest-pact';
+import { Matchers } from '@pact-foundation/pact';
pactWith(
{
@@ -132,7 +132,7 @@ pactWith(
provider.addInteraction(interaction);
});
- it('return a successful body', () => {
+ it('return a successful body', async () => {
});
});
@@ -142,38 +142,36 @@ pactWith(
### Response body `Matchers`
-Notice how we use `Matchers` in the `body` of the expected response. This allows us to be flexible enough to accept different values but still be strict enough to distinguish between valid and invalid values. We must ensure that we have a tight definition that is neither too strict nor too lax. Read more about the [different types of `Matchers`](https://github.com/pact-foundation/pact-js#using-the-v3-matching-rules).
+Notice how we use `Matchers` in the `body` of the expected response. This allows us to be flexible enough to accept different values but still be strict enough to distinguish between valid and invalid values. We must ensure that we have a tight definition that is neither too strict nor too lax. Read more about the [different types of `Matchers`](https://github.com/pact-foundation/pact-js/blob/master/docs/matching.md). We are currently using the V2 matching rules.
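For instance, a response body matcher that is strict about types and shape but flexible about exact values could look like this minimal sketch, built with the same V2 `Matchers` used in this tutorial:

```javascript
import { Matchers } from '@pact-foundation/pact';

// Accept any array of objects with this shape: any string `id`,
// any integer `project_id`, and any boolean `resolved` flag.
const body = Matchers.eachLike({
  id: Matchers.string('fd73763cbcbf7b29eb8765d969a38f7d735e222a'),
  project_id: Matchers.integer(6954442),
  resolved: Matchers.boolean(true),
});
```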
## Write the test
After the mock provider is set up, you can write the test. For this test, you make a request and expect a particular response.
-First, set up the client that makes the API request. To do that, create `spec/contracts/consumer/endpoints/project/merge_requests.js` and add the following API request.
+First, set up the client that makes the API request. To do that, create `spec/contracts/consumer/resources/api/project/merge_requests.js` and add the following API request. If the endpoint is a GraphQL endpoint, create it under `spec/contracts/consumer/resources/graphql` instead.
```javascript
-const axios = require('axios');
-
-exports.getDiscussions = (endpoint) => {
- const url = endpoint.url;
-
- return axios
- .request({
- method: 'GET',
- baseURL: url,
- url: '/gitlab-org/gitlab-qa/-/merge_requests/1/discussions.json',
- headers: { Accept: '*/*' },
- })
- .then((response) => response.data);
-};
+import axios from 'axios';
+
+export async function getDiscussions(endpoint) {
+  const { url } = endpoint;
+
+  // Return only the parsed response body, which is what the test asserts on.
+  const { data } = await axios({
+    method: 'GET',
+    baseURL: url,
+    url: '/gitlab-org/gitlab-qa/-/merge_requests/1/discussions.json',
+    headers: { Accept: '*/*' },
+  });
+
+  return data;
+}
```
After that's set up, import it into the test file and call it to make the request. Then, define your expectations on the response.
```javascript
-const { pactWith } = require('jest-pact');
-const { Matchers } = require('@pact-foundation/pact');
+import { pactWith } from 'jest-pact';
+import { Matchers } from '@pact-foundation/pact';
-const { getDiscussions } = require('../endpoints/project/merge_requests');
+import { getDiscussions } from '../../../resources/api/project/merge_requests';
pactWith(
{
@@ -211,17 +209,17 @@ pactWith(
};
});
- it('return a successful body', () => {
- return getDiscussions({
+ it('return a successful body', async () => {
+ const discussions = await getDiscussions({
url: provider.mockService.baseUrl,
- }).then((discussions) => {
- expect(discussions).toEqual(Matchers.eachLike({
- id: 'fd73763cbcbf7b29eb8765d969a38f7d735e222a',
- project_id: 6954442,
- ...
- resolved: true
- }));
});
+
+ expect(discussions).toEqual(Matchers.eachLike({
+ id: 'fd73763cbcbf7b29eb8765d969a38f7d735e222a',
+ project_id: 6954442,
+ ...
+ resolved: true
+ }));
});
});
},
@@ -237,7 +235,7 @@ As you may have noticed, the request and response definitions can get large. Thi
Create a file under `spec/contracts/consumer/fixtures/project/merge_request` called `discussions.fixture.js` where you will place the `request` and `response` definitions.
```javascript
-const { Matchers } = require('@pact-foundation/pact');
+import { Matchers } from '@pact-foundation/pact';
const body = Matchers.eachLike({
id: Matchers.string('fd73763cbcbf7b29eb8765d969a38f7d735e222a'),
@@ -254,11 +252,15 @@ const Discussions = {
headers: {
'Content-Type': 'application/json; charset=utf-8',
},
- body: body,
+ body,
},
- request: {
+ scenario: {
+ state: 'a merge request with discussions exists',
uponReceiving: 'a request for discussions',
+ },
+
+ request: {
withRequest: {
method: 'GET',
path: '/gitlab-org/gitlab-qa/-/merge_requests/1/discussions.json',
@@ -275,36 +277,41 @@ exports.Discussions = Discussions;
With all of that moved to the `fixture`, you can simplify the test to the following:
```javascript
-const { pactWith } = require('jest-pact');
+import { pactWith } from 'jest-pact';
+
+import { Discussions } from '../../../fixtures/project/merge_request/discussions.fixture';
+import { getDiscussions } from '../../../resources/api/project/merge_requests';
-const { Discussions } = require('../fixtures/discussions.fixture');
-const { getDiscussions } = require('../endpoints/project/merge_requests');
+const CONSUMER_NAME = 'MergeRequest#show';
+const PROVIDER_NAME = 'Merge Request Discussions Endpoint';
+const CONSUMER_LOG = '../logs/consumer.log';
+const CONTRACT_DIR = '../contracts/project/merge_request/show';
pactWith(
{
- consumer: 'MergeRequest#show',
- provider: 'Merge Request Discussions Endpoint',
- log: '../logs/consumer.log',
- dir: '../contracts/project/merge_request/show',
+ consumer: CONSUMER_NAME,
+ provider: PROVIDER_NAME,
+ log: CONSUMER_LOG,
+ dir: CONTRACT_DIR,
},
(provider) => {
- describe('Merge Request Discussions Endpoint', () => {
+ describe(PROVIDER_NAME, () => {
beforeEach(() => {
const interaction = {
- state: 'a merge request with discussions exists',
+ ...Discussions.scenario,
...Discussions.request,
willRespondWith: Discussions.success,
};
- return provider.addInteraction(interaction);
+ provider.addInteraction(interaction);
});
- it('return a successful body', () => {
- return getDiscussions({
+ it('return a successful body', async () => {
+ const discussions = await getDiscussions({
url: provider.mockService.baseUrl,
- }).then((discussions) => {
- expect(discussions).toEqual(Discussions.body);
});
+
+ expect(discussions).toEqual(Discussions.body);
});
});
},
diff --git a/doc/development/testing_guide/contract/index.md b/doc/development/testing_guide/contract/index.md
index 8e12eea2874..30a4adaca44 100644
--- a/doc/development/testing_guide/contract/index.md
+++ b/doc/development/testing_guide/contract/index.md
@@ -28,14 +28,14 @@ Before running the consumer tests, go to `spec/contracts/consumer` and run `npm
### Run the provider tests
-Before running the provider tests, make sure your GDK (GitLab Development Kit) is fully set up and running. You can follow the setup instructions detailed in the [GDK repository](https://gitlab.com/gitlab-org/gitlab-development-kit/-/tree/main). To run the provider tests, you use Rake tasks that are defined in [`./lib/tasks/contracts.rake`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/tasks/contracts.rake). To get a list of all the Rake tasks related to the provider tests, run `bundle exec rake -T contracts`. For example:
+Before running the provider tests, make sure your GDK (GitLab Development Kit) is fully set up and running. You can follow the setup instructions detailed in the [GDK repository](https://gitlab.com/gitlab-org/gitlab-development-kit/-/tree/main). To run the provider tests, you use Rake tasks that can be found in [`./lib/tasks/contracts`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/tasks/contracts). To get a list of all the Rake tasks related to the provider tests, run `bundle exec rake -T contracts`. For example:
```shell
$ bundle exec rake -T contracts
-rake contracts:mr:pact:verify:diffs # Verify provider against the consumer pacts for diffs
-rake contracts:mr:pact:verify:discussions # Verify provider against the consumer pacts for discussions
-rake contracts:mr:pact:verify:metadata # Verify provider against the consumer pacts for metadata
-rake contracts:mr:test:merge_request[contract_mr] # Run all merge request contract tests
+rake contracts:merge_requests:pact:verify:diffs_batch # Verify provider against the consumer pacts for diffs_batch
+rake contracts:merge_requests:pact:verify:diffs_metadata # Verify provider against the consumer pacts for diffs_metadata
+rake contracts:merge_requests:pact:verify:discussions # Verify provider against the consumer pacts for discussions
+rake contracts:merge_requests:test:merge_requests[contract_merge_requests] # Run all merge request contract tests
```
## Test suite folder structure and naming conventions
@@ -50,11 +50,11 @@ Having an organized and sensible folder structure for the test suite makes it ea
The consumer tests are grouped according to the different pages in the application. Each file contains various types of requests found in a page. As such, the consumer test files are named using the Rails standards of how pages are referenced. For example, the project pipelines page would be the `Project::Pipeline#index` page so the equivalent consumer test would be located in `consumer/specs/project/pipelines/index.spec.js`.
-When defining the location to output the contract generated by the test, we want to follow the same file structure which would be `contracts/project/pipelines/` for this example. This is the structure in `consumer/endpoints` and `consumer/fixtures` as well.
+When defining the location to output the contract generated by the test, we want to follow the same file structure which would be `contracts/project/pipelines/` for this example. This is the structure in `consumer/resources` and `consumer/fixtures` as well.
#### Provider tests
-The provider tests are grouped similarly to our controllers. Each of these tests contains various tests for an API endpoint. For example, the API endpoint to get a list of pipelines for a project would be located in `provider/pact_helpers/project/pipelines/get_list_project_pipelines_helper.rb`. The provider states are structured the same way.
+The provider tests are grouped similarly to our controllers. Each of these tests contains various tests for an API endpoint. For example, the API endpoint to get a list of pipelines for a project would be located in `provider/pact_helpers/project/pipelines/get_list_project_pipelines_helper.rb`. The provider states are grouped according to the different pages in the application, similar to the consumer tests.
### Naming conventions
diff --git a/doc/development/testing_guide/end_to_end/best_practices.md b/doc/development/testing_guide/end_to_end/best_practices.md
index 00b843ffdbe..bfda94b1f1d 100644
--- a/doc/development/testing_guide/end_to_end/best_practices.md
+++ b/doc/development/testing_guide/end_to_end/best_practices.md
@@ -415,8 +415,8 @@ except(page).to have_no_text('hidden')
Unfortunately, that's not automatically the case for the predicate methods that we add to our
[page objects](page_objects.md). We need to [create our own negatable matchers](https://relishapp.com/rspec/rspec-expectations/v/3-9/docs/custom-matchers/define-a-custom-matcher#matcher-with-separate-logic-for-expect().to-and-expect().not-to).
-The initial example uses the `have_job` matcher which is derived from the [`has_job?` predicate
-method of the `Page::Project::Pipeline::Show` page object](https://gitlab.com/gitlab-org/gitlab/-/blob/87864b3047c23b4308f59c27a3757045944af447/qa/qa/page/project/pipeline/show.rb#L53).
+The initial example uses the `have_job` matcher which is derived from the
+[`has_job?` predicate method of the `Page::Project::Pipeline::Show` page object](https://gitlab.com/gitlab-org/gitlab/-/blob/87864b3047c23b4308f59c27a3757045944af447/qa/qa/page/project/pipeline/show.rb#L53).
To create a negatable matcher, we use `has_no_job?` for the negative case:
```ruby
diff --git a/doc/development/testing_guide/end_to_end/feature_flags.md b/doc/development/testing_guide/end_to_end/feature_flags.md
index cb4c8e8a6e8..33f73304a26 100644
--- a/doc/development/testing_guide/end_to_end/feature_flags.md
+++ b/doc/development/testing_guide/end_to_end/feature_flags.md
@@ -217,8 +217,8 @@ If enabling the feature flag results in E2E test failures, you can browse the ar
If an end-to-end test enables a feature flag, the end-to-end test suite can be used to test changes in a merge request
by running the `package-and-qa` job in the merge request pipeline. If the feature flag and relevant changes have already been merged, you can confirm that the tests
-pass on the default branch. The end-to-end tests run on the default branch every two hours, and the results are posted to a [Test
-Session Report, which is available in the testcase-sessions project](https://gitlab.com/gitlab-org/quality/testcase-sessions/-/issues?label_name%5B%5D=found%3Amain).
+pass on the default branch. The end-to-end tests run on the default branch every two hours, and the results are posted to a
+[Test Session Report, which is available in the testcase-sessions project](https://gitlab.com/gitlab-org/quality/testcase-sessions/-/issues?label_name%5B%5D=found%3Amain).
If the relevant tests do not enable the feature flag themselves, you can check if the tests will need to be updated by opening
a draft merge request that enables the flag by default via a [feature flag definition file](../../feature_flags/index.md#feature-flag-definition-and-validation).
diff --git a/doc/development/testing_guide/end_to_end/index.md b/doc/development/testing_guide/end_to_end/index.md
index 06359d612ad..989d090d581 100644
--- a/doc/development/testing_guide/end_to_end/index.md
+++ b/doc/development/testing_guide/end_to_end/index.md
@@ -140,8 +140,8 @@ a flaky test we first want to make sure that it's no longer flaky.
We can do that using the `ce:custom-parallel` and `ee:custom-parallel` jobs.
Both are manual jobs that you can configure using custom variables.
When clicking the name (not the play icon) of one of the parallel jobs,
-you are prompted to enter variables. You can use any of [the variables
-that can be used with `gitlab-qa`](https://gitlab.com/gitlab-org/gitlab-qa/blob/master/docs/what_tests_can_be_run.md#supported-gitlab-environment-variables)
+you are prompted to enter variables. You can use any of
+[the variables that can be used with `gitlab-qa`](https://gitlab.com/gitlab-org/gitlab-qa/blob/master/docs/what_tests_can_be_run.md#supported-gitlab-environment-variables)
as well as these:
| Variable | Description |
@@ -150,8 +150,9 @@ as well as these:
| `QA_TESTS` | The tests to run (no default, which means run all the tests in the scenario). Use file paths as you would when running tests via RSpec, for example, `qa/specs/features/ee/browser_ui` would include all the `EE` UI tests. |
| `QA_RSPEC_TAGS` | The RSpec tags to add (no default) |
-For now, [manual jobs with custom variables don't use the same variable
-when retried](https://gitlab.com/gitlab-org/gitlab/-/issues/31367), so if you want to run the same tests multiple times,
+For now,
+[manual jobs with custom variables don't use the same variable when retried](https://gitlab.com/gitlab-org/gitlab/-/issues/31367),
+so if you want to run the same tests multiple times,
specify the same variables in each `custom-parallel` job (up to as
many of the 10 available jobs as you want to run).
@@ -164,8 +165,8 @@ automatically started: it runs the QA smoke suite against the
You can also manually start the `review-qa-all`: it runs the full QA suite
against the [Review App](../review_apps.md).
-**This runs end-to-end tests against a Review App based on [the official GitLab
-Helm chart](https://gitlab.com/gitlab-org/charts/gitlab/), itself deployed with custom
+**This runs end-to-end tests against a Review App based on
+[the official GitLab Helm chart](https://gitlab.com/gitlab-org/charts/gitlab/), itself deployed with custom
[Cloud Native components](https://gitlab.com/gitlab-org/build/CNG) built from your merge request's changes.**
See [Review Apps](../review_apps.md) for more details about Review Apps.
@@ -235,7 +236,7 @@ Each type of scheduled pipeline generates a static link for the latest test repo
- [`staging-sanity`](https://storage.googleapis.com/gitlab-qa-allure-reports/staging-sanity/master/index.html)
- [`staging-sanity-no-admin`](https://storage.googleapis.com/gitlab-qa-allure-reports/staging-sanity-no-admin/master/index.html)
- [`canary-sanity`](https://storage.googleapis.com/gitlab-qa-allure-reports/canary-sanity/master/index.html)
-- [`production`](https://storage.googleapis.com/gitlab-qa-allure-reports/production/master/index.html)
+- [`production`](https://storage.googleapis.com/gitlab-qa-allure-reports/production-full/master/index.html)
- [`production-sanity`](https://storage.googleapis.com/gitlab-qa-allure-reports/production-sanity/master/index.html)
## How do I run the tests?
@@ -243,8 +244,8 @@ Each type of scheduled pipeline generates a static link for the latest test repo
If you are not [testing code in a merge request](#testing-code-in-merge-requests),
there are two main options for running the tests. If you want to run
the existing tests against a live GitLab instance or against a pre-built Docker image,
-use the [GitLab QA orchestrator](https://gitlab.com/gitlab-org/gitlab-qa/tree/master/README.md). See also [examples
-of the test scenarios you can run via the orchestrator](https://gitlab.com/gitlab-org/gitlab-qa/blob/master/docs/what_tests_can_be_run.md#examples).
+use the [GitLab QA orchestrator](https://gitlab.com/gitlab-org/gitlab-qa/tree/master/README.md). See also
+[examples of the test scenarios you can run via the orchestrator](https://gitlab.com/gitlab-org/gitlab-qa/blob/master/docs/what_tests_can_be_run.md#examples).
On the other hand, if you would like to run against a local development GitLab
environment, you can use the [GitLab Development Kit (GDK)](https://gitlab.com/gitlab-org/gitlab-development-kit/).
@@ -262,8 +263,8 @@ architecture. See the [documentation about it](https://gitlab.com/gitlab-org/git
Once you decided where to put [test environment orchestration scenarios](https://gitlab.com/gitlab-org/gitlab-qa/tree/master/lib/gitlab/qa/scenario) and
[instance-level scenarios](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/qa/qa/specs/features), take a look at the [GitLab QA README](https://gitlab.com/gitlab-org/gitlab/-/tree/master/qa/README.md),
-the [GitLab QA orchestrator README](https://gitlab.com/gitlab-org/gitlab-qa/tree/master/README.md), and [the already existing
-instance-level scenarios](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/qa/qa/specs/features).
+the [GitLab QA orchestrator README](https://gitlab.com/gitlab-org/gitlab-qa/tree/master/README.md),
+and [the already existing instance-level scenarios](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/qa/qa/specs/features).
### Consider **not** writing an end-to-end test
diff --git a/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md b/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md
index 591d03db7b8..322f2412e5b 100644
--- a/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md
+++ b/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md
@@ -30,6 +30,7 @@ This is a partial list of the [RSpec metadata](https://relishapp.com/rspec/rspec
| `:ldap_no_tls` | The test requires a GitLab instance to be configured to use an external LDAP server with TLS not enabled. |
| `:ldap_tls` | The test requires a GitLab instance to be configured to use an external LDAP server with TLS enabled. |
| `:mattermost` | The test requires a GitLab Mattermost service on the GitLab instance. |
+| `:metrics` | The test requires a GitLab instance where [dedicated metrics exporters](../../../administration/monitoring/prometheus/web_exporter.md) are running alongside Puma and Sidekiq. |
| `:mixed_env` | The test should only be executed in environments that have a paired canary version available through traffic routing based on the existence of the `gitlab_canary=true` cookie. Tests in this category are switching the cookie mid-test to validate mixed deployment environments. |
| `:object_storage` | The test requires a GitLab instance to be configured to use multiple [object storage types](../../../administration/object_storage.md). Uses MinIO as the object storage server. |
| `:only` | The test is only to be run in specific execution contexts. See [test execution context selection](execution_context_selection.md) for more information. |
diff --git a/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md b/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md
index 438294161ac..322f108783f 100644
--- a/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md
+++ b/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md
@@ -8,11 +8,20 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Jenkins spec
-The [`jenkins_build_status_spec`](https://gitlab.com/gitlab-org/gitlab/-/blob/163c8a8c814db26d11e104d1cb2dcf02eb567dbe/qa/qa/specs/features/ee/browser_ui/3_create/jenkins/jenkins_build_status_spec.rb) spins up a Jenkins instance in a Docker container based on an image stored in the [GitLab-QA container registry](https://gitlab.com/gitlab-org/gitlab-qa/container_registry).
-The Docker image it uses is preconfigured with some base data and plugins.
-The test then configures the GitLab plugin in Jenkins with a URL of the GitLab instance that are used
-to run the tests. Unfortunately, the GitLab Jenkins plugin does not accept ports so `http://localhost:3000` would
-not be accepted. Therefore, this requires us to run GitLab on port 80 or inside a Docker container.
+The [`jenkins_build_status_spec`](https://gitlab.com/gitlab-org/gitlab/-/blob/24a86debf49f3aed6f2ecfd6e8f9233b3a214181/qa/qa/specs/features/browser_ui/3_create/jenkins/jenkins_build_status_spec.rb)
+spins up a Jenkins instance in a Docker container with the Jenkins GitLab plugin pre-installed. Due to a license restriction, we are unable to distribute this image.
+To build a QA-compatible image, see the [third party images project](https://gitlab.com/gitlab-org/quality/third-party-docker-public), where the third-party Dockerfiles can be found.
+The project also has instructions for forking and building the images automatically in CI.
+
+Some extra environment variables for the location of the forked repository are also needed:
+
+- `QA_THIRD_PARTY_DOCKER_REGISTRY` (the container registry where the repository/images are hosted, for example, `registry.gitlab.com`)
+- `QA_THIRD_PARTY_DOCKER_REPOSITORY` (the base repository path where the images are hosted, for example, `registry.gitlab.com/<project path>`)
+- `QA_THIRD_PARTY_DOCKER_USER` (a username that has access to the container registry for this repository)
+- `QA_THIRD_PARTY_DOCKER_PASSWORD` (a password or token for the username to authenticate with)
+
+The test configures the GitLab plugin in Jenkins with the URL of the GitLab instance that is used
+to run the tests. Bi-directional networking is needed between the GitLab instance and Jenkins, so GitLab can also be started in a Docker container.
To start a Docker container for GitLab based on the nightly image:
@@ -21,34 +30,25 @@ docker run \
--publish 80:80 \
--name gitlab \
--hostname localhost \
+  --network test \
gitlab/gitlab-ee:nightly
```
To run the tests from the `/qa` directory:
```shell
-WEBDRIVER_HEADLESS=false bin/qa Test::Instance::All http://localhost -- qa/specs/features/ee/browser_ui/3_create/jenkins/jenkins_build_status_spec.rb
+export QA_THIRD_PARTY_DOCKER_REGISTRY=<registry>
+export QA_THIRD_PARTY_DOCKER_REPOSITORY=<repository>
+export QA_THIRD_PARTY_DOCKER_USER=<user with registry access>
+export QA_THIRD_PARTY_DOCKER_PASSWORD=<password for user>
+export WEBDRIVER_HEADLESS=0
+bin/qa Test::Instance::All http://localhost -- qa/specs/features/browser_ui/3_create/jenkins/jenkins_build_status_spec.rb
```
The test automatically spins up a Docker container for Jenkins and tears it down once the test completes.
-However, if you need to run Jenkins manually outside of the tests, use this command:
-
-```shell
-docker run \
- --hostname localhost \
- --name jenkins-server \
- --env JENKINS_HOME=jenkins_home \
- --publish 8080:8080 \
- registry.gitlab.com/gitlab-org/gitlab-qa/jenkins-gitlab:version1
-```
-
-Jenkins is available on `http://localhost:8080`.
-
-Administrator username is `admin` and password is `password`.
-
-It is worth noting that this is not an orchestrated test. It is [tagged with the `:orchestrated` meta](https://gitlab.com/gitlab-org/gitlab/-/blob/163c8a8c814db26d11e104d1cb2dcf02eb567dbe/qa/qa/specs/features/ee/browser_ui/3_create/jenkins/jenkins_build_status_spec.rb#L5)
-only to prevent it from running in the pipelines for live environments such as Staging.
+If you need to run Jenkins manually outside of the tests, refer to the README for the
+[third party images project](https://gitlab.com/gitlab-org/quality/third-party-docker-public/-/blob/main/jenkins/README.md).
### Troubleshooting
@@ -385,6 +385,43 @@ To run the LDAP tests on your local with TLS disabled, follow these steps:
GITLAB_LDAP_USERNAME="tanuki" GITLAB_LDAP_PASSWORD="password" QA_LOG_LEVEL=debug WEBDRIVER_HEADLESS=false bin/qa Test::Instance::All http://localhost qa/specs/features/browser_ui/1_manage/login/log_into_gitlab_via_ldap_spec.rb
```
+## SMTP tests
+
+Tests that are tagged with the `:smtp` meta tag are orchestrated tests that ensure email notifications are received by a user.
+
+These tests require a GitLab instance with SMTP enabled and integrated with an SMTP server, [MailHog](https://github.com/mailhog/MailHog).
+
+To run these tests locally against the GDK:
+
+1. Add these settings to your `gitlab.yml` file:
+
+ ```yaml
+ smtp:
+ enabled: true
+ address: "mailhog.test"
+ port: 1025
+ ```
+
+1. Start MailHog in a Docker container:
+
+ ```shell
+ docker network create test && docker run \
+ --network test \
+ --hostname mailhog.test \
+ --name mailhog \
+ --publish 1025:1025 \
+ --publish 8025:8025 \
+ mailhog/mailhog:v1.0.0
+ ```
+
+1. Run the test from the [`gitlab/qa`](https://gitlab.com/gitlab-org/gitlab/-/tree/d5447ebb5f99d4c72780681ddf4dc25b0738acba/qa) directory:
+
+ ```shell
+ QA_LOG_LEVEL=debug WEBDRIVER_HEADLESS=false bin/qa Test::Instance::All http://localhost:3000 qa/specs/features/browser_ui/2_plan/email/trigger_email_notification_spec.rb -- --tag orchestrated
+ ```
+
+For instructions on how to run these tests using the `gitlab-qa` gem, refer to [the GitLab QA documentation](https://gitlab.com/gitlab-org/gitlab-qa/-/blob/master/docs/what_tests_can_be_run.md#testintegrationsmtp-ceeefull-image-address).
+
## Guide to the mobile suite
### What are mobile tests
diff --git a/doc/development/testing_guide/frontend_testing.md b/doc/development/testing_guide/frontend_testing.md
index d91c53823e2..2845dde9a24 100644
--- a/doc/development/testing_guide/frontend_testing.md
+++ b/doc/development/testing_guide/frontend_testing.md
@@ -51,15 +51,12 @@ The default timeout for Jest is set in
If your test exceeds that time, it fails.
If you cannot improve the performance of the tests, you can increase the timeout
-for a specific test using
-[`setTestTimeout`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/frontend/__helpers__/timeout.js).
+for a specific test using [`jest.setTimeout`](https://jestjs.io/docs/27.x/jest-object#jestsettimeouttimeout).
```javascript
-import { setTestTimeout } from 'helpers/timeout';
-
describe('Component', () => {
it('does something amazing', () => {
- setTestTimeout(500);
+ jest.setTimeout(500);
// ...
});
});
@@ -466,7 +463,7 @@ it('waits for an Ajax call', () => {
#### Vue rendering
-Use [`nextTick()`](https://vuejs.org/v2/api/#Vue-nextTick) to wait until a Vue component is
+Use [`nextTick()`](https://v2.vuejs.org/v2/api/#Vue-nextTick) to wait until a Vue component is
re-rendered.
**in Jest:**
diff --git a/doc/development/testing_guide/index.md b/doc/development/testing_guide/index.md
index fa9f1f1ac3e..cd7c70e2eaa 100644
--- a/doc/development/testing_guide/index.md
+++ b/doc/development/testing_guide/index.md
@@ -9,8 +9,8 @@ info: To determine the technical writer assigned to the Stage/Group associated w
This document describes various guidelines and best practices for automated
testing of the GitLab project.
-It is meant to be an _extension_ of the [Thoughtbot testing
-style guide](https://github.com/thoughtbot/guides/tree/master/testing-rspec). If
+It is meant to be an _extension_ of the
+[Thoughtbot testing style guide](https://github.com/thoughtbot/guides/tree/master/testing-rspec). If
this guide defines a rule that contradicts the Thoughtbot guide, this guide
takes precedence. Some guidelines may be repeated verbatim to stress their
importance.
diff --git a/doc/development/testing_guide/review_apps.md b/doc/development/testing_guide/review_apps.md
index 532bb9fcdef..b272d23522e 100644
--- a/doc/development/testing_guide/review_apps.md
+++ b/doc/development/testing_guide/review_apps.md
@@ -25,7 +25,7 @@ For any of the following scenarios, the `start-review-app-pipeline` job would be
On every [pipeline](https://gitlab.com/gitlab-org/gitlab/pipelines/125315730) in the `qa` stage (which comes after the
`review` stage), the `review-qa-smoke` and `review-qa-reliable` jobs are automatically started. The `review-qa-smoke` runs
-the QA smoke suite and the `review-qa-reliable` executes E2E tests identified as [reliable](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/reliable-tests).
+the QA smoke suite and the `review-qa-reliable` executes E2E tests identified as [reliable](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/reliable-tests/).
`review-qa-*` jobs ensure that end-to-end tests for the changes in the merge request pass in a live environment. This shifts the identification of e2e failures from an environment on the path to production to the merge request, preventing broken features from reaching GitLab.com and avoiding costly deployment blockers. `review-qa-*` failures should be investigated, with counterpart SET involvement if needed, to help determine the root cause of the error.
diff --git a/doc/development/testing_guide/testing_levels.md b/doc/development/testing_guide/testing_levels.md
index 02f32a031dc..c1bf3609b53 100644
--- a/doc/development/testing_guide/testing_levels.md
+++ b/doc/development/testing_guide/testing_levels.md
@@ -115,7 +115,7 @@ graph RL
Testing the value of a constant means copying it, resulting in extra effort without additional confidence that the value is correct.
- **Vue components**:
Computed properties, methods, and lifecycle hooks can be considered an implementation detail of components, are implicitly covered by component tests, and don't need to be tested.
- For more information, see the [official Vue guidelines](https://vue-test-utils.vuejs.org/guides/#getting-started).
+ For more information, see the [official Vue guidelines](https://v1.test-utils.vuejs.org/guides/#getting-started).
#### What to mock in unit tests
@@ -208,7 +208,7 @@ graph RL
Similar to unit tests, background operations cannot be stopped or waited on. This means they continue running in the following tests and cause side effects.
- **Child components**:
Every component is tested individually, so child components are mocked.
- See also [`shallowMount()`](https://vue-test-utils.vuejs.org/api/#shallowmount)
+ See also [`shallowMount()`](https://v1.test-utils.vuejs.org/api/#shallowmount)
#### What *not* to mock in component tests
diff --git a/doc/development/testing_guide/testing_migrations_guide.md b/doc/development/testing_guide/testing_migrations_guide.md
index d71788e21f3..261a4f4a27e 100644
--- a/doc/development/testing_guide/testing_migrations_guide.md
+++ b/doc/development/testing_guide/testing_migrations_guide.md
@@ -317,8 +317,8 @@ To test these you usually have to:
- Verify that the expected jobs were scheduled, with the correct set
of records, the correct batch size, interval, etc.
-The behavior of the background migration itself needs to be verified in a [separate
-test for the background migration class](#example-background-migration-test).
+The behavior of the background migration itself needs to be verified in a
+[separate test for the background migration class](#example-background-migration-test).
This spec tests the
[`db/post_migrate/20210701111909_backfill_issues_upvotes_count.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/v14.1.0-ee/db/post_migrate/20210701111909_backfill_issues_upvotes_count.rb)
diff --git a/doc/development/understanding_explain_plans.md b/doc/development/understanding_explain_plans.md
index 17fcd5b3e88..72c3df11a96 100644
--- a/doc/development/understanding_explain_plans.md
+++ b/doc/development/understanding_explain_plans.md
@@ -1,829 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/understanding_explain_plans.md'
+remove_date: '2022-11-04'
---
-# Understanding EXPLAIN plans
+This document was moved to [another location](database/understanding_explain_plans.md).
-PostgreSQL allows you to obtain query plans using the `EXPLAIN` command. This
-command can be invaluable when trying to determine how a query performs.
-You can use this command directly in your SQL query, as long as the query starts
-with it:
-
-```sql
-EXPLAIN
-SELECT COUNT(*)
-FROM projects
-WHERE visibility_level IN (0, 20);
-```
-
-When running this on GitLab.com, we are presented with the following output:
-
-```sql
-Aggregate (cost=922411.76..922411.77 rows=1 width=8)
- -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
-```
-
-When using _just_ `EXPLAIN`, PostgreSQL does not actually execute our query,
-instead it produces an _estimated_ execution plan based on the available
-statistics. This means the actual plan can differ quite a bit. Fortunately,
-PostgreSQL provides us with the option to execute the query as well. To do so,
-we need to use `EXPLAIN ANALYZE` instead of just `EXPLAIN`:
-
-```sql
-EXPLAIN ANALYZE
-SELECT COUNT(*)
-FROM projects
-WHERE visibility_level IN (0, 20);
-```
-
-This produces:
-
-```sql
-Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
- -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
- Rows Removed by Filter: 65677
-Planning time: 2.861 ms
-Execution time: 3428.596 ms
-```
-
-As we can see this plan is quite different, and includes a lot more data. Let's
-discuss this step by step.
-
-Because `EXPLAIN ANALYZE` executes the query, care should be taken when using a
-query that writes data or might time out. If the query modifies data,
-consider wrapping it in a transaction that rolls back automatically like so:
-
-```sql
-BEGIN;
-EXPLAIN ANALYZE
-DELETE FROM users WHERE id = 1;
-ROLLBACK;
-```
-
-The `EXPLAIN` command also takes additional options, such as `BUFFERS`:
-
-```sql
-EXPLAIN (ANALYZE, BUFFERS)
-SELECT COUNT(*)
-FROM projects
-WHERE visibility_level IN (0, 20);
-```
-
-This then produces:
-
-```sql
-Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
- Buffers: shared hit=208846
- -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
- Rows Removed by Filter: 65677
- Buffers: shared hit=208846
-Planning time: 2.861 ms
-Execution time: 3428.596 ms
-```
-
-For more information, refer to the official
-[`EXPLAIN` documentation](https://www.postgresql.org/docs/current/sql-explain.html)
-and [using `EXPLAIN` guide](https://www.postgresql.org/docs/current/using-explain.html).
-
-## Nodes
-
-Every query plan consists of nodes. Nodes can be nested, and are executed from
-the inside out. This means that the innermost node is executed before an outer
-node. This can be best thought of as nested function calls, returning their
-results as they unwind. For example, a plan starting with an `Aggregate`
-followed by a `Nested Loop`, followed by an `Index Only scan` can be thought of
-as the following Ruby code:
-
-```ruby
-aggregate(
- nested_loop(
- index_only_scan()
- index_only_scan()
- )
-)
-```
-
-Nodes are indicated using a `->` followed by the type of node taken. For
-example:
-
-```sql
-Aggregate (cost=922411.76..922411.77 rows=1 width=8)
- -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
-```
-
-Here the first node executed is `Seq scan on projects`. The `Filter:` is an
-additional filter applied to the results of the node. A filter is very similar
-to Ruby's `Array#select`: it takes the input rows, applies the filter, and
-produces a new list of rows. After the node is done, we perform the `Aggregate`
-above it.
-
-Nested nodes look like this:
-
-```sql
-Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1)
- Buffers: shared hit=155
- -> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
- Buffers: shared hit=155
- -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
- Index Cond: (id < 100)
- Heap Fetches: 0
- -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
- Index Cond: (id = users_1.id)
- Heap Fetches: 0
-Planning time: 2.585 ms
-Execution time: 0.310 ms
-```
-
-Here we first perform two separate "Index Only" scans, followed by performing a
-"Nested Loop" on the result of these two scans.
-
-## Node statistics
-
-Each node in a plan has a set of associated statistics, such as the cost, the
-number of rows produced, the number of loops performed, and more. For example:
-
-```sql
-Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
-```
-
-Here we can see that our cost ranges from `0.00..908044.47` (we cover this in
-a moment), and we estimate (since we're using `EXPLAIN` and not `EXPLAIN
-ANALYZE`) a total of 5,746,914 rows to be produced by this node. The `width`
-statistics describes the estimated width of each row, in bytes.
-
-The `costs` field specifies how expensive a node was. The cost is measured in
-arbitrary units determined by the query planner's cost parameters. What
-influences the costs depends on a variety of settings, such as `seq_page_cost`,
-`cpu_tuple_cost`, and various others.
-The format of the costs field is as follows:
-
-```sql
-STARTUP COST..TOTAL COST
-```
-
-The startup cost states how expensive it was to start the node, with the total
-cost describing how expensive the entire node was. In general: the greater the
-values, the more expensive the node.
-
-When using `EXPLAIN ANALYZE`, these statistics also include the actual time
-(in milliseconds) spent, and other runtime statistics (for example, the actual number of
-produced rows):
-
-```sql
-Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
-```
-
-Here we can see we estimated 5,746,969 rows to be returned, but in reality we
-returned 5,746,940 rows. We can also see that _just_ this sequential scan took
-2.98 seconds to run.
-
-Using `EXPLAIN (ANALYZE, BUFFERS)` also gives us information about the
-number of rows removed by a filter, the number of buffers used, and more. For
-example:
-
-```sql
-Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
- Rows Removed by Filter: 65677
- Buffers: shared hit=208846
-```
-
-Here we can see that our filter has to remove 65,677 rows, and that we use
-208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our
-above node uses *1.6 GB of buffers*. That's a lot!
-
-Keep in mind that some statistics are per-loop averages, while others are total values:
-
-| Field name | Value type |
-| --- | --- |
-| Actual Total Time | per-loop average |
-| Actual Rows | per-loop average |
-| Buffers Shared Hit | total value |
-| Buffers Shared Read | total value |
-| Buffers Shared Dirtied | total value |
-| Buffers Shared Written | total value |
-| I/O Read Time | total value |
-| I/O Read Write | total value |
-
-For example:
-
-```sql
- -> Index Scan using users_pkey on public.users (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888)
- Index Cond: (users.id = issues.author_id)
- Buffers: shared hit=3543 read=9
- I/O Timings: read=17.760 write=0.000
-```
-
-Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`).
-17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache.
-
-## Node types
-
-There are quite a few different types of nodes, so we only cover some of the
-more common ones here.
-
-A full list of all the available nodes and their descriptions can be found in
-the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h).
-pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer detailed look into nodes and their fields.
-
-### Seq Scan
-
-A sequential scan over (a chunk of) a database table. This is like using
-`Array#each`, but on a database table. Sequential scans can be quite slow when
-retrieving lots of rows, so it's best to avoid these for large tables.
-
-### Index Only Scan
-
-A scan on an index that did not require fetching anything from the table. In
-certain cases an index only scan may still fetch data from the table, in this
-case the node includes a `Heap Fetches:` statistic.
-
-### Index Scan
-
-A scan on an index that required retrieving some data from the table.
-
-### Bitmap Index Scan and Bitmap Heap scan
-
-Bitmap scans fall between sequential scans and index scans. These are typically
-used when we would read too much data from an index scan, but too little to
-perform a sequential scan. A bitmap scan uses what is known as a [bitmap
-index](https://en.wikipedia.org/wiki/Bitmap_index) to perform its work.
-
-The [source code of PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)
-states the following on bitmap scans:
-
-> Bitmap Index Scan delivers a bitmap of potential tuple locations; it does not
-> access the heap itself. The bitmap is used by an ancestor Bitmap Heap Scan
-> node, possibly after passing through intermediate Bitmap And and/or Bitmap Or
-> nodes to combine it with the results of other Bitmap Index Scans.
-
-### Limit
-
-Applies a `LIMIT` on the input rows.
-
-### Sort
-
-Sorts the input rows as specified using an `ORDER BY` statement.
-
-### Nested Loop
-
-A nested loop executes its child nodes for every row produced by a node that
-precedes it. For example:
-
-```sql
--> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
- Buffers: shared hit=155
- -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
- Index Cond: (id < 100)
- Heap Fetches: 0
- -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
- Index Cond: (id = users_1.id)
- Heap Fetches: 0
-```
-
-Here the first child node (`Index Only Scan using users_pkey on users users_1`)
-produces 36 rows, and is executed once (`rows=36 loops=1`). The next node
-produces 1 row (`rows=1`), but is repeated 36 times (`loops=36`). This is
-because the previous node produced 36 rows.
-
-This means that nested loops can quickly slow the query down if the various
-child nodes keep producing many rows.
-
-## Optimising queries
-
-With that out of the way, let's see how we can optimise a query. Let's use the
-following query as an example:
-
-```sql
-SELECT COUNT(*)
-FROM users
-WHERE twitter != '';
-```
-
-This query counts the number of users that have a Twitter profile set.
-Let's run this using `EXPLAIN (ANALYZE, BUFFERS)`:
-
-```sql
-EXPLAIN (ANALYZE, BUFFERS)
-SELECT COUNT(*)
-FROM users
-WHERE twitter != '';
-```
-
-This produces the following plan:
-
-```sql
-Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1)
- Buffers: shared hit=202662
- -> Seq Scan on users (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1)
- Filter: ((twitter)::text <> ''::text)
- Rows Removed by Filter: 2487813
- Buffers: shared hit=202662
-Planning time: 0.390 ms
-Execution time: 1271.180 ms
-```
-
-From this query plan we can see the following:
-
-1. We need to perform a sequential scan on the `users` table.
-1. This sequential scan filters out 2,487,813 rows using a `Filter`.
-1. We use 202,662 buffers, which equals roughly 1.5 GB of memory.
-1. It takes us 1.2 seconds to do all of this.
-
-Considering we are just counting users, that's quite expensive!
-
-Before we start making any changes, let's see if there are any existing indexes
-on the `users` table that we might be able to use. We can obtain this
-information by running `\d users` in a `psql` console, then scrolling down to
-the `Indexes:` section:
-
-```sql
-Indexes:
- "users_pkey" PRIMARY KEY, btree (id)
- "index_users_on_confirmation_token" UNIQUE, btree (confirmation_token)
- "index_users_on_email" UNIQUE, btree (email)
- "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token)
- "index_users_on_static_object_token" UNIQUE, btree (static_object_token)
- "index_users_on_unlock_token" UNIQUE, btree (unlock_token)
- "index_on_users_name_lower" btree (lower(name::text))
- "index_users_on_accepted_term_id" btree (accepted_term_id)
- "index_users_on_admin" btree (admin)
- "index_users_on_created_at" btree (created_at)
- "index_users_on_email_trigram" gin (email gin_trgm_ops)
- "index_users_on_feed_token" btree (feed_token)
- "index_users_on_group_view" btree (group_view)
- "index_users_on_incoming_email_token" btree (incoming_email_token)
- "index_users_on_managing_group_id" btree (managing_group_id)
- "index_users_on_name" btree (name)
- "index_users_on_name_trigram" gin (name gin_trgm_ops)
- "index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text
- "index_users_on_state" btree (state)
- "index_users_on_state_and_user_type" btree (state, user_type)
- "index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL
- "index_users_on_user_type" btree (user_type)
- "index_users_on_username" btree (username)
- "index_users_on_username_trigram" gin (username gin_trgm_ops)
- "tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text
-```
-
-Here we can see there is no index on the `twitter` column, which means
-PostgreSQL has to perform a sequential scan in this case. Let's try to fix this
-by adding the following index:
-
-```sql
-CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
-```
-
-If we now re-run our query using `EXPLAIN (ANALYZE, BUFFERS)` we get the
-following plan:
-
-```sql
-Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1)
- Buffers: shared hit=51854 dirtied=19
- -> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
- Filter: ((twitter)::text <> ''::text)
- Rows Removed by Filter: 2487830
- Heap Fetches: 26037
- Buffers: shared hit=51854 dirtied=19
-Planning time: 0.191 ms
-Execution time: 297.334 ms
-```
-
-Now it takes just under 300 milliseconds to get our data, instead of 1.2
-seconds. However, we still use 51,854 buffers, which is about 400 MB of memory.
-300 milliseconds is also quite slow for such a simple query. To understand why
-this query is still expensive, let's take a look at the following:
-
-```sql
-Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
- Filter: ((twitter)::text <> ''::text)
- Rows Removed by Filter: 2487830
-```
-
-We start with an index only scan on our index, but we somehow still apply a
-`Filter` that filters out 2,487,830 rows. Why is that? Well, let's look at how
-we created the index:
-
-```sql
-CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
-```
-
-We told PostgreSQL to index all possible values of the `twitter` column,
-even empty strings. Our query in turn uses `WHERE twitter != ''`. This means
-that the index does improve things, as we don't need to do a sequential scan,
-but we may still encounter empty strings. This means PostgreSQL _has_ to apply a
-Filter on the index results to get rid of those values.
-
-Fortunately, we can improve this even further using "partial indexes". Partial
-indexes are indexes with a `WHERE` condition that is applied when indexing data.
-For example:
-
-```sql
-CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100;
-```
-
-This index would only index the `email` value of rows that match `WHERE id <
-100`. We can use partial indexes to change our Twitter index to the following:
-
-```sql
-CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
-```
-
-Once the index is created, running our query again gives the following plan:
-
-```sql
-Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
- Buffers: shared hit=44036
- -> Index Only Scan using twitter_test on users (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1)
- Heap Fetches: 1208
- Buffers: shared hit=44036
-Planning time: 0.123 ms
-Execution time: 19.848 ms
-```
-
-That's _a lot_ better! Now it only takes 20 milliseconds to get the data, and we
-only use about 344 MB of buffers (instead of the original 1.5 GB). The reason
-this works is that now PostgreSQL no longer needs to apply a `Filter`, as the
-index only contains `twitter` values that are not empty.
-
-Keep in mind that you shouldn't just add partial indexes every time you want to
-optimise a query. Every index has to be updated for every write, and they may
-require quite a bit of space, depending on the amount of indexed data. As a
-result, first check if there are any existing indexes you may be able to reuse.
-If there aren't any, check if you can perhaps slightly change an existing one to
-fit both the existing and new queries. Only add a new index if none of the
-existing indexes can be used in any way.
-
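-One way to check whether the existing indexes on a table are actually used is
-to query PostgreSQL's statistics views. A minimal sketch, using `users` as an
-example table:
-
-```sql
-SELECT indexrelname, idx_scan
-FROM pg_stat_user_indexes
-WHERE relname = 'users'
-ORDER BY idx_scan DESC;
-```
-
-Here `idx_scan` is the number of times each index was scanned since the
-statistics were last reset.
-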
-When comparing execution plans, don't treat timing as the only important metric.
-Good timing is the main goal of any optimization, but it can be too volatile to
-be used for comparison (for example, it depends a lot on the state of the cache).
-When optimizing a query, we usually need to reduce the amount of data we're
-dealing with. Indexes are the way to work with fewer pages (buffers) to get the
-result, so, during optimization, look at the number of buffers used (read and hit),
-and work on reducing these numbers. Reduced timing is a consequence of reduced
-buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
-identical to production (and the overall number of buffers is the same as on production),
-but differences in cache state and I/O speed may lead to different timings.
-
-## Queries that can't be optimised
-
-Now that we have seen how to optimise a query, let's look at another query that
-we might not be able to optimise:
-
-```sql
-EXPLAIN (ANALYZE, BUFFERS)
-SELECT COUNT(*)
-FROM projects
-WHERE visibility_level IN (0, 20);
-```
-
-The output of `EXPLAIN (ANALYZE, BUFFERS)` is as follows:
-
-```sql
-Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
- Buffers: shared hit=208846
- -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
- Rows Removed by Filter: 65677
- Buffers: shared hit=208846
-Planning time: 2.861 ms
-Execution time: 3428.596 ms
-```
-
-Looking at the output we see the following Filter:
-
-```sql
-Filter: (visibility_level = ANY ('{0,20}'::integer[]))
-Rows Removed by Filter: 65677
-```
-
-Looking at the number of rows removed by the filter, we may be tempted to add an
-index on `projects.visibility_level` to somehow turn this Sequential scan +
-filter into an index-only scan.
-
-Unfortunately, doing so is unlikely to improve anything. Contrary to what some
-might believe, an index being present _does not guarantee_ that PostgreSQL
-actually uses it. For example, when doing a `SELECT * FROM projects` it is much
-cheaper to just scan the entire table, instead of using an index and then
-fetching data from the table. In such cases PostgreSQL may decide to not use an
-index.
-
-Second, let's think for a moment what our query does: it gets all projects with
-visibility level 0 or 20. In the above plan we can see this produces quite a lot
-of rows (5,746,940), but how much is that relative to the total? Let's find out
-by running the following query:
-
-```sql
-SELECT visibility_level, count(*) AS amount
-FROM projects
-GROUP BY visibility_level
-ORDER BY visibility_level ASC;
-```
-
-For GitLab.com this produces:
-
-```sql
- visibility_level | amount
-------------------+---------
- 0 | 5071325
- 10 | 65678
- 20 | 674801
-```
-
-Here the total number of projects is 5,811,804, and 5,746,126 of those are of
-level 0 or 20. That's 98% of the entire table!
-
-So no matter what we do, this query retrieves 98% of the entire table. Since
-most time is spent doing exactly that, there isn't really much we can do to
-improve this query, other than _not_ running it at all.
-
-What is important here is that, while some may recommend adding an index the
-moment you see a sequential scan, it is _much more important_ to first
-understand what your query does, how much data it retrieves, and so on. After
-all, you cannot optimise something you do not understand.
-
-### Cardinality and selectivity
-
-Earlier we saw that our query had to retrieve 98% of the rows in the table.
-There are two terms commonly used for databases: cardinality, and selectivity.
-Cardinality refers to the number of unique values in a particular column in a
-table.
-
-Selectivity is the number of unique values produced by an operation (for example, an
-index scan or filter), relative to the total number of rows. The higher the
-selectivity, the more likely PostgreSQL is able to use an index.
-
-In the above example, there are only 3 unique values: 0, 10, and 20. This means
-the cardinality is 3. The selectivity in turn is also very low: roughly
-0.0000003 (2 / 5,811,804), because our `Filter` only filters using two values
-(`0` and `20`). With such a low selectivity value it's not surprising that
-PostgreSQL decides using an index is not worth it, because it would produce
-almost no unique rows.
-
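-For a quick look at the cardinality estimate PostgreSQL itself has gathered
-for a column, you can query the `pg_stats` view. A minimal sketch:
-
-```sql
-SELECT attname, n_distinct
-FROM pg_stats
-WHERE tablename = 'projects'
-AND attname = 'visibility_level';
-```
-
-Here `n_distinct` is the planner's estimate of the number of distinct values
-in the column.
-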
-## Rewriting queries
-
-So the above query can't really be optimised as-is, or at least not much. But
-what if we slightly change the purpose of it? What if instead of retrieving all
-projects with `visibility_level` 0 or 20, we retrieve those that a user
-interacted with somehow?
-
-Fortunately, GitLab has an answer for this, and it's a table called
-`user_interacted_projects`. This table has the following schema:
-
-```sql
-Table "public.user_interacted_projects"
- Column | Type | Modifiers
-------------+---------+-----------
- user_id | integer | not null
- project_id | integer | not null
-Indexes:
- "index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id)
- "index_user_interacted_projects_on_user_id" btree (user_id)
-Foreign-key constraints:
- "fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
- "fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
-```
-
-Let's rewrite our query to `JOIN` this table onto our projects, and get the
-projects for a specific user:
-
-```sql
-EXPLAIN ANALYZE
-SELECT COUNT(*)
-FROM projects
-INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
-WHERE projects.visibility_level IN (0, 20)
-AND user_interacted_projects.user_id = 1;
-```
-
-What we do here is the following:
-
-1. Get our projects.
-1. `INNER JOIN` `user_interacted_projects`, meaning we're only left with rows in
- `projects` that have a corresponding row in `user_interacted_projects`.
-1. Limit this to the projects with `visibility_level` of 0 or 20, and to
- projects that the user with ID 1 interacted with.
-
-If we run this query we get the following plan:
-
-```sql
- Aggregate (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1)
- -> Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
- -> Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
- Index Cond: (user_id = 1)
- -> Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
- Index Cond: (id = user_interacted_projects.project_id)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
- Rows Removed by Filter: 0
- Planning time: 2.614 ms
- Execution time: 9.809 ms
-```
-
-Here it took us just under 10 milliseconds to get the data. We can also see
-we're retrieving far fewer projects:
-
-```sql
-Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
- Index Cond: (id = user_interacted_projects.project_id)
- Filter: (visibility_level = ANY ('{0,20}'::integer[]))
- Rows Removed by Filter: 0
-```
-
-Here we see we perform 145 loops (`loops=145`), with every loop producing 1 row
-(`rows=1`). This is much less than before, and our query performs much better!
-
-If we look at the plan we also see our costs are very low:
-
-```sql
-Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
-```
-
-Here our cost is only 3.45, and it takes us 7.25 milliseconds to do so (0.05 * 145).
-The next index scan is a bit more expensive:
-
-```sql
-Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
-```
-
-Here the cost is 160.71 (`cost=0.43..160.71`), taking about 2.5 milliseconds
-(based on the output of `actual time=....`).
-
-The most expensive part here is the "Nested Loop" that acts upon the result of
-these two index scans:
-
-```sql
-Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
-```
-
-Here the estimated cost is 870.52 for an estimated 203 rows, while the node
-actually took 9.748 milliseconds and produced 143 rows in a single loop.
-
-The key takeaway here is that sometimes you have to rewrite (parts of) a query
-to make it better. Sometimes that means having to slightly change your feature
-to accommodate better performance.
-
-## What makes a bad plan
-
-This is a bit of a difficult question to answer, because the definition of "bad"
-is relative to the problem you are trying to solve. However, some patterns are
-best avoided in most cases, such as:
-
-- Sequential scans on large tables.
-- Filters that remove a lot of rows.
-- Performing a certain step that requires _a lot_ of
-  buffers (for example, an index scan for GitLab.com that requires more than 512 MB).
-
-As a general guideline, aim for a query that:
-
-1. Takes no more than 10 milliseconds. Our target time spent in SQL per request
- is around 100 milliseconds, so every query should be as fast as possible.
-1. Does not use an excessive number of buffers, relative to the workload. For
- example, retrieving ten rows shouldn't require 1 GB of buffers.
-1. Does not spend a long amount of time performing disk IO operations. The
-   setting `track_io_timing` must be enabled for this data to be included in the
-   output of `EXPLAIN ANALYZE`, as shown in the example after this list.
-1. Applies a `LIMIT` when retrieving rows without aggregating them, such as
- `SELECT * FROM users`.
-1. Doesn't use a `Filter` to filter out too many rows, especially if the query
- does not use a `LIMIT` to limit the number of returned rows. Filters can
- usually be removed by adding a (partial) index.
-
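-For example, you can enable I/O timing for your current session before running
-`EXPLAIN (ANALYZE, BUFFERS)`. A minimal sketch; changing this setting may
-require elevated database privileges:
-
-```sql
-SET track_io_timing = on;
-
-EXPLAIN (ANALYZE, BUFFERS)
-SELECT COUNT(*)
-FROM users
-WHERE twitter != '';
-```
-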
-These are _guidelines_ and not hard requirements, as different needs may require
-different queries. The only _rule_ is that you _must always measure_ your query
-(preferably using a production-like database) using `EXPLAIN (ANALYZE, BUFFERS)`
-and related tools such as:
-
-- [`explain.depesz.com`](https://explain.depesz.com/).
-- [`explain.dalibo.com/`](https://explain.dalibo.com/).
-
-## Producing query plans
-
-There are a few ways to get the output of a query plan. Of course you
-can directly run the `EXPLAIN` query in the `psql` console, or you can
-follow one of the other options below.
-
-### Database Lab Engine
-
-GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab), and the companion
-SQL optimization tool, [Joe Bot](https://gitlab.com/postgres-ai/joe).
-
-Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
-
-Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
-and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
-
-With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements.
-
-For example, to test a new index on a column that does not exist on production yet, you can do the following:
-
-Create the column:
-
-```sql
-exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
-```
-
-Create the index:
-
-```sql
-exec CREATE INDEX index_projects_last_at ON projects (last_at) WHERE last_at IS NOT NULL
-```
-
-Analyze the table to update its statistics:
-
-```sql
-exec ANALYZE projects
-```
-
-Get the query plan:
-
-```sql
-explain SELECT * FROM projects WHERE last_at < CURRENT_DATE
-```
-
-Once done you can rollback your changes:
-
-```sql
-reset
-```
-
-For more information about the available options, run:
-
-```sql
-help
-```
-
-The web interface comes with the following execution plan visualizers included:
-
-- [Depesz](https://explain.depesz.com/)
-- [PEV2](https://github.com/dalibo/pev2)
-- [FlameGraph](https://github.com/mgartner/pg_flame)
-
-#### Tips & Tricks
-
-The database connection is maintained for the duration of your session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings are applied to all subsequent commands until you reset them. For example, you can disable parallel queries with:
-
-```sql
-exec SET max_parallel_workers_per_gather = 0
-```
-
-### Rails console
-
-Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
-gem, you can generate the query plan directly from the Rails console:
-
-```ruby
-pry(main)> require 'activerecord-explain-analyze'
-=> true
-pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
- Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
- ↳ (pry):12
-=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
-Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
- Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
- Filter: (projects.build_timeout > 3600)
- Rows Removed by Filter: 14
- Buffers: shared hit=2
-Planning time: 0.411 ms
-Execution time: 0.113 ms
-```
-
-### ChatOps
-
-[GitLab team members can also use our ChatOps solution, available in Slack using the
-`/chatops` slash command](chatops_on_gitlabcom.md).
-
-NOTE:
-While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
-
-You can use ChatOps to get a query plan by running the following:
-
-```sql
-/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
-```
-
-Visualising the plan using <https://explain.depesz.com/> is also supported:
-
-```sql
-/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
-```
-
-Quoting the query is not necessary.
-
-For more information about the available options, run:
-
-```sql
-/chatops run explain --help
-```
-
-## Further reading
-
-A more extensive guide on understanding query plans can be found in
-the [presentation](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)
-from [Dalibo.org](https://www.dalibo.com/en/).
-
-Depesz's blog also has a good [section](https://www.depesz.com/tag/unexplainable/) dedicated to query plans.
+<!-- This redirect file can be deleted after <2022-11-04>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/uploads/working_with_uploads.md b/doc/development/uploads/working_with_uploads.md
index d44f2f69168..5a5f987c37c 100644
--- a/doc/development/uploads/working_with_uploads.md
+++ b/doc/development/uploads/working_with_uploads.md
@@ -6,92 +6,295 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Uploads guide: Adding new uploads
-Here, we describe how to add a new upload route [accelerated](index.md#workhorse-assisted-uploads) by Workhorse.
-
-Upload routes belong to one of these categories:
-
-1. Rails controllers: uploads handled by Rails controllers.
-1. Grape API: uploads handled by a Grape API endpoint.
-1. GraphQL API: uploads handled by a GraphQL resolve function.
-
-WARNING:
-GraphQL uploads do not support [direct upload](index.md#direct-upload). Depending on the use case, the feature may not work on installations without NFS (like GitLab.com or Kubernetes installations). Uploading to object storage inside the GraphQL resolve function may result in timeout errors. For more details, follow [issue #280819](https://gitlab.com/gitlab-org/gitlab/-/issues/280819).
-
-## Update Workhorse for the new route
-
-For both the Rails controller and Grape API uploads, Workhorse must be updated to
-support the new upload route.
-
-1. Open a new issue in the [Workhorse tracker](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/new) describing precisely the new upload route:
- - The route's URL.
- - The upload encoding.
- - If possible, provide a dump of the upload request.
-1. Implement the change and get the merge request for the issue above merged.
-1. Ask the Maintainers of [Workhorse](https://gitlab.com/gitlab-org/gitlab-workhorse) to create a new release. You can do that in the merge request
- directly during the maintainer review, or ask for it in the `#workhorse` Slack channel.
-1. Bump the [Workhorse version file](https://gitlab.com/gitlab-org/gitlab/-/blob/master/GITLAB_WORKHORSE_VERSION)
- to the version you have from the previous points, or bump it in the same merge request that contains
- the Rails changes. Refer to [Implementing the new route with a Rails controller](#implementing-the-new-route-with-a-rails-controller) or [Implementing the new route with a Grape API endpoint](#implementing-the-new-route-with-a-grape-api-endpoint) below.
-
-## Implementing the new route with a Rails controller
-
-For a Rails controller upload, we usually have a `multipart/form-data` upload and there are a
-few things to do:
-
-1. The upload is available under the parameter name you're using. For example, it could be an `artifact`
- or a nested parameter such as `user[avatar]`. If you have the upload under the
- `file` parameter, reading `params[:file]` should get you an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb) instance.
-1. Generally speaking, it's a good idea to check if the instance is from the [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb) class. For example, see how we checked
-[that the parameter is indeed an `UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/commit/ea30fe8a71bf16ba07f1050ab4820607b5658719#51c0cc7a17b7f12c32bc41cfab3649ff2739b0eb_79_77).
-
-WARNING:
-**Do not** call `UploadedFile#from_params` directly! Do not build an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
-instance using `UploadedFile#from_params`! This method can be unsafe to use depending on the `params`
-passed. Instead, use the [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
-instance that [`multipart.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/middleware/multipart.rb)
-builds automatically for you.
-
-## Implementing the new route with a Grape API endpoint
-
-For a Grape API upload, we can have a body or multipart upload. Things are slightly more complicated: two endpoints are needed,
-one for the Workhorse pre-upload authorization and one for accepting the upload metadata from Workhorse:
-
-1. Implement an endpoint with the URL + `/authorize` suffix (see the sketch after this list) that will:
- - Check that the request is coming from Workhorse with the `require_gitlab_workhorse!` from the [API helpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/api/helpers.rb).
- - Check user permissions.
- - Set the status to `200` with `status 200`.
- - Set the content type with `content_type Gitlab::Workhorse::INTERNAL_API_CONTENT_TYPE`.
- - Use your dedicated `Uploader` class (let's say that it's `FileUploader`) to build the response with `FileUploader.workhorse_authorize(params)`.
-1. Implement the endpoint for the upload request that will:
- - Require all the `UploadedFile` objects as parameters.
- - For example, if we expect a single parameter `file` to be an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb) instance,
-use `requires :file, type: ::API::Validations::Types::WorkhorseFile`.
- - Body upload requests have their upload available under the parameter `file`.
- - Check that the request is coming from Workhorse with the `require_gitlab_workhorse!` from the
-[API helpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/api/helpers.rb).
- - Check the user permissions.
- - The remaining code of the processing. In this step, the code must read the parameter. For
-our example, it would be `params[:file]`.
-
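-For example, a minimal sketch of such an authorize endpoint, following the
-steps above (the route and the `authorize!` permission check are illustrative,
-not an existing endpoint):
-
-```ruby
-post ':id/uploads/authorize' do
-  require_gitlab_workhorse!
-  authorize! :create_upload, user_project
-
-  status 200
-  content_type Gitlab::Workhorse::INTERNAL_API_CONTENT_TYPE
-
-  FileUploader.workhorse_authorize(params)
-end
-```
-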
-WARNING:
-**Do not** call `UploadedFile#from_params` directly! Do not build an [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
-object using `UploadedFile#from_params`! This method can be unsafe to use depending on the `params`
-passed. Instead, use the [`UploadedFile`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/uploaded_file.rb)
-object that [`multipart.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/middleware/multipart.rb)
-builds automatically for you.
-
-## Document Object Storage buckets and CarrierWave integration
-
-When using Object Storage, GitLab expects each kind of upload to maintain its own bucket in the respective
-Object Storage destination. Moreover, the integration with CarrierWave is not used all the time.
-The [Object Storage Working Group](https://about.gitlab.com/company/team/structure/working-groups/object-storage/)
-is investigating an approach that unifies Object Storage buckets into a single one and removes CarrierWave
-so as to simplify implementation and administration of uploads.
-
-Therefore, document new uploads here by slotting them into the following tables:
-
-- [Feature bucket details](#feature-bucket-details)
-- [CarrierWave integration](#carrierwave-integration)
+## Recommendations
+
+- When creating an uploader, [make it a subclass](#where-should-i-store-my-files) of `AttachmentUploader`.
+- Add your uploader to the [tables](#tables) in this document.
+- Do not add [new object storage buckets](#where-should-i-store-my-files).
+- Implement [direct upload](#implementing-direct-upload-support).
+- If you need to process your uploads, decide [where to do that](#processing-uploads).
+
+## Background information
+
+- [CarrierWave Uploaders](#carrierwave-uploaders)
+- [GitLab modifications to CarrierWave](#gitlab-modifications-to-carrierwave)
+
+## Where should I store my files?
+
+CarrierWave Uploaders determine where files get
+stored. When you create a new Uploader class you are deciding where to store the files of your new
+feature.
+
+First of all, ask yourself if you need a new Uploader class. It is OK
+to use the same Uploader class for different mountpoints or different
+models.
+
+If you do want or need your own Uploader class then you should make it
+a **subclass of `AttachmentUploader`**. You then inherit the storage
+location and directory scheme from that class. The directory scheme
+is:
+
+```ruby
+File.join(model.class.underscore, mounted_as.to_s, model.id.to_s)
+```
+
+If you look around in the GitLab code base you will find quite a few
+Uploaders that have their own storage location. For object storage,
+this means Uploaders have their own buckets. We now **discourage**
+adding new buckets for the following reasons:
+
+- Using a new bucket adds to development time because you need to make downstream changes in [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit), [Omnibus GitLab](https://gitlab.com/gitlab-org/omnibus-gitlab) and [CNG](https://gitlab.com/gitlab-org/build/CNG).
+- Using a new bucket requires GitLab.com Infrastructure changes, which slows down the roll-out of your new feature.
+- Using a new bucket slows down adoption of your new feature for self-managed GitLab installations: people cannot start using your new feature until their local GitLab administrator has configured the new bucket.
+
+By using an existing bucket you avoid all this extra work
+and friction. The `Gitlab.config.uploads` storage location, which is what
+`AttachmentUploader` uses, is guaranteed to already be configured.
+
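+If you do need a new Uploader class, a minimal sketch could look like this
+(the `ReportUploader` class and the `report` mountpoint are hypothetical):
+
+```ruby
+# Subclassing AttachmentUploader reuses the existing uploads storage
+# location and directory scheme; no new bucket is needed.
+class ReportUploader < AttachmentUploader
+end
+
+class Project < ApplicationRecord
+  mount_uploader :report, ReportUploader
+end
+```
+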
+## Implementing Direct Upload support
+
+Below we will outline how to implement [direct upload](#direct-upload-via-workhorse) support.
+
+Using direct upload is not always necessary but it is usually a good
+idea. Unless the uploads handled by your feature are both infrequent
+and small, you probably want to implement direct upload. An example of
+a feature with small and infrequent uploads is project avatars: these
+rarely change and the application imposes strict size limits on them.
+
+If your feature handles uploads that are not both infrequent and small,
+then not implementing direct upload support means that you are taking on
+technical debt. At the very least, you should make sure that you _can_
+add direct upload support later.
+
+To support Direct Upload you need two things:
+
+1. A pre-authorization endpoint in Rails
+1. A Workhorse routing rule
+
+Workhorse does not know where to store your upload. To find out it
+makes a pre-authorization request. It also does not know whether or
+where to make a pre-authorization request. For that you need the
+routing rule.
+
+A note to those of us who remember when
+[Workhorse used to be a separate project](https://gitlab.com/groups/gitlab-org/-/epics/4826):
+it is no longer necessary to split these two steps into separate merge
+requests. In fact, it is probably easier to do both in one merge
+request.
+
+### Adding a Workhorse routing rule
+
+Routing rules are defined in
+[workhorse/internal/upstream/routes.go](https://gitlab.com/gitlab-org/gitlab/-/blob/adf99b5327700cf34a845626481d7d6fcc454e57/workhorse/internal/upstream/routes.go).
+They consist of:
+
+- An HTTP verb (usually "POST" or "PUT")
+- A path regular expression
+- An upload type: MIME multipart or "full request body"
+- Optionally, you can also match on HTTP headers like `Content-Type`
+
+Example:
+
+```golang
+u.route("PUT", apiProjectPattern+`packages/nuget/`, mimeMultipartUploader),
+```
+
+You should add a test for your routing rule to `TestAcceleratedUpload`
+in
+[workhorse/upload_test.go](https://gitlab.com/gitlab-org/gitlab/-/blob/adf99b5327700cf34a845626481d7d6fcc454e57/workhorse/upload_test.go).
+
+You should also manually verify that when you perform an upload
+request for your new feature, Workhorse makes a pre-authorization
+request. You can check this by looking at the Rails access logs. This
+is necessary because if you make a mistake in your routing rule you
+won't get a hard failure: you just end up using the less efficient
+default path.
+
+### Adding a pre-authorization endpoint
+
+We distinguish three cases: Rails controllers, Grape API endpoints and
+GraphQL resources.
+
+To start with the bad news: direct upload for GraphQL is currently not
+supported. The reason for this is that Workhorse does not parse
+GraphQL queries. Also see [issue #280819](https://gitlab.com/gitlab-org/gitlab/-/issues/280819).
+Consider accepting your file upload via Grape instead.
+
+For Grape pre-authorization endpoints, look for existing examples that
+implement `/authorize` routes. One example is the
+[POST `:id/uploads/authorize` endpoint](https://gitlab.com/gitlab-org/gitlab/-/blob/9ad53d623eecebb799ce89eada951e4f4a59c116/lib/api/projects.rb#L642-651).
+Note that this particular example is using FileUploader, which means
+that the upload will be stored in the storage location (bucket) of
+that Uploader class.
+
+For Rails endpoints you can use the
+[WorkhorseAuthorization concern](https://gitlab.com/gitlab-org/gitlab/-/blob/adf99b5327700cf34a845626481d7d6fcc454e57/app/controllers/concerns/workhorse_authorization.rb).
+
+## Processing uploads
+
+Some features require us to process uploads, for example to extract
+metadata from the uploaded file. There are a couple of different ways
+you can implement this. The main choice is _where_ to implement the
+processing, or "who is the processor".
+
+|Processor|Direct Upload possible?|Can reject HTTP request?|Implementation|
+|---|---|---|---|
+|Sidekiq|yes|no|Straightforward|
+|Workhorse|yes|yes|Complex|
+|Rails|no|yes|Easy|
+
+Processing in Rails looks appealing but it tends to lead to scaling
+problems down the road because you cannot use direct upload. You are
+then forced to rebuild your feature with processing in Workhorse. So
+if the requirements of your feature allows it, doing the processing in
+Sidekiq strikes a good balance between complexity and the ability to
+scale.
+
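+For example, a minimal sketch of Sidekiq-based processing, assuming a
+hypothetical `ProcessUploadWorker` (real workers need additional GitLab
+worker configuration):
+
+```ruby
+class ProcessUploadWorker
+  include ApplicationWorker
+
+  # Runs after the upload was accepted and stored, so it cannot
+  # reject the original HTTP request anymore.
+  def perform(upload_id)
+    upload = Upload.find_by_id(upload_id)
+    return unless upload
+
+    # Extract metadata from the stored file and persist it here.
+  end
+end
+```
+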
+## CarrierWave Uploaders
+
+GitLab uses a modified version of
+[CarrierWave](https://github.com/carrierwaveuploader/carrierwave) to
+manage uploads. Below we will describe how we use CarrierWave and how
+we modified it.
+
+The central concept of CarrierWave is the **Uploader** class. The
+Uploader defines where files get stored, and optionally contains
+validation and processing logic. To use an Uploader you must associate
+it with a text column on an ActiveRecord model. This is called "mounting",
+and the column is called the "mountpoint". For example:
+
+```ruby
+class Project < ApplicationRecord
+ mount_uploader :avatar, AttachmentUploader
+end
+```
+
+Now if I upload an avatar called `tanuki.png` the idea is that in the
+`projects.avatar` column for my project, CarrierWave stores the string
+`tanuki.png`, and that the AttachmentUploader class contains the
+configuration data and directory scheme. For example if the project ID
+is 123, the actual file may be in
+`/var/opt/gitlab/gitlab-rails/uploads/-/system/project/avatar/123/tanuki.png`.
+The directory
+`/var/opt/gitlab/gitlab-rails/uploads/-/system/project/avatar/123/`
+was chosen by the Uploader based on, among other things, the configuration
+(`/var/opt/gitlab/gitlab-rails/uploads`), the model name (`project`),
+the model ID (`123`), and the mountpoint (`avatar`).
+
+> The Uploader determines the individual storage directory of your
+> upload. The mountpoint column in your model contains the filename.
+
+You never access the mountpoint column directly because CarrierWave
+defines a getter and setter on your model that operates on file handle
+objects.
+
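+A minimal sketch of what the generated accessors do (the project ID is
+hypothetical):
+
+```ruby
+project = Project.find(123)
+
+project.avatar # => an uploader instance, not the stored filename string
+project.avatar = File.open('/tmp/tanuki.png') # caches the file via `cache!`
+project.save!  # persists the file via `store!` (ActiveRecord callback)
+```
+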
+### Optional Uploader behaviors
+
+Besides determining the storage directory for your upload, a
+CarrierWave Uploader can implement several other behaviors via
+callbacks. Not all of these behaviors are usable in GitLab. In
+particular, you currently cannot use the `version` mechanism of
+CarrierWave. Things you can do include:
+
+- Filename validation
+- **Incompatible with direct upload:** One time pre-processing of file contents, e.g. image resizing
+- **Incompatible with direct upload:** Encryption at rest
+
+Note that CarrierWave pre-processing behaviors such as image resizing
+or encryption require local access to the uploaded file. This forces
+you to upload the processed file from Ruby. This runs counter to direct
+upload, which is all about _not_ doing the upload in Ruby. If you use
+direct upload with an Uploader that has pre-processing behaviors, the
+pre-processing behaviors are skipped silently.
+
+### CarrierWave Storage engines
+
+CarrierWave has two storage engines:
+
+|CarrierWave class|GitLab name|Description|
+|---|---|---|
+|`CarrierWave::Storage::File`|`ObjectStorage::Store::LOCAL` |Local files, accessed through the Ruby stdlib|
+| `CarrierWave::Storage::Fog`|`ObjectStorage::Store::REMOTE`|Cloud files, accessed through the [Fog gem](https://github.com/fog/fog)|
+
+GitLab uses both of these engines, depending on configuration.
+
+The normal way to choose a storage engine in CarrierWave is to use the
+`Uploader.storage` class method. In GitLab we do not do this; we have
+overridden `Uploader#storage` instead. This allows us to vary the
+storage engine file by file.
+
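+The following is an illustrative sketch of that idea, not the actual GitLab
+implementation:
+
+```ruby
+# Vary the storage engine per file, based on where this particular
+# upload currently lives.
+def storage
+  if object_store == ObjectStorage::Store::REMOTE
+    CarrierWave::Storage::Fog.new(self)
+  else
+    CarrierWave::Storage::File.new(self)
+  end
+end
+```
+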
+### CarrierWave file lifecycle
+
+An Uploader is associated with two storage areas: regular storage and
+cache storage. Each has its own storage engine. If you assign a file
+to a mountpoint setter (`project.avatar =
+File.open('/tmp/tanuki.png')`) you will copy/move the file to cache
+storage as a side effect via the `cache!` method. To persist the file
+you must somehow call the `store!` method. This either happens via
+[ActiveRecord callbacks](https://github.com/carrierwaveuploader/carrierwave/blob/v1.3.2/lib/carrierwave/orm/activerecord.rb#L55)
+or by calling `store!` on an Uploader instance.
+
+Normally you do not need to interact with `cache!` and `store!` but if
+you need to debug GitLab CarrierWave modifications it is useful to
+know that they are there and that they always get called.
+Specifically, it is good to know that CarrierWave pre-processing
+behaviors (`process` etc.) are implemented as `before :cache` hooks,
+and in the case of direct upload, these hooks are ignored and do not
+run.
+
+> Direct upload skips all CarrierWave `before :cache` hooks.
+
+## GitLab modifications to CarrierWave
+
+GitLab uses a modified version of CarrierWave to make a number of things possible.
+
+### Migrating data between storage engines
+
+In
+[app/uploaders/object_storage.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/adf99b5327700cf34a845626481d7d6fcc454e57/app/uploaders/object_storage.rb)
+there is code for migrating user data between local storage and object
+storage. This code exists because for a long time, GitLab.com stored
+uploads on local storage via NFS. This changed when, as part of an
+infrastructure migration, we had to move the uploads to object storage.
+
+This is why the CarrierWave `storage` varies from upload to upload in
+GitLab, and why we have database columns like `uploads.store` or
+`ci_job_artifacts.file_store`.
+
+### Direct Upload via Workhorse
+
+Workhorse direct upload is a mechanism that lets us accept large
+uploads without spending a lot of Ruby CPU time. Workhorse is written
+in Go and goroutines have a much lower resource footprint than Ruby
+threads.
+
+Direct upload works as follows.
+
+1. Workhorse accepts a user upload request
+1. Workhorse pre-authenticates the request with Rails, and receives a temporary upload location
+1. Workhorse stores the file upload in the user's request to the temporary upload location
+1. Workhorse propagates the request to Rails
+1. Rails issues a remote copy operation to copy the uploaded file from its temporary location to the final location
+1. Rails deletes the temporary upload
+1. Workhorse deletes the temporary upload a second time in case Rails timed out
+
+Normally, `cache!` returns an instance of
+`CarrierWave::SanitizedFile`, and `store!` then
+[uploads that file using Fog](https://github.com/carrierwaveuploader/carrierwave/blob/v1.3.2/lib/carrierwave/storage/fog.rb#L327-L335).
+
+In the case of object storage, with the modifications specific to GitLab, the
+copying from the temporary location to the final location is
+implemented by Rails fooling CarrierWave. When CarrierWave tries to
+`cache!` the upload, we
+[return](https://gitlab.com/gitlab-org/gitlab/-/blob/59b441d578e41cb177406a9799639e7a5aa9c7e1/app/uploaders/object_storage.rb#L367)
+a `CarrierWave::Storage::Fog::File` file handle which points to the
+temporary file. During the `store!` phase, CarrierWave then
+[copies](https://github.com/carrierwaveuploader/carrierwave/blob/v1.3.2/lib/carrierwave/storage/fog.rb#L325)
+this file to its intended location.
+
+## Tables
+
+The Scalability::Frameworks team is going to make object storage and uploads easier to use and more robust. If you add or change uploaders, it helps us if you update this table too, so we can keep an overview of where and how uploaders are used.
### Feature bucket details
diff --git a/doc/development/utilities.md b/doc/development/utilities.md
index b9b4c6448e2..3f6187a4c2e 100644
--- a/doc/development/utilities.md
+++ b/doc/development/utilities.md
@@ -188,6 +188,24 @@ Refer to [`strong_memoize.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/maste
end
```
+ Alternatively, use the `strong_memoize_attr` helper to memoize the method for you:
+
+ ```ruby
+ class Find
+ include Gitlab::Utils::StrongMemoize
+
+ def result
+ search
+ end
+ strong_memoize_attr :result
+
+ strong_memoize_attr :enabled?, :enabled
+ def enabled?
+ Feature.enabled?(:some_feature)
+ end
+ end
+ ```
+
- Clear memoization
```ruby
diff --git a/doc/development/verifying_database_capabilities.md b/doc/development/verifying_database_capabilities.md
index 55347edf4ec..0217eb96e5a 100644
--- a/doc/development/verifying_database_capabilities.md
+++ b/doc/development/verifying_database_capabilities.md
@@ -1,38 +1,11 @@
---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/verifying_database_capabilities.md'
+remove_date: '2022-11-06'
---
-# Verifying Database Capabilities
+This document was moved to [another location](database/verifying_database_capabilities.md).
-Sometimes certain bits of code may only work on a certain database
-version. While we try to avoid such code as much as possible, sometimes it is
-necessary to add database (version) specific behavior.
-
-To facilitate this we have the following methods that you can use:
-
-- `ApplicationRecord.database.version`: returns the PostgreSQL version number as a string
- in the format `X.Y.Z`.
-
-This allows you to write code such as:
-
-```ruby
-if ApplicationRecord.database.version.to_f >= 11.7
- run_really_fast_query
-else
- run_fast_query
-end
-```
-
-## Read-only database
-
-The database can be used in read-only mode. In this case we have to
-make sure all GET requests don't attempt any write operations to the
-database. If one of those requests wants to write to the database, it needs
-to be wrapped in a `Gitlab::Database.read_only?` or `Gitlab::Database.read_write?`
-guard, to make sure it doesn't write to a read-only database.
-
-We have a Rails Middleware that filters any potentially writing
-operations (the `CUD` operations of CRUD) and prevents the user from trying
-to update the database and getting a 500 error (see `Gitlab::Middleware::ReadOnly`).
+<!-- This redirect file can be deleted after <2022-11-06>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/windows.md b/doc/development/windows.md
index 3eed9c057ab..17dfaddef36 100644
--- a/doc/development/windows.md
+++ b/doc/development/windows.md
@@ -37,8 +37,7 @@ A list of software preinstalled on the Windows images is available at: [Preinsta
## GCP Windows image for development
-The [shared Windows GitLab
-runners](https://about.gitlab.com/releases/2020/01/22/gitlab-12-7-released/#windows-shared-runners-on-gitlabcom-beta)
+The [shared Windows GitLab runners](https://about.gitlab.com/releases/2020/01/22/gitlab-12-7-released/#windows-shared-runners-on-gitlabcom-beta)
are built with [Packer](https://www.packer.io/).
The Infrastructure as Code repository for building the Google Cloud images is available at:
diff --git a/doc/development/work_items.md b/doc/development/work_items.md
index 9a17a152525..3625f85eb82 100644
--- a/doc/development/work_items.md
+++ b/doc/development/work_items.md
@@ -36,14 +36,12 @@ Here are some problems with current issues usage and why we are looking into wor
differences in common interactions that the user needs to hold a complicated mental
model of how they each behave.
- Issues are not extensible enough to support all of the emerging jobs they need to facilitate.
-- Codebase maintainability and feature development become bigger challenges as we grow the Issue type
+- Codebase maintainability and feature development become a bigger challenge as we grow the Issue type
beyond its core role of issue tracking into supporting the different work item types and handling
logic and structure differences.
- New functionality is typically implemented with first class objects that import behavior from issues via
shared concerns. This leads to duplicated effort and ultimately small differences between common interactions. This
leads to inconsistent UX.
-- Codebase maintainability and feature development becomes a bigger challenges as we grow issues
- beyond its core role of issue tracking into supporting the different types and subtle differences between them.
## Work item terminology
diff --git a/doc/development/workhorse/configuration.md b/doc/development/workhorse/configuration.md
index b86bb824ea1..a94ba2b4fc6 100644
--- a/doc/development/workhorse/configuration.md
+++ b/doc/development/workhorse/configuration.md
@@ -211,7 +211,7 @@ Workhorse supports distributed tracing through [LabKit](https://gitlab.com/gitla
using [OpenTracing APIs](https://opentracing.io).
By default, no tracing implementation is linked into the binary. You can link in
-different OpenTracing providers with [build tags](https://golang.org/pkg/go/build/#hdr-Build_Constraints)
+different OpenTracing providers with [build tags](https://pkg.go.dev/go/build#hdr-Build_Constraints)
or build constraints by setting the `BUILD_TAGS` make variable.
For more details of the supported providers, refer to LabKit. For an example of
@@ -234,7 +234,7 @@ When a user makes an HTTP request, such as creating a new project, the
initial request is routed through Workhorse to another service, which
may in turn, make other requests. To help trace the request as it flows
across services, Workhorse generates a random value called a
-[correlation ID](../../administration/troubleshooting/tracing_correlation_id.md).
+[correlation ID](../../administration/logs/tracing_correlation_id.md).
Workhorse sends this correlation ID via the `X-Request-Id` HTTP header.
Some GitLab services, such as GitLab Shell, generate their own
@@ -278,9 +278,9 @@ trusted_cidrs_for_x_forwarded_for = ["10.0.0.0/8", "127.0.0.1/32"]
## Continuous profiling
Workhorse supports continuous profiling through [LabKit](https://gitlab.com/gitlab-org/labkit/)
-using [Stackdriver Profiler](https://cloud.google.com/profiler). By default, the
+using [Stackdriver Profiler](https://cloud.google.com/products/operations). By default, the
Stackdriver Profiler implementation is linked in the binary using
-[build tags](https://golang.org/pkg/go/build/#hdr-Build_Constraints), though it's not
+[build tags](https://pkg.go.dev/go/build#hdr-Build_Constraints), though it's not
required and can be skipped. For example:
```shell
diff --git a/doc/development/workhorse/gitlab_features.md b/doc/development/workhorse/gitlab_features.md
index 365cc7991d8..3b240d4cbc6 100644
--- a/doc/development/workhorse/gitlab_features.md
+++ b/doc/development/workhorse/gitlab_features.md
@@ -70,4 +70,4 @@ memory than it costs to have Workhorse look after it.
- Workhorse does not clean up idle client connections.
- We assume that all requests to Rails pass through Workhorse.
-For more information see ['A brief history of GitLab Workhorse'](https://about.gitlab.com/2016/04/12/a-brief-history-of-gitlab-workhorse/).
+For more information see ['A brief history of GitLab Workhorse'](https://about.gitlab.com/blog/2016/04/12/a-brief-history-of-gitlab-workhorse/).
diff --git a/doc/development/workhorse/index.md b/doc/development/workhorse/index.md
index 3aa7e945f53..962124248ef 100644
--- a/doc/development/workhorse/index.md
+++ b/doc/development/workhorse/index.md
@@ -10,8 +10,8 @@ GitLab Workhorse is a smart reverse proxy for GitLab. It handles
"large" HTTP requests such as file downloads, file uploads, Git
push/pull and Git archive downloads.
-Workhorse itself is not a feature, but there are [several features in
-GitLab](gitlab_features.md) that would not work efficiently without Workhorse.
+Workhorse itself is not a feature, but there are
+[several features in GitLab](gitlab_features.md) that would not work efficiently without Workhorse.
The canonical source for Workhorse is
[`gitlab-org/gitlab/workhorse`](https://gitlab.com/gitlab-org/gitlab/tree/master/workhorse).
@@ -21,7 +21,7 @@ but that repository is no longer used for development.
## Install Workhorse
-To install GitLab Workhorse you need [Go 1.15 or newer](https://golang.org/dl) and
+To install GitLab Workhorse you need [Go 1.15 or newer](https://go.dev/dl) and
[GNU Make](https://www.gnu.org/software/make/).
To install into `/usr/local/bin` run `make install`.
@@ -44,7 +44,7 @@ On some operating systems, such as FreeBSD, you may have to use
### Run time dependencies
-Workhorse uses [ExifTool](https://www.sno.phy.queensu.ca/~phil/exiftool/) for
+Workhorse uses [ExifTool](https://exiftool.org/) for
removing EXIF data (which may contain sensitive information) from uploaded
images. If you installed GitLab: