gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
Diffstat (limited to 'doc/architecture/blueprints')
-rw-r--r--  doc/architecture/blueprints/_template.md  2
-rw-r--r--  doc/architecture/blueprints/ai_gateway/index.md  151
-rw-r--r--  doc/architecture/blueprints/capacity_planning/index.md  78
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-admin-area.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-backups.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-ci-runners.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-container-registry.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-contributions-forks.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-data-migration.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-database-sequences.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-explore.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-git-access.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-global-search.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-graphql.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-organizations.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-personal-access-tokens.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-schema-changes.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-secrets.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-snippets.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-template.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-uploads.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-user-profile.md  6
-rw-r--r--  doc/architecture/blueprints/cells/cells-feature-your-work.md  6
-rw-r--r--  doc/architecture/blueprints/cells/deployment-architecture.md  4
-rw-r--r--  doc/architecture/blueprints/cells/glossary.md  6
-rw-r--r--  doc/architecture/blueprints/cells/impact.md  7
-rw-r--r--  doc/architecture/blueprints/cells/impacted_features/container-registry.md  24
-rw-r--r--  doc/architecture/blueprints/cells/index.md  144
-rw-r--r--  doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md  28
-rw-r--r--  doc/architecture/blueprints/cells/routing-service.md  196
-rw-r--r--  doc/architecture/blueprints/ci_gcp_secrets_manager/index.md  107
-rw-r--r--  doc/architecture/blueprints/ci_pipeline_components/index.md  228
-rw-r--r--  doc/architecture/blueprints/clickhouse_usage/index.md  2
-rw-r--r--  doc/architecture/blueprints/composable_codebase_using_rails_engines/index.md  2
-rw-r--r--  doc/architecture/blueprints/consolidating_groups_and_projects/index.md  4
-rw-r--r--  doc/architecture/blueprints/container_registry_metadata_database/index.md  46
-rw-r--r--  doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md  12
-rw-r--r--  doc/architecture/blueprints/database/automated_query_analysis/index.md  2
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/index.md  2
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/read_mostly.md  2
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/time_decay.md  2
-rw-r--r--  doc/architecture/blueprints/database_testing/index.md  2
-rw-r--r--  doc/architecture/blueprints/gitaly_adaptive_concurrency_limit/index.md  4
-rw-r--r--  doc/architecture/blueprints/gitaly_handle_upload_pack_in_http2_server/index.md  2
-rw-r--r--  doc/architecture/blueprints/gitlab_services/img/architecture.png  bin 64365 -> 21002 bytes
-rw-r--r--  doc/architecture/blueprints/gitlab_services/index.md  8
-rw-r--r--  doc/architecture/blueprints/gitlab_steps/data.drawio.png  bin 0 -> 42192 bytes
-rw-r--r--  doc/architecture/blueprints/gitlab_steps/decisions/001_initial_support.md  30
-rw-r--r--  doc/architecture/blueprints/gitlab_steps/implementation.md  339
-rw-r--r--  doc/architecture/blueprints/gitlab_steps/index.md  98
-rw-r--r--  doc/architecture/blueprints/gitlab_steps/runner-integration.md  116
-rw-r--r--  doc/architecture/blueprints/gitlab_steps/step-runner-sequence.drawio.png  bin 0 -> 70107 bytes
-rw-r--r--  doc/architecture/blueprints/google_artifact_registry_integration/index.md  12
-rw-r--r--  doc/architecture/blueprints/google_artifact_registry_integration/ui_ux.md  4
-rw-r--r--  doc/architecture/blueprints/google_cloud_platform_integration/index.md  34
-rw-r--r--  doc/architecture/blueprints/new_diffs.md  135
-rw-r--r--  doc/architecture/blueprints/new_diffs/index.md  431
-rw-r--r--  doc/architecture/blueprints/object_storage/index.md  2
-rw-r--r--  doc/architecture/blueprints/observability_logging/index.md  4
-rw-r--r--  doc/architecture/blueprints/observability_logging/system_overview.png  bin 76330 -> 21521 bytes
-rw-r--r--  doc/architecture/blueprints/organization/index.md  26
-rw-r--r--  doc/architecture/blueprints/organization/isolation.md  6
-rw-r--r--  doc/architecture/blueprints/rate_limiting/index.md  2
-rw-r--r--  doc/architecture/blueprints/repository_backups/index.md  2
-rw-r--r--  doc/architecture/blueprints/runner_tokens/index.md  15
-rw-r--r--  doc/architecture/blueprints/secret_detection/decisions/001_use_ruby_push_check_approach_within_monolith.md  32
-rw-r--r--  doc/architecture/blueprints/secret_detection/index.md  339
-rw-r--r--  doc/architecture/blueprints/secret_manager/secrets-manager-overview.png  bin 419952 -> 119870 bytes
72 files changed, 2122 insertions, 714 deletions
diff --git a/doc/architecture/blueprints/_template.md b/doc/architecture/blueprints/_template.md
index 18f88322906..8577054db83 100644
--- a/doc/architecture/blueprints/_template.md
+++ b/doc/architecture/blueprints/_template.md
@@ -13,7 +13,7 @@ Before you start:
- Copy this file to a sub-directory and call it `index.md` for it to appear in
the blueprint directory.
-- Please remove comment blocks for sections you've filled in.
+- Remove comment blocks for sections you've filled in.
When your blueprint ready for review, all of these comment blocks should be
removed.
diff --git a/doc/architecture/blueprints/ai_gateway/index.md b/doc/architecture/blueprints/ai_gateway/index.md
index 8c5a13d2e76..c09f8aaa621 100644
--- a/doc/architecture/blueprints/ai_gateway/index.md
+++ b/doc/architecture/blueprints/ai_gateway/index.md
@@ -77,19 +77,25 @@ secret redaction at this level of the stack as well as in GitLab-rails.
#### Protocol
-We're choosing to use a simple JSON API for the AI-gateway
-service. This allows us to re-use a lot of what is already in place in
-the current model-gateway. It also allows us to make the endpoints
-version agnostic. We could have an API that expects only a rudimentary
-envelope that can contain dynamic information. We should make sure
-that we make these APIs compatible with multiple versions of GitLab,
-or other clients that use the gateway through GitLab. **This means
-that all client versions talk to the same API endpoint, the AI-gateway
-needs to support this, but we don't need to support different
-endpoints per version**.
-
-We also considered gRPC as a the protocol for communication between
-GitLab instances, they differ on these items:
+The communication between the AI-Gateway service and its clients (including the GitLab Rails application) shall use a JSON-based API.
+
+The AI-Gateway API shall expose single-purpose endpoints responsible for providing access to different AI features. [A later section](#single-purpose-endpoints) of this document provides detailed guidelines for building specific endpoints.
+
+The AI Gateway communication protocol shall only expect a rudimentary envelope that wraps all feature-specific dynamic information. The proposed architecture of the protocol allows the API endpoints to be version agnostic, and makes the AI-Gateway APIs compatible with multiple versions of GitLab (or other clients that use the gateway through GitLab).
+
+**This means that all clients, regardless of their version, use the same set of AI-Gateway API feature endpoints. The AI-Gateway feature endpoints have to support different client versions, instead of creating multiple feature endpoints per supported client version.**
+
+We can, however, add a version to the path in case we do want to evolve
+a certain endpoint. It's not expected that we'll need to do this
+often, but having a version in the path keeps the option open. The
+benefit of this is that individual GitLab milestone releases will
+continue pointing to the endpoint version it was tested against at the
+time of release, while allowing us to iterate quickly by introducing
+new endpoint versions.
+
+We also considered gRPC as a protocol for communication between
+GitLab instances and the AI-gateway. JSON API and gRPC differ on these items:
| gRPC | REST + JSON |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
@@ -138,37 +144,80 @@ one of the prompt payloads is no longer supported by the AI
gateway. It allows us to potentially avoid breaking features in older
GitLab installations as the AI landscape changes.
-#### Cross version compatibility
-
-When building single purpose endpoints, we should be mindful that
-these endpoints will be used by different GitLab instances indirectly
-by external clients. To achieve this, we have a very simple envelope
-to provide information. It has to have a series of `prompt_components`
-that contain information the AI-gateway can use to build prompts and
-query the model of it selects.
-
-Each prompt component contains 3 elements:
-
-- `type`: This is the kind of information that is being presented in
- `payload`. The AI-gateway should ignore any types it does not know
- about.
-- `payload`: The actual information that can be used by the AI-gateway
- to build the payload that is going to go out to AI providers. The
- payload will be different depending on the type, and the version of
- the client providing the payload. This means that the AI-gateway
- needs to consider all fields optional.
-- `metadata`: Information about the client that built this part of the
- prompt. This may, or may not be used by GitLab for
- telemetry. Nothing inside this field should be required.
-
-The only component in there that is likely to change often is the
+### The AI-Gateway API protocol
+
+It is important to build each single-purpose endpoint in a version-agnostic way, so it can be used by different GitLab instances (and indirectly by external clients). To achieve this goal:
+
+**The AI-Gateway protocol shall rely on a simple JSON envelope wrapping all feature-specific information.** The AI-Gateway protocol can be seen as a transport layer protocol from [the OSI model](https://en.wikipedia.org/wiki/OSI_model) (for example, TCP or UDP), which defines how to transport information between nodes without being aware of what information is being transported.
+
+The AI-Gateway protocol does not specify which information received by a single-purpose endpoint should be processed, or in which way. This gives each endpoint the freedom to decide whether it uses or ignores the data coming from each protocol envelope.
+
+The AI-Gateway protocol defines each request in the following way:
+
+1. Each single-purpose endpoint shall accept requests containing a single JSON object with a single key: `prompt_components`.
+1. The `prompt_components` key shall contain an array of JSON envelopes that are built according to the following rules:
+
+Each JSON envelope contains 3 elements:
+
+1. `type`: A string identifier specifying the type of information that is being presented in the envelope's
+ `payload`. The AI-gateway single-purpose endpoint may ignore any types it does not know about.
+1. `payload`: The actual information that can be used by the AI-Gateway single-purpose endpoint to send requests to third-party AI service providers. The data inside the `payload` element can differ depending on the `type`, and the version of
+ the client providing the `payload`. This means that the AI-Gateway
+ single-purpose endpoint must consider the structure and the type of data present inside the `payload` optional, and gracefully handle missing or malformed information.
+1. `metadata`: This field contains information about the client that built this `prompt_components` envelope. Information from the `metadata` field may or may not be used by GitLab for
+ telemetry. As with `payload`, all fields inside `metadata` shall be considered optional.
+
+The only envelope field that is likely to change often is the
`payload` one. There we need to make sure that all fields are
-optional, and avoid renaming, removing or repurposing fields.
+optional and avoid renaming, removing, or repurposing fields.
-When this is needed, we need to build support for the old versions of
-a field in the gateway, and keep them around for at least 2 major
-versions of GitLab. For example, we could consider adding 2 versions
-of a prompt to the `prompt_components` payload:
+To document and validate the content of `payload` we can specify their
+format using [JSON-schema](https://json-schema.org/).
+
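As a sketch of what such a schema-driven check could look like, here is a dependency-free validator. The schema below is illustrative only (it is not the real AI-gateway schema); the field names are taken from the examples in this document, and, per the protocol, every field stays optional:

```python
import json

# Hypothetical JSON-schema-like description of a "prompt" payload.
# No "required" list: the protocol mandates that all fields stay optional.
PROMPT_PAYLOAD_SCHEMA = {
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "provider": {"type": "string"},
        "model": {"type": "string"},
        "params": {"type": "object"},
    },
}

TYPE_CHECKS = {"object": dict, "string": str}

def conforms(payload: dict, schema: dict) -> bool:
    """Minimal validator: known fields must match their declared type;
    unknown or missing fields are tolerated, as the protocol requires."""
    for key, value in payload.items():
        prop = schema["properties"].get(key)
        if prop and not isinstance(value, TYPE_CHECKS[prop["type"]]):
            return False
    return True

payload = json.loads('{"content": "...", "model": "code-gecko", "provider": "vertex-ai"}')
print(conforms(payload, PROMPT_PAYLOAD_SCHEMA))  # True: all fields optional, types match
```

In practice a real JSON-schema library would replace the hand-rolled `conforms` helper; the point is only that validation rejects wrong types without ever requiring a field to be present.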
+An example request that follows the AI-Gateway protocol looks like this:
+
+```json
+{
+ "prompt_components": [
+ {
+ "type": "prompt",
+ "metadata": {
+ "source": "GitLab EE",
+ "version": "16.7.0-pre"
+ },
+ "payload": {
+ "content": "...",
+ "params": {
+ "temperature": 0.2,
+ "maxOutputTokens": 1024
+ },
+ "model": "code-gecko",
+ "provider": "vertex-ai"
+ }
+ },
+ {
+ "type": "editor_content",
+ "metadata": {
+ "source": "vscode",
+ "version": "1.1.1"
+ },
+ "payload": {
+ "filename": "application.rb",
+ "before_cursor": "require 'active_record/railtie'",
+ "after_cursor": "\nrequire 'action_controller/railtie'",
+ "open_files": [
+ {
+ "filename": "app/controllers/application_controller.rb",
+ "content": "class ApplicationController < ActionController::Base..."
+ }
+ ]
+ }
+ }
+ ]
+}
+```
+
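A minimal sketch of how a single-purpose endpoint could consume such a request while honoring the protocol rules above (skip unknown `type`s, treat `payload` contents as optional and possibly malformed). The function name and the set of known types are illustrative, not actual AI-gateway code:

```python
KNOWN_TYPES = {"prompt", "editor_content"}

def collect_components(request: dict) -> list:
    """Gather the envelopes this endpoint knows how to use,
    skipping unknown types instead of failing the whole request."""
    collected = []
    for envelope in request.get("prompt_components", []):
        if envelope.get("type") not in KNOWN_TYPES:
            continue  # forward compatibility: ignore unknown envelope types
        payload = envelope.get("payload")
        if not isinstance(payload, dict):
            continue  # malformed payloads are tolerated, not fatal
        collected.append((envelope["type"], payload))
    return collected

request = {
    "prompt_components": [
        {"type": "prompt", "payload": {"content": "...", "model": "code-gecko"}},
        {"type": "telemetry_v9", "payload": {"future": "field"}},  # unknown type
    ],
}
print(collect_components(request))  # only the "prompt" envelope survives
```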
+Another example use case passes 2 versions of a prompt in the `prompt_components` payload, where each version is tailored to a different third-party AI model provider:
```json
{
@@ -177,7 +226,7 @@ of a prompt to the `prompt_components` payload:
"type": "prompt",
"metadata": {
"source": "GitLab EE",
- "version": "16.3",
+ "version": "16.7.0-pre"
},
"payload": {
"content": "You can fetch information about a resource called an issue...",
@@ -193,7 +242,7 @@ of a prompt to the `prompt_components` payload:
"type": "prompt",
"metadata": {
"source": "GitLab EE",
- "version": "16.3",
+ "version": "16.7.0-pre"
},
"payload": {
"content": "System: You can fetch information about a resource called an issue...\n\nHuman:",
@@ -209,18 +258,20 @@ of a prompt to the `prompt_components` payload:
}
```
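A request shaped like the one above lets the gateway direct the prompt to either provider, based on what is in the payload. A hedged sketch of that selection logic (the helper name and data are illustrative, not gateway code):

```python
def prompt_for_provider(components: list, provider: str):
    """Return the first "prompt" payload targeting the given provider, if any."""
    for envelope in components:
        payload = envelope.get("payload", {})
        if envelope.get("type") == "prompt" and payload.get("provider") == provider:
            return payload
    return None  # no prompt for this provider; the endpoint decides the fallback

components = [
    {"type": "prompt", "payload": {"provider": "vertex-ai", "content": "..."}},
    {"type": "prompt", "payload": {"provider": "anthropic",
                                   "content": "System: ...\n\nHuman:"}},
]
selected = prompt_for_provider(components, "anthropic")
print(selected["content"].startswith("System:"))  # True
```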
-Allowing the API to direct the prompt to either provider, based on
-what is in the payload.
+#### Cross-version compatibility
-To document and validate the content of `payload` we can specify their
-format using [JSON-schema](https://json-schema.org/).
+**When renaming, removing, or repurposing fields inside `payload` is needed, a single-purpose endpoint that uses the affected envelope type must build support for the old versions of
+a field in the gateway, and keep them around for at least 2 major
+versions of GitLab.**
+
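For illustration, consider a hypothetical rename of a `payload` field from `content` to `text` (this rename is invented for the example, not taken from the real gateway). The endpoint would keep reading both spellings for at least 2 major GitLab versions:

```python
def extract_text(payload: dict):
    """Accept the new field name but fall back to the legacy one.
    Hypothetical rename: "content" (old) -> "text" (new)."""
    if "text" in payload:
        return payload["text"]
    # Legacy clients: supported for at least 2 major versions of GitLab
    return payload.get("content")

print(extract_text({"text": "new"}), extract_text({"content": "old"}))  # new old
```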
+A good practice that might help support backwards compatibility is to provide building blocks for the prompt inside `prompt_components`, rather than a complete prompt. By moving the responsibility for compiling the prompt out of these building blocks onto the AI-Gateway, one can achieve more flexibility for future prompt adjustments.
#### Example feature: Code Suggestions
For example, a rough Code Suggestions service could look like this:
```plaintext
-POST /internal/code-suggestions/completions
+POST /v3/code/completions
```
```json
@@ -230,7 +281,7 @@ POST /internal/code-suggestions/completions
"type": "prompt",
"metadata": {
"source": "GitLab EE",
- "version": "16.3",
+ "version": "16.7.0-pre"
},
"payload": {
"content": "...",
diff --git a/doc/architecture/blueprints/capacity_planning/index.md b/doc/architecture/blueprints/capacity_planning/index.md
index ed014f545f9..31740d50368 100644
--- a/doc/architecture/blueprints/capacity_planning/index.md
+++ b/doc/architecture/blueprints/capacity_planning/index.md
@@ -13,21 +13,27 @@ approvers: [ "@swiskow", "@rnienaber", "@o-lluch" ]
## Summary
-This document outlines how we plan to set up infrastructure capacity planning for GitLab Dedicated tenant environments, which is a [FY24-Q3 OKR](https://gitlab.com/gitlab-com/gitlab-OKRs/-/work_items/3507).
+This document outlines how we plan to set up infrastructure capacity planning for GitLab Dedicated tenant environments, which started as a [FY24-Q3 OKR](https://gitlab.com/gitlab-com/gitlab-OKRs/-/work_items/3507).
-We make use of Tamland, a tool we built to provide saturation forecasting insights for GitLab.com infrastructure resources. We propose to include Tamland as a part of the GitLab Dedicated stack and execute forecasting from within the tenant environments.
+We make use of [Tamland](https://gitlab.com/gitlab-com/gl-infra/tamland), a tool we built to provide saturation forecasting insights for GitLab.com infrastructure resources.
+We propose to include Tamland as a part of the GitLab Dedicated stack and execute forecasting from within the tenant environments.
-Tamland predicts SLO violations and their respective dates, which need to be reviewed and acted upon. In terms of team organisation, the Dedicated team is proposed to own the tenant-side setup for Tamland and to own the predicted SLO violations, with the help and guidance of the Scalability::Projections team, which drives further development, documentation and overall guidance for capacity planning, including for Dedicated.
+Tamland predicts SLO violations and their respective dates, which need to be reviewed and acted upon.
+In terms of team organisation, we propose that the Dedicated team owns the tenant-side setup for Tamland and the predicted SLO violations, with the help and guidance of the Scalability::Projections team, which drives further development, documentation and overall guidance for capacity planning, including for Dedicated.
-With this setup, we aim to turn Tamland into a more generic tool, which can be used in various environments including but not limited to Dedicated tenants. Long-term, we think of including Tamland in self-managed installations and think of Tamland as a candidate for open source release.
+With this setup, we aim to turn Tamland into a more generic tool, which can be used in various environments including but not limited to Dedicated tenants.
+Long-term, we think of including Tamland in self-managed installations and think of Tamland as a candidate for open source release.
## Motivation
### Background: Capacity planning for GitLab.com
-[Tamland](https://gitlab.com/gitlab-com/gl-infra/tamland) is an infrastructure resource forecasting project owned by the [Scalability::Projections](https://about.gitlab.com/handbook/engineering/infrastructure/team/scalability/projections.html) group. It implements [capacity planning](https://about.gitlab.com/handbook/engineering/infrastructure/capacity-planning/) for GitLab.com, which is a [controlled activity covered by SOC 2](https://gitlab.com/gitlab-com/gl-security/security-assurance/security-compliance-commercial-and-dedicated/observation-management/-/issues/604). As of today, it is used exclusively for GitLab.com to predict upcoming SLO violations across hundreds of monitored infrastructure components.
+[Tamland](https://gitlab.com/gitlab-com/gl-infra/tamland) is an infrastructure resource forecasting project owned by the [Scalability::Observability](https://about.gitlab.com/handbook/engineering/infrastructure/team/scalability/#scalabilityobservability) group.
+It implements [capacity planning](https://about.gitlab.com/handbook/engineering/infrastructure/capacity-planning/) for GitLab.com, which is a [controlled activity covered by SOC 2](https://gitlab.com/gitlab-com/gl-security/security-assurance/security-compliance-commercial-and-dedicated/observation-management/-/issues/604).
+As of today, it is used exclusively for GitLab.com to predict upcoming SLO violations across hundreds of monitored infrastructure components.
-Tamland produces a [report](https://gitlab-com.gitlab.io/gl-infra/tamland/intro.html) (hosted on GitLab Pages) containing forecast plots, information around predicted violations and other information around the components monitored. Any predicted SLO violation result in a capacity warning issue being created in the [issue tracker for capacity planning](https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/boards/2816983) on GitLab.com.
+Tamland produces a [report](https://gitlab-com.gitlab.io/gl-infra/tamland/intro.html) (hosted on GitLab Pages) containing forecast plots, information around predicted violations and other information around the components monitored.
+Any predicted SLO violation results in a capacity warning issue being created in the [issue tracker for capacity planning](https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/boards/2816983) on GitLab.com.
At present, Tamland is quite tailor made and specific for GitLab.com:
@@ -36,21 +42,28 @@ At present, Tamland is quite tailor made and specific for GitLab.com:
[Turning Tamland into a tool](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/1106) we can use more generically and making it independent of GitLab.com specifics is subject of ongoing work.
-For illustration, we can see a saturation forecast plot below for the `disk_space` resource for a PostgreSQL service called `patroni-ci`. Within the 90 days forecast horizon, we predict a violation of the `soft` SLO (set at 85% saturation) and this resulted in the creation of a [capacity planning issue](https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/1219) for further review and potential actions. At present, the Scalability::Projections group reviews those issues and engages with the respective DRI for the service in question to remedy a saturation concern.
+For illustration, we can see a saturation forecast plot below for the `disk_space` resource for a PostgreSQL service called `patroni-ci`.
+Within the 90 days forecast horizon, we predict a violation of the `soft` SLO (set at 85% saturation) and this resulted in the creation of a [capacity planning issue](https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/1219) for further review and potential actions.
+At present, the Scalability::Projections group reviews those issues and engages with the respective DRI for the service in question to remedy a saturation concern.
<img src="images/image-20230911144743188.png" alt="image-20230911144743188" style="zoom:67%;" />
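To make the thresholding concrete, the check described above can be sketched as follows, with made-up forecast numbers rather than real Tamland output:

```python
from datetime import date, timedelta

SOFT_SLO = 0.85  # the soft SLO of 85% saturation from the example above

def first_violation(start: date, forecast: list):
    """Return the first date whose forecast saturation reaches the soft SLO,
    or None if no violation occurs inside the forecast horizon."""
    for day, saturation in enumerate(forecast):
        if saturation >= SOFT_SLO:
            return start + timedelta(days=day)
    return None

# Hypothetical 90-day forecast ramping from 70% saturation upwards
forecast = [0.70 + 0.0026 * day for day in range(90)]
print(first_violation(date(2023, 9, 11), forecast))
```

A real run would feed Prometheus-derived forecast series into this kind of check; only the violation date and the threshold crossing matter for opening a capacity planning issue.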
-For GitLab.com capacity planning, we operate Tamland from a scheduled CI pipeline with access to the central Thanos, which provides saturation and utilization metrics for GitLab.com. The CI pipeline produces the desired report, exposes it on GitLab Pages and also creates capacity planning issues. Scalability::Projections runs a capacity planning triage rotation which entails reviewing and prioritizing any open issues and their respective saturation concerns.
+For GitLab.com capacity planning, we operate Tamland from a scheduled CI pipeline with access to the central Thanos, which provides saturation and utilization metrics for GitLab.com.
+The CI pipeline produces the desired report, exposes it on GitLab Pages and also creates capacity planning issues.
+Scalability::Projections runs a capacity planning triage rotation which entails reviewing and prioritizing any open issues and their respective saturation concerns.
### Problem Statement
-With the number of [GitLab Dedicated](https://about.gitlab.com/dedicated/) deployments increasing, we need to establish capacity planning processes for Dedicated tenants. This is going to help us notice any pending resource constraints soon enough to be able to upgrade the infrastructure for a given tenant before the resource saturates and causes an incident.
+With the number of [GitLab Dedicated](https://about.gitlab.com/dedicated/) deployments increasing, we need to establish capacity planning processes for Dedicated tenants.
+This is going to help us notice any pending resource constraints soon enough to be able to upgrade the infrastructure for a given tenant before the resource saturates and causes an incident.
-Each Dedicated tenant is an isolated GitLab environment, with a full set of metrics monitored. These metrics are standardized in the [metrics catalog](https://gitlab.com/gitlab-com/runbooks/-/blob/master/reference-architectures/get-hybrid/src/gitlab-metrics-config.libsonnet?ref_type=heads) and on top of these, we have defined saturation metrics along with respective SLOs.
+Each Dedicated tenant is an isolated GitLab environment, with a full set of metrics monitored.
+These metrics are standardized in the [metrics catalog](https://gitlab.com/gitlab-com/runbooks/-/blob/master/reference-architectures/get-hybrid/src/gitlab-metrics-config.libsonnet?ref_type=heads) and on top of these, we have defined saturation metrics along with respective SLOs.
In order to provide capacity planning and forecasts for saturation metrics for each tenant, we'd like to get Tamland set up for GitLab Dedicated.
-While Tamland is developed by the Scalability::Projections and this team also owns the capacity planning process for GitLab.com, they don't have access to any of the Dedicated infrastructure as we have strong isolation implemented for Dedicated environments. As such, the technical design choices are going to affect how those teams interact and vice versa. We include this consideration into this documentation as we think the organisational aspect is a crucial part of it.
+While Tamland is developed by the Scalability::Projections group, and this team also owns the capacity planning process for GitLab.com, they don't have access to any of the Dedicated infrastructure, as we have strong isolation implemented for Dedicated environments.
+As such, the technical design choices are going to affect how those teams interact, and vice versa. We include this consideration in this document as we think the organisational aspect is a crucial part of it.
### Key questions
@@ -70,25 +83,34 @@ While Tamland is developed by the Scalability::Projections and this team also ow
##### Reporting
-As of today, it's not quite clear yet how we'd like to consume forecasting data across tenants. In contrast to GitLab.com, we generate forecasts across a potentially large number of tenants. At this point, we suspect that we're more interested in an aggregate report across tenants rather than individual, very detailed saturation forecasts. As such, this is subject to refinement in a further iteration once we have the underlying data available and gathered practical insight in how we consume this information.
+As of today, it's not quite clear yet how we'd like to consume forecasting data across tenants.
+In contrast to GitLab.com, we generate forecasts across a potentially large number of tenants.
+At this point, we suspect that we're more interested in an aggregate report across tenants rather than individual, very detailed saturation forecasts.
+As such, this is subject to refinement in a further iteration once we have the underlying data available and gathered practical insight in how we consume this information.
##### Issue management
-While each predicted SLO violation results in the creation of a GitLab issue, this may not be the right mode of raising awareness for Dedicated. Similar to the reporting side, this is subject to further discussion once we have data to look at.
+While each predicted SLO violation results in the creation of a GitLab issue, this may not be the right mode of raising awareness for Dedicated.
+Similar to the reporting side, this is subject to further discussion once we have data to look at.
##### Customizing forecasting models
-Forecasting models can and should be tuned and informed with domain knowledge to produce accurate forecasts. This information is a part of the Tamland manifest. In the first iteration, we don't support per-tenant customization, but this can be added later.
+Forecasting models can and should be tuned and informed with domain knowledge to produce accurate forecasts.
+This information is a part of the Tamland manifest.
+In the first iteration, we don't support per-tenant customization, but this can be added later.
## Proposed Design for Dedicated: A part of the Dedicated stack
-Dedicated environments are fully isolated and run their own Prometheus instance to capture metrics, including saturation metrics. Tamland will run from each individual Dedicated tenant environment, consume metrics from Prometheus and store the resulting data in S3. From there, we consume forecast data and act on it.
+Dedicated environments are fully isolated and run their own Prometheus instance to capture metrics, including saturation metrics.
+Tamland will run from each individual Dedicated tenant environment, consume metrics from Prometheus and store the resulting data in S3.
+From there, we consume forecast data and act on it.
![tamland-as-part-of-stack](images/tamland-as-part-of-stack.png)
### Storage for output and cache
-Any data Tamland relies on is stored in a S3 bucket. We use one bucket per tenant to clearly separate data between tenants.
+Any data Tamland relies on is stored in an S3 bucket.
+We use one bucket per tenant to clearly separate data between tenants.
1. Resulting forecast data and other outputs
1. Tamland's internal cache for Prometheus metrics data
@@ -97,9 +119,11 @@ There is no need for a persistent state across Tamland runs aside from the S3 bu
### Benefits of executing inside tenant environments
-Each Tamland run for a single environment (tenant) can take a few hours to execute. With the number of tenants expected to increase significantly, we need to consider scaling the execution environment for Tamland.
+Each Tamland run for a single environment (tenant) can take a few hours to execute.
+With the number of tenants expected to increase significantly, we need to consider scaling the execution environment for Tamland.
-In this design, Tamland becomes a part of the Dedicated stack and a component of the individual tenant environment. As such, scaling the execution environment for Tamland is solved by design, because tenant forecasts execute inherently parallel in their respective environments.
+In this design, Tamland becomes a part of the Dedicated stack and a component of the individual tenant environment.
+As such, scaling the execution environment for Tamland is solved by design, because tenant forecasts execute inherently parallel in their respective environments.
### Distribution model: Docker
@@ -107,15 +131,18 @@ Tamland is released as a Docker image, see [Tamland's README](https://gitlab.com
### Tamland manifest
-The manifest contains information about which saturation metrics to forecast on (see this [manifest example](https://gitlab.com/gitlab-com/gl-infra/tamland/-/blob/62854e1afbc2ed3160a55a738ea587e0cf7f994f/saturation.json) for GitLab.com). This will be generated from the metrics catalog and will be the same for all tenants for starters.
+The manifest contains information about which saturation metrics to forecast on (see this [manifest example](https://gitlab.com/gitlab-com/gl-infra/tamland/-/blob/62854e1afbc2ed3160a55a738ea587e0cf7f994f/saturation.json) for GitLab.com).
+This will be generated from the metrics catalog and will initially be the same for all tenants.
-In order to generate the manifest from the metrics catalog, we setup dedicated GitLab project `tamland-dedicated` . On a regular basis, a scheduled pipeline grabs the metrics catalog, generates the JSON manifest from it and commits this to the project.
+To generate the manifest from the metrics catalog, we set up a dedicated GitLab project, `tamland-dedicated`.
+On a regular basis, a scheduled pipeline fetches the metrics catalog, generates the JSON manifest from it, and commits the result to the project.
On the Dedicated tenants, we download the latest version of the committed JSON manifest from `tamland-dedicated` and use this as input to execute Tamland.
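+A minimal sketch of such a scheduled pipeline job is shown below. The job name, the helper script, and the `PUSH_TOKEN` variable are illustrative assumptions, not the actual `tamland-dedicated` configuration:
+
+```yaml
+# .gitlab-ci.yml -- runs only when triggered by the pipeline schedule
+generate-manifest:
+  rules:
+    - if: $CI_PIPELINE_SOURCE == "schedule"
+  script:
+    # Hypothetical helper that renders the metrics catalog into saturation.json
+    - ./scripts/generate-manifest --output saturation.json
+    - git add saturation.json
+    - git commit -m "Update Tamland manifest" || echo "No changes to commit"
+    # Assumes a project access token is available as PUSH_TOKEN
+    - git push "https://ci:${PUSH_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git" HEAD:main
+```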
### Acting on forecast insights
-When Tamland forecast data is available for a tenant, the Dedicated teams consume this data and act on it accordingly. The Scalability::Projections group is going to support and guide this process to get started and help interpret data, along with implementing Tamland features required to streamline this process for Dedicated in further iterations.
+When Tamland forecast data is available for a tenant, the Dedicated teams consume this data and act on it accordingly.
+The Scalability::Observability group will support and guide this process to get started and help interpret the data, along with implementing Tamland features required to streamline this process for Dedicated in further iterations.
## Alternative Solution
@@ -125,11 +152,14 @@ An alternative design, we don't consider an option at this point, is to setup Ta
![tamland-as-a-service](images/tamland-as-a-service.png)
-In this design, a central Prometheus/Thanos instance is needed to provide the metrics data for Tamland. Dedicated tenants use remote-write to push their Prometheus data to the central Thanos instance.
+In this design, a central Prometheus/Thanos instance is needed to provide the metrics data for Tamland.
+Dedicated tenants use remote-write to push their Prometheus data to the central Thanos instance.
-Tamland is set up to run on a regular basis and consume metrics data from the single Thanos instance. It stores its results and cache in S3, similar to the other design.
+Tamland is set up to run on a regular basis and consume metrics data from the single Thanos instance.
+It stores its results and cache in S3, similar to the other design.
-In order to execute forecasts regularly, we need to provide an execution environment to run Tamland in. With an increasing number of tenants, we'd need to scale up resources for this cluster.
+In order to execute forecasts regularly, we need to provide an execution environment to run Tamland in.
+With an increasing number of tenants, we'd need to scale up resources for this cluster.
This design **has not been chosen** because of both technical and organisational concerns:
diff --git a/doc/architecture/blueprints/cells/cells-feature-admin-area.md b/doc/architecture/blueprints/cells/cells-feature-admin-area.md
deleted file mode 100644
index 3f23e56c3af..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-admin-area.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/admin-area.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/admin-area.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md b/doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md
deleted file mode 100644
index 050b3a922b1..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/agent-for-kubernetes.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/agent-for-kubernetes.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-backups.md b/doc/architecture/blueprints/cells/cells-feature-backups.md
deleted file mode 100644
index a0c38171ce6..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-backups.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/backups.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/backups.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-ci-runners.md b/doc/architecture/blueprints/cells/cells-feature-ci-runners.md
deleted file mode 100644
index a14f2a47237..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-ci-runners.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/ci-runners.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/ci-runners.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-container-registry.md b/doc/architecture/blueprints/cells/cells-feature-container-registry.md
deleted file mode 100644
index d9ff6da7f62..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-container-registry.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/container-registry.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/container-registry.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-contributions-forks.md b/doc/architecture/blueprints/cells/cells-feature-contributions-forks.md
deleted file mode 100644
index a87e4ba3391..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-contributions-forks.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/contributions-forks.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/contributions-forks.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-data-migration.md b/doc/architecture/blueprints/cells/cells-feature-data-migration.md
deleted file mode 100644
index 5638bb29dc5..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-data-migration.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/data-migration.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/data-migration.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-database-sequences.md b/doc/architecture/blueprints/cells/cells-feature-database-sequences.md
deleted file mode 100644
index 9b426ed80a4..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-database-sequences.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/database-sequences.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/database-sequences.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-explore.md b/doc/architecture/blueprints/cells/cells-feature-explore.md
deleted file mode 100644
index 95924e3d1e8..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-explore.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/explore.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/explore.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-git-access.md b/doc/architecture/blueprints/cells/cells-feature-git-access.md
deleted file mode 100644
index 18fc2b61b1f..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-git-access.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/git-access.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/git-access.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md b/doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md
deleted file mode 100644
index 964423334c1..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/gitlab-pages.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/gitlab-pages.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-global-search.md b/doc/architecture/blueprints/cells/cells-feature-global-search.md
deleted file mode 100644
index 0a2a89b2d45..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-global-search.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/global-search.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/global-search.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-graphql.md b/doc/architecture/blueprints/cells/cells-feature-graphql.md
deleted file mode 100644
index 69ce2128484..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-graphql.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/graphql.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/graphql.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-organizations.md b/doc/architecture/blueprints/cells/cells-feature-organizations.md
deleted file mode 100644
index 6b589307404..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-organizations.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/organizations.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/organizations.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-personal-access-tokens.md b/doc/architecture/blueprints/cells/cells-feature-personal-access-tokens.md
deleted file mode 100644
index 115af11e3a6..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-personal-access-tokens.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/personal-access-tokens.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/personal-access-tokens.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md b/doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md
deleted file mode 100644
index 6d5ec0c9dd6..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/personal-namespaces.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/personal-namespaces.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md b/doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md
deleted file mode 100644
index 0143ac6ffd9..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/router-endpoints-classification.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/router-endpoints-classification.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-schema-changes.md b/doc/architecture/blueprints/cells/cells-feature-schema-changes.md
deleted file mode 100644
index bf78a4eae41..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-schema-changes.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/schema-changes.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/schema-changes.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-secrets.md b/doc/architecture/blueprints/cells/cells-feature-secrets.md
deleted file mode 100644
index 1c4c79d96fc..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-secrets.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/secrets.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/secrets.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-snippets.md b/doc/architecture/blueprints/cells/cells-feature-snippets.md
deleted file mode 100644
index 2963bbdec2c..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-snippets.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/snippets.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/snippets.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-template.md b/doc/architecture/blueprints/cells/cells-feature-template.md
deleted file mode 100644
index c75cc88f46c..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-template.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/template.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/template.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-uploads.md b/doc/architecture/blueprints/cells/cells-feature-uploads.md
deleted file mode 100644
index eab7a8a4fcd..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-uploads.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/uploads.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/uploads.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-user-profile.md b/doc/architecture/blueprints/cells/cells-feature-user-profile.md
deleted file mode 100644
index 73f312f3762..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-user-profile.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/user-profile.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/user-profile.md).
diff --git a/doc/architecture/blueprints/cells/cells-feature-your-work.md b/doc/architecture/blueprints/cells/cells-feature-your-work.md
deleted file mode 100644
index 344037f2a76..00000000000
--- a/doc/architecture/blueprints/cells/cells-feature-your-work.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'impacted_features/your-work.md'
-remove_date: '2023-11-17'
----
-
-This document was moved to [another location](impacted_features/your-work.md).
diff --git a/doc/architecture/blueprints/cells/deployment-architecture.md b/doc/architecture/blueprints/cells/deployment-architecture.md
index 57dabd447b4..1ec8461b138 100644
--- a/doc/architecture/blueprints/cells/deployment-architecture.md
+++ b/doc/architecture/blueprints/cells/deployment-architecture.md
@@ -96,7 +96,7 @@ The differences compared to [Initial Cells deployment](#3-initial-cells-deployme
The differences compared to [Hybrid Cells deployment](#4-hybrid-cells-deployment---initial-complete-cells-architecture) are:
-- The Routing Service is expanded to support [GitLab Pages](../../../user/project/pages/index.md) and [GitLab Container Registry](../../../user/packages/container_registry/index.md).
+- The Routing Service is expanded to support [GitLab Pages](../../../user/project/pages/index.md) and [GitLab container registry](../../../user/packages/container_registry/index.md).
- Each Cell has all services isolated.
- It is allowed that some Cells will follow a [hybrid architecture](#4-hybrid-cells-deployment---initial-complete-cells-architecture).
@@ -134,7 +134,7 @@ As per the architecture, the above services are required to be run Cell-local:
| Service | | Uses | Migrate from cluster-wide to Cell | Description |
| ------------------- | --------------- | ------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **GitLab Pages** | GitLab-built | Routing Service, Rails API | No problem | Serving CI generated pages under `.gitlab.io` or custom domains |
-| **GitLab Registry** | GitLab-built | Object Storage, PostgreSQL | Non-trivial data migration in case of split | Service to provide GitLab Container Registry |
+| **GitLab Registry** | GitLab-built | Object Storage, PostgreSQL | Non-trivial data migration in case of split | Service to provide GitLab container registry |
| **Gitaly Cluster** | GitLab-built | Disk storage, PostgreSQL | No problem: Built-in migration routines to balance Gitaly nodes | Gitaly holds Git repository data. Many Gitaly clusters can be configured in application. |
| **Elasticsearch** | Managed service | Many nodes required by sharding | Time consuming: Rebuild cluster from scratch | Search across all projects |
| **Object Storage** | Managed service | | Not straightforward: Rather hard to selectively migrate between buckets | Holds all user and CI uploaded files that is served by GitLab |
diff --git a/doc/architecture/blueprints/cells/glossary.md b/doc/architecture/blueprints/cells/glossary.md
deleted file mode 100644
index 69824663867..00000000000
--- a/doc/architecture/blueprints/cells/glossary.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-redirect_to: 'goals.md#glossary'
-remove_date: '2023-11-24'
----
-
-This document was moved to [another location](goals.md#glossary).
diff --git a/doc/architecture/blueprints/cells/impact.md b/doc/architecture/blueprints/cells/impact.md
deleted file mode 100644
index 1f77b9056be..00000000000
--- a/doc/architecture/blueprints/cells/impact.md
+++ /dev/null
@@ -1,7 +0,0 @@
----
-redirect_to: 'index.md'
-remove_date: '2023-11-22'
----
-
-This document was removed due to being outdated.
-Go to [index page](index.md) for the most recent content.
diff --git a/doc/architecture/blueprints/cells/impacted_features/container-registry.md b/doc/architecture/blueprints/cells/impacted_features/container-registry.md
index ea15dd52d94..4c6ac36a2af 100644
--- a/doc/architecture/blueprints/cells/impacted_features/container-registry.md
+++ b/doc/architecture/blueprints/cells/impacted_features/container-registry.md
@@ -19,22 +19,22 @@ GitLab [Container Registry](../../../../user/packages/container_registry/index.m
## 1. Definition
-GitLab Container Registry is a complex service requiring usage of PostgreSQL, Redis and Object Storage dependencies.
-Right now there's undergoing work to introduce [Container Registry Metadata](../../container_registry_metadata_database/index.md) to optimize data storage and image retention policies of Container Registry.
+GitLab container registry is a complex service that depends on PostgreSQL, Redis, and Object Storage.
+Work is currently underway to introduce the [Container Registry Metadata](../../container_registry_metadata_database/index.md) database to optimize data storage and image retention policies of the container registry.
-GitLab Container Registry is serving as a container for stored data, but on its own does not authenticate `docker login`.
+GitLab container registry serves as a container for stored data, but on its own does not authenticate `docker login`.
The `docker login` is executed with user credentials (can be `personal access token`) or CI build credentials (ephemeral `ci_builds.token`).
Container Registry uses data deduplication.
It means that the same blob (image layer) that is shared between many Projects is stored only once.
Each layer is hashed by `sha256`.
-The `docker login` does request a JWT time-limited authentication token that is signed by GitLab, but validated by Container Registry service.
+The `docker login` requests a time-limited JWT authentication token that is signed by GitLab, but validated by the container registry service.
The JWT token does store all authorized scopes (`container repository images`) and operation types (`push` or `pull`).
A single JWT authentication token can have many authorized scopes.
-This allows Container Registry and client to mount existing blobs from other scopes.
+This allows the container registry and the client to mount existing blobs from other scopes.
GitLab responds only with authorized scopes.
-Then it is up to GitLab Container Registry to validate if the given operation can be performed.
+Then it is up to the GitLab container registry to validate whether the given operation can be performed.
The GitLab.com pages are always scoped to a Project.
Each Project can have many container registry images attached.
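+As an illustration, the `access` claim of such a JWT token might look as follows. The repository names are hypothetical; the shape follows the Docker registry token authentication specification, with one entry per authorized scope:
+
+```json
+{
+  "access": [
+    { "type": "repository", "name": "my-group/my-project", "actions": ["pull", "push"] },
+    { "type": "repository", "name": "other-group/base-image", "actions": ["pull"] }
+  ]
+}
+```
+
+A single token carrying several scopes like this is what lets the client mount an existing blob from a pull-only scope while pushing to another.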
@@ -88,24 +88,24 @@ curl \
## 3. Proposal
-### 3.1. Shard Container Registry separately to Cells architecture
+### 3.1. Shard container registry separately to Cells architecture
-Due to its extensive and in general highly scalable horizontal architecture it should be evaluated if the GitLab Container Registry should be run not in Cell, but in a Cluster and be scaled independently.
+Because of its extensive and generally highly scalable horizontal architecture, it should be evaluated whether the GitLab container registry should run not in a Cell, but in a Cluster, and be scaled independently.
This might be easier, but would definitely not offer the same amount of data isolation.
-### 3.2. Run Container Registry within a Cell
+### 3.2. Run container registry within a Cell
-It appears that except `/jwt/auth` which would likely have to be processed by Router (to decode `scope`) the Container Registry could be run as a local service of a Cell.
+It appears that, except for `/jwt/auth`, which would likely have to be processed by the Router (to decode `scope`), the container registry could be run as a local service of a Cell.
The actual data at least in case of GitLab.com is not forwarded via registry, but rather served directly from Object Storage / CDN.
Its design encodes container repository image in a URL that is easily routable.
-It appears that we could re-use the same stateless Router service in front of Container Registry to serve manifests and blobs redirect.
+It appears that we could re-use the same stateless Router service in front of the container registry to serve manifest and blob redirects.
The only downside is increased complexity of managing standalone registry for each Cell, but this might be desired approach.
## 4. Evaluation
-There do not seem to be any theoretical problems with running GitLab Container Registry in a Cell.
+There do not seem to be any theoretical problems with running the GitLab container registry in a Cell.
It seems that the service can be easily made routable to work well.
The practical complexities are around managing a complex service from an infrastructure side.
diff --git a/doc/architecture/blueprints/cells/index.md b/doc/architecture/blueprints/cells/index.md
index c9a03830a4a..3b800a54781 100644
--- a/doc/architecture/blueprints/cells/index.md
+++ b/doc/architecture/blueprints/cells/index.md
@@ -4,7 +4,7 @@ creation-date: "2022-09-07"
authors: [ "@ayufan", "@fzimmer", "@DylanGriffith", "@lohrc", "@tkuah" ]
coach: "@ayufan"
approvers: [ "@lohrc" ]
-owning-stage: "~devops::enablement"
+owning-stage: "~devops::data stores"
participating-stages: []
---
@@ -85,9 +85,8 @@ In some cases, a table with an ambiguous usage has to be broken down.
For example: `uploads` are used to store user avatars, as well as uploaded attachments for comments.
It would be expected that `uploads` is split into `uploads` (describing Group/Project-level attachments) and `global_uploads` (describing, for example, user avatars).
-Except for the initial 2-3 quarters this work is highly parallel.
It is expected that **group::tenant scale** will help other teams to fix their feature set to work with Cells.
-The first 2-3 quarters are required to define a general split of data and build the required tooling.
+The first 2-3 quarters are required to define a general split of data, and build the required tooling and development guidelines.
1. **Instance-wide settings are shared across cluster.**
@@ -97,17 +96,21 @@ The first 2-3 quarters are required to define a general split of data and build
The purpose is to make `users` cluster-wide.
+1. **User can create Organization.**
+
+ The purpose is to create Organizations that are isolated from each other.
+
1. **User can create Group.** ✓ ([demo](https://www.youtube.com/watch?v=LUyV0ncfdRs))
The purpose is to perform a targeted decomposition of `users` and `namespaces`, because `namespaces` will be stored locally in the Cell.
-1. **User can create Project.**
+1. **User can create Project.** ✓ ([demo](https://www.youtube.com/watch?v=Z-2W8MfDwuI))
The purpose is to perform a targeted decomposition of `users` and `projects`, because `projects` will be stored locally in the Cell.
-1. **User can create Organization on Cell 2.**
+1. **User can create Project with a README file.**
- The purpose is to create Organizations that are isolated from each other.
+ The purpose is to allow `users` to create README files in a project.
1. **User can change profile avatar that is shared in cluster.**
@@ -141,6 +144,28 @@ The first 2-3 quarters are required to define a general split of data and build
The purpose is to have many Organizations per Cell, but never have a single Organization spanning across many Cells. This is required to ensure that information shown within an Organization is isolated, and does not require fetching information from other Cells.
+#### Dependencies
+
+We have identified the following dependencies between the essential workflows.
+
+```mermaid
+flowchart TD
+ A[Create Organization] --> B[Create Group]
+ B --> C[Create Project]
+ L --> D[Create Issue]
+ E --> F[Push to Git repo]
+ E --> G[Create Merge Request]
+ E --> H[Create CI Pipeline]
+ G --> J[Merge when Pipeline Succeeds]
+ H --> J
+ J --> K[Issue gets closed by the reference in MR description]
+ D --> K
+ A --> L[Manage members]
+ B --> L
+ C --> L
+ L --> E[Create file in repository]
+```
+
### 3. Additional workflows
Some of these additional workflows might need to be supported, depending on the group decision.
@@ -157,43 +182,7 @@ This list is not exhaustive of work needed to be done.
### 4. Routing layer
-The routing layer is meant to offer a consistent user experience where all Cells are presented under a single domain (for example, `gitlab.com`), instead of having to navigate to separate domains.
-
-The user will be able to use `https://gitlab.com` to access Cell-enabled GitLab.
-Depending on the URL access, it will be transparently proxied to the correct Cell that can serve this particular information.
-For example:
-
-- All requests going to `https://gitlab.com/users/sign_in` are randomly distributed to all Cells.
-- All requests going to `https://gitlab.com/gitlab-org/gitlab/-/tree/master` are always directed to Cell 5, for example.
-- All requests going to `https://gitlab.com/my-username/my-project` are always directed to Cell 1.
-
-1. **Technology.**
-
- We decide what technology the routing service is written in.
- The choice is dependent on the best performing language, and the expected way and place of deployment of the routing layer.
- If it is required to make the service multi-cloud it might be required to deploy it to the CDN provider.
- Then the service needs to be written using a technology compatible with the CDN provider.
-
-1. **Cell discovery.**
-
- The routing service needs to be able to discover and monitor the health of all Cells.
-
-1. **User can use single domain to interact with many Cells.**
-
- The routing service will intelligently route all requests to Cells based on the resource being
- accessed versus the Cell containing the data.
-
-1. **Router endpoints classification.**
-
- The stateless routing service will fetch and cache information about endpoints from one of the Cells.
- We need to implement a protocol that will allow us to accurately describe the incoming request (its fingerprint), so it can be classified by one of the Cells, and the results of that can be cached.
- We also need to implement a mechanism for negative cache and cache eviction.
-
-1. **GraphQL and other ambiguous endpoints.**
-
- Most endpoints have a unique sharding key: the Organization, which directly or indirectly (via a Group or Project) can be used to classify endpoints.
- Some endpoints are ambiguous in their usage (they don't encode the sharding key), or the sharding key is stored deep in the payload.
- In these cases, we need to decide how to handle endpoints like `/api/graphql`.
+See [Cells: Routing Service](routing-service.md).
### 5. Cell deployment
@@ -268,49 +257,48 @@ Expectations:
The delivered iterations will focus on solving particular steps of a given key work stream.
It is expected that initial iterations will be rather slow, because they require substantially more changes to prepare the codebase for data split.
-One iteration describes one quarter's worth of work.
-
-1. [Iteration 1](https://gitlab.com/groups/gitlab-org/-/epics/9667) - FY24Q1 - Complete
+### [Iteration 1](https://gitlab.com/groups/gitlab-org/-/epics/9667) (FY24Q1)
- - Data access layer: Initial Admin Area settings are shared across cluster.
- - Essential workflows: Allow to share cluster-wide data with database-level data access layer
+- Data access layer: Initial Admin Area settings are shared across cluster.
+- Essential workflows: Allow to share cluster-wide data with database-level data access layer.
-1. [Iteration 2](https://gitlab.com/groups/gitlab-org/-/epics/9813) - Expected delivery: 16.2 FY24Q2, Actual delivery: 16.4 FY24Q3 - Complete
+### [Iteration 2](https://gitlab.com/groups/gitlab-org/-/epics/9813) (FY24Q2-FY24Q3)
- - Essential workflows: User accounts are shared across cluster.
- - Essential workflows: User can create Group.
+- Essential workflows: User accounts are shared across cluster.
+- Essential workflows: User can create Group.
-1. [Iteration 3](https://gitlab.com/groups/gitlab-org/-/epics/10997) - Expected delivery: 16.7 FY24Q4 - In Progress
+### [Iteration 3](https://gitlab.com/groups/gitlab-org/-/epics/10997) (FY24Q4-FY25Q1)
- - Essential workflows: User can create Project.
- - Routing: Technology.
- - Routing: Cell discovery.
+- Essential workflows: User can create Project.
+- Routing: Technology.
+- Routing: Cell discovery.
-1. [Iteration 4](https://gitlab.com/groups/gitlab-org/-/epics/10998) - Expected delivery: 16.10 FY25Q1 - Planned
+### [Iteration 4](https://gitlab.com/groups/gitlab-org/-/epics/10998) (FY25Q1-FY25Q2)
- - Essential workflows: User can create Organization on Cell 2.
- - Data access layer: Cluster-unique identifiers.
- - Data access layer: Evaluate the efficiency of database-level access vs. API-oriented access layer.
- - Data access layer: Data access layer.
- - Routing: User can use single domain to interact with many Cells.
- - Cell deployment: Extend GitLab Dedicated to support GCP.
+- Essential workflows: User can create Organization on Cell 2.
-1. Iteration 5..N - starting FY25Q1
+### Iteration 5..N - starting FY25Q3
- - Essential workflows: User can push to Git repository.
- - Essential workflows: User can run CI pipeline.
- - Essential workflows: Instance-wide settings are shared across cluster.
- - Essential workflows: User can change profile avatar that is shared in cluster.
- - Essential workflows: User can create issue.
- - Essential workflows: User can create merge request, and merge it after it is green.
- - Essential workflows: User can manage Group and Project members.
- - Essential workflows: User can manage instance-wide runners.
- - Essential workflows: User is part of Organization and can only see information from the Organization.
- - Routing: Router endpoints classification.
- - Routing: GraphQL and other ambiguous endpoints.
- - Data access layer: Allow to share cluster-wide data with database-level data access layer.
- - Data access layer: Cluster-wide deletions.
- - Data access layer: Database migrations.
+- Data access layer: Cluster-unique identifiers.
+- Data access layer: Evaluate the efficiency of database-level access vs. API-oriented access layer.
+- Data access layer: Data access layer.
+- Routing: User can use single domain to interact with many Cells.
+- Cell deployment: Extend GitLab Dedicated to support GCP.
+- Essential workflows: User can create Project with a README file.
+- Essential workflows: User can push to Git repository.
+- Essential workflows: User can run CI pipeline.
+- Essential workflows: Instance-wide settings are shared across cluster.
+- Essential workflows: User can change profile avatar that is shared in cluster.
+- Essential workflows: User can create issue.
+- Essential workflows: User can create merge request, and merge it after it is green.
+- Essential workflows: User can manage Group and Project members.
+- Essential workflows: User can manage instance-wide runners.
+- Essential workflows: User is part of Organization and can only see information from the Organization.
+- Routing: Router endpoints classification.
+- Routing: GraphQL and other ambiguous endpoints.
+- Data access layer: Allow to share cluster-wide data with database-level data access layer.
+- Data access layer: Cluster-wide deletions.
+- Data access layer: Database migrations.
## Technical proposals
@@ -318,7 +306,7 @@ The Cells architecture has long lasting implications to data processing, locatio
This section links all different technical proposals that are being evaluated.
- [Stateless Router That Uses a Cache to Pick Cell and Is Redirected When Wrong Cell Is Reached](proposal-stateless-router-with-buffering-requests.md)
-- [Stateless Router That Uses a Cache to Pick Cell and pre-flight `/api/v4/cells/learn`](proposal-stateless-router-with-routes-learning.md)
+- [Stateless Router That Uses a Cache to Pick Cell and pre-flight `/api/v4/internal/cells/learn`](proposal-stateless-router-with-routes-learning.md)
## Impacted features
diff --git a/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md b/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md
index 962f71673df..cdcb5b8b21f 100644
--- a/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md
+++ b/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md
@@ -35,7 +35,7 @@ Organization can only be on a single Cell.
## Differences
The main difference between this proposal and one [with buffering requests](proposal-stateless-router-with-buffering-requests.md)
-is that this proposal uses a pre-flight API request (`/api/v4/cells/learn`) to redirect the request body to the correct Cell.
+is that this proposal uses a pre-flight API request (`/api/v4/internal/cells/learn`) to redirect the request body to the correct Cell.
This means that each request is sent exactly once to be processed, but the URI is used to determine to which Cell it should be directed.
## Summary in diagrams
@@ -157,11 +157,11 @@ graph TD;
1. The `application_settings` (and probably a few other instance level tables) are decomposed into `gitlab_admin` schema
1. A new column `routes.cell_id` is added to `routes` table
1. A new Router service exists to choose which cell to route a request to.
-1. If a router receives a new request it will send `/api/v4/cells/learn?method=GET&path_info=/group-org/project` to learn which Cell can process it
+1. If a router receives a new request it will send `/api/v4/internal/cells/learn?method=GET&path_info=/group-org/project` to learn which Cell can process it
1. A new concept will be introduced in GitLab called an organization
1. We require all existing endpoints to be routable by URI, or be fixed to a specific Cell for processing. This requires changing ambiguous endpoints like `/dashboard` to be scoped like `/organizations/my-organization/-/dashboard`
1. Endpoints like `/admin` would be routed always to the specific Cell, like `cell_0`
-1. Each Cell can respond to `/api/v4/cells/learn` and classify each endpoint
+1. Each Cell can respond to `/api/v4/internal/cells/learn` and classify each endpoint
1. Writes to `gitlab_users` and `gitlab_routes` are sent to a primary PostgreSQL server in our `US` region but reads can come from replicas in the same region. This will add latency for these writes but we expect they are infrequent relative to the rest of GitLab.
## Pre-flight request learning
@@ -174,7 +174,7 @@ the routable path. GitLab Rails will decode `path_info` and match it to
an existing endpoint and find a routable entity (like project). The router will
treat this as short-lived cache information.
-1. Prefix match: `/api/v4/cells/learn?method=GET&path_info=/gitlab-org/gitlab-test/-/issues`
+1. Prefix match: `/api/v4/internal/cells/learn?method=GET&path_info=/gitlab-org/gitlab-test/-/issues`
```json
{
@@ -184,7 +184,7 @@ treat this as short-lived cache information.
}
```
-1. Some endpoints might require an exact match: `/api/v4/cells/learn?method=GET&path_info=/-/profile`
+1. Some endpoints might require an exact match: `/api/v4/internal/cells/learn?method=GET&path_info=/-/profile`
```json
{
@@ -283,7 +283,7 @@ keeping settings in sync for all cells.
to aggregate information from many Cells.
1. All unknown routes are sent to the latest deployment which we assume to be `Cell US0`.
This is required because newly added endpoints will only be decodable by the latest cell.
- Likely this is not a problem for the `/cells/learn` is it is lightweight
+ Likely this is not a problem for the `/internal/cells/learn` as it is lightweight
to process and this should not cause a performance impact.
## Example database configuration
@@ -361,7 +361,7 @@ this limitation.
1. User is in Europe so DNS resolves to the router in Europe
1. They request `/my-company/my-project` without the router cache, so the router chooses randomly `Cell EU1`
-1. The `/cells/learn` is sent to `Cell EU1`, which responds that resource lives on `Cell EU0`
+1. The `/internal/cells/learn` is sent to `Cell EU1`, which responds that resource lives on `Cell EU0`
1. `Cell EU0` returns the correct response
1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Cell EU0`
@@ -372,7 +372,7 @@ sequenceDiagram
participant cell_eu0 as Cell EU0
participant cell_eu1 as Cell EU1
user->>router_eu: GET /my-company/my-project
- router_eu->>cell_eu1: /api/v4/cells/learn?method=GET&path_info=/my-company/my-project
+ router_eu->>cell_eu1: /api/v4/internal/cells/learn?method=GET&path_info=/my-company/my-project
cell_eu1->>router_eu: {path: "/my-company", cell: "cell_eu0", source: "routable"}
router_eu->>cell_eu0: GET /my-company/my-project
cell_eu0->>user: <h1>My Project...
@@ -382,9 +382,9 @@ sequenceDiagram
1. User is in Europe so DNS resolves to the router in Europe
1. The router does not have `/my-company/*` cached yet so it chooses randomly `Cell EU1`
-1. The `/cells/learn` is sent to `Cell EU1`, which responds that resource lives on `Cell EU0`
+1. The `/internal/cells/learn` is sent to `Cell EU1`, which responds that resource lives on `Cell EU0`
1. `Cell EU0` redirects them through a login flow
-1. User requests `/users/sign_in`, uses random Cell to run `/cells/learn`
+1. User requests `/users/sign_in`, uses random Cell to run `/internal/cells/learn`
1. The `Cell EU1` responds with `cell_0` as a fixed route
1. User after login requests `/my-company/my-project` which is cached and stored in `Cell EU0`
1. `Cell EU0` returns the correct response
@@ -396,12 +396,12 @@ sequenceDiagram
participant cell_eu0 as Cell EU0
participant cell_eu1 as Cell EU1
user->>router_eu: GET /my-company/my-project
- router_eu->>cell_eu1: /api/v4/cells/learn?method=GET&path_info=/my-company/my-project
+ router_eu->>cell_eu1: /api/v4/internal/cells/learn?method=GET&path_info=/my-company/my-project
cell_eu1->>router_eu: {path: "/my-company", cell: "cell_eu0", source: "routable"}
router_eu->>cell_eu0: GET /my-company/my-project
cell_eu0->>user: 302 /users/sign_in?redirect=/my-company/my-project
user->>router_eu: GET /users/sign_in?redirect=/my-company/my-project
- router_eu->>cell_eu1: /api/v4/cells/learn?method=GET&path_info=/users/sign_in
+ router_eu->>cell_eu1: /api/v4/internal/cells/learn?method=GET&path_info=/users/sign_in
cell_eu1->>router_eu: {path: "/users", cell: "cell_eu0", source: "fixed"}
router_eu->>cell_eu0: GET /users/sign_in?redirect=/my-company/my-project
cell_eu0-->>user: <h1>Sign in...
@@ -445,7 +445,7 @@ sequenceDiagram
participant cell_eu0 as Cell EU0
participant cell_us0 as Cell US0
user->>router_eu: GET /gitlab-org/gitlab
- router_eu->>cell_eu0: /api/v4/cells/learn?method=GET&path_info=/gitlab-org/gitlab
+ router_eu->>cell_eu0: /api/v4/internal/cells/learn?method=GET&path_info=/gitlab-org/gitlab
cell_eu0->>router_eu: {path: "/gitlab-org", cell: "cell_us0", source: "routable"}
router_eu->>cell_us0: GET /gitlab-org/gitlab
cell_us0->>user: <h1>GitLab.org...
@@ -569,7 +569,7 @@ sequenceDiagram
router_us->>cell_us1: GET /
cell_us1->>user: 302 /dashboard
user->>router_us: GET /dashboard
- router_us->>cell_us1: /api/v4/cells/learn?method=GET&path_info=/dashboard
+ router_us->>cell_us1: /api/v4/internal/cells/learn?method=GET&path_info=/dashboard
cell_us1->>router_us: {path: "/dashboard", cell: "cell_us0", source: "routable"}
router_us->>cell_us0: GET /dashboard
cell_us0->>user: <h1>Dashboard...
diff --git a/doc/architecture/blueprints/cells/routing-service.md b/doc/architecture/blueprints/cells/routing-service.md
new file mode 100644
index 00000000000..9efdbdf3f91
--- /dev/null
+++ b/doc/architecture/blueprints/cells/routing-service.md
@@ -0,0 +1,196 @@
+---
+stage: core platform
+group: Tenant Scale
+description: 'Cells: Routing Service'
+---
+
+# Cells: Routing Service
+
+This document describes the design goals and architecture of the Routing Service
+used by Cells. To better understand where the Routing Service fits
+into the architecture, take a look at [Deployment Architecture](deployment-architecture.md).
+
+## Goals
+
+The routing layer is meant to offer a consistent user experience where all Cells are presented under a single domain (for example, `gitlab.com`), instead of having to navigate to separate domains.
+
+The user will be able to use `https://gitlab.com` to access Cell-enabled GitLab.
+Depending on the URL accessed, requests will be transparently proxied to the correct Cell that can serve that particular information.
+For example:
+
+- All requests going to `https://gitlab.com/users/sign_in` are randomly distributed to all Cells.
+- All requests going to `https://gitlab.com/gitlab-org/gitlab/-/tree/master` are always directed to Cell 5, for example.
+- All requests going to `https://gitlab.com/my-username/my-project` are always directed to Cell 1.
+
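The routing behavior in these examples can be sketched as a longest-prefix match over a route table, falling back to a random Cell for unclassified paths. This is illustrative Python only; the route table, Cell names, and fallback policy are assumptions for the sketch, not decided behavior:

```python
import random

# Hypothetical routing table: path prefix -> Cell (names are illustrative).
ROUTES = {
    "/gitlab-org": "cell_5",
    "/my-username": "cell_1",
}
ALL_CELLS = ("cell_1", "cell_5")


def pick_cell(path, routes=ROUTES, cells=ALL_CELLS):
    """Longest-prefix match on the request path; unclassified paths go to a random Cell."""
    best_prefix, best_cell = "", None
    for prefix, cell in routes.items():
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > len(best_prefix):
            best_prefix, best_cell = prefix, cell
    return best_cell if best_cell else random.choice(cells)
```

With this sketch, `/gitlab-org/gitlab/-/tree/master` always resolves to `cell_5`, while `/users/sign_in` is distributed across Cells.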
+1. **Technology.**
+
+ We decide what technology the routing service is written in.
+ The choice is dependent on the best performing language, and the expected way and place of deployment of the routing layer.
+ If the service must be multi-cloud, it might need to be deployed to a CDN provider.
+ In that case, the service needs to be written using a technology compatible with that CDN provider.
+
+1. **Cell discovery.**
+
+ The routing service needs to be able to discover and monitor the health of all Cells.
+
+1. **User can use single domain to interact with many Cells.**
+
+ The routing service will intelligently route all requests to the Cell containing the data,
+ based on the resource being accessed.
+
+1. **Router endpoints classification.**
+
+ The stateless routing service will fetch and cache information about endpoints from one of the Cells.
+ We need to implement a protocol that will allow us to accurately describe the incoming request (its fingerprint), so it can be classified by one of the Cells, and the results of that can be cached.
+ We also need to implement a mechanism for negative cache and cache eviction.
+
+1. **GraphQL and other ambiguous endpoints.**
+
+ Most endpoints have a unique sharding key: the Organization, which directly or indirectly (via a Group or Project) can be used to classify endpoints.
+ Some endpoints are ambiguous in their usage (they don't encode the sharding key), or the sharding key is stored deep in the payload.
+ In these cases, we need to decide how to handle endpoints like `/api/graphql`.
+
+1. **Small.**
+
+ The Routing Service is configuration-driven and rules-driven, and does not implement any business logic.
+ The maximum size of the project source code in the initial phase is 1,000 lines, excluding tests.
+ The reason for the hard limit is to keep the Routing Service free of any special logic,
+ so that it could be rewritten in any technology in a matter of a few days.
+
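The endpoint-classification caching described above (short-lived entries, negative caching, eviction) could be sketched as follows. This is illustrative Python, not the actual service, whose technology is still undecided; the TTL value and lazy-eviction strategy are assumptions:

```python
import time


class RouteCache:
    """Short-lived cache of endpoint classifications learned from Cells."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # prefix -> (cell or None, expiry); None marks a negative entry.
        self._entries = {}

    def put(self, prefix, cell):
        self._entries[prefix] = (cell, self.clock() + self.ttl)

    def put_negative(self, prefix):
        # Remember that no Cell claimed this prefix, to avoid re-asking on every request.
        self.put(prefix, None)

    def get(self, prefix):
        """Return (cell, hit). A hit with cell=None is a cached negative result."""
        entry = self._entries.get(prefix)
        if entry is None:
            return None, False
        cell, expiry = entry
        if self.clock() >= expiry:
            del self._entries[prefix]  # lazy eviction of stale entries
            return None, False
        return cell, True
```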
+## Requirements
+
+| Requirement | Description | Priority |
+|---------------|-------------------------------------------------------------------|----------|
+| Discovery | needs to be able to discover and monitor the health of all Cells. | high |
+| Security | only authorized Cells can be routed to | high |
+| Single domain | e.g. GitLab.com | high |
+| Caching | can cache routing information for performance | high |
+| [50 ms of increased latency](#low-latency) | | high |
+| Path-based | can make routing decision based on path | high |
+| Complexity | the routing service should be configuration-driven and small | high |
+| Stateless | does not need database, Cells provide all routing information | medium |
+| Secrets-based | can make routing decision based on secret (e.g. JWT) | medium |
+| Observability | can use existing observability tooling | low |
+| Self-managed | can be eventually used by [self-managed](goals.md#self-managed) | low |
+| Regional | can route requests to different [regions](goals.md#regions) | low |
+
+### Low Latency
+
+The target latency for the routing service **should be less than 50 _ms_**.
+
+Looking at the `urgency: high` requests, we don't have a lot of headroom on the p50.
+Adding an extra 50 _ms_ allows us to still be within our SLO at the p95 level.
+
+There are three primary entry points for the application: [`web`](https://gitlab.com/gitlab-com/runbooks/-/blob/5d8248314b343bef15a4c021ac33978525f809e3/services/service-catalog.yml#L492-537), [`api`](https://gitlab.com/gitlab-com/runbooks/-/blob/5d8248314b343bef15a4c021ac33978525f809e3/services/service-catalog.yml#L18-62), and [`git`](https://gitlab.com/gitlab-com/runbooks/-/blob/5d8248314b343bef15a4c021ac33978525f809e3/services/service-catalog.yml#L589-638).
+Each service is assigned a Service Level Indicator (SLI) based on latency using the [apdex](https://www.apdex.org/wp-content/uploads/2020/09/ApdexTechnicalSpecificationV11_000.pdf) standard.
+The corresponding Service Level Objectives (SLOs) for these SLIs require low latencies for a large share of requests.
+It's crucial to ensure that the addition of the routing layer in front of these services does not impact the SLIs.
+Because the routing layer is a proxy for these services, and we lack a comprehensive SLI monitoring system for the entire request flow (including components like the Edge network and Load Balancers), we use the SLIs for `web`, `git`, and `api` as a target.
+
+The main SLI we use is the [rails requests](../../../development/application_slis/rails_request.md).
+It has multiple `satisfied` targets (apdex) depending on the [request urgency](../../../development/application_slis/rails_request.md#how-to-adjust-the-urgency):
+
+| Urgency | Duration in ms |
+|------------|----------------|
+| `:high` | 250 _ms_ |
+| `:medium` | 500 _ms_ |
+| `:default` | 1000 _ms_ |
+| `:low` | 5000 _ms_ |
+
+#### Analysis
+
+We calculate the available headroom as follows:
+
+```math
+\mathrm{Headroom}\ {ms} = \mathrm{Satisfied}\ {ms} - \mathrm{Duration}\ {ms}
+```
+
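Worked through in code (illustrative Python; the 50 _ms_ figure is the routing budget above, and the sample durations are back-derived from the `web` table's target/headroom pairs):

```python
def headroom_ms(satisfied_ms, duration_ms):
    """Headroom = apdex 'satisfied' target minus the observed request duration."""
    return satisfied_ms - duration_ms


def fits_routing_budget(satisfied_ms, duration_ms, router_latency_ms=50):
    """Check whether adding the routing layer still keeps the request within target."""
    return headroom_ms(satisfied_ms, duration_ms) >= router_latency_ms
```

For example, at the 500 _ms_ target with 60 _ms_ of p95 headroom (the `web` case), a 50 _ms_ routing layer still fits, but only barely.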
+**`web`**:
+
+| Target Duration | Percentile | Headroom |
+|-----------------|------------|-----------|
+| 5000 _ms_ | p99 | 4000 _ms_ |
+| 5000 _ms_ | p95 | 4500 _ms_ |
+| 5000 _ms_ | p90 | 4600 _ms_ |
+| 5000 _ms_ | p50 | 4900 _ms_ |
+| 1000 _ms_ | p99 | 500 _ms_ |
+| 1000 _ms_ | p95 | 740 _ms_ |
+| 1000 _ms_ | p90 | 840 _ms_ |
+| 1000 _ms_ | p50 | 900 _ms_ |
+| 500 _ms_ | p99 | 0 _ms_ |
+| 500 _ms_ | p95 | 60 _ms_ |
+| 500 _ms_ | p90 | 100 _ms_ |
+| 500 _ms_ | p50 | 400 _ms_ |
+| 250 _ms_ | p99 | 140 _ms_ |
+| 250 _ms_ | p95 | 170 _ms_ |
+| 250 _ms_ | p90 | 180 _ms_ |
+| 250 _ms_ | p50 | 200 _ms_ |
+
+_Analysis was done in <https://gitlab.com/gitlab-org/gitlab/-/issues/432934#note_1667993089>_
+
+**`api`**:
+
+| Target Duration | Percentile | Headroom |
+|-----------------|------------|-----------|
+| 5000 _ms_ | p99 | 3500 _ms_ |
+| 5000 _ms_ | p95 | 4300 _ms_ |
+| 5000 _ms_ | p90 | 4600 _ms_ |
+| 5000 _ms_ | p50 | 4900 _ms_ |
+| 1000 _ms_ | p99 | 440 _ms_ |
+| 1000 _ms_ | p95 | 750 _ms_ |
+| 1000 _ms_ | p90 | 830 _ms_ |
+| 1000 _ms_ | p50 | 950 _ms_ |
+| 500 _ms_ | p99 | 450 _ms_ |
+| 500 _ms_ | p95 | 480 _ms_ |
+| 500 _ms_ | p90 | 490 _ms_ |
+| 500 _ms_ | p50 | 490 _ms_ |
+| 250 _ms_ | p99 | 90 _ms_ |
+| 250 _ms_ | p95 | 170 _ms_ |
+| 250 _ms_ | p90 | 210 _ms_ |
+| 250 _ms_ | p50 | 230 _ms_ |
+
+_Analysis was done in <https://gitlab.com/gitlab-org/gitlab/-/issues/432934#note_1669995479>_
+
+**`git`**:
+
+| Target Duration | Percentile | Headroom |
+|-----------------|------------|-----------|
+| 5000 _ms_ | p99 | 3760 _ms_ |
+| 5000 _ms_ | p95 | 4280 _ms_ |
+| 5000 _ms_ | p90 | 4430 _ms_ |
+| 5000 _ms_ | p50 | 4900 _ms_ |
+| 1000 _ms_ | p99 | 500 _ms_ |
+| 1000 _ms_ | p95 | 750 _ms_ |
+| 1000 _ms_ | p90 | 800 _ms_ |
+| 1000 _ms_ | p50 | 900 _ms_ |
+| 500 _ms_ | p99 | 280 _ms_ |
+| 500 _ms_ | p95 | 370 _ms_ |
+| 500 _ms_ | p90 | 400 _ms_ |
+| 500 _ms_ | p50 | 430 _ms_ |
+| 250 _ms_ | p99 | 200 _ms_ |
+| 250 _ms_ | p95 | 230 _ms_ |
+| 250 _ms_ | p90 | 240 _ms_ |
+| 250 _ms_ | p50 | 240 _ms_ |
+
+_Analysis was done in <https://gitlab.com/gitlab-org/gitlab/-/issues/432934#note_1671385680>_
+
+## Non-Goals
+
+Not yet defined.
+
+## Proposal
+
+TBD
+
+## Technology
+
+TBD
+
+## Alternatives
+
+TBD
+
+## Links
+
+- [Cells - Routing: Technology](https://gitlab.com/groups/gitlab-org/-/epics/11002)
+- [Classify endpoints](https://gitlab.com/gitlab-org/gitlab/-/issues/430330)
diff --git a/doc/architecture/blueprints/ci_gcp_secrets_manager/index.md b/doc/architecture/blueprints/ci_gcp_secrets_manager/index.md
new file mode 100644
index 00000000000..1dc529d767d
--- /dev/null
+++ b/doc/architecture/blueprints/ci_gcp_secrets_manager/index.md
@@ -0,0 +1,107 @@
+---
+status: proposed
+creation-date: "2023-11-29"
+authors: [ "@alberts-gitlab" ]
+coach: "@grzesiek"
+approvers: [ "@jocelynjane", "@shampton" ]
+owning-stage: "~devops::verify"
+participating-stages: []
+---
+
+<!-- Blueprints often contain forward-looking statements -->
+<!-- vale gitlab.FutureTense = NO -->
+
+# Support GCP Secrets Manager for CI External Secrets
+
+## Summary
+
+This blueprint describes the architecture to add GCP Secrets Manager as one of the
+sources for CI External Secrets.
+
+## Motivation
+
+GitLab CI allows users to pull secrets from external sources into GitLab CI jobs.
+Prior to this, the supported secret managers were HashiCorp Vault and Azure Key Vault.
+GCP Secrets Manager is another major secret manager product, and there have been
+multiple requests and feedback asking to add GCP Secrets Manager to the list of
+supported secret managers.
+
+### Goals
+
+The goal of this feature is to allow GitLab CI users to use secrets stored in
+GCP Secrets Manager in their CI jobs.
+
+### Non-Goals
+
+This feature does not cover the following:
+
+- Using secrets from GCP Secrets Manager in other GitLab workloads.
+- Managing secrets in GCP Secrets Manager or other secret managers through GitLab.
+
+## Proposal
+
+This feature requires a tight integration between GCP Secrets Manager, GitLab Rails and GitLab Runner.
+
+The solution to this feature involves three main parts:
+
+1. Authentication with GCP Secrets Manager
+1. CI configuration on GitLab Rails
+1. Secrets access by GitLab Runner
+
+### Authentication with GCP Secrets Manager
+
+GCP Secrets Manager needs to authenticate secret access requests coming from GitLab Runner.
+Since GitLab Runner can operate in many modes (GitLab.com SaaS runners, SaaS with self-managed runners, GitLab Self-Managed, and so on),
+there is no direct correlation between the Runner instance and any GCP identities that can have access to the secrets.
+
+To solve this, we would use OIDC and GCP's Workload Identity Federation mechanism to authorize the requests.
+
+CI jobs already have support for OIDC through CI variables containing ID tokens issued by the GitLab instance.
+These ID tokens already carry `claim`s that describe the context of the CI job.
+For example, it includes details such as `group_id`, `group_path`, `project_id`, and `project_path`.
+
+On the GCP side, Workload Identity Federation allows the use of OIDC to grant GCP IAM roles to the external identities
+represented by the ID tokens. Through Workload Identity Federation, the GCP user can grant specific IAM roles to
+specific principals identified through the OIDC `claim`. For example, a particular `group_id` claim can be given an IAM role
+to access a particular set of secrets in GCP Secrets Manager. This would allow the GCP user to grant granular
+access to the secrets in GCP Secrets Manager.
+
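Under GCP's documented Workload Identity Federation flow, the runner exchanges the job's ID token for a GCP access token via the Security Token Service. A sketch of building that exchange payload (illustrative Python with no network call; the field names follow GCP's STS token-exchange API, and the pool/provider values are placeholders, not part of this blueprint):

```python
def sts_exchange_request(project_number, pool_id, provider_id, id_token):
    """Build the form payload for GCP STS token exchange (no network call here)."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": id_token,  # the CI job's OIDC ID token (GCP_SM_ID_TOKEN)
    }
```

The returned federated access token is then used to call GCP Secrets Manager on behalf of the principal matched by the OIDC claims.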
+### CI configuration on GitLab Rails
+
+GitLab Rails will be the interface where users configure the CI jobs. For the GCP Secrets Manager integration,
+there needs to be additional configuration to specify GCP Secrets Manager as a source for external secrets as well as
+GCP-specific information to enable authentication between GitLab Runner and GCP Secrets Manager.
+
+The proposed CI keyword would be the following:
+
+```yaml
+job_name:
+ id_tokens:
+ GCP_SM_ID_TOKEN:
+ aud: my-GCP-workload-identity-federation-audience
+ secrets:
+ DATABASE_PASSWORD:
+ gcp_sm:
+ name: my-project-secret # This is the name of the secret defined in GCP Secrets Manager
+ version: 1 # optional: default to `latest`.
+ token: GCP_SM_ID_TOKEN
+```
+
+In addition, GitLab Runner needs to know the following in order to perform the authentication and access the secret.
+These should be included as CI variables in the job.
+
+- GCP Project Number `GCP_PROJECT_NUMBER`
+- GCP Workload Federation Pool ID `GCP_WORKLOAD_FEDERATION_POOL_ID`
+- GCP Workload Federation Provider ID `GCP_WORKLOAD_FEDERATION_PROVIDER_ID`
+
+### Secrets access by GitLab Runner
+
+Based on the job specification defined above, GitLab Runner needs to implement the following:
+
+1. OIDC authentication with GCP Secure Token Service to obtain an access token.
+1. Secret access requests to GCP Secrets Manager to obtain the payload of the desired secret version.
+1. Adding the secrets to the build.
+
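For step 2, the secret version is addressed by a GCP resource name. A minimal sketch (illustrative Python, following Secret Manager's documented `projects/*/secrets/*/versions/*` naming and the `latest` default from the CI keyword above):

```python
def secret_version_name(project_number, secret_name, version=None):
    """Resource name passed to Secret Manager's versions.access call."""
    return (
        f"projects/{project_number}/secrets/{secret_name}"
        f"/versions/{version if version is not None else 'latest'}"
    )
```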
+## Alternative Solutions
+
+N/A.
diff --git a/doc/architecture/blueprints/ci_pipeline_components/index.md b/doc/architecture/blueprints/ci_pipeline_components/index.md
index 9fdbf8cb70b..9a225c9cd97 100644
--- a/doc/architecture/blueprints/ci_pipeline_components/index.md
+++ b/doc/architecture/blueprints/ci_pipeline_components/index.md
@@ -17,8 +17,6 @@ This document covers the future plans for the CI/CD Catalog feature. For informa
## Summary
-## Goals
-
The goal of the CI/CD pipeline components catalog is to make reusing
pipeline configurations easier and more efficient. Providing a way to
discover, understand and learn how to reuse pipeline constructs allows for a
@@ -107,7 +105,9 @@ identifying abstract concepts and are subject to changes as we refine the design
- **Template** is a type of component that contains a snippet of CI/CD configuration that can be [included](../../../ci/yaml/includes.md) in a project's pipeline configuration.
- **Publishing** is the act of listing a version of the resource (for example, a project release) on the Catalog.
-## Definition of pipeline component
+## CI component
+
+### Definition of component
A pipeline component is a reusable single-purpose building block that abstracts away a single pipeline configuration unit.
Components are used to compose a part of or an entire pipeline configuration.
@@ -155,7 +155,7 @@ predictable. The predictability, determinism, referential transparency and
making CI components predictable is still important for us, but we may be
unable to achieve it in early iterations.
-## Structure of a component
+### Structure of a component
A pipeline component is identified by a unique address in the form `<fqdn>/<component-path>@<version>` containing:
@@ -165,12 +165,12 @@ A pipeline component is identified by a unique address in the form `<fqdn>/<comp
For example: `gitlab.com/gitlab-org/dast@1.0`.
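Decomposing such an address can be sketched as follows (illustrative Python; the error handling is an assumption for the sketch, not behavior specified by this blueprint):

```python
def parse_component_address(address):
    """Split `<fqdn>/<component-path>@<version>` into its parts."""
    path, sep, version = address.rpartition("@")
    if not sep or not version:
        raise ValueError(f"missing version in {address!r}")
    fqdn, _, component_path = path.partition("/")
    if not component_path:
        raise ValueError(f"missing component path in {address!r}")
    return fqdn, component_path, version
```

For the example above, `gitlab.com/gitlab-org/dast@1.0` splits into FQDN `gitlab.com`, component path `gitlab-org/dast`, and version `1.0`.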
-### The FQDN
+#### The FQDN
Initially we support only component addresses that point to the same GitLab instance, meaning that the FQDN matches
the GitLab host.
-### The component path
+#### The component path
The directory identified by the component path must contain at least the component YAML and optionally a
related `README.md` documentation file.
@@ -189,7 +189,7 @@ a file `mydir/file.yml` in `gitlab-org/dast` project would be expanded to:
gitlab.com/gitlab-org/dast/mydir/path/to/component@<CURRENT_SHA>
```
-The component YAML file follows the filename convention `<type>.yml` where component type is one of:
+The component YAML file follows the file name convention `<type>.yml` where component type is one of:
| Component type | Context |
| -------------- | ------- |
@@ -206,7 +206,7 @@ For example:
A component YAML file:
- Must have a **name** to be referenced to.
-- Must specify its **type** in the filename, which defines how it can be used (raw configuration to be `include`d, child pipeline workflow, job step).
+- Must specify its **type** in the file name, which defines how it can be used (raw configuration to be `include`d, child pipeline workflow, job step).
- Must define its **content** based on the type.
- Must specify **input parameters** that it accepts. Components should depend on input parameters for dynamic values and not environment variables.
- Should be **validated statically** (for example: using JSON schema validators).
@@ -226,7 +226,7 @@ spec:
# content of the component
```
-### The component version
+#### The component version
The version of the component can be (in order of highest priority first):
@@ -244,86 +244,37 @@ As we want to be able to reference any revisions (even those not released), a co
When referencing a component by local path (for example `./path/to/component`), its version is implicit and matches
the commit SHA of the current pipeline context.
-## Components repository
-
-A components repository is a GitLab project/repository that exclusively hosts one or more pipeline components.
-
-A components repository can be a catalog resource. For a components repository it's highly recommended to set
-an appropriate avatar and project description to improve discoverability in the catalog.
-
-Components repositories that are released in the catalog must have a `README.md` file at the root directory of the repository.
-The `README.md` represents the documentation of the components repository, hence it's recommended
-even when not listing the repository in the catalog.
-
-### Structure of a components repository
-
-A components repository can host one or more components. The author can decide whether to define a single component
-per repository or include multiple cohesive components in the same repository.
+### Note about future resource types
-A components repository is identified by the project full path.
-
-Let's imagine we are developing a component that runs RSpec tests for a Rails app. We create a project
-called `myorg/rails-rspec`.
-
-The following directory structure would support 1 component per repository:
-
-```plaintext
-.
-├── template.yml
-├── README.md
-└── .gitlab-ci.yml
-```
-
-The `.gitlab-ci.yml` is recommended for the project to ensure changes are verified accordingly.
-
-The component is now identified by the path `gitlab.com/myorg/rails-rspec` which also maps to the
-project path. We expect a `template.yml` file and `README.md` to be located in the root directory of the repository.
-
-The following directory structure would support multiple components per repository:
+In the future, to support multiple types of resources in the Catalog we could
+require a file `catalog-resource.yml` to be defined in the root directory of the project:
-```plaintext
-.
-├── .gitlab-ci.yml
-├── README.md
-├── unit/
-│ └── template.yml
-├── integration/
-│ └── template.yml
-└── feature/
- └── template.yml
+```yaml
+name: DAST
+description: Scan a web endpoint to find vulnerabilities
+category: security
+tags: [dynamic analysis, security scanner]
+type: components_repository
```
-In this example we are defining multiple test profiles that are executed with RSpec.
-The user could choose to use one or more of these.
-
-Each of these components are identified by their path `gitlab.com/myorg/rails-rspec/unit`, `gitlab.com/myorg/rails-rspec/integration`
-and `gitlab.com/myorg/rails-rspec/feature`.
-
-This directory structure could also support both strategies:
+This file could also be used for indexing metadata about the content of the resource.
+For example, users could list the components in the repository and we can index
+further data for search purposes:
-```plaintext
-.
-├── template.yml # myorg/rails-rspec
-├── README.md
-├── LICENSE
-├── .gitlab-ci.yml
-├── unit/
-│ └── template.yml # myorg/rails-rspec/unit
-├── integration/
-│ └── template.yml # myorg/rails-rspec/integration
-└── feature/
- └── template.yml # myorg/rails-rspec/feature
+```yaml
+name: DAST
+description: Scan a web endpoint to find vulnerabilities
+category: security
+tags: [dynamic analysis, security scanner]
+type: components_repository
+metadata:
+ components:
+ - all-scans
+ - scan-x
+ - scan-y
```
-With the above structure we could have a top-level component that can be used as the
-default component. For example, `myorg/rails-rspec` could run all the test profiles together.
-However, more specific test profiles could be used separately (for example `myorg/rails-rspec/integration`).
-
-NOTE:
-Nesting of components is not permitted.
-This limitation encourages cohesion at project level and keeps complexity low.
-
-## `spec:inputs:` parameters
+## Input parameters
If the component takes any input parameters they must be specified according to the following schema:
@@ -521,6 +472,85 @@ spec:
# rest of the pipeline config
```
+## Components repository
+
+A components repository is a GitLab project/repository that exclusively hosts one or more pipeline components.
+
+A components repository can be a catalog resource. For a components repository it's highly recommended to set
+an appropriate avatar and project description to improve discoverability in the catalog.
+
+Components repositories that are released in the catalog must have a `README.md` file at the root directory of the repository.
+The `README.md` represents the documentation of the components repository, hence it's recommended
+even when not listing the repository in the catalog.
+
+### Structure of a components repository
+
+A components repository can host one or more components. The author can decide whether to define a single component
+per repository or include multiple cohesive components in the same repository.
+
+A components repository is identified by the project full path.
+
+Let's imagine we are developing a component that runs RSpec tests for a Rails app. We create a project
+called `myorg/rails-rspec`.
+
+The following directory structure would support one component per repository:
+
+```plaintext
+.
+├── template.yml
+├── README.md
+└── .gitlab-ci.yml
+```
+
+The `.gitlab-ci.yml` is recommended for the project to ensure changes are verified accordingly.
+
+The component is now identified by the path `gitlab.com/myorg/rails-rspec` which also maps to the
+project path. We expect a `template.yml` file and `README.md` to be located in the root directory of the repository.
+
+The following directory structure would support multiple components per repository:
+
+```plaintext
+.
+├── .gitlab-ci.yml
+├── README.md
+├── unit/
+│ └── template.yml
+├── integration/
+│ └── template.yml
+└── feature/
+ └── template.yml
+```
+
+In this example we are defining multiple test profiles that are executed with RSpec.
+The user could choose to use one or more of these.
+
+Each of these components is identified by its path: `gitlab.com/myorg/rails-rspec/unit`, `gitlab.com/myorg/rails-rspec/integration`
+and `gitlab.com/myorg/rails-rspec/feature`.
+
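+For example, a user could include only the profiles they need (the `1.0` version tag is hypothetical):
+
+```yaml
+include:
+  # Include only the unit and integration test profiles.
+  - component: gitlab.com/myorg/rails-rspec/unit@1.0
+  - component: gitlab.com/myorg/rails-rspec/integration@1.0
+```
+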
+This directory structure could also support both strategies:
+
+```plaintext
+.
+├── template.yml # myorg/rails-rspec
+├── README.md
+├── LICENSE
+├── .gitlab-ci.yml
+├── unit/
+│ └── template.yml # myorg/rails-rspec/unit
+├── integration/
+│ └── template.yml # myorg/rails-rspec/integration
+└── feature/
+ └── template.yml # myorg/rails-rspec/feature
+```
+
+With the above structure we could have a top-level component that can be used as the
+default component. For example, `myorg/rails-rspec` could run all the test profiles together.
+However, more specific test profiles could be used separately (for example `myorg/rails-rspec/integration`).
+
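+A sketch of the two usage styles side by side (the `1.0` version tag is hypothetical):
+
+```yaml
+include:
+  # Run all test profiles via the top-level default component:
+  - component: gitlab.com/myorg/rails-rspec@1.0
+  # Or pick a single, more specific profile instead:
+  # - component: gitlab.com/myorg/rails-rspec/integration@1.0
+```
+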
+NOTE:
+Nesting of components is not permitted.
+This limitation encourages cohesion at the project level and keeps complexity low.
+
## CI Catalog
The CI Catalog is an index of resources that users can leverage in CI/CD. It initially
@@ -546,7 +576,7 @@ Once a project is marked as a "catalog resource" it can eventually be displayed
We could create a database record when the setting is enabled and modify the record's state when
the setting is disabled.
-## Catalog resource
+### Catalog resource
Upon publishing, a catalog resource should have at least the following attributes:
@@ -574,7 +604,7 @@ be listed in the catalog resource's page for various reasons:
To list a catalog resource in the Catalog we first need to create a release for
the project.
-## Releasing new resource versions to the Catalog
+### Releasing new resource versions to the Catalog
The versions that will be published for the resource should be the project
[releases](../../../user/project/releases/index.md). Creating project releases is an official
For example: index the content of the `spec:` section for CI components.
See an [example of development workflow](dev_workflow.md) for a components repository.
-## Note about future resource types
-
-In the future, to support multiple types of resources in the Catalog we could
-require a file `catalog-resource.yml` to be defined in the root directory of the project:
-
-```yaml
-name: DAST
-description: Scan a web endpoint to find vulnerabilities
-category: security
-tags: [dynamic analysis, security scanner]
-type: components_repository
-```
-
-This file could also be used for indexing metadata about the content of the resource.
-For example, users could list the components in the repository and we can index
-further data for search purpose:
-
-```yaml
-name: DAST
-description: Scan a web endpoint to find vulnerabilities
-category: security
-tags: [dynamic analysis, security scanner]
-type: components_repository
-metadata:
- components:
- - all-scans
- - scan-x
- - scan-y
-```
-
## Implementation guidelines
- Start with the smallest user base. Dogfood the feature for `gitlab-org` and
diff --git a/doc/architecture/blueprints/clickhouse_usage/index.md b/doc/architecture/blueprints/clickhouse_usage/index.md
index 3febb09f0bf..bbe4132f833 100644
--- a/doc/architecture/blueprints/clickhouse_usage/index.md
+++ b/doc/architecture/blueprints/clickhouse_usage/index.md
@@ -4,7 +4,7 @@ creation-date: "2023-02-02"
authors: [ "@nhxnguyen" ]
coach: "@grzesiek"
approvers: [ "@dorrino", "@nhxnguyen" ]
-owning-stage: "~devops::data_stores"
+owning-stage: "~devops::data stores"
participating-stages: ["~section::ops", "~section::dev"]
---
diff --git a/doc/architecture/blueprints/composable_codebase_using_rails_engines/index.md b/doc/architecture/blueprints/composable_codebase_using_rails_engines/index.md
index 5b82716cb21..fc35cc0dd50 100644
--- a/doc/architecture/blueprints/composable_codebase_using_rails_engines/index.md
+++ b/doc/architecture/blueprints/composable_codebase_using_rails_engines/index.md
@@ -124,7 +124,7 @@ application layers. This list is not exhaustive, but shows a general list of the
- Web GraphQL: provide a flexible API interface, allowing the Web frontend to fetch only the data needed thereby reducing the amount of compute and data transfer
- Web ActionCable: provide bi-directional connection to enable real-time features for Users visiting web interface
- Web Feature Flags Unleash Backend: provide an Unleash-compatible Server that uses GitLab API
-- Web Packages API: provide a REST API compatible with the packaging tools: Debian, Maven, Container Registry Proxy, etc.
+- Web Packages API: provide a REST API compatible with the packaging tools: Debian, Maven, container registry proxy, etc.
- Git nodes: all code required to authorize `git pull/push` over `SSH` or `HTTPS`
- Sidekiq: run background jobs
- Services/Models/DB: all code required to maintain our database structure, data validation, business logic, and policies models that needs to be shared with other components
diff --git a/doc/architecture/blueprints/consolidating_groups_and_projects/index.md b/doc/architecture/blueprints/consolidating_groups_and_projects/index.md
index 2e0b4d40e13..89e7f0a8a88 100644
--- a/doc/architecture/blueprints/consolidating_groups_and_projects/index.md
+++ b/doc/architecture/blueprints/consolidating_groups_and_projects/index.md
@@ -5,7 +5,7 @@ authors: [ "@alexpooley", "@ifarkas" ]
coach: "@grzesiek"
approvers: [ "@m_gill", "@mushakov" ]
author-stage: "~devops::plan"
-owning-stage: "~devops::data_stores"
+owning-stage: "~devops::data stores"
participating-stages: []
---
@@ -230,7 +230,7 @@ We should strive to do the code clean up as we move through the phases. However,
The initial iteration will provide a framework to house features under `Namespaces`. Stage groups will eventually need to migrate their own features and functionality over to `Namespaces`. This may impact these features in unexpected ways. Therefore, to minimize UX debt and maintain product consistency, stage groups will have to consider several factors when migrating their features over to `Namespaces`:
1. **Conceptual model**: What are the current and future state conceptual models of these features ([see object modeling for designers](https://hpadkisson.medium.com/object-modeling-for-designers-an-introduction-7871bdcf8baf))? These should be documented in Pajamas (example: [merge requests](https://design.gitlab.com/objects/merge-request/)).
-1. **Merge conflicts**: What inconsistencies are there across project, group, and administrator levels? How might these be addressed? For an example of how we rationalized this for labels, please see [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/338820).
+1. **Merge conflicts**: What inconsistencies are there across project, group, and administrator levels? How might these be addressed? For an example of how we rationalized this for labels, see [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/338820).
1. **Inheritance & information flow**: How is information inherited across our container hierarchy currently? How might this be impacted if complying with the new [inheritance behavior](https://gitlab.com/gitlab-org/gitlab/-/issues/343316) framework?
1. **Settings**: Where can settings for this feature be found currently? How will these be impacted by `Namespaces`?
1. **Access**: Who can access this feature and is that impacted by the new container structure? Are there any role or privacy considerations?
diff --git a/doc/architecture/blueprints/container_registry_metadata_database/index.md b/doc/architecture/blueprints/container_registry_metadata_database/index.md
index c9f7f1c0d27..66beac6cdb7 100644
--- a/doc/architecture/blueprints/container_registry_metadata_database/index.md
+++ b/doc/architecture/blueprints/container_registry_metadata_database/index.md
@@ -10,19 +10,19 @@ participating-stages: []
<!-- vale gitlab.FutureTense = NO -->
-# Container Registry Metadata Database
+# Container registry metadata database
-## Usage of the GitLab Container Registry
+## Usage of the GitLab container registry
-With the [Container Registry](https://gitlab.com/gitlab-org/container-registry) integrated into GitLab, every GitLab project can have its own space to store its Docker images. You can use the registry to build, push and share images using the Docker client, CI/CD or the GitLab API.
+With the [container registry](https://gitlab.com/gitlab-org/container-registry) integrated into GitLab, every GitLab project can have its own space to store its Docker images. You can use the registry to build, push and share images using the Docker client, CI/CD or the GitLab API.
-Each day on GitLab.com, between [150k and 200k images are pushed to the registry](https://app.periscopedata.com/app/gitlab/527857/Package-GitLab.com-Stage-Activity-Dashboard?widget=9620193&udv=0), generating about [700k API events](https://app.periscopedata.com/app/gitlab/527857/Package-GitLab.com-Stage-Activity-Dashboard?widget=7601761&udv=0). It's also worth noting that although some customers use other registry vendors, [more than 96% of instances](https://app.periscopedata.com/app/gitlab/527857/Package-GitLab.com-Stage-Activity-Dashboard?widget=9832282&udv=0) are using the GitLab Container Registry.
+Each day on GitLab.com, between [150k and 200k images are pushed to the registry](https://app.periscopedata.com/app/gitlab/527857/Package-GitLab.com-Stage-Activity-Dashboard?widget=9620193&udv=0), generating about [700k API events](https://app.periscopedata.com/app/gitlab/527857/Package-GitLab.com-Stage-Activity-Dashboard?widget=7601761&udv=0). It's also worth noting that although some customers use other registry vendors, [more than 96% of instances](https://app.periscopedata.com/app/gitlab/527857/Package-GitLab.com-Stage-Activity-Dashboard?widget=9832282&udv=0) are using the GitLab container registry.
-For GitLab.com and for GitLab customers, the Container Registry is a critical component to building and deploying software.
+For GitLab.com and for GitLab customers, the container registry is a critical component to building and deploying software.
## Current Architecture
-The Container Registry is a single [Go](https://go.dev/) application. Its only dependency is the storage backend on which images and metadata are stored.
+The container registry is a single [Go](https://go.dev/) application. Its only dependency is the storage backend on which images and metadata are stored.
```mermaid
graph LR
@@ -30,7 +30,7 @@ graph LR
R -- Write/read metadata --> B
```
-Client applications (for example, GitLab Rails and Docker CLI) interact with the Container Registry through its [HTTP API](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/gitlab/api.md). The most common operations are pushing and pulling images to/from the registry, which require a series of HTTP requests in a specific order. The request flow for these operations is detailed in the [Request flow](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/push-pull-request-flow.md).
+Client applications (for example, GitLab Rails and Docker CLI) interact with the container registry through its [HTTP API](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/gitlab/api.md). The most common operations are pushing and pulling images to/from the registry, which require a series of HTTP requests in a specific order. The request flow for these operations is detailed in the [Request flow](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/push-pull-request-flow.md).
The registry supports multiple [storage backends](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/configuration.md#storage), including Google Cloud Storage (GCS) which is used for the GitLab.com registry. In the storage backend, images are stored as blobs, deduplicated, and shared across repositories. These are then linked (like a symlink) to each repository that relies on them, giving them access to the central storage location.
@@ -38,22 +38,22 @@ The name and hierarchy of repositories, as well as image manifests and tags are
### Clients
-The Container Registry has two main clients: the GitLab Rails application and the Docker client/CLI.
+The container registry has two main clients: the GitLab Rails application and the Docker client/CLI.
#### Docker
-The Docker client (`docker` CLI) interacts with the GitLab Container Registry mainly using the [login](https://docs.docker.com/engine/reference/commandline/login/), [push](https://docs.docker.com/engine/reference/commandline/push/) and [pull](https://docs.docker.com/engine/reference/commandline/pull/) commands.
+The Docker client (`docker` CLI) interacts with the GitLab container registry mainly using the [login](https://docs.docker.com/engine/reference/commandline/login/), [push](https://docs.docker.com/engine/reference/commandline/push/) and [pull](https://docs.docker.com/engine/reference/commandline/pull/) commands.
##### Login and Authentication
-GitLab Rails is the default token-based authentication provider for the GitLab Container Registry.
+GitLab Rails is the default token-based authentication provider for the GitLab container registry.
Once the registry receives a request sent by an unauthenticated Docker client, it will reply with `401 Unauthorized` and instruct the client to obtain a token from the GitLab Rails API. The Docker client will then request a Bearer token and embed it in the `Authorization` header of all requests. The registry is responsible for determining whether the user is authenticated/authorized to perform those requests based on the provided token.
```mermaid
sequenceDiagram
participant C as Docker client
- participant R as GitLab Container Registry
+ participant R as GitLab container registry
participant G as GitLab Rails
C->>R: docker login gitlab.example.com
@@ -65,7 +65,7 @@ sequenceDiagram
Note right of C: Bearer token included in the Authorization header
```
-Please refer to the [Docker documentation](https://docs.docker.com/registry/spec/auth/token/) for more details.
+For more details, refer to the [Docker documentation](https://docs.docker.com/registry/spec/auth/token/).
##### Push and Pull
@@ -81,7 +81,7 @@ The single entrypoint for the registry is the [HTTP API](https://gitlab.com/gitl
| Operation | UI | Background | Observations |
| ------------------------------------------------------------ | ------------------ | ------------------------ | ------------------------------------------------------------ |
-| [Check API version](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#api-version-check) | **{check-circle}** Yes | **{check-circle}** Yes | Used globally to ensure that the registry supports the Docker Distribution V2 API, as well as for identifying whether GitLab Rails is talking to the GitLab Container Registry or a third-party one (used to toggle features only available in the former). |
+| [Check API version](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#api-version-check) | **{check-circle}** Yes | **{check-circle}** Yes | Used globally to ensure that the registry supports the Docker Distribution V2 API, as well as for identifying whether GitLab Rails is talking to the GitLab container registry or a third-party one (used to toggle features only available in the former). |
| [List repository tags](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#listing-image-tags) | **{check-circle}** Yes | **{check-circle}** Yes | Used to list and show tags in the UI. Used to list tags in the background for [cleanup policies](../../../user/packages/container_registry/reduce_container_registry_storage.md#cleanup-policy) and [Geo replication](../../../administration/geo/replication/container_registry.md). |
| [Check if manifest exists](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#existing-manifests) | **{check-circle}** Yes | **{dotted-circle}** No | Used to get the digest of a manifest by tag. This is then used to pull the manifest and show the tag details in the UI. |
| [Pull manifest](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#pulling-an-image-manifest) | **{check-circle}** Yes | **{dotted-circle}** No | Used to show the image size and the manifest digest in the tag details UI. |
@@ -164,7 +164,7 @@ Although blobs are shared across repositories, manifest and tag metadata are sco
#### GitLab.com
-Due to scale, performance and isolation concerns, for GitLab.com the registry database will be on a separate dedicated PostgreSQL cluster. Please see [#93](https://gitlab.com/gitlab-org/container-registry/-/issues/93) and [GitLab-com/gl-infra/reliability#10109](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/10109) for additional context.
+Due to scale, performance and isolation concerns, for GitLab.com the registry database will be on a separate dedicated PostgreSQL cluster. See [#93](https://gitlab.com/gitlab-org/container-registry/-/issues/93) and [GitLab-com/gl-infra/reliability#10109](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/10109) for additional context.
The diagram below illustrates the architecture of the database cluster:
@@ -208,7 +208,7 @@ for self-managed instances. Customers not able to upgrade to PostgreSQL 12 have
with security backports and bug fixes.
Apart from online garbage collection, the metadata database's availability unblocks the
-implementation of many requested features for the GitLab Container Registry. These features are only
+implementation of many requested features for the GitLab container registry. These features are only
available for instances using the new version backed by the metadata database.
### Availability
@@ -238,7 +238,7 @@ This is a list of all the registry HTTP API operations and how they depend on th
| [Complete blob upload](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#put-blob-upload) | `PUT` | `/v2/<name>/blobs/uploads/<uuid>` | **{check-circle}** Yes | **{check-circle}** Yes | **{dotted-circle}** No |
| [Cancel blob upload](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/api.md#canceling-an-upload) | `DELETE` | `/v2/<name>/blobs/uploads/<uuid>` | **{check-circle}** Yes | **{check-circle}** Yes | **{dotted-circle}** No |
-`*` Please refer to the [list of interactions between registry and Rails](#from-gitlab-rails-to-registry) to know why and how.
+`*` Refer to the [list of interactions between registry and Rails](#from-gitlab-rails-to-registry) to know why and how.
#### Failure Scenarios
@@ -294,29 +294,29 @@ Together, these resources should provide an adequate level of insight into the r
#### Third-Party Container Registries
-GitLab ships with the GitLab Container Registry by default, but it's also compatible with third-party registries, as long as they comply with the [Docker Distribution V2 Specification](https://docs.docker.com/registry/spec/api/), now superseded by the [Open Container Initiative (OCI) Image Specification](https://github.com/opencontainers/image-spec/blob/master/spec.md).
+GitLab ships with the GitLab container registry by default, but it's also compatible with third-party registries, as long as they comply with the [Docker Distribution V2 Specification](https://docs.docker.com/registry/spec/api/), now superseded by the [Open Container Initiative (OCI) Image Specification](https://github.com/opencontainers/image-spec/blob/master/spec.md).
So far, we have tried to maintain full compatibility with third-party registries when adding new features. For example, in 12.8, we introduced a new [tag delete feature](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/23325) to delete a single tag without deleting the underlying manifest. Because this feature is not part of the Docker or OCI specifications, we have kept the previous behavior as a fallback option to maintain compatibility with third-party registries.
-However, this will likely change in the future. Apart from online garbage collection, and as described in [challenges](#challenges), the metadata database will unblock the implementation of many requested features for the GitLab Container Registry in the mid/long term. Most of these features will only be available for instances using the GitLab Container Registry. They are not part of the Docker Distribution or OCI specifications, neither we will be able to provide a compatible fallback option.
+However, this will likely change in the future. Apart from online garbage collection, and as described in [challenges](#challenges), the metadata database will unblock the implementation of many requested features for the GitLab container registry in the mid/long term. Most of these features will only be available for instances using the GitLab container registry. They are not part of the Docker Distribution or OCI specifications, nor will we be able to provide a compatible fallback option.
-For this reason, any features that require the use of the GitLab Container Registry will be disabled if using a third-party registry, for as long as third-party registries continue to be supported.
+For this reason, any features that require the use of the GitLab container registry will be disabled if using a third-party registry, for as long as third-party registries continue to be supported.
#### Synchronizing Changes With GitLab Rails
-Currently, the GitLab Rails and GitLab Container Registry releases and deployments have been fully independent, as we have not introduced any new API features or breaking changes, apart from the described tag delete feature.
+Currently, the GitLab Rails and GitLab container registry releases and deployments have been fully independent, as we have not introduced any new API features or breaking changes, apart from the described tag delete feature.
The registry will remain independent from GitLab Rails changes, but in the mid/long term, the implementation of new features or breaking changes will imply a corresponding change in GitLab Rails, so the latter will depend on a specific minimum version of the registry.
For example, to track the size of each repository, we may extend the metadata database to store that information and then propagate it to GitLab Rails by extending the HTTP API that it consumes. In GitLab Rails, this new information would likely be stored in its database and processed to offer a new feature at the UI/API level.
-This kind of changes will require a synchronization between the GitLab Rails and the GitLab Container Registry releases and deployments, as the former will depend on a specific version of the latter.
+These kinds of changes will require synchronization between the GitLab Rails and the GitLab container registry releases and deployments, as the former will depend on a specific version of the latter.
##### Feature Toggling
All GitLab Rails features dependent on a specific version of the registry should be guarded by validating the registry vendor and version.
-This is already done to determine whether a tag should be deleted using the new tag delete feature (only available in the GitLab Container Registry v2.8.1+) or the old method. In this case, GitLab Rails sends an `OPTIONS` request to the registry tag route to determine whether the `DELETE` method is supported or not.
+This is already done to determine whether a tag should be deleted using the new tag delete feature (only available in the GitLab container registry v2.8.1+) or the old method. In this case, GitLab Rails sends an `OPTIONS` request to the registry tag route to determine whether the `DELETE` method is supported or not.
Alternatively, and as the universal long-term solution, we need to determine the registry vendor, version, and supported features (the last two are only applicable if the vendor is GitLab) and persist them in the GitLab Rails database. This information can then be used in real time to toggle features or fall back to alternative methods, if possible. The initial implementation of this approach was introduced as part of [#204839](https://gitlab.com/gitlab-org/gitlab/-/issues/204839). Currently, it's only used for metrics purposes. Further improvements are required to guarantee that the version information is kept up to date in self-managed instances, where the registry may be hot swapped.
@@ -324,7 +324,7 @@ Alternatively, and as the universal long-term solution, we need to determine the
As described above, feature toggling offers a last line of defense against desynchronized releases and deployments, ensuring that GitLab Rails remains functional in case the registry version that supports new features is not yet available.
-However, the release and deployment of GitLab Rails and the GitLab Container Registry should be synchronized to avoid any delays. Contrary to GitLab Rails, the registry release and deployment are manual processes, so special attention must be paid by maintainers to ensure that the GitLab Rails changes are only released and deployed after the corresponding registry changes.
+However, the release and deployment of GitLab Rails and the GitLab container registry should be synchronized to avoid any delays. Contrary to GitLab Rails, the registry release and deployment are manual processes, so special attention must be paid by maintainers to ensure that the GitLab Rails changes are only released and deployed after the corresponding registry changes.
As a solution to strengthen this process, a file can be added to the GitLab Rails codebase, containing the minimum required version of the registry. This file should be updated with every change that depends on a specific version of the registry. It should also be considered when releasing and deploying GitLab Rails, ensuring that the pipeline only goes through once the specified minimum required registry version is deployed.
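As a sketch only (the file name, location, and format are assumptions, not the actual implementation), such a version pin in the GitLab Rails codebase could look like:

```yaml
# Hypothetical file: config/container_registry_version.yml
# Minimum container registry version required by this GitLab Rails release.
container_registry:
  minimum_version: "2.8.1"
```

Release and deployment tooling could then read this file and block the GitLab Rails pipeline until a registry of at least that version is deployed.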
diff --git a/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md b/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md
index d91f2fdddbf..bfc5c4a7133 100644
--- a/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md
+++ b/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md
@@ -11,11 +11,11 @@ participating-stages: []
<!-- Blueprints often contain forward-looking statements -->
<!-- vale gitlab.FutureTense = NO -->
-# Container Registry Self-Managed Database Rollout
+# Container registry self-managed database rollout
## Summary
-The latest iteration of the [Container Registry](https://gitlab.com/gitlab-org/container-registry)
+The latest iteration of the [container registry](https://gitlab.com/gitlab-org/container-registry)
has been rearchitected to use a PostgreSQL database and deployed on GitLab.com.
Now we must bring the advantages provided by the database to self-managed users.
While the container registry retains the capacity to run without the new database,
@@ -45,12 +45,12 @@ of the container registry for both GitLab.com and for self-managed users.
- Progressively rollout the new dependency of a PostgreSQL database instance for the registry for charts and omnibus deployments.
- Progressively rollout automation for the registry PostgreSQL database instance for charts and omnibus deployments.
- Develop processes and tools that self-managed admins can use to migrate existing registry deployments to the metadata database.
-- Develop processes and tools that self-managed admins can use spin up fresh installs of the Container Registry which use the metadata database.
+- Develop processes and tools that self-managed admins can use to spin up fresh installs of the container registry which use the metadata database.
- Create a plan that will eventually allow us to fully drop support for the original object storage metadata subsystem.
### Non-Goals
-- Developing new Container Registry features outside the scope of enabling admins to migrate to the metadata database.
+- Developing new container registry features outside the scope of enabling admins to migrate to the metadata database.
- Determining lifecycle support decisions, such as when to default to the database, and when to end support for non-database registries.
## Proposal
@@ -88,7 +88,7 @@ The metadata database is in early beta for self-managed users. The core migratio
process for existing registries has been implemented, and online garbage collection
is fully implemented. Certain database enabled features are only enabled for GitLab.com
and automatic database provisioning for the registry database is not available.
-Please see the table below for the status of features related to the container
+See the table below for the status of features related to the container
registry database.
| Feature | Description | Status | Link |
@@ -161,7 +161,7 @@ import which would lead to greater consistency across all storage driver impleme
### The Import Tool
The [import tool](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/database-import-tool.md)
-is a well-validated component of the Container Registry project that we have used
+is a well-validated component of the container registry project that we have used
from the beginning as a way to perform local testing. This tool is a thin wrapper
over the core import functionality — the code which handles the import logic has
been extensively validated.
diff --git a/doc/architecture/blueprints/database/automated_query_analysis/index.md b/doc/architecture/blueprints/database/automated_query_analysis/index.md
index 40f6b2af412..cf136d7650e 100644
--- a/doc/architecture/blueprints/database/automated_query_analysis/index.md
+++ b/doc/architecture/blueprints/database/automated_query_analysis/index.md
@@ -4,7 +4,7 @@ creation-date: "2023-02-08"
authors: [ "@mattkasa", "@jon_jenkins" ]
coach: "@DylanGriffith"
approvers: [ "@rogerwoo", "@alexives" ]
-owning-stage: "~devops::data_stores"
+owning-stage: "~devops::data stores"
participating-stages: []
---
diff --git a/doc/architecture/blueprints/database/scalability/patterns/index.md b/doc/architecture/blueprints/database/scalability/patterns/index.md
index d28734ce511..88f6914b01b 100644
--- a/doc/architecture/blueprints/database/scalability/patterns/index.md
+++ b/doc/architecture/blueprints/database/scalability/patterns/index.md
@@ -1,7 +1,7 @@
---
stage: Data Stores
group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
description: 'Learn how to scale the database through the use of best-of-class database scalability patterns'
---
diff --git a/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md b/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
index 3a3fd2f33c2..562b62e1e44 100644
--- a/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
+++ b/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
@@ -1,7 +1,7 @@
---
stage: Data Stores
group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
description: 'Learn how to scale operating on read-mostly data at scale'
---
diff --git a/doc/architecture/blueprints/database/scalability/patterns/time_decay.md b/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
index 24fc3f45717..a2fbcf35a01 100644
--- a/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
+++ b/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
@@ -1,7 +1,7 @@
---
stage: Data Stores
group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
description: 'Learn how to operate on large time-decay data'
---
diff --git a/doc/architecture/blueprints/database_testing/index.md b/doc/architecture/blueprints/database_testing/index.md
index 79560dd3959..8ad940d3e86 100644
--- a/doc/architecture/blueprints/database_testing/index.md
+++ b/doc/architecture/blueprints/database_testing/index.md
@@ -4,7 +4,7 @@ creation-date: "2021-02-08"
authors: [ "@abrandl" ]
coach: "@glopezfernandez"
approvers: [ "@fabian", "@craig-gomes" ]
-owning-stage: "~devops::data_stores"
+owning-stage: "~devops::data stores"
participating-stages: []
---
diff --git a/doc/architecture/blueprints/gitaly_adaptive_concurrency_limit/index.md b/doc/architecture/blueprints/gitaly_adaptive_concurrency_limit/index.md
index 2bd121a34bb..f3335a0935e 100644
--- a/doc/architecture/blueprints/gitaly_adaptive_concurrency_limit/index.md
+++ b/doc/architecture/blueprints/gitaly_adaptive_concurrency_limit/index.md
@@ -97,8 +97,8 @@ constraints and distinguishing features, including cgroup utilization and
upload-pack RPC, among others.
The proposed solution does not aim to replace the existing limits in Gitaly
-for [RPC concurrency](../../../administration/gitaly/configure_gitaly.md#limit-rpc-concurrency)
-and [pack object concurrency](../../../administration/gitaly/configure_gitaly.md#limit-pack-objects-concurrency),
+for [RPC concurrency](../../../administration/gitaly/concurrency_limiting.md#limit-rpc-concurrency)
+and [pack object concurrency](../../../administration/gitaly/concurrency_limiting.md#limit-pack-objects-concurrency),
but automatically tweak the parameters. This means
that other aspects, such as queuing, in-queue timeout, queue length,
partitioning, and scoping, will remain unchanged. The proposed solution only
diff --git a/doc/architecture/blueprints/gitaly_handle_upload_pack_in_http2_server/index.md b/doc/architecture/blueprints/gitaly_handle_upload_pack_in_http2_server/index.md
index acee83b2649..897f9f97365 100644
--- a/doc/architecture/blueprints/gitaly_handle_upload_pack_in_http2_server/index.md
+++ b/doc/architecture/blueprints/gitaly_handle_upload_pack_in_http2_server/index.md
@@ -28,7 +28,7 @@ provided. We are looking for a solution that won't require us to completely rewr
### How Git data transfer works
-Please skip this part if you are familiar with how Git data transfer architecture at GitLab.
+Skip this part if you are familiar with how the Git data transfer architecture at GitLab works.
Git data transfer is undeniably one of the crucial services that a Git server can offer. It is a fundamental feature of Git that was originally developed for Linux
kernel development. As Git gained popularity, it continued to be recognized as a distributed system. However, the emergence of centralized Git services like GitHub or
diff --git a/doc/architecture/blueprints/gitlab_services/img/architecture.png b/doc/architecture/blueprints/gitlab_services/img/architecture.png
index 8ec0852e12b..3bcc18f1264 100644
--- a/doc/architecture/blueprints/gitlab_services/img/architecture.png
+++ b/doc/architecture/blueprints/gitlab_services/img/architecture.png
Binary files differ
diff --git a/doc/architecture/blueprints/gitlab_services/index.md b/doc/architecture/blueprints/gitlab_services/index.md
index c2f1d08a984..8acdae9f29e 100644
--- a/doc/architecture/blueprints/gitlab_services/index.md
+++ b/doc/architecture/blueprints/gitlab_services/index.md
@@ -22,15 +22,15 @@ As GitLab works towards providing a single platform for the whole DevSecOps cycl
its offering should not stop at pipelines, but should include the deployment and release management, as well as
observability of user-developed and third party applications.
-While GitLab offers some concepts, like the `environment` syntax in GitLab pipelines,
-it does not offer any concept on what is running in a given environment. While the environment might answer the "where" is
+While GitLab offers some concepts, like the `environment` syntax in GitLab pipelines,
+it does not offer any concept of what is running in a given environment. While the environment might answer the "where" is
something running, it does not answer the question of "what" is running there. We should
-introduce [service](https://about.gitlab.com/direction/delivery/glossary.html#service) and [release artifact](https://about.gitlab.com/direction/delivery/glossary.html#release) to answer this question. The [Delivery glossary](https://about.gitlab.com/direction/delivery/glossary.html#service) defines
+introduce [service](https://about.gitlab.com/direction/delivery/glossary.html#service) and [release artifact](https://about.gitlab.com/direction/delivery/glossary.html#release) to answer this question. The [Delivery glossary](https://about.gitlab.com/direction/delivery/glossary.html#service) defines
a service as
> a logical concept that is a (mostly) independently deployable part of an application that is loosely coupled with other services to serve specific functionalities for the application.
-A service would connect to the SCM, registry or issues through release artifacts and would be a focused view into the [environments](https://about.gitlab.com/direction/delivery/glossary.html#environment) where
+A service would connect to the SCM, registry or issues through release artifacts and would be a focused view into the [environments](https://about.gitlab.com/direction/delivery/glossary.html#environment) where
a specific version of the given release artifact is deployed (or being deployed).
Having a concept of services allows our users to track their applications in production, not only in CI/CD pipelines. This opens up possibilities, like cost management.
diff --git a/doc/architecture/blueprints/gitlab_steps/data.drawio.png b/doc/architecture/blueprints/gitlab_steps/data.drawio.png
new file mode 100644
index 00000000000..59436093fb7
--- /dev/null
+++ b/doc/architecture/blueprints/gitlab_steps/data.drawio.png
Binary files differ
diff --git a/doc/architecture/blueprints/gitlab_steps/decisions/001_initial_support.md b/doc/architecture/blueprints/gitlab_steps/decisions/001_initial_support.md
new file mode 100644
index 00000000000..669e16ba625
--- /dev/null
+++ b/doc/architecture/blueprints/gitlab_steps/decisions/001_initial_support.md
@@ -0,0 +1,30 @@
+---
+owning-stage: "~devops::verify"
+description: 'GitLab Steps ADR 001: Bootstrap Step Runner'
+---
+
+# GitLab Steps ADR 001: Bootstrap Step Runner
+
+## Context
+
+[GitLab Steps](../index.md) is a new feature that does not have any prior usage at GitLab.
+We decided that there are two important objectives at this stage of the project:
+
+- Integrate the project into existing CI pipelines for the purpose of user evaluation as part of an [Experiment](../../../../policy/experiment-beta-support.md#experiment) phase.
+- Provide a contribution framework for other developers in the form of a project with contribution guidelines.
+
+## Decision
+
+The [GitLab Steps: Iteration 1: Bootstrap Step Runner (MVC)](https://gitlab.com/groups/gitlab-org/-/epics/11736)
+was created to achieve the following objectives:
+
+- We defined the initial plan to bootstrap the project.
+- The project will be stored in [`gitlab-org/step-runner`](https://gitlab.com/gitlab-org/step-runner).
+- We will implement the [Step Definition](../step-definition.md) as a [Protocol Buffer](https://protobuf.dev/). The initial implementation is described in the [Baseline Step Proto](../implementation.md).
+- Usage of [Protocol Buffers](https://protobuf.dev/) will provide strong guards for the minimal required definition to be used by the project.
+- We will provide documentation on how to use GitLab Steps in existing CI pipelines.
+
+## Alternatives
+
+No alternatives were considered at this phase, since there's no pre-existing work at GitLab
+for that type of feature.
diff --git a/doc/architecture/blueprints/gitlab_steps/implementation.md b/doc/architecture/blueprints/gitlab_steps/implementation.md
new file mode 100644
index 00000000000..d8480cfb2be
--- /dev/null
+++ b/doc/architecture/blueprints/gitlab_steps/implementation.md
@@ -0,0 +1,339 @@
+---
+owning-stage: "~devops::verify"
+description: Implementation details for [CI Steps](index.md).
+---
+
+# Design and implementation details
+
+## Baseline Step Proto
+
+The internals of Step Runner operate on the baseline step definition,
+which is defined as a Protocol Buffer message. All GitLab CI steps (and other
+supported formats such as GitHub Actions) compile / fold to baseline steps.
+Both step invocations in `.gitlab-ci.yml` and step definitions
+in `step.yml` files will be compiled to baseline structures.
+The term "step" means "baseline step" for the remainder of this document.
+
+Each step includes a reference `ref` in the form of a URI. The method of
+retrieval is determined by the protocol of the URI.
+
+Steps and step traces have fields for inputs, outputs,
+environment variables and environment exports.
+After steps are downloaded and the `step.yml` is parsed,
+a step definition `def` will be added.
+If a step defines multiple additional steps then the
+trace will include sub-traces for each sub-step.
+
+```protobuf
+message Step {
+ string name = 1;
+ string step = 2;
+ map<string,string> env = 3;
+ map<string,google.protobuf.Value> inputs = 4;
+}
+
+message Definition {
+ DefinitionType type = 1;
+ Exec exec = 2;
+ repeated Step steps = 3;
+ message Exec {
+ repeated string command = 1;
+ string work_dir = 2;
+ }
+}
+
+enum DefinitionType {
+ definition_type_unspecified = 0;
+ exec = 1;
+ steps = 2;
+}
+
+message Spec {
+ Content spec = 1;
+ message Content {
+ map<string,Input> inputs = 1;
+ message Input {
+ InputType type = 1;
+ google.protobuf.Value default = 2;
+ }
+ }
+}
+
+enum InputType {
+ spec_type_unspecified = 0;
+ string = 1;
+ number = 2;
+ bool = 3;
+ struct = 4;
+ list = 5;
+}
+
+message StepResult {
+ Step step = 1;
+ Spec spec = 2;
+ Definition def = 3;
+ enum Status {
+ unspecified = 0;
+ running = 1;
+ success = 2;
+ failure = 3;
+ }
+ Status status = 4;
+ map<string,Output> outputs = 5;
+ message Output {
+ string key = 1;
+ string value = 2;
+ bool masked = 3;
+ }
+ map<string,string> exports = 6;
+ int32 exit_code = 7;
+ repeated StepResult children_step_results = 8;
+}
+```
+
+## Step Caching
+
+Steps are cached locally by a key composed of `location`
+(URL), `version` and `hash`. This prevents the exact same component
+from being downloaded multiple times. The first time a step is
+referenced it will be downloaded (unless local) and the cache will
+return the path to the folder containing `step.yml` and the other
+step files. If the same step is referenced again, the same folder
+will be returned without downloading.
+
+If a step is referenced which differs by version or hash from another
+cached step, it will be re-downloaded into a different folder and
+cached separately.
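+
+For illustration, the local cache could be laid out like this (the directory
+names are hypothetical; only the `location`/`version`/`hash` keying scheme
+is prescribed):
+
+```plaintext
+<cache-root>/
+  gitlab.com/components/script/   # location (URL)
+    v1.0.0-3c9a1f.../             # version and hash
+      step.yml
+      ...
+```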
+
+## Execution Context
+
+State is kept by Step Runner across all steps in the form of
+an execution context. The context contains the output of each step,
+environment variables and overall job and environment metadata.
+The execution context can be referenced by expressions in
+GitLab CI steps provided by the workflow author.
+
+Example of context available to expressions in `.gitlab-ci.yml`:
+
+```yaml
+steps:
+ previous_step:
+ outputs:
+ name: "hello world"
+env:
+ EXAMPLE_VAR: "bar"
+job:
+ id: 1234
+```
+
+Expressions in step definitions can also reference execution
+context. However, they can only access overall
+job and environment metadata and the inputs defined in `step.yml`.
+They cannot access the outputs of previous steps. In order to
+provide the output of one step to the next, the step input
+values should include an expression which references another
+step's output.
+
+Example of context available to expressions in `step.yml`:
+
+```yaml
+inputs:
+ name: "foo"
+env:
+ EXAMPLE_VAR: "bar"
+job:
+ id: 1234
+```
+
+For example, this is not allowed in a `step.yml` file because steps
+should not be coupled to one another:
+
+```yaml
+spec:
+ inputs:
+ name:
+---
+type: exec
+exec:
+ command: [echo, hello, ${{ steps.previous_step.outputs.name }}]
+```
+
+This is allowed because the GitLab CI steps syntax passes data
+from one step to another:
+
+```yaml
+spec:
+ inputs:
+ name:
+---
+type: exec
+exec:
+ command: [echo, hello, ${{ inputs.name }}]
+```
+
+```yaml
+steps:
+- name: previous_step
+ ...
+- name: greeting
+ inputs:
+ name: ${{ steps.previous_step.outputs.name }}
+```
+
+Therefore evaluation of expressions will be done in two different kinds
+of context: one as a GitLab CI step and one as a step definition.
+
+### Step Inputs
+
+Step inputs can be given in several ways. They can be embedded
+directly into expressions in an `exec` command (as above), or they
+can be embedded in expressions for environment variables set during
+exec:
+
+```yaml
+spec:
+ inputs:
+ name:
+---
+type: exec
+exec:
+ command: [greeting.sh]
+env:
+ NAME: ${{ inputs.name }}
+```
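+
+A workflow could then supply the input from `.gitlab-ci.yml`, and Step Runner
+would expand `${{ inputs.name }}` into the `NAME` variable before running
+`greeting.sh` (the step reference here is hypothetical):
+
+```yaml
+steps:
+- name: greet
+  step: ./greeting
+  inputs:
+    name: world
+```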
+
+### Input Types
+
+Input values are stored as strings, but they can also have a type
+associated with them. Supported types are:
+
+- `string`
+- `bool`
+- `number`
+- `object`
+
+String type values can be any string. Bool type values must be either `true`
+or `false` when parsed as JSON. Number type values must be a valid float64
+when parsed as JSON. Object types will be a JSON serialization of
+the YAML input structure.
+
+For example, these would be valid inputs:
+
+```yaml
+steps:
+- name: my_step
+ inputs:
+ foo: bar
+ baz: true
+ bam: 1
+```
+
+Given this step definition:
+
+```yaml
+spec:
+ inputs:
+ foo:
+ type: string
+ baz:
+ type: bool
+ bam:
+ type: number
+---
+type: exec
+exec:
+ command: [echo, ${{ inputs.foo }}, ${{ inputs.baz }}, ${{ inputs.bam }}]
+```
+
+It would output `bar true 1`.
+
+For an object type, these would be valid inputs:
+
+```yaml
+steps:
+- name: my_step
+ inputs:
+ foo:
+ steps:
+ - name: my_inner_step
+ inputs:
+ name: steppy
+```
+
+Given this step definition:
+
+```yaml
+spec:
+ inputs:
+ foo:
+ type: object
+---
+type: exec
+exec:
+ command: [echo, ${{ inputs.foo }}]
+```
+
+It would output `{"steps":[{"name":"my_inner_step","inputs":{"name":"steppy"}}]}`.
+
+### Outputs
+
+Step Runner creates output files into which steps can write their
+outputs and environment variable exports. The file locations are
+provided in the `OUTPUT_FILE` and `ENV_FILE` environment variables.
+
+After execution Step Runner will read the output and environment
+variable files and populate the trace with their values. The
+outputs will be stored under the context for the executed step,
+and the exported environment variables will be merged with the environment
+provided to the next step.
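+
+For illustration, assuming a simple `key=value` file format (the actual
+format is an implementation detail), the two files might contain:
+
+```plaintext
+# $OUTPUT_FILE, read back into the step's outputs in the trace
+name=steppy
+
+# $ENV_FILE, merged into the environment of subsequent steps
+GREETED=true
+```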
+
+Some steps can be of type `steps` and be composed of a sequence
+of GitLab CI steps. These will be compiled and executed in sequence.
+Any environment variables exported by nested steps will be available
+to subsequent steps, and will be available to higher-level steps
+when the nested steps are complete. That is, entering nested steps does
+not create a new "scope" or context object. Environment variables
+are global.
+
+## Containers
+
+We've tried a couple of approaches to running steps in containers.
+In the end we've decided to delegate steps entirely to a step runner
+in the container.
+
+Here are the options considered:
+
+### Delegation (chosen option)
+
+A provision is made for passing complex structures to steps, which
+is to serialize them as JSON (see Inputs above). In this way the actual
+step to be run can be merely a parameter to a step running in a container.
+So the outer step is a `docker/run` step with a command that executes
+`step-runner` with a `steps` input parameter. The `docker/run` step will
+run the container and then extract the output files from the container
+and re-emit them to the outer steps.
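+
+As a sketch, the outer step of a containerized job could look like this
+(the `docker/run` step reference and its inputs are illustrative):
+
+```yaml
+steps:
+- name: isolated
+  step: gitlab.com/components/docker/run
+  inputs:
+    image: alpine:latest
+    steps: |
+      - name: hello
+        step: ./greeting
+        inputs:
+          name: world
+```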
+
+This same technique will work for running steps in VMs or other
+isolation mechanisms. Step Runner doesn't have to know anything about
+containerizing or isolating steps.
+
+### Special Compilation (rejected option)
+
+When we see the `image` keyword in a GitLab CI step we would download
+and compile the "target" step. Then manufacture a `docker/run` step
+and pass the compiled `exec` command as an input. Then we would compile
+the `docker/run` step and execute it.
+
+However this requires Step Runner to know how to construct a `docker/run`
+step, which couples Step Runner with the method of isolation, making
+isolation in VMs and other methods more complicated.
+
+### Native Docker (rejected option)
+
+The baseline step can include provisions for running a step in a
+Docker container. For example the step could include a `ref` "target"
+field and an `image` field.
+
+However this also couples Step Runner with Docker and expands the role
+of Step Runner. It is preferable to make Docker an external step
+that Step Runner execs in the same way as any other step.
diff --git a/doc/architecture/blueprints/gitlab_steps/index.md b/doc/architecture/blueprints/gitlab_steps/index.md
index 5e3becfec19..f43af46d3c7 100644
--- a/doc/architecture/blueprints/gitlab_steps/index.md
+++ b/doc/architecture/blueprints/gitlab_steps/index.md
@@ -1,7 +1,7 @@
---
status: proposed
creation-date: "2023-08-23"
-authors: [ "@ayufan" ]
+authors: [ "@ayufan", "@josephburnett" ]
coach: "@grzegorz"
approvers: [ "@dhershkovitch", "@DarrenEastman", "@cheryl.li" ]
owning-stage: "~devops::verify"
@@ -57,7 +57,7 @@ have to be part of CI syntax. Instead, they can be provided in the form of reusa
that are configured in a generic way in the CI config, and later downloaded and executed according
to inputs and parameters.
-The GitLab Steps is meant to fill that product-gap by following similar model to competitors
+GitLab Steps is meant to fill that product gap by following a similar model to competitors
and to some extent staying compatible with them. The GitLab Steps is meant to replace all
purpose-specific syntax to handle specific features. By providing and using reusable components,
that are build outside of `.gitlab-ci.yml`, that are versioned, and requested when needed
@@ -131,19 +131,109 @@ TBD
## Proposal
+Step Runner will be a new Go binary that lives at `https://gitlab.com/gitlab-org/step-runner`.
+It will be able to accept a number of input formats which are compiled to a standard proto format.
+Output will be a standard proto trace which will include details for debugging and reproducing the build.
+
+### Capabilities
+
+- Read steps
+ - from environment variable
+ - from `.gitlab-ci.yml` file
+ - from gRPC server in step-runner
+  - from command-line JSON input
+- Compile GitLab Steps and GitHub Actions to a baseline step definition
+ - explicit inputs and outputs
+ - explicit environment and exports
+  - baseline steps can be of type `exec` or composed of more steps
+- Download and run steps from:
+ - Git repos
+ - zip files
+ - locally provided
+- A job can be composed of different kinds of steps
+ - steps can come from different sources and be run in different ways
+ - steps can access environment exports and output of previous steps
+- Produce a step-by-step trace of execution
+ - including final inputs and outputs
+ - including final environment and exports
+ - including logs of each step
+ - each step specifies the exact runtime and component used (hash)
+ - (optional) masking sensitive inputs, outputs, environment and exports
+- Replaying a trace
+ - reuses the exact runtimes and components from trace
+ - output of trace will be the same trace if build is deterministic
+
+### Example invocations
+
+#### Command line
+
+- `STEPS=$(cat steps.yml) step-runner ci`
+- `step-runner local .gitlab-ci.yml --format gitlab-ci --job-name hello-world --output-file trace.json`
+- `step-runner replay trace.json`
+- `step-runner ci --port 8080`
+
+#### GitLab CI
+
+```yaml
+hello-world:
+ image: registry.gitlab.com/gitlab-org/step-runner
+ variables:
+ STEPS: |
+ - step: gitlab.com/josephburnett/component-hello-steppy@master
+ inputs:
+ greeting: "hello ${{ env.name }}"
+ env:
+ name: world
+ script:
+ - /step-runner ci
+ artifacts:
+ paths:
+ - trace.json
+```
+
+### Basic compilation and execution process
+
+Steps as expressed in GitLab CI are compiled to the baseline step definition.
+Referenced steps are loaded and compiled to produce an `exec` command,
+or to produce an additional list of GitLab CI steps which are compiled recursively.
+Each step is executed immediately after compilation so its output will be available for subsequent compilations.
+
+![diagram of data during compilation](data.drawio.png)
+
+Steps return outputs and exports via files which are collected by Step Runner after each step.
+Finally, all the compiled inputs and outputs for each step are collected in a step trace.
+
+![sequence diagram of step runner compilation and execution](step-runner-sequence.drawio.png)
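+
+For illustration, a single entry of the resulting trace might serialize to
+JSON like this (field names follow the `StepResult` message in the
+[Baseline Step Proto](implementation.md); values are illustrative):
+
+```json
+{
+  "step": {"name": "greeting", "step": "./greeting", "inputs": {"name": "world"}},
+  "status": "success",
+  "outputs": {"name": {"key": "name", "value": "hello world", "masked": false}},
+  "exports": {},
+  "exit_code": 0,
+  "children_step_results": []
+}
+```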
+
### GitLab Steps definition and syntax
- [Step Definition](step-definition.md).
- [Syntactic Sugar extensions](steps-syntactic-sugar.md).
-### Integration of GitLab Steps in `.gitlab-ci.yml`
+### Integration of GitLab Steps
- [Usage of the GitLab Steps with `.gitlab-ci.yml`](gitlab-ci.md).
+- [Runner Integration](runner-integration.md).
## Design and implementation details
-TBD
+### 2023-11-28 - GitLab Steps ADR 001: Bootstrap Step Runner
+
+- See the [GitLab Steps ADR 001: Bootstrap Step Runner](decisions/001_initial_support.md).
+- See the [Baseline Step Proto](implementation.md).
## References
+- [GitLab Issue #215511](https://gitlab.com/gitlab-org/gitlab/-/issues/215511)
+- [Step Runner Code](https://gitlab.com/josephburnett/step-runner/-/tree/blueprint2).
+ This is the exploratory code created during the writing of this blueprint.
+ It shows the structure of the Step Runner binary and how the pieces fit together.
+ It runs but doesn't quite do the right thing (see all the TODOs).
+- [CI Steps / CI Events / Executors / Taskonaut (video)](https://youtu.be/nZoO547IISM).
+ Some high-level discussion about how these 4 blueprints relate to each other.
+ And a good prequel to the video about this MR.
+- [Steps in Runner (video)](https://youtu.be/82WLQ4zHYts).
+ A walk through of the Step Runner details from the code perspective.
+- [CI YAML keywords](https://gitlab.com/gitlab-org/gitlab/-/issues/398129#note_1324467337).
+ An inventory of affected keywords.
- [GitLab Epic 11535](https://gitlab.com/groups/gitlab-org/-/epics/11535)
diff --git a/doc/architecture/blueprints/gitlab_steps/runner-integration.md b/doc/architecture/blueprints/gitlab_steps/runner-integration.md
new file mode 100644
index 00000000000..e5a635908bc
--- /dev/null
+++ b/doc/architecture/blueprints/gitlab_steps/runner-integration.md
@@ -0,0 +1,116 @@
+---
+owning-stage: "~devops::verify"
+description: Runner integration for [CI Steps](index.md).
+---
+
+# Runner Integration
+
+Steps are delivered to Step Runner as a YAML blob in the GitLab CI syntax.
+Runner interacts with Step Runner over a gRPC service `StepRunner`
+which is started on a local socket in the execution environment. This
+is the same way that Nesting serves a gRPC service in a dedicated
+Mac instance. The service has three RPCs, `run`, `follow` and `cancel`.
+
+`run` is the initial delivery of the steps. `follow` requests a streaming
+response of step traces, and `cancel` stops execution and cleans up
+resources as soon as possible.
+
+Step Runner operating in gRPC mode will be able to execute multiple
+step payloads at once. That is, each call to `run` will start a new
+goroutine and execute the steps until completion. Multiple calls to `run`
+may be made simultaneously. This is also why components are cached by
+`location`, `version` and `hash`: we cannot change which
+ref we are on while multiple concurrent executions are using the
+underlying files.
+
+```proto
+service StepRunner {
+ rpc Run(RunRequest) returns (RunResponse);
+ rpc Follow(FollowRequest) returns (stream FollowResponse);
+ rpc Cancel(CancelRequest) returns (CancelResponse);
+}
+
+message RunRequest {
+ string id = 1;
+ oneof job_oneof {
+ string ci_job = 2;
+ Steps steps = 3;
+ }
+}
+
+message RunResponse {
+}
+
+message FollowRequest {
+ string id = 1;
+}
+
+message FollowResponse {
+ StepResult result = 1;
+}
+
+message CancelRequest {
+ string id = 1;
+}
+
+message CancelResponse {
+}
+```
+
+As steps are executed, traces are streamed back to GitLab Runner,
+so execution can be followed at least at the step level. If a more
+granular follow is required, we can introduce a gRPC step type which
+can stream back logs as they are produced.
+
+Here is how we will connect to Step Runner in each runner executor:
+
+## Instance
+
+The Instance executor is accessed via SSH, the same as today. However,
+instead of starting a bash shell and piping in commands, it connects
+to the Step Runner socket in a known location and makes gRPC
+calls. This is the same as how Runner calls the Nesting server in
+dedicated Mac instances to make VMs.
+
+This requires that Step Runner is present and started in the job
+execution environment.
+
+## Docker
+
+The same requirement that Step Runner is present and started is true
+for the Docker executor (and `docker-autoscaler`). However, in order to
+connect to the socket inside the container, we must `exec` a bridge
+process in the container. This will be another command on the Step
+Runner binary which proxies STDIN and STDOUT to the local socket in a
+known location, allowing the caller of exec to make gRPC calls inside
+the container.
+
+## Kubernetes
+
+The Kubelet on Kubernetes Nodes exposes an exec API which will start a
+process in a container of a running Pod. We will use this exec to create
+a bridge process that will allow the caller to make gRPC calls inside
+the Pod, the same as for the Docker executor.
+
+In order to access this protected Kubelet API we must use the
+Kubernetes API, which provides an exec sub-resource on Pod. A caller
+can POST to the URL of a pod suffixed with `/exec` and then negotiate
+the connection up to a SPDY protocol for bidirectional byte
+streaming. So GitLab Runner can use the Kubernetes API to connect to
+the Step Runner service and deliver job payloads.
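+
+Schematically, the request is a standard `pods/exec` call on the Kubernetes
+API (the `proxy` subcommand of Step Runner is hypothetical):
+
+```plaintext
+POST /api/v1/namespaces/<namespace>/pods/<pod>/exec
+     ?container=build
+     &command=/step-runner&command=proxy
+     &stdin=true&stdout=true
+```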
+
+This is the same way that `kubectl exec` works. In fact most of the
+internals such as SPDY negotiation are provided by the `client-go`
+library, so Runner can call the Kubernetes API directly by
+importing the necessary libraries rather than shelling out to
+`kubectl`.
+
+Historically, one of the weaknesses of the Kubernetes executor was
+running a whole job through a single exec. To mitigate this, Runner
+uses the attach command instead, which can "re-attach" to an existing
+shell process and pick up where it left off.
+
+This is not necessary for Step Runner however, because the exec is
+just establishing a bridge to the long-running gRPC process. If the
+connection drops, Runner will just "re-attach" by exec'ing another
+connection and continuing to make RPC calls like `follow`.
diff --git a/doc/architecture/blueprints/gitlab_steps/step-runner-sequence.drawio.png b/doc/architecture/blueprints/gitlab_steps/step-runner-sequence.drawio.png
new file mode 100644
index 00000000000..9f6a6dcad9f
--- /dev/null
+++ b/doc/architecture/blueprints/gitlab_steps/step-runner-sequence.drawio.png
Binary files differ
diff --git a/doc/architecture/blueprints/google_artifact_registry_integration/index.md b/doc/architecture/blueprints/google_artifact_registry_integration/index.md
index ef66ae33b2a..0419601e266 100644
--- a/doc/architecture/blueprints/google_artifact_registry_integration/index.md
+++ b/doc/architecture/blueprints/google_artifact_registry_integration/index.md
@@ -18,9 +18,9 @@ As highlighted in the announcement, one key goal is the ability to "_use Google'
## Motivation
-Please refer to the [announcement](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/) blog post for more details about the motivation and long-term goals of the GitLab and Google Cloud partnership.
+Refer to the [announcement](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/) blog post for more details about the motivation and long-term goals of the GitLab and Google Cloud partnership.
-Regarding the scope of this design document, our primary focus is to fulfill the Product requirement of providing users with visibility over their container images in GAR. The motivation for this specific goal is rooted in foundational research on the use of external registries as a complement to the GitLab Container Registry ([internal](https://gitlab.com/gitlab-org/ux-research/-/issues/2602)).
+Regarding the scope of this design document, our primary focus is to fulfill the Product requirement of providing users with visibility over their container images in GAR. The motivation for this specific goal is rooted in foundational research on the use of external registries as a complement to the GitLab container registry ([internal](https://gitlab.com/gitlab-org/ux-research/-/issues/2602)).
Since this marks the first step in the GAR integration, our aim is to achieve this goal in a way that establishes a foundation to facilitate reusability in the future. This groundwork could benefit potential future expansions, such as support for additional artifact formats (npm, Maven, etc.), and features beyond the Package stage (e.g., vulnerability scanning, deployments, etc.).
@@ -74,7 +74,7 @@ As previously highlighted, access to the GAR integration features is restricted
#### Resource Mapping
-For the [GitLab Container Registry](../../../user/packages/container_registry/index.md), repositories within a specific project must have a path that matches the project full path. This is essentially how we establish a resource mapping between GitLab Rails and the registry, which serves multiple purposes, including granular authorization, scoping storage usage to a given project/group/namespace, and more.
+For the [GitLab container registry](../../../user/packages/container_registry/index.md), repositories within a specific project must have a path that matches the project full path. This is essentially how we establish a resource mapping between GitLab Rails and the registry, which serves multiple purposes, including granular authorization, scoping storage usage to a given project/group/namespace, and more.
Regarding the GAR integration, since there is no equivalent entities for GitLab project/group/namespace resources on the GAR side, we aim to simplify matters by allowing users to attach any [GAR repository](https://cloud.google.com/artifact-registry/docs/repositories) to any GitLab project, regardless of their respective paths. Similarly, we do not plan to restrict the attachment of a particular GAR repository to a single GitLab project. Ultimately, it is up to users to determine how to organize both datasets in the way that best suits their needs.
@@ -82,13 +82,13 @@ Regarding the GAR integration, since there is no equivalent entities for GitLab
GAR provides three APIs: Docker API, REST API, and RPC API.
-The [Docker API](https://cloud.google.com/artifact-registry/docs/reference/docker-api) is based on the [Docker Registry HTTP API V2](https://docs.docker.com/registry/spec/api), now superseded by the [OCI Distribution Specification API](https://github.com/opencontainers/distribution-spec/blob/main/spec.md) (from now on referred to as OCI API). This API is used for pushing/pulling images to/from GAR and also provides some discoverability operations. Please refer to [Alternative Solutions](#alternative-solutions) for the reasons why we don't intend to use it.
+The [Docker API](https://cloud.google.com/artifact-registry/docs/reference/docker-api) is based on the [Docker Registry HTTP API V2](https://docs.docker.com/registry/spec/api), now superseded by the [OCI Distribution Specification API](https://github.com/opencontainers/distribution-spec/blob/main/spec.md) (from now on referred to as OCI API). This API is used for pushing/pulling images to/from GAR and also provides some discoverability operations. Refer to [Alternative Solutions](#alternative-solutions) for the reasons why we don't intend to use it.
Among the proprietary GAR APIs, the [REST API](https://cloud.google.com/artifact-registry/docs/reference/rest) provides basic functionality for managing repositories. This includes [`list`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages/list) and [`get`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages/get) operations for container image repositories, which could be used for this integration. Both operations return the same data structure, represented by the [`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages#DockerImage) object, so both provide the same level of detail.
Last but not least, there is also an [RPC API](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1), backed by gRPC and Protocol Buffers. This API provides the most functionality, covering all GAR features. From the available operations, we can make use of the [`ListDockerImagesRequest`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#listdockerimagesrequest) and [`GetDockerImageRequest`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.GetDockerImageRequest) operations. As with the REST API, both responses are composed of [`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.DockerImage) objects.
-Between the two proprietary API options, we chose the RPC one because it provides support not only for the operations we need today but also offers better coverage of all GAR features, which will be beneficial in future iterations. Finally, we do not intend to make direct use of this API but rather use it through the official Ruby client SDK. Please see [Client SDK](backend.md#client-sdk) below for more details.
+Between the two proprietary API options, we chose the RPC one because it provides support not only for the operations we need today but also offers better coverage of all GAR features, which will be beneficial in future iterations. Finally, we do not intend to make direct use of this API but rather use it through the official Ruby client SDK. See [Client SDK](backend.md#client-sdk) below for more details.
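As a rough illustration only (not GitLab's actual integration code), listing images through the Ruby client SDK might look like the sketch below. It assumes the `google-cloud-artifact_registry` gem and Application Default Credentials; the environment-variable guard is an invented convenience to keep the sketch inert when no credentials are configured:

```ruby
# Hedged sketch: list container images in a GAR repository via the
# official Ruby client for the GAR RPC API.
def list_gar_images(gcp_project, location, repository)
  require "google/cloud/artifact_registry"

  client = Google::Cloud::ArtifactRegistry.artifact_registry
  parent = "projects/#{gcp_project}/locations/#{location}/repositories/#{repository}"

  # ListDockerImagesRequest returns a paged enumerable of DockerImage
  # objects, each carrying its assigned tags as a property.
  client.list_docker_images(parent: parent).each do |image|
    puts "#{image.uri} tags=#{image.tags.join(',')}"
  end
end

# Guarded so the sketch does nothing without credentials configured.
list_gar_images(ENV["GAR_DEMO_PROJECT"], "us-east1", "my-repo") if ENV["GAR_DEMO_PROJECT"]
```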
#### Backend Integration
@@ -116,6 +116,6 @@ One alternative solution considered was to use the Docker/OCI API provided by GA
- **Multiple Requests**: To retrieve all the required information about each image, multiple requests to different endpoints (listing tags, obtaining image manifests, and image configuration blobs) would have been necessary, leading to a `1+N` performance issue.
-GitLab had previously faced significant challenges with the last two limitations, prompting the development of a custom [GitLab Container Registry API](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/gitlab/api.md) to address them. Additionally, GitLab decided to [deprecate support](../../../update/deprecations.md#use-of-third-party-container-registries-is-deprecated) for connecting to third-party container registries using the Docker/OCI API due to these same limitations and the increased cost of maintaining two solutions in parallel. As a result, there is an ongoing effort to replace the use of the Docker/OCI API endpoints with custom API endpoints for all container registry functionalities in GitLab.
+GitLab had previously faced significant challenges with the last two limitations, prompting the development of a custom [GitLab container registry API](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/gitlab/api.md) to address them. Additionally, GitLab decided to [deprecate support](../../../update/deprecations.md#use-of-third-party-container-registries-is-deprecated) for connecting to third-party container registries using the Docker/OCI API due to these same limitations and the increased cost of maintaining two solutions in parallel. As a result, there is an ongoing effort to replace the use of the Docker/OCI API endpoints with custom API endpoints for all container registry functionalities in GitLab.
Considering these factors, the decision was made to build the GAR integration from scratch using the proprietary GAR API. This approach provides more flexibility and control over the integration and can serve as a foundation for future expansions, such as support for other GAR artifact formats.
diff --git a/doc/architecture/blueprints/google_artifact_registry_integration/ui_ux.md b/doc/architecture/blueprints/google_artifact_registry_integration/ui_ux.md
index 5cb862d50e7..f82889b9ccf 100644
--- a/doc/architecture/blueprints/google_artifact_registry_integration/ui_ux.md
+++ b/doc/architecture/blueprints/google_artifact_registry_integration/ui_ux.md
@@ -8,9 +8,9 @@ description: 'UI/UX for Google Artifact Registry Integration'
## Structure and Organization
-Unlike the GitLab Container Registry (and therefore the Docker Registry and OCI Distribution), GAR does not treat tags as the primary "artifacts" in a repository. Instead, the primary "artifacts" are the image manifests. For each manifest object (represented by [`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.DockerImage)), there is a list of assigned tags (if any). Consequently, when listing the contents of a repository through the GAR API, the response comprises a collection of manifest objects (along with their associated tags as properties), rather than a collection of tag objects. Additionally, due to this design choice, untagged manifests are also present in the response.
+Unlike the GitLab container registry (and therefore the Docker Registry and OCI Distribution), GAR does not treat tags as the primary "artifacts" in a repository. Instead, the primary "artifacts" are the image manifests. For each manifest object (represented by [`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.DockerImage)), there is a list of assigned tags (if any). Consequently, when listing the contents of a repository through the GAR API, the response comprises a collection of manifest objects (along with their associated tags as properties), rather than a collection of tag objects. Additionally, due to this design choice, untagged manifests are also present in the response.
-To maximize flexibility, extensibility, and maintain familiarity for GAR users, we plan to fully embrace the GAR API data structures while surfacing data in the GitLab UI. We won't attempt to emulate a "list of tags" response to match the UI/UX that we already have for the GitLab Container Registry.
+To maximize flexibility, extensibility, and maintain familiarity for GAR users, we plan to fully embrace the GAR API data structures while surfacing data in the GitLab UI. We won't attempt to emulate a "list of tags" response to match the UI/UX that we already have for the GitLab container registry.
Considering the above, there will be a view that provides a pageable and sortable list of all images in the configured GAR repository. Additionally, there will be a detail view to display more information about a single image. The available image attributes are documented in the [`DockerImage` reference](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.DockerImage).
diff --git a/doc/architecture/blueprints/google_cloud_platform_integration/index.md b/doc/architecture/blueprints/google_cloud_platform_integration/index.md
new file mode 100644
index 00000000000..7fe2c1f655a
--- /dev/null
+++ b/doc/architecture/blueprints/google_cloud_platform_integration/index.md
@@ -0,0 +1,34 @@
+---
+status: ongoing
+creation-date: "2023-10-26"
+authors: [ "@sgoldstein" ]
+coaches: ["@jessieay", "@grzesiek"]
+approvers: ["@sgoldstein", "@jreporter"]
+owning-stage: "~section::ops"
+participating-stages: ["~devops::verify", "~devops::package", "~devops::govern"]
+---
+
+# Google Cloud Platform Integration
+
+GitLab and Google Cloud Platform (GCP) provide complementary tooling which we
+are integrating via our [partnership](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/).
+
+This design doc is not public at this time. The whole content is
+available in a [GitLab-internal project](https://gitlab.com/gitlab-org/architecture/gitlab-gcp-integration/design-doc).
+
+## Who
+
+<!-- vale gitlab.Spelling = NO -->
+
+| Who | Role |
+|------------------------|--------------------------------------------------|
+| Sam Goldstein | Director of Engineering, Engineering DRI |
+| Grzegorz Bizon | Distinguished Engineer - Technical Lead |
+| Jessie Young | Principal Engineer |
+| David Fernandez | Staff Engineer |
+| Imre Farkas | Staff Engineer |
+| João Pereira | Staff Engineer |
+| Joe Burnett | Staff Engineer |
+| Tomasz Maczukin | Senior Engineer |
+
+<!-- vale gitlab.Spelling = YES -->
diff --git a/doc/architecture/blueprints/new_diffs.md b/doc/architecture/blueprints/new_diffs.md
index af1e4679c14..d5768f6c914 100644
--- a/doc/architecture/blueprints/new_diffs.md
+++ b/doc/architecture/blueprints/new_diffs.md
@@ -1,132 +1,11 @@
---
-status: proposed
-creation-date: "2023-10-10"
-authors: [ "@iamphill" ]
-coach: [ "@ntepluhina" ]
-approvers: [ ]
-owning-stage: "~devops::create"
-participating-stages: []
+redirect_to: 'new_diffs/index.md'
+remove_date: '2024-02-29'
---
-<!-- Blueprints often contain forward-looking statements -->
-<!-- vale gitlab.FutureTense = NO -->
+This document was moved to [another location](new_diffs/index.md).
-# New diffs
-
-## Summary
-
-Diffs at GitLab are spread across several places with each area using their own method. We are aiming
-to develop a single, performant way for diffs to be rendered across the application. Our aim here is
-to improve all areas of diff rendering, from the backend creation of diffs to the frontend rendering
-the diffs.
-
-## Motivation
-
-### Goals
-
-- improved perceived performance
-- improved maintainability
-- consistent coverage of all scenarios
-
-### Non-Goals
-
-<!--
-Listing non-goals helps to focus discussion and make progress. This section is
-optional.
-
-- What is out of scope for this blueprint?
--->
-
-### Priority of Goals
-
-In an effort to provide guidance on which goals are more important than others to assist in making
-consistent choices, despite all goals being important, we defined the following order.
-
-**Perceived performance** is above **improved maintainability** is above **consistent coverage**.
-
-Examples:
-
-- a proposal improves maintainability at the cost of perceived performance: ❌ we should consider an alternative.
-- a proposal removes a feature from certain contexts, hurting coverage, and has no impact on perceived performance or maintanability: ❌ we should re-consider.
-- a proposal improves perceived performance but removes features from certain contexts of usage: ✅ it's valid and should be discussed with Product/UX.
-- a proposal guarantees consistent coverage and has no impact on perceived performance or maintainability: ✅ it's valid.
-
-In essence, we'll strive to meet every goal at each decision but prioritise the higher ones.
-
-## Proposal
-
-<!--
-This is where we get down to the specifics of what the proposal actually is,
-but keep it simple! This should have enough detail that reviewers can
-understand exactly what you're proposing, but should not include things like
-API designs or implementation. The "Design Details" section below is for the
-real nitty-gritty.
-
-You might want to consider including the pros and cons of the proposed solution so that they can be
-compared with the pros and cons of alternatives.
--->
-
-## Design and implementation details
-
-### Workspace & Artifacts
-
-- We will store implementation details like metrics, budgets, and development & architectural patterns here in the docs
-- We will store large bodies of research, the results of audits, etc. in the [wiki](https://gitlab.com/gitlab-com/create-stage/new-diffs/-/wikis/home) of the [New Diffs project](https://gitlab.com/gitlab-com/create-stage/new-diffs)
-- We will store audio & video recordings on the public Youtube channel in the Code Review / New Diffs playlist
-- We will store drafts, meeting notes, and other temporary documents in public Google docs
-
-### Definitions
-
-#### Maintainability
-
-Maintainable projects are _simple_ projects.
-
-Simplicity is the opposite of complexity. This uses a definition of simple and complex [described by Rich Hickey in "Simple Made Easy"](https://www.infoq.com/presentations/Simple-Made-Easy/) (Strange Loop, 2011).
-
-- Maintainable code is simple (single task, single concept, separate from other things).
-- Maintainable projects expand on simple code by having simple structure (folders define classes of behaviors, e.g. you can be assured that a component directory will never initiate a network call, because that would be complecting visual display with data access)
-- Maintainable applications flow out of simple organization and simple code. The old saying is a cluttered desk is representative of a cluttered mind. Rigorous discipline on simplicity will be represented in our output (the product). By being strict about working simply, we will naturally produce applications where our users can more easily reason about their behavior.
-
-#### Done
-
-GitLab has an existing [definition of done](/ee/development/contributing/merge_request_workflow.md#definition-of-done) which is geared primarily toward identifying when an MR is ready to be merged.
-
-In addition to the items in the GitLab definition of done, work on new diffs should also adhere to the following requirements:
-
-- Meets or exceeds all metrics
- - Meets or exceeds our minimum accessibility metrics (these are explicitly not part of our defined priorities, since they are non-negotiable)
-- All work is fully documented for engineers (user documentation is a requirement of the standard definition of done)
-
-<!--
-This section should contain enough information that the specifics of your
-change are understandable. This may include API specs (though not always
-required) or even code snippets. If there's any ambiguity about HOW your
-proposal will be implemented, this is the place to discuss them.
-
-If you are not sure how many implementation details you should include in the
-blueprint, the rule of thumb here is to provide enough context for people to
-understand the proposal. As you move forward with the implementation, you may
-need to add more implementation details to the blueprint, as those may become
-an important context for important technical decisions made along the way. A
-blueprint is also a register of such technical decisions. If a technical
-decision requires additional context before it can be made, you probably should
-document this context in a blueprint. If it is a small technical decision that
-can be made in a merge request by an author and a maintainer, you probably do
-not need to document it here. The impact a technical decision will have is
-another helpful information - if a technical decision is very impactful,
-documenting it, along with associated implementation details, is advisable.
-
-If it's helpful to include workflow diagrams or any other related images.
-Diagrams authored in GitLab flavored markdown are preferred. In cases where
-that is not feasible, images should be placed under `images/` in the same
-directory as the `index.md` for the proposal.
--->
-
-## Alternative Solutions
-
-<!--
-It might be a good idea to include a list of alternative solutions or paths considered, although it is not required. Include pros and cons for
-each alternative solution/path.
-
-"Do nothing" and its pros and cons could be included in the list too.
--->
+<!-- This redirect file can be deleted after <2024-02-29>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/architecture/blueprints/new_diffs/index.md b/doc/architecture/blueprints/new_diffs/index.md
new file mode 100644
index 00000000000..2a3010259d5
--- /dev/null
+++ b/doc/architecture/blueprints/new_diffs/index.md
@@ -0,0 +1,431 @@
+---
+status: proposed
+creation-date: "2023-10-10"
+authors: [ "@thomasrandolph", "@patrickbajao", "@igor.drozdov", "@jerasmus", "@iamphill", "@slashmanov", "@psjakubowska" ]
+coach: [ "@ntepluhina" ]
+approvers: [ ]
+owning-stage: "~devops::create"
+participating-stages: []
+---
+
+<!-- Blueprints often contain forward-looking statements -->
+<!-- vale gitlab.FutureTense = NO -->
+
+# New diffs
+
+## Summary
+
+Diffs at GitLab are spread across several places, with each area using its own method. We are aiming
+to develop a single, performant way for diffs to be rendered across the application. Our aim here is
+to improve all areas of diff rendering, from the backend creation of diffs to the frontend rendering
+the diffs.
+
+## Motivation
+
+### Goals
+
+- improved perceived performance
+- improved maintainability
+- consistent coverage of all scenarios
+
+### Non-Goals
+
+<!--
+Listing non-goals helps to focus discussion and make progress. This section is
+optional.
+
+- What is out of scope for this blueprint?
+-->
+
+### Priority of Goals
+
+Although all goals are important, we defined the following order to provide guidance on
+which goals take precedence and to assist in making consistent choices.
+
+**Perceived performance** is above **improved maintainability** is above **consistent coverage**.
+
+Examples:
+
+- a proposal improves maintainability at the cost of perceived performance: ❌ we should consider an alternative.
+- a proposal removes a feature from certain contexts, hurting coverage, and has no impact on perceived performance or maintainability: ❌ we should re-consider.
+- a proposal improves perceived performance but removes features from certain contexts of usage: ✅ it's valid and should be discussed with Product/UX.
+- a proposal guarantees consistent coverage and has no impact on perceived performance or maintainability: ✅ it's valid.
+
+In essence, we'll strive to meet every goal at each decision but prioritise the higher ones.
+
+## Proposal
+
+<!--
+This is where we get down to the specifics of what the proposal actually is,
+but keep it simple! This should have enough detail that reviewers can
+understand exactly what you're proposing, but should not include things like
+API designs or implementation. The "Design Details" section below is for the
+real nitty-gritty.
+
+You might want to consider including the pros and cons of the proposed solution so that they can be
+compared with the pros and cons of alternatives.
+-->
+
+### Accessibility
+
+New diffs should be displayed in a way that is compliant with [Web Content Accessibility Guidelines 2.1](https://www.w3.org/TR/WCAG21/) level AA for web-based content and [Authoring Tool Accessibility Guidelines 2.0](https://www.w3.org/TR/ATAG20/) level AA for the user interface.
+
+We recognize that an accessible experience with diffs in GitLab requires compliance both when displaying and when interacting with diffs. That's why the accessibility
+audit and further recommendations will also consider the Content Editor feature used for reviewing changes.
+
+#### ATAG 2.0 AA
+
+Given the nature of diffs, the following guidelines will be our main focus:
+
+1. [Guideline A.2.1: (For the authoring tool user interface) Make alternative content available to authors](https://www.w3.org/TR/ATAG20/#gl_a21)
+1. [Guideline A.3.1: (For the authoring tool user interface) Provide keyboard access to authoring features](https://www.w3.org/TR/ATAG20/#gl_a31)
+1. [Guideline A.3.4: (For the authoring tool user interface) Enhance navigation and editing via content structure](https://www.w3.org/TR/ATAG20/#gl_a34)
+1. [Guideline A.3.6: (For the authoring tool user interface) Manage preference settings](https://www.w3.org/TR/ATAG20/#gl_a36)
+
+## Design and implementation details
+
+### Workspace & Artifacts
+
+- We will store implementation details like metrics, budgets, and development & architectural patterns here in the docs
+- We will store large bodies of research, the results of audits, etc. in the [wiki](https://gitlab.com/gitlab-com/create-stage/new-diffs/-/wikis/home) of the [New Diffs project](https://gitlab.com/gitlab-com/create-stage/new-diffs)
+- We will store audio & video recordings on the public YouTube channel in the Code Review / New Diffs playlist
+- We will store drafts, meeting notes, and other temporary documents in public Google docs
+
+### Definitions
+
+#### Maintainability
+
+Maintainable projects are _simple_ projects.
+
+Simplicity is the opposite of complexity. This uses a definition of simple and complex [described by Rich Hickey in "Simple Made Easy"](https://www.infoq.com/presentations/Simple-Made-Easy/) (Strange Loop, 2011).
+
+- Maintainable code is simple (single task, single concept, separate from other things).
+- Maintainable projects expand on simple code by having simple structure (folders define classes of behaviors, e.g. you can be assured that a component directory will never initiate a network call, because that would be conflating visual display with data access)
+- Maintainable applications flow out of simple organization and simple code. The old saying is that a cluttered desk is representative of a cluttered mind. Rigorous discipline on simplicity will be represented in our output (the product). By being strict about working simply, we will naturally produce applications where our users can more easily reason about their behavior.
+
+#### Done
+
+GitLab has an existing [definition of done](/ee/development/contributing/merge_request_workflow.md#definition-of-done) which is geared primarily toward identifying when an MR is ready to be merged.
+
+In addition to the items in the GitLab definition of done, work on new diffs should also adhere to the following requirements:
+
+- Meets or exceeds all metrics
+ - Meets or exceeds our minimum accessibility metrics (these are explicitly not part of our defined priorities, because they are non-negotiable)
+- All work is fully documented for engineers (user documentation is a requirement of the standard definition of done)
+
+### Metrics
+
+To measure our success, we need to set meaningful metrics. These metrics should have a clear, positive impact on the end user.
+
+1. Meets or exceeds [WCAG 2.2 AA](https://www.w3.org/TR/WCAG22/).
+1. Meets or exceeds [ATAG 2.0 AA](https://www.w3.org/TR/ATAG20/).
+1. The new Diffs app loads less than or equal to 300 KiB of JavaScript (compressed / "across-the-wire")<sup>1</sup>.
+1. The new Diffs app loads less than or equal to 150 KiB of markup, images, styles, fonts, etc. (compressed / "across-the-wire")<sup>1</sup>.
+1. The new Diffs app can execute in total isolation from the rest of the GitLab product:
+ 1. "Execute" means the app can load, display data, and allow user interaction ("read-only").
+ 1. If a part of the application is only used in merge requests or diffs, it is considered part of the Diffs application.
+ 1. If a part of the application must be brought in from the rest of the product, it is not considered part of the Diffs load (as defined in metrics 3 and 4).
+ 1. If a part of the application must be brought in from the rest of the product, it may not block functionality of the Diffs application.
+ 1. If a part of the application must be brought in from the rest of the product, it must be loaded asynchronously.
+ 1. If a part of the application meets 5.1-5.5 _(such as: the Markdown editor is loaded asynchronously when the user would like to leave a comment on a diff)_ and its inclusion causes a budget overflow:
+ - It must be added to a list of documented exceptions that we accept are out of bounds and out of our control.
+ - The exceptions list should be addressed on a regular basis to determine the ongoing value of overflowing our budget.
+
+---
+<sup>1</sup>: [The Performance Inequality Gap, 2023](https://infrequently.org/2022/12/performance-baseline-2023/)
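As a rough sketch of how the compressed-size budgets above might be enforced in CI (hypothetical script, not part of this blueprint; zlib deflate stands in for whatever wire compression is actually used):

```ruby
require "zlib"

# Hypothetical budget check for the "across-the-wire" size limits above:
# 300 KiB of compressed JavaScript, 150 KiB of everything else.
BUDGETS_KIB = { js: 300, other: 150 }.freeze

# Compressed size of a payload in KiB, using deflate as an approximation.
def compressed_kib(content)
  Zlib::Deflate.deflate(content).bytesize / 1024.0
end

def within_budget?(kind, content)
  compressed_kib(content) <= BUDGETS_KIB.fetch(kind)
end

# A tiny payload easily fits the JavaScript budget.
puts within_budget?(:js, "console.log('hello diffs');") # true
```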
+
+### Front end
+
+#### High-level implementation
+
+<!--
+This section should contain enough information that the specifics of your
+change are understandable. This may include API specs (though not always
+required) or even code snippets. If there's any ambiguity about HOW your
+proposal will be implemented, this is the place to discuss them.
+
+If you are not sure how many implementation details you should include in the
+blueprint, the rule of thumb here is to provide enough context for people to
+understand the proposal. As you move forward with the implementation, you may
+need to add more implementation details to the blueprint, as those may become
+an important context for important technical decisions made along the way. A
+blueprint is also a register of such technical decisions. If a technical
+decision requires additional context before it can be made, you probably should
+document this context in a blueprint. If it is a small technical decision that
+can be made in a merge request by an author and a maintainer, you probably do
+not need to document it here. The impact a technical decision will have is
+another helpful information - if a technical decision is very impactful,
+documenting it, along with associated implementation details, is advisable.
+
+If it's helpful to include workflow diagrams or any other related images.
+Diagrams authored in GitLab flavored markdown are preferred. In cases where
+that is not feasible, images should be placed under `images/` in the same
+directory as the `index.md` for the proposal.
+-->
+
+#### HTML structure
+
+The HTML structure of a diff should support assistive technology.
+For this reason, a table could be a preferred solution, as it indicates
+logical relationships between the presented data and is easier for
+screen reader users to navigate with a keyboard. Labeled columns ensure that information
+such as line numbers can be associated with the edited piece of code.
+
+A possible structure could include:
+
+```html
+<table>
+ <caption class="gl-sr-only">Changes for file index.js. 10 lines changed: 5 deleted, 5 added.</caption>
+ <tr hidden>
+ <th>Original line number: </th>
+ <th>Diff line number: </th>
+ <th>Line change:</th>
+ </tr>
+ <tr>
+ <td>1234</td>
+ <td></td>
+ <td>.tree-time-ago ,</td>
+ </tr>
+ […]
+</table>
+```
+
+See [WAI tutorial on tables](https://www.w3.org/WAI/tutorials/tables) for
+more implementation guidelines.
+
+Each file table should include a short summary of changes that reads out:
+
+- total number of lines changed,
+- number of added lines,
+- number of removed lines.
+
+The summary of the table content can be placed either within the `<caption>` element, or before the table within an element referenced by `aria-describedby`.
+See the <abbr>WAI</abbr> (Web Accessibility Initiative) tutorials for more information on both approaches:
+
+- [Nesting summary inside the `<caption>` element](https://www.w3.org/WAI/tutorials/tables/caption-summary/#nesting-summary-inside-the-caption-element)
+- [Using `aria-describedby` to provide a table summary](https://www.w3.org/WAI/tutorials/tables/caption-summary/#using-aria-describedby-to-provide-a-table-summary)
+
+However, if such a structure compromises other functional aspects of displaying a diff,
+more generic elements together with ARIA support can be used.
+
+#### Visual indicators
+
+Each visual indicator should have screen reader text
+denoting its meaning. When needed, use the `gl-sr-only` or `gl-sr-only-focusable`
+class to make the element available to screen readers but visually hidden.
+
+Some of the visual indicators that require alternatives for assistive technology are:
+
+- `+` or green highlighting to be read as `added`
+- `-` or red highlighting to be read as `removed`
+
+## Alternative Solutions
+
+<!--
+It might be a good idea to include a list of alternative solutions or paths considered, although it is not required. Include pros and cons for
+each alternative solution/path.
+
+"Do nothing" and its pros and cons could be included in the list too.
+-->
+
+## Proposed changes
+
+These changes (indicated by an arbitrary name like "Design A") suggest a proposed final path forward for this blueprint, but have not yet been accepted as the authoritative content.
+
+- Mark the highest hierarchical heading with your design name. If you are changing multiple headings at the same level, make sure to mark them all with the same name. This will create a high-level table of contents that is easier to reason about.
+
+### Front end (Design A)
+
+NOTE:
+This draft proposal suggests one potential front end architecture which may not be chosen. It is not necessarily mutually exclusive with other proposed designs.
+
+Ideally, we would meet our definition of done and our accountability metrics on our first try.
+We also need to continue to stay within those boundaries as we move forward. To ensure this,
+we need to design an application architecture that:
+
+1. Is:
+ 1. Scalable.
+ 1. Malleable.
+ 1. Flexible.
+1. Considers itself a mission-critical part of the overall GitLab product.
+1. Treats itself as a complex, unique application with concerns that cannot be addressed
+ as side effects of other parts of the product.
+1. Can handle data access/format changes without making UI changes.
+1. Can handle UI changes without making data access/format changes.
+1. Provides a hookable, inspectable API and avoids code coupling.
+1. Separates:
+ - State and application data.
+ - Application behavior and UI.
+ - Data access and network access.
+
+#### High-level implementation
+
+NOTE:
+This draft proposal suggests one potential front end architecture which may not be chosen. It is not necessarily mutually exclusive with other proposed designs.
+
+(See [New Diffs: Technical Architecture Design](https://gitlab.com/gitlab-org/gitlab/-/issues/431276) for nicer visuals of this chart)
+
+```mermaid
+flowchart TB
+ classDef sticky fill:#d0cabf, color:black
+ stickyMetricsA>"Metrics 3, 4, & 5 apply to<br>the entire front end application"]
+
+ stickyMetricsA -.- fe
+ fe
+
+ Socket((WebSocket))
+
+ be
+
+subgraph fe [Front End]
+ stickyMetricsB>"Metrics 1 & 2 apply<br>to all UI elements"]
+ stickyInbound>"All data is formatted precisely<br>how the UI needs to interact with it"]
+ stickyOutbound>"All data is formatted precisely<br>how the back end expects it"]
+ stickyIdb>"Long-term.
+
+ e.g. diffs, MRs, emoji, notes, drafts, user-only data<br>like file reviews, collapse states, etc."]
+ stickySession>"Session-term.
+
+ e.g. selected tab, scroll position,<br>temporary changes to user settings, etc."]
+
+ Events([Event Hub])
+ UI[UI]
+ uiState((Local State))
+ Logic[Application Logic]
+ Normalizer[Data Normalizer]
+ Inbound{{Inbound Contract}}
+ Outbound{{Outbound Contract}}
+ Data[Data Access]
+ idb((indexedDB))
+ session((sessionStorage))
+ Network[Network Access]
+end
+
+subgraph be [Back End]
+ stickyApi>"A large list of defined actions a<br>Diffs/Merge Request UI could perform.
+
+ e.g.: <code>mergeRequest:notes:saveDraft</code> or<br><code>mergeRequest:changeStatus</code> (with <br><code>status: 'draft'</code> or <code>status: 'ready'</code>, etc.).
+
+ Must not expose any implementation detail,<br>like models, storage structure, etc."]
+ API[Activities API]
+ unk[\"?"/]
+
+ API -.- stickyApi
+end
+
+ %% Make stickies look like paper sort of?
+ class stickyMetricsA,stickyMetricsB,stickyInbound,stickyOutbound,stickyIdb,stickySession,stickyApi sticky
+
+ UI <--> uiState
+ stickyMetricsB -.- UI
+ Network ~~~ stickyMetricsB
+
+ Logic <--> Normalizer
+
+ Normalizer --> Outbound
+ Outbound --> Data
+ Inbound --> Normalizer
+ Data --> Inbound
+
+ Inbound -.- stickyInbound
+ Outbound -.- stickyOutbound
+
+ Data <--> idb
+ Data <--> session
+ idb -.- stickyIdb
+ session -.- stickySession
+
+ Events <--> UI
+ Events <--> Logic
+ Events <--> Data
+ Events <--> Network
+
+ Network --> Socket --> API --> unk
+```
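+
+The Event Hub in the chart above can be sketched as a minimal publish/subscribe object. This is an illustrative sketch only, not an agreed interface; the method names (`on`, `emit`) and the event name format are assumptions:
+
```javascript
// Minimal publish/subscribe hub sketch: UI, application logic, data access,
// and network access communicate only through named events, never by
// importing or calling each other directly.
function createEventHub() {
  const handlers = new Map(); // event name -> Set of callbacks

  return {
    // Subscribe to an event; returns an unsubscribe function.
    on(event, callback) {
      if (!handlers.has(event)) handlers.set(event, new Set());
      handlers.get(event).add(callback);
      return () => handlers.get(event).delete(callback);
    },
    // Publish an event to every current subscriber.
    emit(event, payload) {
      (handlers.get(event) || []).forEach((callback) => callback(payload));
    },
  };
}

// Hypothetical usage: data access reacts to a UI-originated event and
// announces completion; neither module knows about the other.
const hub = createEventHub();
hub.on('mergeRequest:notes:saveDraft', ({ note }) => {
  // ...data access would persist to indexedDB here...
  hub.emit('mergeRequest:notes:draftSaved', { note });
});
```
+
+With a hub like this, the UI, application logic, data access, and network access layers subscribe to the events they care about instead of importing each other, which keeps the API hookable and inspectable.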
+
+## Proposal (Design B)
+
+NOTE:
+This draft proposal suggests one potential front end architecture which may not be chosen. It is not necessarily mutually exclusive with other proposed designs.
+
+New diffs introduce a paradigm shift in our approach to rendering diffs.
+Previously, we had two different approaches to rendering diffs:
+
+1. Merge requests heavily utilized client-side rendering.
+1. All other pages used server-side rendering with sprinkles of JavaScript.
+
+In merge requests, most of the rendering work was done on the client:
+
+- The backend would only generate a JSON response with diffs data.
+- The client would be responsible for both drawing the diffs and reacting to user input.
+
+This led to us adopting a
+[virtualized scrolling solution](https://github.com/Akryum/vue-virtual-scroller/tree/v1/packages/vue-virtual-scroller)
+for client-side rendering, which sped up drawing large diff file lists significantly.
+
+Unfortunately, this approach carried a very high maintenance cost and produced
+[constant bugs](https://gitlab.com/gitlab-org/gitlab/-/issues/427155#note_1607184794).
+The user experience also suffered: diffs could not be shown immediately on page load
+because the client first had to wait for the JSON response.
+Lastly, this approach evolved entirely in parallel with the server-rendered diffs used on other pages,
+which resulted in two completely separate codebases for diffs.
+
+The new-diffs approach changes that in the following ways:
+
+1. It stops using virtualized scrolling for rendering diffs.
+1. It moves most of the rendering work to the server.
+1. It enhances server-rendered HTML on the client.
+1. It unifies the diffs codebase across merge requests and other pages.
+
+## Design & Implementation Details (Design B)
+
+NOTE:
+This draft proposal suggests one potential front end architecture which may not be chosen. It is not necessarily mutually exclusive with other proposed designs.
+
+### Metrics
+
+1. _(no change)_
+1. _(no change)_
+1. _(no change)_
+1. _(no change)_
+1. _(no change)_
+1. When rendering diffs on the server:
+ - The total number of server-rendered files should not exceed 5.
+ - It should not try to render empty diffs. (It should render at least 1 file.)
+ - The total number of rendered diff lines should not exceed 1000.
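+
+The server-side rendering budgets above can be sketched as a selection function. This is an illustrative sketch, assuming files arrive as an ordered list with known line counts; the function name and inputs are not an agreed interface:
+
```javascript
// Illustrative sketch: given an ordered list of diff files with known line
// counts, select how many to render on the server under the budgets above:
// at least 1 file, at most 5 files, at most 1000 total rendered diff lines.
function filesToServerRender(files, { maxFiles = 5, maxLines = 1000 } = {}) {
  const selected = [];
  let lineBudget = maxLines;

  for (const file of files) {
    if (selected.length >= maxFiles) break;
    // Always keep at least one file, even if it alone exceeds the line budget.
    if (file.lineCount > lineBudget && selected.length > 0) break;
    selected.push(file);
    lineBudget -= file.lineCount;
  }

  return selected;
}
```
+
+Files beyond this selection would be streamed to the client afterwards rather than rendered into the initial page document.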
+
+### Overview
+
+New diffs introduce a change in responsibilities for both frontend and backend.
+
+The backend will:
+
+1. Prepare diffs data.
+1. Highlight diff lines.
+1. Render diffs as HTML.
+1. Embed diffs metadata into the final response.
+
+The frontend will:
+
+1. Enhance existing and future diffs HTML.
+1. Fetch and render additional diffs HTML that didn't fit into the page document.
+
+#### Static and dynamic separation
+
+To achieve the separation of concerns, we should distinguish between static and dynamic UI on the page:
+
+- Everything that is static should always be rendered on the server.
+- Everything dynamic should be enhanced on the client.
+
+As an example: a highlighted diff line doesn't change with user input, so we should consider rendering it on the server.
+
+#### Performance optimizations
+
+To improve the perceived performance of the page, we should implement the following techniques:
+
+1. Limit the number of diffs rendered on the page at first.
+1. Use [HTML streaming](https://gitlab.com/gitlab-org/frontend/rfcs/-/issues/101)
+ to render the rest of the diffs.
+ 1. Use Web Components to hook into diff files appearing on the page.
+1. Apply `content-visibility` whenever possible to reduce redraw overhead.
+1. Render diff discussions asynchronously.
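+
+The Web Component hook and `content-visibility` techniques above can be sketched together. This is an illustrative sketch: the function name, the custom element name, and the reserved height are assumptions, not a settled interface:
+
```javascript
// Illustrative sketch: the client-side enhancement applied to each diff file.
// Written as a plain function so the same logic could back a Web Component's
// connectedCallback when streamed files attach to the page.
function enhanceDiffFile(el) {
  // Let the browser skip layout/paint work for off-screen files, reserving
  // an estimated height so the scrollbar stays stable while streaming.
  el.style.contentVisibility = 'auto';
  el.style.containIntrinsicSize = 'auto 400px';
  el.dataset.enhanced = 'true';
  return el;
}

// In the browser, a Web Component (hypothetical element name) would call
// this as each streamed diff file appears:
//
//   class DiffFileElement extends HTMLElement {
//     connectedCallback() { enhanceDiffFile(this); }
//   }
//   customElements.define('gl-diff-file', DiffFileElement);
```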
diff --git a/doc/architecture/blueprints/object_storage/index.md b/doc/architecture/blueprints/object_storage/index.md
index 3f649960554..daafc03941b 100644
--- a/doc/architecture/blueprints/object_storage/index.md
+++ b/doc/architecture/blueprints/object_storage/index.md
@@ -4,7 +4,7 @@ creation-date: "2021-11-18"
authors: [ "@nolith" ]
coach: "@glopezfernandez"
approvers: [ "@marin" ]
-owning-stage: "~devops::data_stores"
+owning-stage: "~devops::data stores"
participating-stages: []
---
diff --git a/doc/architecture/blueprints/observability_logging/index.md b/doc/architecture/blueprints/observability_logging/index.md
index d8259e0a736..bbe15cde58e 100644
--- a/doc/architecture/blueprints/observability_logging/index.md
+++ b/doc/architecture/blueprints/observability_logging/index.md
@@ -121,7 +121,7 @@ Hence the decision to only support Log objects seems like a boring and simple so
Similar to traces, logging data ingestion will be done at the Ingress level.
As part of [the forward-auth](https://doc.traefik.io/traefik/middlewares/http/forwardauth/) flow, Traefik will forward the request to Gatekeeper which in turn leverages Redis for counting.
This is currently done only for [the ingestion path](https://gitlab.com/gitlab-org/opstrace/opstrace/-/merge_requests/2236).
-Please check the MR description for more details on how it works.
+Check the MR description for more details on how it works.
The read path rate limiting implementation is tracked [here](https://gitlab.com/gitlab-org/opstrace/opstrace/-/issues/2356).
### Database schema
@@ -629,4 +629,4 @@ Long-term, we will need a way to monitor the number of user queries that failed
## Iterations
-Please refer to [Observability Group planning epic](https://gitlab.com/groups/gitlab-org/opstrace/-/epics/92) and its linked issues for up-to-date information.
+Refer to [Observability Group planning epic](https://gitlab.com/groups/gitlab-org/opstrace/-/epics/92) and its linked issues for up-to-date information.
diff --git a/doc/architecture/blueprints/observability_logging/system_overview.png b/doc/architecture/blueprints/observability_logging/system_overview.png
index 30c6510c3dc..57299e89512 100644
--- a/doc/architecture/blueprints/observability_logging/system_overview.png
+++ b/doc/architecture/blueprints/observability_logging/system_overview.png
Binary files differ
diff --git a/doc/architecture/blueprints/organization/index.md b/doc/architecture/blueprints/organization/index.md
index 49bf18442e9..f9a250e1205 100644
--- a/doc/architecture/blueprints/organization/index.md
+++ b/doc/architecture/blueprints/organization/index.md
@@ -300,7 +300,7 @@ We are conducting deeper research around this specific problem in [issue 420804]
The following iteration plan outlines how we intend to arrive at the Organization MVC. We are following the guidelines for [Experiment, Beta, and Generally Available features](../../../policy/experiment-beta-support.md).
-### Iteration 1: Organization Prototype (FY24Q4)
+### Iteration 1: [Organization Prototype](https://gitlab.com/groups/gitlab-org/-/epics/10018) (FY24Q2-FY25Q1)
In iteration 1, we introduce the concept of an Organization as a way to group top-level Groups together. Support for Organizations does not require any [Cells](../cells/index.md) work, but having them will make all subsequent iterations of Cells simpler. The goal of iteration 1 will be to generate a prototype that can be used by GitLab teams to test basic functionality within an Organization. The prototype contains the following functionality:
@@ -314,8 +314,9 @@ In iteration 1, we introduce the concept of an Organization as a way to group to
- A User can be part of multiple Organizations.
- Users can navigate between the different Organizations they are part of.
- Any User within or outside of an Organization can be invited to Groups and Projects contained by the Organization.
+- Organizations are not fully isolated. We aim to complete [phase 1 of Organization isolation](https://gitlab.com/groups/gitlab-org/-/epics/11837), with the goal to define `sharding_key` and `desired_sharding_key` rules.
-### Iteration 2: Organization MVC Experiment (FY25Q1)
+### Iteration 2: [Organization MVC Experiment](https://gitlab.com/groups/gitlab-org/-/epics/10650) (FY25Q2)
In iteration 2, an Organization MVC Experiment will be released. We will test the functionality with a select set of customers and improve the MVC based on these learnings. The MVC Experiment contains the following functionality:
@@ -325,7 +326,7 @@ In iteration 2, an Organization MVC Experiment will be released. We will test th
- Forking across Organizations will be defined.
- [Organization Isolation](isolation.md) will be finished to meet the requirements of the initial set of customers
-### Iteration 3: Organization MVC Beta (FY25Q1)
+### Iteration 3: [Organization MVC Beta](https://gitlab.com/groups/gitlab-org/-/epics/10651) (FY25Q3)
In iteration 3, the Organization MVC Beta will be released.
@@ -334,9 +335,9 @@ In iteration 3, the Organization MVC Beta will be released.
- Organization Owners can create, edit and delete Groups from the Groups overview.
- Organization Owners can create, edit and delete Projects from the Projects overview.
- The Organization URL path can be changed.
-- [Organization Isolation](isolation.md) is available.
+- Organizations are fully isolated. We aim to complete [phase 2 of Organization isolation](https://gitlab.com/groups/gitlab-org/-/epics/11838), with the goal to implement isolation constraints.
-### Iteration 4: Organization MVC GA (FY25Q2)
+### Iteration 4: [Organization MVC GA](https://gitlab.com/groups/gitlab-org/-/epics/10652) (FY25Q3)
In iteration 4, the Organization MVC will be rolled out.
@@ -346,6 +347,7 @@ After the initial rollout of Organizations, the following functionality will be
1. [Users can transfer existing top-level Groups into Organizations](https://gitlab.com/groups/gitlab-org/-/epics/11711).
1. [Organizations can invite Users](https://gitlab.com/gitlab-org/gitlab/-/issues/420166).
+1. Complete [phase 3 of Organization isolation](https://gitlab.com/groups/gitlab-org/-/epics/11839), with the goal to allow customers to move existing namespaces out of the default Organization into a new Organization.
1. Internal visibility will be made available on Organizations that are part of GitLab.com.
1. Restrict inviting Users outside of the Organization.
1. Enterprise Users will be made available at the Organization level.
@@ -368,12 +370,14 @@ We propose the following steps to successfully roll out Organizations:
- Phase 1: Rollout
- Organizations will be rolled out using the concept of a `default Organization`. All existing top-level groups on GitLab.com are already part of this `default Organization`. The Organization UI is feature flagged and can be enabled for a specific set of users initially, and the global user pool at the end of this phase. This way, users will already become familiar with the concept of an Organization and the Organization UI. No features would be impacted by enabling the `default Organization`. See issue [#418225](https://gitlab.com/gitlab-org/gitlab/-/issues/418225) for more details.
-- Phase 2: Migrations
- - GitLab, the organization, will be the first one to bud off into a separate Organization. We move all top-level groups that belong to GitLab into the new GitLab Organization, including the `gitLab-org` and `gitLab-com` top-level Groups. See issue [#418228](https://gitlab.com/gitlab-org/gitlab/-/issues/418228) for more details.
- - Existing customers can create their own Organization. Creation of an Organization remains optional.
-- Phase 3: Onboarding changes
- - New customers will only have the option to start their journey by creating an Organization.
-- Phase 4: Targeted efforts
+- Phase 2: Temporary onboarding changes
+ - New customers identified as not needing personal namespaces or forking can create new Organizations from scratch. Top-level Groups cannot yet be migrated into a new Organization, so all content must be newly created in an Organization.
+- Phase 3: Migration of existing customers
+ - GitLab, the organization, will be the first one to bud off into a separate Organization. We move all top-level Groups that belong to GitLab into the new GitLab Organization, including the `gitlab-org` and `gitlab-com` top-level Groups. See issue [#418228](https://gitlab.com/gitlab-org/gitlab/-/issues/418228) for more details.
+ - Once top-level Group transfer from the default Organization to another Organization becomes available, existing customers can create their own Organization and migrate their top-level Groups into it. Creation of an Organization remains optional.
+- Phase 4: Permanent onboarding changes
+ - All new customers will only have the option to start their journey by creating a new Organization.
+- Phase 5: Targeted efforts
- Organizations are promoted, e.g. via a banner message, targeted conversations with large customers via the CSMs. Creating a separate Organization will remain a voluntary action.
- We increase the value proposition of the Organization, for instance by moving billing to the Organization level to provide incentives for more customers to move to a separate Organization. Adoption will be monitored.
diff --git a/doc/architecture/blueprints/organization/isolation.md b/doc/architecture/blueprints/organization/isolation.md
index 238269c4329..467bd1932bd 100644
--- a/doc/architecture/blueprints/organization/isolation.md
+++ b/doc/architecture/blueprints/organization/isolation.md
@@ -65,7 +65,7 @@ These are:
The major constraint these POCs were trying to overcome was that there is no standard way in the GitLab application or database to even determine what Organization (or Project or namespace) a piece of data belongs to.
This means that the first step is to implement a standard way to efficiently find the parent Organization for any model or row in the database.
-The proposed solution is ensuring that every single table that exists in the `gitlab_main_cell` and `gitlab_ci_cell` (Cell-local) databases must include a valid sharding key that is either `project_id` or `namespace_id`.
+The proposed solution is to ensure that every table in the `gitlab_main_cell`, `gitlab_ci` and `gitlab_pm` (Cell-local) databases includes a valid sharding key that references `projects`, `namespaces` or `organizations`.
At first we considered enforcing everything to have an `organization_id`, but we determined that this would be too expensive to update for customers that need to migrate large Groups out of the default Organization.
The added benefit is that more than half of our tables already have one of these columns.
Additionally, if we can't consistently attribute data to a top-level Group, then we won't be able to validate if a top-level Group is safe to be moved to a new Organization.
@@ -79,7 +79,7 @@ We can also use these sharding keys to help us decide whether:
## Detailed steps
-1. Implement developer facing documentation explaining the requirement to add these sharding keys and how they should choose between `project_id` and `namespace_id`.
+1. Implement developer-facing documentation explaining the requirement to add these sharding keys and how to choose one.
1. Add a way to declare a sharding key in `db/docs` and automatically populate it for all tables that already have a sharding key
1. Implement automation in our CI pipelines and/or DB migrations that makes it impossible to create new tables without a sharding key.
1. Implement a way for people to declare a desired sharding key in `db/docs` as
@@ -107,7 +107,7 @@ We can also use these sharding keys to help us decide whether:
automated MRs for the sharding keys that can be automatically inferred
and automate creating issues for all the sharding keys that can't be
automatically inferred
-1. Validate that all existing `project_id` and `namespace_id` columns on all Cell-local tables can reliably be assumed to be the sharding key. This requires assigning issues to teams to confirm that these columns aren't used for some other purpose that would actually not be suitable. If there is an issue with a table we need to migrate and rename these columns, and then add a new `project_id` or `namespace_id` column with the correct sharding key.
+1. Validate that all existing sharding key columns on all Cell-local tables can reliably be assumed to be the sharding key. This requires assigning issues to teams to confirm that these columns aren't used for some other purpose that would make them unsuitable.
1. We allow customers to create new Organizations without the option to migrate namespaces into them. All namespaces need to be newly created in their new Organization.
1. Implement new functionality in GitLab similar to the [POC](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/131968), which allows a namespace owner to see if their namespace is fully isolated.
1. Implement functionality that allows namespace owners to migrate an existing namespace from one Organization to another. Most likely this will be existing customers that want to migrate their namespace out of the default Organization into a newly created Organization. Only isolated namespaces as implemented in the previous step will be allowed to move.
diff --git a/doc/architecture/blueprints/rate_limiting/index.md b/doc/architecture/blueprints/rate_limiting/index.md
index 7af50097e97..94a36d721e9 100644
--- a/doc/architecture/blueprints/rate_limiting/index.md
+++ b/doc/architecture/blueprints/rate_limiting/index.md
@@ -218,7 +218,7 @@ communicate how the responsible usage is defined at a given moment.
Because of how GitLab architecture has been built, GitLab Rails application, in
most cases, behaves as a central enterprise service bus (ESB) and there are a
-few satellite services communicating with it. Services like Container Registry,
+few satellite services communicating with it. Services like container registry,
GitLab Runners, Gitaly, Workhorse, KAS could use the API to receive a set of
application limits those are supposed to enforce. This will still allow us to
define all of them in a single place.
diff --git a/doc/architecture/blueprints/repository_backups/index.md b/doc/architecture/blueprints/repository_backups/index.md
index afd86e4979c..3b79f3fbe96 100644
--- a/doc/architecture/blueprints/repository_backups/index.md
+++ b/doc/architecture/blueprints/repository_backups/index.md
@@ -59,7 +59,7 @@ This should relieve the major pain points of the existing two strategies:
Snapshots rely on cloud platforms to be able to take physical snapshots of the
disks that Gitaly and Praefect use to store data. While never officially
-recommended this strategy tends to be used once creating or restoring backups
+recommended, this strategy tends to be used once creating or restoring backups
using `backup.rake` takes too long.
Gitaly and Git use lock files and fsync in order to prevent repository
diff --git a/doc/architecture/blueprints/runner_tokens/index.md b/doc/architecture/blueprints/runner_tokens/index.md
index 7e5ce57dcdc..f2e9d624d20 100644
--- a/doc/architecture/blueprints/runner_tokens/index.md
+++ b/doc/architecture/blueprints/runner_tokens/index.md
@@ -97,7 +97,7 @@ token in the `--registration-token` argument:
| Token type | Behavior |
| ---------- | -------- |
-| [Registration token](../../../security/token_overview.md#runner-authentication-tokens) | Leverages the `POST /api/v4/runners` REST endpoint to create a new runner, creating a new entry in `config.toml`. |
+| [Registration token](../../../security/token_overview.md#runner-authentication-tokens) | Leverages the `POST /api/v4/runners` REST endpoint to create a new runner, creating a new entry in `config.toml` and a `system_id` value in a sidecar file if missing (`.runner_system_id`). |
| [Runner authentication token](../../../security/token_overview.md#runner-authentication-tokens) | Leverages the `POST /api/v4/runners/verify` REST endpoint to ensure the validity of the authentication token. Creates an entry in `config.toml` file and a `system_id` value in a sidecar file if missing (`.runner_system_id`). |
### Transition period
@@ -329,9 +329,7 @@ enum column created in the `ci_runners` table.
### Runner creation through API
Automated runner creation is possible through a new GraphQL mutation and the existing
-[`POST /runners` REST API endpoint](../../../api/runners.md#register-a-new-runner).
-The difference in the REST API endpoint is that it is modified to accept a request from an
-authorized user with a scope (instance, a group, or a project) instead of the registration token.
+[`POST /user/runners` REST API endpoint](../../../api/users.md#create-a-runner-linked-to-a-user).
These endpoints are only available to users that are
[allowed](../../../user/permissions.md#gitlab-cicd-permissions) to create runners at the specified
scope.
@@ -361,8 +359,9 @@ scope.
| Component | Milestone | Issue | Changes |
|------------------|----------:|-------|---------|
-|GitLab Runner Helm Chart| `%15.10` | Update the Runner Helm Chart to support registration with the authentication token. |
-|GitLab Runner Operator| `%15.10` | Update the Runner Operator to support registration with the authentication token. |
+| GitLab Runner Helm Chart | `%15.10` | Update the Runner Helm Chart to support registration with the authentication token. |
+| GitLab Runner Operator | `%15.10` | Update the Runner Operator to support registration with the authentication token. |
+| GitLab Runner Helm Chart | `%16.2` | Add `systemID` to Runner Helm Chart. |
### Stage 3 - Database changes
@@ -414,7 +413,7 @@ scope.
| Component | Milestone | Changes |
|------------------|----------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GitLab Rails app | `%16.0` | Adapt `register_{group|project}_runner` permissions to take [application setting](https://gitlab.com/gitlab-org/gitlab/-/issues/386712) in consideration. |
-| GitLab Rails app | `%16.1` | Make [`POST /api/v4/runners` endpoint](../../../api/runners.md#register-a-new-runner) permanently return `HTTP 410 Gone` if either `allow_runner_registration_token` setting disables registration tokens.<br/>A future v5 version of the API should return `HTTP 404 Not Found`. |
+| GitLab Rails app | `%16.1` | Make [`POST /api/v4/runners` endpoint](../../../api/runners.md#create-an-instance-runner) permanently return `HTTP 410 Gone` if either `allow_runner_registration_token` setting disables registration tokens.<br/>A future v5 version of the API should return `HTTP 404 Not Found`. |
| GitLab Rails app | `%16.1` | Add runner group metadata to the runner list. |
| GitLab Rails app | | Add UI to allow disabling use of registration tokens in top-level group settings. |
| GitLab Rails app | | Hide legacy UI showing registration with a registration token, if it is disabled in top-level group settings or by admins. |
@@ -441,7 +440,7 @@ scope.
## FAQ
-Please follow [the user documentation](../../../ci/runners/new_creation_workflow.md).
+Follow [the user documentation](../../../ci/runners/new_creation_workflow.md).
## Status
diff --git a/doc/architecture/blueprints/secret_detection/decisions/001_use_ruby_push_check_approach_within_monolith.md b/doc/architecture/blueprints/secret_detection/decisions/001_use_ruby_push_check_approach_within_monolith.md
new file mode 100644
index 00000000000..c81e6748e70
--- /dev/null
+++ b/doc/architecture/blueprints/secret_detection/decisions/001_use_ruby_push_check_approach_within_monolith.md
@@ -0,0 +1,32 @@
+---
+owning-stage: "~devops::secure"
+description: "GitLab Secret Detection ADR 001: Use Ruby Push Check approach within monolith"
+---
+
+# GitLab Secret Detection ADR 001: Use Ruby Push Check approach within monolith
+
+## Context
+
+There are a number of concerns around the performance of secret detection using a regex-based approach at scale. The primary considerations include transfer latency between nodes and both CPU and memory bloat. These concerns manifested in two ways: the language to be used for performing regex matching and the deployment architecture.
+
+The original discussion in [the exploration issue](https://gitlab.com/gitlab-org/gitlab/-/issues/428499) covers many of these concerns and background.
+
+### Implementation language
+
+The two primary languages considered were Ruby and Go.
+
+The choice to use other languages (such as C++) for implementation was discarded in favour of Ruby and Go due to team familiarity, speed of deployment, and portability. See [this benchmarking issue](https://gitlab.com/gitlab-org/gitlab/-/issues/423832) for performance comparisons between the two.
+
+### Deployment architecture
+
+Several options were considered for deployments: directly embedding the logic within the Rails monolith's Push Check execution path, placement as a sidecar within a Rails node deployment, placement as a sidecar within a Gitaly node as a [server-side hook](../../../../administration/server_hooks.md), and deployment as a standalone service.
+
+## Decision
+
+For the initial iteration around blocking push events using a prereceive integration, the decision was made to proceed with a Ruby-based approach, leveraging `re2` for performant regex processing. Additionally, the decision was made to integrate the logic directly into the monolith rather than as a discrete service or server-side hook within Gitaly.
+
+A Gitaly server-side hook would have performance benefits around minimal transfer latency for Git blobs between scanning service and Gitaly blob storage. However, an extra request would be needed between Gitaly and the Rails application to contextualize the scan. Additionally, the current hook architecture is [discouraged and work is planned to migrate towards a new plugin architecture in the near future](https://gitlab.com/gitlab-org/gitaly/-/issues/5642).
+
+The Ruby Push Check approach follows a clear execution plan to achieve delivery by the anticipated timeline and is more closely aligned with the long-term direction of platform-wide scanning. For example, future scanning of issuables will require execution within the trust boundary of the Rails application rather than in the Gitaly context. This approach, however, has raised concerns that elevated memory usage within the Rails application could lead to availability issues. This direction may also require migrating towards Gitaly's new plugin architecture in the future once that timeline is known.
+
+A standalone service may be considered in the future but requires considerations of a technical approach that should be better informed by data gathered during [pre-production profiling](https://gitlab.com/gitlab-org/gitlab/-/issues/428499).
diff --git a/doc/architecture/blueprints/secret_detection/index.md b/doc/architecture/blueprints/secret_detection/index.md
index 76bf6dd4088..fb77fffee40 100644
--- a/doc/architecture/blueprints/secret_detection/index.md
+++ b/doc/architecture/blueprints/secret_detection/index.md
@@ -61,10 +61,47 @@ As a long-term priority we should consider unifying the management of the two
secret types however that work is out of scope for the current blueprints goals,
which remain focused on active detection.
+### Target types
+
+Target object types refer to the scanning targets prioritized for detection of leaked secrets.
+
+In order of priority, this includes:
+
+1. non-binary Git blobs
+1. job logs
+1. issuable creation (issues, MRs, epics)
+1. issuable updates (issues, MRs, epics)
+1. issuable comments (issues, MRs, epics)
+
+Targets out of scope for the initial phases include:
+
+- Media types (JPEG, PDF, ...)
+- Snippets
+- Wikis
+- Container images
+
+### Token types
+
+The existing Secret Detection configuration covers ~100 rules across a variety
+of platforms. To reduce the total cost of execution and the likelihood of false
+positives, the dedicated service targets only well-defined tokens: that is, tokens
+with a precise definition, most often a fixed substring prefix or suffix and a
+fixed length.
+
+Token types to identify in order of importance:
+
+1. Well-defined GitLab tokens (including Personal Access Tokens and Pipeline Trigger Tokens)
+1. Verified Partner tokens (including AWS)
+1. Remainder tokens currently included in Secret Detection CI configuration
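+As an illustrative sketch of what "well-defined" means in practice, a rule for a token with a fixed prefix and a fixed-length body might look like the following. The prefix, length, and rule shape shown are assumptions for illustration, not the production ruleset (which is regex processing via `re2` in Ruby):
+
```javascript
// Illustrative sketch of a "well-defined" token rule: a fixed prefix plus a
// fixed-length character class. The prefix and length below are examples
// only, not the production configuration.
const RULES = [
  { type: 'example_personal_access_token', pattern: /\bglpat-[0-9A-Za-z_-]{20}\b/g },
];

// Scan a text blob and return every (type, secret) pair found.
function scanBlob(text) {
  const findings = [];
  for (const { type, pattern } of RULES) {
    for (const match of text.matchAll(pattern)) {
      findings.push({ type, secret: match[0] });
    }
  }
  return findings;
}
```
+
+A rule like this is cheap to evaluate and hard to trip accidentally, which is what makes well-defined tokens the first targets.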
+
## Proposal
+### Decisions
+
+- [001: Use Ruby Push Check approach within monolith](decisions/001_use_ruby_push_check_approach_within_monolith.md)
+
The first iteration of the experimental capability will feature a blocking
-pre-receive hook implemented within the Rails application. This iteration
+pre-receive hook implemented in the Rails application. This iteration
will be released in an experimental state to select users and provide
opportunity for the team to profile the capability before considering extraction
into a dedicated service.
@@ -89,10 +126,41 @@ as self-managed instances.
- Performance of scanning against volume of domain objects (such as push frequency)
- Queueing of scan requests
+### Transfer optimizations for large Git data blobs
+
+As described in [Gitaly's upload-pack traffic blueprint](../gitaly_handle_upload_pack_in_http2_server/index.md#git-data-transfer-optimization-with-sidechannel), we have faced problems in the past handling large data transfers over gRPC. This could be a concern as we expand secret detection to larger blob sizes to increase coverage of leaked secrets. We expect to roll out pre-receive scanning with a 1 megabyte blob size limit, which should be well within bounds. From the [Protobuf documentation](https://protobuf.dev/programming-guides/techniques/#large-data):
+
+> As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
+
+In expansion phases, we must explore chunking or alternative strategies, such as the optimized sidechannel approach used by Gitaly.
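+
+The chunking strategy mentioned above could be sketched as follows. This is an illustrative sketch; the chunk size, the overlap trick, and the function shape are assumptions, not a settled design:
+
```javascript
// Illustrative sketch: split a large blob into bounded chunks so no single
// message crosses the ~1 MB guideline. Consecutive chunks overlap by the
// maximum expected token length so a secret straddling a chunk boundary is
// still seen whole by at least one chunk. Sizes here are assumptions.
function* chunkBlob(blob, { chunkSize = 1024 * 1024, overlap = 64 } = {}) {
  for (let start = 0; start < blob.length; start += chunkSize - overlap) {
    yield blob.slice(start, start + chunkSize);
    if (start + chunkSize >= blob.length) break;
  }
}
```
+
+The overlap must be at least the longest token the ruleset can match, otherwise a secret split across two chunks could go undetected.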
+
## Design and implementation details
+The implementation of the secret scanning service is highly dependent on the outcomes of our benchmarking
+and capacity planning against both GitLab.com and our
+[Reference Architectures](../../../administration/reference_architectures/index.md).
+As the scanning capability must be an on-by-default component of both our SaaS and self-managed
+instances (see [the PoC](#iterations)), the deployment characteristics must be considered to determine whether
+this should be a standalone component or executed as a subprocess of the existing Sidekiq worker fleet
+(similar to the implementation of our Elasticsearch indexing service).
+
+Similarly, the scan target volume will require a robust and scalable enqueueing system to limit resource consumption.
+
+The detection capability relies on a multiphase rollout, from an experimental component implemented directly in the monolith to a standalone service capable of scanning text blobs generically.
+
+See [technical discovery](https://gitlab.com/gitlab-org/gitlab/-/issues/376716)
+for further background exploration.
+
+See [this thread](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/105142#note_1194863310)
+for past discussion around scaling approaches.
+
+### Phase 1 - Ruby pushcheck pre-receive integration
+
The critical paths as outlined under [goals above](#goals) cover two major object
-types: Git blobs (corresponding to push events) and arbitrary text blobs.
+types: Git text blobs (corresponding to push events) and arbitrary text blobs. In Phase 1,
+we focus entirely on Git text blobs.
+
+This phase will be considered "Experimental", with limited availability for customer opt-in through instance-level application settings.
The detection flow for push events relies on subscribing to the PreReceive hook
to scan commit data using the [PushCheck interface](https://gitlab.com/gitlab-org/gitlab/blob/3f1653f5706cd0e7bbd60ed7155010c0a32c681d/lib/gitlab/checks/push_check.rb). This `SecretScanningService`
@@ -100,6 +168,185 @@ service fetches the specified blob contents from Gitaly, scans
the commit contents, and rejects the push when a secret is detected.
See [Push event detection flow](#push-event-detection-flow) for sequence.
+In the case of a push detection, the commit is rejected inline and an error is returned to the end user.
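
The rejection flow can be sketched as follows. This is a simplified stand-in, not the actual `PushCheck` implementation; the class, error, and scanner protocol (`#scan` returning `:found`/`:not_found`) are illustrative assumptions.

```ruby
# Illustrative sketch of the inline rejection flow. The real integration
# lives behind the PushCheck interface in the Rails monolith; names here
# are hypothetical.
class PushRejectedError < StandardError; end

class SecretScanningCheck
  def initialize(scanner)
    @scanner = scanner # assumed to respond to #scan(blob) => :found / :not_found
  end

  def validate!(blobs)
    blobs.each do |blob|
      next unless @scanner.scan(blob) == :found

      # Reject the push inline and surface the error to the end user.
      raise PushRejectedError, "rejected: secret found"
    end
    :accepted
  end
end
```

Because the check runs synchronously in the pre-receive path, any scan latency is paid on every push, which motivates the scaling work in later phases.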
+
+#### High-Level Architecture
+
+The Phase 1 architecture involves no additional components and is entirely encapsulated in the Rails application server. This provides rapid deployment with tight integration within existing auth boundaries and no distributed coordination.
+
+The primary drawback is increased resource utilization: additional CPU, memory, transfer volume, and request latency on existing application nodes.
+
+```plantuml
+@startuml Phase1
+skinparam linetype ortho
+
+card "**External Load Balancer**" as elb #6a9be7
+
+together {
+ card "**GitLab Rails**" as gitlab #32CD32
+ card "**Gitaly**" as gitaly #FF8C00
+ card "**PostgreSQL**" as postgres #4EA7FF
+ card "**Redis**" as redis #FF6347
+ card "**Sidekiq**" as sidekiq #ff8dd1
+}
+
+gitlab -[#32CD32]--> gitaly
+gitlab -[#32CD32]--> postgres
+gitlab -[#32CD32]--> redis
+gitlab -[#32CD32]--> sidekiq
+
+elb -[#6a9be7]-> gitlab
+
+gitlab .[#32CD32]----> postgres
+sidekiq .[#ff8dd1]----> postgres
+
+@enduml
+```
+
+#### Push event detection flow
+
+```mermaid
+sequenceDiagram
+ autonumber
+ actor User
+ User->>+Workhorse: git push with-secret
+ Workhorse->>+Gitaly: tcp
+ Gitaly->>+Rails: PreReceive
+ Rails->>-Gitaly: ListAllBlobs
+ Gitaly->>-Rails: ListAllBlobsResponse
+
+ Rails->>+GitLabSecretDetection: Scan(blob)
+ GitLabSecretDetection->>-Rails: found
+
+ Rails->>User: rejected: secret found
+
+ User->>+Workhorse: git push without-secret
+ Workhorse->>+Gitaly: tcp
+ Gitaly->>+Rails: PreReceive
+ Rails->>-Gitaly: ListAllBlobs
+ Gitaly->>-Rails: ListAllBlobsResponse
+
+ Rails->>+GitLabSecretDetection: Scan(blob)
+ GitLabSecretDetection->>-Rails: not_found
+
+ Rails->>User: accepted
+```
+
+### Phase 2 - Standalone pre-receive service
+
+The critical paths as outlined under [goals above](#goals) cover two major object
+types: Git text blobs (corresponding to push events) and arbitrary text blobs. In Phase 2,
+we focus entirely on Git text blobs.
+
+This phase emphasizes scaling the service outside of the monolith for general availability and to allow
+an on-by-default behavior. The architecture is adapted to provide an isolated and independently
+scalable service outside of the Rails monolith.
+
+In the case of a push detection, the commit is rejected inline and an error is returned to the end user.
+
+#### High-Level Architecture
+
+The Phase 2 architecture involves extracting the secret detection logic into a standalone service
+which communicates directly with both the Rails application and Gitaly. This provides a means to scale
+the secret detection nodes independently and reduce resource overhead on the Rails application.
+
+Scans still run synchronously as a (potentially) blocking pre-receive transaction.
+
+Note that the node count is purely illustrative, but serves to emphasize the independent scaling requirements for the scanning service.
+
+```plantuml
+
+@startuml Phase2
+skinparam linetype ortho
+
+card "**External Load Balancer**" as elb #6a9be7
+card "**Internal Load Balancer**" as ilb #9370DB
+
+together {
+ collections "**GitLab Rails** x3" as gitlab #32CD32
+ collections "**Sidekiq** x3" as sidekiq #ff8dd1
+}
+
+together {
+ collections "**Consul** x3" as consul #e76a9b
+}
+
+card "SecretScanningService Cluster" as prsd_cluster {
+ collections "**SecretScanningService** x5" as prsd #FF8C00
+}
+
+card "Gitaly Cluster" as gitaly_cluster {
+ collections "**Gitaly** x3" as gitaly #FF8C00
+}
+
+card "Database" as database {
+ collections "**PGBouncer** x3" as pgbouncer #4EA7FF
+}
+
+elb -[#6a9be7]-> gitlab
+
+gitlab -[#32CD32,norank]--> ilb
+gitlab .[#32CD32]----> database
+gitlab -[hidden]-> consul
+
+sidekiq -[#ff8dd1,norank]--> ilb
+sidekiq .[#ff8dd1]----> database
+sidekiq -[hidden]-> consul
+
+ilb -[#9370DB]--> prsd_cluster
+ilb -[#9370DB]--> gitaly_cluster
+ilb -[#9370DB]--> database
+ilb -[hidden]u-> consul
+
+consul .[#e76a9b]u-> gitlab
+consul .[#e76a9b]u-> sidekiq
+consul .[#e76a9b]-> database
+consul .[#e76a9b]-> gitaly_cluster
+consul .[#e76a9b]-> prsd_cluster
+
+@enduml
+```
+
+#### Push event detection flow
+
+```mermaid
+sequenceDiagram
+ autonumber
+ actor User
+ User->>+Workhorse: git push with-secret
+ Workhorse->>+Gitaly: tcp
+ Gitaly->>+GitLabSecretDetection: PreReceive
+ GitLabSecretDetection->>-Gitaly: ListAllBlobs
+ Gitaly->>-GitLabSecretDetection: ListAllBlobsResponse
+
+ Gitaly->>+GitLabSecretDetection: PreReceive
+
+ GitLabSecretDetection->>GitLabSecretDetection: Scan(blob)
+ GitLabSecretDetection->>-Gitaly: found
+
+ Gitaly->>+Rails: PreReceive
+
+ Rails->>User: rejected: secret found
+
+ User->>+Workhorse: git push without-secret
+ Workhorse->>+Gitaly: tcp
+ Gitaly->>+GitLabSecretDetection: PreReceive
+ GitLabSecretDetection->>-Gitaly: ListAllBlobs
+ Gitaly->>-GitLabSecretDetection: ListAllBlobsResponse
+
+ Gitaly->>+GitLabSecretDetection: PreReceive
+
+ GitLabSecretDetection->>GitLabSecretDetection: Scan(blob)
+ GitLabSecretDetection->>-Gitaly: not_found
+
+ Gitaly->>+Rails: PreReceive
+
+ Rails->>User: accepted
+```
+
+### Phase 3 - Expansion beyond pre-receive
+
The detection flow for arbitrary text blobs, such as issue comments, relies on
subscribing to `Notes::PostProcessService` (or equivalent service) to enqueue
Sidekiq requests to the `SecretScanningService` to process the text blob by object type
@@ -117,8 +364,44 @@ In any other case of detection, the Rails application manually creates a vulnera
using the `Vulnerabilities::ManuallyCreateService` to surface the finding in the
existing Vulnerability Management UI.
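
The enqueue-and-drain flow for arbitrary text blobs can be sketched as a simplified stand-in. Plain Ruby is used in place of an actual Sidekiq worker, and all names are illustrative; the real hook point is `Notes::PostProcessService` as described above.

```ruby
# Simplified stand-in for the asynchronous scan flow: a post-process hook
# enqueues a job per text blob, and a worker later scans each job and
# collects findings. In the real implementation this would be a Sidekiq
# worker; names here are hypothetical.
class ScanQueue
  def initialize
    @jobs = []
  end

  # Called from the post-process hook with the object type, its id, and
  # the raw text to scan.
  def enqueue(object_type, record_id, text)
    @jobs << { object_type: object_type, record_id: record_id, text: text }
  end

  # Called by the worker; returns the jobs in which a secret was found,
  # which would then feed vulnerability creation.
  def drain(scanner)
    findings = []
    until @jobs.empty?
      job = @jobs.shift
      findings << job if scanner.scan(job[:text]) == :found
    end
    findings
  end
end
```

Decoupling enqueueing from scanning is what allows post-receive and issuable scans to be throttled independently of the synchronous pre-receive path.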
-See [technical discovery](https://gitlab.com/gitlab-org/gitlab/-/issues/376716)
-for further background exploration.
+#### Architecture
+
+There is no change to the architecture defined in Phase 2; however, the individual load requirements may require scaling up the node counts for the detection service.
+
+#### Detection flow
+
+There is no change to the push event detection flow defined in Phase 2; however, the added capability to scan
+arbitrary text blobs directly from Rails allows us to emulate pre-receive behavior for issuable creations
+as well (see [target types](#target-types) for priority object types).
+
+```mermaid
+sequenceDiagram
+ autonumber
+ actor User
+ User->>+Workhorse: git push with-secret
+ Workhorse->>+Gitaly: tcp
+ Gitaly->>+GitLabSecretDetection: PreReceive
+ GitLabSecretDetection->>-Gitaly: ListAllBlobs
+ Gitaly->>-GitLabSecretDetection: ListAllBlobsResponse
+
+ Gitaly->>+GitLabSecretDetection: PreReceive
+
+ GitLabSecretDetection->>GitLabSecretDetection: Scan(blob)
+ GitLabSecretDetection->>-Gitaly: found
+
+ Gitaly->>+Rails: PreReceive
+
+ Rails->>User: rejected: secret found
+
+ User->>+Workhorse: POST issuable with-secret
+ Workhorse->>+Rails: tcp
+ Rails->>+GitLabSecretDetection: PreReceive
+
+ GitLabSecretDetection->>GitLabSecretDetection: Scan(blob)
+ GitLabSecretDetection->>-Rails: found
+
+ Rails->>User: rejected: secret found
+```
### Target types
@@ -151,7 +434,7 @@ Token types to identify in order of importance:
1. Well-defined GitLab tokens (including Personal Access Tokens and Pipeline Trigger Tokens)
1. Verified Partner tokens (including AWS)
-1. Remainder tokens currently included in Secret Detection CI configuration
+1. Remainder tokens included in Secret Detection CI configuration
### Detection engine
@@ -160,57 +443,13 @@ for all secret scanning in pipeline contexts. By using its `--no-git` configurat
we can scan arbitrary text blobs outside of a repository context and continue to
use it for non-pipeline scanning.
-In the case of PreReceive detection, we rely on a combination of keyword/substring matches
+In the case of pre-receive detection, we rely on a combination of keyword/substring matches
+for pre-filtering and `re2` for regex detections. See the [spike issue](https://gitlab.com/gitlab-org/gitlab/-/issues/423832) for initial benchmarks.
Changes to the detection engine are out of scope until benchmarking unveils performance concerns.
Notable alternatives include high-performance regex engines such as [Hyperscan](https://github.com/intel/hyperscan) or its portable fork [Vectorscan](https://github.com/VectorCamp/vectorscan).
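
The two-stage keyword-pre-filter approach can be sketched as below. This is purely illustrative: Ruby's built-in `Regexp` stands in for `re2` to keep the sketch self-contained, and the rules shown are assumptions, not the actual ruleset.

```ruby
# Illustrative two-stage detection: a cheap substring pre-filter gates the
# more expensive regex pass, so most blobs never hit a regex at all.
# The real engine uses re2; Ruby's Regexp is used here only for the sketch.
RULES = [
  { keywords: ["glpat-"], regex: /glpat-[0-9a-zA-Z_\-]{20}/ }, # GitLab PAT (assumed shape)
  { keywords: ["AKIA"],   regex: /AKIA[0-9A-Z]{16}/ }          # AWS access key id
].freeze

def scan(blob)
  RULES.any? do |rule|
    # Stage 1: skip the regex entirely unless a keyword substring matches.
    next false unless rule[:keywords].any? { |kw| blob.include?(kw) }

    # Stage 2: confirm with the full pattern.
    blob.match?(rule[:regex])
  end ? :found : :not_found
end
```

The pre-filter keeps the common case (no candidate keywords) to a handful of substring searches, which is what makes synchronous pre-receive scanning tractable.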
-### High-level architecture
-
-The implementation of the secret scanning service is highly dependent on the outcomes of our benchmarking
-and capacity planning against both GitLab.com and our
-[Reference Architectures](../../../administration/reference_architectures/index.md).
-As the scanning capability must be an on-by-default component of both our SaaS and self-managed
-instances [the PoC](#iterations), the deployment characteristics must be considered to determine whether
-this is a standalone component or executed as a subprocess of the existing Sidekiq worker fleet
-(similar to the implementation of our Elasticsearch indexing service).
-
-Similarly, the scan target volume will require a robust and scalable enqueueing system to limit resource consumption.
-
-See [this thread](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/105142#note_1194863310)
-for past discussion around scaling approaches.
-
-### Push event detection flow
-
-```mermaid
-sequenceDiagram
- autonumber
- actor User
- User->>+Workhorse: git push with-secret
- Workhorse->>+Gitaly: tcp
- Gitaly->>+Rails: PreReceive
- Rails->>-Gitaly: ListAllBlobs
- Gitaly->>-Rails: ListAllBlobsResponse
-
- Rails->>+GitLabSecretDetection: Scan(blob)
- GitLabSecretDetection->>-Rails: found
-
- Rails->>User: rejected: secret found
-
- User->>+Workhorse: git push without-secret
- Workhorse->>+Gitaly: tcp
- Gitaly->>+Rails: PreReceive
- Rails->>-Gitaly: ListAllBlobs
- Gitaly->>-Rails: ListAllBlobsResponse
-
- Rails->>+GitLabSecretDetection: Scan(blob)
- GitLabSecretDetection->>-Rails: not_found
-
- Rails->>User: OK
-```
-
## Iterations
- ✓ Define [requirements for detection coverage and actions](https://gitlab.com/gitlab-org/gitlab/-/issues/376716)
diff --git a/doc/architecture/blueprints/secret_manager/secrets-manager-overview.png b/doc/architecture/blueprints/secret_manager/secrets-manager-overview.png
index 4e3985cc30e..c6c51b05164 100644
--- a/doc/architecture/blueprints/secret_manager/secrets-manager-overview.png
+++ b/doc/architecture/blueprints/secret_manager/secrets-manager-overview.png
Binary files differ