gitlab.com/gitlab-org/gitlab-foss.git

Diffstat (limited to 'doc/architecture/blueprints')
-rw-r--r-- doc/architecture/blueprints/cells/index.md | 15
-rw-r--r-- doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md | 63
-rw-r--r-- doc/architecture/blueprints/modular_monolith/bounded_contexts.md | 27
-rw-r--r-- doc/architecture/blueprints/modular_monolith/hexagonal_monolith/index.md | 98
-rw-r--r-- doc/architecture/blueprints/modular_monolith/index.md | 21
5 files changed, 168 insertions, 56 deletions
diff --git a/doc/architecture/blueprints/cells/index.md b/doc/architecture/blueprints/cells/index.md
index 0e93b9d5d3b..160228a99de 100644
--- a/doc/architecture/blueprints/cells/index.md
+++ b/doc/architecture/blueprints/cells/index.md
@@ -302,7 +302,7 @@ One iteration describes one quarter's worth of work.
1. Iteration 8 - FY25Q4
- TBD
-## Technical Proposals
+## Technical proposals
The Cells architecture has long-lasting implications for data processing, location, scalability and the GitLab architecture.
This section links all different technical proposals that are being evaluated.
@@ -310,6 +310,19 @@ This section links all different technical proposals that are being evaluated.
- [Stateless Router That Uses a Cache to Pick Cell and Is Redirected When Wrong Cell Is Reached](proposal-stateless-router-with-buffering-requests.md)
- [Stateless Router That Uses a Cache to Pick Cell and pre-flight `/api/v4/cells/learn`](proposal-stateless-router-with-routes-learning.md)
+## Data pipeline ingestion
+
+The Cells architecture will have a significant impact on the current [data pipeline](https://about.gitlab.com/handbook/business-technology/data-team/platform/pipelines/SAAS-Gitlab-com/), which exports data from Postgres to Snowflake for data analytics. This data pipeline fulfils many use cases (for example, SaaS Service Ping, Gainsight metrics, and reporting and analytics of the SaaS platform).
+
+The current data pipeline cannot get data through a change data capture (CDC) mechanism, which leads to data quality issues. Instead, it polls the Postgres database for new and updated records, or fully extracts certain tables, which causes a lot of overhead.
+At the moment, the data pipeline runs against two instances created from snapshots of the `main` and `ci` databases.
+This avoids adding workload to the production databases.
+In the Cells architecture there will be more Postgres instances, and the current pipeline would not scale to pull data from all of them. The requirements for the data pipeline moving forward are:
+
+- We need a process that captures all changes (inserts, updates, and deletes) from all Cells through CDC and scales automatically with the number of Cells.
+- We need direct or indirect access to the database instances, so that data can be caught up after a major failure and data anomalies can be analyzed for their root cause.
+- We need monitoring that alerts on any incident that could delay data ingestion.
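To illustrate why polling leads to data quality issues and why the requirements call for CDC across all Cells, here is a minimal in-memory Ruby sketch. The `CellTable` class and the `poll` and `capture` helpers are hypothetical and do not describe the real pipeline or any CDC product: polling by `updated_at` never observes a deleted row, while consuming an ordered change log per Cell sees every insert, update, and delete and works unchanged for any number of Cells.

```ruby
# Hypothetical in-memory model of one table in one Cell. Every write is also
# appended to an ordered change log, which stands in for a real CDC source.
class CellTable
  Row = Struct.new(:id, :updated_at)
  Event = Struct.new(:cell, :op, :id)

  attr_reader :name, :rows, :events

  def initialize(name)
    @name = name
    @rows = {}
    @events = []
  end

  def upsert(id, updated_at)
    op = rows.key?(id) ? :update : :insert
    rows[id] = Row.new(id, updated_at)
    events << Event.new(name, op, id)
  end

  def delete(id)
    rows.delete(id)
    events << Event.new(name, :delete, id)
  end
end

# Polling extraction: returns IDs of rows changed after the watermark.
# A deleted row no longer exists, so its deletion is never observed.
def poll(table, watermark)
  table.rows.values.select { |row| row.updated_at > watermark }.map(&:id)
end

# CDC-style extraction: reads every new event from every Cell, remembering a
# per-Cell offset, so it works unchanged for any number of Cells.
def capture(cells, offsets)
  cells.flat_map do |cell|
    new_events = cell.events.drop(offsets.fetch(cell.name, 0))
    offsets[cell.name] = cell.events.size
    new_events
  end
end

cell_1 = CellTable.new("cell-1")
cell_2 = CellTable.new("cell-2")
cell_1.upsert(1, 10)
cell_1.upsert(2, 10)
cell_2.upsert(7, 12)

offsets = {}
capture([cell_1, cell_2], offsets) # initial load: three :insert events

cell_1.delete(2)
cell_1.upsert(1, 20)
poll(cell_1, 10)                   # only row 1; the delete of row 2 is lost
capture([cell_1, cell_2], offsets) # :delete of row 2, then :update of row 1
```

In a real pipeline the change log would come from the database itself, for example through PostgreSQL logical replication, rather than from application code.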
+
## Impacted features
The Cells architecture will impact many features requiring some of them to be rewritten, or changed significantly.
diff --git a/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md b/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md
index 0987b317af8..df9448309ce 100644
--- a/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md
+++ b/doc/architecture/blueprints/container_registry_metadata_database_self_managed_rollout/index.md
@@ -66,29 +66,46 @@ the setup and maintenance of the registry database for new and existing deploys.
For the registry, we need to develop and validate import tooling which
coordinates with the core import functionality which was used to migrate all
-container images on GitLab.com. Additionally, we must validate that each supported
-storage driver works as expected with the import process and provide estimated
-import times for admins.
-
-We can structure our work to meet the standards outlined in support for
-Experiment, Beta, and Alpha features. Doing so will help to prioritize core
-functionality and to allow users who wish to be early adopters to begin using
-the database and providing us with invaluable feedback.
-
-These levels of support could be advertised to self-managed users via a simple
-chart, allowing them to tell at a glance the status of this project as it relates
-to their situation.
-
-| Installation | GCS | AWS | Filesystem | Azure | OSS | Swift|
-| ------ | ------ |------ | ------ | ------ |------ | ------ |
-| Omnibus | GA | GA | Beta | Experimental | Experimental | Experimental |
-| Charts | GA | GA |Beta | Experimental | Experimental | Experimental |
-
-### Justification of Structuring Support by Driver
-
-It's possible that we could simplify the proposed support matrix by structuring
-it only by deployment environment and not differentiate by storage driver. The
-following two sections briefly summarize several points for and against.
+container images on GitLab.com. Additionally, we should provide estimated import
+times for admins for each supported storage driver.
+
+During the beta phase, we can provide a quick reference that lists the features
+we have implemented and planned, along with their statuses, and an executive
+summary of the overall state of the migration experience.
+This could be advertised to self-managed users through a simple chart, allowing
+them to see at a glance the status of this project and determine whether it is
+feature-complete enough for their needs and level of risk tolerance.
+
+This should be documented in the container registry administration documentation
+rather than in this blueprint. Providing the information there puts it in a place
+that is familiar to self-managed admins and allows logical cross-linking from
+other sections of the same document, such as the garbage collection section.
+
+For example:
+
+The metadata database is in early beta for self-managed users. The core migration
+process for existing registries has been implemented, and online garbage collection
+is fully implemented. Certain database-enabled features are only enabled for GitLab.com,
+and automatic provisioning of the registry database is not available.
+See the table below for the status of features related to the container
+registry database.
+
+| Feature | Description | Status | Link |
+| --------------------------- | ------------------------------------------------------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
+| Import Tool | Allows existing deployments to migrate to the database. | Completed | [Import Tool](https://gitlab.com/gitlab-org/container-registry/-/issues/884) |
+| Automatic Import Validation | Tests that the import maintained data integrity of imported images. | Backlog | [Validate self-managed imports](https://gitlab.com/gitlab-org/container-registry/-/issues/938) |
+| Foo Bar | Lorem ipsum dolor sit amet. | Scheduled for 16.5 | <LINK> |
+
+### Structuring Support by Driver
+
+The import operation relies heavily on the object storage driver implementation
+to iterate over all registry metadata so that it can be stored in the database.
+Implementation differences between drivers might therefore have a meaningful
+impact on the performance and reliability of the import process.
+
+The following two sections briefly summarize several points for and against
+structuring support by driver.
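As a rough illustration of how driver implementation details can change import performance, the following Ruby sketch counts the listing round trips needed to walk all registry metadata when a driver returns objects in pages. `ListingDriver`, `listing_estimates`, the driver names, page sizes, and latencies are hypothetical placeholders, not measured values for any real storage driver; the point is only that the same import logic can take very different amounts of time depending on how the driver enumerates objects.

```ruby
# Hypothetical model of a storage driver that lists objects in fixed-size pages.
# `page_size` and `latency_ms` are illustrative inputs, not measured values.
ListingDriver = Struct.new(:name, :page_size, :latency_ms) do
  # List calls needed to enumerate `object_count` objects; an empty
  # listing still costs one call.
  def list_calls(object_count)
    return 1 if object_count.zero?

    (object_count + page_size - 1) / page_size
  end

  # Time spent on listing alone. Reading manifests and writing rows to the
  # database are deliberately left out of this sketch.
  def listing_time_ms(object_count)
    list_calls(object_count) * latency_ms
  end
end

# Per-driver listing estimates that could feed a table of import times for admins.
def listing_estimates(drivers, object_count)
  drivers.to_h { |driver| [driver.name, driver.listing_time_ms(object_count)] }
end

drivers = [
  ListingDriver.new("driver-a", 1000, 50),
  ListingDriver.new("driver-b", 100, 20)
]
listing_estimates(drivers, 2_500) # 150 ms for "driver-a", 500 ms for "driver-b"
```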
#### Arguments Opposed to Structuring Support by Driver
diff --git a/doc/architecture/blueprints/modular_monolith/bounded_contexts.md b/doc/architecture/blueprints/modular_monolith/bounded_contexts.md
index 0f71e24864e..8133106050d 100644
--- a/doc/architecture/blueprints/modular_monolith/bounded_contexts.md
+++ b/doc/architecture/blueprints/modular_monolith/bounded_contexts.md
@@ -41,11 +41,16 @@ The majority of the code is not properly namespaced and organized:
In June 2023 we started extracting gems out of the main codebase into the
[`gems/` directory inside the monorepo](https://gitlab.com/gitlab-org/gitlab/-/blob/4c6e120069abe751d3128c05ade45ea749a033df/doc/development/gems.md).
-This is our first step towards modularization: externalize code that can be
-extracted to prevent coupling from being introduced into modules that have been
-designed as separate components.
+This is our first step towards modularization.
-These gems as still part of the monorepo.
+- We want to separate generic code from domain code (the code that powers the business logic).
+- We want to clean up the `lib/` directory by moving generic code out of it.
+- We want to isolate code that could live in a separate project, to prevent it from depending on domain code.
+
+These gems are still part of the monorepo but could be extracted into dedicated repositories if needed.
+
+Extracting gems does not block modularization, but the less generic code exists in `lib/`, the
+easier it is to identify and separate bounded contexts.
### 1. What makes a bounded context?
@@ -103,17 +108,3 @@ With this static list we could:
- Understand where to place new classes and modules.
- Enforce if any top-level namespaces are used that are not in the list of bounded contexts.
- Autoload non-standard Rails directories based on the given list.
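A static list of bounded contexts makes the enforcement check in this list cheap to express. The following Ruby sketch is a hypothetical helper, not existing GitLab tooling; `BOUNDED_CONTEXTS`, `bounded_context_for`, and `unknown_namespaces` are illustrative names, and the list contains only bounded contexts used as examples in these blueprints. It maps a constant to its bounded context and reports constants whose top-level namespace is not in the list.

```ruby
# Hypothetical static list; only contexts named as examples in these blueprints.
BOUNDED_CONTEXTS = %w[Ci MergeRequests Packages RemoteDevelopment].freeze

# Returns the bounded context that owns `constant_name`, or nil when its
# top-level namespace is not a known bounded context.
def bounded_context_for(constant_name)
  top_level = constant_name.delete_prefix("::").split("::").first
  top_level if BOUNDED_CONTEXTS.include?(top_level)
end

# Lists constants that use a top-level namespace outside the static list.
def unknown_namespaces(constant_names)
  constant_names.reject { |name| bounded_context_for(name) }
end

unknown_namespaces(%w[Ci::JobArtifacts::Example MergeRequests::Example Widgets::Example])
# => ["Widgets::Example"]
```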
-
-## Glossary
-
-- `modules` are Ruby modules and can be used to nest code hierarchically.
-- `namespaces` are unique hierarchies of Ruby constants. For example, `Ci::` but also `Ci::JobArtifacts::` or `Ci::Pipeline::Chain::`.
-- `packages` are Packwerk packages to group together related functionalities. These packages can be big or small depending on the design and architecture. Inside a package all constants (classes and modules) have the same namespace. For example:
- - In a package `ci`, all the classes would be nested under `Ci::` namespace. There can be also nested namespaces like `Ci::PipelineProcessing::`.
- - In a package `ci-pipeline_creation` all classes are nested under `Ci::PipelineCreation`, like `Ci::PipelineCreation::Chain::Command`.
- - In a package `ci` a class named `MergeRequests::UpdateHeadPipelineService` would not be allowed because it would not match the package's namespace.
- - This can be enforced easily with [Packwerk's based Rubocop Cops](https://github.com/rubyatscale/rubocop-packs/blob/main/lib/rubocop/cop/packs/root_namespace_is_pack_name.rb).
-- `bounded context` is a top-level Packwerk package that represents a macro aspect of the domain. For example: `Ci::`, `MergeRequests::`, `Packages::`, etc.
- - A bounded context is represented by a single Ruby module/namespace. For example, `Ci::` and not `Ci::JobArtifacts::`.
- - A bounded context can be made of 1 or multiple Packwerk packages. Nested packages would be recommended if the domain is quite complex and we want to enforce privacy among all the implementation details. For example: `Ci::PipelineProcessing::` and `Ci::PipelineCreation::` could be separate packages of the same bounded context and expose their public API while keeping implementation details private.
- - A new bounded context like `RemoteDevelopment::` can be represented a single package while large and complex bounded contexts like `Ci::` would need to be organized into smaller/nested packages.
diff --git a/doc/architecture/blueprints/modular_monolith/hexagonal_monolith/index.md b/doc/architecture/blueprints/modular_monolith/hexagonal_monolith/index.md
index eb4b428cf52..f0f689d48ca 100644
--- a/doc/architecture/blueprints/modular_monolith/hexagonal_monolith/index.md
+++ b/doc/architecture/blueprints/modular_monolith/hexagonal_monolith/index.md
@@ -25,12 +25,22 @@ Use [Packwerk](https://github.com/Shopify/packwerk) to enforce privacy and depen
## Details
+```mermaid
+flowchart TD
+ u([User]) -- interacts directly with --> AA[Application Adapter: WebUI, REST, GraphQL, git, ...]
+ AA --uses abstractions from--> D[Application Domain]
+ AA -- depends on --> Platform
+ D -- depends on --> Platform[Platform: gems, configs, framework, ...]
+```
+
### Application domain
-The application core (functional domains) is divided into separate top-level bounded contexts called after the
-[feature category](https://gitlab.com/gitlab-com/www-gitlab-com/blob/master/data/categories.yml) they represent.
+The application core (functional domains) is composed of all the code that describes the business logic, policies, and data
+that are unique to the GitLab product.
+It is divided into separate top-level [bounded contexts](../bounded_contexts.md).
A bounded-context is represented in the form of a Ruby module.
-This follows the existing [guideline on naming namespaces](../../../../development/software_design.md#use-namespaces-to-define-bounded-contexts) but puts more structure to it.
+This follows the existing [guideline on naming namespaces](../../../../development/software_design.md#use-namespaces-to-define-bounded-contexts)
+but puts more structure to it.
Modules should:
@@ -52,6 +62,12 @@ If a feature category is only relevant in the context of a parent feature catego
parent's bounded context. For example, build artifacts exist in the context of the Continuous Integration feature category,
so they may be merged under a single bounded context.
+The application domain has no knowledge of outer layers like the application adapters and only depends on the
+platform code. This makes the domain code the single source of truth (SSoT) for the business logic, and keeps it reusable
+and testable regardless of whether the request came from the Web UI or the REST API.
+
+If a dependency between an outer layer and an inner layer is required (domain code depending on the interface of an adapter), this can be solved using inversion of control techniques, especially dependency injection.
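A minimal Ruby sketch of this kind of dependency injection follows. `Ci::PipelineFailureHandler` and `RecordingNotifier` are hypothetical names used only for illustration: the domain class depends on an injected object that responds to `#pipeline_failed`, so it never references a concrete adapter, and tests can pass a fake.

```ruby
module Ci
  # Hypothetical domain service. It must report failures, but it must not
  # reference a concrete adapter such as a mailer or a webhook client.
  class PipelineFailureHandler
    # `notifier` is injected: any object responding to #pipeline_failed(id, reason).
    def initialize(notifier:)
      @notifier = notifier
    end

    def execute(pipeline_id:, failed_jobs:)
      return :ignored if failed_jobs.empty?

      reason = "#{failed_jobs.size} job(s) failed: #{failed_jobs.join(', ')}"
      @notifier.pipeline_failed(pipeline_id, reason)
      :notified
    end
  end
end

# Outer-layer implementation of the notifier interface. An adapter would wire
# in a real one; tests can use this recording fake.
class RecordingNotifier
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def pipeline_failed(pipeline_id, reason)
    @deliveries << [pipeline_id, reason]
  end
end

notifier = RecordingNotifier.new
handler = Ci::PipelineFailureHandler.new(notifier: notifier)
handler.execute(pipeline_id: 10, failed_jobs: %w[rspec rubocop]) # => :notified
notifier.deliveries # => [[10, "2 job(s) failed: rspec, rubocop"]]
```

Swapping the notifier for one backed by email or webhooks would not require any change to the domain class, which is the property the inversion of control is meant to preserve.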
+
### Application adapters
>>>
@@ -67,9 +83,14 @@ Application adapters would be:
- Web UI (Rails controllers, view, JS and Vue client)
- REST API endpoints
- GraphQL Endpoints
-- Action Cable
-TODO: continue describing how adapters are organized and why they are separate from the domain code.
+They are responsible for the interaction with the user. Each adapter should interpret the request, parse parameters
+and invoke the right abstraction from the application domain, then present the result back to the user.
+
+Presentation logic, and possibly authentication, would be specific to the adapters layer.
+
+The application adapters layer depends on the platform code to run: the Rails framework, the gems that power the adapter,
+the configurations and utilities.
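The following Ruby sketch shows two hypothetical adapters invoking the same domain abstraction. `Ci::JobCanceler`, `RestAdapter`, and `GraphqlAdapter` are illustrative names, not GitLab classes. Each adapter parses its own input format and presents the result in its own way, while the business rule (in this sketch, only running jobs can be canceled) lives in the domain class.

```ruby
require "json"

module Ci
  # Hypothetical domain abstraction: it owns the business rule and knows
  # nothing about HTTP or GraphQL.
  class JobCanceler
    Result = Struct.new(:success, :job_id, :error)

    def initialize(jobs)
      @jobs = jobs # job ID => status symbol, standing in for persistence
    end

    def execute(job_id)
      status = @jobs[job_id]
      return Result.new(false, job_id, "job not found") if status.nil?
      return Result.new(false, job_id, "job is not running") unless status == :running

      @jobs[job_id] = :canceled
      Result.new(true, job_id, nil)
    end
  end
end

# Hypothetical REST adapter: parses the string path parameter and presents the
# result as an HTTP status code plus a JSON body.
module RestAdapter
  def self.cancel_job(params, canceler)
    result = canceler.execute(Integer(params.fetch("id")))
    if result.success
      [200, JSON.generate({ "id" => result.job_id, "status" => "canceled" })]
    else
      [422, JSON.generate({ "message" => result.error })]
    end
  end
end

# Hypothetical GraphQL-style adapter: same domain call, but the result is
# presented as a payload with an `errors` array.
module GraphqlAdapter
  def self.cancel_job_mutation(args, canceler)
    result = canceler.execute(args.fetch(:id))
    { job: (result.success ? { id: result.job_id } : nil), errors: [result.error].compact }
  end
end

jobs = { 1 => :running, 2 => :success }
canceler = Ci::JobCanceler.new(jobs)
RestAdapter.cancel_job({ "id" => "1" }, canceler)       # status 200 with a JSON body
GraphqlAdapter.cancel_job_mutation({ id: 2 }, canceler) # job is nil, one error message
```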
### Platform code
@@ -95,19 +116,76 @@ This means that aside from the Rails framework code, the rest of the platform co
Eventually all code inside `gems/` could potentially be extracted in a separate repository or open sourced.
Placing platform code inside `gems/` makes it clear that its purpose is to serve the application code.
-### Why Packwerk?
+### Enforcing boundaries
+
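+Ruby has no package-level privacy for constants. `Module#private_constant` can hide an individual constant from
+scoped references outside its module, but it cannot restrict access to a group of modules or files. Unlike in some other
+programming languages, even extracting well-documented gems doesn't prevent other developers from coupling code to
+implementation details, because constants are effectively public in Ruby.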
+
+We can have a codebase perfectly organized in a hexagonal architecture while the application domain, the biggest
+part of the codebase, is still a non-modularized [big ball of mud](https://en.wikipedia.org/wiki/Big_ball_of_mud).
+
+Enforcing boundaries is also vital to maintaining the structure long term. After a big modularization effort, we don't
+want to slowly fall back into a big ball of mud by violating the boundaries.
+
+We explored the idea of [using Packwerk in a proof of concept](../proof_of_concepts.md#use-packwerk-to-enforce-module-boundaries)
+to enforce module boundaries.
-TODO:
+[Packwerk](https://github.com/Shopify/packwerk) is a static analyzer that lets us gradually introduce packages into the
+codebase and enforce privacy and explicit dependencies. Packwerk can detect if some Ruby code is using private implementation
+details of another package or if it's using a package that wasn't declared explicitly as a dependency.
-- boundaries not enforced at runtime. Ruby code will still work as being all loaded in the same memory space.
-- can be introduced incrementally. Not everything requires to be moved to packs for the Rails autoloader to work.
+Because Packwerk is a static analyzer, it does not affect code execution, so introducing it is safe and can be done
+gradually.
Companies like Gusto have been developing and maintaining a list of [development and engineering tools](https://github.com/rubyatscale)
for organizations that want to move to using a Rails modular monolith around Packwerk.
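To make the two checks Packwerk performs more concrete, here is a toy Ruby model of them. It is not how Packwerk is implemented and does not read `package.yml`; `Package`, `violations`, and the package and constant names are hypothetical. A reference is a dependency violation when the owning package is not declared as a dependency, and a privacy violation when the constant is not part of the owner's public API (for example, the constants placed in a package's `public/` directory).

```ruby
# Toy model of a package: its name, the packages it declares as dependencies,
# and the constants it exposes as its public API.
Package = Struct.new(:name, :dependencies, :public_constants, keyword_init: true)

# Returns the violations for code in `from` referencing `constant`, which is
# defined in `owner`. References inside the same package are always allowed.
def violations(from:, constant:, owner:)
  return [] if from.name == owner.name

  found = []
  found << :dependency unless from.dependencies.include?(owner.name)
  found << :privacy unless owner.public_constants.include?(constant)
  found
end

ci = Package.new(name: "domains/ci", dependencies: [], public_constants: ["Ci::PipelineStatus"])
merge_requests = Package.new(name: "domains/merge_requests", dependencies: ["domains/ci"], public_constants: [])

violations(from: merge_requests, constant: "Ci::PipelineStatus", owner: ci)    # => []
violations(from: merge_requests, constant: "Ci::Internal::Example", owner: ci) # => [:privacy]
```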
### EE and JH extensions
-TODO:
+One of the unique challenges of modularizing the GitLab codebase is the presence of EE extensions (managed by GitLab)
+and JH extensions (managed by JiHu).
+
+By moving related domain code (for example, `Ci::`) under the same bounded context and Packwerk package, we would also need to
+move the related `ee/` extensions into it.
+
+For top-level bounded contexts to match Packwerk packages, all code related to a specific domain, including EE extensions,
+needs to be placed under the same package directory.
+
+The following is an example of a possible directory structure:
+
+```plaintext
+domains
+├── ci
+│ ├── package.yml # package definition.
+│ ├── packwerk.yml # tool configurations for this package.
+│ ├── package_todo.yml # existing violations.
+│ ├── core # Core features available in Community Edition and always autoloaded.
+│ │ ├── app
+│ │ │ ├── models/...
+│ │ │ ├── services/...
+│ │ │ └── lib/... # domain-specific `lib` moved inside `app` together with other classes.
+│ │ └── spec
+│ │ └── models/...
+│ ├── ee # EE extensions specific to the bounded context, conditionally autoloaded.
+│ │ ├── models/...
+│ │ └── spec
+│ │ └── models/...
+│ └── public # Public constants are placed here so they can be referenced by other packages.
+│ ├── core
+│ │ ├── app
+│ │ │ └── models/...
+│ │ └── spec
+│ │ └── models/...
+│ └── ee
+│ ├── app
+│ │ └── models/...
+│ └── spec
+│ └── models/...
+├── merge_requests/
+├── repositories/
+└── ...
+```
## Challenges
diff --git a/doc/architecture/blueprints/modular_monolith/index.md b/doc/architecture/blueprints/modular_monolith/index.md
index ef50be643a6..f1e6c119552 100644
--- a/doc/architecture/blueprints/modular_monolith/index.md
+++ b/doc/architecture/blueprints/modular_monolith/index.md
@@ -93,12 +93,11 @@ There are many aspects and details required to make modularization of our
monolith successful. We will work on the aspects listed below, refine them, and
add more important details as we move forward towards the goal:
-1. [Deliver modularization proof-of-concepts that will deliver key insights](proof_of_concepts.md)
-1. [Align modularization plans to the organizational structure](bounded_contexts.md)
+1. [Deliver modularization proof-of-concepts that will deliver key insights](proof_of_concepts.md).
+1. Align modularization plans to the organizational structure by [defining bounded contexts](bounded_contexts.md).
+1. Separate domains into modules that will reflect organizational structure (TODO)
1. Start a training program for team members on how to work with decoupled domains (TODO)
1. Build tools that will make it easier to build decoupled domains through inversion of control (TODO)
-1. Separate domains into modules that will reflect organizational structure (TODO)
-1. Build necessary services to align frontend and backend modularization (TODO)
1. [Introduce hexagonal architecture within the monolith](hexagonal_monolith/index.md)
1. Introduce clean architecture with one-way-dependencies and host application (TODO)
1. Build abstractions that will make it possible to run and deploy domains separately (TODO)
@@ -107,6 +106,20 @@ add more important details as we move forward towards the goal:
In progress.
+## Glossary
+
+- `modules` are Ruby modules and can be used to nest code hierarchically.
+- `namespaces` are unique hierarchies of Ruby constants. For example, `Ci::` but also `Ci::JobArtifacts::` or `Ci::Pipeline::Chain::`.
+- `packages` are Packwerk packages that group related functionality. These packages can be big or small depending on the design and architecture. Inside a package all constants (classes and modules) have the same namespace. For example:
+ - In a package `ci`, all the classes would be nested under the `Ci::` namespace. There can also be nested namespaces like `Ci::PipelineProcessing::`.
+ - In a package `ci-pipeline_creation` all classes are nested under `Ci::PipelineCreation`, like `Ci::PipelineCreation::Chain::Command`.
+ - In a package `ci` a class named `MergeRequests::UpdateHeadPipelineService` would not be allowed because it would not match the package's namespace.
+ - This can be enforced with [Packwerk-based RuboCop cops](https://github.com/rubyatscale/rubocop-packs/blob/main/lib/rubocop/cop/packs/root_namespace_is_pack_name.rb).
+- `bounded context` is a top-level Packwerk package that represents a macro aspect of the domain. For example: `Ci::`, `MergeRequests::`, `Packages::`, etc.
+ - A bounded context is represented by a single Ruby module/namespace. For example, `Ci::` and not `Ci::JobArtifacts::`.
+ - A bounded context can be made of one or more Packwerk packages. Nested packages would be recommended if the domain is quite complex and we want to enforce privacy among all the implementation details. For example: `Ci::PipelineProcessing::` and `Ci::PipelineCreation::` could be separate packages of the same bounded context and expose their public API while keeping implementation details private.
+ - A new bounded context like `RemoteDevelopment::` can be represented by a single package, while large and complex bounded contexts like `Ci::` would need to be organized into smaller, nested packages.
+
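The namespace convention for packages in this glossary can be checked mechanically. The Ruby sketch below is a hypothetical helper, not the implementation of the `rubocop-packs` cop linked in the glossary: `namespace_for` and `in_package_namespace?` are illustrative names. It derives the expected namespace from a package name, splitting nested packages on `-` and camelizing each segment, and tests whether a constant is nested under it.

```ruby
# Hypothetical helper: derives the namespace a package's constants must use.
# Nested packages are separated by "-" and each segment is camelized,
# so "ci-pipeline_creation" becomes "Ci::PipelineCreation".
def namespace_for(package_name)
  package_name.split("-").map { |segment| segment.split("_").map(&:capitalize).join }.join("::")
end

# True when the constant is the package namespace itself or nested under it.
def in_package_namespace?(package_name, constant_name)
  namespace = namespace_for(package_name)
  constant_name == namespace || constant_name.start_with?("#{namespace}::")
end

in_package_namespace?("ci-pipeline_creation", "Ci::PipelineCreation::Chain::Command") # => true
in_package_namespace?("ci", "MergeRequests::UpdateHeadPipelineService")              # => false
```

A real check would also need to handle acronyms and custom inflections, which this sketch ignores.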
## References
[List of references](references.md)