Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/architecture/blueprints/cells')
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-admin-area.md59
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md30
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-backups.md62
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-ci-runners.md170
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-container-registry.md132
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-contributions-forks.md121
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-dashboard.md30
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-data-migration.md131
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-database-sequences.md95
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-git-access.md164
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md30
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-global-search.md48
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-graphql.md95
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-organizations.md59
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md30
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md47
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-schema-changes.md56
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-secrets.md49
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-snippets.md30
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-template.md30
-rw-r--r--doc/architecture/blueprints/cells/cells-feature-uploads.md30
-rw-r--r--doc/architecture/blueprints/cells/glossary.md106
-rw-r--r--doc/architecture/blueprints/cells/goals.md59
-rw-r--r--doc/architecture/blueprints/cells/images/pods-and-fulfillment.pngbin0 -> 20899 bytes
-rw-r--r--doc/architecture/blueprints/cells/images/term-cell.pngbin0 -> 26613 bytes
-rw-r--r--doc/architecture/blueprints/cells/images/term-cluster.pngbin0 -> 91814 bytes
-rw-r--r--doc/architecture/blueprints/cells/images/term-organization.pngbin0 -> 29527 bytes
-rw-r--r--doc/architecture/blueprints/cells/images/term-top-level-group.pngbin0 -> 15122 bytes
-rw-r--r--doc/architecture/blueprints/cells/impact.md58
-rw-r--r--doc/architecture/blueprints/cells/index.md360
-rw-r--r--doc/architecture/blueprints/cells/proposal-stateless-router-with-buffering-requests.md649
-rw-r--r--doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md673
32 files changed, 3403 insertions, 0 deletions
diff --git a/doc/architecture/blueprints/cells/cells-feature-admin-area.md b/doc/architecture/blueprints/cells/cells-feature-admin-area.md
new file mode 100644
index 00000000000..31d5388d40b
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-admin-area.md
@@ -0,0 +1,59 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Admin Area'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Admin Area
+
+In our Cells architecture proposal we plan to share all admin related tables in
+GitLab. This allows simpler management of all Cells in one interface and reduces
+the risk of settings diverging in different Cells. This introduces challenges
+with admin pages that allow you to manage data that will be spread across all
+Cells.
+
+## 1. Definition
+
+There are consequences for admin pages that contain data that spans "the whole
+instance" as the Admin pages may be served by any Cell or possibly just 1 cell.
+There are already many parts of the Admin interface that will have data that
+spans many cells. For example lists of all Groups, Projects, Topics, Jobs,
+Analytics, Applications and more. There are also administrative monitoring
+capabilities in the Admin page that will span many cells such as the "Background
+Jobs" and "Background Migrations" pages.
+
+## 2. Data flow
+
+## 3. Proposal
+
+We will need to decide how to handle these exceptions with a few possible
+options:
+
+1. Move all these pages out into a dedicated per-cell Admin section. Probably
+ the URL will need to be routable to a single Cell like `/cells/<cell_id>/admin`,
+ then we can display this data per Cell. These pages will be distinct from
+ other Admin pages which control settings that are shared across all Cells. We
+ will also need to consider how this impacts self-managed customers and
+ whether, or not, this should be visible for single-cell instances of GitLab.
+1. Build some aggregation interfaces for this data so that it can be fetched
+ from all Cells and presented in a single UI. This may be beneficial to an
+ administrator that needs to see and filter all data at a glance, especially
+ when they don't know which Cell the data is on. The downside, however, is
+ that building this kind of aggregation is very tricky when all the Cells are
+ designed to be totally independent, and it does also enforce more strict
+ requirements on compatibility between Cells.
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md b/doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md
new file mode 100644
index 00000000000..37347cf836d
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-agent-for-kubernetes.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Agent for Kubernetes'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Agent for Kubernetes
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-backups.md b/doc/architecture/blueprints/cells/cells-feature-backups.md
new file mode 100644
index 00000000000..d596bdd2078
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-backups.md
@@ -0,0 +1,62 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Backups'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Backups
+
+Each cells will take its own backups, and consequently have its own isolated
+backup / restore procedure.
+
+## 1. Definition
+
+GitLab Backup takes a backup of the PostgreSQL database used by the application,
+and also Git repository data.
+
+## 2. Data flow
+
+Each cell has a number of application databases to back up (e.g. `main`, and `ci`).
+
+Additionally, there may be cluster-wide metadata tables (e.g. `users` table)
+which is directly accesible via PostgreSQL.
+
+## 3. Proposal
+
+### 3.1. Cluster-wide metadata
+
+It is currently unknown how cluster-wide metadata tables will be accessible. We
+may choose to have cluster-wide metadata tables backed up separately, or have
+each cell back up its copy of cluster-wide metdata tables.
+
+### 3.2 Consistency
+
+#### 3.2.1 Take backups independently
+
+As each cell will communicate with each other via API, and there will be no joins
+to the users table, it should be acceptable for each cell to take a backup
+independently of each other.
+
+#### 3.2.2 Enforce snapshots
+
+We can require that each cell take a snapshot for the PostgreSQL databases at
+around the same time to allow for a consistent-enough backup.
+
+## 4. Evaluation
+
+As the number of cells increases, it will likely not be feasible to take a
+snapshot at the same time for all cells. Hence taking backups independently is
+the better option.
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-ci-runners.md b/doc/architecture/blueprints/cells/cells-feature-ci-runners.md
new file mode 100644
index 00000000000..e352be17dd3
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-ci-runners.md
@@ -0,0 +1,170 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: CI Runners'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: CI Runners
+
+GitLab in order to execute CI jobs [GitLab Runner](https://gitlab.com/gitlab-org/gitlab-runner/),
+very often managed by customer in their infrastructure.
+
+All CI jobs created as part of CI pipeline are run in a context of project
+it poses a challenge how to manage GitLab Runners.
+
+## 1. Definition
+
+There are 3 different types of runners:
+
+- instance-wide: runners that are registered globally with specific tags (selection criteria)
+- group runners: runners that execute jobs from a given top-level group or subprojects of that group
+- project runners: runners that execute jobs from projects or many projects: some runners might
+ have projects assigned from projects in different top-level groups.
+
+This alongside with existing data structure where `ci_runners` is a table describing
+all types of runners poses a challenge how the `ci_runners` should be managed in a Cells environment.
+
+## 2. Data flow
+
+GitLab Runners use a set of globally scoped endpoints to:
+
+- registration of a new runner via registration token `https://gitlab.com/api/v4/runners`
+ ([subject for removal](../runner_tokens/index.md)) (`registration token`)
+- requests jobs via an authenticated `https://gitlab.com/api/v4/jobs/request` endpoint (`runner token`)
+- upload job status via `https://gitlab.com/api/v4/jobs/:job_id` (`build token`)
+- upload trace via `https://gitlab.com/api/v4/jobs/:job_id/trace` (`build token`)
+- download and upload artifacts via `https://gitlab.com/api/v4/jobs/:job_id/artifacts` (`build token`)
+
+Currently three types of authentication tokens are used:
+
+- runner registration token ([subject for removal](../runner_tokens/index.md))
+- runner token representing an registered runner in a system with specific configuration (`tags`, `locked`, etc.)
+- build token representing an ephemeral token giving a limited access to updating a specific
+ job, uploading artifacts, downloading dependent artifacts, downloading and uploading
+ container registry images
+
+Each of those endpoints do receive an authentication token via header (`JOB-TOKEN` for `/trace`)
+or body parameter (`token` all other endpoints).
+
+Since the CI pipeline would be created in a context of a specific Cell it would be required
+that pick of a build would have to be processed by that particular Cell. This requires
+that build picking depending on a solution would have to be either:
+
+- routed to correct Cell for a first time
+- be made to be two phase: request build from global pool, claim build on a specific Cell using a Cell specific URL
+
+## 3. Proposal
+
+This section describes various proposals. Reader should consider that those
+proposals do describe solutions for different problems. Many or some aspects
+of those proposals might be the solution to the stated problem.
+
+### 3.1. Authentication tokens
+
+Even though the paths for CI Runners are not routable they can be made routable with
+those two possible solutions:
+
+- The `https://gitlab.com/api/v4/jobs/request` uses a long polling mechanism with
+ a ticketing mechanism (based on `X-GitLab-Last-Update` header). Runner when first
+ starts sends a request to GitLab to which GitLab responds with either a build to pick
+ by runner. This value is completely controlled by GitLab. This allows GitLab
+ to use JWT or any other means to encode `cell` identifier that could be easily
+ decodable by Router.
+- The majority of communication (in terms of volume) is using `build token` making it
+ the easiest target to change since GitLab is sole owner of the token that Runner later
+ uses for specific job. There were prior discussions about not storing `build token`
+ but rather using `JWT` token with defined scopes. Such token could encode the `cell`
+ to which router could easily route all requests.
+
+### 3.2. Request body
+
+- The most of used endpoints pass authentication token in request body. It might be desired
+ to use HTTP Headers as an easier way to access this information by Router without
+ a need to proxy requests.
+
+### 3.3. Instance-wide are Cell local
+
+We can pick a design where all runners are always registered and local to a given Cell:
+
+- Each Cell has it's own set of instance-wide runners that are updated at it's own pace
+- The project runners can only be linked to projects from the same organization
+ creating strong isolation.
+- In this model the `ci_runners` table is local to the Cell.
+- In this model we would require the above endpoints to be scoped to a Cell in some way
+ or made routable. It might be via prefixing them, adding additional Cell parameter,
+ or providing much more robust way to decode runner token and match it to Cell.
+- If routable token is used, we could move away from cryptographic random stored in
+ database to rather prefer to use JWT tokens that would encode
+- The Admin Area showing registered Runners would have to be scoped to a Cell
+
+This model might be desired since it provides strong isolation guarantees.
+This model does significantly increase maintenance overhead since each Cell is managed
+separately.
+
+This model may require adjustments to runner tags feature so that projects have consistent runner experience across cells.
+
+### 3.4. Instance-wide are cluster-wide
+
+Contrary to proposal where all runners are Cell local, we can consider that runners
+are global, or just instance-wide runners are global.
+
+However, this requires significant overhaul of system and to change the following aspects:
+
+- `ci_runners` table would likely have to be split decomposed into `ci_instance_runners`, ...
+- all interfaces would have to be adopted to use correct table
+- build queuing would have to be reworked to be two phase where each Cell would know of all pending
+ and running builds, but the actual claim of a build would happen against a Cell containing data
+- likely `ci_pending_builds` and `ci_running_builds` would have to be made `cluster-wide` tables
+ increasing likelihood of creating hotspots in a system related to CI queueing
+
+This model makes it complex to implement from engineering side. Does make some data being shared
+between Cells. Creates hotspots / scalability issues in a system (ex. during abuse) that
+might impact experience of organizations on other Cells.
+
+### 3.5. GitLab CI Daemon
+
+Another potential solution to explore is to have a dedicated service responsible for builds queueing
+owning it's database and working in a model of either sharded or celled service. There were prior
+discussions about [CI/CD Daemon](https://gitlab.com/gitlab-org/gitlab/-/issues/19435).
+
+If the service would be sharded:
+
+- depending on a model if runners are cluster-wide or cell-local this service would have to fetch
+ data from all Cells
+- if the sharded service would be used we could adapt a model of either sharing database containing
+ `ci_pending_builds/ci_running_builds` with the service
+- if the sharded service would be used we could consider a push model where each Cell pushes to CI/CD Daemon
+ builds that should be picked by Runner
+- the sharded service would be aware which Cell is responsible for processing the given build and could
+ route processing requests to designated Cell
+
+If the service would be celled:
+
+- all expectations of routable endpoints are still valid
+
+In general usage of CI Daemon does not help significantly with the stated problem. However, this offers
+a few upsides related to more efficient processing and decoupling model: push model and it opens a way
+to offer stateful communication with GitLab Runners (ex. gRPC or Websockets).
+
+## 4. Evaluation
+
+Considering all solutions it appears that solution giving the most promise is:
+
+- use "instance-wide are Cell local"
+- refine endpoints to have routable identities (either via specific paths, or better tokens)
+
+Other potential upsides is to get rid of `ci_builds.token` and rather use a `JWT token`
+that can much better and easier encode wider set of scopes allowed by CI runner.
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-container-registry.md b/doc/architecture/blueprints/cells/cells-feature-container-registry.md
new file mode 100644
index 00000000000..a5761808941
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-container-registry.md
@@ -0,0 +1,132 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Container Registry'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Container Registry
+
+GitLab Container Registry is a feature allowing to store Docker Container Images
+in GitLab. You can read about GitLab integration [here](../../../user/packages/container_registry/index.md).
+
+## 1. Definition
+
+GitLab Container Registry is a complex service requiring usage of PostgreSQL, Redis
+and Object Storage dependencies. Right now there's undergoing work to introduce
+[Container Registry Metadata](../container_registry_metadata_database/index.md)
+to optimize data storage and image retention policies of Container Registry.
+
+GitLab Container Registry is serving as a container for stored data,
+but on it's own does not authenticate `docker login`. The `docker login`
+is executed with user credentials (can be `personal access token`)
+or CI build credentials (ephemeral `ci_builds.token`).
+
+Container Registry uses data deduplication. It means that the same blob
+(image layer) that is shared between many projects is stored only once.
+Each layer is hashed by `sha256`.
+
+The `docker login` does request JWT time-limited authentication token that
+is signed by GitLab, but validated by Container Registry service. The JWT
+token does store all authorized scopes (`container repository images`)
+and operation types (`push` or `pull`). A single JWT authentication token
+can be have many authorized scopes. This allows container registry and client
+to mount existing blobs from another scopes. GitLab responds only with
+authorized scopes. Then it is up to GitLab Container Registry to validate
+if the given operation can be performed.
+
+The GitLab.com pages are always scoped to project. Each project can have many
+container registry images attached.
+
+Currently in case of GitLab.com the actual registry service is served
+via `https://registry.gitlab.com`.
+
+The main identifiable problems are:
+
+- the authentication request (`https://gitlab.com/jwt/auth`) that is processed by GitLab.com
+- the `https://registry.gitlab.com` that is run by external service and uses it's own data store
+- the data deduplication, the Cells architecture with registry run in a Cell would reduce
+ efficiency of data storage
+
+## 2. Data flow
+
+### 2.1. Authorization request that is send by `docker login`
+
+```shell
+curl \
+ --user "username:password" \
+ "https://gitlab/jwt/auth?client_id=docker&offline_token=true&service=container_registry&scope=repository:gitlab-org/gitlab-build-images:push,pull"
+```
+
+Result is encoded and signed JWT token. Second base64 encoded string (split by `.`) contains JSON with authorized scopes.
+
+```json
+{"auth_type":"none","access":[{"type":"repository","name":"gitlab-org/gitlab-build-images","actions":["pull"]}],"jti":"61ca2459-091c-4496-a3cf-01bac51d4dc8","aud":"container_registry","iss":"omnibus-gitlab-issuer","iat":1669309469,"nbf":166}
+```
+
+### 2.2. Docker client fetching tags
+
+```shell
+curl \
+ -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
+ -H "Authorization: Bearer token" \
+ https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/tags/list
+
+curl \
+ -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
+ -H "Authorization: Bearer token" \
+ https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/manifests/danger-ruby-2.6.6
+```
+
+### 2.3. Docker client fetching blobs and manifests
+
+```shell
+curl \
+ -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
+ -H "Authorization: Bearer token" \
+ https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/blobs/sha256:a3f2e1afa377d20897e08a85cae089393daa0ec019feab3851d592248674b416
+```
+
+## 3. Proposal
+
+### 3.1. Shard Container Registry separately to Cells architecture
+
+Due to it's architecture it extensive architecture and in general highly scalable
+horizontal architecture it should be evaluated if the GitLab Container Registry
+should be run not in Cell, but in a Cluster and be scaled independently.
+
+This might be easier, but would definitely not offer the same amount of data isolation.
+
+### 3.2. Run Container Registry within a Cell
+
+It appears that except `/jwt/auth` which would likely have to be processed by Router
+(to decode `scope`) the container registry could be run as a local service of a Cell.
+
+The actual data at least in case of GitLab.com is not forwarded via registry,
+but rather served directly from Object Storage / CDN.
+
+Its design encodes container repository image in a URL that is easily routable.
+It appears that we could re-use the same stateless Router service in front of Container Registry
+to serve manifests and blobs redirect.
+
+The only downside is increased complexity of managing standalone registry for each Cell,
+but this might be desired approach.
+
+## 4. Evaluation
+
+There do not seem any theoretical problems with running GitLab Container Registry in a Cell.
+Service seems that can be easily made routable to work well.
+
+The practical complexities are around managing complex service from infrastructure side.
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-contributions-forks.md b/doc/architecture/blueprints/cells/cells-feature-contributions-forks.md
new file mode 100644
index 00000000000..3e498c24144
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-contributions-forks.md
@@ -0,0 +1,121 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Contributions: Forks'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Contributions: Forks
+
+[Forking workflow](../../../user/project/repository/forking_workflow.md) allows users
+to copy existing project sources into their own namespace of choice (personal or group).
+
+## 1. Definition
+
+[Forking workflow](../../../user/project/repository/forking_workflow.md) is common workflow
+with various usage patterns:
+
+- allows users to contribute back to upstream project
+- persist repositories into their personal namespace
+- copy to make changes and release as modified project
+
+Forks allow users not having write access to parent project to make changes. The forking workflow
+is especially important for the Open Source community which is able to contribute back
+to public projects. However, it is equally important in some companies which prefer the strong split
+of responsibilites and tighter access control. The access to project is restricted
+to designated list of developers.
+
+Forks enable:
+
+- tigther control of who can modify the upstream project
+- split of the responsibilites: parent project might use CI configuration connecting to production systems
+- run CI pipelines in context of fork in much more restrictive environment
+- consider all forks to be unveted which reduces risks of leaking secrets, or any other information
+ tied with the project
+
+The forking model is problematic in Cells architecture for following reasons:
+
+- Forks are clones of existing repositories, forks could be created across different organizations, Cells and Gitaly shards.
+- User can create merge request and contribute back to upstream project, this upstream project might in a different organization and Cell.
+- The merge request CI pipeline is to executed in a context of source project, but presented in a context of target project.
+
+## 2. Data flow
+
+## 3. Proposals
+
+### 3.1. Intra-Cluster forks
+
+This proposal makes us to implement forks as a intra-ClusterCell forks where communication is done via API
+between all trusted Cells of a cluster:
+
+- Forks when created, they are created always in context of user choice of group.
+- Forks are isolated from Organization.
+- Organization or group owner could disable forking across organizations or forking in general.
+- When a Merge Request is created it is created in context of target project, referencing
+ external project on another Cell.
+- To target project the merge reference is transfered that is used for presenting information
+ in context of target project.
+- CI pipeline is fetched in context of source project as it-is today, the result is fetched into
+ Merge Request of target project.
+- The Cell holding target project internally uses GraphQL to fetch status of source project
+ and include in context of the information for merge request.
+
+Upsides:
+
+- All existing forks continue to work as-is, as they are treated as intra-Cluster forks.
+
+Downsides:
+
+- The purpose of Organizations is to provide strong isolation between organizations
+ allowing to fork across does break security boundaries.
+- However, this is no different to ability of users today to clone repository to local computer
+ and push it to any repository of choice.
+- Access control of source project can be lower than those of target project. System today
+ requires that in order to contribute back the access level needs to be the same for fork and upstream.
+
+### 3.2. Forks are created in a personal namespace of the current organization
+
+Instead of creating projects across organizations, the forks are created in a user personal namespace
+tied with the organization. Example:
+
+- Each user that is part of organization receives their personal namespace. For example for `GitLab Inc.`
+ it could be `gitlab.com/organization/gitlab-inc/@ayufan`.
+- The user has to fork into it's own personal namespace of the organization.
+- The user has that many personal namespaces as many organizations it belongs to.
+- The personal namespace behaves similar to currently offered personal namespace.
+- The user can manage and create projects within a personal namespace.
+- The organization can prevent or disable usage of personal namespaces disallowing forks.
+- All current forks are migrated into personal namespace of user in Organization.
+- All forks are part of to the organization.
+- The forks are not federated features.
+- The personal namespace and forked project do not share configuration with parent project.
+
+### 3.3. Forks are created as internal projects under current project
+
+Instead of creating projects across organizations, the forks are attachments to existing projects.
+Each user forking a project receives their unique project. Example:
+
+- For project: `gitlab.com/gitlab-org/gitlab`, forks would be created in `gitlab.com/gitlab-org/gitlab/@kamil-gitlab`.
+- Forks are created in a context of current organization, they do not cross organization boundaries
+ and are managed by the organization.
+- Tied to the user (or any other user-provided name of the fork).
+- The forks are not federated features.
+
+Downsides:
+
+- Does not answer how to handle and migrate all exisiting forks.
+- Might share current group / project settings - breaking some security boundaries.
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-dashboard.md b/doc/architecture/blueprints/cells/cells-feature-dashboard.md
new file mode 100644
index 00000000000..135f69c6ed3
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-dashboard.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Dashboard'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Dashboard
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-data-migration.md b/doc/architecture/blueprints/cells/cells-feature-data-migration.md
new file mode 100644
index 00000000000..ef0865b4081
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-data-migration.md
@@ -0,0 +1,131 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Data migration'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Data migration
+
+It is essential for Cells architecture to provide a way to migrate data out of big Cells
+into smaller ones. This describes various approaches to provide this type of split.
+
+We also need to handle for cases where data is already violating the expected
+isolation constraints of Cells (ie. references cannot span multiple
+organizations). We know that existing features like linked issues allowed users
+to link issues across any projects regardless of their hierarchy. There are many
+similar features. All of this data will need to be migrated in some way before
+it can be split across different cells. This may mean some data needs to be
+deleted, or the feature changed and modelled slightly differently before we can
+properly split or migrate the organizations between cells.
+
+Having schema deviations across different Cells, which is a necessary
+consequence of different databases, will also impact our ability to migrate
+data between cells. Different schemas impact our ability to reliably replicate
+data across cells and especially impact our ability to validate that the data is
+correctly replicated. It might force us to only be able to move data between
+cells when the schemas are all in sync (slowing down deployments and the
+rebalancing process) or possibly only migrate from newer to older schemas which
+would be complex.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+### 3.1. Split large Cells
+
+A single Cell can only be divided into many Cells. This is based on principle
+that it is easier to create exact clone of an existing Cell in many replicas
+out of which some will be made authoritative once migrated. Keeping those
+replicas up-to date with Cell 0 is also much easier due to pre-existing
+replication solutions that can replicate the whole systems: Geo, PostgreSQL
+physical replication, etc.
+
+1. All data of an organization needs to not be divided across many Cells.
+1. Split should be doable online.
+1. New Cells cannot contain pre-existing data.
+1. N Cells contain exact replica of Cell 0.
+1. The data of Cell 0 is live replicated to as many Cells it needs to be split.
+1. Once consensus is achieved between Cell 0 and N-Cells the organizations to be migrated away
+ are marked as read-only cluster-wide.
+1. The `routes` is updated on for all organizations to be split to indicate an authoritative
+ Cell holding the most recent data, like `gitlab-org` on `cell-100`.
+1. The data for `gitlab-org` on Cell 0, and on other non-authoritative N-Cells are dormant
+ and will be removed in the future.
+1. All accesses to `gitlab-org` on a given Cell are validated about `cell_id` of `routes`
+ to ensure that given Cell is authoritative to handle the data.
+
+#### More challenges of this proposal
+
+1. There is no streaming replication capability for Elasticsearch, but you could
+ snapshot the whole Elasticsearch index and recreate, but this takes hours.
+ It could be handled by pausing Elasticsearch indexing on the initial cell during
+ the migration as indexing downtime is not a big issue, but this still needs
+ to be coordinated with the migration process
+1. Syncing Redis, Gitaly, CI Postgres, Main Postgres, registry Postgres, other
+ new data stores snapshots in an online system would likely lead to gaps
+ without a long downtime. You need to choose a sync point and at the sync
+ point you need to stop writes to perform the migration. The more data stores
+ there are to migrate at the same time the longer the write downtime for the
+ failover. We would also need to find a reliable place in the application to
+ actually block updates to all these systems with a high degree of
+ confidence. In the past we've only been confident by shutting down all rails
+ services because any rails process could write directly to any of these at
+ any time due to async workloads or other surprising code paths.
+1. How to efficiently delete all the orphaned data. Locating all `ci_builds`
+ associated with half the organizations would be very expensive if we have to
+ do joins. We haven't yet determined if we'd want to store an `organization_id`
+ column on every table, but this is the kind of thing it would be helpful for.
+
+### 3.2. Migrate organization from an existing Cell
+
+This is different to split, as we intend to perform logical and selective replication
+of data belonging to a single organization.
+
+Today this type of selective replication is only implemented by Gitaly where we can migrate
+Git repository from a single Gitaly node to another with minimal downtime.
+
+In this model we would require identifying all resources belonging to a given organization:
+database rows, object storage files, Git repositories, etc. and selectively copy them over
+to another (likely) existing Cell importing data into it. Ideally ensuring that we can
+perform logical replication live of all changed data, but change similarly to split
+which Cell is authoritative for this organization.
+
+1. It is hard to identify all resources belonging to organization.
+1. It requires either downtime for organization or a robust system to identify
+ live changes made.
+1. It likely will require a full database structure analysis (more robust than project import/export)
+ to perform selective PostgreSQL logical replication.
+
+#### More challenges of this proposal
+
+1. Logical replication is still not performant enough to keep up with our
+ scale. Even if we could use logical replication we still don't have an
+ efficient way to filter data related to a single organization without
+ joining all the way to the `organizations` table which will slow down
+ logical replication dramatically.
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-database-sequences.md b/doc/architecture/blueprints/cells/cells-feature-database-sequences.md
new file mode 100644
index 00000000000..d94dc3be864
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-database-sequences.md
@@ -0,0 +1,95 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Database Sequences'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Database Sequences
+
+GitLab today ensures that every database row create has unique ID, allowing
+to access Merge Request, CI Job or Project by a known global ID.
+
+Cells will use many distinct and not connected databases, each of them having
+a separate IDs for most of entities.
+
+It might be desirable to retain globally unique IDs for all database rows
+to allow migrating resources between Cells in the future.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+This are some preliminary ideas how we can retain unique IDs across the system.
+
+### 3.1. UUID
+
+Instead of using incremental sequences use UUID (128 bit) that is stored in database.
+
+- This might break existing IDs and requires adding UUID column for all existing tables.
+- This makes all indexes larger as it requires storing 128 bit instead of 32/64 bit in index.
+
+### 3.2. Use Cell index encoded in ID
+
+Since significant number of tables already use 64 bit ID numbers we could use MSB to encode
+Cell ID effectively enabling
+
+- This might limit amount of Cells that can be enabled in system, as we might decide to only
+ allocate 1024 possible Cell numbers.
+- This might make IDs to be migratable between Cells, since even if entity from Cell 1 is migrated to Cell 100
+ this ID would still be unique.
+- If resources are migrated the ID itself will not be enough to decode Cell number and we would need
+ lookup table.
+- This requires updating all IDs to 32 bits.
+
+### 3.3. Allocate sequence ranges from central place
+
+Each Cell might receive its own range of the sequences as they are consumed from a centrally managed place.
+Once Cell consumes all IDs assigned for a given table it would be replenished and a next range would be allocated.
+Ranges would be tracked to provide a faster lookup table if a random access pattern is required.
+
+- This might make IDs to be migratable between Cells, since even if entity from Cell 1 is migrated to Cell 100
+ this ID would still be unique.
+- If resources are migrated the ID itself will not be enough to decode Cell number and we would need
+ much more robust lookup table as we could be breaking previously assigned sequence ranges.
+- This does not require updating all IDs to 64 bits.
+- This adds some performance penalty to all `INSERT` statements in Postgres or at least from Rails as we need to check for the sequence number and potentially wait for our range to be refreshed from the ID server
+- The available range will need to be stored and incremented in a centralized place so that concurrent transactions cannot possibly get the same value.
+
+### 3.4. Define only some tables to require unique IDs
+
+Maybe this is acceptable only for some tables to have a globally unique IDs. It could be projects, groups
+and other top-level entities. All other tables like `merge_requests` would only offer Cell-local ID,
+but when referenced outside it would rather use IID (an ID that is monotonic in context of a given resource, like project).
+
+- This makes the ID 10000 for `merge_requests` be present on all Cells, which might be sometimes confusing
+ as for uniqueness of the resource.
+- This might make random access by ID (if ever needed) be impossible without using composite key, like: `project_id+merge_request_id`.
+- This would require us to implement a transformation/generation of new ID if we need to migrate records to another cell. This can lead to very difficult migration processes when these IDs are also used as foreign keys for other records being migrated.
+- If IDs need to change when moving between cells this means that any links to records by ID would no longer work even if those links included the `project_id`.
+- If we plan to allow these ids to not be unique and change the unique constraint to be based on a composite key then we'd need to update all foreign key references to be based on the composite key
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-git-access.md b/doc/architecture/blueprints/cells/cells-feature-git-access.md
new file mode 100644
index 00000000000..70b3f136904
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-git-access.md
@@ -0,0 +1,164 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Git Access'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Git Access
+
+This document describes impact of Cells architecture on all Git access (over HTTPS and SSH)
+patterns providing explanation of how potentially those features should be changed
+to work well with Cells.
+
+## 1. Definition
+
+Git access is done through out the application. It can be an operation performed by the system
+(read Git repository) or by user (create a new file via Web IDE, `git clone` or `git push` via command line).
+
+The Cells architecture defines that all Git repositories will be local to the Cell,
+so no repository could be shared with another Cell.
+
+The Cells architecture will require that any Git operation done can only be handled by a Cell holding
+the data. It means that any operation either via Web interface, API, or GraphQL needs to be routed
+to the correct Cell. It means that any `git clone` or `git push` operation can only be performed
+in a context of a Cell.
+
+## 2. Data flow
+
+The are various operations performed today by the GitLab on a Git repository. This describes
+the data flow how they behave today to better represent the impact.
+
+It appears that Git access does require changes only to a few endpoints that are scoped to project.
+There appear to be different types of repositories:
+
+- Project: assigned to Group
+- Wiki: additional repository assigned to Project
+- Design: similar to Wiki, additional repository assigned to Project
+- Snippet: creates a virtual project to hold repository, likely tied to the User
+
+### 2.1. Git clone over HTTPS
+
+Execution of: `git clone` over HTTPS
+
+```mermaid
+sequenceDiagram
+ User ->> Workhorse: GET /gitlab-org/gitlab.git/info/refs?service=git-upload-pack
+ Workhorse ->> Rails: GET /gitlab-org/gitlab.git/info/refs?service=git-upload-pack
+ Rails ->> Workhorse: 200 OK
+ Workhorse ->> Gitaly: RPC InfoRefsUploadPack
+ Gitaly ->> User: Response
+ User ->> Workhorse: POST /gitlab-org/gitlab.git/git-upload-pack
+ Workhorse ->> Gitaly: RPC PostUploadPackWithSidechannel
+ Gitaly ->> User: Response
+```
+
+### 2.2. Git clone over SSH
+
+Execution of: `git clone` over SSH
+
+```mermaid
+sequenceDiagram
+ User ->> Git SSHD: ssh git@gitlab.com
+ Git SSHD ->> Rails: GET /api/v4/internal/authorized_keys
+ Rails ->> Git SSHD: 200 OK (list of accepted SSH keys)
+ Git SSHD ->> User: Accept SSH
+ User ->> Git SSHD: git clone over SSH
+ Git SSHD ->> Rails: POST /api/v4/internal/allowed?project=/gitlab-org/gitlab.git&service=git-upload-pack
+ Rails ->> Git SSHD: 200 OK
+ Git SSHD ->> Gitaly: RPC SSHUploadPackWithSidechannel
+ Gitaly ->> User: Response
+```
+
+### 2.3. Git push over HTTPS
+
+Execution of: `git push` over HTTPS
+
+```mermaid
+sequenceDiagram
+ User ->> Workhorse: GET /gitlab-org/gitlab.git/info/refs?service=git-receive-pack
+ Workhorse ->> Rails: GET /gitlab-org/gitlab.git/info/refs?service=git-receive-pack
+ Rails ->> Workhorse: 200 OK
+ Workhorse ->> Gitaly: RPC PostReceivePack
+ Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111&service=git-receive-pack
+ Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
+ Gitaly ->> User: Response
+```
+
+### 2.4. Git push over SSHD
+
+Execution of: `git clone` over SSH
+
+```mermaid
+sequenceDiagram
+ User ->> Git SSHD: ssh git@gitlab.com
+ Git SSHD ->> Rails: GET /api/v4/internal/authorized_keys
+ Rails ->> Git SSHD: 200 OK (list of accepted SSH keys)
+ Git SSHD ->> User: Accept SSH
+ User ->> Git SSHD: git clone over SSH
+ Git SSHD ->> Rails: POST /api/v4/internal/allowed?project=/gitlab-org/gitlab.git&service=git-receive-pack
+ Rails ->> Git SSHD: 200 OK
+ Git SSHD ->> Gitaly: RPC ReceivePack
+ Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
+ Gitaly ->> User: Response
+```
+
+### 2.5. Create commit via Web
+
+Execution of `Add CHANGELOG` to repository:
+
+```mermaid
+sequenceDiagram
+ Web ->> Puma: POST /gitlab-org/gitlab/-/create/main
+ Puma ->> Gitaly: RPC TreeEntry
+ Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
+ Gitaly ->> Puma: Response
+ Puma ->> Web: See CHANGELOG
+```
+
+## 3. Proposal
+
+The Cells stateless router proposal requires that any ambiguous path (that is not routable)
+will be made to be routable. It means that at least the following paths will have to be updated
+do introduce a routable entity (project, group, or organization).
+
+Change:
+
+- `/api/v4/internal/allowed` => `/api/v4/internal/projects/<gl_repository>/allowed`
+- `/api/v4/internal/pre_receive` => `/api/v4/internal/projects/<gl_repository>/pre_receive`
+- `/api/v4/internal/post_receive` => `/api/v4/internal/projects/<gl_repository>/post_receive`
+- `/api/v4/internal/lfs_authenticate` => `/api/v4/internal/projects/<gl_repository>/lfs_authenticate`
+
+Where:
+
+- `gl_repository` can be `project-1111` (`Gitlab::GlRepository`)
+- `gl_repository` in some cases might be a full path to repository as executed by GitLab Shell (`/gitlab-org/gitlab.git`)
+
+## 4. Evaluation
+
+Supporting Git repositories if a Cell can access only its own repositories does not appear to be complex.
+
+The one major complication is supporting snippets, but this likely falls in the same category as for the approach
+to support user's personal namespaces.
+
+## 4.1. Pros
+
+1. The API used for supporting HTTPS/SSH and Hooks are well defined and can easily be made routable.
+
+## 4.2. Cons
+
+1. The sharing of repositories objects is limited to the given Cell and Gitaly node.
+1. The across-Cells forks are likely impossible to be supported (discover: how this work today across different Gitaly node).
diff --git a/doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md b/doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md
new file mode 100644
index 00000000000..7e4ab785095
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-gitlab-pages.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: GitLab Pages'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: GitLab Pages
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-global-search.md b/doc/architecture/blueprints/cells/cells-feature-global-search.md
new file mode 100644
index 00000000000..c1e2b93bc2d
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-global-search.md
@@ -0,0 +1,48 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Global search'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Global search
+
+When we introduce multiple Cells we intend to isolate all services related to
+those Cells. This will include Elasticsearch which means our current global
+search functionality will not work. It may be possible to implement aggregated
+search across all cells, but it is unlikely to be performant to do fan-out
+searches across all cells especially once you start to do pagination which
+requires setting the correct offset and page number for each search.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+Likely first versions of Cells will simply not support global searches and then
+we may later consider if building global searches to support popular use cases
+is worthwhile.
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-graphql.md b/doc/architecture/blueprints/cells/cells-feature-graphql.md
new file mode 100644
index 00000000000..d936a1b81ba
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-graphql.md
@@ -0,0 +1,95 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: GraphQL'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: GraphQL
+
+GitLab extensively uses GraphQL to perform efficient data query operations.
+GraphQL due to it's nature is not directly routable. The way how GitLab uses
+it calls the `/api/graphql` endpoint, and only query or mutation of body request
+might define where the data can be accessed.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+There are at least two main ways to implement GraphQL in Cells architecture.
+
+### 3.1. GraphQL routable by endpoint
+
+Change `/api/graphql` to `/api/organization/<organization>/graphql`.
+
+- This breaks all existing usages of `/api/graphql` endpoint
+ since the API URI is changed.
+
+### 3.2. GraphQL routable by body
+
+As part of router parse GraphQL body to find a routable entity, like `project`.
+
+- This still makes the GraphQL query be executed only in context of a given Cell
+ and not allowing the data to be merged.
+
+```json
+# Good example
+{
+ project(fullPath:"gitlab-org/gitlab") {
+ id
+ description
+ }
+}
+
+# Bad example, since Merge Request is not routable
+{
+ mergeRequest(id: 1111) {
+ iid
+ description
+ }
+}
+```
+
+### 3.3. Merging GraphQL Proxy
+
+Implement as part of router GraphQL Proxy which can parse body
+and merge results from many Cells.
+
+- This might make pagination hard to achieve, or we might assume that
+ we execute many queries of which results are merged across all Cells.
+
+```json
+{
+ project(fullPath:"gitlab-org/gitlab"){
+ id, description
+ }
+ group(fullPath:"gitlab-com") {
+ id, description
+ }
+}
+```
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-organizations.md b/doc/architecture/blueprints/cells/cells-feature-organizations.md
new file mode 100644
index 00000000000..03178d9e6ce
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-organizations.md
@@ -0,0 +1,59 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Organizations'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Organizations
+
+One of the major designs of Cells architecture is strong isolation between Groups.
+Organizations as described by this blueprint provides a way to have plausible UX
+for joining together many Groups that are isolated from the rest of systems.
+
+## 1. Definition
+
+Cells do require that all groups and projects of a single organization can
+only be stored on a single Cell since a Cell can only access data that it holds locally
+and has very limited capabilities to read information from other Cells.
+
+Cells with Organizations do require strong isolation between organizations.
+
+It will have significant implications on various user-facing features,
+like Todos, dropdowns allowing to select projects, references to other issues
+or projects, or any other social functions present at GitLab. Today those functions
+were able to reference anything in the whole system. With the introduction of
+organizations such will be forbidden.
+
+This problem definition aims to answer effort and implications required to add
+strong isolation between organizations to the system. Including features affected
+and their data processing flow. The purpose is to ensure that our solution when
+implemented consistently avoids data leakage between organizations residing on
+a single Cell.
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md b/doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md
new file mode 100644
index 00000000000..e8f5c250a8e
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-personal-namespaces.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Personal Namespaces'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Personal Namespaces
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md b/doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md
new file mode 100644
index 00000000000..7c2974ca258
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-router-endpoints-classification.md
@@ -0,0 +1,47 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Router Endpoints Classification'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Router Endpoints Classification
+
+Classification of all endpoints is essential to properly route request
+hitting load balancer of a GitLab installation to a Cell that can serve it.
+
+Each Cell should be able to decode each request and classify for which Cell
+it belongs to.
+
+GitLab currently implements hundreds of endpoints. This document tries
+to describe various techniques that can be implemented to allow the Rails
+to provide this information efficiently.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-schema-changes.md b/doc/architecture/blueprints/cells/cells-feature-schema-changes.md
new file mode 100644
index 00000000000..d712b24a8a0
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-schema-changes.md
@@ -0,0 +1,56 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Schema changes'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Schema changes
+
+When we introduce multiple Cells that own their own databases this will
+complicate the process of making schema changes to Postgres and Elasticsearch.
+Today we already need to be careful to make changes comply with our zero
+downtime deployments. For example,
+[when removing a column we need to make changes over 3 separate deployments](../../../development/database/avoiding_downtime_in_migrations.md#dropping-columns).
+We have tooling like `post_migrate` that helps with these kinds of changes to
+reduce the number of merge requests needed, but these will be complicated when
+we are dealing with deploying multiple rails applications that will be at
+different versions at any one time. This problem will be particularly tricky to
+solve for shared databases like our plan to share the `users` related tables
+among all Cells.
+
+A key benefit of Cells may be that it allows us to run different
+customers on different versions of GitLab. We may choose to update our own cell
+before all our customers giving us even more flexibility than our current
+canary architecture. But doing this means that schema changes need to have even
+more versions of backward compatibility support which could slow down
+development as we need extra steps to make schema changes.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-secrets.md b/doc/architecture/blueprints/cells/cells-feature-secrets.md
new file mode 100644
index 00000000000..20260c89ccd
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-secrets.md
@@ -0,0 +1,49 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Secrets'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Secrets
+
+Where possible, each cell should have its own distinct set of secrets.
+However, there will be some secrets that will be required to be the same for all
+cells in the cluster
+
+## 1. Definition
+
+GitLab has a lot of
+[secrets](https://docs.gitlab.com/charts/installation/secrets.html) that needs
+to be configured.
+
+Some secrets are for inter-component communication, e.g. `GitLab Shell secret`,
+and used only within a cell.
+
+Some secrets are used for features, e.g. `ci_jwt_signing_key`.
+
+## 2. Data flow
+
+## 3. Proposal
+
+1. Secrets used for features will need to be consistent across all cells, so that the UX is consistent.
+ 1. This is especially true for the `db_key_base` secret which is used for
+ encrypting data at rest in the database - so that projects that are
+ transferred to another cell will continue to work. We do not want to have
+ to re-encrypt such rows when we move projects/groups between cells.
+1. Secrets which are used for intra-cell communication only should be uniquely generated
+ per-cell.
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-snippets.md b/doc/architecture/blueprints/cells/cells-feature-snippets.md
new file mode 100644
index 00000000000..f5e72c0e3a0
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-snippets.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Snippets'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Snippets
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-template.md b/doc/architecture/blueprints/cells/cells-feature-template.md
new file mode 100644
index 00000000000..3cece3dc99e
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-template.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Problem A'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: A
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/cells-feature-uploads.md b/doc/architecture/blueprints/cells/cells-feature-uploads.md
new file mode 100644
index 00000000000..fdac3a9977c
--- /dev/null
+++ b/doc/architecture/blueprints/cells/cells-feature-uploads.md
@@ -0,0 +1,30 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Uploads'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Cells: Uploads
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/cells/glossary.md b/doc/architecture/blueprints/cells/glossary.md
new file mode 100644
index 00000000000..c3ec5fd12e4
--- /dev/null
+++ b/doc/architecture/blueprints/cells/glossary.md
@@ -0,0 +1,106 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Glossary'
+---
+
+# Cells: Glossary
+
+We use the following terms to describe components and properties of the Cells architecture.
+
+## Cell
+
+> Pod was renamed to Cell in <https://gitlab.com/gitlab-com/www-gitlab-com/-/merge_requests/121163>
+
+A Cell is a set of infrastructure components that contains multiple top-level groups that belong to different organizations. The components include both datastores (PostgreSQL, Redis etc.) and stateless services (web etc.). The infrastructure components provided within a Cell are shared among organizations and their top-level groups but not shared with other Cells. This isolation of infrastructure components means that Cells are independent from each other.
+
+<img src="images/term-cell.png" height="200">
+
+### Cell properties
+
+- Each cell is independent from the others
+- Infrastructure components are shared by organizations and their top-level groups within a Cell
+- More Cells can be provisioned to provide horizontal scalability
+- A failing Cell does not lead to failure of other Cells
+- Noisy neighbor effects are limited to within a Cell
+- Cells are not visible to organizations; it is an implementation detail
+- Cells may be located in different geographical regions (for example, EU, US, JP, UK)
+
+Discouraged synonyms: GitLab instance, cluster, shard
+
+## Cluster
+
+A cluster is a collection of Cells.
+
+<img src="images/term-cluster.png" height="300">
+
+### Cluster properties
+
+- A cluster holds cluster-wide metadata, for example Users, Routes, Settings.
+
+Discouraged synonyms: whale
+
+## Organizations
+
+GitLab references [Organizations in the initial set up](../../../topics/set_up_organization.md) and users can add a (free text) organization to their profile. There is no Organization entity established in the GitLab codebase.
+
+As part of delivering Cells, we propose the introduction of an `organization` entity. Organizations would represent billable entities or customers.
+
+Organizations are a known concept, present for example in [AWS](https://docs.aws.amazon.com/whitepapers/latest/organizing-your-aws-environment/core-concepts.html) and [GCP](https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy#organizations).
+
+Organizations work under the following assumptions:
+
+1. Users care about what happens within their organizations.
+1. Features need to work within an organization.
+1. Only few features need to work across organizations.
+1. Users understand that the majority of pages they view are only scoped to a single organization at a time.
+1. Organizations are located on a single cell.
+
+![Term Organization](images/term-organization.png)
+
+### Organization properties
+
+- Top-level groups belong to organizations
+- Organizations are isolated from each other by default meaning that cross-group features will only work for group that exist within a single organization
+- User namespaces must not belong to an organization
+
+Discouraged synonyms: Billable entities, customers
+
+## Top-Level group
+
+Top-level group is the name given to the top most group of all other groups. Groups and projects are nested underneath the top-level group.
+
+Example:
+
+`https://gitlab.com/gitlab-org/gitlab/`:
+
+- `gitlab-org` is a `top-level group`; the root for all groups and projects of an organization
+- `gitlab` is a `project`; a project of the organization.
+
+The top-level group has served as the defacto Organization entity. With the creation of Organization, top-level groups will be [nested underneath Organizations](https://gitlab.com/gitlab-org/gitlab/-/issues/394796).
+
+Over time there won't be a distinction between a top-level group and a group. All features that make Top-level groups different from groups will move to Organization.
+
+Discouraged synonyms: Root-level namespace
+
+![Term Top-level Group](images/term-top-level-group.png)
+
+### Top-level group properties
+
+- Top-level groups belonging to an organization are located on the same Cell
+- Top-level groups can interact with other top-level groups that belong to the same organization
+
+## Users
+
+Users are available globally and not restricted to a single Cell. Users belong to a single organization, but can participate in many organizations through group and project membership with varying permissions. Inside organizations, users can create multiple top-level groups. User activity is not limited to a single organization but their contributions (for example TODOs) are only aggregated within an organization. This avoids the need for aggregating across cells.
+
+### User properties
+
+- Users are shared globally across all Cells
+- Users can create multiple top-level groups
+- Users can be a member of multiple top-level groups
+- Users belong to one organization. See [!395736](https://gitlab.com/gitlab-org/gitlab/-/issues/395736)
+- Users can be members of groups and projects in different organizations
+- Users can administer organizations
+- User activity is aggregated in an organization
+- Every user has one personal namespace
diff --git a/doc/architecture/blueprints/cells/goals.md b/doc/architecture/blueprints/cells/goals.md
new file mode 100644
index 00000000000..67dc25625c7
--- /dev/null
+++ b/doc/architecture/blueprints/cells/goals.md
@@ -0,0 +1,59 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Goals'
+---
+
+# Cells: Goals
+
+## Scalability
+
+The main goal of this new shared-infrastructure architecture is to provide additional scalability for our SaaS Platform. GitLab.com is largely monolithic and we have estimated (internal) that the current architecture has scalability limitations, even when database partitioning and decomposition are taken into account.
+
+Cells provide a horizontally scalable solution because additional Cells can be created based on demand. Cells can be provisioned and tuned as needed for optimal scalability.
+
+## Increased availability
+
+A major challenge for shared-infrastructure architectures is a lack of isolation between top-level groups. This can lead to noisy neighbor effects. A organization's behavior inside a top-level group can impact all other organizations. This is highly undesirable. Cells provide isolation at the cell level. A group of organizations is fully isolated from other organizations located on a different Cell. This minimizes noisy neighbor effects while still benefiting from the cost-efficiency of shared infrastructure.
+
+Additionally, Cells provide a way to implement disaster recovery capabilities. Entire Cells may be replicated to read-only standbys with automatic failover capabilities.
+
+## A consistent experience
+
+Organizations should have the same user experience on our SaaS platform as they do on a self-managed GitLab instance.
+
+## Regions
+
+GitLab.com is only hosted within the United States of America. Organizations located in other regions have voiced demand for local SaaS offerings. Cells provide a path towards [GitLab Regions](https://gitlab.com/groups/gitlab-org/-/epics/6037) because Cells may be deployed within different geographies. Depending on which of the organization's data is located outside a Cell, this may solve data residency and compliance problems.
+
+## Market segment
+
+Cells would provide a solution for organizations in the small to medium business (up to 100 users) and the mid-market segment (up to 2000 users).
+(See [segmentation definitions](https://about.gitlab.com/handbook/sales/field-operations/gtm-resources/#segmentation).)
+Larger organizations may benefit substantially from [GitLab Dedicated](../../../subscriptions/gitlab_dedicated/index.md).
+
+At this moment, GitLab.com has "social-network"-like capabilities that may not fit well into a more isolated organization model. Removing those features, however, possesses some challenges:
+
+1. How will existing `gitlab-org` contributors contribute to the namespace??
+1. How do we move existing top-level groups into the new model (effectively breaking their social features)?
+
+We should evaluate if the SMB and mid market segment is interested in these features, or if not having them is acceptable in most cases.
+
+## Self-managed
+
+For reasons of consistency, it is expected that self-managed instances will
+adopt the cells architecture as well. To expand, self-managed instances can
+continue with just a single Cell while supporting the option of adding additional
+Cells. Organizations, and possible User decomposition will also be adopted for
+self-managed instances.
+
+## High-level architecture problems to solve
+
+A number of technical issues need to be resolved to implement Cells (in no particular order). This section will be expanded.
+
+1. How are Cells provisioned? - [Design discussion](https://gitlab.com/gitlab-org/gitlab/-/issues/396641)
+1. What is a Cells topology? - [Design discussion](https://gitlab.com/gitlab-org/gitlab/-/issues/396641)
+1. How are users of an organization routed to the correct Cell? -
+1. How do users authenticate with Cells and Organizations? - [Design discussion](https://gitlab.com/gitlab-org/gitlab/-/issues/395736)
+1. How are Cells rebalanced?
+1. How can Cells implement disaster recovery capabilities?
diff --git a/doc/architecture/blueprints/cells/images/pods-and-fulfillment.png b/doc/architecture/blueprints/cells/images/pods-and-fulfillment.png
new file mode 100644
index 00000000000..fea32d1800e
--- /dev/null
+++ b/doc/architecture/blueprints/cells/images/pods-and-fulfillment.png
Binary files differ
diff --git a/doc/architecture/blueprints/cells/images/term-cell.png b/doc/architecture/blueprints/cells/images/term-cell.png
new file mode 100644
index 00000000000..799b2eccd95
--- /dev/null
+++ b/doc/architecture/blueprints/cells/images/term-cell.png
Binary files differ
diff --git a/doc/architecture/blueprints/cells/images/term-cluster.png b/doc/architecture/blueprints/cells/images/term-cluster.png
new file mode 100644
index 00000000000..03c92850b64
--- /dev/null
+++ b/doc/architecture/blueprints/cells/images/term-cluster.png
Binary files differ
diff --git a/doc/architecture/blueprints/cells/images/term-organization.png b/doc/architecture/blueprints/cells/images/term-organization.png
new file mode 100644
index 00000000000..dd6367ad84a
--- /dev/null
+++ b/doc/architecture/blueprints/cells/images/term-organization.png
Binary files differ
diff --git a/doc/architecture/blueprints/cells/images/term-top-level-group.png b/doc/architecture/blueprints/cells/images/term-top-level-group.png
new file mode 100644
index 00000000000..4af2468f50d
--- /dev/null
+++ b/doc/architecture/blueprints/cells/images/term-top-level-group.png
Binary files differ
diff --git a/doc/architecture/blueprints/cells/impact.md b/doc/architecture/blueprints/cells/impact.md
new file mode 100644
index 00000000000..878af4d1a5e
--- /dev/null
+++ b/doc/architecture/blueprints/cells/impact.md
@@ -0,0 +1,58 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells: Cross-section impact'
+---
+
+# Cells: Cross-section impact
+
+Cells is a fundamental architecture change that impacts other sections and stages. This section summarizes and links to other groups that may be impacted and highlights potential conflicts that need to be resolved. The Tenant Scale group is not responsible for achieving the goals of other groups but we want to ensure that dependencies are resolved.
+
+## Summary
+
+Based on discussions with other groups the net impact of introducing Cells and a new entity called organizations is mostly neutral. It may slow down development in some areas. We did not discover major blockers for other teams.
+
+1. We need to resolve naming conflicts (proposal is TBD)
+1. Cells requires introducing Organizations. Organizations are a new entity **above** top-level groups. Because this is a new entity, it may impact the ability to consolidate settings for Group::Organization and influence their decision on [how to approach introducing a an organization](https://gitlab.com/gitlab-org/gitlab/-/issues/376285#approach-2-organization-is-built-on-top-of-top-level-groups)
+1. Organizations may make it slightly easier for Fulfillment to realize their billing plans.
+
+## Impact on Group::Organization
+
+We synced with the Organization PM and Designer ([recording](https://youtu.be/b5Opn9cFWFk)) and discussed the similarities and differences between the Cells and Organization proposal ([presentation](https://docs.google.com/presentation/d/1FsUi22Up15b_tu6p2m-yLML3hCZ3rgrZrmzJAxUsNmU/edit?usp=sharing)).
+
+### Goals of Group::Organization
+
+As defined in the [organization documentation](../../../user/organization/index.md):
+
+1. Create an entity to manage everything you do as a GitLab administrator, including:
+ 1. Defining and applying settings to all of your groups, subgroups, and projects.
+ 1. Aggregating data from all your groups, subgroups, and projects.
+1. Reach feature parity between SaaS and self-managed installations, with all Admin Area settings moving to groups (?). Hardware controls remain on the instance level.
+
+The [organization roadmap outlines](https://gitlab.com/gitlab-org/gitlab/-/issues/368237#high-level-goals) the current goals in detail.
+
+### Potential conflicts with Cells
+
+- Organization defines a new entity as the primary organizational object for groups and projects.
+- We will only introduce one entity
+- Group::Organization highlighted the need to further validate the key assumption that users only care about what happens within their organization.
+
+## Impact on Fulfillment
+
+We synced with Fulfillment ([recording](https://youtu.be/FkQF3uF7vTY)) to discuss how Cells would impact them. Fulfillment is supportive of an entity above top-level groups. Their perspective is outline in [!5639](https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/5639/diffs).
+
+### Goals of Fulfillment
+
+- Fulfillment has a longstanding plan to move billing from the top-level group to a level above. This would mean that a license applies for an organization and all its top-level groups.
+- Fulfillment uses Zuora for billing and would like to have a 1-to-1 relationship between an organization and their Zuora entity called BillingAccount. They want to move away from tying a license to a single user.
+- If a customer needs multiple organizations, the corresponding BillingAccounts can be rolled up into a consolidated billing account (similar to [AWS consolidated billing](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html))
+- Ideally, a self-managed instance has a single Organization by default, which should be enough for most customers.
+- Fulfillment prefers only one additional entity.
+
+A rough representation of this is:
+
+![Cells and Fulfillment](images/pods-and-fulfillment.png)
+
+### Potential conflicts with Cells
+
+- There are no known conflicts between Fulfillment's plans and Cells
diff --git a/doc/architecture/blueprints/cells/index.md b/doc/architecture/blueprints/cells/index.md
new file mode 100644
index 00000000000..9938875adb6
--- /dev/null
+++ b/doc/architecture/blueprints/cells/index.md
@@ -0,0 +1,360 @@
+---
+status: accepted
+creation-date: "2022-09-07"
+authors: [ "@ayufan", "@fzimmer", "@DylanGriffith" ]
+coach: "@ayufan"
+approvers: [ "@fzimmer" ]
+owning-stage: "~devops::enablement"
+participating-stages: []
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+# Cells
+
+This document is a work-in-progress and represents a very early state of the Cells design. Significant aspects are not documented, though we expect to add them in the future.
+
+Cells is a new architecture for our Software as a Service platform. This architecture is horizontally-scalable, resilient, and provides a more consistent user experience. It may also provide additional features in the future, such as data residency control (regions) and federated features.
+
+For more information about Cells, see also:
+
+- [Glossary](glossary.md)
+- [Goals](goals.md)
+- [Cross-section impact](impact.md)
+
+## Work streams
+
+We can't ship the entire Cells architecture in one go - it is too large.
+Instead, we are defining key work streams required by the project.
+
+Not all objectives need to be fulfilled to reach production readiness.
+It is expected that some objectives will not be completed for General Availability (GA),
+but will be enough to run Cells in production.
+
+### 1. Data access layer
+
+Before Cells can be run in production we need to prepare the codebase to accept the Cells architecture.
+This preparation involves:
+
+- Allowing data sharing between Cells.
+- Updating the tooling for discovering cross-Cell data traversal.
+- Defining code practices for cross-Cell data traversal.
+- Analyzing the data model to define the data affinity.
+
+Under this objective the following steps are expected:
+
+1. **Allow to share cluster-wide data with database-level data access layer.**
+
+ Cells can connect to a database containing shared data. For example:
+ application settings, users, or routing information.
+
+1. **Evaluate the efficiency of database-level access vs. API-oriented access layer.**
+
+ Reconsider the consequences of database-level data access for data migration, resiliency of updates and of interconnected systems when we share only a subset of data.
+
+1. **Cluster-unique identifiers**
+
+ Every object has a unique identifier that can be used to access data across the cluster. The IDs for allocated projects, issues and any other objects are cluster-unique.
+
+1. **Cluster-wide deletions**
+
+ If entities deleted in Cell 2 are cross-referenced, they are properly deleted or nullified across clusters. We will likely re-use existing [loose foreign keys](../../../development/database/loose_foreign_keys.md) to extend it with cross-Cells data removal.
+
+1. **Data access layer**
+
+ Ensure that a stable data-access (versioned) layer that allows to share cluster-wide data is implemented.
+
+1. **Database migration**
+
+ Ensure that migrations can be run independently between Cells, and we safely handle migrations of shared data in a way that does not impact other Cells.
+
+### 2. Essential workflows
+
+To make Cells viable we require to define and support
+essential workflows before we can consider the Cells
+to be of Beta quality. Essential workflows are meant
+to cover the majority of application functionality
+that makes the product mostly useable, but with some caveats.
+
+The current approach is to define workflows from top to bottom.
+The order defines the presumed priority of the items.
+This list is not exhaustive as we would be expecting
+other teams to help and fix their workflows after
+the initial phase, in which we fix the fundamental ones.
+
+To consider a project ready for the Beta phase, it is expected
+that all features defined below are supported by Cells.
+In the cases listed below, the workflows define a set of tables
+to be properly attributed to the feature. In some cases,
+a table with an ambiguous usage has to be broken down.
+For example: `uploads` are used to store user avatars,
+as well as uploaded attachments for comments. It would be expected
+that `uploads` is split into `uploads` (describing group/project-level attachments)
+and `global_uploads` (describing, for example, user avatars).
+
+Except for initial 2-3 quarters this work is highly parallel.
+It would be expected that **group::tenant scale** would help other
+teams to fix their feature set to work with Cells. The first 2-3 quarters
+would be required to define a general split of data and build required tooling.
+
+1. **Instance-wide settings are shared across cluster.**
+
+ The Admin Area section for most part is shared across a cluster.
+
+1. **User accounts are shared across cluster.**
+
+ The purpose is to make `users` cluster-wide.
+
+1. **User can create group.**
+
+ The purpose is to perform a targeted decomposition of `users` and `namespaces`, because the `namespaces` will be stored locally in the Cell.
+
+1. **User can create project.**
+
+ The purpose is to perform a targeted decomposition of `users` and `projects`, because the `projects` will be stored locally in the Cell.
+
+1. **User can change profile avatar that is shared in cluster.**
+
+ The purpose is to fix global uploads that are shared in cluster.
+
+1. **User can push to Git repository.**
+
+ The purpose is to ensure that essential joins from the projects table are properly attributed to be
+ Cell-local, and as a result the essential Git workflow is supported.
+
+1. **User can run CI pipeline.**
+
+ The purpose is that `ci_pipelines` (like `ci_stages`, `ci_builds`, `ci_job_artifacts`) and adjacent tables are properly attributed to be Cell-local.
+
+1. **User can create issue, merge request, and merge it after it is green.**
+
+ The purpose is to ensure that `issues` and `merge requests` are properly attributed to be `Cell-local`.
+
+1. **User can manage group and project members.**
+
+ The `members` table is properly attributed to be either `Cell-local` or `cluster-wide`.
+
+1. **User can manage instance-wide runners.**
+
+ The purpose is to scope all CI Runners to be Cell-local. Instance-wide runners in fact become Cell-local runners. The expectation is to provide a user interface view and manage all runners per Cell, instead of per cluster.
+
+1. **User is part of organization and can only see information from the organization.**
+
+ The purpose is to have many organizations per Cell, but never have a single organization spanning across many Cells. This is required to ensure that information shown within an organization is isolated, and does not require fetching information from other Cells.
+
+### 3. Additional workflows
+
+Some of these additional workflows might need to be supported, depending on the group decision.
+This list is not exhaustive of work needed to be done.
+
+1. **User can use all group-level features.**
+1. **User can use all project-level features.**
+1. **User can share groups with other groups in an organization.**
+1. **User can create system webhook.**
+1. **User can upload and manage packages.**
+1. **User can manage security detection features.**
+1. **User can manage Kubernetes integration.**
+1. TBD
+
+### 4. Routing layer
+
+The routing layer is meant to offer a consistent user experience where all Cells are presented
+under a single domain (for example, `gitlab.com`), instead of
+having to navigate to separate domains.
+
+The user will able to use `https://gitlab.com` to access Cell-enabled GitLab. Depending
+on the URL access, it will be transparently proxied to the correct Cell that can serve this particular
+information. For example:
+
+- All requests going to `https://gitlab.com/users/sign_in` are randomly distributed to all Cells.
+- All requests going to `https://gitlab.com/gitlab-org/gitlab/-/tree/master` are always directed to Cell 5, for example.
+- All requests going to `https://gitlab.com/my-username/my-project` are always directed to Cell 1.
+
+1. **Technology.**
+
+ We decide what technology the routing service is written in.
+ The choice is dependent on the best performing language, and the expected way
+ and place of deployment of the routing layer. If it is required to make
+ the service multi-cloud it might be required to deploy it to the CDN provider.
+ Then the service needs to be written using a technology compatible with the CDN provider.
+
+1. **Cell discovery.**
+
+ The routing service needs to be able to discover and monitor the health of all Cells.
+
+1. **Router endpoints classification.**
+
+ The stateless routing service will fetch and cache information about endpoints
+ from one of the Cells. We need to implement a protocol that will allow us to
+ accurately describe the incoming request (its fingerprint), so it can be classified
+ by one of the Cells, and the results of that can be cached. We also need to implement
+ a mechanism for negative cache and cache eviction.
+
+1. **GraphQL and other ambigious endpoints.**
+
+ Most endpoints have a unique sharding key: the organization, which directly
+ or indirectly (via a group or project) can be used to classify endpoints.
+ Some endpoints are ambiguous in their usage (they don't encode the sharding key),
+ or the sharding key is stored deep in the payload. In these cases, we need to decide how to handle endpoints like `/api/graphql`.
+
+### 5. Cell deployment
+
+We will run many Cells. To manage them easier, we need to have consistent
+deployment procedures for Cells, including a way to deploy, manage, migrate,
+and monitor.
+
+We are very likely to use tooling made for [GitLab Dedicated](https://about.gitlab.com/dedicated/)
+with its control planes.
+
+1. **Extend GitLab Dedicated to support GCP.**
+1. TBD
+
+### 6. Migration
+
+When we reach production and are able to store new organizations on new Cells, we need
+to be able to divide big Cells into many smaller ones.
+
+1. **Use GitLab Geo to clone Cells.**
+
+ The purpose is to use GitLab Geo to clone Cells.
+
+1. **Split Cells by cloning them.**
+
+ Once Cell is cloned we change routing information for organizations.
+ Organization will encode `cell_id`. When we update `cell_id` it will automatically
+ make the given Cell to be authoritative to handle the traffic for the given organization.
+
+1. **Delete redundant data from previous Cells.**
+
+ Since the organization is now stored on many Cells, once we change `cell_id`
+ we will have to remove data from all other Cells based on `organization_id`.
+
+## Availability of the feature
+
+We are following the [Support for Experiment, Beta, and Generally Available features](../../../policy/alpha-beta-support.md).
+
+### 1. Experiment
+
+Expectations:
+
+- We can deploy a Cell on staging or another testing environment by using a separate domain (ex. `cell2.staging.gitlab.com`)
+ using [Cell deployment](#5-cell-deployment) tooling.
+- User can create organization, group and project, and run some of the [essential workflows](#2-essential-workflows).
+- It is not expected to be able to run a router to serve all requests under a single domain.
+- We expect data-loss of data stored on additional Cells.
+- We expect to tear down and create many new Cells to validate tooling.
+
+### 2. Beta
+
+Expectations:
+
+- We can run many Cells under a single domain (ex. `staging.gitlab.com`).
+- All features defined in [essential workflows](#2-essential-workflows) are supported.
+- Not all aspects of [Routing layer](#4-routing-layer) are finalized.
+- We expect additional Cells to be stable with minimal data loss.
+
+### 3. GA
+
+Expectations:
+
+- We can run many Cells under a single domain (for example, `staging.gitlab.com`).
+- All features defined in [essential workflows](#2-essential-workflows) are supported.
+- All features of [routing layer](#4-routing-layer) are supported.
+- Most of [additional workflows](#3-additional-workflows) are supported.
+- We don't expect to support any of [migration](#6-migration) aspects.
+
+### 4. Post GA
+
+Expectations:
+
+- We support all [additional workflows](#3-additional-workflows).
+- We can [migrate](#6-migration) existing organizations onto new Cells.
+
+## Iteration plan
+
+The delivered iterations will focus on solving particular steps of a given
+key work stream.
+
+It is expected that initial iterations will rather
+be slow, because they require substantially more
+changes to prepare the codebase for data split.
+
+One iteration describes one quarter's worth of work.
+
+1. Iteration 1 - FY24Q1
+
+ - Data access layer: Initial Admin Area settings are shared across cluster.
+ - Essential workflows: Allow to share cluster-wide data with database-level data access layer
+
+1. Iteration 2 - FY24Q2
+
+ - Essential workflows: User accounts are shared across cluster.
+ - Essential workflows: User can create group.
+
+1. Iteration 3 - FY24Q3
+
+ - Essential workflows: User can create project.
+ - Essential workflows: User can push to Git repository.
+ - Cell deployment: Extend GitLab Dedicated to support GCP
+ - Routing: Technology.
+
+1. Iteration 4 - FY24Q4
+
+ - Essential workflows: User can run CI pipeline.
+ - Essential workflows: User can create issue, merge request, and merge it after it is green.
+ - Data access layer: Evaluate the efficiency of database-level access vs. API-oriented access layer
+ - Data access layer: Cluster-unique identifiers.
+ - Routing: Cell discovery.
+ - Routing: Router endpoints classification.
+
+1. Iteration 5 - FY25Q1
+
+ - TBD
+
+## Technical Proposals
+
+The Cells architecture do have long lasting implications to data processing, location, scalability and the GitLab architecture.
+This section links all different technical proposals that are being evaluated.
+
+- [Stateless Router That Uses a Cache to Pick Cell and Is Redirected When Wrong Cell Is Reached](proposal-stateless-router-with-buffering-requests.md)
+
+- [Stateless Router That Uses a Cache to Pick Cell and pre-flight `/api/v4/cells/learn`](proposal-stateless-router-with-routes-learning.md)
+
+## Impacted features
+
+The Cells architecture will impact many features requiring some of them to be rewritten, or changed significantly.
+This is the list of known affected features with the proposed solutions.
+
+- [Cells: Git Access](cells-feature-git-access.md)
+- [Cells: Data Migration](cells-feature-data-migration.md)
+- [Cells: Database Sequences](cells-feature-database-sequences.md)
+- [Cells: GraphQL](cells-feature-graphql.md)
+- [Cells: Organizations](cells-feature-organizations.md)
+- [Cells: Router Endpoints Classification](cells-feature-router-endpoints-classification.md)
+- [Cells: Schema changes (Postgres and Elasticsearch migrations)](cells-feature-schema-changes.md)
+- [Cells: Backups](cells-feature-backups.md)
+- [Cells: Global Search](cells-feature-global-search.md)
+- [Cells: CI Runners](cells-feature-ci-runners.md)
+- [Cells: Admin Area](cells-feature-admin-area.md)
+- [Cells: Secrets](cells-feature-secrets.md)
+- [Cells: Container Registry](cells-feature-container-registry.md)
+- [Cells: Contributions: Forks](cells-feature-contributions-forks.md)
+- [Cells: Personal Namespaces](cells-feature-personal-namespaces.md)
+- [Cells: Dashboard: Projects, Todos, Issues, Merge Requests, Activity, ...](cells-feature-dashboard.md)
+- [Cells: Snippets](cells-feature-snippets.md)
+- [Cells: Uploads](cells-feature-uploads.md)
+- [Cells: GitLab Pages](cells-feature-gitlab-pages.md)
+- [Cells: Agent for Kubernetes](cells-feature-agent-for-kubernetes.md)
+
+## Decision log
+
+- 2022-03-15: Google Cloud as the cloud service. For details, see [issue 396641](https://gitlab.com/gitlab-org/gitlab/-/issues/396641#note_1314932272).
+
+## Links
+
+- [Internal Pods presentation](https://docs.google.com/presentation/d/1x1uIiN8FR9fhL7pzFh9juHOVcSxEY7d2_q4uiKKGD44/edit#slide=id.ge7acbdc97a_0_155)
+- [Internal link to all diagrams](https://drive.google.com/file/d/13NHzbTrmhUM-z_Bf0RjatUEGw5jWHSLt/view?usp=sharing)
+- [Cells Epic](https://gitlab.com/groups/gitlab-org/-/epics/7582)
+- [Database Group investigation](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/database/doc/root-namespace-sharding.html)
+- [Shopify Pods architecture](https://shopify.engineering/a-pods-architecture-to-allow-shopify-to-scale)
+- [Opstrace architecture](https://gitlab.com/gitlab-org/opstrace/opstrace/-/blob/main/docs/architecture/overview.md)
diff --git a/doc/architecture/blueprints/cells/proposal-stateless-router-with-buffering-requests.md b/doc/architecture/blueprints/cells/proposal-stateless-router-with-buffering-requests.md
new file mode 100644
index 00000000000..f352fea84b1
--- /dev/null
+++ b/doc/architecture/blueprints/cells/proposal-stateless-router-with-buffering-requests.md
@@ -0,0 +1,649 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells Stateless Router Proposal'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Proposal: Stateless Router
+
+We will decompose `gitlab_users`, `gitlab_routes` and `gitlab_admin` related
+tables so that they can be shared between all cells and allow any cell to
+authenticate a user and route requests to the correct cell. Cells may receive
+requests for the resources they don't own, but they know how to redirect back
+to the correct cell.
+
+The router is stateless and does not read from the `routes` database which
+means that all interactions with the database still happen from the Rails
+monolith. This architecture also supports regions by allowing for low traffic
+databases to be replicated across regions.
+
+Users are not directly exposed to the concept of Cells but instead they see
+different data dependent on their chosen "organization".
+[Organizations](glossary.md#organizations) will be a new model introduced to enforce isolation in the
+application and allow us to decide which request route to which cell, since an
+organization can only be on a single cell.
+
+## Differences
+
+The main difference between this proposal and the one [with learning routes](proposal-stateless-router-with-routes-learning.md)
+is that this proposal always sends requests to any of the Cells. If the requests cannot be processed,
+the requests will be bounced back with relevant headers. This requires that request to be buffered.
+It allows that request decoding can be either via URI or Body of request by Rails.
+This means that each request might be sent more than once and be processed more than once as result.
+
+The [with learning routes proposal](proposal-stateless-router-with-routes-learning.md) requires that
+routable information is always encoded in URI, and the router sends a pre-flight request.
+
+## Summary in diagrams
+
+This shows how a user request routes via DNS to the nearest router and the router chooses a cell to send the request to.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ cell_us0{Cell US0};
+ cell_us1{Cell US1};
+ cell_eu0{Cell EU0};
+ cell_eu1{Cell EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->cell_eu0;
+ router_eu-->cell_eu1;
+ end
+ subgraph United States
+ router_us-->cell_us0;
+ router_us-->cell_us1;
+ end
+```
+
+<details><summary>More detail</summary>
+
+This shows that the router can actually send requests to any cell. The user will
+get the closest router to them geographically.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ cell_us0{Cell US0};
+ cell_us1{Cell US1};
+ cell_eu0{Cell EU0};
+ cell_eu1{Cell EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->cell_eu0;
+ router_eu-->cell_eu1;
+ end
+ subgraph United States
+ router_us-->cell_us0;
+ router_us-->cell_us1;
+ end
+ router_eu-.->cell_us0;
+ router_eu-.->cell_us1;
+ router_us-.->cell_eu0;
+ router_us-.->cell_eu1;
+```
+
+</details>
+
+<details><summary>Even more detail</summary>
+
+This shows the databases. `gitlab_users` and `gitlab_routes` exist only in the
+US region but are replicated to other regions. Replication does not have an
+arrow because it's too hard to read the diagram.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ cell_us0{Cell US0};
+ cell_us1{Cell US1};
+ cell_eu0{Cell EU0};
+ cell_eu1{Cell EU1};
+ db_gitlab_users[(gitlab_users Primary)];
+ db_gitlab_routes[(gitlab_routes Primary)];
+ db_gitlab_users_replica[(gitlab_users Replica)];
+ db_gitlab_routes_replica[(gitlab_routes Replica)];
+ db_cell_us0[(gitlab_main/gitlab_ci Cell US0)];
+ db_cell_us1[(gitlab_main/gitlab_ci Cell US1)];
+ db_cell_eu0[(gitlab_main/gitlab_ci Cell EU0)];
+ db_cell_eu1[(gitlab_main/gitlab_ci Cell EU1)];
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->cell_eu0;
+ router_eu-->cell_eu1;
+ cell_eu0-->db_cell_eu0;
+ cell_eu0-->db_gitlab_users_replica;
+ cell_eu0-->db_gitlab_routes_replica;
+ cell_eu1-->db_gitlab_users_replica;
+ cell_eu1-->db_gitlab_routes_replica;
+ cell_eu1-->db_cell_eu1;
+ end
+ subgraph United States
+ router_us-->cell_us0;
+ router_us-->cell_us1;
+ cell_us0-->db_cell_us0;
+ cell_us0-->db_gitlab_users;
+ cell_us0-->db_gitlab_routes;
+ cell_us1-->db_gitlab_users;
+ cell_us1-->db_gitlab_routes;
+ cell_us1-->db_cell_us1;
+ end
+ router_eu-.->cell_us0;
+ router_eu-.->cell_us1;
+ router_us-.->cell_eu0;
+ router_us-.->cell_eu1;
+```
+
+</details>
+
+## Summary of changes
+
+1. Tables related to User data (including profile settings, authentication credentials, personal access tokens) are decomposed into a `gitlab_users` schema
+1. The `routes` table is decomposed into `gitlab_routes` schema
+1. The `application_settings` (and probably a few other instance level tables) are decomposed into `gitlab_admin` schema
+1. A new column `routes.cell_id` is added to `routes` table
+1. A new Router service exists to choose which cell to route a request to.
+1. A new concept will be introduced in GitLab called an organization and a user can select a "default organization" and this will be a user level setting. The default organization is used to redirect users away from ambiguous routes like `/dashboard` to organization scoped routes like `/organizations/my-organization/-/dashboard`. Legacy users will have a special default organization that allows them to keep using global resources on `Cell US0`. All existing namespaces will initially move to this public organization.
+1. If a cell receives a request for a `routes.cell_id` that it does not own it returns a `302` with `X-Gitlab-Cell-Redirect` header so that the router can send the request to the correct cell. The correct cell can also set a header `X-Gitlab-Cell-Cache` which contains information about how this request should be cached to remember the cell. For example if the request was `/gitlab-org/gitlab` then the header would encode `/gitlab-org/* => Cell US0` (for example, any requests starting with `/gitlab-org/` can always be routed to `Cell US0`
+1. When the cell does not know (from the cache) which cell to send a request to it just picks a random cell within it's region
+1. Writes to `gitlab_users` and `gitlab_routes` are sent to a primary PostgreSQL server in our `US` region but reads can come from replicas in the same region. This will add latency for these writes but we expect they are infrequent relative to the rest of GitLab.
+
+## Detailed explanation of default organization in the first iteration
+
+All users will get a new column `users.default_organization` which they can
+control in user settings. We will introduce a concept of the
+`GitLab.com Public` organization. This will be set as the default organization for all existing
+users. This organization will allow the user to see data from all namespaces in
+`Cell US0` (for example, our original GitLab.com instance). This behavior can be invisible to
+existing users such that they don't even get told when they are viewing a
+global page like `/dashboard` that it's even scoped to an organization.
+
+Any new users with a default organization other than `GitLab.com Public` will have
+a distinct user experience and will be fully aware that every page they load is
+only ever scoped to a single organization. These users can never
+load any global pages like `/dashboard` and will end up being redirected to
+`/organizations/<DEFAULT_ORGANIZATION>/-/dashboard`. This may also be the case
+for legacy APIs and such users may only ever be able to use APIs scoped to a
+organization.
+
+## Detailed explanation of Admin Area settings
+
+We believe that maintaining and synchronizing Admin Area settings will be
+frustrating and painful so to avoid this we will decompose and share all Admin Area
+settings in the `gitlab_admin` schema. This should be safe (similar to other
+shared schemas) because these receive very little write traffic.
+
+In cases where different cells need different settings (for example, the
+Elasticsearch URL), we will either decide to use a templated
+format in the relevant `application_settings` row which allows it to be dynamic
+per cell. Alternatively if that proves difficult we'll introduce a new table
+called `per_cell_application_settings` and this will have 1 row per cell to allow
+setting different settings per cell. It will still be part of the `gitlab_admin`
+schema and shared which will allow us to centrally manage it and simplify
+keeping settings in sync for all cells.
+
+## Pros
+
+1. Router is stateless and can live in many regions. We use Anycast DNS to resolve to nearest region for the user.
+1. Cells can receive requests for namespaces in the wrong cell and the user
+ still gets the right response as well as caching at the router that
+ ensures the next request is sent to the correct cell so the next request
+ will go to the correct cell
+1. The majority of the code still lives in `gitlab` rails codebase. The Router doesn't actually need to understand how GitLab URLs are composed.
+1. Since the responsibility to read and write `gitlab_users`,
+ `gitlab_routes` and `gitlab_admin` still lives in Rails it means minimal
+ changes will be needed to the Rails application compared to extracting
+ services that need to isolate the domain models and build new interfaces.
+1. Compared to a separate routing service this allows the Rails application
+ to encode more complex rules around how to map URLs to the correct cell
+ and may work for some existing API endpoints.
+1. All the new infrastructure (just a router) is optional and a single-cell
+ self-managed installation does not even need to run the Router and there are
+ no other new services.
+
+## Cons
+
+1. `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases may need to be
+ replicated across regions and writes need to go across regions. We need to
+ do an analysis on write TPS for the relevant tables to determine if this is
+ feasible.
+1. Sharing access to the database from many different Cells means that they are
+ all coupled at the Postgres schema level and this means changes to the
+ database schema need to be done carefully in sync with the deployment of all
+ Cells. This limits us to ensure that Cells are kept in closely similar
+ versions compared to an architecture with shared services that have an API
+ we control.
+1. Although most data is stored in the right region there can be requests
+ proxied from another region which may be an issue for certain types
+ of compliance.
+1. Data in `gitlab_users` and `gitlab_routes` databases must be replicated in
+ all regions which may be an issue for certain types of compliance.
+1. The router cache may need to be very large if we get a wide variety of URLs
+ (for example, long tail). In such a case we may need to implement a 2nd level of
+ caching in user cookies so their frequently accessed pages always go to the
+ right cell the first time.
+1. Having shared database access for `gitlab_users` and `gitlab_routes`
+ from multiple cells is an unusual architecture decision compared to
+ extracting services that are called from multiple cells.
+1. It is very likely we won't be able to find cacheable elements of a
+ GraphQL URL and often existing GraphQL endpoints are heavily dependent on
+ ids that won't be in the `routes` table so cells won't necessarily know
+ what cell has the data. As such we'll probably have to update our GraphQL
+ calls to include an organization context in the path like
+ `/api/organizations/<organization>/graphql`.
+1. This architecture implies that implemented endpoints can only access data
+ that are readily accessible on a given Cell, but are unlikely
+ to aggregate information from many Cells.
+1. All unknown routes are sent to the latest deployment which we assume to be `Cell US0`.
+ This is required as newly added endpoints will be only decodable by latest cell.
+ This Cell could later redirect to correct one that can serve the given request.
+ Since request processing might be heavy some Cells might receive significant amount
+ of traffic due to that.
+
+## Example database configuration
+
+Handling shared `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases, while having dedicated `gitlab_main` and `gitlab_ci` databases should already be handled by the way we use `config/database.yml`. We should also, already be able to handle the dedicated EU replicas while having a single US primary for `gitlab_users` and `gitlab_routes`. Below is a snippet of part of the database configuration for the Cell architecture described above.
+
+<details><summary>Cell US0</summary>
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.cell-us0.primary.consul
+ load_balancing:
+ discovery: postgres-main.cell-us0.replicas.consul
+ ci:
+ host: postgres-ci.cell-us0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.cell-us0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.us.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.us.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.us.consul
+```
+
+</details>
+
+<details><summary>Cell EU0</summary>
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.cell-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-main.cell-eu0.replicas.consul
+ ci:
+ host: postgres-ci.cell-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.cell-eu0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.eu.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.eu.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.eu.consul
+```
+
+</details>
+
+## Request flows
+
+1. `gitlab-org` is a top level namespace and lives in `Cell US0` in the `GitLab.com Public` organization
+1. `my-company` is a top level namespace and lives in `Cell EU0` in the `my-organization` organization
+
+### Experience for paying user that is part of `my-organization`
+
+Such a user will have a default organization set to `/my-organization` and will be
+unable to load any global routes outside of this organization. They may load other
+projects/namespaces but their MR/Todo/Issue counts at the top of the page will
+not be correctly populated in the first iteration. The user will be aware of
+this limitation.
+
+#### Navigates to `/my-company/my-project` while logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. They request `/my-company/my-project` without the router cache, so the router chooses randomly `Cell EU1`
+1. `Cell EU1` does not have `/my-company`, but it knows that it lives in `Cell EU0` so it redirects the router to `Cell EU0`
+1. `Cell EU0` returns the correct response as well as setting the cache headers for the router `/my-company/* => Cell EU0`
+1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Cell EU0`
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_eu1 as Cell EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu1: GET /my-company/my-project
+ cell_eu1->>router_eu: 302 /my-company/my-project X-Gitlab-Cell-Redirect={cell:Cell EU0}
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: <h1>My Project... X-Gitlab-Cell-Cache={path_prefix:/my-company/}
+```
+
+#### Navigates to `/my-company/my-project` while not logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router does not have `/my-company/*` cached yet so it chooses randomly `Cell EU1`
+1. `Cell EU1` redirects them through a login flow
+1. Still they request `/my-company/my-project` without the router cache, so the router chooses a random cell `Cell EU1`
+1. `Cell EU1` does not have `/my-company`, but it knows that it lives in `Cell EU0` so it redirects the router to `Cell EU0`
+1. `Cell EU0` returns the correct response as well as setting the cache headers for the router `/my-company/* => Cell EU0`
+1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Cell EU0`
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_eu1 as Cell EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu1: GET /my-company/my-project
+ cell_eu1->>user: 302 /users/sign_in?redirect=/my-company/my-project
+ user->>router_eu: GET /users/sign_in?redirect=/my-company/my-project
+ router_eu->>cell_eu1: GET /users/sign_in?redirect=/my-company/my-project
+ cell_eu1->>user: <h1>Sign in...
+ user->>router_eu: POST /users/sign_in?redirect=/my-company/my-project
+ router_eu->>cell_eu1: POST /users/sign_in?redirect=/my-company/my-project
+ cell_eu1->>user: 302 /my-company/my-project
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu1: GET /my-company/my-project
+ cell_eu1->>router_eu: 302 /my-company/my-project X-Gitlab-Cell-Redirect={cell:Cell EU0}
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: <h1>My Project... X-Gitlab-Cell-Cache={path_prefix:/my-company/}
+```
+
+#### Navigates to `/my-company/my-other-project` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router cache now has `/my-company/* => Cell EU0`, so the router chooses `Cell EU0`
+1. `Cell EU0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_eu1 as Cell EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: <h1>My Project... X-Gitlab-Cell-Cache={path_prefix:/my-company/}
+```
+
+#### Navigates to `/gitlab-org/gitlab` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router has no cached value for this URL so randomly chooses `Cell EU0`
+1. `Cell EU0` redirects the router to `Cell US0`
+1. `Cell US0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_us0 as Cell US0
+ user->>router_eu: GET /gitlab-org/gitlab
+ router_eu->>cell_eu0: GET /gitlab-org/gitlab
+ cell_eu0->>router_eu: 302 /gitlab-org/gitlab X-Gitlab-Cell-Redirect={cell:Cell US0}
+ router_eu->>cell_us0: GET /gitlab-org/gitlab
+ cell_us0->>user: <h1>GitLab.org... X-Gitlab-Cell-Cache={path_prefix:/gitlab-org/}
+```
+
+In this case the user is not on their "default organization" so their TODO
+counter will not include their normal todos. We may choose to highlight this in
+the UI somewhere. A future iteration may be able to fetch that for them from
+their default organization.
+
+#### Navigates to `/`
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. Router does not have a cache for `/` route (specifically rails never tells it to cache this route)
+1. The Router choose `Cell EU0` randomly
+1. The Rails application knows the users default organization is `/my-organization`, so
+ it redirects the user to `/organizations/my-organization/-/dashboard`
+1. The Router has a cached value for `/organizations/my-organization/*` so it then sends the
+ request to `POD EU0`
+1. `Cell EU0` serves up a new page `/organizations/my-organization/-/dashboard` which is the same
+ dashboard view we have today but scoped to an organization clearly in the UI
+1. The user is (optionally) presented with a message saying that data on this page is only
+ from their default organization and that they can change their default
+ organization if it's not right.
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ user->>router_eu: GET /
+ router_eu->>cell_eu0: GET /
+ cell_eu0->>user: 302 /organizations/my-organization/-/dashboard
+ user->>router: GET /organizations/my-organization/-/dashboard
+ router->>cell_eu0: GET /organizations/my-organization/-/dashboard
+ cell_eu0->>user: <h1>My Company Dashboard... X-Gitlab-Cell-Cache={path_prefix:/organizations/my-organization/}
+```
+
+#### Navigates to `/dashboard`
+
+As above, they will end up on `/organizations/my-organization/-/dashboard` as
+the rails application will already redirect `/` to the dashboard page.
+
+### Navigates to `/not-my-company/not-my-project` while logged in (but they don't have access since this project/group is private)
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router knows that `/not-my-company` lives in `Cell US1` so sends the request to this
+1. The user does not have access so `Cell US1` returns 404
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_us1 as Cell US1
+ user->>router_eu: GET /not-my-company/not-my-project
+ router_eu->>cell_us1: GET /not-my-company/not-my-project
+ cell_us1->>user: 404
+```
+
+#### Creates a new top level namespace
+
+The user will be asked which organization they want the namespace to belong to.
+If they select `my-organization` then it will end up on the same cell as all
+other namespaces in `my-organization`. If they select nothing we default to
+`GitLab.com Public` and it is clear to the user that this is isolated from
+their existing organization such that they won't be able to see data from both
+on a single page.
+
+### Experience for GitLab team member that is part of `/gitlab-org`
+
+Such a user is considered a legacy user and has their default organization set to
+`GitLab.com Public`. This is a "meta" organization that does not really exist but
+the Rails application knows to interpret this organization to mean that they are
+allowed to use legacy global functionality like `/dashboard` to see data across
+namespaces located on `Cell US0`. The rails backend also knows that the default cell to render any ambiguous
+routes like `/dashboard` is `Cell US0`. Lastly the user will be allowed to
+navigate to organizations on another cell like `/my-organization` but when they do the
+user will see a message indicating that some data may be missing (for example, the
+MRs/Issues/Todos) counts.
+
+#### Navigates to `/gitlab-org/gitlab` while not logged in
+
+1. User is in the US so DNS resolves to the US router
+1. The router knows that `/gitlab-org` lives in `Cell US0` so sends the request
+ to this cell
+1. `Cell US0` serves up the response
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant cell_us0 as Cell US0
+ user->>router_us: GET /gitlab-org/gitlab
+ router_us->>cell_us0: GET /gitlab-org/gitlab
+ cell_us0->>user: <h1>GitLab.org... X-Gitlab-Cell-Cache={path_prefix:/gitlab-org/}
+```
+
+#### Navigates to `/`
+
+1. User is in US so DNS resolves to the router in US
+1. Router does not have a cache for `/` route (specifically rails never tells it to cache this route)
+1. The Router chooses `Cell US1` randomly
+1. The Rails application knows the users default organization is `GitLab.com Public`, so
+ it redirects the user to `/dashboards` (only legacy users can see
+ `/dashboard` global view)
+1. Router does not have a cache for `/dashboard` route (specifically rails never tells it to cache this route)
+1. The Router chooses `Cell US1` randomly
+1. The Rails application knows the users default organization is `GitLab.com Public`, so
+ it allows the user to load `/dashboards` (only legacy users can see
+ `/dashboard` global view) and redirects to router the legacy cell which is `Cell US0`
+1. `Cell US0` serves up the global view dashboard page `/dashboard` which is the same
+ dashboard view we have today
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant cell_us0 as Cell US0
+ participant cell_us1 as Cell US1
+ user->>router_us: GET /
+ router_us->>cell_us1: GET /
+ cell_us1->>user: 302 /dashboard
+ user->>router_us: GET /dashboard
+ router_us->>cell_us1: GET /dashboard
+ cell_us1->>router_us: 302 /dashboard X-Gitlab-Cell-Redirect={cell:Cell US0}
+ router_us->>cell_us0: GET /dashboard
+ cell_us0->>user: <h1>Dashboard...
+```
+
+#### Navigates to `/my-company/my-other-project` while logged in (but they don't have access since this project is private)
+
+They get a 404.
+
+### Experience for non-authenticated users
+
+Flow is similar to authenticated users except global routes like `/dashboard` will
+redirect to the login page as there is no default organization to choose from.
+
+### A new customers signs up
+
+They will be asked if they are already part of an organization or if they'd
+like to create one. If they choose neither they end up no the default
+`GitLab.com Public` organization.
+
+### An organization is moved from 1 cell to another
+
+TODO
+
+### GraphQL/API requests which don't include the namespace in the URL
+
+TODO
+
+### The autocomplete suggestion functionality in the search bar which remembers recent issues/MRs
+
+TODO
+
+### Global search
+
+TODO
+
+## Administrator
+
+### Loads `/admin` page
+
+1. Router picks a random cell `Cell US0`
+1. Cell US0 redirects user to `/admin/cells/cellus0`
+1. Cell US0 renders an Admin Area page and also returns a cache header to cache `/admin/cellss/cellus0/* => Cell US0`. The Admin Area page contains a dropdown list showing other cells they could select and it changes the query parameter.
+
+Admin Area settings in Postgres are all shared across all cells to avoid
+divergence but we still make it clear in the URL and UI which cell is serving
+the Admin Area page as there is dynamic data being generated from these pages and
+the operator may want to view a specific cell.
+
+## More Technical Problems To Solve
+
+### Replicating User Sessions Between All Cells
+
+Today user sessions live in Redis but each cell will have their own Redis instance. We already use a dedicated Redis instance for sessions so we could consider sharing this with all cells like we do with `gitlab_users` PostgreSQL database. But an important consideration will be latency as we would still want to mostly fetch sessions from the same region.
+
+An alternative might be that user sessions get moved to a JWT payload that encodes all the session data but this has downsides. For example, it is difficult to expire a user session, when their password changes or for other reasons, if the session lives in a JWT controlled by the user.
+
+### How do we migrate between Cells
+
+Migrating data between cells will need to factor all data stores:
+
+1. PostgreSQL
+1. Redis Shared State
+1. Gitaly
+1. Elasticsearch
+
+### Is it still possible to leak the existence of private groups via a timing attack?
+
+If you have router in EU, and you know that EU router by default redirects
+to EU located Cells, you know their latency (lets assume 10 ms). Now, if your
+request is bounced back and redirected to US which has different latency
+(lets assume that roundtrip will be around 60 ms) you can deduce that 404 was
+returned by US Cell and know that your 404 is in fact 403.
+
+We may defer this until we actually implement a cell in a different region. Such timing attacks are already theoretically possible with the way we do permission checks today but the timing difference is probably too small to be able to detect.
+
+One technique to mitigate this risk might be to have the router add a random
+delay to any request that returns 404 from a cell.
+
+## Should runners be shared across all cells?
+
+We have 2 options and we should decide which is easier:
+
+1. Decompose runner registration and queuing tables and share them across all
+ cells. This may have implications for scalability, and we'd need to consider
+ if this would include group/project runners as this may have scalability
+ concerns as these are high traffic tables that would need to be shared.
+1. Runners are registered per-cell and, we probably have a separate fleet of
+ runners for every cell or just register the same runners to many cells which
+ may have implications for queueing
+
+## How do we guarantee unique ids across all cells for things that cannot conflict?
+
+This project assumes at least namespaces and projects have unique ids across
+all cells as many requests need to be routed based on their ID. Since those
+tables are across different databases then guaranteeing a unique ID will
+require a new solution. There are likely other tables where unique IDs are
+necessary and depending on how we resolve routing for GraphQL and other APIs
+and other design goals it may be determined that we want the primary key to be
+unique for all tables.
diff --git a/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md b/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md
new file mode 100644
index 00000000000..aadc08016e3
--- /dev/null
+++ b/doc/architecture/blueprints/cells/proposal-stateless-router-with-routes-learning.md
@@ -0,0 +1,673 @@
+---
+stage: enablement
+group: Tenant Scale
+description: 'Cells Stateless Router Proposal'
+---
+
+<!-- vale gitlab.FutureTense = NO -->
+
+This document is a work-in-progress and represents a very early state of the
+Cells design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Cells, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Proposal: Stateless Router
+
+We will decompose `gitlab_users`, `gitlab_routes` and `gitlab_admin` related
+tables so that they can be shared between all cells and allow any cell to
+authenticate a user and route requests to the correct cell. Cells may receive
+requests for the resources they don't own, but they know how to redirect back
+to the correct cell.
+
+The router is stateless and does not read from the `routes` database which
+means that all interactions with the database still happen from the Rails
+monolith. This architecture also supports regions by allowing for low traffic
+databases to be replicated across regions.
+
+Users are not directly exposed to the concept of Cells but instead they see
+different data dependent on their chosen "organization".
+[Organizations](glossary.md#organizations) will be a new model introduced to enforce isolation in the
+application and allow us to decide which request route to which cell, since an
+organization can only be on a single cell.
+
+## Differences
+
+The main difference between this proposal and one [with buffering requests](proposal-stateless-router-with-buffering-requests.md)
+is that this proposal uses a pre-flight API request (`/api/v4/cells/learn`) to redirect the request body to the correct Cell.
+This means that each request is sent exactly once to be processed, but the URI is used to decode which Cell it should be directed.
+
+## Summary in diagrams
+
+This shows how a user request routes via DNS to the nearest router and the router chooses a cell to send the request to.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ cell_us0{Cell US0};
+ cell_us1{Cell US1};
+ cell_eu0{Cell EU0};
+ cell_eu1{Cell EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->cell_eu0;
+ router_eu-->cell_eu1;
+ end
+ subgraph United States
+ router_us-->cell_us0;
+ router_us-->cell_us1;
+ end
+```
+
+### More detail
+
+This shows that the router can actually send requests to any cell. The user will
+get the closest router to them geographically.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ cell_us0{Cell US0};
+ cell_us1{Cell US1};
+ cell_eu0{Cell EU0};
+ cell_eu1{Cell EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->cell_eu0;
+ router_eu-->cell_eu1;
+ end
+ subgraph United States
+ router_us-->cell_us0;
+ router_us-->cell_us1;
+ end
+ router_eu-.->cell_us0;
+ router_eu-.->cell_us1;
+ router_us-.->cell_eu0;
+ router_us-.->cell_eu1;
+```
+
+### Even more detail
+
+This shows the databases. `gitlab_users` and `gitlab_routes` exist only in the
+US region but are replicated to other regions. Replication does not have an
+arrow because it's too hard to read the diagram.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ cell_us0{Cell US0};
+ cell_us1{Cell US1};
+ cell_eu0{Cell EU0};
+ cell_eu1{Cell EU1};
+ db_gitlab_users[(gitlab_users Primary)];
+ db_gitlab_routes[(gitlab_routes Primary)];
+ db_gitlab_users_replica[(gitlab_users Replica)];
+ db_gitlab_routes_replica[(gitlab_routes Replica)];
+ db_cell_us0[(gitlab_main/gitlab_ci Cell US0)];
+ db_cell_us1[(gitlab_main/gitlab_ci Cell US1)];
+ db_cell_eu0[(gitlab_main/gitlab_ci Cell EU0)];
+ db_cell_eu1[(gitlab_main/gitlab_ci Cell EU1)];
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->cell_eu0;
+ router_eu-->cell_eu1;
+ cell_eu0-->db_cell_eu0;
+ cell_eu0-->db_gitlab_users_replica;
+ cell_eu0-->db_gitlab_routes_replica;
+ cell_eu1-->db_gitlab_users_replica;
+ cell_eu1-->db_gitlab_routes_replica;
+ cell_eu1-->db_cell_eu1;
+ end
+ subgraph United States
+ router_us-->cell_us0;
+ router_us-->cell_us1;
+ cell_us0-->db_cell_us0;
+ cell_us0-->db_gitlab_users;
+ cell_us0-->db_gitlab_routes;
+ cell_us1-->db_gitlab_users;
+ cell_us1-->db_gitlab_routes;
+ cell_us1-->db_cell_us1;
+ end
+ router_eu-.->cell_us0;
+ router_eu-.->cell_us1;
+ router_us-.->cell_eu0;
+ router_us-.->cell_eu1;
+```
+
+## Summary of changes
+
+1. Tables related to User data (including profile settings, authentication credentials, personal access tokens) are decomposed into a `gitlab_users` schema
+1. The `routes` table is decomposed into `gitlab_routes` schema
+1. The `application_settings` (and probably a few other instance level tables) are decomposed into `gitlab_admin` schema
+1. A new column `routes.cell_id` is added to `routes` table
+1. A new Router service exists to choose which cell to route a request to.
+1. If a router receives a new request it will send `/api/v4/cells/learn?method=GET&path_info=/group-org/project` to learn which Cell can process it
+1. A new concept will be introduced in GitLab called an organization
+1. We require all existing endpoints to be routable by URI, or be fixed to a specific Cell for processing. This requires changing ambiguous endpoints like `/dashboard` to be scoped like `/organizations/my-organization/-/dashboard`
+1. Endpoints like `/admin` would be routed always to the specific Cell, like `cell_0`
+1. Each Cell can respond to `/api/v4/cells/learn` and classify each endpoint
+1. Writes to `gitlab_users` and `gitlab_routes` are sent to a primary PostgreSQL server in our `US` region but reads can come from replicas in the same region. This will add latency for these writes but we expect they are infrequent relative to the rest of GitLab.
+
+## Pre-flight request learning
+
+While processing a request the URI will be decoded and a pre-flight request
+will be sent for each non-cached endpoint.
+
+When asking for the endpoint GitLab Rails will return information about
+the routable path. GitLab Rails will decode `path_info` and match it to
+an existing endpoint and find a routable entity (like project). The router will
+treat this as short-lived cache information.
+
+1. Prefix match: `/api/v4/cells/learn?method=GET&path_info=/gitlab-org/gitlab-test/-/issues`
+
+ ```json
+ {
+ "path": "/gitlab-org/gitlab-test",
+ "cell": "cell_0",
+ "source": "routable"
+ }
+ ```
+
+1. Some endpoints might require an exact match: `/api/v4/cells/learn?method=GET&path_info=/-/profile`
+
+ ```json
+ {
+ "path": "/-/profile",
+ "cell": "cell_0",
+ "source": "fixed",
+ "exact": true
+ }
+ ```
+
+## Detailed explanation of default organization in the first iteration
+
+All users will get a new column `users.default_organization` which they can
+control in user settings. We will introduce a concept of the
+`GitLab.com Public` organization. This will be set as the default organization for all existing
+users. This organization will allow the user to see data from all namespaces in
+`Cell US0` (ie. our original GitLab.com instance). This behavior can be invisible to
+existing users such that they don't even get told when they are viewing a
+global page like `/dashboard` that it's even scoped to an organization.
+
+Any new users with a default organization other than `GitLab.com Public` will have
+a distinct user experience and will be fully aware that every page they load is
+only ever scoped to a single organization. These users can never
+load any global pages like `/dashboard` and will end up being redirected to
+`/organizations/<DEFAULT_ORGANIZATION>/-/dashboard`. This may also be the case
+for legacy APIs and such users may only ever be able to use APIs scoped to a
+organization.
+
+## Detailed explanation of Admin Area settings
+
+We believe that maintaining and synchronizing Admin Area settings will be
+frustrating and painful so to avoid this we will decompose and share all Admin Area
+settings in the `gitlab_admin` schema. This should be safe (similar to other
+shared schemas) because these receive very little write traffic.
+
+In cases where different cells need different settings (eg. the
+Elasticsearch URL), we will either decide to use a templated
+format in the relevant `application_settings` row which allows it to be dynamic
+per cell. Alternatively if that proves difficult we'll introduce a new table
+called `per_cell_application_settings` and this will have 1 row per cell to allow
+setting different settings per cell. It will still be part of the `gitlab_admin`
+schema and shared which will allow us to centrally manage it and simplify
+keeping settings in sync for all cells.
+
+## Pros
+
+1. Router is stateless and can live in many regions. We use Anycast DNS to resolve to nearest region for the user.
+1. Cells can receive requests for namespaces in the wrong cell and the user
+ still gets the right response as well as caching at the router that
+ ensures the next request is sent to the correct cell so the next request
+ will go to the correct cell
+1. The majority of the code still lives in `gitlab` rails codebase. The Router doesn't actually need to understand how GitLab URLs are composed.
+1. Since the responsibility to read and write `gitlab_users`,
+ `gitlab_routes` and `gitlab_admin` still lives in Rails it means minimal
+ changes will be needed to the Rails application compared to extracting
+ services that need to isolate the domain models and build new interfaces.
+1. Compared to a separate routing service this allows the Rails application
+ to encode more complex rules around how to map URLs to the correct cell
+ and may work for some existing API endpoints.
+1. All the new infrastructure (just a router) is optional and a single-cell
+ self-managed installation does not even need to run the Router and there are
+ no other new services.
+
+## Cons
+
+1. `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases may need to be
+ replicated across regions and writes need to go across regions. We need to
+ do an analysis on write TPS for the relevant tables to determine if this is
+ feasible.
+1. Sharing access to the database from many different Cells means that they are
+ all coupled at the Postgres schema level and this means changes to the
+ database schema need to be done carefully in sync with the deployment of all
+ Cells. This limits us to ensure that Cells are kept in closely similar
+ versions compared to an architecture with shared services that have an API
+ we control.
+1. Although most data is stored in the right region there can be requests
+ proxied from another region which may be an issue for certain types
+ of compliance.
+1. Data in `gitlab_users` and `gitlab_routes` databases must be replicated in
+ all regions which may be an issue for certain types of compliance.
+1. The router cache may need to be very large if we get a wide variety of URLs
+ (ie. long tail). In such a case we may need to implement a 2nd level of
+ caching in user cookies so their frequently accessed pages always go to the
+ right cell the first time.
+1. Having shared database access for `gitlab_users` and `gitlab_routes`
+ from multiple cells is an unusual architecture decision compared to
+ extracting services that are called from multiple cells.
+1. It is very likely we won't be able to find cacheable elements of a
+ GraphQL URL and often existing GraphQL endpoints are heavily dependent on
+ ids that won't be in the `routes` table so cells won't necessarily know
+ what cell has the data. As such we'll probably have to update our GraphQL
+ calls to include an organization context in the path like
+ `/api/organizations/<organization>/graphql`.
+1. This architecture implies that implemented endpoints can only access data
+ that are readily accessible on a given Cell, but are unlikely
+ to aggregate information from many Cells.
+1. All unknown routes are sent to the latest deployment which we assume to be `Cell US0`.
+ This is required as newly added endpoints will be only decodable by latest cell.
+ Likely this is not a problem for the `/cells/learn` is it is lightweight
+ to process and this should not cause a performance impact.
+
+## Example database configuration
+
+Handling shared `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases, while having dedicated `gitlab_main` and `gitlab_ci` databases should already be handled by the way we use `config/database.yml`. We should also, already be able to handle the dedicated EU replicas while having a single US primary for `gitlab_users` and `gitlab_routes`. Below is a snippet of part of the database configuration for the Cell architecture described above.
+
+**Cell US0**:
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.cell-us0.primary.consul
+ load_balancing:
+ discovery: postgres-main.cell-us0.replicas.consul
+ ci:
+ host: postgres-ci.cell-us0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.cell-us0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.us.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.us.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.us.consul
+```
+
+**Cell EU0**:
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.cell-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-main.cell-eu0.replicas.consul
+ ci:
+ host: postgres-ci.cell-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.cell-eu0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.eu.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.eu.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.eu.consul
+```
+
+## Request flows
+
+1. `gitlab-org` is a top level namespace and lives in `Cell US0` in the `GitLab.com Public` organization
+1. `my-company` is a top level namespace and lives in `Cell EU0` in the `my-organization` organization
+
+### Experience for paying user that is part of `my-organization`
+
+Such a user will have a default organization set to `/my-organization` and will be
+unable to load any global routes outside of this organization. They may load other
+projects/namespaces but their MR/Todo/Issue counts at the top of the page will
+not be correctly populated in the first iteration. The user will be aware of
+this limitation.
+
+#### Navigates to `/my-company/my-project` while logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. They request `/my-company/my-project` without the router cache, so the router chooses randomly `Cell EU1`
+1. The `/cells/learn` is sent to `Cell EU1`, which responds that resource lives on `Cell EU0`
+1. `Cell EU0` returns the correct response
+1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Cell EU0`
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_eu1 as Cell EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu1: /api/v4/cells/learn?method=GET&path_info=/my-company/my-project
+ cell_eu1->>router_eu: {path: "/my-company", cell: "cell_eu0", source: "routable"}
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: <h1>My Project...
+```
+
+#### Navigates to `/my-company/my-project` while not logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router does not have `/my-company/*` cached yet so it chooses randomly `Cell EU1`
+1. The `/cells/learn` is sent to `Cell EU1`, which responds that resource lives on `Cell EU0`
+1. `Cell EU0` redirects them through a login flow
+1. User requests `/users/sign_in`, uses random Cell to run `/cells/learn`
+1. The `Cell EU1` responds with `cell_0` as a fixed route
+1. User after login requests `/my-company/my-project` which is cached and stored in `Cell EU0`
+1. `Cell EU0` returns the correct response
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_eu1 as Cell EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu1: /api/v4/cells/learn?method=GET&path_info=/my-company/my-project
+ cell_eu1->>router_eu: {path: "/my-company", cell: "cell_eu0", source: "routable"}
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: 302 /users/sign_in?redirect=/my-company/my-project
+ user->>router_eu: GET /users/sign_in?redirect=/my-company/my-project
+ router_eu->>cell_eu1: /api/v4/cells/learn?method=GET&path_info=/users/sign_in
+ cell_eu1->>router_eu: {path: "/users", cell: "cell_eu0", source: "fixed"}
+ router_eu->>cell_eu0: GET /users/sign_in?redirect=/my-company/my-project
+ cell_eu0-->>user: <h1>Sign in...
+ user->>router_eu: POST /users/sign_in?redirect=/my-company/my-project
+ router_eu->>cell_eu0: POST /users/sign_in?redirect=/my-company/my-project
+ cell_eu0->>user: 302 /my-company/my-project
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu0: GET /my-company/my-project
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: <h1>My Project...
+```
+
+#### Navigates to `/my-company/my-other-project` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router cache now has `/my-company/* => Cell EU0`, so the router chooses `Cell EU0`
+1. `Cell EU0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_eu1 as Cell EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>cell_eu0: GET /my-company/my-project
+ cell_eu0->>user: <h1>My Project...
+```
+
+#### Navigates to `/gitlab-org/gitlab` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router has no cached value for this URL so randomly chooses `Cell EU0`
+1. `Cell EU0` redirects the router to `Cell US0`
+1. `Cell US0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ participant cell_us0 as Cell US0
+ user->>router_eu: GET /gitlab-org/gitlab
+ router_eu->>cell_eu0: /api/v4/cells/learn?method=GET&path_info=/gitlab-org/gitlab
+ cell_eu0->>router_eu: {path: "/gitlab-org", cell: "cell_us0", source: "routable"}
+ router_eu->>cell_us0: GET /gitlab-org/gitlab
+ cell_us0->>user: <h1>GitLab.org...
+```
+
+In this case the user is not on their "default organization" so their TODO
+counter will not include their normal todos. We may choose to highlight this in
+the UI somewhere. A future iteration may be able to fetch that for them from
+their default organization.
+
+#### Navigates to `/`
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. Router does not have a cache for `/` route (specifically rails never tells it to cache this route)
+1. The Router choose `Cell EU0` randomly
+1. The Rails application knows the users default organization is `/my-organization`, so
+ it redirects the user to `/organizations/my-organization/-/dashboard`
+1. The Router has a cached value for `/organizations/my-organization/*` so it then sends the
+ request to `POD EU0`
+1. `Cell EU0` serves up a new page `/organizations/my-organization/-/dashboard` which is the same
+ dashboard view we have today but scoped to an organization clearly in the UI
+1. The user is (optionally) presented with a message saying that data on this page is only
+ from their default organization and that they can change their default
+ organization if it's not right.
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_eu0 as Cell EU0
+ user->>router_eu: GET /
+ router_eu->>cell_eu0: GET /
+ cell_eu0->>user: 302 /organizations/my-organization/-/dashboard
+ user->>router: GET /organizations/my-organization/-/dashboard
+ router->>cell_eu0: GET /organizations/my-organization/-/dashboard
+ cell_eu0->>user: <h1>My Company Dashboard... X-Gitlab-Cell-Cache={path_prefix:/organizations/my-organization/}
+```
+
+#### Navigates to `/dashboard`
+
+As above, they will end up on `/organizations/my-organization/-/dashboard` as
+the rails application will already redirect `/` to the dashboard page.
+
+### Navigates to `/not-my-company/not-my-project` while logged in (but they don't have access since this project/group is private)
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router knows that `/not-my-company` lives in `Cell US1` so sends the request to this
+1. The user does not have access so `Cell US1` returns 404
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant cell_us1 as Cell US1
+ user->>router_eu: GET /not-my-company/not-my-project
+ router_eu->>cell_us1: GET /not-my-company/not-my-project
+ cell_us1->>user: 404
+```
+
+#### Creates a new top level namespace
+
+The user will be asked which organization they want the namespace to belong to.
+If they select `my-organization` then it will end up on the same cell as all
+other namespaces in `my-organization`. If they select nothing we default to
+`GitLab.com Public` and it is clear to the user that this is isolated from
+their existing organization such that they won't be able to see data from both
+on a single page.
+
+### Experience for GitLab team member that is part of `/gitlab-org`
+
+Such a user is considered a legacy user and has their default organization set to
+`GitLab.com Public`. This is a "meta" organization that does not really exist but
+the Rails application knows to interpret this organization to mean that they are
+allowed to use legacy global functionality like `/dashboard` to see data across
+namespaces located on `Cell US0`. The rails backend also knows that the default cell to render any ambiguous
+routes like `/dashboard` is `Cell US0`. Lastly the user will be allowed to
+navigate to organizations on another cell like `/my-organization` but when they do the
+user will see a message indicating that some data may be missing (eg. the
+MRs/Issues/Todos) counts.
+
+#### Navigates to `/gitlab-org/gitlab` while not logged in
+
+1. User is in the US so DNS resolves to the US router
+1. The router knows that `/gitlab-org` lives in `Cell US0` so sends the request
+ to this cell
+1. `Cell US0` serves up the response
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant cell_us0 as Cell US0
+ user->>router_us: GET /gitlab-org/gitlab
+ router_us->>cell_us0: GET /gitlab-org/gitlab
+ cell_us0->>user: <h1>GitLab.org...
+```
+
+#### Navigates to `/`
+
+1. User is in US so DNS resolves to the router in US
+1. Router does not have a cache for `/` route (specifically rails never tells it to cache this route)
+1. The Router chooses `Cell US1` randomly
+1. The Rails application knows the users default organization is `GitLab.com Public`, so
+ it redirects the user to `/dashboards` (only legacy users can see
+ `/dashboard` global view)
+1. Router does not have a cache for `/dashboard` route (specifically rails never tells it to cache this route)
+1. The Router chooses `Cell US1` randomly
+1. The Rails application knows the users default organization is `GitLab.com Public`, so
+ it allows the user to load `/dashboards` (only legacy users can see
+ `/dashboard` global view) and redirects to router the legacy cell which is `Cell US0`
+1. `Cell US0` serves up the global view dashboard page `/dashboard` which is the same
+ dashboard view we have today
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant cell_us0 as Cell US0
+ participant cell_us1 as Cell US1
+ user->>router_us: GET /
+ router_us->>cell_us1: GET /
+ cell_us1->>user: 302 /dashboard
+ user->>router_us: GET /dashboard
+ router_us->>cell_us1: /api/v4/cells/learn?method=GET&path_info=/dashboard
+ cell_us1->>router_us: {path: "/dashboard", cell: "cell_us0", source: "routable"}
+ router_us->>cell_us0: GET /dashboard
+ cell_us0->>user: <h1>Dashboard...
+```
+
+#### Navigates to `/my-company/my-other-project` while logged in (but they don't have access since this project is private)
+
+They get a 404.
+
+### Experience for non-authenticated users
+
+Flow is similar to logged in users except global routes like `/dashboard` will
+redirect to the login page as there is no default organization to choose from.
+
+### A new customers signs up
+
+They will be asked if they are already part of an organization or if they'd
+like to create one. If they choose neither they end up no the default
+`GitLab.com Public` organization.
+
+### An organization is moved from 1 cell to another
+
+TODO
+
+### GraphQL/API requests which don't include the namespace in the URL
+
+TODO
+
+### The autocomplete suggestion functionality in the search bar which remembers recent issues/MRs
+
+TODO
+
+### Global search
+
+TODO
+
+## Administrator
+
+### Loads `/admin` page
+
+1. The `/admin` is locked to `Cell US0`
+1. Some endpoints of `/admin`, like Projects in Admin are scoped to a Cell
+ and users needs to choose the correct one in a dropdown, which results in endpoint
+ like `/admin/cells/cell_0/projects`.
+
+Admin Area settings in Postgres are all shared across all cells to avoid
+divergence but we still make it clear in the URL and UI which cell is serving
+the Admin Area page as there is dynamic data being generated from these pages and
+the operator may want to view a specific cell.
+
+## More Technical Problems To Solve
+
+### Replicating User Sessions Between All Cells
+
+Today user sessions live in Redis but each cell will have their own Redis instance. We already use a dedicated Redis instance for sessions so we could consider sharing this with all cells like we do with `gitlab_users` PostgreSQL database. But an important consideration will be latency as we would still want to mostly fetch sessions from the same region.
+
+An alternative might be that user sessions get moved to a JWT payload that encodes all the session data but this has downsides. For example, it is difficult to expire a user session, when their password changes or for other reasons, if the session lives in a JWT controlled by the user.
+
+### How do we migrate between Cells
+
+Migrating data between cells will need to factor all data stores:
+
+1. PostgreSQL
+1. Redis Shared State
+1. Gitaly
+1. Elasticsearch
+
+### Is it still possible to leak the existence of private groups via a timing attack?
+
+If you have router in EU, and you know that EU router by default redirects
+to EU located Cells, you know their latency (lets assume 10 ms). Now, if your
+request is bounced back and redirected to US which has different latency
+(lets assume that roundtrip will be around 60 ms) you can deduce that 404 was
+returned by US Cell and know that your 404 is in fact 403.
+
+We may defer this until we actually implement a cell in a different region. Such timing attacks are already theoretically possible with the way we do permission checks today but the timing difference is probably too small to be able to detect.
+
+One technique to mitigate this risk might be to have the router add a random
+delay to any request that returns 404 from a cell.
+
+## Should runners be shared across all cells?
+
+We have 2 options and we should decide which is easier:
+
+1. Decompose runner registration and queuing tables and share them across all
+ cells. This may have implications for scalability, and we'd need to consider
+ if this would include group/project runners as this may have scalability
+ concerns as these are high traffic tables that would need to be shared.
+1. Runners are registered per-cell and, we probably have a separate fleet of
+ runners for every cell or just register the same runners to many cells which
+ may have implications for queueing
+
+## How do we guarantee unique ids across all cells for things that cannot conflict?
+
+This project assumes at least namespaces and projects have unique ids across
+all cells as many requests need to be routed based on their ID. Since those
+tables are across different databases then guaranteeing a unique ID will
+require a new solution. There are likely other tables where unique IDs are
+necessary and depending on how we resolve routing for GraphQL and other APIs
+and other design goals it may be determined that we want the primary key to be
+unique for all tables.