author | GitLab Bot <gitlab-bot@gitlab.com> | 2023-05-17 19:05:49 +0300 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2023-05-17 19:05:49 +0300 |
commit | 43a25d93ebdabea52f99b05e15b06250cd8f07d7 (patch) | |
tree | dceebdc68925362117480a5d672bcff122fb625b /doc/architecture/blueprints/pods/pods-feature-ci-runners.md | |
parent | 20c84b99005abd1c82101dfeff264ac50d2df211 (diff) |
Add latest changes from gitlab-org/gitlab@16-0-stable-ee v16.0.0-rc42
Diffstat (limited to 'doc/architecture/blueprints/pods/pods-feature-ci-runners.md')
-rw-r--r-- | doc/architecture/blueprints/pods/pods-feature-ci-runners.md | 172 |
1 files changed, 7 insertions, 165 deletions
diff --git a/doc/architecture/blueprints/pods/pods-feature-ci-runners.md b/doc/architecture/blueprints/pods/pods-feature-ci-runners.md
index b75515a916f..1985bb21884 100644
--- a/doc/architecture/blueprints/pods/pods-feature-ci-runners.md
+++ b/doc/architecture/blueprints/pods/pods-feature-ci-runners.md
@@ -1,169 +1,11 @@
 ---
-stage: enablement
-group: pods
-comments: false
-description: 'Pods: CI Runners'
+redirect_to: '../cells/cells-feature-ci-runners.md'
+remove_date: '2023-06-13'
 ---
 
-This document is a work-in-progress and represents a very early state of the
-Pods design. Significant aspects are not documented, though we expect to add
-them in the future. This is one possible architecture for Pods, and we intend to
-contrast this with alternatives before deciding which approach to implement.
-This documentation will be kept even if we decide not to implement this so that
-we can document the reasons for not choosing this approach.
+This document was moved to [another location](../cells/cells-feature-ci-runners.md).
 
-# Pods: CI Runners
-
-GitLab in order to execute CI jobs [GitLab Runner](https://gitlab.com/gitlab-org/gitlab-runner/),
-very often managed by customer in their infrastructure.
-
-All CI jobs created as part of CI pipeline are run in a context of project
-it poses a challenge how to manage GitLab Runners.
-
-## 1. Definition
-
-There are 3 different types of runners:
-
-- instance-wide: runners that are registered globally with specific tags (selection criteria)
-- group runners: runners that execute jobs from a given top-level group or subprojects of that group
-- project runners: runners that execute jobs from projects or many projects: some runners might
-  have projects assigned from projects in different top-level groups.
-
-This alongside with existing data structure where `ci_runners` is a table describing
-all types of runners poses a challenge how the `ci_runners` should be managed in a Pods environment.
-
-## 2. Data flow
-
-GitLab Runners use a set of globally scoped endpoints to:
-
-- registration of a new runner via registration token `https://gitlab.com/api/v4/runners`
-  ([subject for removal](../runner_tokens/index.md)) (`registration token`)
-- requests jobs via an authenticated `https://gitlab.com/api/v4/jobs/request` endpoint (`runner token`)
-- upload job status via `https://gitlab.com/api/v4/jobs/:job_id` (`build token`)
-- upload trace via `https://gitlab.com/api/v4/jobs/:job_id/trace` (`build token`)
-- download and upload artifacts via `https://gitlab.com/api/v4/jobs/:job_id/artifacts` (`build token`)
-
-Currently three types of authentication tokens are used:
-
-- runner registration token ([subject for removal](../runner_tokens/index.md))
-- runner token representing an registered runner in a system with specific configuration (`tags`, `locked`, etc.)
-- build token representing an ephemeral token giving a limited access to updating a specific
-  job, uploading artifacts, downloading dependent artifacts, downloading and uploading
-  container registry images
-
-Each of those endpoints do receive an authentication token via header (`JOB-TOKEN` for `/trace`)
-or body parameter (`token` all other endpoints).
-
-Since the CI pipeline would be created in a context of a specific Pod it would be required
-that pick of a build would have to be processed by that particular Pod. This requires
-that build picking depending on a solution would have to be either:
-
-- routed to correct Pod for a first time
-- be made to be two phase: request build from global pool, claim build on a specific Pod using a Pod specific URL
-
-## 3. Proposal
-
-This section describes various proposals. Reader should consider that those
-proposals do describe solutions for different problems. Many or some aspects
-of those proposals might be the solution to the stated problem.
-
-### 3.1. Authentication tokens
-
-Even though the paths for CI Runners are not routable they can be made routable with
-those two possible solutions:
-
-- The `https://gitlab.com/api/v4/jobs/request` uses a long polling mechanism with
-  a ticketing mechanism (based on `X-GitLab-Last-Update` header). Runner when first
-  starts sends a request to GitLab to which GitLab responds with either a build to pick
-  by runner. This value is completely controlled by GitLab. This allows GitLab
-  to use JWT or any other means to encode `pod` identifier that could be easily
-  decodable by Router.
-- The majority of communication (in terms of volume) is using `build token` making it
-  the easiest target to change since GitLab is sole owner of the token that Runner later
-  uses for specific job. There were prior discussions about not storing `build token`
-  but rather using `JWT` token with defined scopes. Such token could encode the `pod`
-  to which router could easily route all requests.
-
-### 3.2. Request body
-
-- The most of used endpoints pass authentication token in request body. It might be desired
-  to use HTTP Headers as an easier way to access this information by Router without
-  a need to proxy requests.
-
-### 3.3. Instance-wide are Pod local
-
-We can pick a design where all runners are always registered and local to a given Pod:
-
-- Each Pod has it's own set of instance-wide runners that are updated at it's own pace
-- The project runners can only be linked to projects from the same organization
-  creating strong isolation.
-- In this model the `ci_runners` table is local to the Pod.
-- In this model we would require the above endpoints to be scoped to a Pod in some way
-  or made routable. It might be via prefixing them, adding additional Pod parameter,
-  or providing much more robust way to decode runner token and match it to Pod.
-- If routable token is used, we could move away from cryptographic random stored in
-  database to rather prefer to use JWT tokens that would encode
-- The Admin Area showing registered Runners would have to be scoped to a Pod
-
-This model might be desired since it provides strong isolation guarantees.
-This model does significantly increase maintenance overhead since each Pod is managed
-separately.
-
-This model may require adjustments to runner tags feature so that projects have consistent runner experience across pods.
-
-### 3.4. Instance-wide are cluster-wide
-
-Contrary to proposal where all runners are Pod local, we can consider that runners
-are global, or just instance-wide runners are global.
-
-However, this requires significant overhaul of system and to change the following aspects:
-
-- `ci_runners` table would likely have to be split decomposed into `ci_instance_runners`, ...
-- all interfaces would have to be adopted to use correct table
-- build queuing would have to be reworked to be two phase where each Pod would know of all pending
-  and running builds, but the actual claim of a build would happen against a Pod containing data
-- likely `ci_pending_builds` and `ci_running_builds` would have to be made `cluster-wide` tables
-  increasing likelihood of creating hotspots in a system related to CI queueing
-
-This model makes it complex to implement from engineering side. Does make some data being shared
-between Pods. Creates hotspots / scalability issues in a system (ex. during abuse) that
-might impact experience of organizations on other Pods.
-
-### 3.5. GitLab CI Daemon
-
-Another potential solution to explore is to have a dedicated service responsible for builds queueing
-owning it's database and working in a model of either sharded or podded service. There were prior
-discussions about [CI/CD Daemon](https://gitlab.com/gitlab-org/gitlab/-/issues/19435).
-
-If the service would be sharded:
-
-- depending on a model if runners are cluster-wide or pod-local this service would have to fetch
-  data from all Pods
-- if the sharded service would be used we could adapt a model of either sharing database containing
-  `ci_pending_builds/ci_running_builds` with the service
-- if the sharded service would be used we could consider a push model where each Pod pushes to CI/CD Daemon
-  builds that should be picked by Runner
-- the sharded service would be aware which Pod is responsible for processing the given build and could
-  route processing requests to designated Pod
-
-If the service would be podded:
-
-- all expectations of routable endpoints are still valid
-
-In general usage of CI Daemon does not help significantly with the stated problem. However, this offers
-a few upsides related to more efficient processing and decoupling model: push model and it opens a way
-to offer stateful communication with GitLab Runners (ex. gRPC or Websockets).
-
-## 4. Evaluation
-
-Considering all solutions it appears that solution giving the most promise is:
-
-- use "instance-wide are Pod local"
-- refine endpoints to have routable identities (either via specific paths, or better tokens)
-
-Other potential upsides is to get rid of `ci_builds.token` and rather use a `JWT token`
-that can much better and easier encode wider set of scopes allowed by CI runner.
-
-## 4.1. Pros
-
-## 4.2. Cons
+<!-- This redirect file can be deleted after <2023-06-13>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->