gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/development/agent')
-rw-r--r--doc/development/agent/gitops.md148
-rw-r--r--doc/development/agent/identity.md94
-rw-r--r--doc/development/agent/index.md87
-rw-r--r--doc/development/agent/local.md58
-rw-r--r--doc/development/agent/routing.md223
-rw-r--r--doc/development/agent/user_stories.md77
6 files changed, 687 insertions, 0 deletions
diff --git a/doc/development/agent/gitops.md b/doc/development/agent/gitops.md
new file mode 100644
index 00000000000..8c8586326fa
--- /dev/null
+++ b/doc/development/agent/gitops.md
@@ -0,0 +1,148 @@
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# GitOps with the Kubernetes Agent **(PREMIUM ONLY)**
+
+The [GitLab Kubernetes Agent](../../user/clusters/agent/index.md) supports the
+[pull-based version](https://www.gitops.tech/#pull-based-deployments) of
+[GitOps](https://www.gitops.tech/). To be useful, the feature must be able to perform these tasks:
+
+- Connect one or more Kubernetes clusters to a GitLab project or group.
+- Synchronize cluster-wide state from a Git repository.
+- Synchronize namespace-scoped state from a Git repository.
+- Control the following settings:
+
+ - The kinds of objects an agent can manage.
+ - Enabling the namespaced mode of operation for managing objects only in a specific namespace.
+ - Enabling the non-namespaced mode of operation for managing objects in any namespace, and
+ managing non-namespaced objects.
+
+- Synchronize state from one or more Git repositories into a cluster.
+- Configure multiple agents running in different clusters to synchronize state
+ from the same repository.
+
+## GitOps architecture
+
+In this architecture, the agent running in the Kubernetes cluster (`agentk`) periodically fetches
+configuration from the GitLab Kubernetes Agent Server (`kas`), spawning a goroutine for each configured GitOps
+repository. Each goroutine makes a streaming `GetObjectsToSynchronize()` gRPC call.
+`kas` accepts these requests, then checks if this agent is authorized to access
+this GitLab repository. If authorized, `kas` polls Gitaly for repository updates
+and sends the latest manifests to the agent.
+
+Before each poll, `kas` verifies with GitLab that the agent's token is still valid.
+When `agentk` receives an updated manifest, it performs a synchronization using
+[`gitops-engine`](https://github.com/argoproj/gitops-engine).
+
+If a repository is removed from the list, `agentk` stops the `GetObjectsToSynchronize()`
+calls to that repository.
+
+```mermaid
+graph TB
+ agentk -- fetch configuration --> kas
+ agentk -- fetch GitOps manifests --> kas
+
+ subgraph "GitLab"
+ kas[kas]
+ GitLabRoR[GitLab RoR]
+ Gitaly[Gitaly]
+ kas -- poll GitOps repositories --> Gitaly
+ kas -- authZ for agentk --> GitLabRoR
+ kas -- fetch configuration --> Gitaly
+ end
+
+ subgraph "Kubernetes cluster"
+ agentk[agentk]
+ end
+```
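+
+The per-repository goroutine lifecycle described above can be sketched in Go. This is a
+minimal sketch, not `agentk`'s code: `watchRepo` is a hypothetical stand-in for the
+streaming `GetObjectsToSynchronize()` call, and `syncer` is an illustrative name. Each
+configuration update starts goroutines for newly listed repositories and cancels the
+goroutines of repositories removed from the list.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"sort"
+	"sync"
+)
+
+// syncer keeps one goroutine per configured GitOps repository.
+type syncer struct {
+	mu      sync.Mutex
+	running map[string]context.CancelFunc
+	wg      sync.WaitGroup
+}
+
+func newSyncer() *syncer {
+	return &syncer{running: make(map[string]context.CancelFunc)}
+}
+
+// watchRepo is a hypothetical stand-in for the streaming
+// GetObjectsToSynchronize() call. A real agent would receive manifests
+// here and apply them; this stub only blocks until it is cancelled.
+func watchRepo(ctx context.Context, repo string) {
+	<-ctx.Done()
+}
+
+// apply reconciles the running goroutines with the latest configuration.
+func (s *syncer) apply(ctx context.Context, repos []string) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	wanted := make(map[string]bool, len(repos))
+	for _, repo := range repos {
+		wanted[repo] = true
+		if _, ok := s.running[repo]; ok {
+			continue // already synchronizing this repository
+		}
+		repoCtx, cancel := context.WithCancel(ctx)
+		s.running[repo] = cancel
+		s.wg.Add(1)
+		go func(repo string) {
+			defer s.wg.Done()
+			watchRepo(repoCtx, repo)
+		}(repo)
+	}
+	// Stop the goroutines of repositories removed from the list.
+	for repo, cancel := range s.running {
+		if !wanted[repo] {
+			cancel()
+			delete(s.running, repo)
+		}
+	}
+}
+
+// active returns the repositories currently being synchronized, sorted.
+func (s *syncer) active() []string {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	out := make([]string, 0, len(s.running))
+	for repo := range s.running {
+		out = append(out, repo)
+	}
+	sort.Strings(out)
+	return out
+}
+
+func main() {
+	ctx, cancel := context.WithCancel(context.Background())
+	s := newSyncer()
+	s.apply(ctx, []string{"group/app", "group/infra"})
+	s.apply(ctx, []string{"group/infra"}) // group/app was removed
+	fmt.Println(s.active())              // [group/infra]
+	cancel()                              // stops every remaining goroutine
+	s.wg.Wait()
+}
+```
+
+Cancelling a per-repository `context.Context` is one idiomatic way to end a single stream,
+while cancelling the parent context shuts all of them down at once.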
+
+## Architecture considered but not implemented
+
+As part of the implementation process, this architecture was considered, but ultimately
+not implemented.
+
+In this architecture, `agentk` periodically fetches configuration from `kas`. For each
+configured GitOps repository, it spawns a goroutine. Each goroutine then spawns a
+copy of [`git-sync`](https://github.com/kubernetes/git-sync). Each `git-sync` copy polls a particular
+repository and invokes a corresponding webhook on `agentk` when it changes. When that
+happens, `agentk` performs a synchronization using
+[`gitops-engine`](https://github.com/argoproj/gitops-engine).
+
+For repositories no longer in the list, `agentk` stops corresponding goroutines
+and `git-sync` copies, also deleting their cloned repositories from disk:
+
+```mermaid
+graph TB
+ agentk -- fetch configuration --> kas
+ git-sync -- poll GitOps repositories --> GitLabRoR
+
+ subgraph "GitLab"
+ kas[kas]
+ GitLabRoR[GitLab RoR]
+ kas -- authZ for agentk --> GitLabRoR
+ kas -- fetch configuration --> Gitaly[Gitaly]
+ end
+
+ subgraph "Kubernetes cluster"
+ agentk[agentk]
+ git-sync[git-sync]
+ agentk -- control --> git-sync
+ git-sync -- notify about changes --> agentk
+ end
+```
+
+## Comparing implemented and non-implemented architectures
+
+Both architectures attempt to answer the same question: how to grant an agent
+access to a non-public repository?
+
+In the **implemented** architecture:
+
+- Favorable: Fewer moving parts, as `git-sync` and `git` are not used, making this
+ design more reliable.
+- Favorable: Uses existing connectivity and authentication mechanisms (gRPC + `agentk` token).
+- Favorable: No polling through external infrastructure. Saves traffic and avoids
+ noise in access logs.
+
+In the **unimplemented** architecture:
+
+- Favorable: `agentk` uses `git-sync` to access repositories with standard protocols
+ (either HTTPS, or SSH and Git) with accepted authentication and authorization methods.
+
+ - Unfavorable: The user must put credentials into a `secret`. GitLab doesn't have
+ a mechanism for per-repository tokens for robots.
+ - Unfavorable: Rotating all credentials is more work than rotating a single `agentk` token.
+
+- Unfavorable: A dependency on an external component (`git-sync`) that can be avoided.
+- Unfavorable: More network traffic and connections than the implemented design.
+
+### Ideas considered for the unimplemented design
+
+As part of the design process, these ideas were considered and discarded:
+
+- Running `git-sync` and `gitops-engine` as part of `kas`.
+
+ - Favorable: More code and infrastructure under our control for GitLab.com.
+ - Unfavorable: Running an arbitrary number of `git-sync` processes would require
+ an unbounded amount of RAM and disk space.
+ - Unfavorable: Unclear which `kas` replica is responsible for which agent and
+ repository synchronization. If done as part of `agentk`, leader election can be
+ done using [client-go](https://pkg.go.dev/k8s.io/client-go/tools/leaderelection?tab=doc).
+
+- Running `git-sync` and a "`gitops-engine` driver" helper program as a separate
+ Kubernetes `Deployment`.
+
+ - Favorable: Better isolation and higher resiliency. For example, if the node
+ with `agentk` dies, not all synchronization stops.
+ - Favorable: Each deployment has its own memory and disk limits.
+ - Favorable: Per-repository synchronization identity (distinct `ServiceAccount`)
+ can be implemented.
+ - Unfavorable: Time-consuming to implement properly:
+
+ - Managing each `Deployment` requires CRUD (create, read, update, and delete) permissions.
+ - Users may want to customize a `Deployment`, or add and remove satellite objects
+ like `PodDisruptionBudget`, `HorizontalPodAutoscaler`, and `PodSecurityPolicy`.
+ - Each `Deployment` needs its own metrics, monitoring, and logs.
diff --git a/doc/development/agent/identity.md b/doc/development/agent/identity.md
new file mode 100644
index 00000000000..65de1a6f0c8
--- /dev/null
+++ b/doc/development/agent/identity.md
@@ -0,0 +1,94 @@
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# Kubernetes Agent identity and authentication **(PREMIUM ONLY)**
+
+This page uses the word `agent` to describe the concept of the
+GitLab Kubernetes Agent. The program that implements the concept is called `agentk`.
+Read the
+[architecture page](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/architecture.md)
+for more information.
+
+## Agent identity and name
+
+Each agent must have an immutable name that is unique within the project the
+agent is attached to. The name must follow the
+[DNS label standard from RFC 1123](https://tools.ietf.org/html/rfc1123).
+Specifically, the name must:
+
+- Contain at most 63 characters.
+- Contain only lowercase alphanumeric characters or `-`.
+- Start with an alphanumeric character.
+- End with an alphanumeric character.
+
+Kubernetes uses the
+[same naming restriction](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names)
+for some names.
+
+The regex for names is: `/\A[a-z0-9]([-a-z0-9]*[a-z0-9])?\z/`.
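+
+A minimal Go sketch of this check follows. It reuses the pattern above, which Go's RE2
+engine accepts because it supports the `\A` and `\z` anchors. `validAgentName` is a
+hypothetical helper, not part of the agent's code. The pattern alone does not enforce
+the 63-character limit, so the sketch checks the length separately.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+)
+
+// agentNameRE is the documented pattern. Go's RE2 syntax supports the
+// \A (start of text) and \z (end of text) anchors used here.
+var agentNameRE = regexp.MustCompile(`\A[a-z0-9]([-a-z0-9]*[a-z0-9])?\z`)
+
+// validAgentName is a hypothetical helper that checks both the pattern and
+// the 63-character limit, which the pattern alone does not enforce.
+func validAgentName(name string) bool {
+	return len(name) <= 63 && agentNameRE.MatchString(name)
+}
+
+func main() {
+	for _, name := range []string{"my-agent", "agent1", "-agent", "agent-", "My-Agent"} {
+		fmt.Printf("%-10s %v\n", name, validAgentName(name))
+	}
+}
+```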
+
+## Multiple agents in a cluster
+
+A Kubernetes cluster may have 0 or more agents running in it. Each agent likely
+has a different configuration. Some may enable features A and B, and some may
+enable features B and C. This flexibility enables different groups of people to
+use different features of the agent in the same cluster.
+
+For example, [Priyanka (Platform Engineer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#priyanka-platform-engineer)
+may want to use cluster-wide features of the agent, while
+[Sasha (Software Developer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sasha-software-developer)
+uses the agent that only has access to a particular namespace.
+
+Each agent likely runs as a
+[`ServiceAccount`](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/),
+a distinct Kubernetes identity, with a distinct set of permissions attached to it.
+These permissions enable the agent administrator to follow the
+[principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege)
+and minimize the permissions each particular agent needs.
+
+## Kubernetes Agent authentication
+
+When adding a new agent, GitLab provides the user with a bearer access token. The
+agent uses this token to authenticate with GitLab. This token is a random string
+and does not encode any information in it, but it is secret and must
+be treated with care. Store it as a `Secret` in Kubernetes.
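+
+As an illustration of an opaque token, this hypothetical Go helper draws random bytes
+from a cryptographically secure source and encodes them. The 32-byte length and the
+URL-safe base64 encoding are assumptions for the sketch, not GitLab's actual token format.
+
+```go
+package main
+
+import (
+	"crypto/rand"
+	"encoding/base64"
+	"fmt"
+)
+
+// newAgentToken returns an opaque bearer token made only of random bytes:
+// nothing about the agent is encoded in it. Length and encoding are
+// illustrative choices.
+func newAgentToken() (string, error) {
+	b := make([]byte, 32)
+	if _, err := rand.Read(b); err != nil {
+		return "", err
+	}
+	return base64.RawURLEncoding.EncodeToString(b), nil
+}
+
+func main() {
+	tok, err := newAgentToken()
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(len(tok)) // 43 characters for 32 random bytes
+}
+```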
+
+Each agent can have 0 or more tokens in a GitLab database. Having several valid
+tokens helps you rotate tokens without needing to re-register an agent. Each token
+record in the database has the following fields:
+
+- Agent identity it belongs to.
+- Token value. Encrypted at rest.
+- Creation time.
+- Who created it.
+- Revocation flag to mark token as revoked.
+- Revocation time.
+- Who revoked it.
+- A text field to store any comments the administrator may want to make about the token for future reference.
+
+Users with Maintainer or higher [permissions](../../user/permissions.md) can
+manage tokens.
+
+Tokens are immutable, except for the following fields:
+
+- Revocation flag. Can be set to `true` only once, and is immutable after that.
+- Revocation time. Set to the current time when the revocation flag is set, and immutable after that.
+- Comments field. Can be updated any number of times, including after the token has been revoked.
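+
+The following Go sketch models a token record and the update rules above. The type,
+field, and method names are hypothetical, not GitLab's database schema. `Revoke`
+succeeds only once, `SetComment` works at any time, and revoking also records who
+revoked the token, one of the fields listed earlier.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"time"
+)
+
+// agentToken mirrors the record fields listed above. Names and types are
+// hypothetical; in the real database the token value is encrypted at rest.
+type agentToken struct {
+	AgentID   int64
+	Value     string
+	CreatedAt time.Time
+	CreatedBy string
+	Revoked   bool
+	RevokedAt time.Time
+	RevokedBy string
+	Comment   string
+}
+
+var errAlreadyRevoked = errors.New("token already revoked")
+
+// Revoke sets the revocation flag and time exactly once; later calls fail
+// and leave the stored values untouched.
+func (t *agentToken) Revoke(by string, now time.Time) error {
+	if t.Revoked {
+		return errAlreadyRevoked
+	}
+	t.Revoked = true
+	t.RevokedAt = now
+	t.RevokedBy = by
+	return nil
+}
+
+// SetComment may be called any number of times, even after revocation.
+func (t *agentToken) SetComment(comment string) {
+	t.Comment = comment
+}
+
+func main() {
+	tok := &agentToken{AgentID: 1, CreatedAt: time.Now(), CreatedBy: "alice"}
+	fmt.Println(tok.Revoke("bob", time.Now()))   // <nil>
+	fmt.Println(tok.Revoke("carol", time.Now())) // token already revoked
+	tok.SetComment("rotated during scheduled maintenance")
+	fmt.Println(tok.RevokedBy, tok.Comment)
+}
+```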
+
+The agent sends its token, along with each request, to GitLab to authenticate itself.
+For each request, GitLab checks the token's validity:
+
+- Does the token exist in the database?
+- Has the token been revoked?
+
+This information may be cached for some time to reduce load on the database.
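+
+A minimal Go sketch of the validity check with a time-bounded cache follows. It assumes
+a hypothetical `lookup` function that queries the database; `tokenValidator` and its
+fields are also illustrative names. The clock is injectable so the expiry behavior is
+easy to test.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+	"time"
+)
+
+// tokenStatus holds the answers to both checks: does it exist, is it revoked?
+type tokenStatus struct {
+	exists  bool
+	revoked bool
+}
+
+type cachedStatus struct {
+	status  tokenStatus
+	expires time.Time
+}
+
+// tokenValidator caches lookup results for ttl to reduce database load.
+// lookup is a hypothetical stand-in for the database query.
+type tokenValidator struct {
+	mu     sync.Mutex
+	ttl    time.Duration
+	now    func() time.Time
+	lookup func(token string) tokenStatus
+	cache  map[string]cachedStatus
+}
+
+// Valid reports whether the token exists and has not been revoked.
+// For simplicity the lock is held during lookup, serializing queries.
+func (v *tokenValidator) Valid(token string) bool {
+	v.mu.Lock()
+	defer v.mu.Unlock()
+	c, ok := v.cache[token]
+	if !ok || !v.now().Before(c.expires) {
+		c = cachedStatus{status: v.lookup(token), expires: v.now().Add(v.ttl)}
+		v.cache[token] = c
+	}
+	return c.status.exists && !c.status.revoked
+}
+
+func main() {
+	db := map[string]tokenStatus{
+		"good": {exists: true},
+		"old":  {exists: true, revoked: true},
+	}
+	queries := 0
+	v := &tokenValidator{
+		ttl:   time.Minute,
+		now:   time.Now,
+		cache: make(map[string]cachedStatus),
+		lookup: func(token string) tokenStatus {
+			queries++
+			return db[token] // a missing token yields the zero value: exists == false
+		},
+	}
+	fmt.Println(v.Valid("good"), v.Valid("good"), v.Valid("old"), v.Valid("missing"))
+	fmt.Println("database queries:", queries) // 3: the second "good" check hit the cache
+}
+```
+
+The cache trades freshness for load: a token revoked in the database can still be
+accepted until its cached entry expires, so the TTL bounds how long a revocation takes
+to apply.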
+
+## Kubernetes Agent authorization
+
+GitLab provides the following information in its response for a given Agent access token:
+
+- Agent configuration Git repository. (The agent doesn't support per-folder authorization.)
+- Agent name.
diff --git a/doc/development/agent/index.md b/doc/development/agent/index.md
new file mode 100644
index 00000000000..95661c8ddbd
--- /dev/null
+++ b/doc/development/agent/index.md
@@ -0,0 +1,87 @@
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# Kubernetes Agent development **(PREMIUM ONLY)**
+
+This page contains developer-specific information about the GitLab Kubernetes Agent.
+[End-user documentation about the GitLab Kubernetes Agent](../../user/clusters/agent/index.md)
+is also available.
+
+The agent can help you perform tasks like these:
+
+- Integrate a cluster, located behind a firewall or NAT, with GitLab. To
+ learn more, read [issue #212810, Invert the model GitLab.com uses for Kubernetes integration by leveraging long lived reverse tunnels](https://gitlab.com/gitlab-org/gitlab/-/issues/212810).
+- Access API endpoints in a cluster in real time. For an example use case, read
+ [issue #218220, Allow Prometheus in K8s cluster to be installed manually](https://gitlab.com/gitlab-org/gitlab/-/issues/218220#note_348729266).
+- Enable real-time features by pushing information about events happening in a cluster.
+ For example, you could build a cluster view dashboard to visualize changes in progress
+ in a cluster. For more information about these efforts, read about the
+ [Real-Time Working Group](https://about.gitlab.com/company/team/structure/working-groups/real-time/).
+- Enable a [cache of Kubernetes objects through informers](https://github.com/kubernetes/client-go/blob/ccd5becdffb7fd8006e31341baaaacd14db2dcb7/tools/cache/shared_informer.go#L34-L183),
+ kept up-to-date with very low latency. This cache helps you:
+
+ - Reduce or eliminate information propagation latency by avoiding Kubernetes API calls
+ and polling, and only fetching data from an up-to-date cache.
+ - Lower the load placed on the Kubernetes API by removing polling.
+ - Eliminate any rate-limiting errors by removing polling.
+ - Simplify backend code by replacing polling code with cache access. While it's another
+ API call, no polling is needed. This example describes [fetching cached data synchronously from the front end](https://gitlab.com/gitlab-org/gitlab/-/issues/217792#note_348582537) instead of fetching data from the Kubernetes API.
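+
+The event-driven cache idea can be sketched without `client-go`. In this simplified Go
+example, a hypothetical `event` type stands in for watch notifications, and a
+hypothetical `podCache` applies each one as it arrives, so reads never poll the
+Kubernetes API. Real informers add resynchronization, indexing, and typed objects on top
+of this basic pattern.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+	"sync"
+)
+
+// event is a hypothetical, simplified watch notification. Real informers
+// receive typed Kubernetes objects from the API server's watch stream.
+type event struct {
+	Type   string // "ADDED", "MODIFIED", or "DELETED"
+	Name   string
+	Status string
+}
+
+// podCache is kept up to date by applying events as they arrive, so
+// readers consult memory instead of calling or polling the Kubernetes API.
+type podCache struct {
+	mu   sync.RWMutex
+	pods map[string]string // pod name -> status
+}
+
+func (c *podCache) apply(e event) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	if e.Type == "DELETED" {
+		delete(c.pods, e.Name)
+		return
+	}
+	c.pods[e.Name] = e.Status
+}
+
+// snapshot returns "name=status" pairs, sorted for stable output.
+func (c *podCache) snapshot() []string {
+	c.mu.RLock()
+	defer c.mu.RUnlock()
+	out := make([]string, 0, len(c.pods))
+	for name, status := range c.pods {
+		out = append(out, name+"="+status)
+	}
+	sort.Strings(out)
+	return out
+}
+
+func main() {
+	cache := &podCache{pods: make(map[string]string)}
+	events := make(chan event)
+	done := make(chan struct{})
+	go func() { // the "watch" loop feeding the cache
+		for e := range events {
+			cache.apply(e)
+		}
+		close(done)
+	}()
+	events <- event{"ADDED", "web-1", "Pending"}
+	events <- event{"MODIFIED", "web-1", "Running"}
+	events <- event{"ADDED", "web-2", "Running"}
+	events <- event{"DELETED", "web-2", ""}
+	close(events)
+	<-done
+	fmt.Println(cache.snapshot()) // [web-1=Running]
+}
+```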
+
+## Architecture of the Kubernetes Agent
+
+The GitLab Kubernetes Agent and the GitLab Kubernetes Agent Server use
+[bidirectional streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#bidirectional-streaming-rpc)
+to allow the connection acceptor (the gRPC server, GitLab Kubernetes Agent Server) to
+act as a client. The connection acceptor sends requests as gRPC replies. The client-server
+relationship is inverted because the connection must be initiated from inside the
+Kubernetes cluster to bypass any firewall or NAT the cluster may be located behind.
+To learn more about this inversion, read
+[issue #212810](https://gitlab.com/gitlab-org/gitlab/-/issues/212810).
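+
+The inversion can be illustrated with plain TCP in Go. This sketch does not use gRPC or
+the real protocol; `runAgent` and `askAgent` are hypothetical. The agent dials out, as it
+must from behind a firewall or NAT, and the server then sends a request over the
+connection it accepted and reads the agent's reply.
+
+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"net"
+	"strings"
+)
+
+// runAgent plays the agentk role: it initiates the connection, then waits
+// for requests on it and answers each one.
+func runAgent(addr string) error {
+	conn, err := net.Dial("tcp", addr)
+	if err != nil {
+		return err
+	}
+	defer conn.Close()
+	r := bufio.NewReader(conn)
+	for {
+		req, err := r.ReadString('\n')
+		if err != nil {
+			return nil // the server closed the connection
+		}
+		fmt.Fprintf(conn, "reply to %s\n", strings.TrimSpace(req))
+	}
+}
+
+// askAgent plays the kas role: it accepts the agent's connection and then
+// acts as the client, sending a request and reading the reply.
+func askAgent(ln net.Listener, request string) (string, error) {
+	conn, err := ln.Accept()
+	if err != nil {
+		return "", err
+	}
+	defer conn.Close()
+	if _, err := fmt.Fprintf(conn, "%s\n", request); err != nil {
+		return "", err
+	}
+	reply, err := bufio.NewReader(conn).ReadString('\n')
+	return strings.TrimSpace(reply), err
+}
+
+func main() {
+	ln, err := net.Listen("tcp", "127.0.0.1:0")
+	if err != nil {
+		panic(err)
+	}
+	defer ln.Close()
+	go runAgent(ln.Addr().String())
+	reply, err := askAgent(ln, "list pods")
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(reply) // reply to list pods
+}
+```
+
+In the real system, a gRPC bidirectional stream takes the place of this newline-delimited
+exchange.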
+
+This diagram describes how GitLab (`GitLab RoR`), the GitLab Kubernetes Agent (`agentk`), and the GitLab Kubernetes Agent Server (`kas`) work together.
+
+```mermaid
+graph TB
+ agentk -- gRPC bidirectional streaming --> kas
+
+ subgraph "GitLab"
+ kas[kas]
+ GitLabRoR[GitLab RoR] -- gRPC --> kas
+ kas -- gRPC --> Gitaly[Gitaly]
+ kas -- REST API --> GitLabRoR
+ end
+
+ subgraph "Kubernetes cluster"
+ agentk[agentk]
+ end
+```
+
+- `GitLab RoR` is the main GitLab application. It uses gRPC to talk to `kas`.
+- `agentk` is the GitLab Kubernetes Agent. It keeps a connection established to a
+ `kas` instance, waiting for requests to process. It may also actively send information
+ about things happening in the cluster.
+- `kas` is the GitLab Kubernetes Agent Server, and is responsible for:
+ - Accepting requests from `agentk`.
+ - [Authentication of requests](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/identity_and_auth.md) from `agentk` by querying `GitLab RoR`.
+ - Fetching the agent's configuration from a corresponding Git repository by querying Gitaly.
+ - Matching incoming requests from `GitLab RoR` with existing connections from
+ the right `agentk`, forwarding requests to it and forwarding responses back.
+ - (Optional) Sending notifications through ActionCable for events received from `agentk`.
+ - Polling manifest repositories for [GitOps support](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/gitops.md) by communicating with Gitaly.
+
+<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
+To learn more about how the repository is structured, see
+[GitLab Kubernetes Agent repository overview](https://www.youtube.com/watch?v=j8CyaCWroUY).
+
+## Guiding principles
+
+GitLab prefers to add logic into `kas` rather than `agentk`. `agentk` should be kept
+streamlined and small to minimize the need for upgrades. On GitLab.com, `kas` is
+managed by GitLab, so upgrades and features can be added without requiring you
+to upgrade `agentk` in your clusters.
+
+`agentk` can't be viewed as a dumb reverse proxy because features are planned to be built
+[on top of the cache with informers](https://github.com/kubernetes/client-go/blob/ccd5becdffb7fd8006e31341baaaacd14db2dcb7/tools/cache/shared_informer.go#L34-L183).
diff --git a/doc/development/agent/local.md b/doc/development/agent/local.md
new file mode 100644
index 00000000000..75d45366238
--- /dev/null
+++ b/doc/development/agent/local.md
@@ -0,0 +1,58 @@
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# Run the Kubernetes Agent locally **(PREMIUM ONLY)**
+
+You can run `kas` and `agentk` locally to test the [Kubernetes Agent](index.md) yourself.
+
+1. Create a `cfg.yaml` file from the contents of
+ [`config_example.yaml`](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/pkg/kascfg/config_example.yaml), or this example:
+
+   ```yaml
+   agent:
+     listen:
+       network: tcp
+       address: 127.0.0.1:8150
+       websocket: false
+     gitops:
+       poll_period: "10s"
+   gitlab:
+     address: http://localhost:3000
+     authentication_secret_file: /Users/tkuah/code/ee-gdk/gitlab/.gitlab_kas_secret
+   ```
+
+1. Create a `token.txt` file containing the token for
+ [the agent you created](../../user/clusters/agent/index.md#create-an-agent-record-in-gitlab). This file must not contain a newline character. You can create the file with this command:
+
+ ```shell
+ echo -n "<TOKEN>" > token.txt
+ ```
+
+1. Start the binaries with the following commands:
+
+ ```shell
+ # Need GitLab to start
+ gdk start
+ # Stop GDK's version of kas
+ gdk stop gitlab-k8s-agent
+
+ # Start kas
+ bazel run //cmd/kas -- --configuration-file="$(pwd)/cfg.yaml"
+ ```
+
+1. In a new terminal window, run this command to start `agentk`:
+
+ ```shell
+ bazel run //cmd/agentk -- --kas-address=grpc://127.0.0.1:8150 --token-file="$(pwd)/token.txt"
+ ```
+
+You can also inspect the
+[Makefile](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/Makefile)
+for more targets.
+
+<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
+To learn more about how the repository is structured, see
+[GitLab Kubernetes Agent repository overview](https://www.youtube.com/watch?v=j8CyaCWroUY).
diff --git a/doc/development/agent/routing.md b/doc/development/agent/routing.md
new file mode 100644
index 00000000000..43cc78ccdfb
--- /dev/null
+++ b/doc/development/agent/routing.md
@@ -0,0 +1,223 @@
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# Routing `kas` requests in the Kubernetes Agent **(PREMIUM ONLY)**
+
+This document describes how `kas` routes requests to concrete `agentk` instances.
+GitLab must talk to the GitLab Kubernetes Agent Server (`kas`) to:
+
+- Get information about connected agents. [Read more](https://gitlab.com/gitlab-org/gitlab/-/issues/249560).
+- Interact with agents. [Read more](https://gitlab.com/gitlab-org/gitlab/-/issues/230571).
+- Interact with Kubernetes clusters. [Read more](https://gitlab.com/gitlab-org/gitlab/-/issues/240918).
+
+Each agent connects to an instance of `kas` and keeps an open connection. When
+GitLab must talk to a particular agent, a `kas` instance connected to this agent must
+be found, and the request routed to it.
+
+## System design
+
+For an architecture overview, see
+[architecture.md](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/architecture.md).
+
+```mermaid
+flowchart LR
+ subgraph "Kubernetes 1"
+ agentk1p1["agentk 1, Pod1"]
+ agentk1p2["agentk 1, Pod2"]
+ end
+
+ subgraph "Kubernetes 2"
+ agentk2p1["agentk 2, Pod1"]
+ end
+
+ subgraph "Kubernetes 3"
+ agentk3p1["agentk 3, Pod1"]
+ end
+
+ subgraph kas
+ kas1["kas 1"]
+ kas2["kas 2"]
+ kas3["kas 3"]
+ end
+
+ GitLab["GitLab Rails"]
+ Redis
+
+ GitLab -- "gRPC to any kas" --> kas
+ kas1 -- register connected agents --> Redis
+ kas2 -- register connected agents --> Redis
+ kas1 -- lookup agent --> Redis
+
+ agentk1p1 -- "gRPC" --> kas1
+ agentk1p2 -- "gRPC" --> kas2
+ agentk2p1 -- "gRPC" --> kas1
+ agentk3p1 -- "gRPC" --> kas2
+```
+
+For this architecture, this diagram shows a request to `agentk 3, Pod1` for the list of pods:
+
+```mermaid
+sequenceDiagram
+ GitLab->>+kas1: Get list of running<br />Pods from agentk<br />with agent_id=3
+ Note right of kas1: kas1 checks for a locally<br />connected agent with agent_id=3.<br />There is none,<br />so it queries Redis
+ kas1->>+Redis: Get list of connected agents<br />with agent_id=3
+ Redis-->>-kas1: List of connected agents<br />with agent_id=3
+ Note right of kas1: kas1 picks a specific agentk instance<br />to address and talks to<br />the corresponding kas instance,<br />specifying which agentk instance<br />to route the request to.
+ kas1->>+kas2: Get the list of running Pods<br />from agentk 3, Pod1
+ kas2->>+agentk 3 Pod1: Get list of Pods
+ agentk 3 Pod1-->>-kas2: List of Pods
+ kas2-->>-kas1: List of running Pods<br />from agentk 3, Pod1
+ kas1-->>-GitLab: List of running Pods<br />from agentk with agent_id=3
+```
+
+Each `kas` instance tracks the agents connected to it in Redis. For each agent, it
+stores a serialized protobuf object with information about the agent. When an agent
+disconnects, `kas` removes all corresponding information from Redis. For both events,
+`kas` publishes a notification to a Redis [pub-sub channel](https://redis.io/topics/pubsub).
+
+Each agent, while logically a single entity, can have multiple replicas (multiple pods)
+in a cluster. `kas` accommodates that and records per-replica (generally per-connection)
+information. Each open `GetConfiguration()` streaming request is given
+a unique identifier which, combined with agent ID, identifies an `agentk` instance.
+
+gRPC can keep multiple TCP connections open for a single target host. `agentk` only
+runs one `GetConfiguration()` streaming request. `kas` uses that connection, and
+doesn't see idle TCP connections because they are handled by the gRPC framework.
+
+Each `kas` instance provides information to Redis, so other `kas` instances can discover and access it.
+
+Information is stored in Redis with an [expiration time](https://redis.io/commands/expire),
+to expire information for `kas` instances that become unavailable. To prevent
+information from expiring too quickly, `kas` periodically updates the expiration time
+for valid entries. Before terminating, `kas` cleans up the information it adds into Redis.
+
+When `kas` must atomically update multiple data structures in Redis, it uses
+[transactions](https://redis.io/topics/transactions) to ensure data consistency.
+Grouped data items must have the same expiration time.
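+
+The following Go sketch models this bookkeeping with an in-memory map standing in for
+Redis. The `connKey`, `connInfo`, and `registry` names are hypothetical. Each entry is
+keyed by agent ID plus connection ID, `Register` doubles as the periodic refresh of the
+expiration time, and `Lookup` ignores entries whose expiration time has passed, much as
+Redis drops keys after their `EXPIRE` deadline.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+	"sync"
+	"time"
+)
+
+// connKey identifies one agentk replica: the agent ID plus the unique ID
+// given to its open GetConfiguration() stream.
+type connKey struct {
+	AgentID      int64
+	ConnectionID int64
+}
+
+// connInfo is a trimmed-down ConnectedAgentInfo: which kas holds the connection.
+type connInfo struct {
+	connKey
+	KasAddress string
+}
+
+type entry struct {
+	kasAddress string
+	expiresAt  time.Time
+}
+
+// registry is an in-memory stand-in for the Redis data.
+type registry struct {
+	mu      sync.Mutex
+	ttl     time.Duration
+	now     func() time.Time
+	entries map[connKey]entry
+}
+
+func newRegistry(ttl time.Duration, now func() time.Time) *registry {
+	return &registry{ttl: ttl, now: now, entries: make(map[connKey]entry)}
+}
+
+// Register records a connection; calling it again refreshes the expiration.
+func (r *registry) Register(k connKey, kasAddress string) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	r.entries[k] = entry{kasAddress: kasAddress, expiresAt: r.now().Add(r.ttl)}
+}
+
+// Unregister removes a connection when the agent disconnects.
+func (r *registry) Unregister(k connKey) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	delete(r.entries, k)
+}
+
+// Lookup returns the live connections of one agent, sorted by connection ID.
+func (r *registry) Lookup(agentID int64) []connInfo {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	var out []connInfo
+	for k, e := range r.entries {
+		if k.AgentID == agentID && r.now().Before(e.expiresAt) {
+			out = append(out, connInfo{connKey: k, KasAddress: e.kasAddress})
+		}
+	}
+	sort.Slice(out, func(i, j int) bool { return out[i].ConnectionID < out[j].ConnectionID })
+	return out
+}
+
+func main() {
+	now := time.Unix(0, 0)
+	reg := newRegistry(30*time.Second, func() time.Time { return now })
+	reg.Register(connKey{AgentID: 3, ConnectionID: 7}, "10.0.0.3:8155")
+	reg.Register(connKey{AgentID: 3, ConnectionID: 9}, "10.0.0.2:8155")
+	fmt.Println(len(reg.Lookup(3))) // 2
+
+	// Only the kas holding connection 9 keeps refreshing its entry.
+	now = now.Add(20 * time.Second)
+	reg.Register(connKey{AgentID: 3, ConnectionID: 9}, "10.0.0.2:8155")
+	now = now.Add(20 * time.Second)
+	fmt.Println(len(reg.Lookup(3))) // 1
+}
+```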
+
+In addition to the existing `agentk -> kas` gRPC endpoint, `kas` exposes two new,
+separate gRPC endpoints for GitLab and for `kas -> kas` requests. Each endpoint
+is a separate network listener, making it easier to control network access to endpoints
+and allowing separate configuration for each endpoint.
+
+Databases, like PostgreSQL, aren't used because the data is transient, with no need
+to reliably persist it.
+
+### `GitLab : kas` external endpoint
+
+GitLab authenticates with `kas` using JWT and the same shared secret used by the
+`kas -> GitLab` communication. The JWT issuer should be `gitlab` and the audience
+should be `gitlab-kas`.
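+
+A minimal sketch of this token exchange, using only the Go standard library, follows.
+It assumes the HS256 (HMAC-SHA256) algorithm and adds an `exp` claim; both are
+assumptions, as are the `sign`/`verify` helper names and the hard-coded secret. A
+production verifier should use a maintained JWT library and pin the expected algorithm,
+which this sketch skips by not inspecting the header.
+
+```go
+package main
+
+import (
+	"crypto/hmac"
+	"crypto/sha256"
+	"encoding/base64"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"strings"
+	"time"
+)
+
+type claims struct {
+	Iss string `json:"iss"`
+	Aud string `json:"aud"`
+	Exp int64  `json:"exp"`
+}
+
+var b64 = base64.RawURLEncoding
+
+// sign builds an HS256 JWT, as GitLab would before calling kas.
+func sign(c claims, secret []byte) (string, error) {
+	header := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
+	body, err := json.Marshal(c)
+	if err != nil {
+		return "", err
+	}
+	unsigned := header + "." + b64.EncodeToString(body)
+	mac := hmac.New(sha256.New, secret)
+	mac.Write([]byte(unsigned))
+	return unsigned + "." + b64.EncodeToString(mac.Sum(nil)), nil
+}
+
+// verify checks the signature, issuer, audience, and expiry, as kas would.
+func verify(token string, secret []byte, now time.Time) error {
+	parts := strings.Split(token, ".")
+	if len(parts) != 3 {
+		return errors.New("malformed token")
+	}
+	mac := hmac.New(sha256.New, secret)
+	mac.Write([]byte(parts[0] + "." + parts[1]))
+	sig, err := b64.DecodeString(parts[2])
+	if err != nil || !hmac.Equal(sig, mac.Sum(nil)) {
+		return errors.New("bad signature")
+	}
+	body, err := b64.DecodeString(parts[1])
+	if err != nil {
+		return err
+	}
+	var c claims
+	if err := json.Unmarshal(body, &c); err != nil {
+		return err
+	}
+	if c.Iss != "gitlab" || c.Aud != "gitlab-kas" {
+		return errors.New("unexpected issuer or audience")
+	}
+	if now.Unix() >= c.Exp {
+		return errors.New("token expired")
+	}
+	return nil
+}
+
+func main() {
+	secret := []byte("shared-secret") // hypothetical; kas reads the real one from a file
+	now := time.Now()
+	tok, err := sign(claims{Iss: "gitlab", Aud: "gitlab-kas", Exp: now.Add(time.Minute).Unix()}, secret)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(verify(tok, secret, now))                 // <nil>
+	fmt.Println(verify(tok, []byte("wrong"), now) != nil) // true
+}
+```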
+
+When accessed through this endpoint, `kas` plays the role of request router.
+
+If a request arrives from GitLab but no connected agent can handle it, `kas` blocks
+and waits for a suitable agent to connect to it or to another `kas` instance. It
+stops waiting when the client disconnects, or when a long timeout, such as the
+client timeout, elapses. `kas` is notified of new agent connections through a
+[pub-sub channel](https://redis.io/topics/pubsub) to avoid frequent polling.
+When a suitable agent connects, `kas` routes the request to it.
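+
+The blocking behavior can be sketched in Go, with a hypothetical in-process `hub`
+standing in for the Redis pub-sub channel and the registry of connected agents.
+`WaitForAgent` returns as soon as the agent connects, or returns the context error when
+the client disconnects or the timeout elapses.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"sync"
+	"time"
+)
+
+// hub is a hypothetical in-process stand-in for the Redis pub-sub channel
+// plus the set of connected agents.
+type hub struct {
+	mu        sync.Mutex
+	connected map[int64]bool
+	waiters   map[int64][]chan struct{}
+}
+
+func newHub() *hub {
+	return &hub{connected: make(map[int64]bool), waiters: make(map[int64][]chan struct{})}
+}
+
+// AgentConnected records a connection and wakes everyone waiting for it,
+// as a pub-sub notification would.
+func (h *hub) AgentConnected(agentID int64) {
+	h.mu.Lock()
+	defer h.mu.Unlock()
+	h.connected[agentID] = true
+	for _, ch := range h.waiters[agentID] {
+		close(ch)
+	}
+	delete(h.waiters, agentID)
+}
+
+// WaitForAgent blocks until the agent is connected or ctx ends (client
+// disconnect or timeout). Checking and subscribing under one lock means a
+// notification cannot slip in between. For brevity, a cancelled waiter's
+// channel stays registered until the agent connects.
+func (h *hub) WaitForAgent(ctx context.Context, agentID int64) error {
+	h.mu.Lock()
+	if h.connected[agentID] {
+		h.mu.Unlock()
+		return nil
+	}
+	ch := make(chan struct{})
+	h.waiters[agentID] = append(h.waiters[agentID], ch)
+	h.mu.Unlock()
+	select {
+	case <-ch:
+		return nil
+	case <-ctx.Done():
+		return ctx.Err()
+	}
+}
+
+func main() {
+	h := newHub()
+	go func() {
+		time.Sleep(50 * time.Millisecond)
+		h.AgentConnected(3)
+	}()
+	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+	defer cancel()
+	fmt.Println(h.WaitForAgent(ctx, 3)) // <nil>
+
+	short, cancelShort := context.WithTimeout(context.Background(), 10*time.Millisecond)
+	defer cancelShort()
+	fmt.Println(h.WaitForAgent(short, 4)) // context deadline exceeded
+}
+```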
+
+### `kas : kas` internal endpoint
+
+This endpoint is an implementation detail, an internal API, and should not be used
+by any other system. It's protected by JWT using a secret, shared among all `kas`
+instances. No other system must have access to this secret.
+
+When accessed through this endpoint, `kas` uses the request itself to determine
+which `agentk` to send the request to. It prevents request cycles by only following
+the instructions in the request, rather than doing discovery. It's the responsibility
+of the `kas` receiving the request from the _external_ endpoint to retry and re-route
+requests. This method ensures a single central component for each request can determine
+how a request is routed, rather than distributing the decision across several `kas` instances.
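+
+The division of responsibilities can be sketched in Go. All names are hypothetical, and
+direct method calls stand in for the gRPC endpoints. `handleInternal` serves only
+connections its own instance holds and never forwards, while `routeExternal`, running on
+the `kas` that received the request from GitLab, walks the candidates read from Redis
+and retries with the next one on failure.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// target names one agentk connection and the kas instance holding it,
+// as read from Redis.
+type target struct {
+	AgentID      int64
+	ConnectionID int64
+	KasAddress   string
+}
+
+var errNotLocal = errors.New("connection is not held by this kas")
+
+// kasInstance tracks the agentk connections terminated on one kas replica.
+type kasInstance struct {
+	addr  string
+	local map[[2]int64]bool // {agent ID, connection ID}
+}
+
+// handleInternal serves the kas : kas endpoint. It follows the routing
+// instructions in the request and never forwards, so cycles cannot occur.
+func (k *kasInstance) handleInternal(t target, req string) (string, error) {
+	if !k.local[[2]int64{t.AgentID, t.ConnectionID}] {
+		return "", errNotLocal
+	}
+	return fmt.Sprintf("%s: %s via connection %d", k.addr, req, t.ConnectionID), nil
+}
+
+// routeExternal runs on the kas that received the request from GitLab. It is
+// the single place that chooses targets and retries.
+func routeExternal(candidates []target, peers map[string]*kasInstance, req string) (string, error) {
+	for _, t := range candidates {
+		peer, ok := peers[t.KasAddress]
+		if !ok {
+			continue // that kas is gone; its Redis entry has not expired yet
+		}
+		if resp, err := peer.handleInternal(t, req); err == nil {
+			return resp, nil
+		}
+	}
+	return "", errors.New("no reachable agentk connection")
+}
+
+func main() {
+	kas2 := &kasInstance{addr: "kas2", local: map[[2]int64]bool{{3, 9}: true}}
+	peers := map[string]*kasInstance{"kas2": kas2}
+	candidates := []target{
+		{AgentID: 3, ConnectionID: 7, KasAddress: "kas3"}, // stale entry
+		{AgentID: 3, ConnectionID: 9, KasAddress: "kas2"},
+	}
+	resp, err := routeExternal(candidates, peers, "GetPods")
+	fmt.Println(resp, err) // kas2: GetPods via connection 9 <nil>
+}
+```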
+
+### API definitions
+
+```proto
+syntax = "proto3";
+
+import "google/protobuf/timestamp.proto";
+
+message KasAddress {
+ string ip = 1;
+ uint32 port = 2;
+}
+
+message ConnectedAgentInfo {
+ // Agent id.
+ int64 id = 1;
+ // Identifies a particular agentk->kas connection. Randomly generated when agent connects.
+ int64 connection_id = 2;
+ string version = 3;
+ string commit = 4;
+ // Pod namespace.
+ string pod_namespace = 5;
+ // Pod name.
+ string pod_name = 6;
+ // When the connection was established.
+ google.protobuf.Timestamp connected_at = 7;
+ KasAddress kas_address = 8;
+ // What else do we need?
+}
+
+message KasInstanceInfo {
+ string version = 1;
+ string commit = 2;
+ KasAddress address = 3;
+ // What else do we need?
+}
+
+message ConnectedAgentsForProjectRequest {
+ int64 project_id = 1;
+}
+
+message ConnectedAgentsForProjectResponse {
+ // There may be 0 or more agents with the same id, depending on the number of running Pods.
+ repeated ConnectedAgentInfo agents = 1;
+}
+
+message ConnectedAgentsByIdRequest {
+ int64 agent_id = 1;
+}
+
+message ConnectedAgentsByIdResponse {
+ repeated ConnectedAgentInfo agents = 1;
+}
+
+// API for use by GitLab.
+service KasApi {
+ // Connected agents for a particular configuration project.
+ rpc ConnectedAgentsForProject (ConnectedAgentsForProjectRequest) returns (ConnectedAgentsForProjectResponse) {
+ }
+ // Connected agents for a particular agent id.
+ rpc ConnectedAgentsById (ConnectedAgentsByIdRequest) returns (ConnectedAgentsByIdResponse) {
+ }
+ // Depends on the need, but here is the call from the example above.
+ rpc GetPods (GetPodsRequest) returns (GetPodsResponse) {
+ }
+}
+
+message Pod {
+ string namespace = 1;
+ string name = 2;
+}
+
+message GetPodsRequest {
+ int64 agent_id = 1;
+ int64 connection_id = 2;
+}
+
+message GetPodsResponse {
+ repeated Pod pods = 1;
+}
+
+// Internal API for use by kas for kas -> kas calls.
+service KasInternal {
+ // Depends on the need, but here is the call from the example above.
+ rpc GetPods (GetPodsRequest) returns (GetPodsResponse) {
+ }
+}
+```
diff --git a/doc/development/agent/user_stories.md b/doc/development/agent/user_stories.md
new file mode 100644
index 00000000000..2929573ffd3
--- /dev/null
+++ b/doc/development/agent/user_stories.md
@@ -0,0 +1,77 @@
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# Kubernetes Agent user stories **(PREMIUM ONLY)**
+
+The [personas in action](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#user-personas)
+for the Kubernetes Agent are:
+
+- [Sasha, the Software Developer](https://about.gitlab.com/handbook/marketing/strategic-marketing/roles-personas/#sasha-software-developer).
+- [Allison, the Application Operator](https://about.gitlab.com/handbook/marketing/strategic-marketing/roles-personas/#allison-application-ops).
+- [Priyanka, the Platform Engineer](https://about.gitlab.com/handbook/marketing/strategic-marketing/roles-personas/#priyanka-platform-engineer).
+
+[Devon, the DevOps engineer](https://about.gitlab.com/handbook/marketing/strategic-marketing/roles-personas/#devon-devops-engineer)
+is intentionally excluded here, as DevOps is more of a role than a persona.
+
+There are various workflows to support, so some user stories might seem to contradict each other. They don't.
+
+## Software Developer user stories
+
+<!-- vale gitlab.FirstPerson = NO -->
+
+- As a Software Developer, I want to push my code, and move to the next development task,
+ to work on business applications.
+- As a Software Developer, I want to set necessary dependencies and resource requirements
+ together with my application code, so my code runs correctly after deployment.
+
+<!-- vale gitlab.FirstPerson = YES -->
+
+## Application Operator user stories
+
+<!-- vale gitlab.FirstPerson = NO -->
+
+- As an Application Operator, I want to standardize the deployments used by my teams,
+ so I can support all teams with minimal effort.
+- As an Application Operator, I want to have a single place to define all the deployments,
+ so I can ensure security fixes are applied everywhere.
+- As an Application Operator, I want to offer a set of predefined templates to
+ Software Developers, so they can get started quickly and can deploy to production
+ without my intervention, and I am not a bottleneck.
+- As an Application Operator, I want to know exactly what changes are being deployed,
+ so I can fulfill my SLAs.
+- As an Application Operator, I want deep insights into what versions of my applications
+ are running and want to be able to debug them, so I can fix operational issues.
+- As an Application Operator, I want application code to be automatically deployed
+ to staging environments when new versions are available.
+- As an Application Operator, I want to follow my preferred deployment strategy,
+ so I can move code into production in a reliable way.
+- As an Application Operator, I want to review all code before it's deployed into production,
+ so I can fulfill my SLAs.
+- As an Application Operator, I want to be notified before deployment when new code needs my attention,
+ so I can review it swiftly.
+
+<!-- vale gitlab.FirstPerson = YES -->
+
+## Platform Engineer user stories
+
+<!-- vale gitlab.FirstPerson = NO -->
+
+- As a Platform Engineer, I want to restrict customizations to preselected values
+ for Operators, so I can fulfill my SLAs.
+- As a Platform Engineer, I want to allow some level of customization to Operators,
+ so I don't become a bottleneck.
+- As a Platform Engineer, I want to define all deployments in a single place, so
+ I can ensure security fixes are applied everywhere.
+- As a Platform Engineer, I want to define the infrastructure as code, so my
+ infrastructure management is testable, reproducible, traceable, and scalable.
+- As a Platform Engineer, I want to define various policies that applications must
+ follow, so that I can fulfill my SLAs.
+- As a Platform Engineer, I want approved tooling for log management and persistent storage,
+ so I can scale, secure, and manage them as needed.
+- As a Platform Engineer, I want to be alerted when my infrastructure differs from
+ its definition, so I can make sure that everything is configured as expected.
+
+<!-- vale gitlab.FirstPerson = YES -->