---
status: proposed
creation-date: "2023-08-31"
authors: [ "@jdrpereira", "@10io" ]
coach: "@grzesiek"
approvers: [ "@trizzi", "@crystalpoole" ]
owning-stage: "~devops::package"
participating-stages: []
---

# Google Artifact Registry Integration

## Summary

GitLab and Google Cloud have recently [announced](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/) a partnership to combine the unique capabilities of their platforms.

As highlighted in the announcement, one key goal is the ability to "_use Google's Artifact Registry with GitLab pipelines and packaging to create a security data plane_". The initial step toward this goal is to allow users to configure a new [Google Artifact Registry](https://cloud.google.com/artifact-registry) (abbreviated as GAR from now on) [project integration](../../../user/project/integrations/index.md) and display [container image artifacts](https://cloud.google.com/artifact-registry/docs/supported-formats) in the GitLab UI.

## Motivation

Please refer to the [announcement](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/) blog post for more details about the motivation and long-term goals of the GitLab and Google Cloud partnership.

Regarding the scope of this design document, our primary focus is to fulfill the Product requirement of providing users with visibility over their container images in GAR. The motivation for this specific goal is rooted in foundational research on the use of external registries as a complement to the GitLab Container Registry ([internal](https://gitlab.com/gitlab-org/ux-research/-/issues/2602)).

Since this marks the first step in the GAR integration, our aim is to achieve this goal in a way that establishes a foundation to facilitate reusability in the future. This groundwork could benefit potential future expansions, such as support for additional artifact formats (npm, Maven, etc.), and features beyond the Package stage (e.g., vulnerability scanning, deployments, etc.).

### Goals

- Allow GitLab users to configure a new [project integration](../../../user/project/integrations/index.md) for connecting to GAR.
- Limited to a single top-level GAR [repository](https://cloud.google.com/artifact-registry/docs/repositories) per GitLab project.
- Limited to GAR repositories in [Standard](https://cloud.google.com/artifact-registry/docs/repositories#mode) mode. Support for Remote and Virtual [repository modes](https://cloud.google.com/artifact-registry/docs/repositories#mode) (both in Preview) is a stretch goal.
- Limited to GAR repositories of format [Container images](https://cloud.google.com/artifact-registry/docs/supported-formats#container).
- Use a Google Cloud [service account](https://cloud.google.com/iam/docs/service-account-overview) provided by the GitLab project owner/maintainer to interact with GAR.
- Allow GitLab users to list container images under the connected GAR repository, including sub-repositories. The list should be paginable and sortable.
- For each listed image, display its URI, list of tags, size, digest, upload time, media type, build time, and update time, as documented [here](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages#DockerImage).
- Listing container images under the connected GAR repository is restricted to users with [Reporter+](../../../user/permissions.md#roles) roles.

### Non-Goals

While some of these may become goals for future iterations, they are currently out of scope:

- Create, update and delete operations.
- Connecting to multiple (top-level) GAR repositories under the same project.
- Support for [repository formats](https://cloud.google.com/artifact-registry/docs/supported-formats) beyond container images.
- Support for other [Identity and Access Management (IAM)](https://cloud.google.com/iam) permissions/credentials beyond [service accounts](https://cloud.google.com/iam/docs/service-account-overview).
- GAR [cleanup policies](https://cloud.google.com/artifact-registry/docs/repositories/cleanup-policy).
- Filtering the images list by their attributes (name or value). The current [GAR API](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#listdockerimagesrequest) does not support filtering.
- [Artifact analysis and vulnerability scanning](https://cloud.google.com/artifact-registry/docs/analysis).

## Proposal

### Design and Implementation Details

#### Project Integration

A new [project integration](../../../user/project/integrations/index.md) for GAR will be created. Once enabled, this will display a new "Google Artifact Registry" item in the "Operate" section of the sidebar. This is also where the [Harbor](../../../user/project/integrations/harbor.md) integration is displayed if enabled.

The GAR integration can be enabled by project owner/maintainer(s), who must provide four configuration parameters during setup:

- **GCP project ID**: The globally unique identifier for the GCP project where the target GAR repository lives.
- **Repository location**: The [GCP location](https://cloud.google.com/about/locations) where the target GAR repository lives.
- **Repository name**: The name of the target GAR repository.
- **GCP service account key**: The _content_ (not the file) of the [service account key](https://cloud.google.com/iam/docs/keys-create-delete) in JSON format ([sample](https://cloud.google.com/iam/docs/keys-create-delete#creating)).
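
To illustrate how the first three parameters fit together: in the GAR APIs, a repository is addressed by a resource name of the form `projects/*/locations/*/repositories/*`. The following is a minimal sketch of that composition. The helper name `gar_repository_resource_name` and the sample values are hypothetical and not part of the integration.

```shell
# Hypothetical helper: builds the GAR repository resource name from the
# GCP project ID ($1), repository location ($2), and repository name ($3).
gar_repository_resource_name() {
  printf 'projects/%s/locations/%s/repositories/%s\n' "$1" "$2" "$3"
}

# Sample values are illustrative only.
gar_repository_resource_name "my-gcp-project" "us-east1" "my-repo"
```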

#### Authentication

Using a single GCP service account keeps the integration simple. Users retain the ability to [audit usage](https://cloud.google.com/iam/docs/audit-logging/examples-service-accounts#access-with-key) of this service account on the GCP side and revoke permissions if/when necessary.

The service account whose key is provided during the integration setup must be granted at least the [`Artifact Registry Reader`](https://cloud.google.com/artifact-registry/docs/access-control#permissions) role in the target GCP project.

Saving the (encrypted) service account key JSON content in the backend allows us to easily grab and use it to initialize the GAR client (more about that later). Providing the content of the key file instead of uploading it is similar to what we do with users' public SSH keys.

As previously highlighted, access to the GAR integration features is restricted to users with [Reporter+](../../../user/permissions.md#roles) roles.

#### Resource Mapping

For the [GitLab Container Registry](../../../user/packages/container_registry/index.md), repositories within a specific project must have a path that matches the project full path. This is essentially how we establish a resource mapping between GitLab Rails and the registry, which serves multiple purposes, including granular authorization, scoping storage usage to a given project/group/namespace, and more.

Regarding the GAR integration, since there are no equivalent entities for GitLab project/group/namespace resources on the GAR side, we aim to simplify matters by allowing users to attach any [GAR repository](https://cloud.google.com/artifact-registry/docs/repositories) to any GitLab project, regardless of their respective paths. Similarly, we do not plan to restrict the attachment of a particular GAR repository to a single GitLab project. Ultimately, it is up to users to determine how to organize both datasets in the way that best suits their needs.

#### GAR API

GAR provides three APIs: Docker API, REST API, and RPC API.

The [Docker API](https://cloud.google.com/artifact-registry/docs/reference/docker-api) is based on the [Docker Registry HTTP API V2](https://docs.docker.com/registry/spec/api), now superseded by the [OCI Distribution Specification API](https://github.com/opencontainers/distribution-spec/blob/main/spec.md) (from now on referred to as OCI API). This API is used for pushing/pulling images to/from GAR and also provides some discoverability operations. Please refer to [Alternative Solutions](#alternative-solutions) for the reasons why we don't intend to use it.

Among the proprietary GAR APIs, the [REST API](https://cloud.google.com/artifact-registry/docs/reference/rest) provides basic functionality for managing repositories. This includes [`list`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages/list) and [`get`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages/get) operations for container image repositories, which could be used for this integration. Both operations return the same data structure, represented by the [`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages#DockerImage) object, so both provide the same level of detail.
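
Purely to illustrate the shape of the REST `list` operation (this document opts for the RPC API through the Ruby client SDK instead), the sketch below builds the request URL for one page of results. The helper name and sample values are hypothetical. It assumes that any page token passed in is already URL-safe, and it omits authentication.

```shell
# Hypothetical helper: prints the REST URL that lists container images for a
# repository resource name ($1) with a page size ($2) and an optional page
# token ($3). The token is assumed to be URL-safe already.
gar_list_docker_images_url() {
  url="https://artifactregistry.googleapis.com/v1/$1/dockerImages?pageSize=$2"
  if [ -n "${3:-}" ]; then
    url="$url&pageToken=$3"
  fi
  printf '%s\n' "$url"
}

# First page, then a follow-up page using a token from the previous response.
gar_list_docker_images_url "projects/my-gcp-project/locations/us-east1/repositories/my-repo" 20
gar_list_docker_images_url "projects/my-gcp-project/locations/us-east1/repositories/my-repo" 20 "TOKEN"
```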

Last but not least, there is also an [RPC API](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1), backed by gRPC and Protocol Buffers. This API provides the most functionality, covering all GAR features. From the available operations, we can make use of the [`ListDockerImagesRequest`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#listdockerimagesrequest) and [`GetDockerImageRequest`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.GetDockerImageRequest) operations. As with the REST API, both responses are composed of [`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.DockerImage) objects.

Between the two proprietary API options, we chose the RPC one because it provides support not only for the operations we need today but also offers better coverage of all GAR features, which will be beneficial in future iterations. Finally, we do not intend to make direct use of this API but rather use it through the official Ruby client SDK. Please see [Client SDK](#client-sdk) below for more details.

#### Backend Integration

##### Client SDK

To interact with GAR we will make use of the official GAR [Ruby client SDK](https://cloud.google.com/ruby/docs/reference/google-cloud-artifact_registry/latest).

*TODO: Add more details about the client SDK integration and its limitations (no filtering for example).*

##### Database Changes

*TODO: Describe any necessary changes to the database to support this integration.*

##### CI/CD variables

Similar to the [Harbor](../../../user/project/integrations/harbor.md#configure-gitlab) integration, once a user activates the GAR integration, additional CI/CD variables will automatically be available. These will be set according to the requirements described in the [documentation](https://cloud.google.com/artifact-registry/docs/docker/authentication#json-key):

- `GCP_ARTIFACT_REGISTRY_URL`: This will be set to `https://LOCATION-docker.pkg.dev`, where `LOCATION` is the repository location configured for the integration.
- `GCP_ARTIFACT_REGISTRY_PROJECT_URI`: This will be set to `LOCATION-docker.pkg.dev/PROJECT-ID`. `PROJECT-ID` is the GCP project ID of the GAR repository configured for the integration.
- `GCP_ARTIFACT_REGISTRY_PASSWORD`: This will be set to the base64-encoded version of the service account key JSON content configured for the integration.
- `GCP_ARTIFACT_REGISTRY_USER`: This will be set to `_json_key_base64`.

These can then be used to log in using `docker login`:

```shell
docker login -u $GCP_ARTIFACT_REGISTRY_USER -p $GCP_ARTIFACT_REGISTRY_PASSWORD $GCP_ARTIFACT_REGISTRY_URL
```

Similarly, these can be used to download images from the repository with `docker pull`:

```shell
docker pull $GCP_ARTIFACT_REGISTRY_PROJECT_URI/REPOSITORY/myapp:latest
```

Finally, provided that the configured service account has the `Artifact Registry Writer` role, one can also push images to GAR:

```shell
docker build -t $GCP_ARTIFACT_REGISTRY_PROJECT_URI/REPOSITORY/myapp:latest .
docker push $GCP_ARTIFACT_REGISTRY_PROJECT_URI/REPOSITORY/myapp:latest
```

For forward compatibility reasons, the repository name (`REPOSITORY` in the commands above) must be appended to `GCP_ARTIFACT_REGISTRY_PROJECT_URI` by the user. In the first iteration we will only support a single GAR repository, so we could technically provide a variable with the repository name already included, such as `GCP_ARTIFACT_REGISTRY_REPOSITORY_URI`. However, once we add support for multiple repositories, we will have no way of telling which repository a user wants to target with a given command, so the user must specify it.
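
The variable values described in this section can be sketched as follows. This is only an illustration under stated assumptions, not how GitLab will compute them: the helper names `gar_ci_variables` and `gar_image_reference` and all sample values are hypothetical.

```shell
# Hypothetical helpers; names and sample values are illustrative only.

# Prints the four variables for a repository location ($1), a GCP project
# ID ($2), and the service account key JSON content ($3).
gar_ci_variables() {
  printf 'GCP_ARTIFACT_REGISTRY_URL=https://%s-docker.pkg.dev\n' "$1"
  printf 'GCP_ARTIFACT_REGISTRY_PROJECT_URI=%s-docker.pkg.dev/%s\n' "$1" "$2"
  # Newlines are stripped because some base64 implementations wrap long output.
  printf 'GCP_ARTIFACT_REGISTRY_PASSWORD=%s\n' "$(printf '%s' "$3" | base64 | tr -d '\n')"
  printf 'GCP_ARTIFACT_REGISTRY_USER=_json_key_base64\n'
}

# Builds a full image reference by appending the repository name ($2),
# image name ($3), and tag ($4) to the project URI ($1).
gar_image_reference() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

gar_ci_variables "us-east1" "my-gcp-project" '{"type": "service_account"}'
gar_image_reference "us-east1-docker.pkg.dev/my-gcp-project" "my-repo" "myapp" "latest"
```

Keeping the repository name as a separate argument mirrors the reasoning above: the project URI stays stable, and the user chooses the repository per command.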

#### UI/UX

This integration will include a dedicated page named "Google Artifact Registry," listed under the "Operate" section of the sidebar. This page will enable users to view the list of all container images in the configured GAR repository. See the [UI/UX](ui_ux.md) page for additional details.

#### GraphQL APIs

*TODO: Describe any GraphQL APIs or changes to existing APIs that will be needed for this integration.*

## Alternative Solutions

### Use Docker/OCI API

One alternative solution considered was to use the Docker/OCI API provided by GAR, as it is a common standard for container registries. This approach would have allowed GitLab to reuse [existing logic](https://gitlab.com/gitlab-org/gitlab/-/blob/20df77103147c0c8ff1c22a888516eba4bab3c46/lib/container_registry/client.rb) for connecting to container registries, which could potentially speed up development. However, there were several drawbacks to this approach:

- **Authentication Complexity**: The API requires authentication tokens, which need to be requested at the [login endpoint](https://docs.docker.com/registry/spec/auth/token). These tokens have limited validity, adding complexity to the authentication process. Handling expiring tokens would have been necessary.

- **Limited Focus**: The API is solely focused on container registry objects, which does not align with the goal of creating a flexible integration framework for adopting additional GAR artifacts (e.g. package registry formats) down the road.

- **Discoverability Limitations**: The API has severe limitations when it comes to discoverability, lacking features like filtering or sorting.

- **Multiple Requests**: To retrieve all the required information about each image, multiple requests to different endpoints (listing tags, obtaining image manifests, and image configuration blobs) would have been necessary, leading to a `1+N` performance issue.

GitLab had previously faced significant challenges with the last two limitations, prompting the development of a custom [GitLab Container Registry API](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs-gitlab/api.md) to address them. Additionally, GitLab decided to [deprecate support](../../../update/deprecations.md#use-of-third-party-container-registries-is-deprecated) for connecting to third-party container registries using the Docker/OCI API due to these same limitations and the increased cost of maintaining two solutions in parallel. As a result, there is an ongoing effort to replace the use of the Docker/OCI API endpoints with custom API endpoints for all container registry functionalities in GitLab.

Considering these factors, the decision was made to build the GAR integration from scratch using the proprietary GAR API. This approach provides more flexibility and control over the integration and can serve as a foundation for future expansions, such as support for other GAR artifact formats.