---
status: proposed
creation-date: "2023-06-09"
authors: [ "@hswimelar" ]
coach: "@grzesiek"
approvers: [ "@trizzi ", "@sgoldstein" ]
owning-stage: "~devops::package"
participating-stages: []
---

<!-- Blueprints often contain forward-looking statements -->
<!-- vale gitlab.FutureTense = NO -->

# Container Registry Self-Managed Database Rollout

## Summary

The latest iteration of the [Container Registry](https://gitlab.com/gitlab-org/container-registry)
has been rearchitected to use a PostgreSQL database and deployed on GitLab.com.
Now we must bring the advantages provided by the database to self-managed users.
While the container registry retains the capacity to run without the new database,
many new and highly desired features cannot be implemented without it.
Additionally, unifying the registry used for GitLab.com and for self-managed
allows us to provide a cohesive user experience and reduces the burden
associated with maintaining the old registry implementation. To accomplish this,
we plan to eventually require all self-managed instances to migrate to the new registry
database, so that we may deprecate and remove support for the old object storage
metadata subsystem.

This document seeks to describe how we may use the proven core migration
functionality, which was used to migrate millions of container images on GitLab.com,
to enable self-managed users to enjoy the benefits of the metadata database.

## Motivation

Enabling self-managed users to migrate to the new metadata database allows these
users to take advantage of the new features that require the database. Additionally,
the greater adoption of the database allows the container registry team to focus
our knowledge and capacity, and will eventually allow us to fully remove the old
registry metadata subsystem, greatly improving the maintainability and stability
of the container registry for both GitLab.com and for self-managed users.

### Goals

- Progressively roll out the new dependency of a PostgreSQL database instance for the registry for charts and omnibus deployments.
- Progressively roll out automation for the registry PostgreSQL database instance for charts and omnibus deployments.
- Develop processes and tools that self-managed admins can use to migrate existing registry deployments to the metadata database.
- Develop processes and tools that self-managed admins can use to spin up fresh installs of the Container Registry which use the metadata database.
- Create a plan which will eventually allow us to fully drop support for the original object storage metadata subsystem.

### Non-Goals

- Developing new Container Registry features outside the scope of enabling admins to migrate to the metadata database.
- Determining lifecycle support decisions, such as when to default to the database, and when to end support for non-database registries.

## Proposal

There are two main components that must be further developed in order for
self-managed admins to move to the registry database: the deployment environment and
the registry migration tooling.

For the deployment environments, we need to document what the user needs to do to set up their
deployment such that the registry has access to a suitable database given the
expected registry workload. We also need to develop tooling and automation to ease
the setup and maintenance of the registry database for new and existing deployments.

For the registry, we need to develop and validate import tooling which
coordinates with the core import functionality which was used to migrate all
container images on GitLab.com. Additionally, we should provide estimated import
times for admins for each supported storage driver.

During the beta phase, we can highlight key features of our work to provide a
quick reference for which features are available now, which are planned, and
their statuses, along with an executive summary of the overall state of the
migration experience. This could be advertised to self-managed users via a simple
chart, allowing them to tell at a glance the status of this project and determine
whether it is feature-complete enough for their needs and level of risk tolerance.

This should be documented in the container registry administration documentation,
rather than in this blueprint. Providing this information there places it in
a familiar location for self-managed admins and allows for logical cross-linking
from other sections of the same document, such as from the garbage collection
section.

For example:

The metadata database is in early beta for self-managed users. The core migration
process for existing registries has been implemented, and online garbage collection
is fully implemented. Certain database-enabled features are only enabled for GitLab.com,
and automatic database provisioning for the registry database is not available.
Please see the table below for the status of features related to the container
registry database.

| Feature                     | Description                                                         | Status             | Link                                                                                           |
| --------------------------- | ------------------------------------------------------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
| Import Tool                 | Allows existing deployments to migrate to the database.             | Completed          | [Import Tool](https://gitlab.com/gitlab-org/container-registry/-/issues/884)                   |
| Automatic Import Validation | Tests that the import maintained data integrity of imported images. | Backlog            | [Validate self-managed imports](https://gitlab.com/gitlab-org/container-registry/-/issues/938) |
| Foo Bar                     | Lorem ipsum dolor sit amet.                                         | Scheduled for 16.5 | <LINK>                                                                                         |

### Structuring Support by Driver

The import operation heavily relies on the object storage driver implementation
to iterate over all registry metadata so that it can be stored in the database.
It's possible that implementation differences in the driver will make a
meaningful impact on the performance and reliability of the import process.

The following two sections briefly summarize several points for and against
structuring support by driver.

#### Arguments Opposed to Structuring Support by Driver

Each storage driver is well abstracted in the code. Specifically, the import process
makes use of the following methods:

- Walk
- List
- GetContent
- Stat
- Reader

Each of these is a read method; we do not need to create or delete data via
the object storage methods. Additionally, all of these methods are standard API
methods.
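
The read-only surface listed above can be sketched as a Go interface. This is a simplified illustration only: the signatures are assumptions and do not reproduce the registry's actual storage driver interface, and `memDriver`, `scan`, and the example paths are hypothetical names introduced here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"sort"
	"strings"
)

// ReadOnlyDriver is a hypothetical, simplified view of the five read
// methods listed above. The real driver signatures may differ.
type ReadOnlyDriver interface {
	Walk(root string, fn func(path string) error) error
	List(dir string) ([]string, error)
	GetContent(path string) ([]byte, error)
	Stat(path string) (int64, error)
	Reader(path string) (io.Reader, error)
}

// memDriver is an in-memory stand-in for a storage backend, keyed by path.
type memDriver map[string][]byte

// Walk calls fn for every stored path under root, in no particular order.
func (m memDriver) Walk(root string, fn func(string) error) error {
	prefix := strings.TrimSuffix(root, "/") + "/"
	for p := range m {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		if err := fn(p); err != nil {
			return err
		}
	}
	return nil
}

// List returns the sorted direct children of dir.
func (m memDriver) List(dir string) ([]string, error) {
	prefix := strings.TrimSuffix(dir, "/") + "/"
	seen := map[string]bool{}
	for p := range m {
		if strings.HasPrefix(p, prefix) {
			seen[prefix+strings.SplitN(p[len(prefix):], "/", 2)[0]] = true
		}
	}
	children := make([]string, 0, len(seen))
	for c := range seen {
		children = append(children, c)
	}
	sort.Strings(children)
	return children, nil
}

// GetContent returns the bytes stored at path.
func (m memDriver) GetContent(path string) ([]byte, error) {
	b, ok := m[path]
	if !ok {
		return nil, errors.New("path not found: " + path)
	}
	return b, nil
}

// Stat reports the size in bytes of the object at path.
func (m memDriver) Stat(path string) (int64, error) {
	b, err := m.GetContent(path)
	return int64(len(b)), err
}

// Reader returns a stream over the object at path.
func (m memDriver) Reader(path string) (io.Reader, error) {
	b, err := m.GetContent(path)
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(b), nil
}

// scan totals object count and size under root using only read calls.
func scan(d ReadOnlyDriver, root string) (files int, size int64, err error) {
	err = d.Walk(root, func(p string) error {
		n, statErr := d.Stat(p)
		if statErr != nil {
			return statErr
		}
		files++
		size += n
		return nil
	})
	return files, size, err
}

func main() {
	d := memDriver{
		"/repos/a/tags/latest": []byte("sha256:aaa"),
		"/repos/b/tags/v1":     []byte("sha256:bbbb"),
	}
	fmt.Println(scan(d, "/repos"))
}
```

Because `scan` accepts only the read-only interface, the compiler rules out any write or delete call on that code path, which mirrors the argument that the import does not mutate object storage.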

Given that we're not mutating data via object storage as part of the import
process, we should not need to double-check these drivers or try to predict
potential errors. Relying on user feedback during the beta to direct any efforts
we should be making here could prevent us from scheduling unnecessary work.

#### Arguments in Favor of Structuring Support by Driver

Our experience with enhancing and supporting offline garbage collection has
shown that while the storage driver implementation should not matter, it does.
The drivers have proven to have important differences in performance and
reliability. Many of the planned possible driver-related improvements are
related to testing and stability, rather than outright new work for each driver.

In particular, retries and error reporting across storage drivers are not as
standardized as one would hope for, and therefore there is a potential that a
long-running import process could be interrupted by an error that could have
been retried.

Creating import estimates based on combinations of registry size and storage
driver would also be of use to self-managed admins looking to schedule their
migration. There will be a difference here between local filesystem storage and
object storage, and there could be a difference between the object storage
providers as well.

Also, we could work with the importer to smooth out the differences in the
storage drivers. Even without unified retryable error reporting from the storage
drivers, we could have the importer retry more times and for more errors. There's
a risk we would retry several times on non-retryable errors, but since no writes
are being made to object storage, this should not ultimately be harmful.

Additionally, implementing [Validate self-managed imports](https://gitlab.com/gitlab-org/container-registry/-/issues/938)
would perform a consistency check against a sample of images before and after
import, which would lead to greater consistency across all storage driver implementations.
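
As a rough illustration of a sample-based consistency check, the sketch below compares the digest recorded for each sampled reference before and after an import. It does not reflect the design in the linked issue: the `snapshot` type, the `mismatches` function, and the example references are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot maps an image reference ("repository:tag") to the manifest
// digest it resolves to. This shape is an assumption for illustration.
type snapshot map[string]string

// mismatches returns the sampled references that are missing from either
// snapshot or that resolve to different digests, in sorted order.
func mismatches(before, after snapshot, sample []string) []string {
	var bad []string
	for _, ref := range sample {
		b, okBefore := before[ref]
		a, okAfter := after[ref]
		if !okBefore || !okAfter || a != b {
			bad = append(bad, ref)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	before := snapshot{"app:latest": "sha256:aaa", "app:v1": "sha256:bbb"}
	after := snapshot{"app:latest": "sha256:aaa", "app:v1": "sha256:ccc"}
	fmt.Println(mismatches(before, after, []string{"app:latest", "app:v1"}))
}
```

An empty result for the sample would indicate that the sampled images resolve identically through both metadata backends.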

## Design and Implementation Details

### The Import Tool

The [import tool](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs-gitlab/database-import-tool.md)
is a well-validated component of the Container Registry project that we have used
from the beginning as a way to perform local testing. This tool is a thin wrapper
over the core import functionality; the code that handles the import logic has
been extensively validated.

While the core import functionality is solid, we must ensure that this tool and
the surrounding process will enable non-expert users to import their registries
with both minimal risk and with minimal support from GitLab team members.
Therefore, the most important work remaining is crafting the UX of this tooling
such that those goals are met. This
[epic](https://gitlab.com/groups/gitlab-org/-/epics/8602) captures many of the
proposed improvements.

#### Design

The tool is designed such that a single execution flow supports two kinds of
users: those with large registries and strict uptime requirements, who can take
advantage of a more involved process to reduce read-only time to the absolute
minimum, and those with small registries, who benefit from a streamlined
workflow. This is achieved via the same pre-import, then full import cycle that
was used on GitLab.com, along with an additional step to catalog all
unreferenced blobs held in common storage.

##### One-Shot Import

In most cases, a user can simply choose to run the import tool while the registry
is offline or in read-only mode. This will be similar to what admins must
already do in order to run offline garbage collection. Each step completes in
sequence, moving directly to the next. The command exits when the import process
is complete and the registry is ready to make full use of the metadata database.

##### Minimal Downtime Import

For users with large registries who are interested in the minimum possible
downtime, each step can be run independently when the tool is passed the appropriate
flag. The user will first run the pre-import step while the registry is
performing its usual workload. Once that has completed, and the user is ready
to stop writes to the registry, the tag import step can be run. As with the GitLab.com
migration, importing tags requires that the registry be offline or in
read-only mode. This step does the minimum possible work to achieve fast and
efficient tag imports and will always be the fastest of the three steps, reducing
the downtime component to a fraction of the total import time. The user can then
bring up the registry configured to use the metadata database. After that, the
user is free to run the third step during standard registry operations. This step
makes any dangling blobs in common storage visible to the database and therefore
the online garbage collection process.
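
The difference between the two workflows can be sketched as the ordered list of operator actions each one implies. This is a simplified model under stated assumptions: the step labels are descriptive names rather than the import tool's actual flags, `plan` is a hypothetical helper, and "disable read-only mode" stands in for bringing the registry back up configured to use the metadata database.

```go
package main

import "fmt"

// step is one phase of the import. The names are illustrative labels,
// not the import tool's actual flags.
type step struct {
	name         string
	needReadOnly bool // true if writes to the registry must be stopped
}

// importSteps lists the three phases in the order described above.
var importSteps = []step{
	{"pre-import", false},
	{"tag import", true},
	{"blob catalog", false},
}

// plan returns the ordered operator actions for either import style.
// In a one-shot import the registry stays read-only for every step;
// otherwise read-only mode is held only around the steps that need it.
func plan(oneShot bool) []string {
	var actions []string
	readOnly := false
	setReadOnly := func(want bool) {
		if want == readOnly {
			return
		}
		readOnly = want
		if want {
			actions = append(actions, "enable read-only mode")
		} else {
			actions = append(actions, "disable read-only mode")
		}
	}
	for _, s := range importSteps {
		setReadOnly(oneShot || s.needReadOnly)
		actions = append(actions, "run "+s.name)
	}
	setReadOnly(false)
	return actions
}

func main() {
	for _, oneShot := range []bool{true, false} {
		fmt.Printf("one-shot=%v\n", oneShot)
		for _, a := range plan(oneShot) {
			fmt.Println("  " + a)
		}
	}
}
```

In the minimal downtime plan, only the tag import falls inside the read-only window, while the one-shot plan holds the window open across all three steps.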

### Distribution Paths

Tooling, process, and documentation will need to be developed in order to
support users who wish to use the metadata database, especially with regard to
providing a foundation for the new database instance required for the migration.

For new deployments, we should wait until we've moved to general support, have
automation in place for the registry database and migration, and have a major
GitLab version bump before enabling the database by default for self-managed.

#### Omnibus

#### Charts

## Alternative Solutions

### Do Nothing

#### Pros

- The database and associated features are generally most useful for large-scale, high-availability deployments.
- Eliminate the need to support an additional logical or physical database for self-managed deployments.

#### Cons

- The registry on GitLab.com and the registry used by self-managed will greatly diverge in supported features over time.
- The maintenance burden of supporting two registry implementations will reduce the velocity at which new registry features can be released.
- The registry on GitLab.com stops being an effective way to validate changes before they are released to self-managed.
- Large self-managed users remain unable to scale the registry to suit their needs.

### Gradual Migration

This approach would be to exactly replicate the GitLab.com migration on
self-managed.

#### Pros

- Replicate an already successful process.
- Scope downtime by repository, rather than instance.

#### Cons

- Dramatically increased complexity in all aspects of the migration process.
- Greatly increased possibility of data consistency issues.
- Less clear demarcation of registry migration progress.