Welcome to mirror list, hosted at ThFree Co, Russian Federation.

repository_storage_types.md « administration « doc - gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
blob: 3bd73b4df944f706f95b19f161a9586836799cd7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
---
stage: Systems
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Repository storage types **(FREE SELF)**

GitLab can be configured to use one or multiple repository storages. These storages can be:

- Accessed via [Gitaly](gitaly/index.md), optionally on
  [its own server](gitaly/configure_gitaly.md#run-gitaly-on-its-own-server).
- Mounted to the local disk. This [method](repository_storage_paths.md#configure-repository-storage-paths)
  is deprecated and [scheduled to be removed](https://gitlab.com/groups/gitlab-org/-/epics/2320) in
  GitLab 14.0.
- Exposed as an NFS shared volume. This method is deprecated and
  [scheduled to be removed](https://gitlab.com/groups/gitlab-org/-/epics/3371) in GitLab 14.0.

In GitLab:

- Repository storages are configured in:
  - `/etc/gitlab/gitlab.rb` by the `git_data_dirs({})` configuration hash for Linux package installations.
  - `gitlab.yml` by the `repositories.storages` key for self-compiled installations.
- The `default` repository storage is available in any installations that haven't customized it. By
  default, it points to a Gitaly node.

The repository storage types documented here apply to any repository storage defined in
`git_data_dirs({})` or `repositories.storages`.

## Hashed storage

> - [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28283) in GitLab 10.0.
> - Made the default for new installations in GitLab 12.0.
> - Enabled by default for new and renamed projects in GitLab 13.0.

Hashed storage stores projects on disk in a location based on a hash of the project's ID. Hashed
storage is different to [legacy storage](#legacy-storage) where a project is stored based on:

- The project's URL.
- The folder structure where the repository is stored on disk.

This makes the folder structure immutable and eliminates the need to synchronize state from URLs to
disk structure. This means that renaming a group, user, or project:

- Costs only the database transaction.
- Takes effect immediately.

The hash also helps spread the repositories more evenly on the disk. The top-level directory
contains fewer folders than the total number of top-level namespaces.

The hash format is based on the hexadecimal representation of a SHA256, calculated with
`SHA256(project.id)`. The top-level folder uses the first two characters, followed by another folder
with the next two characters. They are both stored in a special `@hashed` folder so they can
co-exist with existing legacy storage projects. For example:

```ruby
# Project's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"

# Wiki's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
```

### Translate hashed storage paths

Troubleshooting problems with the Git repositories, adding hooks, and other tasks requires you
translate between the human-readable project name and the hashed storage path. You can translate:

- From a [project's name to its hashed path](#from-project-name-to-hashed-path).
- From a [hashed path to a project's name](#from-hashed-path-to-project-name).

#### From project name to hashed path

Administrators can look up a project's hashed path from its name or ID using:

- The [Admin Area](../user/admin_area/index.md#administering-projects).
- A Rails console.

To look up a project's hash path in the Admin Area:

1. On the left sidebar, expand the top-most chevron (**{chevron-down}**).
1. Select **Admin Area**.
1. On the left sidebar, select **Overview > Projects** and select the project.

The **Gitaly relative path** is displayed there and looks similar to:

```plaintext
"@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
```

To look up a project's hash path using a Rails console:

1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
1. Run a command similar to this example (use either the project's ID or its name):

   ```ruby
   Project.find(16).disk_path
   Project.find_by_full_path('group/project').disk_path
   ```

#### From hashed path to project name

Administrators can look up a project's name from its hashed storage path using:

- A Rails console.
- The `config` file in the `*.git` directory.

To look up a project's name using the Rails console:

1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
1. Run a command similar to this example:

   ```ruby
   ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
   ```

The quoted string in that command is the directory tree you can find on your GitLab server. For
example, on a default Linux package installation this would be `/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`
with `.git` from the end of the directory name removed.

The output includes the project ID and the project name. For example:

```plaintext
=> #<Project id:16 it/supportteam/ticketsystem>
```

To look up a project's name using the `config` file in the `*.git` directory:

1. Locate the `*.git` directory. This directory is located in `/var/opt/gitlab/git-data/repositories/@hashed/`, where the first four
   characters of the hash are the first two directories in the path under `@hashed/`. For example, on a default Linux package installation the
   `*.git` directory of the hash `b17eb17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9` would be
   `/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`.
1. Open the `config` file and locate the `fullpath=` key under `[gitlab]`.

### Hashed object pools

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/1606) in GitLab 12.1.

Object pools are repositories used to deduplicate forks of public and internal projects and
contain the objects from the source project. Using `objects/info/alternates`, the source project and
forks use the object pool for shared objects. For more information, see
[How Git object deduplication works in GitLab](../development/git_object_deduplication.md).

Objects are moved from the source project to the object pool when housekeeping is run on the source
project. Object pool repositories are stored similarly to regular repositories in a directory called `@pools` instead of `@hashed`

```ruby
# object pool paths
"@pools/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
```

WARNING:
Do not run `git prune` or `git gc` in object pool repositories, which are stored in the `@pools` directory.
This can cause data loss in the regular repositories that depend on the object pool.

### Group wiki storage

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/13195) in GitLab 13.5.

Unlike project wikis that are stored in the `@hashed` directory, group wikis are stored in a directory called `@groups`.
Like project wikis, group wikis follow the hashed storage folder convention, but use a hash of the group ID rather than the project ID.

For example:

```ruby
# group wiki paths
"@groups/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
```

### Gitaly Cluster storage

If Gitaly Cluster is used, Praefect manages storage locations. For more information, see [Praefect-generated replica paths](gitaly/index.md#praefect-generated-replica-paths-gitlab-150-and-later).

### Object storage support

This table shows which storable objects are storable in each storage type:

| Storable object  | Legacy storage | Hashed storage | S3 compatible | GitLab version |
|:-----------------|:---------------|:---------------|:--------------|:---------------|
| Repository       | Yes            | Yes            | -             | 10.0           |
| Attachments      | Yes            | Yes            | -             | 10.2           |
| Avatars          | Yes            | No             | -             | -              |
| Pages            | Yes            | No             | -             | -              |
| Docker Registry  | Yes            | No             | -             | -              |
| CI/CD job logs   | No             | No             | -             | -              |
| CI/CD artifacts  | No             | No             | Yes           | 9.4 / 10.6     |
| CI/CD cache      | No             | No             | Yes           | -              |
| LFS objects      | Yes            | Similar        | Yes           | 10.0 / 10.7    |
| Repository pools | No             | Yes            | -             | 11.6           |

Files stored in an S3-compatible endpoint can have the same advantages as
[hashed storage](#hashed-storage), as long as they are not prefixed with
`#{namespace}/#{project_name}`. This is true for CI/CD cache and LFS objects.

#### Avatars

Each file is stored in a directory that matches the `id` assigned to it in the database. The
filename is always `avatar.png` for user avatars. When an avatar is replaced, the `Upload` model is
destroyed and a new one takes place with a different `id`.

#### CI/CD artifacts

CI/CD artifacts are:

- S3-compatible since GitLab 9.4, initially available in [GitLab Premium](https://about.gitlab.com/pricing/).
- Available in [GitLab Free](https://about.gitlab.com/pricing/) since GitLab 10.6.

#### LFS objects

[LFS Objects in GitLab](../topics/git/lfs/index.md) implement a similar
storage pattern using two characters and two-level folders, following the Git implementation:

```ruby
"shared/lfs-objects/#{oid[0..1}/#{oid[2..3]}/#{oid[4..-1]}"

# Based on object `oid`: `8909029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c`, path will be:
"shared/lfs-objects/89/09/029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c"
```

LFS objects are also [S3-compatible](lfs/index.md#storing-lfs-objects-in-remote-object-storage).

## Legacy storage

WARNING:
In GitLab 13.0, legacy storage is deprecated. If you haven't migrated to hashed storage yet, check
the [migration instructions](raketasks/storage.md#migrate-to-hashed-storage). Support for legacy
storage is [scheduled to be removed](https://gitlab.com/gitlab-org/gitaly/-/issues/1690) in GitLab
14.0. In GitLab 13.0 and later, switching new projects to legacy storage is not possible. The
option to choose between hashed and legacy storage in the Admin Area is disabled.

Legacy storage was the storage behavior prior to version GitLab 10.0. For historical reasons,
GitLab replicated the same mapping structure from the projects URLs:

- Project's repository: `#{namespace}/#{project_name}.git`.
- Project's wiki: `#{namespace}/#{project_name}.wiki.git`.

This structure enabled you to migrate from existing solutions to GitLab, and for Administrators to
find where the repository was stored. This approach also had some drawbacks:

- Storage location concentrated a large number of top-level namespaces. The impact could be
  reduced by [multiple repository storage paths](repository_storage_paths.md).
- Because backups were a snapshot of the same URL mapping, if you tried to recover a very old
  backup, you needed to verify whether any project had taken the place of an old removed or renamed
  project sharing the same URL. This meant that `mygroup/myproject` from your backup may not have
  been the same original project that was at that same URL today.
- Any change in the URL needed to be reflected on disk (when groups, users, or projects were
  renamed. This could add a lot of load in big installations, especially if using any type of
  network-based file system.