Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/repository_storage_types.md')
-rw-r--r--doc/administration/repository_storage_types.md172
1 files changed, 63 insertions, 109 deletions
diff --git a/doc/administration/repository_storage_types.md b/doc/administration/repository_storage_types.md
index 2e2ed431c8b..a95178c01e2 100644
--- a/doc/administration/repository_storage_types.md
+++ b/doc/administration/repository_storage_types.md
@@ -1,16 +1,15 @@
-# Repository Storage Types
+# Repository storage types **(CORE ONLY)**
-> [Introduced][ce-28283] in GitLab 10.0.
-
-Two different storage layouts can be used
-to store the repositories on disk and their characteristics.
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/issues/28283) in GitLab 10.0.
+> - Hashed storage became the default for new installations in GitLab 12.0
+> - Hashed storage is enabled by default for new and renamed projects in GitLab 13.0.
GitLab can be configured to use one or multiple repository storage paths/shard
locations that can be:
- Mounted to the local disk
- Exposed as an NFS shared volume
-- Accessed via [Gitaly] on its own machine.
+- Accessed via [Gitaly](gitaly/index.md) on its own machine.
In GitLab, this is configured in `/etc/gitlab/gitlab.rb` by the `git_data_dirs({})`
configuration hash. The storage layouts discussed here will apply to any shard
@@ -20,40 +19,17 @@ The `default` repository shard that is available in any installations
that haven't customized it, points to the local folder: `/var/opt/gitlab/git-data`.
Anything discussed below is expected to be part of that folder.
-## Legacy Storage
-
-Legacy Storage is the storage behavior prior to version 10.0. For historical
-reasons, GitLab replicated the same mapping structure from the projects URLs:
+## Hashed storage
-- Project's repository: `#{namespace}/#{project_name}.git`
-- Project's wiki: `#{namespace}/#{project_name}.wiki.git`
+NOTE: **Note:**
+In GitLab 13.0, hashed storage is enabled by default and the legacy storage is
+deprecated. Support for legacy storage will be removed in GitLab 14.0.
+If you haven't migrated yet, check the
+[migration instructions](raketasks/storage.md#migrate-to-hashed-storage).
+The option to choose between hashed and legacy storage in the admin area has
+been disabled.
-This structure made it simple to migrate from existing solutions to GitLab and
-easy for Administrators to find where the repository is stored.
-
-On the other hand this has some drawbacks:
-
-Storage location will concentrate huge amount of top-level namespaces. The
-impact can be reduced by the introduction of
-[multiple storage paths](repository_storage_paths.md).
-
-Because backups are a snapshot of the same URL mapping, if you try to recover a
-very old backup, you need to verify whether any project has taken the place of
-an old removed or renamed project sharing the same URL. This means that
-`mygroup/myproject` from your backup may not be the same original project that
-is at that same URL today.
-
-Any change in the URL will need to be reflected on disk (when groups / users or
-projects are renamed). This can add a lot of load in big installations,
-especially if using any type of network based filesystem.
-
-## Hashed Storage
-
-CAUTION: **Important:**
-Geo requires Hashed Storage since 12.0. If you haven't migrated yet,
-check the [migration instructions](#how-to-migrate-to-hashed-storage) ASAP.
-
-Hashed Storage is the new storage behavior we rolled out with 10.0. Instead
+Hashed storage is the storage behavior we rolled out with 10.0. Instead
of coupling project URL and the folder structure where the repository will be
stored on disk, we are coupling a hash, based on the project's ID. This makes
the folder structure immutable, and therefore eliminates any requirement to
@@ -124,7 +100,7 @@ GitLab server. For example, on a default Omnibus installation this would be
`/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`
with `.git` from the end of the directory name removed.
-The output includes the project id and the project name:
+The output includes the project ID and the project name:
```plaintext
=> #<Project id:16 it/supportteam/ticketsystem>
@@ -134,6 +110,11 @@ The output includes the project id and the project name:
> [Introduced](https://gitlab.com/gitlab-org/gitaly/issues/1606) in GitLab 12.1.
+DANGER: **Danger:**
+Do not run `git prune` or `git gc` in pool repositories! This can
+cause data loss in "real" repositories that depend on the pool in
+question.
+
Forks of public projects are deduplicated by creating a third repository, the
object pool, containing the objects from the source project. Using
`objects/info/alternates`, the source project and forks use the object pool for
@@ -145,71 +126,15 @@ when housekeeping is run on the source project.
"@pools/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
```
-DANGER: **Danger:**
-Do not run `git prune` or `git gc` in pool repositories! This can
-cause data loss in "real" repositories that depend on the pool in
-question.
-
-### How to migrate to Hashed Storage
-
-To start a migration, enable Hashed Storage for new projects:
-
-1. Go to **Admin > Settings > Repository** and expand the **Repository Storage** section.
-1. Select the **Use hashed storage paths for newly created and renamed projects** checkbox.
-
-Check if the change breaks any existing integration you may have that
-either runs on the same machine as your repositories are located, or may login to that machine
-to access data (for example, a remote backup solution).
-
-To schedule a complete rollout, see the
-[Rake task documentation for storage migration][rake/migrate-to-hashed] for instructions.
-
-If you do have any existing integration, you may want to do a small rollout first,
-to validate. You can do so by specifying a range with the operation.
-
-This is an example of how to limit the rollout to Project IDs 50 to 100, running in
-an Omnibus GitLab installation:
-
-```shell
-sudo gitlab-rake gitlab:storage:migrate_to_hashed ID_FROM=50 ID_TO=100
-```
-
-Check the [documentation][rake/migrate-to-hashed] for additional information and instructions for
-source-based installation.
-
-#### Rollback
-
-Similar to the migration, to disable Hashed Storage for new
-projects:
-
-1. Go to **Admin > Settings > Repository** and expand the **Repository Storage** section.
-1. Uncheck the **Use hashed storage paths for newly created and renamed projects** checkbox.
-
-To schedule a complete rollback, see the
-[Rake task documentation for storage rollback](raketasks/storage.md#rollback-from-hashed-storage-to-legacy-storage) for instructions.
-
-The rollback task also supports specifying a range of Project IDs. Here is an example
-of limiting the rollout to Project IDs 50 to 100, in an Omnibus GitLab installation:
-
-```shell
-sudo gitlab-rake gitlab:storage:rollback_to_legacy ID_FROM=50 ID_TO=100
-```
+### Hashed storage coverage migration
-If you have a Geo setup, please note that the rollback will not be reflected automatically
-on the **secondary** node. You may need to wait for a backfill operation to kick-in and remove
-the remaining repositories from the special `@hashed/` folder manually.
-
-### Hashed Storage coverage
-
-We are incrementally moving every storable object in GitLab to the Hashed
-Storage pattern. You can check the current coverage status below (and also see
-the [issue][ce-2821]).
-
-Note that things stored in an S3 compatible endpoint will not have the downsides
+Files stored in an S3 compatible endpoint will not have the downsides
mentioned earlier, if they are not prefixed with `#{namespace}/#{project_name}`,
which is true for CI Cache and LFS Objects.
-| Storable Object | Legacy Storage | Hashed Storage | S3 Compatible | GitLab Version |
+In the table below, you can find the coverage of the migration to the hashed storage.
+
+| Storable Object | Legacy storage | Hashed storage | S3 Compatible | GitLab Version |
| --------------- | -------------- | -------------- | ------------- | -------------- |
| Repository | Yes | Yes | - | 10.0 |
| Attachments | Yes | Yes | - | 10.2 |
@@ -222,18 +147,16 @@ which is true for CI Cache and LFS Objects.
| LFS Objects | Yes | Similar | Yes | 10.0 / 10.7 |
| Repository pools| No | Yes | - | 11.6 |
-#### Implementation Details
-
-##### Avatars
+#### Avatars
Each file is stored in a folder with its `id` from the database. The filename is always `avatar.png` for user avatars.
When avatar is replaced, `Upload` model is destroyed and a new one takes place with different `id`.
-##### CI Artifacts
+#### CI artifacts
CI Artifacts are S3 compatible since **9.4** (GitLab Premium), and available in GitLab Core since **10.6**.
-##### LFS Objects
+#### LFS objects
[LFS Objects in GitLab](../topics/git/lfs/index.md) implement a similar
storage pattern using 2 chars, 2 level folders, following Git's own implementation:
@@ -247,7 +170,38 @@ storage pattern using 2 chars, 2 level folders, following Git's own implementati
LFS objects are also [S3 compatible](lfs/index.md#storing-lfs-objects-in-remote-object-storage).
-[ce-2821]: https://gitlab.com/gitlab-com/infrastructure/issues/2821
-[ce-28283]: https://gitlab.com/gitlab-org/gitlab-foss/issues/28283
-[rake/migrate-to-hashed]: raketasks/storage.md#migrate-existing-projects-to-hashed-storage
-[gitaly]: gitaly/index.md
+## Legacy storage
+
+NOTE: **Deprecated:**
+In GitLab 13.0, hashed storage is enabled by default and the legacy storage is
+deprecated. If you haven't migrated yet, check the
+[migration instructions](raketasks/storage.md#migrate-to-hashed-storage).
+Support for legacy storage will be removed in GitLab 14.0. If you're on GitLab
+13.0 and later, switching new projects to legacy storage is not possible.
+The option to choose between hashed and legacy storage in the admin area has
+been disabled.
+
+Legacy storage is the storage behavior prior to version 10.0. For historical
+reasons, GitLab replicated the same mapping structure from the projects URLs:
+
+- Project's repository: `#{namespace}/#{project_name}.git`
+- Project's wiki: `#{namespace}/#{project_name}.wiki.git`
+
+This structure made it simple to migrate from existing solutions to GitLab and
+easy for Administrators to find where the repository is stored.
+
+On the other hand this has some drawbacks:
+
+Storage location will concentrate huge amount of top-level namespaces. The
+impact can be reduced by the introduction of
+[multiple storage paths](repository_storage_paths.md).
+
+Because backups are a snapshot of the same URL mapping, if you try to recover a
+very old backup, you need to verify whether any project has taken the place of
+an old removed or renamed project sharing the same URL. This means that
+`mygroup/myproject` from your backup may not be the same original project that
+is at that same URL today.
+
+Any change in the URL will need to be reflected on disk (when groups / users or
+projects are renamed). This can add a lot of load in big installations,
+especially if using any type of network based filesystem.