gitlab.com/gitlab-org/gitaly.git
author James Fargher <jfargher@gitlab.com> 2021-10-29 04:22:54 +0300
committer James Fargher <jfargher@gitlab.com> 2021-11-04 23:54:20 +0300
commit b6758c626e9f8842a93ba494eae0b814c35aac77
tree 7044e32d79d006386bef0347c9eb62a251f4d9fe
parent 0764a2bf28acd5e7f596622230ecac40985439ed
Add instructions on how to create/restore backups using gitaly-backup
-rw-r--r-- doc/backups.md 107
-rw-r--r-- doc/gitaly-backup.md 206
2 files changed, 206 insertions, 107 deletions
diff --git a/doc/backups.md b/doc/backups.md
deleted file mode 100644
index d0e6e5c63..000000000
--- a/doc/backups.md
+++ /dev/null
@@ -1,107 +0,0 @@
-# Repository Backups
-
-The `gitaly-backup` command creates repository backups.
-
-## Legacy layout
-
-This layout is designed to be identical to historic `backup.rake` repository
-backups. Repository data is stored in bundle files in a pre-determined
-directory structure based on each repository's relative path. This directory
-structure is then archived into a tar file by `backup.rake`. Each time a backup
-is created, this entire directory structure is recreated.
-
-For example, a repository with the relative path of
-`@hashed/4e/c9/4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.git`
-creates the following structure:
-
-```text
-$BACKUP_DESTINATION_PATH/
-  @hashed/
-    4e/
-      c9/
-        4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.bundle
-```
-
-
-### Generating full backups
-
-A bundle with all references is created via the RPC `CreateBundle`. It
-effectively executes the following:
-
-```shell
-git bundle create repo.bundle --all
-```
-
-### Generating incremental backups
-
-This layout does not support incremental backups.
-
-## Pointer layout
-
-This layout is designed to support incremental backups. Each repository backup
-cannot overwrite a previous backup because this would leave dangling incremental
-backups. To prevent dangling incremental backups, every new full backup is put into a new directory.
-The two files called `LATEST` point to:
-
-- The latest full backup.
-- The latest increment of that full backup.
-
-These pointer files enable looking up
-backups from object storage without needing directory traversal (directory
-traversal typically requires additional permissions). In addition to the bundle
-files, each backup writes a full list of refs and their target object IDs.
-
-When the pointer files are not found, the pointer layout will fall back to
-using the legacy layout.
-
-For example, a repository with the relative path of
-`@hashed/4e/c9/4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.git`
-and a backup ID of `20210930065413` will create the following structure:
-
-```text
-$BACKUP_DESTINATION_PATH/
-  @hashed/
-    4e/
-      c9/
-        4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a/
-          LATEST
-          20210930065413/
-            001.bundle
-            001.refs
-            LATEST
-```
-
-### Generating full backups
-
-1. A full list of references is retrieved via the RPC `ListRefs`. This list is written to `001.refs` in the same format as [`git-show-ref`](https://git-scm.com/docs/git-show-ref#_output).
-
-1. A bundle is generated using the retrieved reference names. Effectively, by running:
-
- ```shell
- awk '{print $2}' 001.refs | git bundle create repo.bundle --stdin
- ```
-1. The backup and increment pointers are written.
-
-### Generating incremental backups
-
-1. The next increment is calculated by finding the increment `LATEST` file and
- adding 1. For example, `001` + `1` = `002`.
-
-1. A full list of references is retrieved using the `ListRefs` RPC. This list is
- written to the calculated next increment (for example, `002.refs`) in the same
- format as [`git-show-ref`](https://git-scm.com/docs/git-show-ref#_output).
-
-1. The full list of the previous increments references is retrieved by reading
- the file. For example, `001.refs`.
-
-1. A bundle is generated using the negated list of reference targets of the
- previous increment and the new list of retrieved reference names
- by effectively running:
-
- ```shell
- { awk '{print "^" $1}' 001.refs; awk '{print $2}' 002.refs; } | git bundle create repo.bundle --stdin
- ```
-
- Negating the object IDs from the previous increment ensures that we stop
- traversing commits when we reach the HEAD of the branch at the time of the
- last incremental backup.
diff --git a/doc/gitaly-backup.md b/doc/gitaly-backup.md
new file mode 100644
index 000000000..177643be9
--- /dev/null
+++ b/doc/gitaly-backup.md
@@ -0,0 +1,206 @@
+# `gitaly-backup`
+
+`gitaly-backup` is used to create and restore backups of the Git repository
+data from Gitaly and Gitaly Cluster.
+
+## Directly back up repository data
+
+1. For each project to back up, find the Gitaly storage name and relative or disk path using either:
+ - The [Admin area](https://docs.gitlab.com/ee/administration/repository_storage_types.html#from-project-name-to-hashed-path).
+ - The [repository storage API](https://docs.gitlab.com/ee/api/projects.html#get-the-path-to-repository-storage).
+
+1. Generate the backup job file. The job file consists of a series of JSON objects separated by a newline (`\n`).
+
+ | Attribute | Type | Required | Description |
+ |:--------------------|:---------|:---------|:------------|
+ | `address` | string | yes | Address of the Gitaly or Gitaly Cluster server. |
+ | `token` | string | yes | Authentication token for the Gitaly server. |
+ | `storage_name` | string | yes | Name of the storage where the repository is stored. |
+ | `relative_path` | string | yes | Relative path of the repository. |
+ | `gl_project_path` | string | no | Name of the project. Used for logging. |
+
+ For example, `backup_job.json`:
+
+ ```json
+ {
+ "address":"unix:/var/opt/gitlab/gitaly.socket",
+ "token":"",
+ "storage_name":"default",
+ "relative_path":"@hashed/f5/ca/f5ca38f748a1d6eaf726b8a42fb575c3c71f1864a8143301782de13da2d9202b.git",
+ "gl_project_path":"diaspora/diaspora-client"
+ }
+ {
+ "address":"unix:/var/opt/gitlab/gitaly.socket",
+ "token":"",
+ "storage_name":"default",
+ "relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git",
+ "gl_project_path":"brightbox/puppet"
+ }
+ ```
+
+1. Pipe the backup job file to `gitaly-backup create`.
+
+ ```shell
+ /opt/gitlab/embedded/bin/gitaly-backup create -path $BACKUP_DESTINATION_PATH < backup_job.json
+ ```
+
+ | Argument | Type | Required | Description |
+ |:----------------------|:----------|:---------|:------------|
+ | `-path` | string | yes | Directory where the backup files will be created. |
+ | `-parallel` | integer | no | Maximum number of parallel backups. |
+ | `-parallel-storage` | integer | no | Maximum number of parallel backups per storage. |
+ | `-id` | string | no | Used by the locator to determine a unique path for the backup when a full backup is created. |
+ | `-locator` | string | no | Determines the file-system layout. Any of `legacy`, `pointer` (default `legacy`). Note: The feature is not ready for production use. |
+ | `-incremental` | bool | no | Determines if an incremental backup should be created. Note: The feature is not ready for production use. |
+
+## Directly restore repository data
+
+1. For each project to restore, find the Gitaly storage name and relative or disk path using either:
+ - The [Admin area](https://docs.gitlab.com/ee/administration/repository_storage_types.html#from-project-name-to-hashed-path).
+ - The [repository storage API](https://docs.gitlab.com/ee/api/projects.html#get-the-path-to-repository-storage).
+
+1. Generate the restore job file. The job file consists of a series of JSON objects separated by a newline (`\n`).
+
+ | Attribute | Type | Required | Description |
+ |:--------------------|:---------|:---------|:------------|
+ | `address` | string | yes | Address of the Gitaly or Gitaly Cluster server. |
+ | `token` | string | yes | Authentication token for the Gitaly server. |
+ | `storage_name` | string | yes | Name of the storage where the repository is stored. |
+ | `relative_path` | string | yes | Relative path of the repository. |
+ | `gl_project_path` | string | no | Name of the project. Used for logging. |
+
+ For example, `restore_job.json`:
+
+ ```json
+ {
+ "address":"unix:/var/opt/gitlab/gitaly.socket",
+ "token":"",
+ "storage_name":"default",
+ "relative_path":"@hashed/f5/ca/f5ca38f748a1d6eaf726b8a42fb575c3c71f1864a8143301782de13da2d9202b.git",
+ "gl_project_path":"diaspora/diaspora-client"
+ }
+ {
+ "address":"unix:/var/opt/gitlab/gitaly.socket",
+ "token":"",
+ "storage_name":"default",
+ "relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git",
+ "gl_project_path":"brightbox/puppet"
+ }
+ ```
+
+1. Pipe the restore job file to `gitaly-backup restore`.
+
+ ```shell
+ /opt/gitlab/embedded/bin/gitaly-backup restore -path $BACKUP_SOURCE_PATH < restore_job.json
+ ```
+
+ | Argument | Type | Required | Description |
+ |:----------------------|:----------|:---------|:------------|
+ | `-path` | string | yes | Directory where the backup files are stored. |
+ | `-parallel` | integer | no | Maximum number of parallel restores. |
+ | `-parallel-storage` | integer | no | Maximum number of parallel restores per storage. |
+ | `-locator` | string | no | Determines the file-system layout. Any of `legacy`, `pointer` (default `legacy`). Note: The feature is not ready for production use. |
+
+## How Git repository backups work
+
+The way backup files are arranged on the filesystem or in object storage is determined by the layout.
+
+### Legacy layout
+
+This layout is designed to be identical to historic `backup.rake` repository
+backups. Repository data is stored in bundle files in a pre-determined
+directory structure based on each repository's relative path. This directory
+structure is then archived into a tar file by `backup.rake`. Each time a backup
+is created, this entire directory structure is recreated.
+
+For example, a repository with the relative path of
+`@hashed/4e/c9/4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.git`
+creates the following structure:
+
+```text
+$BACKUP_DESTINATION_PATH/
+  @hashed/
+    4e/
+      c9/
+        4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.bundle
+```
+
+
+#### Generating full backups
+
+A bundle with all references is created via the RPC `CreateBundle`. It
+effectively executes the following:
+
+```shell
+git bundle create repo.bundle --all
+```
+
+#### Generating incremental backups
+
+This layout does not support incremental backups.
+
+### Pointer layout
+
+This layout is designed to support incremental backups. Each repository backup
+cannot overwrite a previous backup because this would leave dangling incremental
+backups. To prevent dangling incremental backups, every new full backup is put into a new directory.
+The two files called `LATEST` point to:
+
+- The latest full backup.
+- The latest increment of that full backup.
+
+These pointer files enable looking up
+backups from object storage without needing directory traversal (directory
+traversal typically requires additional permissions). In addition to the bundle
+files, each backup writes a full list of refs and their target object IDs.
+
+When the pointer files are not found, the pointer layout will fall back to
+using the legacy layout.
+
+For example, a repository with the relative path of
+`@hashed/4e/c9/4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.git`
+and a backup ID of `20210930065413` will create the following structure:
+
+```text
+$BACKUP_DESTINATION_PATH/
+  @hashed/
+    4e/
+      c9/
+        4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a/
+          LATEST
+          20210930065413/
+            001.bundle
+            001.refs
+            LATEST
+```
+
+#### Generating full backups
+
+1. A full list of references is retrieved via the RPC `ListRefs`. This list is written to `001.refs` in the same format as [`git-show-ref`](https://git-scm.com/docs/git-show-ref#_output).
+1. A bundle is generated using the retrieved reference names. Effectively, by running:
+
+ ```shell
+ awk '{print $2}' 001.refs | git bundle create repo.bundle --stdin
+ ```
+1. The backup and increment pointers are written.
+
+#### Generating incremental backups
+
+1. The next increment is calculated by finding the increment `LATEST` file and
+ adding 1. For example, `001` + `1` = `002`.
+1. A full list of references is retrieved using the `ListRefs` RPC. This list is
+ written to the calculated next increment (for example, `002.refs`) in the same
+ format as [`git-show-ref`](https://git-scm.com/docs/git-show-ref#_output).
+1. The full list of the previous increment's references is retrieved by reading
+ the file. For example, `001.refs`.
+1. A bundle is generated using the negated list of reference targets of the
+ previous increment and the new list of retrieved reference names
+ by effectively running:
+
+ ```shell
+ { awk '{print "^" $1}' 001.refs; awk '{print $2}' 002.refs; } | git bundle create repo.bundle --stdin
+ ```
+
+ Negating the object IDs from the previous increment ensures that we stop
+ traversing commits when we reach the HEAD of the branch at the time of the
+ last incremental backup.
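
Note on the pointer-layout steps documented in `doc/gitaly-backup.md`: the following is a minimal shell sketch of the increment calculation and of the revision list that is piped to `git bundle create --stdin`. It is an illustration only, not the `gitaly-backup` implementation. The function names `next_increment` and `bundle_revisions` are hypothetical, and the sketch assumes refs files in the `git-show-ref` output format (one `<object ID> <reference name>` pair per line).

```shell
# Hypothetical helpers that mirror the documented steps; not part of gitaly-backup.

# next_increment prints the increment that follows the given one,
# zero-padded to three digits (for example, 001 -> 002).
next_increment() {
  awk -v n="$1" 'BEGIN { printf "%03d\n", n + 1 }'
}

# bundle_revisions prints the revision list for an incremental bundle:
# first "^<object ID>" for every target in the previous refs file ($1),
# then every reference name in the current refs file ($2).
bundle_revisions() {
  awk '{ print "^" $1 }' "$1"
  awk '{ print $2 }' "$2"
}
```

For example, `bundle_revisions 001.refs 002.refs | git bundle create repo.bundle --stdin` corresponds to the incremental command shown in the documentation. `awk` performs the addition because shell arithmetic expansion treats a leading zero as octal, which would reject a value such as `009`.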