-rw-r--r-- | doc/PROCESS.md | 68
-rw-r--r-- | doc/README.md | 33
-rw-r--r-- | doc/protobuf.md | 86
-rw-r--r-- | doc/ruby_endpoint.md | 1
-rw-r--r-- | doc/serverside_git_usage.md | 15
-rw-r--r-- | doc/sidechannel.md | 2
-rw-r--r-- | doc/sql_migrations.md | 16
-rw-r--r-- | doc/test_repos.md | 12
-rw-r--r-- | doc/virtual_storage.md | 127
9 files changed, 176 insertions, 184 deletions
diff --git a/doc/PROCESS.md b/doc/PROCESS.md index 31c28b036..2e55d85d0 100644 --- a/doc/PROCESS.md +++ b/doc/PROCESS.md @@ -1,6 +1,6 @@ -## Gitaly Team Process +# Gitaly Team Process -### Feature flags +## Feature flags Gitaly uses feature flags to safely roll out features in production. Feature flags are part of the `context.Context` of each RPC. The `featureflag` package @@ -24,7 +24,7 @@ steps. [issue-for-feature-rollout]: https://gitlab.com/gitlab-org/gitaly/-/issues/new?issuable_template=Feature%20Flag%20Roll%20Out [feature-issue-template]: https://gitlab.com/gitlab-org/gitaly/-/blob/master/.gitlab/issue_templates/Feature%20Flag%20Roll%20Out.md -#### Use and limitations +### Use and limitations Feature flags are [enabled through chatops][enable-flags] (which is just a consumer [of the API][ff-api]). In @@ -58,14 +58,14 @@ project][bug-project-argument]. [bug-user-argument]: https://gitlab.com/gitlab-org/gitaly/-/issues/3385 [bug-project-argument]: https://gitlab.com/gitlab-org/gitaly/-/issues/3386 -### Feature flags issue checklist +## Feature flags issue checklist The rest of this section is help for the individual checklist steps in [the issue template][feature-issue-template]. If this is your first time doing this you might want to first skip ahead to the help below, you'll likely need to file some access requests. -#### Feature flag labels +### Feature flag labels The lifecycle of feature flags is monitored via issue labels. @@ -79,7 +79,7 @@ checklist the person rolling it out will add [featureflag-staging]: https://gitlab.com/gitlab-org/gitaly/-/issues?label_name[]=featureflag%3A%3Astaging [featureflag-production]: https://gitlab.com/gitlab-org/gitaly/-/issues?label_name[]=featureflag%3A%3Aproduction -#### Is the required code deployed? +### Is the required code deployed? 
A quick way to see if your MR is deployed is to check if [the release bot][release-bot] has deployed it to staging, canary or production by @@ -109,15 +109,15 @@ details on the tagging and release process. [gitlab-git]: https://gitlab.com/gitlab-org/gitlab/ [gitaly-git]: https://gitlab.com/gitlab-org/gitaly/ -#### Do we need a change management issue? +### Do we need a change management issue? -#### Enable on staging +### Enable on staging -##### Prerequisites +#### Prerequisites You'll need chatops access. See [above](#use-and-limitations). -##### Steps +#### Steps Run: @@ -125,9 +125,9 @@ Run: Where `X` is the name of your feature. -#### Test on staging +### Test on staging -##### Prerequisites +#### Prerequisites Access to <https://staging.gitlab.com/users> is not the same as on GitLab.com (or signing in with Google on the `@gitlab.com` account). You @@ -147,7 +147,7 @@ repository, and manually test from there. [staging-access-request]: https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/new?issuable_template=Individual_Bulk_Access_Request [staging-users-link]: https://staging.gitlab.com/users -##### Steps +#### Steps Manually use the feature in whatever way exercises the code paths being enabled. @@ -158,7 +158,7 @@ Then enable `X` on staging, with: /chatops run feature set gitaly_X --staging ``` -##### Discussion +#### Discussion It's a good idea to run the feature for a full day on staging, this is because there are daily smoke tests that run daily in that @@ -167,14 +167,14 @@ environment. These are handled by [gitlab-qa-git]: https://gitlab.com/gitlab-org/gitlab-qa#how-do-we-use-it -#### Enable in production +### Enable in production -##### Prerequisites +#### Prerequisites Have you waited enough time with the feature running in the staging environment? Good! -##### Steps +#### Steps To enable your `X` feature at 5/25/50 percent, run: @@ -207,7 +207,7 @@ the feature flag code is deleted. 
So make sure you don't skip the [actor-gates]: https://docs.gitlab.com/ee/development/feature_flags/controls.html#process -##### Discussion +#### Discussion What percentages should you pick and how long should you wait? @@ -228,9 +228,9 @@ Nobody's better off if you wait 10 hours at 1% to get error data you could have waited 1 hour at 10% to get, or just over 10 minutes with close monitoring at 50%. -#### Feature lifecycle after it is live +### Feature lifecycle after it is live -##### Discussion +#### Discussion After a feature is running at `100%` for what ever's deemed to be a safe amount of time we should change it to be `OnByDefault: true`. See @@ -246,7 +246,7 @@ This is because even after setting `OnByDefault: true` users might still have opted to disable the new feature. See [the discussion below](#two-phase-ruby-to-go-rollouts) for possibly needing to do such changes over multiple releases. -##### Two phase Ruby to Go rollouts +#### Two phase Ruby to Go rollouts Depending on what the feature does it may be bad to remove the `else` branch where we have the feature disabled at this point. E.g. if it's @@ -263,7 +263,7 @@ do such a two-phase removal. [example-on-by-default-mr]: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3033 [example-post-go-ruby-code-removal-mr]: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3056 -##### Remove the feature flag via chatops +#### Remove the feature flag via chatops After completing the above steps the feature flag should be deleted from the database of available features via `chatops`. @@ -291,7 +291,7 @@ Then delete it if that's the data you're expecting: /chatops run feature delete gitaly_X ``` -### Git Version Upgrades +## Git Version Upgrades With the introduction of [bundled Git][bundled-git] we have gained the ability to do feature-flag-based rollouts of new Git versions, and using feature flags @@ -322,7 +322,7 @@ one release. 
[bundled-git]: git-execution-environments.md#bundled-git-recommended -#### Detailed Process +### Detailed Process The following detailed steps need to be done to upgrade to a new Git version: @@ -350,7 +350,7 @@ The following detailed steps need to be done to upgrade to a new Git version: _after_ both old and new bundled Git binaries have been installed in parallel in a release already. -### Gitaly Releases +## Gitaly Releases Gitaly releases are tagged automatically by [`release-tools`][release-tools] when a Release Manager tags a GitLab @@ -358,7 +358,7 @@ version. [release-tools]: https://gitlab.com/gitlab-org/release-tools -#### Major or minor releases +### Major or minor releases Once we release GitLab X.Y.0, we also release Gitaly X.Y.0 based on the content of `GITALY_SERVER_VERSION`. This version file is automatically updated by `release-tools` during auto-deploy picking. @@ -446,7 +446,7 @@ graph TD; With this solution, the team can autonomously tag any RC they like, but the other releases are handled by the GitLab tagging process. -#### Patch releases +### Patch releases The Gitaly team usually works on patch releases in the context of a security release. @@ -455,7 +455,7 @@ A Gitaly maintainer will only take care of merging the fixes on the stable branc For patch releases, we don't merge back to master. But `release-tools` will commit a changelog update to both the patch release, and the master branch. -#### Creating a release candidate +### Creating a release candidate Release candidate (RC) can be created with a chatops command. This is the only type of release that a developer can build autonomously. @@ -471,7 +471,7 @@ tagging a RC is a good way to make sure the `gitlab` feature branch has the prop has a **manual** job, `update-downstream-server-version`, that will create a merge request on the GitLab codebase to bump the Gitaly server version, and this will be assigned to you. 
Once the build has completed successfully, assign it to a maintainer for review. -### Publishing the Ruby gem +## Publishing the Ruby gem If an updated version of the Ruby proto gem is needed, it can be published to rubygems.org with the `_support/publish-gem` script. @@ -480,7 +480,7 @@ If the changes needed are not yet released, [create a release candidate](#creati - Checkout the tag to publish (vX.Y.Z) - run `_support/publish-gem X.Y.Z` -### Publishing the go module +## Publishing the go module If an [updated version](https://golang.org/doc/modules/release-workflow) of the go module is needed, it can be [published](https://golang.org/doc/modules/publishing) by tag creation. @@ -495,7 +495,7 @@ make upgrade-module FROM_MODULE=v15 TO_MODULE=v16 It replaces old imports with the new version in the go source files, updates `*.proto` files and modifies `go.mod` file to use a new target version of the module. -#### Security release +### Security release Security releases involve additional processes to ensure that recent releases of GitLab are properly patched while avoiding the leaking of the security @@ -505,13 +505,13 @@ Before beginning work on a security fix, open a new Gitaly issue with the templa `Security Release` and follow the instructions at the top of the page for following the template. -### Experimental builds +## Experimental builds Push the release tag to `dev.gitlab.org/gitlab/gitaly`. After passing the test suite, the tag will automatically be built and published in <https://packages.gitlab.com/gitlab/unstable>. -### Patching Git +## Patching Git The Gitaly project is the single source of truth for the Git distribution across all of GitLab: all downstream distributions use the `make git` target to build @@ -538,7 +538,7 @@ will always have these patches. 
As a result, all code which makes use of patched-in features must have fallback code to support the [minimum required Git version](../README.md#installation) -### RPC deprecation process +## RPC deprecation process First create a deprecation issue at <https://gitlab.com/gitlab-org/gitaly/issues> with the title `Deprecate RPC FooBar`. Use label `Deprecation`. Below is a diff --git a/doc/README.md b/doc/README.md index e72f5fae3..e08f75a7c 100644 --- a/doc/README.md +++ b/doc/README.md @@ -1,39 +1,38 @@ -## Gitaly documentation +# Gitaly documentation The historical reasons for the inception of Gitaly and our design decisions are -written in [the design doc](doc/DESIGN.md). +written in [the design doc](DESIGN.md). -#### Configuring Gitaly +## Configuring Gitaly Running Gitaly requires it to be configured correctly, options are described in -GitLab's [configuration documentation](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/gitaly/index.md). +GitLab's [configuration documentation](https://docs.gitlab.com/ee/administration/gitaly/index.html). -The reference guide is documented in https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/gitaly/reference.md. +The reference guide is documented in <https://docs.gitlab.com/ee/administration/gitaly/reference.html>. -#### Developing Gitaly +## Developing Gitaly -- When new to Gitaly development, start by reading the [beginners guide](doc/beginners_guide.md) -- When developing on Gitaly-Ruby, read the [Gitaly-Ruby doc](doc/ruby_endpoint.md) -- The Gitaly release process is described in [our process doc](doc/PROCESS.md) -- Tests use Git repositories too, [read more about them](doc/test_repos.md) +- When new to Gitaly development, start by reading the [beginners guide](beginners_guide.md) +- The Gitaly release process is described in [our process doc](PROCESS.md) +- Tests use Git repositories too, [read more about them](test_repos.md) - Praefect uses SQL. 
To create a new SQL migration see [sql_migrations.md](sql_migrations.md) - For Gitaly hooks documentation, see [Gitaly hooks documentation](hooks.md) -#### Gitaly Cluster +## Gitaly Cluster Gitaly does not replicate any data. If a Gitaly server goes down, any of its clients can't read or write to the repositories stored on that server. This means that Gitaly is not highly available. How this will be solved is described -[in the HA design document](doc/design_ha.md) +[in the HA design document](design_ha.md) -For configuration please read [praefects configuration documentation](doc/configuration/praefect.md). +For configuration please read [Praefect's configuration documentation](configuration/praefect.md). -#### Technical explanations +## Technical explanations - [Delta Islands](delta_islands.md) - [Disk-based Cache](design_diskcache.md) -- [gitaly-ssh](../cmd/gitaly-ssh/README.md) -- [Git object quarantine during git push](object_quarantine.md) +- [`gitaly-ssh`](../cmd/gitaly-ssh/README.md) +- [Git object quarantine during Git push](object_quarantine.md) - [Logging in Gitaly](logging.md) - [Tips for reading Git source code](reading_git_source.md) - [Serverside Git Usage](serverside_git_usage.md) @@ -41,7 +40,7 @@ For configuration please read [praefects configuration documentation](doc/config - [Sidechannel protocol](sidechannel.md) - [Backpressure](backpressure.md) -#### RFCs +## RFCs - [Praefect Queue storage](rfcs/praefect-queue-storage.md) - [Snapshot storage](rfcs/snapshot-storage.md) diff --git a/doc/protobuf.md b/doc/protobuf.md index b5ce7b913..61ac2f8f9 100644 --- a/doc/protobuf.md +++ b/doc/protobuf.md @@ -15,7 +15,7 @@ Run `make proto` from the root of the repository to regenerate the client libraries after updating .proto files. 
See -[developers.google.com](https://developers.google.com/protocol-buffers/docs/proto3) +[`developers.google.com`](https://developers.google.com/protocol-buffers/docs/proto3) for documentation of the 'proto3' Protocol buffer specification language. @@ -24,37 +24,37 @@ language. The core Protobuf concepts we use are rpc, service and message. We use these to define the Gitaly **protocol**. -- **rpc** a function that can be called from the client and that gets - executed on the server. Belongs to a service. Can have one of four - request/response signatures: message/message (example: get metadata for - commit xxx), message/stream (example: get contents of blob xxx), - stream/message (example: create new blob with contents xxx), - stream/stream (example: git SSH session). -- **service** a logical group of RPC's. -- **message** like a JSON object except it has pre-defined types. -- **stream** an unbounded sequence of messages. In the Ruby clients - this looks like an Enumerator. +- **rpc** a function that can be called from the client and that gets + executed on the server. Belongs to a service. Can have one of four + request/response signatures: message/message (example: get metadata for + commit xxx), message/stream (example: get contents of blob xxx), + stream/message (example: create new blob with contents xxx), + stream/stream (example: Git SSH session). +- **service** a logical group of RPC's. +- **message** like a JSON object except it has pre-defined types. +- **stream** an unbounded sequence of messages. In the Ruby clients + this looks like an Enumerator. gRPC provides an implementation framework based on these Protobuf concepts. -- A gRPC **server** implements one or more services behind a network - listener. Example: the Gitaly server application. -- The gRPC toolchain automatically generates **client libraries** that - handle serialization and connection management. Example: the Go - client package and Ruby gem in this repository. 
-- gRPC **clients** use the client libraries to make remote procedure - calls. These clients must decide what network address to reach their - gRPC servers on and handle connection reuse: it is possible to - spread different gRPC services over multiple connections to the same - gRPC server. -- Officially a gRPC connection is called a **channel**. In the Go gRPC - library these channels are called **client connections** because - 'channel' is already a concept in Go itself. In Ruby a gRPC channel - is an instance of GRPC::Core::Channel. We use the word 'connection' - in this document. The underlying transport of gRPC, HTTP/2, allows - multiple remote procedure calls to happen at the same time on a - single connection to a gRPC server. In principle, a multi-threaded - gRPC client needs only one connection to a gRPC server. +- A gRPC **server** implements one or more services behind a network + listener. Example: the Gitaly server application. +- The gRPC toolchain automatically generates **client libraries** that + handle serialization and connection management. Example: the Go + client package and Ruby gem in this repository. +- gRPC **clients** use the client libraries to make remote procedure + calls. These clients must decide what network address to reach their + gRPC servers on and handle connection reuse: it is possible to + spread different gRPC services over multiple connections to the same + gRPC server. +- Officially a gRPC connection is called a **channel**. In the Go gRPC + library these channels are called **client connections** because + 'channel' is already a concept in Go itself. In Ruby a gRPC channel + is an instance of GRPC::Core::Channel. We use the word 'connection' + in this document. The underlying transport of gRPC, HTTP/2, allows + multiple remote procedure calls to happen at the same time on a + single connection to a gRPC server. In principle, a multi-threaded + gRPC client needs only one connection to a gRPC server. 
## Gitaly RPC Server Architecture @@ -183,15 +183,15 @@ As a general principle, remember that Git does not enforce encodings on most data inside repositories, so we can rarely assume data to be a Protobuf "string" (which implies UTF-8). -1. `bytes revision`: for fields that accept any of branch names / tag - names / commit ID's. Uses `bytes` to be encoding agnostic. -2. `string commit_id`: for fields that accept a commit ID. -3. `bytes ref`: for fields that accept a refname. -4. `bytes path`: for paths inside Git repositories, i.e., inside Git - `tree` objects. -5. `string relative_path`: for paths on disk on a Gitaly server, - created by "us" (GitLab the application) instead of the user, we - want to use UTF-8, or better, ASCII. +1. `bytes revision`: for fields that accept any of branch names / tag + names / commit ID's. Uses `bytes` to be encoding agnostic. +1. `string commit_id`: for fields that accept a commit ID. +1. `bytes ref`: for fields that accept a refname. +1. `bytes path`: for paths inside Git repositories, i.e., inside Git + `tree` objects. +1. `string relative_path`: for paths on disk on a Gitaly server, + created by "us" (GitLab the application) instead of the user, we + want to use UTF-8, or better, ASCII. ### Stream patterns @@ -203,7 +203,7 @@ messages should not typically be larger than 1MB. #### Stream response of many small items -``` +```go rpc FooBar(FooBarRequest) returns (stream FooBarResponse); message FooBarResponse { @@ -218,11 +218,11 @@ A typical example of an "Item" would be a commit. To avoid the penalty of network IO for each Item we return, we batch them together. You can think of this as a kind of buffered IO at the level of the Item messages. In Go, to ease the bookkeeping you can use -[gitlab.com/gitlab-org/gitaly/internal/helper/chunker](https://godoc.org/gitlab.com/gitlab-org/gitaly/internal/helper/chunker). 
+[`gitlab.com/gitlab-org/gitaly/internal/helper/chunker`](https://pkg.go.dev/gitlab.com/gitlab-org/gitaly/internal/helper/chunker). #### Single large item split over multiple messages -``` +```go rpc FooBar(FooBarRequest) returns (stream FooBarResponse); message FooBarResponse { @@ -243,7 +243,7 @@ the response stream has `header` set, all others have `data` but no `header`. In the particular case where you're sending back raw binary data from Go, you can use -[gitlab.com/gitlab-org/gitaly/streamio](https://godoc.org/gitlab.com/gitlab-org/gitaly/streamio) +[`gitlab.com/gitlab-org/gitaly/streamio`](https://pkg.go.dev/gitlab.com/gitlab-org/gitaly/streamio) to turn your gRPC response stream into an `io.Writer`. > Note that a number of existing RPC's do not use this pattern exactly; @@ -254,7 +254,7 @@ to turn your gRPC response stream into an `io.Writer`. #### Many large items split over multiple messages -``` +```go rpc FooBar(FooBarRequest) returns (stream FooBarResponse); message FooBarResponse { diff --git a/doc/ruby_endpoint.md b/doc/ruby_endpoint.md deleted file mode 100644 index 9bf04c088..000000000 --- a/doc/ruby_endpoint.md +++ /dev/null @@ -1 +0,0 @@ -This guide was changed into the [beginner's guide](beginners_guide.md). diff --git a/doc/serverside_git_usage.md b/doc/serverside_git_usage.md index 4826fc610..889dc0035 100644 --- a/doc/serverside_git_usage.md +++ b/doc/serverside_git_usage.md @@ -1,14 +1,15 @@ -## Server side Git usage +# Server side Git usage Gitaly uses three implementations to read and write to Git repositories: + 1. `git(1)` - The same Git used by clients all over the world 1. [LibGit2](https://github.com/libgit2/libgit2) - a linkable library used through Rugged and Git2Go 1. On ad-hoc basis, part of Git is implemented in this repository if the implementation is easy and stable. For example the [pktline](../internal/git/pktline) package. -### Using Git +## Using Git -#### Plumbing v.s. porcelain +### Plumbing v.s. 
porcelain `git(1)` is the default choice to access repositories for Gitaly. Not all commands that are available should be used in the Gitaly code base. @@ -20,18 +21,18 @@ are intended for scripted use or to build another porcelain. Generally speaking, Gitaly should only use plumbing commands. `man 1 git` contains a section on the low level plumbing. However, a lot of -git's plumbing-like functionality is exposed as commands not marked as +Git's plumbing-like functionality is exposed as commands not marked as plumbing, but whose API reliability can be considered the same. E.g. `git log`'s `--pretty=` formats, `git config -l -z`, the documented exit codes of `git remote` etc.. We should use good judgement when choosing what commands and command -functionality to use, with the aim of not having gitaly break due to +functionality to use, with the aim of not having Gitaly break due to e.g. an error message being rephrased or functionality the upstream `git` maintainers don't consider plumbing-like being removed or altered. -#### Executing Git commands +### Executing Git commands When executing Git, developers should always use the `git.CommandFactory` and sibling interfaces. These make sure Gitaly is protected against command injection, the @@ -39,7 +40,7 @@ correct `git` is used, and correct setup for observable command invocations are used. When working with `git(1)` in Ruby, please be sure to read the [Ruby shell scripting guide](https://docs.gitlab.com/ee/development/shell_commands.html). -### Using LibGit2 +## Using LibGit2 Gitaly uses [Git2Go](https://github.com/libgit2/git2go) for Golang, and [Rugged](https://github.com/libgit2/rugged) which both are thin adapters to call diff --git a/doc/sidechannel.md b/doc/sidechannel.md index 047b60fed..42fdff5df 100644 --- a/doc/sidechannel.md +++ b/doc/sidechannel.md @@ -45,7 +45,7 @@ route Workhorse->Gitaly traffic through a gRPC proxy. If you need a proxy between Workhorse and Gitaly, use a TCP proxy instead. 
For more information about how and why we introduced sidechannels, see -https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/463. +<https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/463>. ## Implementation details diff --git a/doc/sql_migrations.md b/doc/sql_migrations.md index e6832fdd4..2efc8df1d 100644 --- a/doc/sql_migrations.md +++ b/doc/sql_migrations.md @@ -2,13 +2,13 @@ SQL migration files are stored in `/internal/praefect/datastore/migrations`. -The underlying migration engine we use is [github.com/rubenv/sql-migrate](https://github.com/rubenv/sql-migrate). +The underlying migration engine we use is [`github.com/rubenv/sql-migrate`](https://github.com/rubenv/sql-migrate). To generate a new migration, run the `_support/new-migration` script from the top level of your Gitaly checkout. Praefect SQL migrations should be applied automatically when you deploy Praefect. If you want to run them manually, run: -``` +```shell praefect -config /path/to/config.toml sql-migrate ``` @@ -25,13 +25,13 @@ praefect -config /path/to/config.toml sql-migrate -ignore-unknown=false To see which migrations have been applied, run: -``` +```shell praefect -config /path/to/config.toml sql-migrate-status ``` For example, the output may look like: -``` +```plaintext +----------------------------------------+--------------------------------------+ | MIGRATION | APPLIED | +----------------------------------------+--------------------------------------+ @@ -46,8 +46,8 @@ For example, the output may look like: The first column contains the migration ID, and the second contains one of three items: 1. The date on which the migration was applied -2. `no` if the migration has not yet been applied -3. `unknown migration` if the migration is not known by the current Praefect binary +1. `no` if the migration has not yet been applied +1. 
`unknown migration` if the migration is not known by the current Praefect binary ## Rolling back migrations @@ -60,7 +60,7 @@ Count the number of migrations you want to roll back. ### 2. Perform a dry run and verify that the right migrations are getting rolled back -``` +```shell praefect -config /path/to/config.toml sql-migrate-down NUM_ROLLBACK ``` @@ -73,6 +73,6 @@ roll back. We use the same command as before, but we pass `-f` to indicate we want destructive changes (the rollbacks) to happen. -``` +```shell praefect -config /path/to/config.toml sql-migrate-down -f NUM_ROLLBACK ``` diff --git a/doc/test_repos.md b/doc/test_repos.md index a2abda060..940fdf82a 100644 --- a/doc/test_repos.md +++ b/doc/test_repos.md @@ -1,12 +1,12 @@ # Repositories used by the Gitaly test suite Gitaly uses two test repositories. One should be enough but we got a -second one for free when importing code from gitlab-ce. +second one for free when importing code from `gitlab-ce`. These repositories get cloned by `make prepare-tests`. They end up in: -- `_build/testrepos/gitlab-test.git` -- `_build/testrepos/gitlab-git-test.git` +- `_build/testrepos/gitlab-test.git` +- `_build/testrepos/gitlab-git-test.git` To prevent fragile tests, we use fixed `packed-refs` files for these repositories. They get installed by make (see `_support/makegen.go`) @@ -15,11 +15,11 @@ from files in `_support`. To update `packed-refs` run `git gc` in your test repo and copy the new `packed-refs` to the right location in `_support`. -## Example: +## Example -Let's add a new branch to gitlab-test. +Let's add a new branch to `gitlab-test`. 
-``` +```shell make prepare-tests git clone _build/testrepos/gitlab-test.git _build/gitlab-test diff --git a/doc/virtual_storage.md b/doc/virtual_storage.md index bbada3c14..8f0571d33 100644 --- a/doc/virtual_storage.md +++ b/doc/virtual_storage.md @@ -6,21 +6,22 @@ Praefect hides the distributed nature of the storage cluster from the client by Praefect records the expected state of each repository within a virtual storage in the `repositories` table: -| virtual_storage | relative_path | generation | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | 5 | +| virtual_storage | relative_path | generation | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | 5 | The `repositories` table has three columns: [^1] + 1. `virtual_storage` indicates which virtual storage the repository belongs in. 1. `relative_path` indicates where the repository should be stored on a physical storage. 1. `generation` is monotonically increasing version number that is incremented on each mutator call to the repository. `repository_assignments` table records which physical storages are supposed to contain a replica of a repository. 
-| virtual_storage | relative_path | storage | -|-----------------|------------------------------------------------------------------------------------|----------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | +| virtual_storage | relative_path | storage | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | The number of assigned storages each repository has indicates its desired replication factor. Each record contains: @@ -32,12 +33,13 @@ previous behavior of replicating a repository on every physical storage. Praefect tracks the current state of a repository on each physical storage in the `storage_repositories` table: -| virtual_storage | relative_path | storage | generation | -|-----------------|------------------------------------------------------------------------------------|----------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | 5 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | 5 | +| virtual_storage | relative_path | storage | generation | +|-----------------|--------------------------------------------------------------------------------------|------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | 5 | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | 5 | The `storage_repositories` table has four columns: + 1. 
`virtual_storage` indicates which virtual storage the repository belongs in. 1. `relative_path` indicates where the repository should be stored on a physical storage. 1. `storage` indicates which physical storage this record belongs to. @@ -73,22 +75,22 @@ Praefect expects an up to date copy of a repository to be present on every assig `repositories`: -| virtual_storage | relative_path | generation | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | 0 | +| virtual_storage | relative_path | generation | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | 0 | `repository_assignments`: -| virtual_storage | relative_path | storage | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | +| virtual_storage | relative_path | storage | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | `storage_repositories`: -| virtual_storage | relative_path | storage | generation | -|-----------------|------------------------------------------------------------------------------------|----------|------------| -| default | 
@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | 0 | +| virtual_storage | relative_path | storage | generation | +|-----------------|--------------------------------------------------------------------------------------|-----------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1`| 0 | To fix the inconsistency, reconciler schedules `update`-type jobs to the storages missing the repository from random healthy storages with up to date replicas. @@ -97,29 +99,29 @@ To fix the inconsistency, reconciler schedules `update`-type jobs to the storage A repository is considered outdated if its `generation` number in `storage_repositories` does not match the expected generation in the `repositories` table. This might be due to two reasons: 1. The primary received a new mutator which was not yet replicated to the outdated physical storage. -2. Administrator accepted data loss in the repository. Accepting data loss increments the expected generation in `repositories` and sets the selected authoritative storage's generation to match in `storage_repositories`, leading every other copy of the repository to be considered outdated. See [Gitaly Cluster documentation](https://docs.gitlab.com/ee/administration/gitaly/praefect.html#accept-data-loss) for more information. +1. Administrator accepted data loss in the repository. Accepting data loss increments the expected generation in `repositories` and sets the selected authoritative storage's generation to match in `storage_repositories`, leading every other copy of the repository to be considered outdated. See [Gitaly Cluster documentation](https://docs.gitlab.com/ee/administration/gitaly/praefect.html#accept-data-loss) for more information. In the case below, `gitaly-2` has an outdated version of the repository as its generation does not match what's in the `repositories` table. 
`gitaly-2` is behind by three changes as generation counters starts from zero. If a physical storage is missing a repository, its generation should be considered to be `-1` to correctly calculate the number of changes it is behind. `repositories`: -| virtual_storage | relative_path | generation | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | 2 | +| virtual_storage | relative_path | generation | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | 2 | `repository_assignments`: -| virtual_storage | relative_path | storage | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | +| virtual_storage | relative_path | storage | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | `storage_repositories`: -| virtual_storage | relative_path | storage | generation | -|-----------------|------------------------------------------------------------------------------------|----------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | 2 | -| default | 
@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | 0 | +| virtual_storage | relative_path | storage | generation | +|-----------------|--------------------------------------------------------------------------------------|------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | 2 | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | 0 | To fix the inconsistency, reconciler schedules `update`-type jobs to the storages missing the repository from random healthy storages with up to date replicas. @@ -143,11 +145,11 @@ A physical storage might contain a repository that is not expected be present on `storage_repositories`: -| virtual_storage | relative_path | storage | generation | -|-----------------|------------------------------------------------------------------------------------|----------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | 2 | +| virtual_storage | relative_path | storage | generation | +|-----------------|--------------------------------------------------------------------------------------|------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | 2 | -Praefect's reconciler doesn't fix the inconsistency at this time. A fix is tracked in https://gitlab.com/gitlab-org/gitaly/-/issues/3480. +Praefect's reconciler doesn't fix the inconsistency at this time. A fix is tracked in <https://gitlab.com/gitlab-org/gitaly/-/issues/3480>. 
### Unassigned Replica @@ -157,22 +159,22 @@ Below, `gitaly-2` has been unassigned but still contains a replica of the reposi `repositories`: -| virtual_storage | relative_path | generation | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | 2 | +| virtual_storage | relative_path | generation | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | 2 | `repository_assignments`: -| virtual_storage | relative_path | storage | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | +| virtual_storage | relative_path | storage | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | `storage_repositories`: -| virtual_storage | relative_path | storage | generation | -|-----------------|------------------------------------------------------------------------------------|----------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | 2 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | 2 | +| virtual_storage | relative_path | storage | generation | +|-----------------|--------------------------------------------------------------------------------------|------------|------------| +| default | 
`@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | 2 | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | 2 | ### Removed Physical Storage @@ -185,23 +187,23 @@ from the virtual storage. Below, the repository's replication factor is `1` as ` `repositories`: -| virtual_storage | relative_path | generation | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | 3 | +| virtual_storage | relative_path | generation | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | 3 | `repository_assignments`: -| virtual_storage | relative_path | storage | -|-----------------|------------------------------------------------------------------------------------|------------| -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | +| virtual_storage | relative_path | storage | +|-----------------|--------------------------------------------------------------------------------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | `storage_repositories`: -| virtual_storage | relative_path | storage | generation | -|-----------------|------------------------------------------------------------------------------------|----------|------------| -| default | 
@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-1 | 2 | -| default | @hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git | gitaly-2 | 3 | +| virtual_storage | relative_path | storage | generation | +|-----------------|--------------------------------------------------------------------------------------|------------|------------| +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-1` | 2 | +| default | `@hashed/5f/9c/5f9c4ab08cac7457e9111a30e4664920607ea2c115a1433d7be98e97e64244ca.git` | `gitaly-2` | 3 | The reconciler considers assigned but removed storages as still assigned. This means it won't schedule `delete_replica` jobs to any assigned storage before the assignments of the removed storages are manually removed. @@ -219,12 +221,3 @@ of the removed storages are manually removed. ## Footnotes [^1]: A repository is uniquely identified by its primary key `(virtual_storage, relative_path)`. While Praefect doesn't expect a specific format for the relative path, it helps to know that it is generated in GitLab by hashing the unique ID of the GitLab project. To find out more, read about [hashed storage](https://docs.gitlab.com/ee/administration/repository_storage_types.html#hashed-storage). - - - - - - - - - |
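The generation bookkeeping described in the documentation changed above can be illustrated with a small sketch. It is not Praefect's implementation (the reconciler works against the SQL tables directly); the function names `changes_behind` and `outdated_storages` are hypothetical, and the only rule taken from the text is that a storage missing the repository is treated as generation `-1`. Clamping the lag at zero is an added assumption.

```python
from typing import Dict, List, Optional

# Convention from the text: a storage that is missing the repository
# is treated as being at generation -1.
MISSING_GENERATION = -1


def changes_behind(expected_generation: int, replica_generation: Optional[int]) -> int:
    """Return how many changes a replica lags behind the expected generation.

    `replica_generation` is None when the storage has no record for the
    repository in `storage_repositories`.
    """
    actual = MISSING_GENERATION if replica_generation is None else replica_generation
    # Assumption: a replica is never reported as "ahead"; clamp at zero.
    return max(expected_generation - actual, 0)


def outdated_storages(
    expected_generation: int,
    assigned: List[str],
    storage_generations: Dict[str, int],
) -> Dict[str, int]:
    """Map each assigned storage that is behind to the number of changes it lacks."""
    lag = {}
    for storage in assigned:
        behind = changes_behind(expected_generation, storage_generations.get(storage))
        if behind > 0:
            lag[storage] = behind
    return lag


# Missing replica: `repositories` is at generation 0, and only gitaly-1 has a record.
print(outdated_storages(0, ["gitaly-1", "gitaly-2"], {"gitaly-1": 0}))

# Removed physical storage example: expected generation 3, gitaly-1 is at 2.
print(outdated_storages(3, ["gitaly-1", "gitaly-2"], {"gitaly-1": 2, "gitaly-2": 3}))
```

In this sketch only assigned storages are inspected, mirroring the statement that an up-to-date copy is expected on every assigned storage. Unassigned replicas would need separate handling.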