gitlab.com/gitlab-org/gitaly.git
author    Patrick Steinhardt <psteinhardt@gitlab.com>  2022-10-24 09:01:46 +0300
committer Patrick Steinhardt <psteinhardt@gitlab.com>  2022-10-24 09:01:46 +0300
commit    a09e2b67f58008784e7384c34a8031a839a4fd8d (patch)
tree      d6e88ae86c8e2f091199b17e4b3246b3226ae481
parent    9100bc1ba991757588906a818e24d52932ba665c (diff)
parent    6c7fdf3d008ad9572f8a63d18ef9c4c32483445f (diff)
Merge branch 'eread/further-tidy-up-of-markdownlint-errors' into 'master'
Further tidy up of Markdownlint errors

See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/4960

Merged-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Co-authored-by: Evan Read <eread@gitlab.com>
-rw-r--r--  doc/DESIGN.md                      65
-rw-r--r--  doc/PROCESS.md                    130
-rw-r--r--  doc/design_diskcache.md            51
-rw-r--r--  doc/design_ha.md                  210
-rw-r--r--  doc/design_pack_objects_cache.md    8
-rw-r--r--  doc/gitaly-backup.md                2
-rw-r--r--  doc/hooks.md                       14
-rw-r--r--  doc/logging.md                     22
-rw-r--r--  doc/object_pools.md                42
-rw-r--r--  doc/object_quarantine.md            8
-rw-r--r--  doc/observability.md               10
11 files changed, 301 insertions, 261 deletions
diff --git a/doc/DESIGN.md b/doc/DESIGN.md
index 515e3210a..f87b06175 100644
--- a/doc/DESIGN.md
+++ b/doc/DESIGN.md
@@ -1,17 +1,20 @@
-## Reason
+# Gitaly design
+
+## Reason
### Git Characteristics That Make Horizontal Scaling Difficult
Git's fundamental behaviors are similar to those of relational database engines and are difficult to scale horizontally for the same reasons that serverless databases are challenging to build and cannot handle all existing relational database workloads.
-Gitaly is a layer that brings horizontal scaling and higher availability to massively scaled Git operations through a variety of optimizations in disk locality, caching results of intensive operations (like git pack-objects), coordinating between multiple nodes, cluster synchronization and sharding.
+Gitaly is a layer that brings horizontal scaling and higher availability to massively scaled Git operations through a variety of optimizations in disk locality, caching results of intensive operations (like `git pack-objects`), coordinating between multiple nodes, cluster synchronization and sharding.
+
+> **Note:** While Gitaly is designed to help Git scale horizontally, Gitaly internal operations depend on the standard open source release of the Git client which it calls during Git operations. So some Gitaly limitations still pass through from Git. The same is true of any server system that does not have a layer like Gitaly - but in such cases there is no ability to provide any horizontal scaling support at all.
-> **Note:** While Gitaly is designed to help Git scale horizontally, Gitaly internal operations depend on the standard open source release of the git client which it calls during git operations. So some Gitaly limitations still pass through from Git. The same is true of any server system that does not have a layer like Gitaly - but in such cases there is no ability to provide any horizontal scaling support at all.
#### Git Architectural Characteristics and Assumptions
- **Stateful, Atomic, ACID Transactions** (“database synonymous” workload with regard to memory / CPU / disk IO).
- **"Process Atomic" Transactions** - requires one commit to be coordinated by one and only one Git process.
-- **Atomic Storage** - assumes that operations of a single git command write to a single storage end-point.
+- **Atomic Storage** - assumes that operations of a single Git command write to a single storage end-point.
- **Storage channel speeds** - assumes low latency, high bandwidth storage access (near bus speeds).
- **ACID Isolation** - by design Git allows concurrent update access to the same repository as much as possible, in the area of updating Git Refs, record locking is necessary and implemented by Git.
- **Wide ranging burst memory / CPU / disk IO requirements** - assumes significant available memory headroom for operations that intensify depending on the content size.
@@ -22,27 +25,32 @@ Gitaly is a layer that brings horizontal scaling and higher availability to mass
These workload characteristics are not fundamentally predictable across the portfolio of source code that a given GitLab server may need to store. Large monorepos might exist at companies with few employees. Binaries storage - while not considered an ideal file type for Git file systems - is common in some industry segments or project types. This means that architecting a GitLab instance with built-in Git headroom limitations causes unexpected limitations of specific Git usage patterns of the people using the instance.
-These are some of the most challenging git workloads for Git:
+These are some of the most challenging Git workloads for Git:
+
- Large scale, busy monorepos (commit volume is high and packs for full clones are very large).
- High commit volume on a single repository (commit volume is high and packs for full clones are very frequent).
- Binaries stored in the Git object database. (In GitLab Git LFS can be redirected to PaaS storage).
- Full history cloning - due to packfile creation requirements.
The above workload factors compound together when a given workload has more than one characteristic.
+
#### Effects on Horizontal Compute Architecture
+
- The memory burstiness profile of Git makes it (and therefore Gitaly) very challenging to reliably containerize because container systems have very strong memory limits. Exceeding these limits causes significant operational instability and/or termination by the container running system.
- The disk IO burstiness profile of Git makes it (and therefore Gitaly) very challenging to use remote file systems with reliability and integrity (e.g. NFS - including PaaS versions). This was, in fact, the first design reason for Gitaly - to avoid having the Git binary operate on remote storage.
- The CPU burstiness profile of Git (and therefore Gitaly) also makes it challenging to reliably containerize.
These are the challenges that imply an application layer is needed to help Git scale horizontally in any scaled implementation - not just GitLab. GitLab has built this layer and continues to chip away (iterate) against all of the above challenges in this innovative layer.
+
### Evidence To Back Building a New Horizontal Layer to Scale Git
-For GitLab.com the [git access is slow](https://gitlab.com/gitlab-com/infrastructure/issues/351).
+
+For GitLab.com the [Git access is slow](https://gitlab.com/gitlab-com/infrastructure/issues/351).
When looking at `Rugged::Repository.new` performance data we can see that our P99 spikes up to 30 wall seconds, while the CPU time stays in the realm of 15 milliseconds, pointing at filesystem access as the culprit.
![rugged.new timings](doc/img/rugged-new-timings.png)
-Our P99 access time to just create a `Rugged::Repository` object, which is loading and processing the git objects from disk, spikes over 30 seconds, making it basically unusable. We also saw that just walking through the branches of gitlab-ce requires 2.4 wall seconds.
+Our P99 access time to just create a `Rugged::Repository` object, which is loading and processing the Git objects from disk, spikes over 30 seconds, making it basically unusable. We also saw that just walking through the branches of `gitlab-ce` requires 2.4 wall seconds.
We considered moving to bare metal to fix our problems with higher-performance hardware. But our users are using GitLab in the cloud, so it should work great there, and this way the increased performance will benefit every GitLab user.
@@ -50,7 +58,7 @@ Gitaly will make our situation better in a few steps:
1. One central place to monitor operations
1. Performance improvements doing less and caching more
-1. Move the git operations from the app to the file/git server with git rpc (routing git access over JSON HTTP calls)
+1. Move the Git operations from the app to the file/Git server with Git rpc (routing Git access over JSON HTTP calls)
1. Use Git ketch to allow active-active (push to a local server), and distributed read operations (read from a secondary). This is far in the future, we might also use a distributed key value store instead. See the [active-active issue](https://gitlab.com/gitlab-org/gitlab-ee/issues/1381). Until we are active active we can just use persistent storage on the cloud to shard, this eliminates the need for redundancy.
## Scope
@@ -59,7 +67,6 @@ To maintain the focus of the project, the following subjects are out-of-scope fo
1. Replication and high availability (including multi-master and active-active).
-
## References
- [GitHub diff pages](http://githubengineering.com/how-we-made-diff-pages-3x-faster/)
@@ -77,34 +84,34 @@ To maintain the focus of the project, the following subjects are out-of-scope fo
All design decisions should be added here.
-1. Why are we considering to use Git Ketch? It is open source, uses the git protocol itself, is made by experts in distributed systems (Google), and is as simple as we can think of. We have to accept that we'll have to run the JVM on the Git servers.
-1. We'll keep using the existing sharding functionality in GitLab to be able to add new servers. Currently we can use it to have multiple file/git servers. Later we will need multiple Git Ketch clusters.
-1. We need to get rid of NFS mounting at some point because one broken NFS server causes all the application servers to fail to the point where you can't even ssh in.
-1. We want to move the git executable as close to the disk as possible to reduce latency, hence the need for git rpc to talk between the app server and git.
+1. Why are we considering to use Git Ketch? It is open source, uses the Git protocol itself, is made by experts in distributed systems (Google), and is as simple as we can think of. We have to accept that we'll have to run the JVM on the Git servers.
+1. We'll keep using the existing sharding functionality in GitLab to be able to add new servers. Currently we can use it to have multiple file/Git servers. Later we will need multiple Git Ketch clusters.
+1. We need to get rid of NFS mounting at some point because one broken NFS server causes all the application servers to fail to the point where you can't even SSH in.
+1. We want to move the Git executable as close to the disk as possible to reduce latency, hence the need for Git rpc to talk between the app server and Git.
1. [Cached metadata is stored in Redis LRU](https://gitlab.com/gitlab-org/gitaly/issues/2#note_20157141)
1. [Cached payloads are stored in files](https://gitlab.com/gitlab-org/gitaly/issues/14) since Redis can't store large objects
-1. Why not use GitLab Git? So workhorse and ssh access can use the same system. We need this to manage cache invalidation.
+1. Why not use GitLab Git? So workhorse and SSH access can use the same system. We need this to manage cache invalidation.
1. Why not make this a library for most users instead of a daemon/server?
- * Centralization: We need this new layer to be accessed and to share resources from multiple sources. A library is not fit for this end.
- * A library would have to be used in one of our current components, none of which seems ideal to take on this task:
- * gitlab-shell: return to the gitolite model? No.
- * Gitlab-workhorse: is now a proxy for Rails; would then become simultaneous proxy and backend service. Sounds confusing.
- * Unicorn: cannot handle slow requests
- * Sidekiq: can handle slow jobs but not requests
- * Combination workhorse+unicorn+sidekiq+gitlab-shell: this is hard to get right and slow to build even when you are an expert
- * With a library we will still need to keep the NFS shares mounted in the application hosts. That puts a hard stop to scale our storage because we need to keep multiplying the NFS mounts in all the workers.
+ - Centralization: We need this new layer to be accessed and to share resources from multiple sources. A library is not fit for this end.
+ - A library would have to be used in one of our current components, none of which seems ideal to take on this task:
+ - `gitlab-shell`: return to the gitolite model? No.
+ - `gitlab-workhorse`: is now a proxy for Rails; would then become simultaneous proxy and backend service. Sounds confusing.
+ - Unicorn: cannot handle slow requests.
+ - Sidekiq: can handle slow jobs but not requests.
+ - Combination `gitlab-workhorse`+ Unicorn + Sidekiq + `gitlab-shell`: this is hard to get right and slow to build even when you are an expert.
+ - With a library we will still need to keep the NFS shares mounted in the application hosts. That puts a hard stop to scale our storage because we need to keep multiplying the NFS mounts in all the workers.
1. Can we focus on instrumenting first before building Gitaly? Prometheus doesn't work with Unicorn.
1. How do we ship this quickly without affecting users? Behind a feature flag like we did with workhorse. We can update it independently in production.
-1. How much memory will this use? Guess 50MB, we will save memory in the rails app, guess more in sidekiq (GBs but not sure), but initially more because more libraries are still loaded everywhere.
+1. How much memory will this use? Guess 50MB, we will save memory in the rails app, guess more in Sidekiq (GBs but not sure), but initially more because more libraries are still loaded everywhere.
1. What packaging tool do we use? [Govendor because we like it more](https://gitlab.com/gitlab-org/gitaly/issues/15)
-1. How will the networking work? A unix socket for git operations and TCP for monitoring. This prevents having to build out authentication at this early stage. https://gitlab.com/gitlab-org/gitaly/issues/16
-1. We'll include the `/vendor` directory in source control https://gitlab.com/gitlab-org/gitaly/issues/18
-1. We will use [E3 from BitBucket to measure performance closely in isolation](https://gitlab.com/gitlab-org/gitaly/issues/34).
-1. GitLab already has [logic so that the application servers know which file/git server contains what repository](https://docs.gitlab.com/ee/administration/repository_storages.html), this eliminates the need for a router.
+1. How will the networking work? A unix socket for Git operations and TCP for monitoring. This prevents having to build out authentication at this early stage. <https://gitlab.com/gitlab-org/gitaly/issues/16>
+1. We'll include the `/vendor` directory in source control <https://gitlab.com/gitlab-org/gitaly/issues/18>
+1. We will use [E3 from Bitbucket to measure performance closely in isolation](https://gitlab.com/gitlab-org/gitaly/issues/34).
+1. GitLab already has [logic so that the application servers know which file/Git server contains what repository](https://docs.gitlab.com/ee/administration/repository_storages.html), this eliminates the need for a router.
1. Use [gRPC](http://www.grpc.io/) instead of HTTP+JSON. Not so much for performance reasons (Protobuf is faster than JSON) but because gRPC is an RPC framework. With HTTP+JSON we have to invent our own framework; with gRPC we get a set of conventions to work with. This will allow us to move faster once we have learned how to use gRPC.
-1. All protocol definitions and auto-generated gRPC client code will be in the gitaly repo. We can include the client code from the rest of the application as a Ruby gem / Go package / client executable as needed. This will make cross-repo versioning easier.
+1. All protocol definitions and auto-generated gRPC client code will be in the `gitaly` repo. We can include the client code from the rest of the application as a Ruby gem / Go package / client executable as needed. This will make cross-repo versioning easier.
1. Gitaly will expose high-level Git operations, not low-level Git object/ref storage lookups. Many interesting Git operations involve an unbounded number of Git object lookups. For example, the number of Git object lookups needed to generate a diff depends on the number of changed files and how deep those files are in the repository directory structure. It is not feasible to make each of those Git object lookups a remote procedure call.
1. By default all Go packages in the Gitaly repository use the `/internal` directory, unless we explicitly want to export something. The only exception is the `/cmd` directory for executables.
1. GitLab requests should use as few Gitaly gRPC calls as possible. This means it is OK to move GitLab application logic into Gitaly when it saves us gRPC round trips.
1. Defining new gRPC calls is cheap. It is better to define a new 'high level' gRPC call and save gRPC round trips than to chain / combine 'low level' gRPC calls.
-1. Why is Gitaly written in Go? At the time the project started the only practical options were Ruby and Go. We expected to be able to handle more traffic with fewer resources if we used Go. Today (Q3 2019), part of Gitaly is written in Ruby. On the particular Gitaly server that hosts gitlab-org/gitlab-ce, we have a pool of gitaly-ruby processes using a total 20GB of RSS and handling 5 requests per second. The single Gitaly Go process on that machine uses less than 3GB of memory and handles 90 requests per second. \ No newline at end of file
+1. Why is Gitaly written in Go? At the time the project started the only practical options were Ruby and Go. We expected to be able to handle more traffic with fewer resources if we used Go. Today (Q3 2019), part of Gitaly is written in Ruby. On the particular Gitaly server that hosts `gitlab-org/gitlab`, we have a pool of `gitaly-ruby` processes using a total 20GB of RSS and handling 5 requests per second. The single Gitaly Go process on that machine uses less than 3GB of memory and handles 90 requests per second.
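
To make the decision above about preferring few, coarse-grained gRPC calls concrete, here is a minimal client-side sketch. The connection setup uses the standard `google.golang.org/grpc` API; the `gitalypb` service and message names are left as comments because the exact generated signatures are not reproduced here and should be read as assumptions modelled on Gitaly's conventions.

```go
// Minimal sketch of a coarse-grained, Gitaly-style gRPC call.
// The gitalypb identifiers in the comments are assumptions modelled on
// Gitaly's protobuf conventions, not a verbatim copy of the published API.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Per the networking decision above, Gitaly initially listened on a
	// Unix socket; a plaintext TCP address is used here only for brevity.
	conn, err := grpc.Dial("localhost:9999",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial gitaly: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// A single high-level RPC replaces many low-level object/ref lookups:
	//   client := gitalypb.NewCommitServiceClient(conn)
	//   resp, err := client.FindCommit(ctx, &gitalypb.FindCommitRequest{...})
	_ = ctx // placeholder so the sketch compiles without generated stubs
}
```
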
diff --git a/doc/PROCESS.md b/doc/PROCESS.md
index d893dbf3b..31c28b036 100644
--- a/doc/PROCESS.md
+++ b/doc/PROCESS.md
@@ -30,7 +30,9 @@ Feature flags are [enabled through chatops][enable-flags] (which is
just a consumer [of the API][ff-api]). In
[`#chat-ops-test`][chan-chat-ops-test] try:
- /chatops run feature list --match gitaly_
+```shell
+/chatops run feature list --match gitaly_
+```
If you get a permission error you need to request access first. That
can be done [in the `#production` channel][production-request-acl].
@@ -40,7 +42,9 @@ enabling or disabling. For example: to check if
[`gitaly_go_user_delete_tag`][chan-production] is enabled on staging
run:
- /chatops run feature get gitaly_go_user_delete_tag --staging
+```shell
+/chatops run feature get gitaly_go_user_delete_tag --staging
+```
Note that the full set of chatops features for the Rails environment
does not work in Gitaly. E.g. the [`--user` argument does
@@ -83,14 +87,16 @@ checking if the MR has [a `workflow::staging`][deployed-staging],
[`workflow::canary`][deployed-canary] or
[`workflow::production`][deployed-production] label.
-The [/help action on gitlab.com][help-action] shows the currently
+The [/help action on GitLab.com][help-action] shows the currently
deployed hash. Copy that `HASH` and look at `GITALY_SERVER_VERSION` in
-[gitlab-org/gitlab.git][gitlab-git] to see what the embedded gitaly
-version is. Or in [a gitaly.git checkout][gitaly-git] run this to see
+[`gitlab-org/gitlab.git`][gitlab-git] to see what the embedded Gitaly
+version is. Or in [a `gitaly.git` checkout][gitaly-git] run this to see
what commits aren't deployed yet:
- git fetch
- git shortlog $(curl -s https://gitlab.com/gitlab-org/gitlab/-/raw/HASH/GITALY_SERVER_VERSION)..origin/master
+```shell
+git fetch
+git shortlog $(curl -s https://gitlab.com/gitlab-org/gitlab/-/raw/HASH/GITALY_SERVER_VERSION)..origin/master
+```
See the [documentation on releases below](#gitaly-releases) for more
details on the tagging and release process.
@@ -123,19 +129,19 @@ Where `X` is the name of your feature.
##### Prerequisites
-Access to https://staging.gitlab.com/users is not the same as on
-gitlab.com (or signing in with Google on the @gitlab.com account). You
+Access to <https://staging.gitlab.com/users> is not the same as on
+GitLab.com (or signing in with Google on the `@gitlab.com` account). You
must [request access to it][staging-access-request].
As of December 2020 clicking "Sign in" on
-https://about.staging.gitlab.com will redirect to https://gitlab.com,
+<https://about.staging.gitlab.com> will redirect to <https://gitlab.com>,
so make sure to use the `/users` link.
As of writing signing in at [that link][staging-users-link] will land
you on the `/users` 404 page once you're logged in. You should then
typically manually modify the URL
`https://staging.gitlab.com/YOURUSER`
-(e.g. https://staging.gitlab.com/avar) or another way to get at a test
+(e.g. <https://staging.gitlab.com/avar>) or another way to get at a test
repository, and manually test from there.
[staging-access-request]: https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/new?issuable_template=Individual_Bulk_Access_Request
@@ -148,14 +154,16 @@ being enabled.
Then enable `X` on staging, with:
- /chatops run feature set gitaly_X --staging
+```shell
+/chatops run feature set gitaly_X --staging
+```
##### Discussion
It's a good idea to run the feature for a full day on staging because
there are smoke tests that run daily in that
environment. These are handled by
-[gitlab-org/gitlab-qa.git][gitlab-qa-git]
+[`gitlab-org/gitlab-qa.git`][gitlab-qa-git]
[gitlab-qa-git]: https://gitlab.com/gitlab-org/gitlab-qa#how-do-we-use-it
@@ -170,17 +178,23 @@ environment? Good!
To enable your `X` feature at 5/25/50 percent, run:
- /chatops run feature set gitaly_X 5
- /chatops run feature set gitaly_X 25
- /chatops run feature set gitaly_X 50
+```shell
+/chatops run feature set gitaly_X 5
+/chatops run feature set gitaly_X 25
+/chatops run feature set gitaly_X 50
+```
And then finally when you're happy it works properly do:
- /chatops run feature set gitaly_X 100
+```shell
+/chatops run feature set gitaly_X 100
+```
Followed by:
- /chatops run feature set gitaly_X true
+```shell
+/chatops run feature set gitaly_X true
+```
Note that you need both the `100` and `true` as separate commands. See
[the documentation on actor gates][actor-gates]
@@ -229,11 +243,8 @@ pre-feature code from the codebase, and we should add another
changelog entry when doing that.
This is because even after setting `OnByDefault: true` users might
-still have opted to disable the new feature. See [the discussion
-below](#two-phase-ruby-to-go-rollouts)) for possibly needing to do
-such changes over multiple releases.
-
-[example-on-by-default-mr]: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3033
+still have opted to disable the new feature. See [the discussion below](#two-phase-ruby-to-go-rollouts) for possibly
+needing to do such changes over multiple releases.
##### Two phase Ruby to Go rollouts
@@ -244,11 +255,10 @@ a rewrite of Ruby code in Go.
As we deploy, the Ruby code might be in the middle of auto-restarting,
so we could remove its code before the Go code has a chance to update
with its default and would still want to call it. Therefore you
-need to do any such removal in two gitlab.com release cycles.
+need to do any such removal in two GitLab release cycles.
-See the example of [MR !3033][example-on-by-default-mr] and [MR
-!3056][example-post-go-ruby-code-removal-mr] for how to do such a
-two-phase removal.
+See the example of [MR !3033][example-on-by-default-mr] and [MR !3056][example-post-go-ruby-code-removal-mr] for how to
+do such a two-phase removal.
[example-on-by-default-mr]: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3033
[example-post-go-ruby-code-removal-mr]: https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3056
@@ -261,7 +271,9 @@ from the database of available features via `chatops`.
If you don't do this others will continue to see the features with
e.g.:
- /chatops run feature list --match=gitaly_
+```shell
+/chatops run feature list --match=gitaly_
+```
It also incrementally adds to data that needs to be fetched &
populated on every request.
@@ -269,11 +281,15 @@ populated on every request.
To remove the flag first sanity check that it's the feature you want,
that it's at [`100%` and is `true`](#enable-in-production):
- /chatops run feature get gitaly_X
+```shell
+/chatops run feature get gitaly_X
+```
Then delete it if that's the data you're expecting:
- /chatops run feature delete gitaly_X
+```shell
+/chatops run feature delete gitaly_X
+```
### Git Version Upgrades
@@ -296,9 +312,9 @@ that we have no such issues with zero-downtime upgrades:
Git binaries. The new version is guarded behind a feature flag at this
point in time.
-2. We roll out the feature flag and eventually remove it.
+1. We roll out the feature flag and eventually remove it.
-3. We remove the old bundled Git binaries.
+1. We remove the old bundled Git binaries.
Note that we cannot remove the old Git binaries at the same time as we
add the new ones. We must ensure that both sets exist in parallel for at least
@@ -313,23 +329,23 @@ The following detailed steps need to be done to upgrade to a new Git version:
1. Add the new bundled Git distribution to the `Makefile`. See
c0d05650be681c2accb4cec5aac74a6dd77a2fa6.
-2. Add a new bundled Git execution environment with a feature flag. See
+1. Add a new bundled Git execution environment with a feature flag. See
b547b368c8f584e9aabe8eef9342f99440b0c248. Please note that execution
environments are ordered by decreasing priority: the first environment
whose feature flags are all turned on will be picked. You thus have to
add your new environment to the top.
-3. Roll out the feature flag by following our feature flag process. You may
+1. Roll out the feature flag by following our feature flag process. You may
decide to remove the feature flag before the feature flag is removed in
case it is a low-risk upgrade of the Git version (e.g. when you perform a
patch-release upgrade, only).
-4. Remove the feature flag. See 888e6233fd85691f0852ae6c4a3656da9bf3d8e4.
+1. Remove the feature flag. See 888e6233fd85691f0852ae6c4a3656da9bf3d8e4.
-5. Remove the execution environment of the old bundled Git version. See
+1. Remove the execution environment of the old bundled Git version. See
af1a3fe7b536d22a6db9ba6591d222b23d01d83f.
-6. Remove the old set of bundled Git binaries from the `Makefile`. See
+1. Remove the old set of bundled Git binaries from the `Makefile`. See
9c700ea473d781eea50eab685d643d95e9c4ffee. Note that this must only happen
_after_ both old and new bundled Git binaries have been installed in
parallel in a release already.
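
The ordering rule described in step 2 above (the first execution environment whose feature flags are all enabled is picked) can be expressed as a small selection loop. The sketch below is illustrative only: the type, function, and flag names are assumptions, not Gitaly's internal API.

```go
// Illustrative sketch of "pick the first execution environment whose
// feature flags are all enabled". Names are assumptions, not Gitaly's
// actual internals.
package main

import "fmt"

// executionEnvironment pairs a Git binary with the feature flags that
// must all be enabled before it may be used.
type executionEnvironment struct {
	gitPath      string
	featureFlags []string
}

// pickEnvironment returns the first environment (highest priority first)
// whose feature flags are all enabled, falling back to the last entry.
func pickEnvironment(envs []executionEnvironment, enabled map[string]bool) executionEnvironment {
	for _, env := range envs {
		allOn := true
		for _, flag := range env.featureFlags {
			if !enabled[flag] {
				allOn = false
				break
			}
		}
		if allOn {
			return env
		}
	}
	return envs[len(envs)-1]
}

func main() {
	envs := []executionEnvironment{
		{gitPath: "/opt/git-new/git", featureFlags: []string{"gitaly_git_vNEW"}}, // newest version, still guarded
		{gitPath: "/opt/git-old/git"},                                            // no flags: always eligible
	}
	// The flag is off, so the old environment is selected.
	fmt.Println(pickEnvironment(envs, map[string]bool{"gitaly_git_vNEW": false}).gitPath)
}
```
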
@@ -344,10 +360,10 @@ version.
#### Major or minor releases
-Once we release GitLab X.Y.0, we also release gitaly X.Y.0 based on the content of `GITALY_SERVER_VERSION`.
+Once we release GitLab X.Y.0, we also release Gitaly X.Y.0 based on the content of `GITALY_SERVER_VERSION`.
This version file is automatically updated by `release-tools` during auto-deploy picking.
-Because gitaly master is moving we need to take extra care of what we tag.
+Because Gitaly master is moving we need to take extra care of what we tag.
Let's imagine a situation like this on `master`
@@ -366,7 +382,7 @@ graph LR;
Commit `C` is picked into auto-deploy and the build is successfully deployed to production
-We are ready to tag `v12.9.0` but there is a new merge commit, `D`, on gitaly `master`.
+We are ready to tag `v12.9.0` but there is a new merge commit, `D`, on Gitaly `master`.
```mermaid
graph LR;
@@ -387,6 +403,7 @@ graph LR;
We cannot tag on `D` as it never reached production.
`release-tools` follows this algorithm:
+
1. create a stable branch from `GITALY_SERVER_VERSION` (commit `C`),
1. bump the version and
1. prepare the changelog (commit `C'`).
@@ -415,6 +432,7 @@ graph LR;
```
Legend
+
```mermaid
graph TD;
A["master commit"];
@@ -442,8 +460,8 @@ For patch releases, we don't merge back to master. But `release-tools` will comm
Release candidate (RC) can be created with a chatops command.
This is the only type of release that a developer can build autonomously.
-When working on a GitLab feature that requires a minimum gitaly version,
-tagging a RC is a good way to make sure the gitlab feature branch has the proper gitaly version.
+When working on a GitLab feature that requires a minimum Gitaly version,
+tagging an RC is a good way to make sure the `gitlab` feature branch has the proper Gitaly version.
- Pick the current milestone (i.e. 12.9)
- Pick a release candidate number, you can check `VERSION` to see if we have one already (12.9.0-rc1)
@@ -453,9 +471,9 @@ tagging a RC is a good way to make sure the gitlab feature branch has the proper
has a **manual** job, `update-downstream-server-version`, that will create a merge request on the GitLab codebase to bump the Gitaly server version, and this will be assigned to you.
Once the build has completed successfully, assign it to a maintainer for review.
-### Publishing the ruby gem
+### Publishing the Ruby gem
-If an updated version of the ruby proto gem is needed, it can be published to rubygems.org with the `_support/publish-gem` script.
+If an updated version of the Ruby proto gem is needed, it can be published to rubygems.org with the `_support/publish-gem` script.
If the changes needed are not yet released, [create a release candidate](#creating-a-release-candidate) first.
@@ -477,7 +495,7 @@ make upgrade-module FROM_MODULE=v15 TO_MODULE=v16
It replaces old imports with the new version in the go source files,
updates `*.proto` files and modifies `go.mod` file to use a new target version of the module.
-##### Security release
+#### Security release
Security releases involve additional processes to ensure that recent releases
of GitLab are properly patched while avoiding the leaking of the security
@@ -489,17 +507,17 @@ the template.
### Experimental builds
-Push the release tag to dev.gitlab.org/gitlab/gitaly. After
+Push the release tag to `dev.gitlab.org/gitlab/gitaly`. After
passing the test suite, the tag will automatically be built and
-published in https://packages.gitlab.com/gitlab/unstable.
+published in <https://packages.gitlab.com/gitlab/unstable>.
-### Patching git
+### Patching Git
The Gitaly project is the single source of truth for the Git distribution across
all of GitLab: all downstream distributions use the `make git` target to build
-and install the git version used at runtime. Given that there is only one
-central location where we define the git version and its features, this grants
-us the possibility to easily apply custom patches to git.
+and install the Git version used at runtime. Given that there is only one
+central location where we define the Git version and its features, this grants
+us the possibility to easily apply custom patches to Git.
In order for a custom patch to be accepted into the Gitaly project, it must meet
the high bar of being at least in the upstream's `next` branch. The mechanism is
@@ -507,14 +525,14 @@ thus intended as a process to ensure that we can test upstreamed patches faster
than having to wait for the next release, not to add patches which would never
be accepted upstream. Patches which were not upstreamed yet will not be
accepted: at no point in time do we want to start maintaining a friendly fork of
-git.
+Git.
In order to add a patch, you can simply add it to the `GIT_PATCHES` array in our
`Makefile`.
-Note: while there is only a single git distribution which is distributed across
+Note: while there is only a single Git distribution which is distributed across
all of GitLab's official distributions, there may be unofficial ones which use a
-different version of git (most importantly source-based installations). So even
+different version of Git (most importantly source-based installations). So even
if you add patches to Gitaly's Makefile, you cannot assume that installations
will always have these patches. As a result, all code which makes use of
patched-in features must have fallback code to support the [minimum required Git
@@ -522,11 +540,11 @@ version](../README.md#installation)
### RPC deprecation process
-First create a deprecation issue at https://gitlab.com/gitlab-org/gitaly/issues
+First create a deprecation issue at <https://gitlab.com/gitlab-org/gitaly/issues>
with the title `Deprecate RPC FooBar`. Use label `Deprecation`. Below is a
template for the issue description.
-```
+```markdown
We are deprecating RPC FooBar because **REASONS**.
- [ ] put a deprecation comment `// DEPRECATED: <ISSUE-LINK>` in ./proto **Merge Request LINK**
diff --git a/doc/design_diskcache.md b/doc/design_diskcache.md
index d72053d82..7fae1e808 100644
--- a/doc/design_diskcache.md
+++ b/doc/design_diskcache.md
@@ -21,25 +21,31 @@ For every repository using the disk cache, a special set of files is maintained
to indicate which cached responses are still valid. These files are stored
in a dedicated **state directory** for each repository:
- ${STATE_DIR} = ${STORAGE_PATH}/+gitaly/state/${REPO_RELATIVE_PATH}
+```plaintext
+${STATE_DIR} = ${STORAGE_PATH}/+gitaly/state/${REPO_RELATIVE_PATH}
+```
Before a mutating RPC handler is invoked, a gRPC middleware creates a "lease"
file in the state directory that signifies a mutating operation is in-flight.
These lease files reside at the following path:
- ${STATE_DIR}/pending/${RANDOM_FILENAME}
+```plaintext
+${STATE_DIR}/pending/${RANDOM_FILENAME}
+```
Upon the completion of the mutating RPC, the lease file will be removed and
the "latest" file will be updated with a random value to reflect the new
"state" of the repository.
- ${STATE_DIR}/latest
+```plaintext
+${STATE_DIR}/latest
+```
The contents of latest are used along with several other values to form an
aggregate key that addresses a specific request for a specific repository at a
specific repository state:
-```
+```plaintext
─────┐
latest (random value) │
@@ -97,7 +103,9 @@ in-flight mutator RPCs), it is safe to cache responses and retrieve cached
responses. The aggregate key digest is used to form a hexadecimal path to the
cached response in this format:
- ${STORAGE_PATH}/+gitaly/cache/${DIGEST:0:2}/${DIGEST:2}
+```plaintext
+${STORAGE_PATH}/+gitaly/cache/${DIGEST:0:2}/${DIGEST:2}
+```
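
As a rough illustration of the layout above, the sketch below hashes an aggregate key with SHA256 and splits the hex digest into the documented subdirectory and filename. It is a simplified reading of the described scheme, not Gitaly's actual cache code, and the key composition shown here is incomplete.

```go
// Simplified sketch of the documented cache path layout:
//   ${STORAGE_PATH}/+gitaly/cache/${DIGEST:0:2}/${DIGEST:2}
// The aggregate key composition is illustrative; Gitaly's real key
// includes additional values beyond the ones shown.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

func cachePath(storagePath, latestState, rpcMethod, requestDigest string) string {
	h := sha256.New()
	// Mix in the repository "state" from the latest file plus request
	// identity, so a mutator invalidates previously cached responses.
	fmt.Fprintf(h, "%s\x00%s\x00%s", latestState, rpcMethod, requestDigest)
	digest := hex.EncodeToString(h.Sum(nil))
	return filepath.Join(storagePath, "+gitaly", "cache", digest[:2], digest[2:])
}

func main() {
	fmt.Println(cachePath("/var/opt/gitlab/git-data", "random-latest-value",
		"/gitaly.SmartHTTPService/InfoRefsUploadPack", "request-digest"))
}
```
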
**Note:** The first two characters of the digest are used as a subdirectory to
allow the random distribution of the digest algorithm (SHA256) to evenly
@@ -123,31 +131,31 @@ invalidator was not working in a previous run.
actively monitored. [Node exporter] is recommended for tracking resource
usage.
- There may be initial latency spikes when enabling this feature for large/busy
- GitLab instances until the cache is warmed up. On a busy site like gitlab.com,
+ GitLab instances until the cache is warmed up. On a busy site like GitLab.com,
this may last as long as several seconds to a minute.
The following Prometheus queries (adapted from [GitLab's dashboards])
will give you insight into the performance and behavior of the cache:
- [Cache invalidation behavior]
- - `sum(rate(gitaly_cacheinvalidator_optype_total[1m])) by (type)`
- - Shows the Gitaly RPC types (mutator or accessor). The cache benefits from
- Gitaly requests that are more often accessors than mutators.
+ - `sum(rate(gitaly_cacheinvalidator_optype_total[1m])) by (type)`
+ - Shows the Gitaly RPC types (mutator or accessor). The cache benefits from
+ Gitaly requests that are more often accessors than mutators.
- [Cache Throughput Bytes]
- - `sum(rate(gitaly_diskcache_bytes_fetched_total[1m]))`
- - `sum(rate(gitaly_diskcache_bytes_stored_total[1m]))`
- - Shows the cache's throughput at the byte level. Ideally, the throughput
- should correlate to the cache invalidation behavior.
+ - `sum(rate(gitaly_diskcache_bytes_fetched_total[1m]))`
+ - `sum(rate(gitaly_diskcache_bytes_stored_total[1m]))`
+ - Shows the cache's throughput at the byte level. Ideally, the throughput
+ should correlate to the cache invalidation behavior.
- [Cache Effectiveness]
- - `(sum(rate(gitaly_diskcache_requests_total[1m])) - sum(rate(gitaly_diskcache_miss_total[1m]))) / sum(rate(gitaly_diskcache_requests_total[1m]))`
- - Shows how often the cache is invoked for a hit vs a miss. A value close to
- 100% is desirable.
+ - `(sum(rate(gitaly_diskcache_requests_total[1m])) - sum(rate(gitaly_diskcache_miss_total[1m]))) / sum(rate(gitaly_diskcache_requests_total[1m]))`
+ - Shows how often the cache is invoked for a hit vs a miss. A value close to
+ 100% is desirable.
- [Cache Errors]
- - `sum(rate(gitaly_diskcache_errors_total[1m])) by (error)`
- - Shows edge case errors experienced by the cache. The following errors can
- be ignored:
- - `ErrMissingLeaseFile`
- - `ErrPendingExists`
+ - `sum(rate(gitaly_diskcache_errors_total[1m])) by (error)`
+ - Shows edge case errors experienced by the cache. The following errors can
+ be ignored:
+ - `ErrMissingLeaseFile`
+ - `ErrPendingExists`
[GitLab's dashboards]: https://dashboards.gitlab.net/d/5Y26KtFWk/gitaly-inforef-upload-pack-caching?orgId=1
[Cache invalidation behavior]: https://dashboards.gitlab.net/d/5Y26KtFWk/gitaly-inforef-upload-pack-caching?orgId=1&fullscreen&panelId=2
@@ -155,4 +163,3 @@ will give you insight into the performance and behavior of the cache:
[Cache Effectiveness]: https://dashboards.gitlab.net/d/5Y26KtFWk/gitaly-inforef-upload-pack-caching?orgId=1&fullscreen&panelId=8
[Cache Errors]: https://dashboards.gitlab.net/d/5Y26KtFWk/gitaly-inforef-upload-pack-caching?orgId=1&fullscreen&panelId=12
[Node exporter]: https://docs.gitlab.com/ee/administration/monitoring/prometheus/node_exporter.html
-[storage location]: https://docs.gitlab.com/ee/administration/repository_storage_paths.html
diff --git a/doc/design_ha.md b/doc/design_ha.md
index 259584947..9e22a62c5 100644
--- a/doc/design_ha.md
+++ b/doc/design_ha.md
@@ -1,59 +1,66 @@
# Gitaly High Availability (HA) Design
-Gitaly Cluster is an active-active cluster configuration for resilient git operations. [Refer to our specific requirements](https://gitlab.com/gitlab-org/gitaly/issues/1332).
+
+Gitaly Cluster is an active-active cluster configuration for resilient Git operations. [Refer to our specific requirements](https://gitlab.com/gitlab-org/gitaly/issues/1332).
Refer to [epic &289][epic] for current issues and discussions revolving around
HA MVC development.
## Terminology
+
The following terminology may be used within the context of the Gitaly Cluster project:
- Shard - partition of the storage for all repos. Each shard will require redundancy in the form of multiple Gitaly nodes (at least 3 when optimal) to maintain HA.
- Praefect - a transparent front end to all Gitaly shards. This reverse proxy ensures that all gRPC calls are forwarded to the correct shard by consulting the coordinator. The reverse proxy also ensures that write actions are performed transactionally when needed.
- - etymology: from Latin praefectus for _a person appointed to any of various positions of command, authority, or superintendence, as a chief magistrate in ancient Rome or the chief administrative official of a department of France or Italy._
- - [pronounced _pree-fect_](https://www.youtube.com/watch?v=MHszCZjPmTQ)
-- Node - This is the Gitaly service which performs the actual git read/write operations from/to disk. Has no knowledge of shards/praefects.
+ - etymology: from Latin praefectus for _a person appointed to any of various positions of command, authority, or superintendence, as a chief magistrate in ancient Rome or the chief administrative official of a department of France or Italy._
+ - [pronounced _pree-fect_](https://www.youtube.com/watch?v=MHszCZjPmTQ)
+- Node - This is the Gitaly service which performs the actual Git read/write operations from/to disk. Has no knowledge of shards/praefects.
- RPC categories (#1496):
- - Accessor - a side effect free (or read-only) RPC; does not modify the git repo (!228)
- - Mutator - an RPC that modifies the data in the git repo (!228)
+ - Accessor - a side effect free (or read-only) RPC; does not modify the Git repo (!228)
+ - Mutator - an RPC that modifies the data in the Git repo (!228)
- Transaction - mechanism used to ensure that a set of voters agree on the same
modifications.
- - Voter - a node registered in a transaction. Only registered voters may
- cast votes in transactions.
- - Vote - the change a voter intends to commit if the transaction succeeds.
- This is e.g. the hash of all references which are to be updated in their
- old and new state.
- - Quorum - minimum number of voters required to agree in order to commit a
- transaction.
- - Voting strategy - defines how many nodes are required to reach quorum.
- - strong - all nodes need to agree.
- - primary-wins - the transaction always succeeds as long as the primary
- has cast a vote.
- - majority-wins - the transaction succeeds when the primary and at least
- half of the secondaries agree.
- - Subtransactions - ordered list of voting processes of a transaction. For
- each vote cast by a voter, a new subtransaction is created. For a
- transaction to be successful, all subtransactions need to be successful.
- This is done so that Gitaly may perform multiple modifications in a single
- transaction.
- - reference-transaction - Git mechanism to update references. The
- [reference-transaction hook](https://git-scm.com/docs/githooks#_reference_transaction)
- directly hooks into this mechanism whenever a reference is being updated
- via Git.
+ - Voter - a node registered in a transaction. Only registered voters may
+ cast votes in transactions.
+ - Vote - the change a voter intends to commit if the transaction succeeds.
+ This is e.g. the hash of all references which are to be updated in their
+ old and new state.
+ - Quorum - minimum number of voters required to agree in order to commit a
+ transaction.
+ - Voting strategy - defines how many nodes are required to reach quorum.
+ - strong - all nodes need to agree.
+ - primary-wins - the transaction always succeeds as long as the primary
+ has cast a vote.
+ - majority-wins - the transaction succeeds when the primary and at least
+ half of the secondaries agree.
+ - Subtransactions - ordered list of voting processes of a transaction. For
+ each vote cast by a voter, a new subtransaction is created. For a
+ transaction to be successful, all subtransactions need to be successful.
+ This is done so that Gitaly may perform multiple modifications in a single
+ transaction.
+ - reference-transaction - Git mechanism to update references. The
+ [reference-transaction hook](https://git-scm.com/docs/githooks#_reference_transaction)
+ directly hooks into this mechanism whenever a reference is being updated
+ via Git.
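
The three voting strategies listed above reduce to a small predicate over the votes received. The following sketch is a simplified model for illustration only, not Praefect's implementation.

```go
// Simplified model of the voting strategies described above.
// This illustrates the terminology; it is not Praefect's code.
package main

import "fmt"

type strategy int

const (
	strong       strategy = iota // all nodes must agree
	primaryWins                  // the primary's vote alone is sufficient
	majorityWins                 // primary plus at least half of the secondaries
)

// quorumReached reports whether a transaction may be committed given the
// primary's vote and how many of the secondaries agreed with it.
func quorumReached(s strategy, primaryAgreed bool, agreeingSecondaries, totalSecondaries int) bool {
	switch s {
	case strong:
		return primaryAgreed && agreeingSecondaries == totalSecondaries
	case primaryWins:
		return primaryAgreed
	case majorityWins:
		return primaryAgreed && agreeingSecondaries*2 >= totalSecondaries
	default:
		return false
	}
}

func main() {
	// Primary agrees, one of two secondaries agrees: majority-wins commits,
	// strong does not.
	fmt.Println(quorumReached(majorityWins, true, 1, 2)) // true
	fmt.Println(quorumReached(strong, true, 1, 2))       // false
}
```
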
## Design
+
The high level design takes a reverse proxy approach to fanning out write requests to the appropriate nodes:
<img src="https://docs.google.com/drawings/d/e/2PACX-1vRl7WS-6RBOWxyLSBbBBAoV9MupmTh5vTqMOw_AX9axlboqkybTbFqGqExLyyYOilqEW7S9euXdBHzX/pub?w=960&amp;h=720">
## Phases
+
An iterative low risk approach needs to be devised to add functionality and verify assumptions at a sustainable pace while not impeding the existing functionality.
### 1. Simple pass-through proxy - no added functionality
+
- allows us to set up telemetry for observability of new service
- allows us to evaluate a gRPC proxy library
### 2. Introduce State
+
The following details need to be persisted in Postgres:
+
- [x] Primary location for a project
- [ ] Redundant locations for a project
- [ ] Available storage locations (initially can be configuration file)
@@ -61,29 +68,30 @@ The following details need to be persisted in Postgres:
Initially, the state of the shard nodes will be static and loaded from a configuration file. Eventually, this will be made dynamic via a data store (Postgres).
### Resolving Location
+
The following existing interaction will remain intact for the first iteration of the HA feature:
```mermaid
sequenceDiagram
- Client->>Rails: Modify repo X
- Rails-->>Datastore: Where is Repo X?
- Datastore-->> Rails: Repo X is at location A
- Rails-->>Gitaly: Modify repo X at location A
- Gitaly-->>Rails: Request succeeded/failed
+ Client->>Rails: Modify repo X
+ Rails-->>Datastore: Where is Repo X?
+ Datastore-->> Rails: Repo X is at location A
+ Rails-->>Gitaly: Modify repo X at location A
+ Gitaly-->>Rails: Request succeeded/failed
```
-Once the Rails app has resolved the primary location for the project, the request is made to the praefect. The praefect then resolves the redundant locations via the coordinator before applying the changes.
+Once the Rails app has resolved the primary location for the project, the request is made to the Praefect. The Praefect then resolves the redundant locations via the coordinator before applying the changes.
```mermaid
sequenceDiagram
- Rails->>Praefect: Modify repo X at A
- Praefect->>Coordinator: Which locations complement A for X?
- Coordinator->>Praefect: Locations B and C complement A
- Praefect->>Nodes ABC: Modify repo X
- Nodes ABC->>Praefect: Modifications successful!
+ Rails->>Praefect: Modify repo X at A
+ Praefect->>Coordinator: Which locations complement A for X?
+ Coordinator->>Praefect: Locations B and C complement A
+ Praefect->>Nodes ABC: Modify repo X
+ Nodes ABC->>Praefect: Modifications successful!
```
-*Note: the above interaction between the praefect and nodes A-B-C is an all-or-nothing transaction. All nodes must complete in success, otherwise a single node failure will cause the entire transaction to fail. This will be improved when replication is introduced.*
+*Note: the above interaction between the Praefect and nodes A-B-C is an all-or-nothing transaction. All nodes must complete in success, otherwise a single node failure will cause the entire transaction to fail. This will be improved when replication is introduced.*
### 3. Replication
@@ -98,14 +106,14 @@ design](#strong-consistency-design) for more details.
```mermaid
sequenceDiagram
- Praefect->>Node A: Modify repo X
- Praefect->>Node B: Modify repo X
- Praefect->>Node C: Modify repo X
- Node A->>Praefect: Success :-)
- Node B->>Praefect: Success :-)
- Node C->>Praefect: FAILURE :'(
- Praefect->>Node C: Replicate From A
- Node C->>Praefect: Success!
+ Praefect->>Node A: Modify repo X
+ Praefect->>Node B: Modify repo X
+ Praefect->>Node C: Modify repo X
+ Node A->>Praefect: Success :-)
+ Node B->>Praefect: Success :-)
+ Node C->>Praefect: FAILURE :'(
+ Praefect->>Node C: Replicate From A
+ Node C->>Praefect: Success!
```
When Praefect proxies a non-transactional mutator RPC, it will first route the
@@ -115,12 +123,12 @@ to all secondaries.
```mermaid
sequenceDiagram
- Praefect->>Node A: Modify repo X
- Node A->>Praefect: Success!
- Praefect->>Node B: Replicate From A
- Praefect->>Node C: Replicate From A
- Node B->>Praefect: Success!
- Node C->>Praefect: Success!
+ Praefect->>Node A: Modify repo X
+ Node A->>Praefect: Success!
+ Praefect->>Node B: Replicate From A
+ Praefect->>Node C: Replicate From A
+ Node B->>Praefect: Success!
+ Node C->>Praefect: Success!
```
#### Replication Process
@@ -153,11 +161,11 @@ The snapshot process is very resource intensive for fork operations. When
snapshotting a large repo, you end up with n-1 (n == replica count) copies of
the repository being compressed and extracted to secondary replicas.
-Adding to this stress is the constraint of storage limitations for gitlab.com
+Adding to this stress is the constraint of storage limitations for GitLab.com
users. The GitLab handbook (`www-gitlab-com`) is now larger than the storage
quota for free users. Until a secondary replica performs housekeeping, it
will consume the storage quota of the extracted snapshot. If Praefect instead
-used fast forking (https://gitlab.com/gitlab-org/gitlab/-/issues/24523), this
+used fast forking (<https://gitlab.com/gitlab-org/gitlab/-/issues/24523>), this
would not be an issue since forked copies would only use a small amount of
additional data.
@@ -178,10 +186,10 @@ graph TD
B-->| yes | C[Peek into RPC Stream to determine Repository]
B-->| no | G[Forward request to Gitaly]
C-->D{Scoped for repository?}
- D-->| yes | E[Get target repository from message]
- D-->| no | G
+ D-->| yes | E[Get target repository from message]
+ D-->| no | G
E-->F[Schedule Replication]
- F-->G
+ F-->G
```
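
Read as code, the flowchart above amounts to a short guard in the proxy path: schedule replication only for repository-scoped mutator RPCs, then forward every request to Gitaly. The sketch below models that decision with hypothetical types and helpers; it is not Praefect's actual middleware.

```go
// Sketch of the replication-scheduling decision shown in the flowchart.
// All types and helpers here are hypothetical stand-ins.
package main

import "fmt"

type request struct {
	isMutator        bool
	repositoryScoped bool
	targetRepository string
}

// handle mirrors the flowchart: only mutator RPCs that are scoped to a
// repository cause a replication job to be scheduled; every request is
// ultimately forwarded to Gitaly.
func handle(req request, scheduleReplication func(repo string), forwardToGitaly func()) {
	if req.isMutator && req.repositoryScoped {
		// "Peek into RPC stream" / "Get target repository from message"
		scheduleReplication(req.targetRepository)
	}
	forwardToGitaly()
}

func main() {
	handle(
		request{isMutator: true, repositoryScoped: true, targetRepository: "group/project.git"},
		func(repo string) { fmt.Println("schedule replication for", repo) },
		func() { fmt.Println("forward to gitaly") },
	)
}
```
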
## Stages until v1.0
@@ -331,16 +339,16 @@ sake of simplicity, we can thus reduce the problem scope to ensure strong
consistency for reference updates, only. There are multiple paths in GitLab that
can trigger such a reference update, including but not limited to:
-- Clients execute git-push(1).
+- Clients execute `git-push(1)`.
- Creation of tags via GitLab's `UserCreateTag` RPC.
- Merges and rebases when accepting merge requests.
-Common to all of them is that they perform reference updates using git-core,
+Common to all of them is that they perform reference updates using `git-core`,
and, more importantly, its reference transaction mechanism. An ideal solution
would thus hook into this reference transaction mechanism directly via
-githooks(5), which has been implemented in git-core and is going to be part of
+githooks(5), which has been implemented in `git-core` and is going to be part of
release v2.28.0.
Strong consistency is implemented via the reference-transaction hook. This hook
@@ -531,45 +539,45 @@ In order to observe reference transactions, the following metrics can be used:
v13.1.0-rc3.
## Notes
-* Existing discussions
- * Requirements: https://gitlab.com/gitlab-org/gitaly/issues/1332
- * Design: https://gitlab.com/gitlab-org/gitaly/issues/1335
-* Prior art
- * Stemma by Palantir
- * [Announcement](https://medium.com/palantir/stemma-distributed-git-server-70afbca0fc29)
- * Extends jgit (java git implementation)
- * Spokes by GitHub
- * Application layer approach: uses underlying git software to propagate changes to other locations.
- * Bitbucket Data Center (BDC)
- * [BDC FAQ](https://confluence.atlassian.com/enterprise/bitbucket-data-center-faq-776663707.html)
- * Ketch by Google (no longer maintained)
- * [Sid's comment on performance issue](https://news.ycombinator.com/item?id=13934698)
- * Also jgit based
-* gRPC proxy considerations
- * [gRPC Proxy library](https://github.com/mwitkow/grpc-proxy)
- * Pros
- * Handles all gRPC requests generically
- * Cons
- * Lack of support
- * [See current importers of project](https://godoc.org/github.com/mwitkow/grpc-proxy/proxy?importers)
- * Low level implementation requires knowledge of gRPC internals
- * Custom code generation
- * Pros
- * Simple and maintainable
- * Allows us to handwrite proxy code and later automate with lessons learned via code generation
- * Cons
- * Process heavy; requires custom tooling
- * Requires a way to tell which methods are read/write
- * [See MR for marking modifying RPCs](https://gitlab.com/gitlab-org/gitaly-proto/merge_requests/228)
- * See also:
- * [nRPC](https://github.com/nats-rpc/nrpc) - gRPC via NATS
- * [grpclb](https://github.com/bsm/grpclb) - gRPC load balancer
-* Complications
- * Existing Rails app indicates the Gitaly instance that a request is destined for (e.g. request to modify repo X should be directed to gitaly #1).
- * This means that rails app must be kept in the loop about any changes made to the location of a repo.
- * This may be mitigated by changing the proxy implementation to interpret the destination address as a reference to a shard rather than a specific host. This might open the door to allowing for something like consistent hashing.
- * While Git is distributed in nature, some write operations need to be serialized to avoid race conditions. This includes ref updates.
- * How do we coordinate proxies when applying ref updates? Do we need to?
+- Existing discussions
+ - Requirements: <https://gitlab.com/gitlab-org/gitaly/issues/1332>
+ - Design: <https://gitlab.com/gitlab-org/gitaly/issues/1335>
+- Prior art
+ - Stemma by Palantir
+ - [Announcement](https://medium.com/palantir/stemma-distributed-git-server-70afbca0fc29)
+ - Extends jgit (java Git implementation)
+ - Spokes by GitHub
+ - Application layer approach: uses underlying Git software to propagate changes to other locations.
+ - Bitbucket Data Center (BDC)
+ - [BDC FAQ](https://confluence.atlassian.com/enterprise/bitbucket-data-center-faq-776663707.html)
+ - Ketch by Google (no longer maintained)
+ - [Sid's comment on performance issue](https://news.ycombinator.com/item?id=13934698)
+ - Also jgit based
+- gRPC proxy considerations
+ - [gRPC Proxy library](https://github.com/mwitkow/grpc-proxy)
+ - Pros
+ - Handles all gRPC requests generically
+ - Cons
+ - Lack of support
+ - [See current importers of project](https://godoc.org/github.com/mwitkow/grpc-proxy/proxy?importers)
+ - Low level implementation requires knowledge of gRPC internals
+ - Custom code generation
+ - Pros
+ - Simple and maintainable
+ - Allows us to handwrite proxy code and later automate with lessons learned via code generation
+ - Cons
+ - Process heavy; requires custom tooling
+ - Requires a way to tell which methods are read/write
+ - [See MR for marking modifying RPCs](https://gitlab.com/gitlab-org/gitaly-proto/merge_requests/228)
+ - See also:
+ - [nRPC](https://github.com/nats-rpc/nrpc) - gRPC via NATS
+ - [grpclb](https://github.com/bsm/grpclb) - gRPC load balancer
+- Complications
+ - Existing Rails app indicates the Gitaly instance that a request is destined for (e.g. request to modify repo X should be directed to Gitaly #1).
+ - This means that rails app must be kept in the loop about any changes made to the location of a repo.
+ - This may be mitigated by changing the proxy implementation to interpret the destination address as a reference to a shard rather than a specific host. This might open the door to allowing for something like consistent hashing.
+ - While Git is distributed in nature, some write operations need to be serialized to avoid race conditions. This includes ref updates.
+ - How do we coordinate proxies when applying ref updates? Do we need to?
[epic]: https://gitlab.com/groups/gitlab-org/-/epics/289
diff --git a/doc/design_pack_objects_cache.md b/doc/design_pack_objects_cache.md
index fca4c9e46..ff20971ce 100644
--- a/doc/design_pack_objects_cache.md
+++ b/doc/design_pack_objects_cache.md
@@ -1,6 +1,6 @@
# Pack-objects cache design notes
-The purpose of this document is to give more insight into the design choices we made when building the first iteration of the pack-objects cache in https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/372.
+The purpose of this document is to give more insight into the design choices we made when building the first iteration of the pack-objects cache in <https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/372>.
## Introduction
@@ -10,7 +10,7 @@ Please read [Pack-objects cache for CI Git clones epic](https://gitlab.com/group
## High-level architecture
-```
+```plaintext
Gitaly (PostUploadPack) git-upload-pack gitaly-hooks Gitaly (PackObjectsHook) git-pack-objects
------------+---------- -------+------- -----+------ -----------+------------ -------+--------
| fetch request | | | |
@@ -31,7 +31,7 @@ Gitaly (PostUploadPack) git-upload-pack gitaly-hooks Gitaly (PackObjectsHook)
The whole pack-objects cache path depends on
[uploadpack.packObjectsHook](https://git-scm.com/docs/git-config#Documentation/git-config.txt-uploadpackpackObjectsHook)
-option. When upload-pack would run git pack-objects to create a packfile for a
+option. When upload-pack would run `git pack-objects` to create a packfile for a
client, it will run the `gitaly-hooks` binary instead. The arguments when calling
`gitaly-hooks` include `git pack-objects` at the beginning. This pattern is
similar to how Gitaly handles Git hooks during a push (such as `pre-receive`
@@ -104,7 +104,7 @@ Local files get a speed boost from RAM, and GitLab.com servers have lots of unus
The pack-objects cache is off by default because in some cases it
significantly increases the number of bytes written to disk. For more
information, see this issue where [we turned on the cache for
-gitlab-com/www-gitlab-com](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4010#note_534564684).
+`gitlab-com/www-gitlab-com`](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4010#note_534564684).
It would be better if the cache was on by default. But, if you don't have
CI-like traffic, there is probably no benefit, and if your Gitaly
diff --git a/doc/gitaly-backup.md b/doc/gitaly-backup.md
index 875b666d1..087bb7059 100644
--- a/doc/gitaly-backup.md
+++ b/doc/gitaly-backup.md
@@ -148,7 +148,6 @@ $BACKUP_DESTINATION_PATH/
4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.bundle
```
-
#### Generating full backups
A bundle with all references is created via the RPC `CreateBundle`. It
@@ -205,6 +204,7 @@ $BACKUP_DESTINATION_PATH/
```shell
awk '{print $2}' 001.refs | git bundle create repo.bundle --stdin
```
+
1. The backup and increment pointers are written.
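As a hedged illustration of what such a bundle allows (file and repository names are placeholders, not part of the backup layout):

```shell
# Verify the bundle and restore its references into a fresh bare repository.
git bundle verify repo.bundle
git init --bare restored.git
git -C restored.git fetch ../repo.bundle '+refs/*:refs/*'
```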
#### Generating incremental backups
diff --git a/doc/hooks.md b/doc/hooks.md
index 821367ca5..0c5b820f4 100644
--- a/doc/hooks.md
+++ b/doc/hooks.md
@@ -46,8 +46,8 @@ execution path is:
## `gitaly-hooks` binary
-`gitaly-hooks` is a binary that is the single point of entry for git hooks
-through gitaly.
+`gitaly-hooks` is a binary that is the single point of entry for Git hooks
+through Gitaly.
### Subcommands
@@ -55,11 +55,11 @@ through gitaly.
| subcommand | purpose | arguments | stdin |
|----------------|----------|-----------|--------|
-| `pre-receive` | used as the git pre-receive hook none | `<old-value>` SP `<new-value>` SP `<ref-name>` LF |
-| `update` | used as the git update hook | `<ref-name>` `<old-object>` `<new-object>` | none
-| `post-receive` | used as the git post-receive hook | none | `<old-value>` SP `<new-value>` SP `<ref-name>` LF |
-| `reference-transaction` | used as the git reference-transactionhook | `prepared|committed|aborted` | `<old-value>` SP `<new-value>` SP `<ref-name>` LF |
-| `git` | used as the git pack-objects hook | `pack-objects` `[--stdout]` `[--shallow-file]` | `<object-list>` |
+| `pre-receive` | used as the Git `pre-receive` hook | none | `<old-value>` SP `<new-value>` SP `<ref-name>` LF |
+| `update` | used as the Git `update` hook | `<ref-name>` `<old-object>` `<new-object>` | none |
+| `post-receive` | used as the Git `post-receive` hook | none | `<old-value>` SP `<new-value>` SP `<ref-name>` LF |
+| `reference-transaction` | used as the Git `reference-transaction` hook | `prepared\|committed\|aborted` | `<old-value>` SP `<new-value>` SP `<ref-name>` LF |
+| `git` | used as the Git `pack-objects` hook | `pack-objects` `[--stdout]` `[--shallow-file]` | `<object-list>` |
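The stdin column can be illustrated with a short sketch; the object IDs below are placeholders.

```shell
# One "<old-value> SP <new-value> SP <ref-name> LF" line per updated reference,
# as consumed by the pre-receive, post-receive and reference-transaction
# subcommands.
printf '%s %s %s\n' \
  0000000000000000000000000000000000000000 \
  2c26b46b68ffc68ff99b453c1d30413413422d70 \
  refs/heads/main
```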
## Hook-specific logic
diff --git a/doc/logging.md b/doc/logging.md
index da6d9705e..de3e0c649 100644
--- a/doc/logging.md
+++ b/doc/logging.md
@@ -5,7 +5,7 @@ Gitaly creates several kinds of log data.
## Go application logs
 The main Gitaly process uses logrus to write structured logs to
-stdout. These logs use either the text or the json format of logrus,
+stdout. These logs use either the text or the JSON format of logrus,
 depending on a setting in the Gitaly config file.
The main Gitaly process writes log messages with global scope and
 Many Gitaly RPCs spawn Git processes which may write errors or
warnings to stderr. Gitaly will capture these stderr messages and
include them in its main log, tagged with the request correlation ID.
-## Gitaly-ruby application logs
+## `gitaly-ruby` application logs
### Unstructured logs
-Gitaly-ruby writes logs to stdout. These logs are not structured. The
-main Gitaly process captures the gitaly-ruby process log messages and
+`gitaly-ruby` writes logs to stdout. These logs are not structured. The
+main Gitaly process captures the `gitaly-ruby` process log messages and
converts each line into a structured message that includes information
-about the gitaly-ruby process such as the PID. These logs then get
+about the `gitaly-ruby` process such as the PID. These logs then get
printed as part of the log stream of the main Gitaly process.
-There is no attribution of log messages in gitaly-ruby beyond the
-gitaly-ruby process ID. If an RPC implemented in gitaly-ruby runs a
+There is no attribution of log messages in `gitaly-ruby` beyond the
+`gitaly-ruby` process ID. If an RPC implemented in `gitaly-ruby` runs a
Git command, and if that Git command prints to stderr, it will show up
-as untagged data in the log stream for the gitaly-ruby parent process.
+as untagged data in the log stream for the `gitaly-ruby` parent process.
-Because of these properties, gitaly-ruby logs are often hard to read,
+Because of these properties, `gitaly-ruby` logs are often hard to read,
and it is often not possible to attribute log messages to individual
RPC requests.
### Structured logs
-Gitaly-ruby also writes a JSON structured log file with access log
+`gitaly-ruby` also writes a JSON structured log file with access log
information (method, duration, response code). It can be found in
`gitaly_ruby_json.log`.
@@ -61,5 +61,5 @@ Examples are:
- `gitaly_ruby_json.log`
There is another log file called `githost.log`. This log is generated
-by legacy code in gitaly-ruby. The way it is used, it might as well
+by legacy code in `gitaly-ruby`. The way it is used, it might as well
write to stdout.
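Because both Gitaly's own messages and the captured `gitaly-ruby` output end up in one structured stream, a single request can be reassembled by filtering on its correlation ID. This is only a sketch; the log path and field name are assumptions about a typical deployment.

```shell
# Collect all JSON log lines that share one correlation ID.
jq -c 'select(.correlation_id == "some-correlation-id")' /var/log/gitaly/gitaly.log
```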
diff --git a/doc/object_pools.md b/doc/object_pools.md
index f28ae423b..65072c787 100644
--- a/doc/object_pools.md
+++ b/doc/object_pools.md
@@ -1,4 +1,4 @@
-## Object Pools
+# Object Pools
 When creating forks of a repository, most of the objects for the forked repository
and the repository it forked from are shared. Storing those shared objects
@@ -11,7 +11,7 @@ The sharing of objects for a given repository and its object pool is done via
alternate object directories which Gitaly sets up when linking a repository to
an object pool by writing the `objects/info/alternates` file.
-### Lifetime of Object Pools
+## Lifetime of Object Pools
The lifetime of object pools is maintained via the
[ObjectPoolService](../proto/objectpool.proto), which provides various RPCs to
@@ -26,10 +26,10 @@ instead simply performs a copy of the references and objects.
 Afterwards, any repository which shall be a member of the pool needs to be
linked to it. Linking most importantly involves setting up the "alternates" file
of the pool member, but it also includes deleting all bitmaps for packs of the
-member. This is required by git because it can only ever use a single bitmap.
-While it's not an error to have multiple bitmaps, git will print a [user-visible
+member. This is required by Git because it can only ever use a single bitmap.
+While it's not an error to have multiple bitmaps, Git will print a [user-visible
warning](https://gitlab.com/gitlab-org/gitaly/-/issues/1728) on clone or fetch
-if there are. See [git-multi-pack-index(1)](https://git-scm.com/docs/multi-pack-index#_future_work)
+if there are. See [`git-multi-pack-index(1)`](https://git-scm.com/docs/multi-pack-index#_future_work)
for an explanation of this limitation.
Removing a member from an object pool is slightly more involved, as members of
@@ -38,11 +38,11 @@ It is thus not as simple as removing `objects/info/alternates`, as that would
leave behind a corrupt repository. Instead, Gitaly hard-links all objects which
are part of the object pool into the dissociating member first and removes the
alternate afterwards. In order to check whether the operation succeeded, Gitaly
-now runs git-fsck(1) to check for missing objects. If there are none, the
+now runs `git-fsck(1)` to check for missing objects. If there are none, the
dissociation has succeeded. Otherwise, it will fail and re-add the alternates
file.
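A loose sketch of the dissociation check described above; paths are placeholders and the preceding hard-linking step is omitted.

```shell
# After the pool's objects have been hard-linked into the member, drop the
# alternates file and confirm the repository is still fully connected.
rm member.git/objects/info/alternates
git -C member.git fsck --connectivity-only
```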
-### Housekeeping
+## Housekeeping
Housekeeping for object pools is handled differently from normal repositories as
it not only involves repacking the pool, but also updating it. The housekeeping
@@ -56,27 +56,27 @@ It performs the following tasks:
shared between object pools and normal repositories. Most importantly, it
removes stale lockfiles and deletes known-broken stale references.
-2. A fetch is performed from the object pool member into the object pool with a
+1. A fetch is performed from the object pool member into the object pool with a
`+refs/*:refs/remotes/origin/*` refspec. This fetch is most notably not a
   pruning fetch; that is, any reference which gets deleted in the member will
stay around in the pool.
-3. The fetch may create new dangling objects which are not referenced anymore in
+1. The fetch may create new dangling objects which are not referenced anymore in
the pool repository. These dangling objects will be kept alive by creating
dangling references such that they do not get deleted in the pool. See
[Dangling Objects](#dangling-objects) for more information.
-4. Loose references are packed via git-pack-refs(1).
+1. Loose references are packed via `git-pack-refs(1)`.
-5. The pool is repacked via git-repack(1). The repack produces a single packfile
+1. The pool is repacked via `git-repack(1)`. The repack produces a single packfile
including all objects with a bitmap index. In order to improve reuse of
- packfiles where git will read data from the packfile directly instead of
+ packfiles where Git will read data from the packfile directly instead of
generating it on the fly, the packfile uses a delta island including
- `refs/heads` and `refs/tags`. This restricts git to only generate deltas for
+ `refs/heads` and `refs/tags`. This restricts Git to only generate deltas for
objects which are directly reachable via either a branch or a tag. Most
notably, this causes us to not generate deltas against dangling references.
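The steps above map loosely onto plain Git commands. The following is a hedged sketch that assumes a sibling `pool.git`/`member.git` layout and omits the dangling-reference step; it is not Gitaly's exact invocation.

```shell
# Update the pool from a member without pruning, pack loose refs, then repack
# everything into a single bitmapped packfile with delta islands restricted to
# refs/heads and refs/tags.
git -C pool.git fetch ../member.git '+refs/*:refs/remotes/origin/*'
git -C pool.git pack-refs --all
git -C pool.git \
  -c pack.island='refs/heads' -c pack.island='refs/tags' \
  repack -a -d -b --delta-islands
```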
-### Dangling Objects
+## Dangling Objects
 When fetching from pool members into the object pool, any force-updated
references may cause objects in the pool to not be referenced anymore. For
@@ -103,8 +103,8 @@ Having unreachable objects kept alive in this fashion does have its problems:
performing housekeeping tasks on the object pool itself. Fetches into the
object pool and repacking of references can thus become quite expensive.
-- Keeping dangling references alive makes git consider them as reachable. While
- this is the exact effect we want to achieve, it will also cause git to
+- Keeping dangling references alive makes Git consider them as reachable. While
+ this is the exact effect we want to achieve, it will also cause Git to
generate packfiles which may use such objects as delta bases which would under
normal circumstances be considered as unreachable. The resulting packfile is
thus potentially suboptimal. Gitaly works around this issue by using a delta
@@ -112,19 +112,19 @@ Having unreachable objects kept alive in this fashion does have its problems:
best-effort strategy, as it only considers a single object pool member's
reachability while ignoring potential reachability by any other pool member.
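To make the dangling-reference mechanism concrete, a minimal sketch; the `refs/dangling/` namespace and object ID below are illustrative assumptions, as this excerpt does not name the actual ref namespace.

```shell
# Keep an otherwise-unreachable object alive by pointing a throwaway ref at it.
oid=2c26b46b68ffc68ff99b453c1d30413413422d70   # placeholder object ID
git -C pool.git update-ref "refs/dangling/$oid" "$oid"
```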
-### References
+## References
-When git repositories have alternates set up, then they by default advertise any
+When Git repositories have alternates set up, they by default advertise any
 references of the alternate itself. A client would thus typically also see
dangling references as well as any other reference which was potentially already
deleted in the pool member which the client is fetching from. Besides being
inefficient, the resulting references would also be wrong.
To avoid advertising of such references, Gitaly uses a workaround of setting the
-config entry `core.alternateRefsCommand=exit 0 #`. This causes git to use the
-given command instead of executing git-for-each-ref(1) in the alternate and thus
+config entry `core.alternateRefsCommand=exit 0 #`. This causes Git to use the
+given command instead of executing `git-for-each-ref(1)` in the alternate and thus
stops it from advertising alternate references.
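A short sketch of the two pieces described above; the pool path inside the alternates file is illustrative only.

```shell
# A pool member reaches the pool's object database through its alternates file
# and suppresses advertising the pool's references.
cat member.git/objects/info/alternates
# ../../pool.git/objects
git -C member.git config core.alternateRefsCommand 'exit 0 #'
```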
-### Further Reading
+## Further Reading
- [How Git object deduplication works in GitLab](https://docs.gitlab.com/ee/development/git_object_deduplication.html)
diff --git a/doc/object_quarantine.md b/doc/object_quarantine.md
index 1c1a0ac5c..afe219052 100644
--- a/doc/object_quarantine.md
+++ b/doc/object_quarantine.md
@@ -1,4 +1,4 @@
-# Git object quarantine during git push
+# Git object quarantine during Git push
While receiving a Git push, GitLab can reject pushes using the
`pre-receive` Git hook. Git has a special "object quarantine"
@@ -10,7 +10,7 @@ how GitLab is able to see quarantined objects.
## Git object quarantine
Git object quarantine was introduced in Git 2.11.0 via
-https://gitlab.com/gitlab-org/git/-/commit/25ab004c53cdcfea485e5bf437aeaa74df47196d.
+<https://gitlab.com/gitlab-org/git/-/commit/25ab004c53cdcfea485e5bf437aeaa74df47196d>.
 To understand what it does, we need to know how Git receives pushes on
the server.
@@ -125,8 +125,8 @@ the environment
variables](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L21-34)
that were present on the `pre-receive` hook, so that we can see the
quarantined objects. We do the same when we [instantiate a
-Gitlab::Git::Repository in
-gitaly-ruby](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/lib/gitlab/git/repository.rb#L44).
+`Gitlab::Git::Repository` in
+`gitaly-ruby`](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/lib/gitlab/git/repository.rb#L44).
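A hedged sketch of the environment involved; the directory names and object ID are placeholders, and the quarantine path layout may differ between Git versions.

```shell
# With the quarantine environment forwarded, a quarantined object is visible
# even though it is not yet in the repository's main object database.
GIT_OBJECT_DIRECTORY=/repos/foo.git/objects/incoming-abc123 \
GIT_ALTERNATE_OBJECT_DIRECTORIES=/repos/foo.git/objects \
git -C /repos/foo.git cat-file -e 2c26b46b68ffc68ff99b453c1d30413413422d70
```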
### Relative paths
diff --git a/doc/observability.md b/doc/observability.md
index cf5ddd9db..2a4f62f18 100644
--- a/doc/observability.md
+++ b/doc/observability.md
@@ -26,13 +26,13 @@ entries.
Note that Grafana 'templates' use a combination of PromQL and
Grafana-specific modifiers.
-# Ad-hoc latency graphs with ELK
+## Ad-hoc latency graphs with ELK
Gitaly RPC latency data from Prometheus uses irregular (exponential)
 bucket sizes, which gives you unrealistic numbers. To get more realistic
percentiles you can use ELK.
-- Go to [ELK](https://log.gitlab.net)
-- Click 'Visualize'
-- Search for `gitaly rpc latency example`
-- Edit as needed
+- Go to [ELK](https://log.gitlab.net)
+- Click 'Visualize'
+- Search for `gitaly rpc latency example`
+- Edit as needed
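For comparison, percentiles can still be estimated from the Prometheus buckets, with the caveat about bucket resolution noted above. The metric name, labels, and endpoint are assumptions about a typical Gitaly Prometheus setup.

```shell
# Approximate p95 gRPC handling latency per method from bucketed histogram data.
curl -sG 'http://prometheus.example.com/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95,
    sum by (le, grpc_method) (rate(grpc_server_handling_seconds_bucket{job="gitaly"}[5m])))'
```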