Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJames Fargher <proglottis@gmail.com>2020-03-23 06:10:39 +0300
committerJames Fargher <proglottis@gmail.com>2020-03-23 06:10:39 +0300
commit6bc73cfe5c2589fb682b373434ed4f68b76afb07 (patch)
treeca2d6443180d02979b25dceccc2ebcd53ee75cbc
parentd036be95bba1d82314faf1675b702833e8f92711 (diff)
parent2621fe80ce42c0dec90be7ba99a4a2432bdcdc88 (diff)
Merge branch 'jv-doc-quarantine' into 'master'
Add developer documentation about Git object quarantine Closes #2542 See merge request gitlab-org/gitaly!1925
-rw-r--r--doc/README.md1
-rw-r--r--doc/object_quarantine.md147
2 files changed, 148 insertions, 0 deletions
diff --git a/doc/README.md b/doc/README.md
index 4cc571dca..5ce5435d1 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -33,6 +33,7 @@ For configuration please read [praefects configuration documentation](doc/config
- [Disk-based Cache](design_diskcache.md)
- [Tips for reading Git source code](reading_git_source.md)
- [gitaly-ssh](../cmd/gitaly-ssh/README.md)
+- [Git object quarantine during git push](object_quarantine.md)
#### Proposals
diff --git a/doc/object_quarantine.md b/doc/object_quarantine.md
new file mode 100644
index 000000000..46447c376
--- /dev/null
+++ b/doc/object_quarantine.md
@@ -0,0 +1,147 @@
+# Git object quarantine during git push
+
+While receiving a Git push, GitLab can reject pushes using the
+`pre-receive` Git hook. Git has a special "object quarantine"
+mechanism that allows it to eagerly delete rejected Git objects.
+
+In this document we will explain how Git object quarantine works, and
+how GitLab is able to see quarantined objects.
+
+## Git object quarantine
+
+Git object quarantine was introduced in Git 2.11.0 via
+https://gitlab.com/gitlab-org/git/-/commit/25ab004c53cdcfea485e5bf437aeaa74df47196d.
+To understand what it does we need to know how Git receives pushes on
+the server.
+
+### How Git receives a push
+
+On a Git server, a push goes into `git receive-pack`. This process does the following things:
+
+1. receive the Git objects pushed by the client and write them to disk
+1. receive the ref update commands from the client and keep them in memory
+1. check connectivity (no missing objects)
+1. run `pre-receive` and feed it the intended ref update commands
+1. if `pre-receive` rejects the push, clean up and stop
+1. apply ref update commands one by one. For each command, run the `update` hook which can reject the ref update.
+1. after all ref updates have been applied run the `post-receive` hook
+1. report success to the client and end the session
+
+Object quarantine exists for the sake of the cleanup that happens when
+`pre-receive` rejects the push (step 5 above). It changes the _timing_ of the
+cleanup. Without object quarantine, objects that were part of a
+rejected push would sit around until `git gc` would judge them as both
+unused and "old". How long that takes depends on how often `git gc`
+runs (or `git prune`), and on the configuration of when objects are
+"old". Because of object quarantine, rejected objects can be deleted
+immediately: Git can just `rm -rf` the quarantine directory and
+they're gone.
+
+### Git implementation
+
+The Git implementation of this mechanism rests on two things.
+
+#### 1. Alternate object directories
+
+The objects in a Git repository can be stored across multiple
+directories: 1 main directory, usually `/objects`, and 0 or more
+alternate directories. Together these act like a search path: when
+looking for an object Git first checks the main directory, then each
+alternate, until it finds the object.
+
+#### 2. Config overrides via environment variables
+
+Git can inject custom config into subprocesses via environment
+variables. In the case of Git object directories, these are
+`GIT_OBJECT_DIRECTORY` (the main object directory) and
+`GIT_ALTERNATE_OBJECT_DIRECTORIES` (a search path of `:`-separated
+alternate object directories).
+
+#### Putting it all together
+
+1. `git receive-pack` receives a push
+1. `git receive-pack` [creates a quarantine directory `objects/incoming-$RANDOM`](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1715)
+1. `git receive-pack` [configures the unpack process](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1721) to write objects into the quarantine directory
+1. `git receive-pack` unpacks the objects into the quarantine directory
+1. `git receive-pack` [runs the `pre-receive` hook](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1498) with special `GIT_OBJECT_DIRECTORY` and `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variables that add the quarantine directory to the search path
+1. If the `pre-receive` hook rejects the push, `git receive-pack` removes the quarantine directory and its contents. The push is aborted.
+1. If the `pre-receive` hook passes, `git receive-pack` [merges the quarantine directory into the main object directory](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1510).
+1. `git receive-pack` enters the ref update transaction
+
+Note that by the time the `update` hook runs, the quarantine directory
+has already been merged into the main object directory so it no longer
+matters. The same goes for the `post-receive` hook which runs even
+later.
+
+Because `pre-receive` has the special quarantine configuration data in
+environment variables, any `git` process spawned by `pre-receive` will
+inherit the quarantine config and will be able to see the objects that
+are being pushed.
+
+## GitLab and Git object quarantine
+
+### Why does all this matter to GitLab
+
+GitLab uses Git hooks, among other things, to implement features that
+can reject Git pushes. For example, you can mark a branch as
+"protected" in the GitLab web UI, and then certain types of users can
+no longer push to that branch. That feature is implemented via the [Git
+`pre-receive` hook](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/hooks/pre-receive).
+
+As mentioned above, Git object quarantine normally works more or less
+automatically because `git` commands spawned by the `pre-receive` hook
+inherit the special environment variables that contain the path to the
+quarantine directory. In the case of GitLab's hooks we have a problem,
+however, because the GitLab hooks are "dumb". All the GitLab hooks do
+is take the inputs of the hook executable (the list of ref update
+commands) and send them to the GitLab Rails internal API via a POST
+request. The application logic that decides whether the push is
+allowed resides in Rails. The hook just waits and reports back result
+of the POST API request to GitLab.
+
+During the POST, the internal GitLab API makes Gitaly calls back into the repo to
+examine the objects being pushed. For example, if force pushes are not
+allowed, GitLab will call the IsAncestor RPC. That RPC call then wants
+to look at a commit that is in the process of being pushed. But
+because that commit is in quarantine, the RPC will fail because the
+commit cannot be found.
+
+### How GitLab passes the object quarantine information around
+
+To overcome this problem, the GitLab `pre-receive` hook [reads the
+object directory configuration from its
+environment](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/lib/object_dirs_helper.rb#L3),
+and passes this information [along with the HTTP API
+call](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/lib/gitlab_access.rb#L24).
+On the Rails side, we then [put the object directory information in
+the "request
+store"](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/api/internal/base.rb#L43)
+(i.e., request-scoped thread-local storage). And then during that
+Rails request, when Rails makes Gitaly requests on this repo, we send
+back the quarantine information [in the Gitaly `Repository`
+struct](https://gitlab.com/gitlab-org/gitlab/-/blob/f81f30c29a0edce20f6737fdccc3315c8baab9d1/lib/gitlab/gitaly_client/util.rb#L8-17).
+And finally, inside Gitaly, when we spawn a Git process, we [re-create
+the environment
+variables](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L21-34)
+that were present on the `pre-receive` hook, so that we can see the
+quarantined objects. We do the same when we [instantiate a
+Gitlab::Git::Repository in
+gitaly-ruby](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/lib/gitlab/git/repository.rb#L44).
+
+### Relative paths
+
+During the Gitaly migration we had to handle a complication with the
+object quarantine information: Git uses absolute paths for this. These
+paths get generated wherever `git receive-pack` runs, i.e., on the
+Gitaly server. During the migration, the repositories were also
+accessible via NFS at the Rails side, but at a different path. That
+meant that the absolute paths supplied by Git would be invalid part of
+the time.
+
+To work around this, the GitLab `pre-receive` hook [converts the
+absolute paths from Git into relative
+paths](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/lib/object_dirs_helper.rb#L16),
+relative to the repository directory. These relative paths then get
+passed around inside GitLab. At the time Gitaly recreates the object
+directory variables, it [converts the paths back from relative to
+absolute](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L23).