# Git object quarantine during Git push

While receiving a Git push, GitLab can reject pushes using the
`pre-receive` Git hook. Git has a special "object quarantine"
mechanism that allows it to eagerly delete rejected Git objects.

In this document we will explain how Git object quarantine works, and
how GitLab is able to see quarantined objects.

## Git object quarantine

Git object quarantine was introduced in Git 2.11.0 via
<https://gitlab.com/gitlab-org/git/-/commit/25ab004c53cdcfea485e5bf437aeaa74df47196d>.
To understand what it does, we need to know how Git receives pushes on
the server.

### How Git receives a push

On a Git server, a push goes into `git receive-pack`. This process does the following things:

1. receive the ref update commands from the client and keep them in memory
1. receive the Git objects pushed by the client and write them to disk
1. check connectivity (no missing objects)
1. run `pre-receive` and feed it the intended ref update commands
1. if `pre-receive` rejects the push, clean up and stop
1. apply ref update commands one by one. For each command, run the `update` hook which can reject the ref update.
1. after all ref updates have been applied, run the `post-receive` hook
1. report success to the client and end the session
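To make the ordering of steps 3 to 8 concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not Git's implementation: the objects and commands are assumed to have been received already, the hooks are plain callables rather than scripts, the connectivity check is reduced to "was the new tip received", and in this model a connectivity failure rejects the whole push. The names `receive_pack` and `ZERO_OID` are hypothetical.

```python
ZERO_OID = "0" * 40  # all-zero object ID: the ref does not exist on that side


def receive_pack(received, commands, refs, pre_receive, update, post_receive):
    """Toy model of the hook ordering in `git receive-pack` (steps 3 to 8).

    received: set of object IDs the client already sent
    commands: list of (old_oid, new_oid, ref_name) tuples from the client
    refs: dict of ref_name -> oid, updated in place
    pre_receive, update, post_receive: hypothetical callables standing in for hooks
    """
    log = []
    # Connectivity check, reduced to "every new tip was actually received".
    if any(new != ZERO_OID and new not in received for _old, new, _ref in commands):
        log.append("rejected: missing objects")
        return log
    # pre-receive sees every command at once and can veto the whole push.
    if not pre_receive(commands):
        received.clear()  # stand-in for cleaning up the rejected objects
        log.append("rejected by pre-receive")
        return log
    # Apply commands one by one; the update hook can reject a single ref.
    applied = []
    for old, new, ref in commands:
        if not update(old, new, ref):
            log.append(f"ng {ref}")
            continue
        if new == ZERO_OID:
            refs.pop(ref, None)  # ref deletion
        else:
            refs[ref] = new
        applied.append((old, new, ref))
        log.append(f"ok {ref}")
    # post-receive runs once, after all ref updates, with the ones that succeeded.
    post_receive(applied)
    return log
```

The model shows the key asymmetry: a `pre-receive` rejection stops everything before any ref moves, while an `update` rejection only affects its own ref.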

Object quarantine exists for the sake of the cleanup that happens when
`pre-receive` rejects the push (step 5 above). It changes the _timing_ of the
cleanup. Without object quarantine, objects that were part of a
rejected push would sit around until `git gc` (or `git prune`) judged
them to be both unused and "old". How long that takes depends on how
often `git gc` or `git prune` runs, and on how the repository defines
"old" (for example through the `gc.pruneExpire` setting). Because of
object quarantine, rejected objects can be deleted immediately: Git can
just `rm -rf` the quarantine directory and they're gone.
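A minimal sketch of this lifecycle, assuming a hypothetical `pre_receive` callable that returns `True` or `False`, and treating object contents as opaque bytes (real loose objects are zlib-compressed). `tempfile.mkdtemp` with the prefix `incoming-` imitates the `objects/incoming-$RANDOM` naming used by `git receive-pack`; the function name and its arguments are illustrative only.

```python
import os
import shutil
import tempfile


def receive_into_quarantine(objects_dir, pushed, pre_receive):
    """Write pushed objects into a fresh quarantine directory under objects_dir.

    pushed: dict mapping a relative object path (e.g. "ab/cdef...") to bytes.
    Returns the quarantine path if the push is accepted, or None if rejected.
    """
    quarantine = tempfile.mkdtemp(prefix="incoming-", dir=objects_dir)
    for rel_path, data in pushed.items():
        path = os.path.join(quarantine, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    if not pre_receive():
        # The rejected objects never touched the main object directory,
        # so one recursive delete removes all of them right away.
        shutil.rmtree(quarantine)
        return None
    return quarantine
```

Because nothing was written outside the quarantine directory, rejection needs no garbage collection at all.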

### Git implementation

The Git implementation of this mechanism rests on two things.

#### 1. Alternate object directories

The objects in a Git repository can be stored across multiple
directories: one main directory, usually `objects/` inside the
repository, and zero or more alternate directories. Together these act
like a search path: when looking for an object, Git first checks the
main directory, then each alternate, until it finds the object.
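The search-path behaviour can be sketched as follows. This assumes the usual on-disk layout, where a loose object lives at `<objects>/<first two hex digits>/<remaining digits>` and alternates are listed one per line in `objects/info/alternates`. It ignores packfiles and alternates of alternates, so it is an illustration rather than a faithful reimplementation; all function names are hypothetical.

```python
import os


def loose_object_path(objects_dir, oid):
    # A loose object is stored at <objects>/<oid[:2]>/<oid[2:]>.
    return os.path.join(objects_dir, oid[:2], oid[2:])


def read_alternates(objects_dir):
    """Return the alternate directories listed in <objects>/info/alternates."""
    path = os.path.join(objects_dir, "info", "alternates")
    try:
        with open(path) as f:
            lines = [line.strip() for line in f]
    except FileNotFoundError:
        return []
    # Relative entries are resolved against the object directory; absolute
    # entries are kept as-is by os.path.join. Blank lines are skipped.
    return [os.path.join(objects_dir, line) for line in lines if line]


def find_loose_object(objects_dir, oid):
    """Check the main directory first, then each alternate, in order."""
    for directory in [objects_dir, *read_alternates(objects_dir)]:
        candidate = loose_object_path(directory, oid)
        if os.path.isfile(candidate):
            return candidate
    return None
```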

#### 2. Config overrides via environment variables

Git can inject custom config into subprocesses via environment
variables. In the case of Git object directories, these are
`GIT_OBJECT_DIRECTORY` (the main object directory) and
`GIT_ALTERNATE_OBJECT_DIRECTORIES` (a search path of `:`-separated
alternate object directories).
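The effect of these two variables on the search path can be sketched like this. It is a simplified model that only looks at the environment (it ignores `objects/info/alternates` and Git's quoting rules for unusual paths), and `object_search_path` is a hypothetical helper. On Unix-like systems `os.pathsep` is `:`, matching the separator described above.

```python
import os


def object_search_path(env, default_objects_dir):
    """Return the object directories Git would search, in order.

    GIT_OBJECT_DIRECTORY replaces the main object directory; entries of
    GIT_ALTERNATE_OBJECT_DIRECTORIES are searched after it.
    """
    main = env.get("GIT_OBJECT_DIRECTORY", default_objects_dir)
    raw = env.get("GIT_ALTERNATE_OBJECT_DIRECTORIES", "")
    alternates = [entry for entry in raw.split(os.pathsep) if entry]
    return [main, *alternates]
```

A parent process applies such an override to a child by passing a modified copy of its environment, for example through the `env=` argument of `subprocess.run`.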

#### Putting it all together

1. `git receive-pack` receives a push
1. `git receive-pack` [creates a quarantine directory `objects/incoming-$RANDOM`](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1715)
1. `git receive-pack` [configures the unpack process](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1721) to write objects into the quarantine directory
1. `git receive-pack` unpacks the objects into the quarantine directory
1. `git receive-pack` [runs the `pre-receive` hook](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1498) with special `GIT_OBJECT_DIRECTORY` and `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variables that add the quarantine directory to the search path: the quarantine directory becomes the main object directory, and the repository's regular object directory is listed as an alternate
1. If the `pre-receive` hook rejects the push, `git receive-pack` removes the quarantine directory and its contents. The push is aborted.
1. If the `pre-receive` hook passes, `git receive-pack` [merges the quarantine directory into the main object directory](https://gitlab.com/gitlab-org/git/-/blob/v2.24.0/builtin/receive-pack.c#L1510).
1. `git receive-pack` enters the ref update transaction
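The merge step can be sketched as moving every file from the quarantine into the main object directory. This is a hypothetical helper under simple assumptions (same filesystem, files named after their content hash), not Git's code; Git's real migration handles more details than this sketch.

```python
import os
import shutil


def migrate_quarantine(quarantine, objects_dir):
    """Move every file from the quarantine into objects_dir, then remove it."""
    for root, _dirs, files in os.walk(quarantine):
        rel = os.path.relpath(root, quarantine)
        dest_dir = os.path.normpath(os.path.join(objects_dir, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            dest = os.path.join(dest_dir, name)
            if os.path.exists(dest):
                # Same name means same content hash: keep the existing copy.
                continue
            os.rename(os.path.join(root, name), dest)
    # Whatever is left (empty directories, duplicates) goes away with the quarantine.
    shutil.rmtree(quarantine)
```

Since moving a file is cheap compared to copying it, accepting a push costs little extra, and objects that already existed are simply dropped.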

Note that by the time the `update` hook runs, the quarantine directory
has already been merged into the main object directory, so it no longer
matters. The same goes for the `post-receive` hook, which runs even
later.

Because `pre-receive` has the special quarantine configuration data in
environment variables, any `git` process spawned by `pre-receive` will
inherit the quarantine config and will be able to see the objects that
are being pushed.
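The environment a quarantined hook receives can be sketched as below. The sketch assumes that the quarantine directory becomes `GIT_OBJECT_DIRECTORY` and that the repository's regular object directory is appended to `GIT_ALTERNATE_OBJECT_DIRECTORIES`; treat these exact details as an assumption rather than a description of Git's source. `quarantine_env` is a hypothetical helper.

```python
import os


def quarantine_env(base_env, objects_dir, quarantine_dir):
    """Return a copy of base_env in which the quarantine is searched first.

    The quarantine becomes the main object directory and the real object
    directory is added as an extra alternate, so both existing and newly
    pushed objects are visible. base_env is not modified.
    """
    env = dict(base_env)
    raw = env.get("GIT_ALTERNATE_OBJECT_DIRECTORIES", "")
    alternates = [entry for entry in raw.split(os.pathsep) if entry]
    alternates.append(os.path.abspath(objects_dir))
    env["GIT_OBJECT_DIRECTORY"] = os.path.abspath(quarantine_dir)
    env["GIT_ALTERNATE_OBJECT_DIRECTORIES"] = os.pathsep.join(alternates)
    return env
```

A hook started with this environment, and any `git` process it starts in turn (for example `git cat-file -e <oid>`), inherits these variables and therefore sees the pushed objects.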

## GitLab and Git object quarantine

### Why does all this matter to GitLab

GitLab uses Git hooks, among other things, to implement features that
can reject Git pushes. For example, you can mark a branch as
"protected" in the GitLab web UI, and then certain types of users can
no longer push to that branch. That feature is implemented via the [Git
`pre-receive` hook](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitaly/service/hook/pre_receive.go).

As mentioned above, Git object quarantine normally works more or less
automatically because `git` commands spawned by the `pre-receive` hook
inherit the special environment variables that contain the path to the
quarantine directory. GitLab's hooks have a problem, however, because
they are "dumb". All a GitLab hook does is take the inputs of the hook
executable (the list of ref update commands) and send them to the
GitLab Rails internal API via a POST request. The application logic
that decides whether the push is allowed resides in Rails. The hook
just waits for the response and reports the result of the POST request
back to Git.
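A "dumb" hook of this kind can be sketched in a few lines. This is a hypothetical Python illustration, not GitLab's hook: `post` stands in for an HTTP client, and the URL and JSON field names are made up. Only the stdin format, one `<old-oid> <new-oid> <ref-name>` line per ref, comes from Git's hook interface.

```python
import json


def parse_ref_updates(stdin_text):
    """Parse pre-receive input: one "<old-oid> <new-oid> <ref-name>" line per ref."""
    updates = []
    for line in stdin_text.splitlines():
        if not line.strip():
            continue
        old, new, ref = line.split(" ", 2)
        updates.append({"old": old, "new": new, "ref": ref})
    return updates


def run_pre_receive(stdin_text, post):
    """Forward the ref updates to an allow/deny API and map the answer to an exit code.

    post(url, body) -> dict is a hypothetical stand-in for an HTTP client.
    """
    body = json.dumps({"changes": parse_ref_updates(stdin_text)})
    answer = post("http://gitlab.internal.example/api/allowed", body)
    # Exit status 0 lets the push continue; anything else makes Git reject it.
    return 0 if answer.get("allowed") else 1
```

A real hook would read `sys.stdin.read()`, call its HTTP client, and pass the result to `sys.exit()`. The hook contains no policy of its own, so anything the API wants to know about the pushed objects has to be looked up separately.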

During the POST, the internal GitLab API makes Gitaly calls back into
the repository to examine the objects being pushed. For example, if
force pushes are not allowed, GitLab calls the `IsAncestor` RPC. That
RPC needs to look at a commit that is still in the process of being
pushed. The Git processes that Gitaly spawns to serve the RPC are not
children of the `pre-receive` hook, so they do not inherit its
quarantine environment variables. Because the commit is still in
quarantine, the RPC fails: the commit cannot be found.

### How GitLab passes the object quarantine information around

To overcome this problem, the GitLab `pre-receive` hook [reads the
object directory configuration from its
environment](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitlabshell/env.go#L9)
and passes this information [along with the HTTP API
call](https://gitlab.com/gitlab-org/gitaly/-/blob/71d527f4f16c1f0e76793f055def0299b375cc7d/internal/gitaly/hook/manager.go#L30-46).
On the Rails side, we then [put the object directory information in
the "request
store"](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/api/internal/base.rb#L43)
(that is, request-scoped thread-local storage). Then, whenever Rails
makes Gitaly requests on this repository during that same Rails
request, it sends the quarantine information back [in the Gitaly
`Repository`
message](https://gitlab.com/gitlab-org/gitlab/-/blob/f81f30c29a0edce20f6737fdccc3315c8baab9d1/lib/gitlab/gitaly_client/util.rb#L8-17).
Finally, inside Gitaly, when we spawn a Git process, we [re-create
the environment
variables](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L21-34)
that were present in the environment of the `pre-receive` hook, so
that the Git process can see the quarantined objects. We do the same
when we [instantiate a `Gitlab::Git::Repository` in
`gitaly-ruby`](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/lib/gitlab/git/repository.rb#L44).
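The round trip can be sketched with the HTTP call, the Rails request store, and the gRPC transport left out. `Repository` below is a hypothetical Python stand-in whose two object-directory fields are named after the information they carry; the real message's field names and types may differ. Paths are passed through unchanged in this sketch.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Mapping


@dataclass
class Repository:
    """Hypothetical stand-in for the Gitaly `Repository` message."""
    relative_path: str
    git_object_directory: str = ""
    git_alternate_object_directories: List[str] = field(default_factory=list)


def from_hook_env(relative_path: str, hook_env: Mapping[str, str]) -> Repository:
    # Hook side: capture the quarantine variables so they can travel with the API call.
    raw = hook_env.get("GIT_ALTERNATE_OBJECT_DIRECTORIES", "")
    return Repository(
        relative_path=relative_path,
        git_object_directory=hook_env.get("GIT_OBJECT_DIRECTORY", ""),
        git_alternate_object_directories=[a for a in raw.split(os.pathsep) if a],
    )


def git_env(repo: Repository) -> Dict[str, str]:
    # Gitaly side: re-create the variables for a spawned git process.
    env = {}
    if repo.git_object_directory:
        env["GIT_OBJECT_DIRECTORY"] = repo.git_object_directory
    if repo.git_alternate_object_directories:
        env["GIT_ALTERNATE_OBJECT_DIRECTORIES"] = os.pathsep.join(
            repo.git_alternate_object_directories
        )
    return env
```

Requests without quarantine information produce an empty overlay, so Git falls back to the repository's normal object directory.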

### Relative paths

During the Gitaly migration we had to handle a complication with the
object quarantine information: Git uses absolute paths for this. These
paths get generated wherever `git receive-pack` runs, i.e., on the
Gitaly server. During the migration, the repositories were also
accessible via NFS on the Rails side, but at a different path. That
meant that the absolute paths supplied by Git were sometimes invalid on
the Rails side.

To work around this, the GitLab `pre-receive` hook [converts the
absolute paths from Git into relative
paths](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/ruby/gitlab-shell/lib/object_dirs_helper.rb#L16),
relative to the repository directory. These relative paths then get
passed around inside GitLab. When Gitaly recreates the object
directory variables, it [converts the paths back from relative to
absolute](https://gitlab.com/gitlab-org/gitaly/-/blob/969bac80e2f246867c1a976864bd1f5b34ee43dd/internal/git/alternates/alternates.go#L23).
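The conversion in both directions can be sketched with standard path helpers, assuming POSIX-style paths. The helper names and the repository locations in the usage below are hypothetical.

```python
import os


def to_relative(repo_path, object_dirs):
    # Hook side: express each absolute object directory relative to the repository.
    return [os.path.relpath(d, repo_path) for d in object_dirs]


def to_absolute(repo_path, relative_dirs):
    # Gitaly side: resolve against wherever the repository lives on this machine.
    return [os.path.normpath(os.path.join(repo_path, d)) for d in relative_dirs]
```

For example, `/srv/repos/a.git/objects/incoming-1` becomes `objects/incoming-1`, which resolves correctly whether the repository is later found under `/srv/repos/a.git` or under an NFS mount such as `/mnt/nfs/a.git`.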