Age | Commit message (Collapse) | Author |
|
Migrate the temporary Git exec path set up by the Git command factory to
be created in the runtime directory.
Changelog: changed
|
|
Migrate the temporary hook directories set up by the Git command factory
to be created in the runtime directory.
Changelog: changed
|
|
Migrate the default location of internal sockets to be created in the
runtime directory.
Changelog: changed
|
|
In Gitaly, we're creating different kinds of files at runtime which are
required to operate correctly. These files are by default created in the
operating system's temporary directory, which is typically `/tmp`. While
it is clear that this directory can often be tmpfs and thus volatile,
this is perfectly fine: we regenerate the runtime data on every start
anyway.
Modern systems based on systemd use systemd-tmpfiles(8) though, which
also supports regular pruning of temporary files. So if the files we
create in `/tmp` aren't accessed for a specific grace period then the
daemon will clean those up. This problem becomes a lot worse though if
`/tmp` is mounted with the `noatime` mount option: even if files are
constantly used, systemd will eventually remove them anyway. Of course,
this completely breaks all parts of Gitaly which rely on these files:
hooks, the Git execution environment, and internal sockets.
The root cause for this problem is that Gitaly doesn't have a go-to
solution to host all such files, but instead has ad-hoc solutions for
every new kind of file we need to exist at runtime. If we had a single
location for them, and if that location was configurable such that
administrators can decide themselves where to put these files so that
they don't get pruned, then this problem wouldn't exist or would at
least be the responsibility of the admin.
This commit thus introduces a new runtime directory configuration into
Gitaly that is supposed to unify all current locations where we create
runtime files into a single well-defined location. This reduces the
problem we need to solve into a single one instead of creating the
problem anew for every new kind of runtime data.
By default, we're still kind of forced to create the runtime directory
in `/tmp`: except for the storage locations, it is the only location
known to be writeable by us. While we could try and abuse storage
locations, e.g. by just using the first storage as the location for the
runtime directory, this would put additional restrictions on the storage
paths which don't currently exist because we need to ensure short path
names so that sockets continue to work alright. But on systems where it
is known that `/tmp` will get regularly cleaned up, an administrator can
just point the new `runtime_dir` config to an arbitrary existing path,
which will then cover all runtime files.
Changelog: added
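The fallback behaviour described above can be sketched as follows. This is
only an illustration and not Gitaly's actual implementation: the helper
`resolveRuntimeDir`, the `gitaly-` prefix and the `sock.d` subdirectory are
hypothetical names, and the per-process subdirectory is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveRuntimeDir is a hypothetical helper. It creates a fresh directory
// for runtime files below the configured path. An empty configuration falls
// back to the operating system's temporary directory, typically `/tmp`.
func resolveRuntimeDir(configured string) (string, error) {
	if configured == "" {
		// os.MkdirTemp uses os.TempDir() when the parent is empty.
		return os.MkdirTemp("", "gitaly-")
	}

	// The administrator is expected to point the config at an existing path.
	info, err := os.Stat(configured)
	if err != nil {
		return "", fmt.Errorf("runtime directory: %w", err)
	}
	if !info.IsDir() {
		return "", fmt.Errorf("runtime directory %q is not a directory", configured)
	}

	return os.MkdirTemp(configured, "gitaly-")
}

func main() {
	dir, err := resolveRuntimeDir("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	// All kinds of runtime data live below the same well-defined location.
	fmt.Println("runtime directory:", dir)
	fmt.Println("internal sockets: ", filepath.Join(dir, "sock.d"))
}
```

Keeping every kind of runtime file below one directory means that only a
single location needs to be protected from pruning.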
|
|
Update Danger config to use new type & subtype labels
See merge request gitlab-org/gitaly!4387
|
|
Fix error handling in GetTreeEntries
See merge request gitlab-org/gitaly!4414
|
|
|
|
housekeeping: Improve visibility into performed maintenance tasks
Closes #4103
See merge request gitlab-org/gitaly!4406
|
|
[ci skip]
|
|
[ci skip]
|
|
Makefile: Add more patches to speed up git-fetch(1) to v2.35.1.gl1
Closes #4104
See merge request gitlab-org/gitaly!4408
|
|
CreateRepositoryFromURL switch to mirror flag
See merge request gitlab-org/gitaly!4378
|
|
Changelog: added
|
|
Signed-off-by: Rémy Coutable <remy@rymai.me>
|
|
This rule was introduced in
https://gitlab.com/gitlab-org/ruby/gems/gitlab-dangerfiles/-/releases/v2.1.0
Signed-off-by: Rémy Coutable <remy@rymai.me>
|
|
Signed-off-by: Rémy Coutable <remy@rymai.me>
|
|
The rule was introduced in
https://gitlab.com/gitlab-org/ruby/gems/gitlab-dangerfiles/-/releases/v2.9.0
Signed-off-by: Rémy Coutable <remy@rymai.me>
|
|
Use features introduced in
https://gitlab.com/gitlab-org/ruby/gems/gitlab-dangerfiles/-/releases/v2.6.0
Signed-off-by: Rémy Coutable <remy@rymai.me>
|
|
Signed-off-by: Rémy Coutable <remy@rymai.me>
|
|
housekeeping: Don't prune recent objects
See merge request gitlab-org/gitaly!4410
|
|
When running git-gc(1), Git automatically executes git-prune(1) with a
grace period of two weeks: only when objects are older than that grace
period will they be pruned. This is really important to avoid a race
condition with concurrent commands, all of which first write new
objects into the repository before making them reachable via a ref
update. This means that there is a brief period between writing the new
objects and making them reachable. If objects were pruned without a
grace period, pruning could kick in right before the references are
updated and thus cause repository corruption. The two-week grace period
bridges this gap by giving enough headroom to finish the transaction.
With the recent introduction of the heuristics-based OptimizeRepository
RPC we have also started to use git-prune(1). The assumption was that
the two-weeks grace period was the default of git-prune(1), but as it
turns out it is only the default of git-gc(1). So because we are now
running git-prune(1) without any parameters, we accidentally started to
delete all unreachable objects, even if they had just been created.
Curiously, this bug only started to surface as we were migrating Rails
to use OptimizeRepository. Seemingly, because we previously executed the
RPC only as part of our nightly job, the logic didn't trigger at the
right moment and thus never caused any (known) problems. Still, this is
a serious bug.
Fix this issue by passing `--expire=two.weeks.ago` to git-prune(1).
Changelog: fixed
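As a rough sketch of the fix, the invocation could be assembled as below.
The helper `pruneArgs` and the repository path are made up for illustration;
only the `--expire=two.weeks.ago` option is taken from the message above.
Gitaly itself builds Git commands through its own command factory, which is
not shown here.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pruneArgs is a hypothetical helper which returns the arguments for a
// git-prune(1) invocation that keeps unreachable objects younger than two
// weeks, matching the grace period git-gc(1) applies by default.
func pruneArgs(repoPath string) []string {
	return []string{"-C", repoPath, "prune", "--expire=two.weeks.ago"}
}

func main() {
	// The command is only constructed here, not executed.
	cmd := exec.Command("git", pruneArgs("/path/to/repo.git")...)
	fmt.Println(cmd.Args)
}
```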
|
|
repository: Fix indeterministic voting when creating new repos
Closes #4100
See merge request gitlab-org/gitaly!4402
|
|
into 'master'
operations: Fix wrong error code when UserSquash conflicts
See merge request gitlab-org/gitaly!4403
|
|
The rollout of Git v2.35.1.gl1 is currently blocked in production
because we have discovered various repositories where broken references
cause errors with the new version. So right now, the new Git version is
still disabled in all deployments, also because the feature flag is
default-disabled.
This blockade gives us a last chance to sneak in some patches for Git
v2.35.1.gl1 which have landed in `next` meanwhile. Missing this chance
would mean that we'd have to wait a few more releases until we can land
another revision, or otherwise we'd be running with three different Git
versions in production at the same time: the current Git v2.33.1.gl3,
then Git v2.35.1.gl1, and finally the new version we'd be about to
introduce. So let's take this chance and slightly bend the rollout
process of new Git versions.
Apply commits from 60aae8731c (Merge branch 'ps/fetch-mirror-optim' into
next, 2022-03-08) into Git v2.35.1.gl1. These patches optimize fetches
by:
- Patch 1/5 starts to look up "want" lines via the commit-graph,
  resulting in a 7% speedup.
- Patch 2/5 avoids repeated lookups of commits that were only
  required when writing to FETCH_HEAD, providing an 8% speedup.
- Patches 3-5/5 start to skip reading packed-refs when searching for
  symbolic refs in the context of a fetch. This provides a 13%
  speedup.
All benchmarks have been performed in www-gitlab-com.
Changelog: performance
|
|
With OptimizeRepository becoming our new RPC which performs all
repository maintenance in one place it's harder to tell what exactly is
going on without proper metrics. One bit of information that gets lost
is whether repacks are incremental or full repacks, which is quite an
important distinction due to the impact on latency.
Expose this information both via logs and via Prometheus to keep better
track of it. While at it, also expose information about whether we are
writing bitmaps or not.
Changelog: added
|
|
Expose the number of stale files we have pruned via a Prometheus metric
so that it's easier to see what kinds of garbage we frequently need to
clean up.
Changelog: added
|
|
We expose a Prometheus counter that is counting how many empty
directories we have pruned. This is a broader concept in our
housekeeping code, where we also prune other kinds of stale files.
Generalize the counter into a counter vector such that we can reuse the
same counter for all the different types of data we prune. While this
breaks the metric in case it was used anywhere, there are no references
to this counter across the complete GitLab group. Furthermore, we
haven't ever guaranteed backwards compatibility for metrics anyway.
Changelog: changed
|
|
While we have a central manager component which is supposed to hold all
state related to housekeeping, we failed to migrate one of our metrics
into it.
Migrate it to get rid of one more global variable.
|
|
Because we may be deleting files which cause reference directories to
become empty, pruning of empty reference directories needs to happen
last in our housekeeping tasks. Because of this we also don't log info
about pruned references in the log message which reports all the other
removals.
Refactor the code to return the number of pruned empty reference dirs
from `removeRefEmptyDirs()` such that we can log them in the calling
function with all the other cleanups.
|
|
The test which verifies that we correctly prune specific files is using
a subfunction to execute subtests, which makes the test hard to extend.
Refactor it by inlining the subtests.
|
|
In 889450266 (ci: Run tests as unprivileged user, 2022-01-13) we have
converted tests to run as an unprivileged user. Back then we forgot to
also adjust the Coverage job though, which is still running as a
privileged user.
Convert the job to also run tests unprivileged. This fixes an upcoming
test failure we're about to introduce where housekeeping tasks remove a
file that they shouldn't be able to remove because of a lack of
permissions.
|
|
When creating repositories we use transactional voting to determine that
the repositories have been created identically on all nodes that are
part of the transaction. This voting happens after we have seeded the
repository,
and the vote is computed by walking through the repository's directory
and hashing all its files. We need to be careful though to skip files
which we know to be indeterministic:
- FETCH_HEAD may contain URLs which are different for each of the
  nodes.
- Object packfiles contained in the object database are not
  deterministic, mostly because Git may use multiple threads to
  compute deltas.
Luckily, we do not have to rely on either type of file in order to
ensure that the user-visible state of the repository is the same, so we
can indeed just skip them.
While we already have the logic to skip these files, this logic didn't
work correctly because we embarrassingly forgot to actually return
`fs.SkipDir` in case we see the object directory. So even though we
thought we skipped these files, in reality we didn't.
This bug has been manifesting in production in the form of CreateFork, which
regularly fails to reach quorum at random on a subset of nodes. The root
cause here is that we use git-clone(1) to seed repository contents of
the fork, which triggers exactly the case of indeterministic packfiles
noted above. So any successful CreateFork RPC call really only succeeded
by pure luck.
Fix this issue by correctly skipping over "object" directories. While at
it, fix how we skip over FETCH_HEAD by returning `nil`: it's a file and
not a directory, so it doesn't make much sense to return `fs.SkipDir`.
Changelog: fixed
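The corrected skipping logic can be sketched with `filepath.WalkDir` as
below. This is a simplified illustration rather than Gitaly's actual voting
code: `voteHash`, `seedRepo` and `votesMatch` are hypothetical helpers, the
choice of SHA-256 is an assumption, and the object database is assumed to
live in the top-level `objects` directory as in a regular Git repository.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// voteHash hashes the relative path and contents of all regular files below
// repoPath, skipping data which is known to differ between nodes.
func voteHash(repoPath string) (string, error) {
	h := sha256.New()
	err := filepath.WalkDir(repoPath, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		rel, err := filepath.Rel(repoPath, path)
		if err != nil {
			return err
		}
		switch {
		case d.IsDir() && rel == "objects":
			// Returning fs.SkipDir for a directory skips everything below it.
			return fs.SkipDir
		case d.IsDir():
			return nil
		case rel == "FETCH_HEAD":
			// FETCH_HEAD is a file, so returning nil is enough to skip it.
			return nil
		case !d.Type().IsRegular():
			return nil
		}
		if _, err := io.WriteString(h, rel+"\x00"); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(h, f)
		return err
	})
	if err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// seedRepo creates a fake repository with node-specific FETCH_HEAD and pack data.
func seedRepo(fetchHead, pack string) (string, error) {
	dir, err := os.MkdirTemp("", "vote-")
	if err != nil {
		return "", err
	}
	files := map[string]string{
		"HEAD":       "ref: refs/heads/main\n",
		"FETCH_HEAD": fetchHead,
		filepath.Join("objects", "pack", "pack-1.pack"): pack,
	}
	for name, content := range files {
		p := filepath.Join(dir, name)
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(p, []byte(content), 0o644); err != nil {
			return "", err
		}
	}
	return dir, nil
}

// votesMatch reports whether two nodes with differing skipped files agree.
func votesMatch() (bool, error) {
	a, err := seedRepo("url-of-node-a", "pack-bytes-a")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(a)
	b, err := seedRepo("url-of-node-b", "pack-bytes-b")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(b)
	hashA, err := voteHash(a)
	if err != nil {
		return false, err
	}
	hashB, err := voteHash(b)
	return hashA == hashB, err
}

func main() {
	same, err := votesMatch()
	fmt.Println("votes match:", same, "error:", err)
}
```

Without the `fs.SkipDir` return, the differing pack contents would end up in
the hash and the two nodes would cast different votes.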
|
|
repository: Add updateHeadFromBundle in CreateRepositoryFromBundle
Closes #4086
See merge request gitlab-org/gitaly!4401
|
|
|
|
[ci skip]
|
|
Extend invalid metadata deletion logic to repos existing on target
Closes #4083
See merge request gitlab-org/gitaly!4396
|
|
With the new improved error handling in UserSquash we're now returning
errors in some cases where we previously didn't. One of those cases is
when the rebase performed during the squash results in a merge conflict.
While it is correct to return an error in this case, we're using an
Internal error code for this case, which indicates that Gitaly is to
blame instead of the parameters which have been passed by the user.
Fix the error code to instead be FailedPrecondition. This error code is
special-cased by our monitoring infrastructure to not raise any alerts.
Note that this change is only fixing issues with monitoring: Rails
handles the error alright by inspecting the error details instead of the
error code.
Changelog: fixed
|
|
[ci skip]
|
|
proto: Add structured error types for UserRebaseConfirmable
See merge request gitlab-org/gitaly!4382
|
|
This commit introduces these changes by creating a new
UserRebaseConfirmableError Protobuf message which contains all potential
structured errors we want to return from the UserSquash RPC.
Changelog: added
|
|
Disable implicit pool creation on link behind a feature flag
See merge request gitlab-org/gitaly!4397
|
|
[ci skip]
|
|
doc: Document supported Git execution environments
See merge request gitlab-org/gitaly!4392
|
|
operations: Implement structured errors for UserSquash
See merge request gitlab-org/gitaly!4374
|
|
Document the different ways to access Git installations supported by
Gitaly. Most importantly, this also documents the way our new bundled
Git binaries work and why they were introduced.
|
|
[ci skip]
|
|
Add squash parameter to git2go merge
See merge request gitlab-org/gitaly!4241
|
|
cgroups: Remove paths field
See merge request gitlab-org/gitaly!4398
|
|
repository: allow CreateRepository to take default_branch
See merge request gitlab-org/gitaly!4385
|
|
The paths field on the CgroupV1Manager was being accessed concurrently,
leading to a panic. However, this field is not actually used by
anything. Fix this issue by removing the field.
Changelog: fixed
|