Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/administration/housekeeping.md')
-rw-r--r--doc/administration/housekeeping.md59
1 files changed, 59 insertions, 0 deletions
diff --git a/doc/administration/housekeeping.md b/doc/administration/housekeeping.md
index 0209f97bd31..c9b5784fa68 100644
--- a/doc/administration/housekeeping.md
+++ b/doc/administration/housekeeping.md
@@ -20,6 +20,65 @@ Do not manually execute Git commands to perform housekeeping in Git
repositories that are controlled by GitLab. Doing so may lead to corrupt
repositories and data loss.
+## Housekeeping strategy
+
+Gitaly can perform housekeeping tasks in a Git repository in two ways:
+
+- [Eager housekeeping](#eager-housekeeping) executes specific housekeeping tasks
+ independent of the state a repository is in.
+- [Heuristical housekeeping](#heuristical-housekeeping) executes housekeeping
+ tasks based on a set of heuristics that determine what housekeeping tasks need
+ to be executed based on the repository state.
+
+### Eager housekeeping
+
+The "eager" housekeeping strategy executes housekeeping tasks in a repository
+independent of the repository state. This is the default strategy as used by the
+[manual trigger](#manual-trigger) and the [push-based trigger](#push-based-trigger).
+
+The eager housekeeping strategy is controlled by the GitLab application.
+Depending on the trigger that caused the housekeeping job to run, GitLab asks
+Gitaly to perform specific housekeeping tasks. Gitaly performs these tasks even
+if the repository is in an optimized state. As a result, this strategy can be
+inefficient in large repositories where performing the housekeeping tasks may
+be slow.
+
+### Heuristical housekeeping
+
+> - [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/2634) in GitLab 14.9 for the [manual trigger](#manual-trigger) and the [push-based trigger](#push-based-trigger) [with a flag](feature_flags.md) named `optimized_housekeeping`. Disabled by default.
+> - [Enabled on GitLab.com](https://gitlab.com/gitlab-org/gitlab/-/issues/353607) in GitLab 14.10.
+
+FLAG:
+On self-managed GitLab, by default this feature is not available for the [manual trigger](#manual-trigger) and the [push-based trigger](#push-based-trigger).
+To make it available, ask an administrator to [enable the feature flag](feature_flags.md) named `optimized_housekeeping`.
+
+The heuristical (or "opportunistic") housekeeping strategy analyzes the
+repository's state and executes housekeeping tasks only when it finds one or
+more data structures are insufficiently optimized. This is the strategy used by
+[scheduled housekeeping](#scheduled-housekeeping). It can optionally be enabled
+for the [manual trigger](#manual-trigger) and the [push-based trigger](#push-based-trigger)
+by enabling the `optimized_housekeeping` feature flag.
+
+Heuristical housekeeping uses the following information to decide on the tasks
+it needs to run:
+
+- The number of loose and stale objects.
+- The number of packfiles that contain already-compressed objects.
+- The number of loose references.
+- The presence of a commit-graph.
+
+The decision whether any of the analyzed data structures need to be optimized is
+based on the size of the repository:
+
+- Objects are repacked frequently the bigger the total size of all objects.
+- References are repacked less frequently the more references there are in
+ total.
+
+Gitaly does this to offset the fact that optimizing those data structures takes
+more time the bigger they get. It is especially important in large
+monorepositories (which receive a lot of traffic) to avoid optimizing them too
+frequently.
+
## Running housekeeping tasks
There are different ways in which GitLab runs housekeeping tasks: