Age | Commit message (Collapse) | Author |
|
By default, a repository can spawn processes in at most one cgroups. If
the number of repositories is more than the number of cgroups (likely),
multiple repositories share the same one. This model works very well if
the repositories under the management of Cgroup are equivalent in size
or traffic. If a node has some enormous repositories (mono-repo, for
example), the scoping cgroups become excessively large compared to the
rest. This imbalance situation might force the operators to lift the
repository-level cgroups. As a result, the isolation effect is not
effective.
This commit allows a repository to use up to M repository cgroups
instead of one. This feature is designed to balance resource usage
between cgroups, mitigate competition for resources within a single
cgroup, and enhance memory usage efficiency and isolation. The value can
be adjusted based on the specific workload and number of repository
cgroups on the node. A Git process uses its target repository's relative
path as the hash key to find the corresponding cgroup. It is allocated
randomly to any of the consequent AllocationCount cgroups. It wraps
around if needed.
|
|
Currently we block on startup while pruning cgroups from old Gitaly
processes. However, stale cgroups have no impact on Gitaly's other
startup tasks as we namespace them by pid. We can safely make this a
background task so that critical startup tasks can move ahead unimpeded.
On a host with 1000 repo cgroups and teardown is ~1ms per cgroup, this
improves startup time by roughly 15%. On our hosts where cgroup teardown
is closer to 20ms this will have a much larger impact.
Benchmark 1: ./gitaly-5b092369 serve config.toml
Time (mean ± σ): 632.7 ms ± 150.7 ms [User: 473.7 ms, System: 226.7 ms]
Range (min … max): 461.9 ms … 868.1 ms 10 runs
Benchmark 2: ./async-prune serve config.toml
Time (mean ± σ): 549.3 ms ± 127.9 ms [User: 464.6 ms, System: 223.4 ms]
Range (min … max): 427.1 ms … 754.6 ms 10 runs
Summary
./async-prune serve config.toml ran
1.15 ± 0.38 times faster than ./gitaly-5b092369 serve config.toml
|
|
To mitigate the impact of removing `tableflip` we need both startup and
shutdown of Gitaly processes to be as fast as possible. With 105f6dd816
(cgroups: Create repository cgroups on-demand, 2023-10-26) we improved
startup times by creating cgroups individually when needed, rather than
as a blocking task during startup.
We currently block shutdown while removing our existing
cgroups. This is as slow as cgroup creation, taking up to 20 seconds to
remove 1000 cgroups on hosts with large number cgroups.
`Cleanup` is equivalent to an eager `PruneOldCgroups` scoped to the
current processes's cgroups. However, there is no urgent need to remove
cgroups immediately, so long as we ensure they don't build up
excessively. We can rely on the eventual cleanup from `PruneOldCgroups`
run by the next Gitaly process, which allows us to avoid delaying
shutdown.
Stop using `Cleanup` and remove it from the `cgroup.Manager` interface.
|
|
In https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6508, we
implemented a fix to ignore highly evictable Page Caches from cgroup
memory threshold. The used metric was parent cgroup's `inactive_file`.
Unfortunately, that metric reflects the inactive Page Caches of direct
processes inside the parent cgroup. It doesn't account for the indirect
processes in children repository cgroups. We don't spawn any process in
the parent cgroup. So, the fix was useless.
In Cgroup V1, `memory.stat` has `total_inactive_file`. It's exactly what
we are looking for. In Cgroup V2, `inactive_file` includes all of its
substree consumptions. So, we can keep using that field.
|
|
Now that all version-specific tests have been converted to unified tests
in `handler_linux_test.go`, move the remaining helper functions into
that file as well. Remove the version-specific test files as they are
now empty.
|
|
Combine the v1 and v2 versions of `TestStats()` to a single test.
Here we have the opposite of `TestPruneOldCgroups()`, the test cases
are unique to v1 or v2, but the assertions are version agnostic. Add a
`version` field to the test cases, rather than iterating over each case
with each version as we did with the other tests.
|
|
Combine the v1 and v2 copies of `TestPruneOldCgroups` into a single
test. We are unable to deduplicate most of the actual assertions as they
rely heavily on the subcomponent paths. This is still a worthwhile
change as the test cases can still be shared.
|
|
Convert the separate v1 and v2 `TestMetrics` versions to a single test.
|
|
Convert the v1 and v2 versions of TestCleanup to a single unified test.
|
|
Convert the v1 and v2 versions of `TestAddCommand` to a single shared
test.
|
|
Convert the v1 and v2 copies of `TestSetup_RepoCgroups` to a single test
that covers both. We drop the `Setup` from the test name as repository
cgroups are no longer created during setup.
|
|
Coalesce `TestSetup_ParentCgroups()` and `TestSetup_ParentCgroupsV2()`
into a single test that handles both v1 and v2 cgroups.
Note that the `requireCgroupComponents()` helper introduced slightly
changes the conditions checked versus the v1 test. The v2 manager
requires that `setupMockCgroupFiles()` be executed before running
`Setup()`. The v1 test used `requireCgroup()`, which would verify that a
file was _not_ present, which is no longer be the case now that we have
setup the cgroup files. We switch to `requireCgroupWithInt()` to avoid
this failure.
|
|
Rename `TestNewManager` to `TestNewManagerNoop` so that we can use the
more generic name for our combined test. Unify the separate manager
tests for v1 and v2 cgroups into a single `TestNewManager` test.
|
|
Introduce a `mockCgroup` interface to allow tests to create a v1 or v2
mock depending on a test parameter, rather than duplicating the entire
test.
This requires accessing the `root` path via the new `rootPath()` method,
and a convenience method for calculating the mock's repository cgroup
paths is also added.
|
|
We will shortly be creating an interface to cover the mock cgroups for
v1 and v2. Rename `mockCgroup` to `mockCgroupV1` to clear the way.
|
|
On systems with very large numbers of existing cgroups, initializing
Gitaly's repository cgroups may take upwards of 20 seconds. With
`tableflip` this delay has no impact as the existing process is still
present to take traffic until the new one finishes bootstrapping.
However, we need to remove `tableflip` as the WAL does not allow
concurrent Gitaly processes accessing the same repository.
Create a given repository cgroup on-demand the first time a
process needs to be added to it.
Changelog: performance
|
|
Update the `CloneIntoCgroup` method to check if the target cgroup is
present and create it if not.
This also updates the method to be clear that the file handle we open is
for the target cgroup directory, not a file. From clone(2):
This file descriptor can be obtained by opening a cgroup v2
directory using either the O_RDONLY or the O_PATH flag.
|
|
We will shortly start creating cgroups on-demand, rather than up-front
as part of Gitaly's start up process.
Hoist the loop over repository cgroups into the manager's `Collect`
method, passing the path into the version-specific handlers. This allows
us to avoid leaking the `cgroupLock` abstraction outside of the manager.
|
|
We will shortly start creating repository cgroups on-demand, rather than
at startup. This will require synchronization as we may have concurrent
requests hitting the same cgroup. Racing requests may cause spurious
failures and waste system resources.
Add the `setupRepoCgroup` and `setupRepoCgroupAndAdd` methods to create
individual repo cgroups and take a `cgroupLock`. These provides
exclusive access when creating the cgroup. Also add a map which is
pre-populated with initialized `cgroupLock`s for every available value.
|
|
Currently we generate the `reposResources` value via the
`configRepositoryResources()` function on the `CGroupManager`. This
works well when we create all of the repository cgroups in one
batch, but we will shortly move to creating them on-demand.
To avoid regenerating this struct for each of the N repository cgroups
defined, create it once during construction of the `CgroupManger` and
store it in the struct.
|
|
This commit adds some more essential fields collected from `memory.stat`
file. There are two differences between Cgroup V1 and V2:
- `rss` in V1 is named `anon` in V2.
- `cache` in V1 is named `file` in V2.
The names in V2 reflect the meanings of underlying metrics better. Thus,
the reported stats use V2's names.
|
|
We duplicate the code to load and format errors for cgroups several
times in the handlers. Deduplicate these into a `loadCgroup` method.
|
|
We reuse the same command many times in the cgroup tests.
Move the command arguments to a variable to make it clear we're running
the same thing every time.
|
|
We repeatedly calculate the cgroup ID for command in our tests.
Deduplicate these instances into a helper method.
|
|
Update the name of the first argument to `setupParent` in the interface
definition to match the implementations.
|
|
This commit exposes a method from the cgroup manager to configure a
command directly into a cgroup. This is done by configuring the cgroup
in the SysProcAttr so the command gets spawned directly in the cgroup.
The functionality is only available with cgroups version 2 and kernel
version 5.7 or later.
|
|
|
|
Remove the `Warnf()` function from the `Logger` interface. Please refer
to the preceding commit that removes `Debugf()` for the motivation
behind this change.
|
|
Stop using logrus' `NullLogger` in favor of our own internal logger that
can be set up with the testhelper package.
|
|
Introduce a wrapper for the logrus logging infrastructure. This is a
first step towards abstracting away the actual implementation of our
logs behind a type that we can iterate on and will thus pave the way
towards adoption of the `slog` package.
Note that this is only one step of this transition. The interface does
not yet roundtrip to itself, e.g. calling `log.Logger.WithError()` will
result in a `logrus.Entry`, not in a `log.Logger`. This will be handled
at a later point.
|
|
This commit wires the tests to call SharedLogger instead of NewLogger
so they'll use the test case's shared logger. This way the log output
is printed to the same logger and is ordered.
|
|
Gitaly now has a helper for recording log output and outputting it
only in tests. Replace the usage of the discarding logger with the
recording logger. These changes are search and replace so these call
sites didn't even need a log entry to begin with but rather use a
FieldLogger.
NewDiscardingLogEntry is removed as it is no longer used.
|
|
Our style guide states that the `TestMain` function should be placed in
a file named `testhelper_test.go` for consistency's sake. A number of
packages were placing the test in a standard test file.
In cases where `TestMain` was the only function in that file rename it
to `testhelper_test.go`. If there were other functions, relocate
`TestMain` into a new `testhelper_test.go` file.
|
|
We're using the global logging instance provided by `log.Default()`,
which is about to go away soon. Refactor the code to instead pass in a
proprely configured logging instance.
|
|
The cgroups managers use the global logrus logger to log data, which
is discouraged. Convert the code to instead inject a logger so that we
can use a properly configured one.
|
|
limiter: Implement Cgroup resource watchers
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6129
Merged-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: Justin Tobler <jtobler@gitlab.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: John Cai <jcai@gitlab.com>
Reviewed-by: Justin Tobler <jtobler@gitlab.com>
Co-authored-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
|
|
cgroup: using a noop manager on linux without cgroup
Closes #4511
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6150
Merged-by: karthik nayak <knayak@gitlab.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: karthik nayak <knayak@gitlab.com>
Co-authored-by: ZheNing Hu <adlternative@gmail.com>
|
|
This commit implements Stats() function to cgroup managers. This
function collects the usage statistics of the cgroups managed by the
cgroup manager. This information is fetched from cgroupfs files,
especially `cpu.stat`, `memory.events`, and some `memory.*` files.
|
|
This commit adds the Ready() function to the cgroup manager. This
function is to check if the manager is configured properly and ready to
use.
|
|
Cgroup testing suite has an utility to mock underlying cgroupfs files.
This mocking utility hard-codes the values or accepts a very limited set
of configs. This commits adds a map that allows this utility to mock
custom values of files.
|
|
Due to the lack of consideration for non-cgroup scenarios
on Linux systems, Gitaly exits with the error message
"unknown cgroup version", which hinders the usage for
these Linux users. Therefore, using noop implementation
of the cgroup manager for Linux systems without cgroup
is necessary to solve this issue.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
|
|
In https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5547, we
added the support for CgroupV2. It also introduced some linting
offenses on MacOS. They are mostly about unused functions or types. On
Linux, everything is fine. But on MacOS, files with specific build tags
are not included.
|
|
As the metrics detected by Prometheus in cgroups V1 and V2
are basically the same but slightly different, the initialization
of all V1/V2 metrics is put into metrics.go. This makes the
code cleaner and easier to observe the differences between
the detection meterics of V1 and V2.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
|
|
Due to the lack of cgroup V2 support in Gitaly, some operating systems
that use cgroup V2 may not be able to use cgroups in Gitaly properly.
Therefore, this patch adds support for cgroup V2 in Gitaly, which maintains
a similar interface to cgroup V1 but removes the
"gitaly_cgroup_memory_reclaim_attempts_total" metric and modifies the label
of the "gitaly_cgroup_procs_total" metric compared to V1.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
|
|
We're preparing to add V2 implementation for cgroup. Since
cgroup V1 and V2 have similar implementation logic, we've
extracted the common logic into the cgroupHandler interface,
and implemented CGroupManager that satisfies the original
Manager interface, which internally uses the cgroupHandler
interface. Currently, only V1 implementation is available.
To be compatible with non-Linux environments, we've added
a Noop CgroupManger implementation in manager.go by specifying
a build tag for non-Linux operating systems. We've also
ensured that the existing multiple test files can only run
on Linux by specifying the linux operating system in the build tags.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
|
|
Update the cgroups dependency to major version 3. This change includes a
bunch of renames where upstream moved the cgroups implementations into
different packages. Adapt our code accordingly.
|
|
We're about to release Gitaly v16.0. As we've landed a bunch of
previously announced removals it's thus time to bump our Go module
version from v15 to v16.
|
|
What
---
- Add a new configuration under `cgroups` called `cpu_quota_us` to
configure `cfs_quota_us` for the parent cgroup
https://docs.kernel.org/scheduler/sched-bwc.html?highlight=cfs_quota_us
- Add a new configuration under `cgroups.repositories` called
`cpu_quota_us` to configure `cfs_quota_us` for the repository cgroup
https://docs.kernel.org/scheduler/sched-bwc.html?highlight=cfs_quota_us
- Add metrics
- `gitaly_cgroup_cpu_cfs_periods_total`: Read from `cpu.stat` nr_periods https://docs.kernel.org/scheduler/sched-bwc.html#statistics
- `gitaly_cgroup_cpu_cfs_throttled_periods_total`: Read from `cpu.stat` nr_throttled https://docs.kernel.org/scheduler/sched-bwc.html#statistics
- `gitaly_cgroup_cpu_cfs_throttled_seconds_total`: Read from `cpu.stat` throttled_time https://docs.kernel.org/scheduler/sched-bwc.html#statistics
- Add more test coverage when only specific values are set.
Why
---
At the moment we limit memory and CPU via
[`cpu.shares`](https://kernel.googlesource.com/pub/scm/linux/kernel/git/glommer/memcg/+/cpu_stat/Documentation/cgroups/cpu.txt)
which will only throttle a cgroup when there is contention on the CPU.
This means that potentially a single repository can still hog all of the
CPU on a gitaly node. We've seen a case of this in
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8318, a
single repository saturated the CPU, and the scheduler couldn't balance
the CPU for other tasks/requests to be scheduled.
We hoped CPU shares would be enough, but we need an upper CPU quota for
gitaly cgroups so no single repository can fully saturate the CPU.
There are a few concerns that are addressed
Concern 1: cfs_period_us
`cfs_period_us` is used to calculate the `cfs_quota_us` (what we are
setting now), the default value seems to be
[hardcoded](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/sched/fair.c?h=v5.15.92#n5492)
but the Linux kernel but this can be updated, so Gitaly is explicitly
settings this to 100ms (default value)
Concern 2: not using `cfs_burst_us`
This could allow for CPU bursts, even when they exceed the
`cfs_quota_us`, we don't set this because it's available on the newer kernel
versions (5.15). The way users can avoid throttling is by
oversubscribing `cfs_quota_us`
Concern 3: Wasting available resources
When the user sets these we'll be artificially limiting the CPU that they
consume, this can leave performance on the table when a repository is
using all its quota, and no other process is using the CPU. This is the
only drawback and one we are willing to take since it adds more
reliability in the long run. We can reduce the effect of this by oversubscribing.
Concern 4: Observability
The kernel already exports
[stats](https://docs.kernel.org/scheduler/sched-bwc.html#statistics)
which Gitaly exposes as, and also
[cadvisor](https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics)
Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17332
Changelog: added
Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>
|
|
Bulk update all file and executable permissions to use the new perm
package.
Strictly speaking changing os.ModePerm to perm.PublicFile is a
permission change (it goes from executable to normal), but since
ModePerm is only used in tests this should be safe.
|
|
There's a bunch of generic packages that really don't depend much on the
repository object format. Convert them to test with the SHA256 object
format.
|