Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-11-22cgroup: Allow a repository to use up to M repository cgroupsQuang-Minh Nguyen
By default, a repository can spawn processes in at most one cgroups. If the number of repositories is more than the number of cgroups (likely), multiple repositories share the same one. This model works very well if the repositories under the management of Cgroup are equivalent in size or traffic. If a node has some enormous repositories (mono-repo, for example), the scoping cgroups become excessively large compared to the rest. This imbalance situation might force the operators to lift the repository-level cgroups. As a result, the isolation effect is not effective. This commit allows a repository to use up to M repository cgroups instead of one. This feature is designed to balance resource usage between cgroups, mitigate competition for resources within a single cgroup, and enhance memory usage efficiency and isolation. The value can be adjusted based on the specific workload and number of repository cgroups on the node. A Git process uses its target repository's relative path as the hash key to find the corresponding cgroup. It is allocated randomly to any of the consequent AllocationCount cgroups. It wraps around if needed.
2023-11-21gitaly: Prune old cgroups in backgroundWill Chandler
Currently we block on startup while pruning cgroups from old Gitaly processes. However, stale cgroups have no impact on Gitaly's other startup tasks as we namespace them by pid. We can safely make this a background task so that critical startup tasks can move ahead unimpeded. On a host with 1000 repo cgroups and teardown is ~1ms per cgroup, this improves startup time by roughly 15%. On our hosts where cgroup teardown is closer to 20ms this will have a much larger impact. Benchmark 1: ./gitaly-5b092369 serve config.toml Time (mean ± σ): 632.7 ms ± 150.7 ms [User: 473.7 ms, System: 226.7 ms] Range (min … max): 461.9 ms … 868.1 ms 10 runs Benchmark 2: ./async-prune serve config.toml Time (mean ± σ): 549.3 ms ± 127.9 ms [User: 464.6 ms, System: 223.4 ms] Range (min … max): 427.1 ms … 754.6 ms 10 runs Summary ./async-prune serve config.toml ran 1.15 ± 0.38 times faster than ./gitaly-5b092369 serve config.toml
2023-11-21gitaly: Don't cleanup cgroups on exitWill Chandler
To mitigate the impact of removing `tableflip` we need both startup and shutdown of Gitaly processes to be as fast as possible. With 105f6dd816 (cgroups: Create repository cgroups on-demand, 2023-10-26) we improved startup times by creating cgroups individually when needed, rather than as a blocking task during startup. We currently block shutdown while removing our existing cgroups. This is as slow as cgroup creation, taking up to 20 seconds to remove 1000 cgroups on hosts with large number cgroups. `Cleanup` is equivalent to an eager `PruneOldCgroups` scoped to the current processes's cgroups. However, there is no urgent need to remove cgroups immediately, so long as we ensure they don't build up excessively. We can rely on the eventual cleanup from `PruneOldCgroups` run by the next Gitaly process, which allows us to avoid delaying shutdown. Stop using `Cleanup` and remove it from the `cgroup.Manager` interface.
2023-11-17limiter: Ignore `total_inactive_file` instead of `inactive_file`Quang-Minh Nguyen
In https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6508, we implemented a fix to ignore highly evictable Page Caches from cgroup memory threshold. The used metric was parent cgroup's `inactive_file`. Unfortunately, that metric reflects the inactive Page Caches of direct processes inside the parent cgroup. It doesn't account for the indirect processes in children repository cgroups. We don't spawn any process in the parent cgroup. So, the fix was useless. In Cgroup V1, `memory.stat` has `total_inactive_file`. It's exactly what we are looking for. In Cgroup V2, `inactive_file` includes all of its substree consumptions. So, we can keep using that field.
2023-11-13cgroups: Move remaining test functions to handlerWill Chandler
Now that all version-specific tests have been converted to unified tests in `handler_linux_test.go`, move the remaining helper functions into that file as well. Remove the version-specific test files as they are now empty.
2023-11-13cgroups: Unify TestStatsWill Chandler
Combine the v1 and v2 versions of `TestStats()` to a single test. Here we have the opposite of `TestPruneOldCgroups()`, the test cases are unique to v1 or v2, but the assertions are version agnostic. Add a `version` field to the test cases, rather than iterating over each case with each version as we did with the other tests.
2023-11-13cgroups: Unify TestPruneOldCgroupsWill Chandler
Combine the v1 and v2 copies of `TestPruneOldCgroups` into a single test. We are unable to deduplicate most of the actual assertions as they rely heavily on the subcomponent paths. This is still a worthwhile change as the test cases can still be shared.
2023-11-13cgroups: Unify TestMetricsWill Chandler
Convert the separate v1 and v2 `TestMetrics` versions to a single test.
2023-11-13cgroups: Unify TestCleanupWill Chandler
Convert the v1 and v2 versions of TestCleanup to a single unified test.
2023-11-13cgroups: Unify TestAddCommandWill Chandler
Convert the v1 and v2 versions of `TestAddCommand` to a single shared test.
2023-11-13cgroups: Unify TestSetup_RepoCgroupsWill Chandler
Convert the v1 and v2 copies of `TestSetup_RepoCgroups` to a single test that covers both. We drop the `Setup` from the test name as repository cgroups are no longer created during setup.
2023-11-13cgroups: Unify TestSetup_ParentCgroupsWill Chandler
Coalesce `TestSetup_ParentCgroups()` and `TestSetup_ParentCgroupsV2()` into a single test that handles both v1 and v2 cgroups. Note that the `requireCgroupComponents()` helper introduced slightly changes the conditions checked versus the v1 test. The v2 manager requires that `setupMockCgroupFiles()` be executed before running `Setup()`. The v1 test used `requireCgroup()`, which would verify that a file was _not_ present, which is no longer be the case now that we have setup the cgroup files. We switch to `requireCgroupWithInt()` to avoid this failure.
2023-11-13cgroups: Unify tests for TestNewManagerWill Chandler
Rename `TestNewManager` to `TestNewManagerNoop` so that we can use the more generic name for our combined test. Unify the separate manager tests for v1 and v2 cgroups into a single `TestNewManager` test.
2023-11-13cgroups: Introduce mockCgroup interfaceWill Chandler
Introduce a `mockCgroup` interface to allow tests to create a v1 or v2 mock depending on a test parameter, rather than duplicating the entire test. This requires accessing the `root` path via the new `rootPath()` method, and a convenience method for calculating the mock's repository cgroup paths is also added.
2023-11-13cgroups: Rename mockCgroup to mockCgroupV1Will Chandler
We will shortly be creating an interface to cover the mock cgroups for v1 and v2. Rename `mockCgroup` to `mockCgroupV1` to clear the way.
2023-11-08cgroups: Create repository cgroups on-demandWill Chandler
On systems with very large numbers of existing cgroups, initializing Gitaly's repository cgroups may take upwards of 20 seconds. With `tableflip` this delay has no impact as the existing process is still present to take traffic until the new one finishes bootstrapping. However, we need to remove `tableflip` as the WAL does not allow concurrent Gitaly processes accessing the same repository. Create a given repository cgroup on-demand the first time a process needs to be added to it. Changelog: performance
2023-11-08cgroups: Check if cgroup exists in CloneIntoCgroupWill Chandler
Update the `CloneIntoCgroup` method to check if the target cgroup is present and create it if not. This also updates the method to be clear that the file handle we open is for the target cgroup directory, not a file. From clone(2): This file descriptor can be obtained by opening a cgroup v2 directory using either the O_RDONLY or the O_PATH flag.
2023-11-08cgroups: Update collect to check if cgroup existsWill Chandler
We will shortly start creating cgroups on-demand, rather than up-front as part of Gitaly's start up process. Hoist the loop over repository cgroups into the manager's `Collect` method, passing the path into the version-specific handlers. This allows us to avoid leaking the `cgroupLock` abstraction outside of the manager.
2023-11-08cgroups: Add synchronized cgroup creation methodsWill Chandler
We will shortly start creating repository cgroups on-demand, rather than at startup. This will require synchronization as we may have concurrent requests hitting the same cgroup. Racing requests may cause spurious failures and waste system resources. Add the `setupRepoCgroup` and `setupRepoCgroupAndAdd` methods to create individual repo cgroups and take a `cgroupLock`. These provides exclusive access when creating the cgroup. Also add a map which is pre-populated with initialized `cgroupLock`s for every available value.
2023-11-07cgroups: Store repo resources in CGroupManagerWill Chandler
Currently we generate the `reposResources` value via the `configRepositoryResources()` function on the `CGroupManager`. This works well when we create all of the repository cgroups in one batch, but we will shortly move to creating them on-demand. To avoid regenerating this struct for each of the N repository cgroups defined, create it once during construction of the `CgroupManger` and store it in the struct.
2023-11-03cgroup: Collect some more stats from `memory.stat` fileQuang-Minh Nguyen
This commit adds some more essential fields collected from `memory.stat` file. There are two differences between Cgroup V1 and V2: - `rss` in V1 is named `anon` in V2. - `cache` in V1 is named `file` in V2. The names in V2 reflect the meanings of underlying metrics better. Thus, the reported stats use V2's names.
2023-10-27cgroups: Add loadCgroup method to handlersWill Chandler
We duplicate the code to load and format errors for cgroups several times in the handlers. Deduplicate these into a `loadCgroup` method.
2023-10-27cgroups: Move repeated command to package variableWill Chandler
We reuse the same command many times in the cgroup tests. Move the command arguments to a variable to make it clear we're running the same thing every time.
2023-10-27cgroups: Add helper to calculate cgroup IDWill Chandler
We repeatedly calculate the cgroup ID for command in our tests. Deduplicate these instances into a helper method.
2023-10-27cgroups: Fix resource name in interface definitionWill Chandler
Update the name of the first argument to `setupParent` in the interface definition to match the implementations.
2023-10-23Expose a method to configure a command directly into a cgroupSami Hiltunen
This commit exposes a method from the cgroup manager to configure a command directly into a cgroup. This is done by configuring the cgroup in the SysProcAttr so the command gets spawned directly in the cgroup. The functionality is only available with cgroups version 2 and kernel version 5.7 or later.
2023-10-04Fix linting errors detected locallyEvan Read
2023-09-26log: Drop `Warnf()` function from `Logger` interfacePatrick Steinhardt
Remove the `Warnf()` function from the `Logger` interface. Please refer to the preceding commit that removes `Debugf()` for the motivation behind this change.
2023-09-18tests: Stop using `NullLogger` in favor of our own internal loggerPatrick Steinhardt
Stop using logrus' `NullLogger` in favor of our own internal logger that can be set up with the testhelper package.
2023-09-15log: Wrap the logrus loggerPatrick Steinhardt
Introduce a wrapper for the logrus logging infrastructure. This is a first step towards abstracting away the actual implementation of our logs behind a type that we can iterate on and will thus pave the way towards adoption of the `slog` package. Note that this is only one step of this transition. The interface does not yet roundtrip to itself, e.g. calling `log.Logger.WithError()` will result in a `logrus.Entry`, not in a `log.Logger`. This will be handled at a later point.
2023-08-31Replace NewLogger calls with SharedLogger calls in testsSami Hiltunen
This commit wires the tests to call SharedLogger instead of NewLogger so they'll use the test case's shared logger. This way the log output is printed to the same logger and is ordered.
2023-08-31Replace use of NewDiscardingLogEntry with the recording loggerSami Hiltunen
Gitaly now has a helper for recording log output and outputting it only in tests. Replace the usage of the discarding logger with the recording logger. These changes are search and replace so these call sites didn't even need a log entry to begin with but rather use a FieldLogger. NewDiscardingLogEntry is removed as it is no longer used.
2023-08-30Move TestMain into testhelper_test.goWill Chandler
Our style guide states that the `TestMain` function should be placed in a file named `testhelper_test.go` for consistency's sake. A number of packages were placing the test in a standard test file. In cases where `TestMain` was the only function in that file rename it to `testhelper_test.go`. If there were other functions, relocate `TestMain` into a new `testhelper_test.go` file.
2023-08-21cgroups: Use injected logging instancePatrick Steinhardt
We're using the global logging instance provided by `log.Default()`, which is about to go away soon. Refactor the code to instead pass in a proprely configured logging instance.
2023-08-16cgroups: Inject properly configured loggerPatrick Steinhardt
The cgroups managers use the global logrus logger to log data, which is discouraged. Convert the code to instead inject a logger so that we can use a properly configured one.
2023-07-28Merge branch 'qmnguyen0711/implement-cgroup-resource-watcher' into 'master'Justin Tobler
limiter: Implement Cgroup resource watchers See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6129 Merged-by: Justin Tobler <jtobler@gitlab.com> Approved-by: Justin Tobler <jtobler@gitlab.com> Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com> Reviewed-by: John Cai <jcai@gitlab.com> Reviewed-by: Justin Tobler <jtobler@gitlab.com> Co-authored-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
2023-07-28Merge branch 'adlternative/zh-no-cgroup-linux-fix' into 'master'karthik nayak
cgroup: using a noop manager on linux without cgroup Closes #4511 See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6150 Merged-by: karthik nayak <knayak@gitlab.com> Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com> Approved-by: karthik nayak <knayak@gitlab.com> Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com> Reviewed-by: karthik nayak <knayak@gitlab.com> Co-authored-by: ZheNing Hu <adlternative@gmail.com>
2023-07-28cgroup: Implement Stats() function to collect generic cgroup metricsQuang-Minh Nguyen
This commit implements Stats() function to cgroup managers. This function collects the usage statistics of the cgroups managed by the cgroup manager. This information is fetched from cgroupfs files, especially `cpu.stat`, `memory.events`, and some `memory.*` files.
2023-07-28cgroup: Add Ready() to cgroup managerQuang-Minh Nguyen
This commit adds the Ready() function to the cgroup manager. This function is to check if the manager is configured properly and ready to use.
2023-07-28cgroup: Enhance cgroup mock setupQuang-Minh Nguyen
Cgroup testing suite has an utility to mock underlying cgroupfs files. This mocking utility hard-codes the values or accepts a very limited set of configs. This commits adds a map that allows this utility to mock custom values of files.
2023-07-28cgroup: using a noop manager on linux without cgroupZheNing Hu
Due to the lack of consideration for non-cgroup scenarios on Linux systems, Gitaly exits with the error message "unknown cgroup version", which hinders the usage for these Linux users. Therefore, using noop implementation of the cgroup manager for Linux systems without cgroup is necessary to solve this issue. Signed-off-by: ZheNing Hu <adlternative@gmail.com>
2023-07-14Fix cgroup's MacOS-specific linter failuresQuang-Minh Nguyen
In https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5547, we added the support for CgroupV2. It also introduced some linting offenses on MacOS. They are mostly about unused functions or types. On Linux, everything is fine. But on MacOS, files with specific build tags are not included.
2023-07-10cgroup: move all metrics initialization to meterics.goqmnguyen0711/adlternative-zh-cgroups-v2ZheNing Hu
As the metrics detected by Prometheus in cgroups V1 and V2 are basically the same but slightly different, the initialization of all V1/V2 metrics is put into metrics.go. This makes the code cleaner and easier to observe the differences between the detection meterics of V1 and V2. Signed-off-by: ZheNing Hu <adlternative@gmail.com>
2023-07-10cgroup: add support for cgroup v2ZheNing Hu
Due to the lack of cgroup V2 support in Gitaly, some operating systems that use cgroup V2 may not be able to use cgroups in Gitaly properly. Therefore, this patch adds support for cgroup V2 in Gitaly, which maintains a similar interface to cgroup V1 but removes the "gitaly_cgroup_memory_reclaim_attempts_total" metric and modifies the label of the "gitaly_cgroup_procs_total" metric compared to V1. Signed-off-by: ZheNing Hu <adlternative@gmail.com>
2023-07-09cgroup: refactor cgroup interfacesZheNing Hu
We're preparing to add V2 implementation for cgroup. Since cgroup V1 and V2 have similar implementation logic, we've extracted the common logic into the cgroupHandler interface, and implemented CGroupManager that satisfies the original Manager interface, which internally uses the cgroupHandler interface. Currently, only V1 implementation is available. To be compatible with non-Linux environments, we've added a Noop CgroupManger implementation in manager.go by specifying a build tag for non-Linux operating systems. We've also ensured that the existing multiple test files can only run on Linux by specifying the linux operating system in the build tags. Signed-off-by: ZheNing Hu <adlternative@gmail.com>
2023-05-22go.mod: Update cgroups dependency to major version 3Patrick Steinhardt
Update the cgroups dependency to major version 3. This change includes a bunch of renames where upstream moved the cgroups implementations into different packages. Adapt our code accordingly.
2023-05-10go: Bump module version from v15 to v16Patrick Steinhardt
We're about to release Gitaly v16.0. As we've landed a bunch of previously announced removals it's thus time to bump our Go module version from v15 to v16.
2023-03-13Cgroups: add cpu_quota_us limitSteve Azzopardi
What --- - Add a new configuration under `cgroups` called `cpu_quota_us` to configure `cfs_quota_us` for the parent cgroup https://docs.kernel.org/scheduler/sched-bwc.html?highlight=cfs_quota_us - Add a new configuration under `cgroups.repositories` called `cpu_quota_us` to configure `cfs_quota_us` for the repository cgroup https://docs.kernel.org/scheduler/sched-bwc.html?highlight=cfs_quota_us - Add metrics - `gitaly_cgroup_cpu_cfs_periods_total`: Read from `cpu.stat` nr_periods https://docs.kernel.org/scheduler/sched-bwc.html#statistics - `gitaly_cgroup_cpu_cfs_throttled_periods_total`: Read from `cpu.stat` nr_throttled https://docs.kernel.org/scheduler/sched-bwc.html#statistics - `gitaly_cgroup_cpu_cfs_throttled_seconds_total`: Read from `cpu.stat` throttled_time https://docs.kernel.org/scheduler/sched-bwc.html#statistics - Add more test coverage when only specific values are set. Why --- At the moment we limit memory and CPU via [`cpu.shares`](https://kernel.googlesource.com/pub/scm/linux/kernel/git/glommer/memcg/+/cpu_stat/Documentation/cgroups/cpu.txt) which will only throttle a cgroup when there is contention on the CPU. This means that potentially a single repository can still hog all of the CPU on a gitaly node. We've seen a case of this in https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8318, a single repository saturated the CPU, and the scheduler couldn't balance the CPU for other tasks/requests to be scheduled. We hoped CPU shares would be enough, but we need an upper CPU quota for gitaly cgroups so no single repository can fully saturate the CPU. There are a few concerns that are addressed Concern 1: cfs_period_us `cfs_period_us` is used to calculate the `cfs_quota_us` (what we are setting now), the default value seems to be [hardcoded](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/sched/fair.c?h=v5.15.92#n5492) but the Linux kernel but this can be updated, so Gitaly is explicitly settings this to 100ms (default value) Concern 2: not using `cfs_burst_us` This could allow for CPU bursts, even when they exceed the `cfs_quota_us`, we don't set this because it's available on the newer kernel versions (5.15). The way users can avoid throttling is by oversubscribing `cfs_quota_us` Concern 3: Wasting available resources When the user sets these we'll be artificially limiting the CPU that they consume, this can leave performance on the table when a repository is using all its quota, and no other process is using the CPU. This is the only drawback and one we are willing to take since it adds more reliability in the long run. We can reduce the effect of this by oversubscribing. Concern 4: Observability The kernel already exports [stats](https://docs.kernel.org/scheduler/sched-bwc.html#statistics) which Gitaly exposes as, and also [cadvisor](https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics) Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17332 Changelog: added Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>
2023-02-08Extract file and executable permissionsJames Fargher
Bulk update all file and executable permissions to use the new perm package. Strictly speaking changing os.ModePerm to perm.PublicFile is a permission change (it goes from executable to normal), but since ModePerm is only used in tests this should be safe.
2023-02-07global: Convert generic packages to test with SHA256 object formatPatrick Steinhardt
There's a bunch of generic packages that really don't depend much on the repository object format. Convert them to test with the SHA256 object format.