author | GitLab Bot <gitlab-bot@gitlab.com> | 2023-10-27 03:12:17 +0300 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2023-10-27 03:12:17 +0300 |
commit | dfc8a99695e16feffcc811a536e58e2c9be75ce2 (patch) | |
tree | 34307394a3bde02dc1e8a18f94c5958a24cc91d5 /doc/administration | |
parent | 277496b843d3c14cfd48286b1718b03775d83bbc (diff) |
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/administration')
-rw-r--r-- | doc/administration/gitaly/configure_gitaly.md | 2 |
-rw-r--r-- | doc/administration/gitaly/index.md | 86 |
-rw-r--r-- | doc/administration/monitoring/prometheus/gitlab_metrics.md | 1 |
-rw-r--r-- | doc/administration/operations/puma.md | 31 |
-rw-r--r-- | doc/administration/sidekiq/index.md | 14 |
5 files changed, 1 insertion, 133 deletions
diff --git a/doc/administration/gitaly/configure_gitaly.md b/doc/administration/gitaly/configure_gitaly.md
index f62f0a5a4e2..c4f064b5eba 100644
--- a/doc/administration/gitaly/configure_gitaly.md
+++ b/doc/administration/gitaly/configure_gitaly.md
@@ -361,7 +361,7 @@ Configure Gitaly server in one of two ways:
 WARNING:
 If directly copying repository data from a GitLab server to Gitaly, ensure that the metadata file,
 default path `/var/opt/gitlab/git-data/repositories/.gitaly-metadata`, is not included in the transfer.
-Copying this file causes GitLab to use the [Rugged patches](index.md#direct-access-to-git-in-gitlab) for repositories hosted on the Gitaly server,
+Copying this file causes GitLab to use the direct disk access to repositories hosted on the Gitaly server,
 leading to `Error creating pipeline` and `Commit not found` errors, or stale data.
 
 ### Configure Gitaly clients
diff --git a/doc/administration/gitaly/index.md b/doc/administration/gitaly/index.md
index 46f6a5829c8..6784ff4d970 100644
--- a/doc/administration/gitaly/index.md
+++ b/doc/administration/gitaly/index.md
@@ -587,92 +587,6 @@ off Gitaly Cluster to a sharded Gitaly instance:
 1. [Move the repositories](../operations/moving_repositories.md#moving-repositories) to the newly created storage. You can
    move them by shard or by group, which gives you the opportunity to spread them over multiple Gitaly servers.
 
-## Direct access to Git in GitLab
-
-Direct access to Git uses code in GitLab known as the "Rugged patches".
-
-Before Gitaly existed, what are now Gitaly clients accessed Git repositories directly, either:
-
-- On a local disk in the case of a single-machine Linux package installation.
-- Using NFS in the case of a horizontally-scaled GitLab installation.
-
-In addition to running plain `git` commands, GitLab used a Ruby library called
-[Rugged](https://github.com/libgit2/rugged). Rugged is a wrapper around
-[libgit2](https://libgit2.org/), a stand-alone implementation of Git in the form of a C library.
-
-Over time it became clear that Rugged, particularly in combination with
-[Unicorn](https://yhbt.net/unicorn/), is extremely efficient. Because `libgit2` is a library and
-not an external process, there was very little overhead between:
-
-- GitLab application code that tried to look up data in Git repositories.
-- The Git implementation itself.
-
-Because the combination of Rugged and Unicorn was so efficient, the GitLab application code ended up
-with lots of duplicate Git object lookups. For example, looking up the default branch commit a dozen
-times in one request. We could write inefficient code without poor performance.
-
-When we migrated these Git lookups to Gitaly calls, we suddenly had a much higher fixed cost per Git
-lookup. Even when Gitaly is able to re-use an already-running `git` process (for example, to look up
-a commit), you still have:
-
-- The cost of a network roundtrip to Gitaly.
-- Inside Gitaly, a write/read roundtrip on the Unix pipes that connect Gitaly to the `git` process.
-
-Using GitLab.com to measure, we reduced the number of Gitaly calls per request until we no longer felt
-the efficiency loss of losing Rugged. It also helped that we run Gitaly itself directly on the Git
-file servers, rather than by using NFS mounts. This gave us a speed boost that counteracted the
-negative effect of not using Rugged anymore.
-
-Unfortunately, other deployments of GitLab could not remove NFS like we did on GitLab.com, and they
-got the worst of both worlds:
-
-- The slowness of NFS.
-- The increased inherent overhead of Gitaly.
-
-The code removed from GitLab during the Gitaly migration project affected these deployments. As a
-performance workaround for these NFS-based deployments, we re-introduced some of the old Rugged
-code. This re-introduced code is informally referred to as the "Rugged patches".
-
-### Automatic detection
-
-> Automatic detection for Rugged [disabled](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/95445) in GitLab 15.3.
-
-FLAG:
-On self-managed GitLab, by default automatic detection of whether Rugged should be used (per storage) is not available.
-To make it available, an administrator can [disable the feature flag](../../administration/feature_flags.md) named
-`skip_rugged_auto_detect`.
-
-The Ruby methods that perform direct Git access are behind
-[feature flags](../../development/gitaly.md#legacy-rugged-code), disabled by default. It wasn't
-convenient to set feature flags to get the best performance, so we added an automatic mechanism that
-enables direct Git access.
-
-When GitLab calls a function that has a "Rugged patch", it performs two checks:
-
-- Is the feature flag for this patch set in the database? If so, the feature flag setting controls
-  the GitLab use of "Rugged patch" code.
-- If the feature flag is not set, GitLab tries accessing the file system underneath the
-  Gitaly server directly. If it can, it uses the "Rugged patch":
-  - If using Puma and [thread count](../../install/requirements.md#puma-threads) is set
-    to `1`.
-
-The result of these checks is cached.
-
-To see if GitLab can access the repository file system directly, we use the following heuristic:
-
-- Gitaly ensures that the file system has a metadata file in its root with a UUID in it.
-- Gitaly reports this UUID to GitLab by using the `ServerInfo` RPC.
-- GitLab Rails tries to read the metadata file directly. If it exists, and if the UUIDs match,
-  assume we have direct access.
-
-Direct Git access is:
-
-- [Disabled](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/95445) by default in GitLab 15.3 and later for
-  compatibility with [Praefect-generated replica paths](#praefect-generated-replica-paths-gitlab-150-and-later). It
-  can be enabled if Rugged [feature flags](../../development/gitaly.md#legacy-rugged-code) are enabled.
-- Enabled by default in GitLab 15.2 and earlier because it fills in the correct repository paths in the GitLab
-  configuration file `config/gitlab.yml`. This satisfies the UUID check.
-
 ### Transition to Gitaly Cluster
 
 For the sake of removing complexity, we must remove direct Git access in GitLab. However, we can't
diff --git a/doc/administration/monitoring/prometheus/gitlab_metrics.md b/doc/administration/monitoring/prometheus/gitlab_metrics.md
index 16287741e6a..f6ee2961ce2 100644
--- a/doc/administration/monitoring/prometheus/gitlab_metrics.md
+++ b/doc/administration/monitoring/prometheus/gitlab_metrics.md
@@ -243,7 +243,6 @@ configuration option in `gitlab.yml`. These metrics are served from the
 | `geo_cursor_last_event_timestamp` | Gauge | 10.2 | Last UNIX timestamp of the event log processed by the secondary | `url` |
 | `geo_status_failed_total` | Counter | 10.2 | Number of times retrieving the status from the Geo Node failed | `url` |
 | `geo_last_successful_status_check_timestamp` | Gauge | 10.2 | Last timestamp when the status was successfully updated | `url` |
-| `geo_job_artifacts_synced_missing_on_primary` | Gauge | 10.7 | Number of job artifacts marked as synced due to the file missing on the primary | `url` |
 | `geo_package_files` | Gauge | 13.0 | Number of package files on primary | `url` |
 | `geo_package_files_checksummed` | Gauge | 13.0 | Number of package files checksummed on primary | `url` |
 | `geo_package_files_checksum_failed` | Gauge | 13.0 | Number of package files failed to calculate the checksum on primary | `url` |
diff --git a/doc/administration/operations/puma.md b/doc/administration/operations/puma.md
index f16f1ac46ae..89f1574697f 100644
--- a/doc/administration/operations/puma.md
+++ b/doc/administration/operations/puma.md
@@ -140,37 +140,6 @@ When running Puma in single mode, some features are not supported:
 
 For more information, see [epic 5303](https://gitlab.com/groups/gitlab-org/-/epics/5303).
 
-## Performance caveat when using Puma with Rugged
-
-For deployments where NFS is used to store Git repositories, GitLab uses
-[direct Git access](../gitaly/index.md#direct-access-to-git-in-gitlab) to improve performance by using
-[Rugged](https://github.com/libgit2/rugged).
-
-Rugged usage is automatically enabled if direct Git access [is available](../gitaly/index.md#automatic-detection) and
-Puma is running single threaded, unless it is disabled by a [feature flag](../../development/gitaly.md#legacy-rugged-code).
-
-MRI Ruby uses a Global VM Lock (GVL). GVL allows MRI Ruby to be multi-threaded, but running at
-most on a single core.
-
-Git includes intensive I/O operations. When Rugged uses a thread for a long period of time,
-other threads that might be processing requests can starve. Puma running in single thread mode
-does not have this issue, because concurrently at most one request is being processed.
-
-GitLab is working to remove Rugged usage. Even though performance without Rugged
-is acceptable today, in some cases it might be still beneficial to run with it.
-
-Given the caveat of running Rugged with multi-threaded Puma, and acceptable
-performance of Gitaly, we disable Rugged usage if Puma multi-threaded is
-used (when Puma is configured to run with more than one thread).
-
-This default behavior may not be the optimal configuration in some situations. If Rugged
-plays an important role in your deployment, we suggest you benchmark to find the
-optimal configuration:
-
-- The safest option is to start with single-threaded Puma.
-- To force Rugged to be used with multi-threaded Puma, you can use a
-  [feature flag](../../development/gitaly.md#legacy-rugged-code).
-
 ## Configuring Puma to listen over SSL
 
 Puma, when deployed with a Linux package installation, listens over a Unix socket by
diff --git a/doc/administration/sidekiq/index.md b/doc/administration/sidekiq/index.md
index a27723faa4a..0a7974c9622 100644
--- a/doc/administration/sidekiq/index.md
+++ b/doc/administration/sidekiq/index.md
@@ -356,20 +356,6 @@ To enable LDAP with the synchronization worker for Sidekiq:
 
 If you use [SAML Group Sync](../../user/group/saml_sso/group_sync.md), you must configure [SAML Groups](../../integration/saml.md#configure-users-based-on-saml-group-membership) on all your Sidekiq nodes.
 
-## Disable Rugged
-
-Calls into Rugged, Ruby bindings for `libgit2`, [lock the Sidekiq processes (GVL)](https://silverhammermba.github.io/emberb/c/#c-in-ruby-threads),
-blocking all jobs on that worker from proceeding. If Rugged calls performed by Sidekiq are slow, this can cause significant delays in
-background task processing.
-
-By default, Rugged is used when Git repository data is stored on local storage or on an NFS mount.
-Using Rugged is recommended when using NFS, but if
-you are using local storage, disabling Rugged can improve Sidekiq performance:
-
-```shell
-sudo gitlab-rake gitlab:features:disable_rugged
-```
-
 ## Related topics
 
 - [Extra Sidekiq processes](extra_sidekiq_processes.md)