Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGitLab Bot <gitlab-bot@gitlab.com>2023-09-08 06:11:54 +0300
committerGitLab Bot <gitlab-bot@gitlab.com>2023-09-08 06:11:54 +0300
commit72db8879531eb432b1d3b6957477543d59d94c49 (patch)
tree46826a886f1883ca0fc2cb2bff05a0f3dc92da21
parente85e128aa0ed5539e3a70a908c6ab0ff6b59a365 (diff)
Add latest changes from gitlab-org/gitlab@master
-rw-r--r--.gitlab/CODEOWNERS3
-rw-r--r--config/sidekiq_queues.yml2
-rw-r--r--doc/administration/reference_architectures/index.md4
-rw-r--r--doc/ci/large_repositories/index.md257
-rw-r--r--doc/ci/pipelines/pipeline_efficiency.md2
-rw-r--r--doc/ci/pipelines/settings.md4
-rw-r--r--doc/topics/git/troubleshooting_git.md2
-rw-r--r--doc/update/index.md41
-rw-r--r--doc/update/plan_your_upgrade.md6
-rw-r--r--doc/user/project/repository/managing_large_repositories.md408
-rw-r--r--locale/gitlab.pot18
11 files changed, 432 insertions, 315 deletions
diff --git a/.gitlab/CODEOWNERS b/.gitlab/CODEOWNERS
index af21d1ad02d..62224524065 100644
--- a/.gitlab/CODEOWNERS
+++ b/.gitlab/CODEOWNERS
@@ -732,7 +732,6 @@ lib/gitlab/checks/**
/doc/ci/examples/deployment/ @phillipwells
/doc/ci/examples/semantic-release.md @phillipwells
/doc/ci/interactive_web_terminal/ @fneill
-/doc/ci/large_repositories/ @fneill
/doc/ci/resource_groups/ @phillipwells
/doc/ci/runners/ @fneill
/doc/ci/services/ @fneill
@@ -977,7 +976,7 @@ lib/gitlab/checks/**
/doc/user/project/repository/ @aqualls
/doc/user/project/repository/code_suggestions/ @sselhorn
/doc/user/project/repository/file_finder.md @ashrafkhamis
-/doc/user/project/repository/managing_large_repositories.md @axil
+/doc/user/project/repository/managing_large_repositories.md @eread
/doc/user/project/repository/web_editor.md @ashrafkhamis
/doc/user/project/requirements/ @msedlakjakubowski
/doc/user/project/service_desk/ @msedlakjakubowski
diff --git a/config/sidekiq_queues.yml b/config/sidekiq_queues.yml
index 20c08bdf403..e383defbed5 100644
--- a/config/sidekiq_queues.yml
+++ b/config/sidekiq_queues.yml
@@ -281,6 +281,8 @@
- 1
- - gitlab_shell
- 2
+- - gitlab_subscriptions_add_on_purchases_cleanup_user_add_on_assignment
+ - 1
- - gitlab_subscriptions_refresh_seats
- 1
- - gitlab_subscriptions_trials_apply_trial
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 51409c8a056..2ad9380a00f 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -113,7 +113,7 @@ requires at least two separate environments:
- One primary site.
- One or more secondary sites that serve as replicas.
-
+
If the primary site becomes unavailable, you can fail over to one of the secondary sites.
This **advanced and complex** setup should only be undertaken if DR is
@@ -204,7 +204,7 @@ However, additional workloads can multiply the impact of operations by triggerin
You may need to adjust the suggested specifications to compensate if you use, for example:
- Security software on the nodes.
-- Hundreds of concurrent CI jobs for [large repositories](../../ci/large_repositories/index.md).
+- Hundreds of concurrent CI jobs for [large repositories](../../user/project/repository/managing_large_repositories.md).
- Custom scripts that [run at high frequency](../logs/log_parsing.md#print-top-api-user-agents).
- [Integrations](../../integration/index.md) in many large projects.
- [Server hooks](../server_hooks.md).
diff --git a/doc/ci/large_repositories/index.md b/doc/ci/large_repositories/index.md
index da5ebe21341..cef0554bf6c 100644
--- a/doc/ci/large_repositories/index.md
+++ b/doc/ci/large_repositories/index.md
@@ -1,254 +1,11 @@
---
-stage: Verify
-group: Runner
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
-type: reference
+redirect_to: '../../user/project/repository/managing_large_repositories.md'
+remove_date: '2023-11-30'
---
-# Optimize GitLab for large repositories **(FREE ALL)**
+This document was moved to [another location](../../user/project/repository/managing_large_repositories.md).
-Large repositories consisting of more than 50k files in a worktree
-may require more optimizations beyond
-[pipeline efficiency](../pipelines/pipeline_efficiency.md)
-because of the time required to clone and check out.
-
-GitLab and GitLab Runner handle this scenario well
-but require optimized configuration to efficiently perform its
-set of operations.
-
-The general guidelines for handling big repositories are simple.
-Each guideline is described in more detail in the sections below:
-
-- Always fetch incrementally. Do not clone in a way that results in recreating all of the worktree.
-- Always use shallow clone to reduce data transfer. Be aware that this puts more burden
- on GitLab instance due to higher CPU impact.
-- Control the clone directory if you heavily use a fork-based workflow.
-- Optimize `git clean` flags to ensure that you remove or keep data that might affect or speed-up your build.
-
-## Shallow cloning
-
-> Introduced in GitLab Runner 8.9.
-
-GitLab and GitLab Runner perform a [shallow clone](../pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone)
-by default.
-
-Ideally, you should always use `GIT_DEPTH` with a small number
-like 10. This instructs GitLab Runner to perform shallow clones.
-Shallow clones make Git request only the latest set of changes for a given branch,
-up to desired number of commits as defined by the `GIT_DEPTH` variable.
-
-This significantly speeds up fetching of changes from Git repositories,
-especially if the repository has a very long backlog consisting of number
-of big files as we effectively reduce amount of data transfer.
-
-The following example makes the runner shallow clone to fetch only a given branch;
-it does not fetch any other branches nor tags.
-
-```yaml
-variables:
- GIT_DEPTH: 10
-
-test:
- script:
- - ls -al
-```
-
-## Git strategy
-
-> Introduced in GitLab Runner 8.9.
-
-By default, GitLab is configured to use the [`fetch` Git strategy](../runners/configure_runners.md#git-strategy),
-which is recommended for large repositories.
-This strategy reduces the amount of data to transfer and
-does not really impact the operations that you might do on a repository from CI.
-
-## Git clone path
-
-> Introduced in GitLab Runner 11.10.
-
-[`GIT_CLONE_PATH`](../runners/configure_runners.md#custom-build-directories) allows you to
-control where you clone your sources. This can have implications if you
-heavily use big repositories with fork workflow.
-
-Fork workflow from GitLab Runner's perspective is stored as a separate repository
-with separate worktree. That means that GitLab Runner cannot optimize the usage
-of worktrees and you might have to instruct GitLab Runner to use that.
-
-In such cases, ideally you want to make the GitLab Runner executor be used only
-for the given project and not shared across different projects to make this
-process more efficient.
-
-The [`GIT_CLONE_PATH`](../runners/configure_runners.md#custom-build-directories) has to be
-within the `$CI_BUILDS_DIR`. Currently, it is impossible to pick any path
-from disk.
-
-## Git clean flags
-
-> Introduced in GitLab Runner 11.10.
-
-[`GIT_CLEAN_FLAGS`](../runners/configure_runners.md#git-clean-flags) allows you to control
-whether or not you require the `git clean` command to be executed for each CI
-job. By default, GitLab ensures that you have your worktree on the given SHA,
-and that your repository is clean.
-
-[`GIT_CLEAN_FLAGS`](../runners/configure_runners.md#git-clean-flags) is disabled when set
-to `none`. On very big repositories, this might be desired because `git
-clean` is disk I/O intensive. Controlling that with `GIT_CLEAN_FLAGS: -ffdx
--e .build/` (for example) allows you to control and disable removal of some
-directories within the worktree between subsequent runs, which can speed-up
-the incremental builds. This has the biggest effect if you re-use existing
-machines and have an existing worktree that you can re-use for builds.
-
-For exact parameters accepted by
-[`GIT_CLEAN_FLAGS`](../runners/configure_runners.md#git-clean-flags), see the documentation
-for [`git clean`](https://git-scm.com/docs/git-clean). The available parameters
-are dependent on Git version.
-
-## Git fetch extra flags
-
-> [Introduced](https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4142) in GitLab Runner 13.1.
-
-[`GIT_FETCH_EXTRA_FLAGS`](../runners/configure_runners.md#git-fetch-extra-flags) allows you
-to modify `git fetch` behavior by passing extra flags.
-
-For example, if your project contains a large number of tags that your CI jobs don't rely on,
-you could add [`--no-tags`](https://git-scm.com/docs/git-fetch#Documentation/git-fetch.txt---no-tags)
-to the extra flags to make your fetches faster and more compact.
-
-Also in the case where you repository does _not_ contain a lot of
-tags, `--no-tags` can [make a big difference in some cases](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746).
-If your CI builds do not depend on Git tags it is worth trying.
-
-See the [`GIT_FETCH_EXTRA_FLAGS` documentation](../runners/configure_runners.md#git-fetch-extra-flags)
-for more information.
-
-## Fork-based workflow
-
-> Introduced in GitLab Runner 11.10.
-
-Following the guidelines above, let's imagine that we want to:
-
-- Optimize for a big project (more than 50k files in directory).
-- Use forks-based workflow for contributing.
-- Reuse existing worktrees. Have preconfigured runners that are pre-cloned with repositories.
-- Runner assigned only to project and all forks.
-
-Let's consider the following two examples, one using `shell` executor and
-other using `docker` executor.
-
-### `shell` executor example
-
-Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
-
-```toml
-concurrent = 4
-
-[[runners]]
- url = "GITLAB_URL"
- token = "TOKEN"
- executor = "shell"
- builds_dir = "/builds"
- cache_dir = "/cache"
-
- [runners.custom_build_dir]
- enabled = true
-```
-
-This `config.toml`:
-
-- Uses the `shell` executor,
-- Specifies a custom `/builds` directory where all clones are stored.
-- Enables the ability to specify `GIT_CLONE_PATH`,
-- Runs at most 4 jobs at once.
-
-### `docker` executor example
-
-Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
-
-```toml
-concurrent = 4
-
-[[runners]]
- url = "GITLAB_URL"
- token = "TOKEN"
- executor = "docker"
- builds_dir = "/builds"
- cache_dir = "/cache"
-
- [runners.docker]
- volumes = ["/builds:/builds", "/cache:/cache"]
-```
-
-This `config.toml`:
-
-- Uses the `docker` executor,
-- Specifies a custom `/builds` directory on disk where all clones are stored.
- We host mount the `/builds` directory to make it reusable between subsequent runs
- and be allowed to override the cloning strategy.
-- Doesn't enable the ability to specify `GIT_CLONE_PATH` as it is enabled by default.
-- Runs at most 4 jobs at once.
-
-### Our `.gitlab-ci.yml`
-
-Once we have the executor configured, we need to fine tune our `.gitlab-ci.yml`.
-
-Our pipeline is most performant if we use the following `.gitlab-ci.yml`:
-
-```yaml
-variables:
- GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME
-
-build:
- script: ls -al
-```
-
-This YAML setting configures a custom clone path. This path makes it possible to re-use worktrees
-between the parent project and forks because we use the same clone path for all forks.
-
-Why use `$CI_CONCURRENT_ID`? The main reason is to ensure that worktrees used are not conflicting
-between projects. The `$CI_CONCURRENT_ID` represents a unique identifier within the given executor.
-When we use it to construct the path, this directory does not conflict
-with other concurrent jobs running.
-
-### Store custom clone options in `config.toml`
-
-Ideally, all job-related configuration should be stored in `.gitlab-ci.yml`.
-However, sometimes it is desirable to make these schemes part of the runner's configuration.
-
-In the above example of Forks, making this configuration discoverable for users may be preferred,
-but this brings administrative overhead as the `.gitlab-ci.yml` needs to be updated for each branch.
-In such cases, it might be desirable to keep the `.gitlab-ci.yml` clone path agnostic, but make it
-a configuration of the runner.
-
-We can extend our [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html)
-with the following specification that is used by the runner if `.gitlab-ci.yml` does not override it:
-
-```toml
-concurrent = 4
-
-[[runners]]
- url = "GITLAB_URL"
- token = "TOKEN"
- executor = "docker"
- builds_dir = "/builds"
- cache_dir = "/cache"
-
- environment = [
- "GIT_CLONE_PATH=$CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME"
- ]
-
- [runners.docker]
- volumes = ["/builds:/builds", "/cache:/cache"]
-```
-
-This makes the cloning configuration to be part of the given runner
-and does not require us to update each `.gitlab-ci.yml`.
-
-## Git fetch caching step
-
-For very active repositories with a large number of references and files, consider using the
-[Gitaly pack-objects cache](../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
-The pack-objects cache:
-
-- Benefits all repositories on your GitLab server.
-- Automatically works for forks.
+<!-- This redirect file can be deleted after <2023-11-30>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/ci/pipelines/pipeline_efficiency.md b/doc/ci/pipelines/pipeline_efficiency.md
index fcefb9cf198..3d24be0f8e1 100644
--- a/doc/ci/pipelines/pipeline_efficiency.md
+++ b/doc/ci/pipelines/pipeline_efficiency.md
@@ -28,7 +28,7 @@ The easiest indicators to check for inefficient pipelines are the runtimes of th
stages, and the total runtime of the pipeline itself. The total pipeline duration is
heavily influenced by the:
-- [Size of the repository](../large_repositories/index.md)
+- [Size of the repository](../../user/project/repository/managing_large_repositories.md)
- Total number of stages and jobs.
- Dependencies between jobs.
- The ["critical path"](#directed-acyclic-graphs-dag-visualization), which represents
diff --git a/doc/ci/pipelines/settings.md b/doc/ci/pipelines/settings.md
index 1d904640435..02559da75a0 100644
--- a/doc/ci/pipelines/settings.md
+++ b/doc/ci/pipelines/settings.md
@@ -169,7 +169,7 @@ You can choose how your repository is fetched from GitLab when a job runs.
for every job. However, the local working copy is always pristine.
- `git fetch` is faster because it re-uses the local working copy (and falls
back to clone if it doesn't exist). This is recommended, especially for
- [large repositories](../large_repositories/index.md#git-strategy).
+ [large repositories](../../user/project/repository/managing_large_repositories.md#git-strategy).
The configured Git strategy can be overridden by the [`GIT_STRATEGY` variable](../runners/configure_runners.md#git-strategy)
in the `.gitlab-ci.yml` file.
@@ -192,7 +192,7 @@ a repository.
In GitLab versions 14.7 and later, newly created projects have a default `git depth`
value of `20`. GitLab versions 14.6 and earlier have a default `git depth` value of `50`.
-This value can be overridden by the [`GIT_DEPTH` variable](../large_repositories/index.md#shallow-cloning)
+This value can be overridden by the [`GIT_DEPTH` variable](../../user/project/repository/managing_large_repositories.md#shallow-cloning)
in the `.gitlab-ci.yml` file.
## Set a limit for how long jobs can run
diff --git a/doc/topics/git/troubleshooting_git.md b/doc/topics/git/troubleshooting_git.md
index 43d5d746539..0cc7259a104 100644
--- a/doc/topics/git/troubleshooting_git.md
+++ b/doc/topics/git/troubleshooting_git.md
@@ -197,7 +197,7 @@ The root causes vary, so multiple potential solutions exist, and you may need to
apply more than one:
- If this error occurs when cloning a large repository, you can
- [decrease the cloning depth](../../ci/large_repositories/index.md#shallow-cloning)
+ [decrease the cloning depth](../../user/project/repository/managing_large_repositories.md#shallow-cloning)
to a value of `1`. For example:
```shell
diff --git a/doc/update/index.md b/doc/update/index.md
index cbdc46e94ce..b821be3deea 100644
--- a/doc/update/index.md
+++ b/doc/update/index.md
@@ -25,12 +25,9 @@ has additional information about upgrading, including:
Depending on the installation method and your GitLab version, there are multiple
official ways to upgrade GitLab:
-- [Linux packages (Omnibus)](#linux-packages-omnibus)
-- [Self-compiled installations](#self-compiled-installation)
-- [Docker installations](#installation-using-docker)
-- [Kubernetes (Helm) installations](#installation-using-helm)
+::Tabs
-### Linux packages (Omnibus)
+:::TabTitle Linux packages (Omnibus)
The [package upgrade guide](package/index.md)
contains the steps needed to upgrade a package installed by official GitLab
@@ -39,12 +36,27 @@ repositories.
There are also instructions when you want to
[upgrade to a specific version](package/index.md#upgrade-to-a-specific-version-using-the-official-repositories).
-### Self-compiled installation
+:::TabTitle Helm chart (Kubernetes)
+
+GitLab can be deployed into a Kubernetes cluster using Helm.
+Instructions on how to upgrade a cloud-native deployment are in
+[a separate document](https://docs.gitlab.com/charts/installation/upgrade.html).
+
+Use the [version mapping](https://docs.gitlab.com/charts/installation/version_mappings.html)
+from the chart version to GitLab version to determine the [upgrade path](#upgrade-paths).
+
+:::TabTitle Docker
+
+GitLab provides official Docker images for both Community and Enterprise
+editions, and they are based on the Omnibus package. See how to
+[install GitLab using Docker](../install/docker.md).
+
+:::TabTitle Self-compiled (source)
- [Upgrading Community Edition and Enterprise Edition from source](upgrading_from_source.md) -
The guidelines for upgrading Community Edition and Enterprise Edition from source.
- [Patch versions](patch_versions.md) guide includes the steps needed for a
- patch version, such as 13.2.0 to 13.2.1, and apply to both Community and Enterprise
+ patch version, such as 15.2.0 to 15.2.1, and apply to both Community and Enterprise
Editions.
In the past we used separate documents for the upgrading instructions, but we
@@ -54,20 +66,7 @@ can still be found in the Git repository:
- [Old upgrading guidelines for Community Edition](https://gitlab.com/gitlab-org/gitlab-foss/tree/11-8-stable/doc/update)
- [Old upgrading guidelines for Enterprise Edition](https://gitlab.com/gitlab-org/gitlab/-/tree/11-8-stable-ee/doc/update)
-### Installation using Docker
-
-GitLab provides official Docker images for both Community and Enterprise
-editions, and they are based on the Omnibus package. See how to
-[install GitLab using Docker](../install/docker.md).
-
-### Installation using Helm
-
-GitLab can be deployed into a Kubernetes cluster using Helm.
-Instructions on how to upgrade a cloud-native deployment are in
-[a separate document](https://docs.gitlab.com/charts/installation/upgrade.html).
-
-Use the [version mapping](https://docs.gitlab.com/charts/installation/version_mappings.html)
-from the chart version to GitLab version to determine the [upgrade path](#upgrade-paths).
+::EndTabs
## Plan your upgrade
diff --git a/doc/update/plan_your_upgrade.md b/doc/update/plan_your_upgrade.md
index 0cf367af024..25578dbef59 100644
--- a/doc/update/plan_your_upgrade.md
+++ b/doc/update/plan_your_upgrade.md
@@ -103,11 +103,7 @@ For the upgrade plan, start by creating an outline of a plan that best applies
to your instance and then upgrade it for any relevant features you're using.
- Generate an upgrade plan by reading and understanding the relevant documentation:
- - upgrade based on the installation method:
- - [Linux package (Omnibus)](index.md#linux-packages-omnibus)
- - [Self-compiled](index.md#self-compiled-installation)
- - [Docker](index.md#installation-using-docker)
- - [Helm Charts](index.md#installation-using-helm)
+ - Upgrade based on the [installation method](index.md#upgrade-based-on-installation-method).
- [Zero-downtime upgrades](zero_downtime.md) (if possible and desired)
- [Convert from GitLab Community Edition to Enterprise Edition](package/convert_to_ee.md)
- What version should you upgrade to:
diff --git a/doc/user/project/repository/managing_large_repositories.md b/doc/user/project/repository/managing_large_repositories.md
index e5c1e4a6614..1d5127b5e08 100644
--- a/doc/user/project/repository/managing_large_repositories.md
+++ b/doc/user/project/repository/managing_large_repositories.md
@@ -1,53 +1,411 @@
---
stage: Systems
-group: Distribution
+group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
-description: "Documentation on large repositories."
---
-# Managing large repositories **(FREE SELF)**
+# Managing monorepos
-GitLab, like any Git based system, is subject to similar performance restraints when it comes to large
-repositories that size into the gigabytes.
+Monorepos have become a regular part of development team workflows. While they have many advantages, monorepos can present performance challenges
+when using them in GitLab. Therefore, you should know:
-In the following sections, we detail several best practices for improving performance with these large repositories on GitLab.
+- What repository characteristics can impact performance.
+- Some tools and steps to optimize monorepos.
-## Large File System (LFS)
+## Impact on performance
-It's *strongly* recommended in any Git system that binary or blob files (for example, packages, audio, video, or graphics) are stored as Large File Storage (LFS) objects. With LFS, the objects are stored externally, such as in Object Storage, which reduces the number and size of objects in the repository. Storing objects in external Object Storage can improve performance.
+Because GitLab is a Git-based system, it is subject to similar performance
+constraints as Git when it comes to large repositories that are gigabytes in
+size.
-To analyze if a repository has large objects, you can use a tool like [`git-sizer`](https://github.com/github/git-sizer) for detailed analysis. This tool shows details about what makes up the repository, and highlights any areas of concern. If any large objects are found, you can then remove them with a tool such as [`git filter-repo`](reducing_the_repo_size_using_git.md).
+Monorepos can be large for [many reasons](https://about.gitlab.com/blog/2022/09/06/speed-up-your-monorepo-workflow-in-git/#characteristics-of-monorepos).
+
+Large repositories pose a performance risk performance when used in GitLab, especially if a large monorepo receives many clones or pushes a day, which is common for them.
+
+Git itself has performance limitations when it comes to handling
+monorepos.
+
+[Gitaly](https://gitlab.com/gitlab-org/gitaly) is our Git storage service built
+on top of [Git](https://git-scm.com/). This means that any limitations of
+Git are experienced in Gitaly, and in turn by end users of GitLab.
+
+## Profiling repositories
+
+Large repositories generally experience performance issues in Git. Knowing why
+your repository is large can help you develop mitigation strategies to avoid
+performance problems.
+
+You can use [`git-sizer`](https://github.com/github/git-sizer) to get a snapshot
+of repository characteristics and discover problem aspects of your monorepo.
+
+For example:
+
+```shell
+Processing blobs: 1652370
+Processing trees: 3396199
+Processing commits: 722647
+Matching commits to trees: 722647
+Processing annotated tags: 534
+Processing references: 539
+| Name | Value | Level of concern |
+| ---------------------------- | --------- | ------------------------------ |
+| Overall repository size | | |
+| * Commits | | |
+| * Count | 723 k | * |
+| * Total size | 525 MiB | ** |
+| * Trees | | |
+| * Count | 3.40 M | ** |
+| * Total size | 9.00 GiB | **** |
+| * Total tree entries | 264 M | ***** |
+| * Blobs | | |
+| * Count | 1.65 M | * |
+| * Total size | 55.8 GiB | ***** |
+| * Annotated tags | | |
+| * Count | 534 | |
+| * References | | |
+| * Count | 539 | |
+| | | |
+| Biggest objects | | |
+| * Commits | | |
+| * Maximum size [1] | 72.7 KiB | * |
+| * Maximum parents [2] | 66 | ****** |
+| * Trees | | |
+| * Maximum entries [3] | 1.68 k | * |
+| * Blobs | | |
+| * Maximum size [4] | 13.5 MiB | * |
+| | | |
+| History structure | | |
+| * Maximum history depth | 136 k | |
+| * Maximum tag depth [5] | 1 | |
+| | | |
+| Biggest checkouts | | |
+| * Number of directories [6] | 4.38 k | ** |
+| * Maximum path depth [7] | 13 | * |
+| * Maximum path length [8] | 134 B | * |
+| * Number of files [9] | 62.3 k | * |
+| * Total size of files [9] | 747 MiB | |
+| * Number of symlinks [10] | 40 | |
+| * Number of submodules | 0 | |
+```
+
+In this example, a few items are raised with a high level of concern. See the
+following sections for information on solving:
+
+- A high number of references.
+- Large blobs.
+
+### Large number of references
+
+A reference in Git (a branch or tag) is used to refer to a commit. Each
+reference is stored as an individual file. If you are curious, you can go
+to any `.git` directory and look under the `refs` directory.
+
+A large number of references can cause performance problems because, with more references,
+object walks that Git does are larger for various operations such as clones, pushes, and
+housekeeping tasks.
+
+#### Mitigation strategies
+
+To mitigate the effects of a large number of references in a monorepo:
+
+- Create an automated process for cleaning up old branches.
+- If certain references don't need to be visible to the client, hide them using the
+ [`transfer.hideRefs`](https://git-scm.com/docs/git-config#Documentation/git-config.txt-transferhideRefs)
+ configuration setting. Because Gitaly ignores any on-server Git configuration, you must change the Gitaly configuration
+ itself in `/etc/gitlab/gitlab.rb`:
+
+ ```ruby
+ gitaly['configuration'] = {
+ # ...
+ git: {
+ # ...
+ config: [
+ # ...
+ { key: "transfer.hideRefs", value: "refs/namespace_to_hide" },
+ ],
+ },
+ }
+ ```
+
+In Git 2.42.0 and later, different Git operations can skip over hidden references
+when doing an object graph walk.
+
+### Using LFS for large blobs
+
+Because Git is built to handle text data, it doesn't handle large
+binary files efficiently.
+
+Therefore, you should store binary or blob files (for example, packages, audio, video, or graphics)
+as Large File Storage (LFS) objects. With LFS, the objects are stored externally, such as in Object
+Storage, which reduces the number and size of objects in the repository. Storing
+objects in external Object Storage can improve performance.
+
+To analyze if a repository has large objects, you can use a tool like
+[`git-sizer`](https://github.com/github/git-sizer) for detailed analysis. This
+tool shows details about what makes up the repository, and highlights any areas
+of concern. If any large objects are found, you can then remove them with a tool
+such as [`git filter-repo`](reducing_the_repo_size_using_git.md).
For more information, refer to the [Git LFS documentation](../../../topics/git/lfs/index.md).
-## Gitaly Pack Objects Cache
+## Optimizing large repositories for GitLab
+
+Other than modifying your workflow and the actual repository, you can take other
+steps to maximize performance of monorepos with GitLab.
+
+### Gitaly pack-objects cache
+
+For very active repositories with a large number of references and files, consider using the
+[Gitaly pack-objects cache](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
+The pack-objects cache:
+
+- Benefits all repositories on your GitLab server.
+- Automatically works for forks.
+
+You should always:
+
+- Fetch incrementally. Do not clone in a way that recreates all of the worktree.
+- Use shallow clones to reduce data transfer. Be aware that this puts more burden on GitLab instance because of higher CPU impact.
+
+Control the clone directory if you heavily use a fork-based workflow. Optimize
+`git clean` flags to ensure that you remove or keep data that might affect or
+speed-up your build.
+
+For more information, see [Pack-objects cache](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
+
+### Reduce concurrent clones in CI/CD
+
+Large repositories tend to be monorepos. This usually means that these
+repositories get a lot of traffic not only from users, but from CI/CD.
+
+CI/CD loads tend to be concurrent because pipelines are scheduled during set times.
+As a result, the Git requests against the repositories can spike notably during
+these times and lead to reduced performance for both CI/CD and users alike.
+
+You should reduce CI/CD pipeline concurrency by staggering them to run at different times. For example, a set running at one time and another set running several
+minutes later.
+
+#### Shallow cloning
+
+GitLab and GitLab Runner perform a [shallow clone](../../../ci/pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone)
+by default.
+
+Ideally, you should always use `GIT_DEPTH` with a small number
+like 10. This instructs GitLab Runner to perform shallow clones.
+Shallow clones make Git request only the latest set of changes for a given branch,
+up to desired number of commits as defined by the `GIT_DEPTH` variable.
+
+This significantly speeds up fetching of changes from Git repositories,
+especially if the repository has a very long backlog consisting of a number
+of big files because we effectively reduce amount of data transfer.
+The following pipeline configuration example makes the runner shallow clone to fetch only a given branch.
+The runner does not fetch any other branches nor tags.
+
+```yaml
+variables:
+ GIT_DEPTH: 10
+
+test:
+ script:
+ - ls -al
+```
+
+#### Git strategy
+
+By default, GitLab is configured to use the [`fetch` Git strategy](../../../ci/runners/configure_runners.md#git-strategy),
+which is recommended for large repositories.
+This strategy reduces the amount of data to transfer and
+does not really impact the operations that you might do on a repository from CI/CD.
+
+#### Git clone path
+
+[`GIT_CLONE_PATH`](../../../ci/runners/configure_runners.md#custom-build-directories) allows you to
+control where you clone your repositories. This can have implications if you
+heavily use big repositories with a fork-based workflow.
+
+A fork, from the perspective of GitLab Runner, is stored as a separate repository
+with a separate worktree. That means that GitLab Runner cannot optimize the usage
+of worktrees and you might have to instruct GitLab Runner to use that.
+
+In such cases, ideally you want to make the GitLab Runner executor be used only
+for the given project and not shared across different projects to make this
+process more efficient.
+
+The [`GIT_CLONE_PATH`](../../../ci/runners/configure_runners.md#custom-build-directories) must be
+in the directory set in `$CI_BUILDS_DIR`. You can't pick any path from disk.
+
+#### Git clean flags
+
+[`GIT_CLEAN_FLAGS`](../../../ci/runners/configure_runners.md#git-clean-flags) allows you to control
+whether or not you require the `git clean` command to be executed for each CI/CD
+job. By default, GitLab ensures that:
+
+- You have your worktree on the given SHA.
+- Your repository is clean.
+
+[`GIT_CLEAN_FLAGS`](../../../ci/runners/configure_runners.md#git-clean-flags) is disabled when set
+to `none`. On very big repositories, this might be desired because `git
+clean` is disk I/O intensive. Controlling that with `GIT_CLEAN_FLAGS: -ffdx
+-e .build/` (for example) allows you to control and disable removal of some
+directories in the worktree between subsequent runs, which can speed-up
+the incremental builds. This has the biggest effect if you re-use existing
+machines and have an existing worktree that you can re-use for builds.
+
+For exact parameters accepted by
+[`GIT_CLEAN_FLAGS`](../../../ci/runners/configure_runners.md#git-clean-flags), see the documentation
+for [`git clean`](https://git-scm.com/docs/git-clean). The available parameters
+are dependent on the Git version.
+
+#### Git fetch extra flags
+
+[`GIT_FETCH_EXTRA_FLAGS`](../../../ci/runners/configure_runners.md#git-fetch-extra-flags) allows you
+to modify `git fetch` behavior by passing extra flags.
+
+For example, if your project contains a large number of tags that your CI/CD jobs don't rely on,
+you could add [`--no-tags`](https://git-scm.com/docs/git-fetch#Documentation/git-fetch.txt---no-tags)
+to the extra flags to make your fetches faster and more compact.
+
+Also in the case where you repository does _not_ contain a lot of
+tags, `--no-tags` can [make a big difference in some cases](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746).
+If your CI/CD builds do not depend on Git tags, setting `--no-tags` is worth trying.
+
+For more information, see the [`GIT_FETCH_EXTRA_FLAGS` documentation](../../../ci/runners/configure_runners.md#git-fetch-extra-flags).
+
+#### Fork-based workflow
+
+Following the guidelines above, let's imagine that we want to:
+
+- Optimize for a big project (more than 50k files in directory).
+- Use forks-based workflow for contributing.
+- Reuse existing worktrees. Have preconfigured runners that are pre-cloned with repositories.
+- Runner assigned only to project and all forks.
+
+Let's consider the following two examples, one using `shell` executor and
+other using `docker` executor.
+
+##### `shell` executor example
+
+Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
+
+```toml
+concurrent = 4
+
+[[runners]]
+ url = "GITLAB_URL"
+ token = "TOKEN"
+ executor = "shell"
+ builds_dir = "/builds"
+ cache_dir = "/cache"
+
+ [runners.custom_build_dir]
+ enabled = true
+```
+
+This `config.toml`:
+
+- Uses the `shell` executor,
+- Specifies a custom `/builds` directory where all clones are stored.
+- Enables the ability to specify `GIT_CLONE_PATH`,
+- Runs at most 4 jobs at once.
+
+##### `docker` executor example
+
+Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
+
+```toml
+concurrent = 4
+
+[[runners]]
+ url = "GITLAB_URL"
+ token = "TOKEN"
+ executor = "docker"
+ builds_dir = "/builds"
+ cache_dir = "/cache"
+
+ [runners.docker]
+ volumes = ["/builds:/builds", "/cache:/cache"]
+```
+
+This `config.toml`:
+
+- Uses the `docker` executor,
+- Specifies a custom `/builds` directory on disk where all clones are stored.
+ We host mount the `/builds` directory to make it reusable between subsequent runs
+ and be allowed to override the cloning strategy.
+- Doesn't enable the ability to specify `GIT_CLONE_PATH` as it is enabled by default.
+- Runs at most 4 jobs at once.
+
+##### Our `.gitlab-ci.yml`
+
+Once we have the executor configured, we need to fine tune our `.gitlab-ci.yml`.
+
+Our pipeline is most performant if we use the following `.gitlab-ci.yml`:
+
+```yaml
+variables:
+ GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME
+
+build:
+ script: ls -al
+```
+
+This YAML setting configures a custom clone path. This path makes it possible to re-use worktrees
+between the parent project and forks because we use the same clone path for all forks.
+
+Why use `$CI_CONCURRENT_ID`? The main reason is to ensure that worktrees used are not conflicting
+between projects. The `$CI_CONCURRENT_ID` represents a unique identifier within the given executor.
+When we use it to construct the path, this directory does not conflict
+with other concurrent jobs running.
+
+### Store custom clone options in `config.toml`
+
+Ideally, all job-related configuration should be stored in `.gitlab-ci.yml`.
+However, sometimes it is desirable to make these schemes part of the runner's configuration.
-Gitaly, the service that provides storage for Git repositories, can be configured to cache a short rolling window of Git fetch responses. This is recommended for large repositories as it can notably reduce server load when your server receives lots of fetch traffic.
+In the above example of forks, making this configuration discoverable for users may be preferred,
+but this brings administrative overhead as the `.gitlab-ci.yml` needs to be updated for each branch.
+In such cases, it might be desirable to keep the `.gitlab-ci.yml` clone path agnostic, but make it
+a configuration of the runner.
-Refer to the [Gitaly Pack Objects Cache for more information](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
+We can extend our [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html)
+with the following specification that is used by the runner if `.gitlab-ci.yml` does not override it:
-## Reference Architectures
+```toml
+concurrent = 4
-Large repositories tend to be found in larger organisations with many users. The GitLab Quality and Support teams provide several [Reference Architectures](../../../administration/reference_architectures/index.md) that are the recommended way to deploy GitLab at scale.
+[[runners]]
+ url = "GITLAB_URL"
+ token = "TOKEN"
+ executor = "docker"
+ builds_dir = "/builds"
+ cache_dir = "/cache"
-In these types of setups it's recommended that the GitLab environment used matches a Reference Architecture to improve performance.
+ environment = [
+ "GIT_CLONE_PATH=$CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME"
+ ]
-## Gitaly Cluster
+ [runners.docker]
+ volumes = ["/builds:/builds", "/cache:/cache"]
+```
-Gitaly Cluster can notably improve large repository performance as it holds multiple replicas of the repository across several nodes. As a result, Gitaly Cluster can load balance read requests against those repositories and is also fault-tolerant.
+This makes the cloning configuration to be part of the given runner
+and does not require us to update each `.gitlab-ci.yml`.
-It's recommended for large repositories, however, Gitaly Cluster is a large solution with additional complexity of setup, and management. Refer to the [Gitaly Cluster documentation for more information](../../../administration/gitaly/index.md), specifically the [Before deploying Gitaly Cluster](../../../administration/gitaly/index.md#before-deploying-gitaly-cluster) section.
+### Reference architectures
-## Keep GitLab up to date
+Large repositories tend to be found in larger organisations with many users. The GitLab Quality and Support teams provide several [reference architectures](../../../administration/reference_architectures/index.md) that are the recommended way to deploy GitLab at scale.
-Performance improvements and fixes are added continuously in GitLab. As such, it's recommended you keep GitLab updated to the latest version where possible to benefit from these.
+In these types of setups, the GitLab environment used should match a reference architecture to improve performance.
-## Reduce concurrent clones in CI/CD
+### Gitaly Cluster
-Large repositories tend to be monorepos. This in turn typically means that these repositories get a lot of traffic not only from users, but from CI/CD.
+Gitaly Cluster can notably improve large repository performance because it holds multiple replicas of the repository across several nodes.
+As a result, Gitaly Cluster can load balance read requests against those replicas and is fault-tolerant.
-CI/CD loads tend to be concurrent as pipelines are scheduled during set times. As a result, the Git requests against the repositories can spike notably during these times and lead to reduced performance for both CI and users alike.
+Though Gitaly Cluster is recommended for large repositories, it is a large solution with additional complexity of setup and management. Refer to the
+[Gitaly Cluster documentation for more information](../../../administration/gitaly/index.md), specifically the
+[Before deploying Gitaly Cluster](../../../administration/gitaly/index.md#before-deploying-gitaly-cluster) section.
-When designing CI/CD pipelines, it's advisable to reduce their concurrency by staggering them to run at different times, for example, a set running at one time, and another set running several minutes later.
+### Keep GitLab up to date
-There's several other actions that can be explored to improve CI/CD performance with large repositories. Refer to the [Runner documentation for more information](../../../ci/large_repositories/index.md).
+You should keep GitLab updated to the latest version where possible to benefit from performance improvements and fixes are added continuously to GitLab.
diff --git a/locale/gitlab.pot b/locale/gitlab.pot
index 0c89b1f1e1a..92e4bc8fa38 100644
--- a/locale/gitlab.pot
+++ b/locale/gitlab.pot
@@ -42515,9 +42515,6 @@ msgstr ""
msgid "SecurityReports|Add a comment or reason for dismissal"
msgstr ""
-msgid "SecurityReports|Add comment & dismiss"
-msgstr ""
-
msgid "SecurityReports|Add or remove projects to monitor in the security area. Projects included in this list will have their results displayed in the security dashboard and vulnerability report."
msgstr ""
@@ -42572,6 +42569,9 @@ msgstr ""
msgid "SecurityReports|Configure security testing"
msgstr ""
+msgid "SecurityReports|Confirm dismissal"
+msgstr ""
+
msgid "SecurityReports|Create Issue"
msgstr ""
@@ -42587,9 +42587,15 @@ msgstr ""
msgid "SecurityReports|Development vulnerabilities"
msgstr ""
+msgid "SecurityReports|Dismiss as"
+msgstr ""
+
msgid "SecurityReports|Dismiss vulnerability"
msgstr ""
+msgid "SecurityReports|Dismissal comment"
+msgstr ""
+
msgid "SecurityReports|Dismissed '%{vulnerabilityName}'"
msgstr ""
@@ -42620,6 +42626,9 @@ msgstr ""
msgid "SecurityReports|Download the patch to apply it manually"
msgstr ""
+msgid "SecurityReports|Edit dismissal"
+msgstr ""
+
msgid "SecurityReports|Either you don't have permission to view this dashboard or the dashboard has not been setup. Please check your permission settings with your administrator or check your dashboard configurations to proceed."
msgstr ""
@@ -42737,9 +42746,6 @@ msgstr ""
msgid "SecurityReports|Results show vulnerabilities introduced by the merge request, in addition to existing vulnerabilities from the latest successful pipeline in your project's default branch."
msgstr ""
-msgid "SecurityReports|Save comment"
-msgstr ""
-
msgid "SecurityReports|Scan details"
msgstr ""