gitlab.com/gitlab-org/gitlab-foss.git
author GitLab Bot <gitlab-bot@gitlab.com> 2022-10-22 03:10:31 +0300
committer GitLab Bot <gitlab-bot@gitlab.com> 2022-10-22 03:10:31 +0300
commit 51d09c5698aed5bbfa66a7773db93b674be63a56
tree c326dec71001ec3f9b451586f3d459c0674ae5db /doc
parent 798e0b592009fc6613117d7e127092fc650ee48b
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc')
-rw-r--r-- doc/administration/geo/replication/troubleshooting.md 2
-rw-r--r-- doc/administration/housekeeping.md 170
-rw-r--r-- doc/administration/issue_closing_pattern.md 2
-rw-r--r-- doc/administration/nfs.md 2
-rw-r--r-- doc/administration/packages/container_registry.md 2
-rw-r--r-- doc/administration/snippets/index.md 2
-rw-r--r-- doc/architecture/blueprints/ci_data_decay/index.md 4
-rw-r--r-- doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md 2
-rw-r--r-- doc/architecture/blueprints/cloud_native_build_logs/index.md 2
-rw-r--r-- doc/architecture/blueprints/database_scaling/size-limits.md 2
10 files changed, 141 insertions, 49 deletions
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 3f16c1552ad..effc49d21ba 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -469,7 +469,7 @@ This happens because the PostgreSQL certificate that the Omnibus GitLab package
the Common Name `PostgreSQL`, but the replication is connecting to a different host and GitLab attempts to use
the `verify-full` SSL mode by default.
-In order to fix this, you can either:
+To fix this issue, you can either:
- Use the `--sslmode=verify-ca` argument with the `replicate-geo-database` command.
- For an already replicated database, change `sslmode=verify-full` to `sslmode=verify-ca`
diff --git a/doc/administration/housekeeping.md b/doc/administration/housekeeping.md
index 15287b917e7..0209f97bd31 100644
--- a/doc/administration/housekeeping.md
+++ b/doc/administration/housekeeping.md
@@ -1,43 +1,85 @@
---
stage: Systems
-group: Distribution
+group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
# Housekeeping **(FREE SELF)**
-GitLab supports and automates housekeeping tasks in your current repository such as:
+GitLab supports and automates housekeeping tasks in Git repositories to ensure
+that they can be served as efficiently as possible. Housekeeping tasks include:
-- Compressing Git objects.
+- Compressing Git objects and revisions.
- Removing unreachable objects.
+- Removing stale data like lock files.
+- Maintaining data structures that improve performance.
+- Updating object pools to improve object deduplication across forks.
-## Configure housekeeping
+WARNING:
+Do not manually execute Git commands to perform housekeeping in Git
+repositories that are controlled by GitLab. Doing so may lead to corrupt
+repositories and data loss.
+
+## Running housekeeping tasks
+
+There are different ways in which GitLab runs housekeeping tasks:
+
+- A project's administrator can [manually trigger](#manual-trigger) repository
+ housekeeping tasks.
+- GitLab can automatically schedule housekeeping tasks [after a number of Git pushes](#push-based-trigger).
+- GitLab can [schedule a job](#scheduled-housekeeping) that runs housekeeping
+ tasks for all repositories in a configurable timeframe.
+
+### Manual trigger
+
+Administrators of repositories can manually trigger housekeeping tasks in a
+repository. In general this is not required as GitLab knows to automatically run
+housekeeping tasks. The manual trigger can be useful when either:
+
+- A repository is known to require housekeeping.
+- Automated push-based scheduling of housekeeping tasks has been disabled.
+
+To trigger housekeeping tasks manually:
+
+1. On the top bar, select **Main menu > Projects** and find your project.
+1. On the left sidebar, select **Settings > General**.
+1. Expand **Advanced**.
+1. Select **Run housekeeping**.
+
+This starts an asynchronous background worker for the project's repository. The
+background worker executes `git gc`, which performs a number of optimizations.
+
+### Push-based trigger
-GitLab automatically runs `git gc` and `git repack` on repositories after Git pushes:
+GitLab automatically runs repository housekeeping tasks after a configured
+number of pushes:
- [`git gc`](https://git-scm.com/docs/git-gc) runs a number of housekeeping tasks such as:
- Compressing Git objects to reduce disk space and increase performance.
- Removing unreachable objects that may have been created from changes to the repository, like force-overwriting branches.
- [`git repack`](https://git-scm.com/docs/git-repack) either:
- - Runs an incremental repack, according to a [configured period](#housekeeping-options). This
+ - Runs an incremental repack, according to a [configured period](#configure-push-based-maintenance). This
packs all loose objects into a new packfile and prunes the now-redundant loose objects.
- - Runs a full repack, according to a [configured period](#housekeeping-options). This repacks all
+ - Runs a full repack, according to a [configured period](#configure-push-based-maintenance). This repacks all
packfiles and loose objects into a single new packfile, and deletes the old now-redundant loose
objects and packfiles. It also optionally creates bitmaps for the new packfile.
+- [`git pack-refs`](https://git-scm.com/docs/git-pack-refs) compresses references
+ stored as loose files into a single file.
-You can change how often this happens or turn it off:
+#### Configure push-based maintenance
+
+You can change how often these tasks run when pushes occur, or you can turn
+them off entirely:
1. On the top bar, select **Main menu > Admin**.
1. On the left sidebar, select **Settings > Repository**.
1. Expand **Repository maintenance**.
-1. In the **Housekeeping** section, configure the [housekeeping options](#housekeeping-options).
+1. In the **Housekeeping** section, configure the housekeeping options.
1. Select **Save changes**.
-### Housekeeping options
-
The following housekeeping options are available:
-- **Enable automatic repository housekeeping**: Regularly run `git repack` and `git gc`. If you
+- **Enable automatic repository housekeeping**: Regularly run housekeeping tasks. If you
keep this setting disabled for a long time, Git repository access on your GitLab server becomes
slower and your repositories use more disk space.
- **Incremental repack period**: Number of Git pushes after which an incremental `git repack` is
@@ -60,30 +102,80 @@ Housekeeping also [removes unreferenced LFS files](../raketasks/cleanup.md#remov
from your project on the same schedule as the `git gc` operation, freeing up storage space for your
project.
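
Illustration only, not part of this commit: the push-based trigger described in this hunk can be modeled as a counter check. The Python sketch below assumes hypothetical period values (10, 50, and 200 pushes) and a hypothetical rule that the most thorough due task wins. The names `HousekeepingPeriods` and `task_for_push` are invented for this example and are not taken from GitLab's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HousekeepingPeriods:
    # Hypothetical values for illustration; the real ones are set by an
    # administrator under Settings > Repository > Repository maintenance.
    incremental_repack: int = 10
    full_repack: int = 50
    gc: int = 200


def task_for_push(pushes_since_gc: int, periods: HousekeepingPeriods) -> Optional[str]:
    """Return the task due after this push, or None if nothing is due.

    Assumed rule: when several periods divide the push count, the most
    thorough task (gc > full repack > incremental repack) is chosen.
    """
    if pushes_since_gc <= 0:
        return None
    if pushes_since_gc % periods.gc == 0:
        return "gc"
    if pushes_since_gc % periods.full_repack == 0:
        return "full_repack"
    if pushes_since_gc % periods.incremental_repack == 0:
        return "incremental_repack"
    return None


if __name__ == "__main__":
    periods = HousekeepingPeriods()
    for count in (7, 10, 50, 200):
        print(count, task_for_push(count, periods))
```

Under these assumptions, most pushes trigger nothing, every tenth push triggers an incremental repack, and the rarer, more expensive tasks replace it when their own period comes due.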
-WARNING:
-Running `git gc` or `git repack` commands manually in the
-[repository folder](repository_storage_types.md#from-project-name-to-hashed-path)
-is discouraged. If the created pack files get incorrect access rights (that is, owned by the wrong user)
-browsing to the project page might result in `404` and `503` errors.
-
-## How housekeeping handles pool repositories
-
-Housekeeping for pool repositories is handled differently from standard repositories. It is
-ultimately performed by the Gitaly RPC `FetchIntoObjectPool`.
-
-This is the current call stack by which it is invoked:
-
-1. `Repositories::HousekeepingService#execute_gitlab_shell_gc`
-1. `Projects::GitGarbageCollectWorker#perform`
-1. `Projects::GitDeduplicationService#fetch_from_source`
-1. `ObjectPool#fetch`
-1. `ObjectPoolService#fetch`
-1. `Gitaly::FetchIntoObjectPoolRequest`
-
-To manually invoke it from a [Rails console](operations/rails_console.md) if needed, you can call
-`project.pool_repository.object_pool.fetch`. This is a potentially long-running task, though Gitaly
-times out in about 8 hours.
-
-WARNING:
-Do not run `git prune` or `git gc` in pool repositories! This can cause data loss in "real"
-repositories that depend on the pool in question.
+### Scheduled housekeeping
+
+While GitLab automatically performs housekeeping tasks based on the number of
+pushes, it does not maintain repositories that don't receive any pushes at all.
+As a result, inactive repositories or repositories that are only getting read
+requests may not benefit from improvements in the repository housekeeping
+strategy.
+
+Administrators can enable a background job that performs housekeeping in all
+repositories at a customizable interval to remedy this situation. This
+background job processes all repositories hosted by a Gitaly node in a random
+order and eagerly performs housekeeping tasks on them. The Gitaly node will stop
+processing repositories if it takes longer than the configured interval.
+
+#### Configure scheduled housekeeping
+
+Background maintenance of Git repositories is configured in Gitaly. By default,
+Gitaly performs background repository maintenance every day at 12:00 noon for a
+duration of 10 minutes.
+
+You can change this default in Gitaly configuration. The following snippet
+enables daily background repository maintenance starting at 23:00 for 1 hour
+for the `default` storage:
+
+```toml
+[daily_maintenance]
+start_hour = 23
+start_minute = 0
+duration = "1h"
+storages = ["default"]
+```
+
+Use the following snippet to completely disable background repository
+maintenance:
+
+```toml
+[daily_maintenance]
+disabled = true
+```
+
+## Object pool repositories
+
+Object pool repositories are used by GitLab to deduplicate objects across forks
+of a repository. When creating the first fork, we:
+
+1. Create an object pool repository that contains all objects of the repository
+ that is about to be forked.
+1. Link the repository to this new object pool via Git's alternates mechanism.
+1. Repack the repository so that it uses objects from the object pool. It thus
+ can drop its own copy of the objects.
+
+Any forks of this repository can now link against the object pool and thus only
+have to keep objects that diverge from the primary repository.
+
+GitLab needs to perform special housekeeping operations in object pools:
+
+- Gitaly cannot ever delete unreachable objects from object pools because they
+ might be used by any of the forks that are connected to it.
+- Gitaly must keep all objects reachable due to the same reason. Object pools
+ thus maintain references to unreachable "dangling" objects so that they don't
+ ever get deleted.
+- GitLab must update object pools regularly to pull in new objects that have
+ been added in the primary repository. Otherwise, an object pool will become
+ increasingly inefficient at deduplicating objects.
+
+These housekeeping operations are performed by the specialized
+`FetchIntoObjectPool` RPC that handles all of these special tasks while also
+executing the regular housekeeping tasks we execute for normal Git
+repositories.
+
+Object pools are optimized automatically whenever the primary member is
+garbage collected. Therefore, the cadence can be configured using the
+same Git GC period in that project.
+
+If you need to manually invoke the RPC from a [Rails console](operations/rails_console.md),
+you can call `project.pool_repository.object_pool.fetch`. This is a potentially
+long-running task, though Gitaly times out after about 8 hours.
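
Illustration only, not part of this commit: the scheduled housekeeping behavior added above (random repository order, stop once the configured interval is used up) can be sketched in Python. The function names, the fixed per-repository cost, and the choice to check the deadline only before starting each repository are all assumptions made for this example. Gitaly's actual scheduler may behave differently.

```python
import random
from datetime import datetime, timedelta


def maintenance_window(day, start_hour, start_minute, duration):
    """Return (start, end) of the maintenance window that begins on `day`."""
    start = day.replace(hour=start_hour, minute=start_minute,
                        second=0, microsecond=0)
    return start, start + duration


def run_maintenance(repos, start, end, cost_per_repo, rng):
    """Simulate one maintenance run and return the repositories processed.

    Repositories are visited in random order. The deadline is checked
    before each repository is started (an assumption of this sketch), so
    the last one started may finish after the window has closed.
    """
    order = list(repos)
    rng.shuffle(order)
    now = start
    processed = []
    for repo in order:
        if now >= end:
            break
        processed.append(repo)
        now += cost_per_repo  # simulated time spent on this repository
    return processed


if __name__ == "__main__":
    # Mirrors the example configuration: start at 23:00 and run for 1 hour.
    start, end = maintenance_window(datetime(2022, 10, 22), 23, 0,
                                    timedelta(hours=1))
    done = run_maintenance(["a", "b", "c", "d", "e"], start, end,
                           timedelta(minutes=25), random.Random(0))
    print(len(done), "of 5 repositories processed before", end)
```

With a 1-hour window and a simulated 25 minutes per repository, only some repositories are reached in a single run. The random order lets the others be picked up on later days.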
diff --git a/doc/administration/issue_closing_pattern.md b/doc/administration/issue_closing_pattern.md
index d10f5320109..e9150ae0650 100644
--- a/doc/administration/issue_closing_pattern.md
+++ b/doc/administration/issue_closing_pattern.md
@@ -17,7 +17,7 @@ in the project's default branch.
## Change the issue closing pattern
-In order to change the pattern you need to have access to the server that GitLab
+To change the pattern, you must have access to the server that GitLab
is installed on.
The default pattern can be located in [`gitlab.yml.example`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/gitlab.yml.example)
diff --git a/doc/administration/nfs.md b/doc/administration/nfs.md
index 9072bd1f344..a4acafd16da 100644
--- a/doc/administration/nfs.md
+++ b/doc/administration/nfs.md
@@ -44,7 +44,7 @@ GitLab support is unable to continue with the investigation if both:
- The date of the request is on or after the release of GitLab version 15.6.
- Support Engineers and Management determine that all reasonable non-NFS root causes have been exhausted.
-If the issue is reproducible, or if it happens intermittently but regularly, GitLab Support can investigate providing the issue reproduces without the use of NFS. In order to reproduce without NFS, the affected repositories should be migrated to a different Gitaly shard, such as Gitaly cluster or a standalone Gitaly VM, backed with block storage.
+If the issue is reproducible, or if it happens intermittently but regularly, GitLab Support can investigate providing the issue reproduces without the use of NFS. To reproduce without NFS, the affected repositories should be migrated to a different Gitaly shard, such as Gitaly cluster or a standalone Gitaly VM, backed with block storage.
### Why remove NFS for Git repository data
diff --git a/doc/administration/packages/container_registry.md b/doc/administration/packages/container_registry.md
index d04e3217f57..579ba5ddffc 100644
--- a/doc/administration/packages/container_registry.md
+++ b/doc/administration/packages/container_registry.md
@@ -1156,7 +1156,7 @@ blobs start being deleted is anything permanent done.
## Configuring GitLab and Registry to run on separate nodes (Omnibus GitLab)
By default, package assumes that both services are running on the same node.
-In order to get GitLab and Registry to run on a separate nodes, separate configuration
+To get GitLab and Registry to run on separate nodes, separate configuration
is necessary for Registry and GitLab.
### Configuring Registry
diff --git a/doc/administration/snippets/index.md b/doc/administration/snippets/index.md
index 89a571946af..7bf828afedd 100644
--- a/doc/administration/snippets/index.md
+++ b/doc/administration/snippets/index.md
@@ -26,7 +26,7 @@ content changes.
### Snippets size limit configuration
This setting is not available through the [Admin Area settings](../../user/admin_area/settings/index.md).
-In order to configure this setting, use either the Rails console
+To configure this setting, use either the Rails console
or the [Application settings API](../../api/settings.md).
NOTE:
diff --git a/doc/architecture/blueprints/ci_data_decay/index.md b/doc/architecture/blueprints/ci_data_decay/index.md
index 221c2364f79..f1942ceeb3f 100644
--- a/doc/architecture/blueprints/ci_data_decay/index.md
+++ b/doc/architecture/blueprints/ci_data_decay/index.md
@@ -67,7 +67,7 @@ When a build gets archived it will not be possible to retry it, but we still do
keep all the processing metadata in the database, and it consumes resources
that are scarce in the primary database.
-In order to improve performance and make it easier to scale CI/CD data storage
+To improve performance and make it easier to scale CI/CD data storage
we might want to follow these three tracks described below.
![pipeline data time decay](pipeline_data_time_decay.png)
@@ -210,7 +210,7 @@ We accept the possible necessity of building a separate API endpoint /
endpoints needed to access pipeline data through the API.
In the new API users might need to provide a time range in which the data has
-been created to search through their pipelines / builds. In order to make it
+been created to search through their pipelines / builds. To make it
efficient it might be necessary to restrict access to querying data residing in
more than two partitions at once. We can do that by supporting time ranges
spanning the duration that equals to the builds archival policy.
diff --git a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
index 5f907ecdaa4..875827d7c95 100644
--- a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
+++ b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md
@@ -269,7 +269,7 @@ table, it is possible to have many logical partitions per one physical partition
## Storing partitions metadata in the database
-In order to build an efficient mechanism that will be responsible for creating
+To build an efficient mechanism that will be responsible for creating
new partitions, and to implement time decay we want to introduce a partitioning
metadata table, called `ci_partitions`. In that table we would store metadata
about all the logical partitions, with many pipelines per partition. We may
diff --git a/doc/architecture/blueprints/cloud_native_build_logs/index.md b/doc/architecture/blueprints/cloud_native_build_logs/index.md
index b77d7998fc8..df807d45694 100644
--- a/doc/architecture/blueprints/cloud_native_build_logs/index.md
+++ b/doc/architecture/blueprints/cloud_native_build_logs/index.md
@@ -31,7 +31,7 @@ a job is complete, the trace file contents are sent to the object store.
New architecture writes data to Redis instead of writing build logs into a
file.
-In order to make this performant and resilient enough, we implemented a chunked
+To make this performant and resilient enough, we implemented a chunked
I/O mechanism - we store data in Redis in chunks, and migrate them to an object
store once we reach a desired chunk size.
diff --git a/doc/architecture/blueprints/database_scaling/size-limits.md b/doc/architecture/blueprints/database_scaling/size-limits.md
index 0bb1ae9efb4..e530bd6eff0 100644
--- a/doc/architecture/blueprints/database_scaling/size-limits.md
+++ b/doc/architecture/blueprints/database_scaling/size-limits.md
@@ -117,7 +117,7 @@ limit 30;
NOTE:
In PostgreSQL context, a **physical table** is either a regular table or a partition of a partitioned table.
-In order to maintain and improve operational stability and lessen development burden, we target a **table size less than 100 GB for a physical table on GitLab.com** (including its indexes). This has numerous benefits:
+To maintain and improve operational stability and lessen development burden, we target a **table size less than 100 GB for a physical table on GitLab.com** (including its indexes). This has numerous benefits:
1. Improved query performance and more stable query plans
1. Significantly reduce vacuum run times and increase frequency of vacuum runs to maintain a healthy state - reducing overhead on the database primary