diff --git a/doc/administration/job_artifacts_troubleshooting.md b/doc/administration/job_artifacts_troubleshooting.md
new file mode 100644
index 00000000000..ba60cbd7aba
--- /dev/null
+++ b/doc/administration/job_artifacts_troubleshooting.md
@@ -0,0 +1,460 @@
+---
+stage: Verify
+group: Pipeline Security
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Job artifact troubleshooting for administrators
+
+When administering job artifacts, you might encounter the following issues.
+
+## Job artifacts using too much disk space
+
+Job artifacts can fill up your disk space more quickly than expected. Some possible
+reasons are:
+
+- Users have configured job artifacts expiration to be longer than necessary.
+- The number of jobs run, and hence artifacts generated, is higher than expected.
+- Job logs are larger than expected, and have accumulated over time.
+- The file system might run out of inodes because
+ [empty directories are left behind by artifact housekeeping](https://gitlab.com/gitlab-org/gitlab/-/issues/17465).
+ [The Rake task for _orphaned_ artifact files](../raketasks/cleanup.md#remove-orphan-artifact-files)
+ removes these.
+- Artifact files might be left on disk and not deleted by housekeeping. Run the
+ [Rake task for _orphaned_ artifact files](../raketasks/cleanup.md#remove-orphan-artifact-files)
+ to remove these. This script should always find work to do, as it also removes empty directories (see above).
+- [Artifact housekeeping was changed significantly](#housekeeping-disabled-in-gitlab-146-to-152),
+  and you might need to enable a feature flag to use the updated system.
+- The [keep artifacts from most recent successful jobs](../ci/jobs/job_artifacts.md#keep-artifacts-from-most-recent-successful-jobs)
+  feature is enabled.
+
+In these and other cases, identify the projects most responsible
+for disk space usage, figure out what types of artifacts are using the most
+space, and in some cases, manually delete job artifacts to reclaim disk space.
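+
+As a quick first check from a [Rails console](operations/rails_console.md#starting-a-rails-console-session),
+you can total the recorded artifact size per file type. This is a minimal sketch; it reports what the
+database has recorded, which can differ from actual disk usage:
+
+```ruby
+# Total recorded artifact size (in bytes) per file type, largest first.
+Ci::JobArtifact.group(:file_type).sum(:size).sort_by { |_type, size| -size }.each do |type, size|
+  puts "#{type}: #{size} bytes"
+end
+```
+
+The sections below show how to break this down further, by project and by individual artifact.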
+
+### Artifacts housekeeping
+
+Artifacts housekeeping is the process that identifies which artifacts are expired
+and can be deleted.
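+
+To see whether expired artifacts are accumulating, you can count the artifact records that are already
+past their expiry date. A minimal Rails console sketch (this counts database records, not files on disk):
+
+```ruby
+# Artifacts whose expiry date has passed but that still have a database record.
+# A large or steadily growing number suggests housekeeping is not keeping up.
+Ci::JobArtifact.where("expire_at < ?", Time.current).count
+```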
+
+#### Housekeeping disabled in GitLab 14.6 to 15.2
+
+Artifact housekeeping was disabled in GitLab 14.6. It was significantly improved
+in GitLab 14.10, and the changes were backported to patch versions of GitLab 14.6 and later,
+behind [feature flags](feature_flags.md) that are disabled by default. The flags were
+enabled by default [in GitLab 15.3](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/92931).
+
+If artifacts housekeeping does not seem to be working in GitLab 14.6 to GitLab 15.2,
+you should check if the feature flags are enabled.
+
+To check if the feature flags are enabled:
+
+1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
+
+1. Check if the feature flags are enabled.
+
+ - GitLab 14.10 and earlier:
+
+ ```ruby
+ Feature.enabled?(:ci_detect_wrongly_expired_artifacts, default_enabled: :yaml)
+ Feature.enabled?(:ci_update_unlocked_job_artifacts, default_enabled: :yaml)
+ Feature.enabled?(:ci_job_artifacts_backlog_work, default_enabled: :yaml)
+ ```
+
+ - GitLab 15.0 and later:
+
+ ```ruby
+ Feature.enabled?(:ci_detect_wrongly_expired_artifacts)
+ Feature.enabled?(:ci_update_unlocked_job_artifacts)
+ Feature.enabled?(:ci_job_artifacts_backlog_work)
+ ```
+
+1. If any of the feature flags are disabled, enable them:
+
+ ```ruby
+ Feature.enable(:ci_detect_wrongly_expired_artifacts)
+ Feature.enable(:ci_update_unlocked_job_artifacts)
+ Feature.enable(:ci_destroy_unlocked_job_artifacts)
+ ```
+
+These changes include switching artifacts from `unlocked` to `locked` if
+they [should be retained](../ci/jobs/job_artifacts.md#keep-artifacts-from-most-recent-successful-jobs).
+
+#### Artifacts with `unknown` status
+
+Artifacts created before housekeeping was updated have a status of `unknown`. After they expire,
+these artifacts are not processed by the new housekeeping.
+
+You can check the database to confirm if your instance has artifacts with the `unknown` status:
+
+1. Start a database console:
+
+ ::Tabs
+
+ :::TabTitle Linux package (Omnibus)
+
+ ```shell
+ sudo gitlab-psql
+ ```
+
+ :::TabTitle Helm chart (Kubernetes)
+
+ ```shell
+ # Find the toolbox pod
+ kubectl --namespace <namespace> get pods -lapp=toolbox
+ # Connect to the PostgreSQL console
+ kubectl exec -it <toolbox-pod-name> -- /srv/gitlab/bin/rails dbconsole --include-password --database main
+ ```
+
+ :::TabTitle Docker
+
+ ```shell
+ sudo docker exec -it <container_name> /bin/bash
+ gitlab-psql
+ ```
+
+ :::TabTitle Self-compiled (source)
+
+ ```shell
+ sudo -u git -H psql -d gitlabhq_production
+ ```
+
+ ::EndTabs
+
+1. Run the following query:
+
+ ```sql
+ select expire_at, file_type, locked, count(*) from ci_job_artifacts
+ where expire_at is not null and
+ file_type != 3
+ group by expire_at, file_type, locked having count(*) > 1;
+ ```
+
+If records are returned, there are artifacts that the housekeeping job
+is unable to process. For example:
+
+```plaintext
+ expire_at | file_type | locked | count
+-------------------------------+-----------+--------+--------
+ 2021-06-21 22:00:00+00 | 1 | 2 | 73614
+ 2021-06-21 22:00:00+00 | 2 | 2 | 73614
+ 2021-06-21 22:00:00+00 | 4 | 2 | 3522
+ 2021-06-21 22:00:00+00 | 9 | 2 | 32
+ 2021-06-21 22:00:00+00 | 12 | 2 | 163
+```
+
+Artifacts with locked status `2` are `unknown`. Check
+[issue #346261](https://gitlab.com/gitlab-org/gitlab/-/issues/346261#note_1028871458)
+for more details.
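+
+If you prefer the Rails console to `psql`, a roughly equivalent count is shown below. This is a
+minimal sketch that assumes the integer values used in the query above (`locked = 2` for `unknown`,
+and the `file_type` value `3` that the query excludes):
+
+```ruby
+# Count artifacts in the `unknown` locked state that have an expiry date set,
+# excluding the same file type as the SQL query above.
+Ci::JobArtifact.where(locked: 2).where.not(expire_at: nil).where.not(file_type: 3).count
+```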
+
+#### Clean up `unknown` artifacts
+
+The Sidekiq worker that processes all `unknown` artifacts is enabled by default in
+GitLab 15.3 and later. It analyzes the artifacts returned by the above database query and
+determines which should be `locked` or `unlocked`. Artifacts are then deleted
+by that worker if needed.
+
+The worker can be enabled on self-managed instances running GitLab 14.10 and later:
+
+1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
+
+1. Check if the feature is enabled.
+
+ - GitLab 14.10:
+
+ ```ruby
+ Feature.enabled?(:ci_job_artifacts_backlog_work, default_enabled: :yaml)
+ ```
+
+ - GitLab 15.0 and later:
+
+ ```ruby
+ Feature.enabled?(:ci_job_artifacts_backlog_work)
+ ```
+
+1. Enable the feature, if needed:
+
+ ```ruby
+ Feature.enable(:ci_job_artifacts_backlog_work)
+ ```
+
+The worker processes 10,000 `unknown` artifacts every seven minutes, or roughly two million
+in 24 hours.
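+
+To get a rough idea of how long your own backlog might take to clear, combine the count of `unknown`
+artifacts with this processing rate. A minimal Rails console sketch, assuming the `locked` value `2`
+shown in the query above means `unknown` and the default processing rate:
+
+```ruby
+# Rough estimate only: 10,000 `unknown` artifacts are processed every 7 minutes.
+backlog = Ci::JobArtifact.where(locked: 2).count
+minutes = (backlog / 10_000.0) * 7
+puts "#{backlog} unknown artifacts, roughly #{(minutes / 60).round(1)} hours to process"
+```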
+
+A related feature flag, `ci_job_artifacts_backlog_large_loop_limit`, causes the worker
+to process `unknown` artifacts
+[in batches that are five times larger](https://gitlab.com/gitlab-org/gitlab/-/issues/356319).
+This flag is not recommended for use on self-managed instances.
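+
+To confirm whether the larger batch size is active on your instance, you can check the flag from the
+Rails console using the same pattern as the checks above (GitLab 15.0 and later syntax shown):
+
+```ruby
+# Returns true if the larger (five times) batch size is active.
+Feature.enabled?(:ci_job_artifacts_backlog_large_loop_limit)
+```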
+
+### List projects and builds with artifacts with a specific expiration (or no expiration)
+
+Using a [Rails console](operations/rails_console.md), you can find projects that have job artifacts with either:
+
+- No expiration date.
+- An expiration date more than 7 days in the future.
+
+Similar to [deleting artifacts](#delete-job-artifacts-from-jobs-completed-before-a-specific-date), use the following example time frames
+and alter them as needed:
+
+- `7.days.from_now`
+- `10.days.from_now`
+- `2.weeks.from_now`
+- `3.months.from_now`
+
+Each of the following scripts also limits the search to 50 results with `.limit(50)`, but this number can be changed as needed:
+
+```ruby
+# Find builds & projects with artifacts that never expire
+builds_with_artifacts_that_never_expire = Ci::Build.with_downloadable_artifacts.where(artifacts_expire_at: nil).limit(50)
+builds_with_artifacts_that_never_expire.find_each do |build|
+ puts "Build with id #{build.id} has artifacts that don't expire and belongs to project #{build.project.full_path}"
+end
+
+# Find builds & projects with artifacts that expire after 7 days from today
+builds_with_artifacts_that_expire_in_a_week = Ci::Build.with_downloadable_artifacts.where('artifacts_expire_at > ?', 7.days.from_now).limit(50)
+builds_with_artifacts_that_expire_in_a_week.find_each do |build|
+ puts "Build with id #{build.id} has artifacts that expire at #{build.artifacts_expire_at} and belongs to project #{build.project.full_path}"
+end
+```
+
+### List projects by total size of job artifacts stored
+
+List the top 20 projects, sorted by the total size of job artifacts stored, by
+running the following code in the Rails console (`sudo gitlab-rails console`):
+
+```ruby
+include ActionView::Helpers::NumberHelper
+ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
+ puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path}"
+end
+```
+
+You can change the number of projects listed by modifying `.limit(20)` to the
+number you want.
+
+### List largest artifacts in a single project
+
+List the 50 largest job artifacts in a single project by running the following
+code in the Rails console (`sudo gitlab-rails console`):
+
+```ruby
+include ActionView::Helpers::NumberHelper
+project = Project.find_by_full_path('path/to/project')
+Ci::JobArtifact.where(project: project).order(size: :desc).limit(50).map { |a| puts "ID: #{a.id} - #{a.file_type}: #{number_to_human_size(a.size)}" }
+```
+
+You can change the number of job artifacts listed by modifying `.limit(50)` to
+the number you want.
+
+### List artifacts in a single project
+
+List the artifacts for a single project, sorted by artifact size. The output includes the:
+
+- ID of the job that created the artifact
+- artifact size
+- artifact file type
+- artifact creation date
+- on-disk location of the artifact
+
+```ruby
+p = Project.find_by_id(<project_id>)
+arts = Ci::JobArtifact.where(project: p)
+
+list = arts.order(size: :desc).limit(50).each do |art|
+ puts "Job ID: #{art.job_id} - Size: #{art.size}b - Type: #{art.file_type} - Created: #{art.created_at} - File loc: #{art.file}"
+end
+```
+
+To change the number of job artifacts listed, change the number in `limit(50)`.
+
+### Delete job artifacts from jobs completed before a specific date
+
+WARNING:
+These commands remove data permanently from both the database and from disk. Before running them, we highly recommend seeking guidance from a Support Engineer, or running them in a test environment with a backup of the instance ready to be restored.
+
+If you need to manually remove job artifacts associated with multiple jobs while
+**retaining their job logs**, this can be done from the Rails console (`sudo gitlab-rails console`):
+
+1. Select jobs to be deleted:
+
+ To select all jobs with artifacts for a single project:
+
+ ```ruby
+ project = Project.find_by_full_path('path/to/project')
+ builds_with_artifacts = project.builds.with_downloadable_artifacts
+ ```
+
+ To select all jobs with artifacts across the entire GitLab instance:
+
+ ```ruby
+ builds_with_artifacts = Ci::Build.with_downloadable_artifacts
+ ```
+
+1. Delete job artifacts older than a specific date:
+
+ NOTE:
+ This step also erases artifacts that users have chosen to
+ ["keep"](../ci/jobs/job_artifacts.md#download-job-artifacts).
+
+ ```ruby
+ builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
+ builds_to_clear.find_each do |build|
+ Ci::JobArtifacts::DeleteService.new(build).execute
+ build.update!(artifacts_expire_at: Time.now)
+ end
+ ```
+
+ In [GitLab 15.3 and earlier](https://gitlab.com/gitlab-org/gitlab/-/issues/372537), use the following instead:
+
+ ```ruby
+ builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
+ builds_to_clear.find_each do |build|
+ build.artifacts_expire_at = Time.now
+ build.erase_erasable_artifacts!
+ end
+ ```
+
+ `1.week.ago` is a Rails `ActiveSupport::Duration` method which calculates a new
+ date or time in the past. Other valid examples are:
+
+ - `7.days.ago`
+ - `3.months.ago`
+ - `1.year.ago`
+
+ `erase_erasable_artifacts!` is a synchronous method, and upon execution the artifacts are immediately removed;
+ they are not scheduled by a background queue.
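+
+Before running either deletion loop above, you can estimate how many jobs and how much recorded
+artifact storage would be affected. A minimal sketch, assuming `builds_with_artifacts` is defined
+as in step 1 (it reports the size recorded in the database, not disk usage):
+
+```ruby
+builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
+puts "Builds matched: #{builds_to_clear.count}"
+bytes = Ci::JobArtifact.where(job_id: builds_to_clear.select(:id)).sum(:size)
+puts "Recorded artifact size: #{bytes} bytes"
+```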
+
+### Delete job artifacts and logs from jobs completed before a specific date
+
+WARNING:
+These commands remove data permanently from both the database and from disk. Before running them, we highly recommend seeking guidance from a Support Engineer, or running them in a test environment with a backup of the instance ready to be restored.
+
+If you need to manually remove **all** job artifacts associated with multiple jobs,
+**including job logs**, this can be done from the Rails console (`sudo gitlab-rails console`):
+
+1. Select the jobs to be deleted:
+
+ To select jobs with artifacts for a single project:
+
+ ```ruby
+ project = Project.find_by_full_path('path/to/project')
+ builds_with_artifacts = project.builds.with_downloadable_artifacts
+ ```
+
+ To select jobs with artifacts across the entire GitLab instance:
+
+ ```ruby
+ builds_with_artifacts = Ci::Build.with_downloadable_artifacts
+ ```
+
+   If a large number of jobs have artifacts, loading them all at once can use a lot of
+   memory and can cause the console process to be killed with an out-of-memory (OOM)
+   error. To avoid this, process the jobs in smaller batches. The `find_each` call in
+   the erase step below already loads records in batches of 1,000. To use smaller
+   batches, pass an explicit batch size, for example
+   `builds_to_clear.find_each(batch_size: 100)`.
+
+1. Select the user who is recorded in the web UI as having erased the job:
+
+ ```ruby
+ admin_user = User.find_by(username: 'username')
+ ```
+
+1. Erase the job artifacts and logs older than a specific date:
+
+ ```ruby
+ builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
+ builds_to_clear.find_each do |build|
+ print "Ci::Build ID #{build.id}... "
+
+ if build.erasable?
+ Ci::BuildEraseService.new(build, admin_user).execute
+ puts "Erased"
+ else
+ puts "Skipped (Nothing to erase or not erasable)"
+ end
+ end
+ ```
+
+ In [GitLab 15.3 and earlier](https://gitlab.com/gitlab-org/gitlab/-/issues/369132), replace
+ `Ci::BuildEraseService.new(build, admin_user).execute` with `build.erase(erased_by: admin_user)`.
+
+ `1.week.ago` is a Rails `ActiveSupport::Duration` method which calculates a new
+ date or time in the past. Other valid examples are:
+
+ - `7.days.ago`
+ - `3.months.ago`
+ - `1.year.ago`
+
+## Job artifact upload fails with error 500
+
+If you are using object storage for artifacts and a job artifact fails to upload,
+review:
+
+- The job log for an error message similar to:
+
+ ```plaintext
+ WARNING: Uploading artifacts as "archive" to coordinator... failed id=12345 responseStatus=500 Internal Server Error status=500 token=abcd1234
+ ```
+
+- The [workhorse log](logs/index.md#workhorse-logs) for an error message similar to:
+
+ ```json
+ {"error":"MissingRegion: could not find region configuration","level":"error","msg":"error uploading S3 session","time":"2021-03-16T22:10:55-04:00"}
+ ```
+
+In both cases, you might need to add `region` to the job artifact [object storage configuration](object_storage.md).
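+
+For example, with the Linux package and the consolidated object storage form, `region` belongs in the
+connection settings in `/etc/gitlab/gitlab.rb`. This is a minimal sketch assuming AWS S3; the values
+are placeholders, and other providers use different connection keys:
+
+```ruby
+# /etc/gitlab/gitlab.rb (consolidated object storage) - placeholder values.
+gitlab_rails['object_store']['connection'] = {
+  'provider' => 'AWS',
+  'region' => 'us-east-1',
+  'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
+  'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
+}
+```
+
+Run `sudo gitlab-ctl reconfigure` after changing these settings.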
+
+## Job artifact upload fails with `500 Internal Server Error (Missing file)`
+
+Bucket names that include folder paths, for example `bucket/path`, are not supported with
+[consolidated object storage](object_storage.md#configure-a-single-storage-connection-for-all-object-types-consolidated-form).
+If a bucket name has a path in it, you might receive an error similar to:
+
+```plaintext
+WARNING: Uploading artifacts as "archive" to coordinator... POST https://gitlab.example.com/api/v4/jobs/job_id/artifacts?artifact_format=zip&artifact_type=archive&expire_in=1+day: 500 Internal Server Error (Missing file)
+FATAL: invalid argument
+```
+
+If a job artifact fails to upload with the above error when using consolidated object storage, make sure you are [using separate buckets](object_storage.md#use-separate-buckets) for each data type.
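+
+For example, with the Linux package and the consolidated form, each object type uses its own bucket,
+and the bucket name must not contain a folder path. A minimal sketch with placeholder bucket names:
+
+```ruby
+# /etc/gitlab/gitlab.rb (consolidated object storage) - one bucket per object type,
+# with no folder path in the bucket name.
+gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
+gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
+```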
+
+## Job artifacts fail to upload with `FATAL: invalid argument` when using Windows mount
+
+If you are using a Windows mount with CIFS for job artifacts, you may see an
+`invalid argument` error when the runner attempts to upload artifacts:
+
+```plaintext
+WARNING: Uploading artifacts as "dotenv" to coordinator... POST https://<your-gitlab-instance>/api/v4/jobs/<JOB_ID>/artifacts: 500 Internal Server Error id=1296 responseStatus=500 Internal Server Error status=500 token=*****
+FATAL: invalid argument
+```
+
+To work around this issue, you can try:
+
+- Switching to an ext4 mount instead of CIFS.
+- Upgrading to at least Linux kernel 5.15, which contains a number of important bug fixes
+  relating to CIFS file leases.
+- For older kernels, using the `nolease` mount option to disable file leasing.
+
+For more information, [see the investigation details](https://gitlab.com/gitlab-org/gitlab/-/issues/389995).
+
+## Usage quota shows incorrect artifact storage usage
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/238536) in GitLab 14.10.
+
+Sometimes the [artifacts storage usage](../user/usage_quotas.md) displays an incorrect
+value for the total storage space used by artifacts. To recalculate the artifact
+usage statistics for all projects in the instance, you can run this background script:
+
+```shell
+bin/rake 'gitlab:refresh_project_statistics_build_artifacts_size[file.csv]'
+```
+
+The artifact usage value can fluctuate to `0` while the script is running. After
+recalculation, usage should display as expected again.