Diffstat (limited to 'doc/user/storage_management_automation.md')
 -rw-r--r--  doc/user/storage_management_automation.md | 475
 1 file changed, 261 insertions, 214 deletions
diff --git a/doc/user/storage_management_automation.md b/doc/user/storage_management_automation.md
index 9a505d23597..96f9ecd11a8 100644
--- a/doc/user/storage_management_automation.md
+++ b/doc/user/storage_management_automation.md
@@ -5,13 +5,14 @@ group: Utilization
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
-# Storage management automation **(FREE ALL)**
+# Automate storage management **(FREE ALL)**
-You can manage your storage through the GitLab UI and the API. This page describes how to
-automate storage analysis and cleanup to manage your [usage quota](usage_quotas.md). You can also
-manage your storage usage by making your pipelines more efficient. For more information, see [pipeline efficiency](../ci/pipelines/pipeline_efficiency.md).
+This page describes how to automate storage analysis and cleanup to manage your storage usage
+with the GitLab REST API.
-You can also use the [GitLab community forum and Discord](https://about.gitlab.com/community/) to ask for help with API automation.
+You can also manage your storage usage by improving [pipeline efficiency](../ci/pipelines/pipeline_efficiency.md).
+
+For more help with API automation, you can also use the [GitLab community forum and Discord](https://about.gitlab.com/community/).
## API requirements
@@ -19,7 +20,7 @@ To automate storage management, your GitLab.com SaaS or self-managed instance mu
### API authentication scope
-You must use the following scopes to [authenticate](../api/rest/index.md#authentication) with the API:
+Use the following scopes to [authenticate](../api/rest/index.md#authentication) with the API:
- Storage analysis:
- Read API access with the `read_api` scope.
@@ -30,15 +31,20 @@ You must use the following scopes to [authenticate](../api/rest/index.md#authent
You can use command-line tools or a programming language to interact with the REST API.
-### Command line
+### Command line tools
+
+To send API requests, install either:
+
+- curl with your preferred package manager.
+- [GitLab CLI](../editor_extensions/gitlab_cli/index.md) and use the `glab api` subcommand.
-You must install the following tools to send API requests:
+To format JSON responses, install `jq`. For more information, see [Tips for productive DevOps workflows: JSON formatting with jq and CI/CD linting automation](https://about.gitlab.com/blog/2021/04/21/devops-workflows-json-format-jq-ci-cd-lint/).
-- Install `curl` with your preferred package manager.
-- Install the [GitLab CLI](../editor_extensions/gitlab_cli/index.md) and use the `api` subcommand.
-- Install `jq` to format JSON responses. For more information, see [Tips for productive DevOps workflows: JSON formatting with jq and CI/CD linting automation](https://about.gitlab.com/blog/2021/04/21/devops-workflows-json-format-jq-ci-cd-lint/).
+To use these tools with the REST API:
-Example with `curl` and `jq`:
+::Tabs
+
+:::TabTitle curl
```shell
export GITLAB_TOKEN=xxx
@@ -46,7 +52,7 @@ export GITLAB_TOKEN=xxx
curl --silent --header "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com/api/v4/user" | jq
```
-Example with the [GitLab CLI](../editor_extensions/gitlab_cli/index.md):
+:::TabTitle GitLab CLI
```shell
glab auth login
@@ -54,18 +60,25 @@ glab auth login
glab api groups/YOURGROUPNAME/projects
```
+::EndTabs
+
#### Using the GitLab CLI
-Some API endpoints require [pagination](../api/rest/index.md#pagination) and subsequent page fetches to retrieve all results. The [GitLab CLI](../editor_extensions/gitlab_cli/index.md) provides the flag `--paginate`.
+Some API endpoints require [pagination](../api/rest/index.md#pagination) and subsequent page fetches to retrieve all results. The GitLab CLI provides the flag `--paginate`.
-Requests that require sending a POST body formatted as JSON data can be written as `key=value` pairs passed to the `--raw-field` parameter.
+Requests that require a POST body formatted as JSON data can be written as `key=value` pairs passed to the `--raw-field` parameter.
For more information, see the [GitLab CLI endpoint documentation](../editor_extensions/gitlab_cli/index.md#core-commands).
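For example, assuming an authenticated `glab` session and the placeholder paths `example-group` and `example-group/example-project`, the two flags can be combined like this (a sketch, not verified against a live instance):

```shell
# Fetch all projects in a group, following pagination automatically
glab api --paginate groups/example-group/projects

# Create an issue by sending key=value pairs as the POST body
glab api --method POST projects/example-group%2Fexample-project/issues \
  --raw-field 'title=Automated storage cleanup report'
```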
### API client libraries
-The storage management and cleanup automation methods described in this page use the [`python-gitlab`](https://python-gitlab.readthedocs.io/en/stable/) library in programmatic example. The `python-gitlab` library provides
-a feature-rich programming interface. For more information about use cases for the `python-gitlab` library,
+The storage management and cleanup automation methods described in this page use:
+
+- The [`python-gitlab`](https://python-gitlab.readthedocs.io/en/stable/) library, which provides
+a feature-rich programming interface.
+- The `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` script in the [GitLab API with Python](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/) project.
+
+For more information about use cases for the `python-gitlab` library,
see [Efficient DevSecOps workflows: Hands-on `python-gitlab` API automation](https://about.gitlab.com/blog/2023/02/01/efficient-devsecops-workflows-hands-on-python-gitlab-api-automation/).
For more information about other API client libraries, see [Third-party clients](../api/rest/index.md#third-party-clients).
@@ -73,9 +86,9 @@ For more information about other API client libraries, see [Third-party clients]
NOTE:
Use [GitLab Duo Code Suggestions](project/repository/code_suggestions/index.md) to write code more efficiently.
-## Strategies for storage analysis
+## Storage analysis
-### Identify the storage types
+### Identify storage types
The [projects API endpoint](../api/projects.md#list-all-projects) provides statistics for projects
in your GitLab instance. To use the projects API endpoint, set the `statistics` key to boolean `true`.
@@ -90,9 +103,11 @@ This data provides insight into storage consumption of the project by the follow
- `uploads_size`: Uploads storage
- `wiki_size`: Wiki storage
-Additional queries are required for detailed storage statistics for [job artifacts](../api/job_artifacts.md), the [container registry](../api/container_registry.md), the [package registry](../api/packages.md) and [dependency proxy](../api/dependency_proxy.md). It is explained later in this how-to.
+To identify storage types:
-Example that uses `curl` and `jq` on the command line:
+::Tabs
+
+:::TabTitle curl
```shell
curl --silent --header "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com/api/v4/projects/$GL_PROJECT_ID?statistics=true" | jq --compact-output '.id,.statistics' | jq
@@ -111,7 +126,7 @@ curl --silent --header "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com
}
```
-Example that uses the [GitLab CLI](../editor_extensions/gitlab_cli/index.md):
+:::TabTitle GitLab CLI
```shell
export GL_PROJECT_ID=48349590
@@ -131,7 +146,7 @@ glab api --method GET projects/$GL_PROJECT_ID --field 'statistics=true' | jq --c
}
```
-Example using the `python-gitlab` library:
+:::TabTitle Python
```python
project_obj = gl.projects.get(project.id, statistics=True)
@@ -139,7 +154,9 @@ project_obj = gl.projects.get(project.id, statistics=True)
print("Project {n} statistics: {s}".format(n=project_obj.name_with_namespace, s=json.dumps(project_obj.statistics, indent=4)))
```
-You can find an example implementation in the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` which is located in the [GitLab API with Python project](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/). Export the `GL_GROUP_ID` environment variable and run the script to see the project statistics printed in the terminal.
+::EndTabs
+
+To print statistics for the project to the terminal, export the `GL_GROUP_ID` environment variable and run the script:
```shell
export GL_TOKEN=xxx
@@ -162,12 +179,12 @@ Project Developer Evangelism and Technical Marketing at GitLab / playground / A
}
```
-### Analyzing multiple subgroups and projects
+### Analyze storage in projects and groups
-You can use automation to analyze multiple projects and groups. For example, you can start at the top namespace level,
+You can automate analysis of multiple projects and groups. For example, you can start at the top namespace level,
and recursively analyze all subgroups and projects. You can also analyze different storage types.
-Here's an example of an algorithm that analyzes multiple subgroups and projects:
+Here's an example of an algorithm to analyze multiple subgroups and projects:
1. Fetch the top-level namespace ID. You can copy the ID value from the [namespace/group overview](../user/namespace/index.md#types-of-namespaces).
1. Fetch all [subgroups](../api/groups.md#list-a-groups-subgroups) from the top-level group, and save the IDs in a list.
@@ -175,7 +192,17 @@ Here's an example of an algorithm that analyzes multiple subgroups and projects:
1. Identify the storage type to analyze, and collect the information from project attributes, like project statistics, and job artifacts.
1. Print an overview of all projects, grouped by group, and their storage information.
-Example with the [GitLab CLI](../editor_extensions/gitlab_cli/index.md):
+The shell approach with `glab` might be more suitable for smaller analyses. For larger analyses, you should use a script that
+uses the API client libraries. This type of script can improve readability, data storage, flow control, testing, and reusability.
+
+To ensure the script doesn't reach [API rate limits](../api/rest/index.md#rate-limits), the following
+example code is not optimized for parallel API requests.
+
+To implement this algorithm:
+
+::Tabs
+
+:::TabTitle GitLab CLI
```shell
export GROUP_NAME="gitlab-de"
@@ -221,10 +248,7 @@ glab api projects/48349590/jobs | jq --compact-output '.[]' | jq --compact-outpu
[{"file_type":"archive","size":1049089,"filename":"artifacts.zip","file_format":"zip"},{"file_type":"metadata","size":157,"filename":"metadata.gz","file_format":"gzip"},{"file_type":"trace","size":3140,"filename":"job.log","file_format":null}]
```
-While the shell approach with `glab` works for smaller analysis, you should consider a script that
-uses the API client libraries. This improves readability, storing data, flow control, testing, and reusability.
-
-You can also implement this algorithm with a Python script that uses the `python-gitlab` library:
+:::TabTitle Python
```python
#!/usr/bin/env python
@@ -266,6 +290,8 @@ if __name__ == "__main__":
print("DEBUG: ID {i}: {a}".format(i=job.id, a=job.attributes['artifacts']))
```
+::EndTabs
+
The script outputs the project job artifacts in a JSON formatted list:
```json
@@ -291,47 +317,28 @@ The script outputs the project job artifacts in a JSON formatted list:
]
```
-The full script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` with specific examples for automating storage management and cleanup is located is located in the [GitLab API with Python](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/) project. To ensure the script doesn't reach [API rate limits](../api/rest/index.md#rate-limits), the example code is not optimized for parallel API requests.
+## Manage CI/CD pipeline storage
-### Helper functions
-
-You may need to convert timestamp seconds into a duration format, or print raw bytes in a more
-representative format. You can use the following helper functions to transform values for improved
-readability:
-
-```shell
-# Current Unix timestamp
-date +%s
-
-# Convert `created_at` date time with timezone to Unix timestamp
-date -d '2023-08-08T18:59:47.581Z' +%s
-```
-
-Example with Python that uses the `python-gitlab` API library:
-
-```python
-def render_size_mb(v):
- return "%.4f" % (v / 1024 / 1024)
+Job artifacts consume most of the pipeline storage, and job logs can also generate several hundred kilobytes.
+You should delete the unnecessary job artifacts first and then clean up job logs after analysis.
-def render_age_time(v):
- return str(datetime.timedelta(seconds = v))
+WARNING:
+Deleting job logs and artifacts is a destructive action that cannot be reverted. Use with caution. Deleting certain files, including report artifacts, job logs, and metadata files, affects GitLab features that use these files as data sources.
-# Convert `created_at` date time with timezone to Unix timestamp
-def calculate_age(created_at_datetime):
- created_at_ts = datetime.datetime.strptime(created_at_datetime, '%Y-%m-%dT%H:%M:%S.%fZ')
- now = datetime.datetime.now()
- return (now - created_at_ts).total_seconds()
-```
+### List job artifacts
-## Managing storage in CI/CD pipelines
+To analyze pipeline storage, you can use the [Job API endpoint](../api/jobs.md#list-project-jobs) to retrieve a list of
+job artifacts. The endpoint returns the job artifacts `file_type` key in the `artifacts` attribute.
+The `file_type` key indicates the artifact type:
-WARNING:
-Deleting job log and artifacts is a destructive action that cannot be reverted. Use with caution. Deleting certain files, including report artifacts, job logs, and metadata files, affects GitLab features that use these files as data sources.
+- `archive` is used for the generated job artifacts as a zip file.
+- `metadata` is used for additional metadata in a Gzip file.
+- `trace` is used for the `job.log` as a raw file.
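These three types count toward pipeline storage. As a sketch of how the sizes can be summarized, the following `jq` filter sums the artifact sizes per `file_type`; the sample file stands in for the `artifacts` data returned by the jobs endpoint:

```shell
# Sample data shaped like the jobs API response, reduced to id and artifacts
cat > /tmp/jobs_sample.json <<'EOF'
[
  {"id": 4828297945, "artifacts": [
    {"file_type": "archive", "size": 1049089, "filename": "artifacts.zip", "file_format": "zip"},
    {"file_type": "metadata", "size": 157, "filename": "metadata.gz", "file_format": "gzip"},
    {"file_type": "trace", "size": 3140, "filename": "job.log", "file_format": null}
  ]}
]
EOF

# Sum the sizes per artifact type across all jobs
jq --compact-output '[.[].artifacts[]] | group_by(.file_type)
  | map({(.[0].file_type): (map(.size) | add)}) | add' /tmp/jobs_sample.json
```

The filter prints one JSON object with the total bytes per type, for example `{"archive":1049089,"metadata":157,"trace":3140}`.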
-Job artifacts consume most of the pipeline storage, and job logs can also generate several hundreds of kilobytes.
-You should delete the unnecessary job artifacts first and then clean up job logs after analysis.
+Job artifacts provide a data structure that can be written as a cache file to
+disk, which you can use to test the implementation.
-### Analyze pipeline storage
+Based on the example code for fetching all projects, you can extend the Python script to do more analysis.
The following example shows a response from a query for job artifacts in a project:
@@ -358,25 +365,19 @@ The following example shows a response from a query for job artifacts in a proje
]
```
-The [Job API endpoint](../api/jobs.md#list-project-jobs) returns the job artifacts `file_type` key in the `artifacts` attribute. The the job artifacts `file_type` key provides insights into the specific artifact type:
-
-- `archive` is used for the generated job artifacts as a zip file.
-- `metadata` is used for additional metadata in a Gzip file.
-- `trace` is used for the `job.log` as a raw file.
-
-These three types are relevant for storage counting, and should be collected for a later summary. Based on the example code for fetching all projects, you can extend the Python script to do more analysis.
-
-The Python code loops over all projects, and fetches a `project_obj` object variable that contains all attributes. Because there can be many pipelines and jobs, fetching the list of jobs can be expensive in one call. Therefore, this is done using [keyset pagination](https://python-gitlab.readthedocs.io/en/stable/api-usage.html#pagination). The remaining step is to fetch the `artifacts` attribute from the `job` object.
-
Based on how you implement the script, you could either:
- Collect all job artifacts and print a summary table at the end of the script.
- Print the information immediately.
-Collecting the job artifacts provides a data structure that can be written as a cache file to
-disk for example, which you can use when testing the implementation.
+In the following example, job artifacts are collected in the `ci_job_artifacts` list. The script
+loops over all projects, and fetches:
-In the following example, the job artifacts are collected in the `ci_job_artifacts` list.
+- The `project_obj` object variable that contains all attributes.
+- The `artifacts` attribute from the `job` object.
+
+You can use [keyset pagination](https://python-gitlab.readthedocs.io/en/stable/api-usage.html#pagination)
+to iterate over large lists of pipelines and jobs.
```python
ci_job_artifacts = []
@@ -415,7 +416,8 @@ In the following example, the job artifacts are collected in the `ci_job_artifac
print("No artifacts found.")
```
-At the end of the script, the job artifacts are printed as a Markdown formatted table. You can copy the table content into a new issue comment or description, or populate a Markdown file in a GitLab repository.
+At the end of the script, job artifacts are printed as a Markdown formatted table. You can copy the table
+content to an issue comment or description, or populate a Markdown file in a GitLab repository.
```shell
$ python3 get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py
@@ -430,22 +432,22 @@ $ python3 get_all_projects_top_level_namespace_storage_analysis_cleanup_example.
| [gitlab-de/playground/artifact-gen-group/gen-job-artifacts-4](Gen Job Artifacts 4) | 4828297945 | job.log | trace | 0.0030 |
```
-The full example of the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` is located in the [GitLab API with Python project](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/). To ensure the script doesn't hit [API rate limits](../api/rest/index.md#rate-limits), the example code is not optimized for parallel API requests.
+### Delete job artifacts in bulk
+
+You can use a Python script to filter the types of job artifacts to delete in bulk.
-### Delete job artifacts
+Filter the API query results to compare:
-You can use a filter to select the types of job artifacts to delete in bulk. A typical request:
+- The `created_at` value to calculate the artifact age.
+- The `size` attribute to determine if artifacts meet the size threshold.
+
+A typical request:
- Deletes job artifacts older than the specified number of days.
- Deletes job artifacts that exceed a specified amount of storage. For example, 100 MB.
-You can use a Python script to implement this type of filter. You can filter the API queries results, and compare
-the `created_at` value to calculate the artifact age.
-
-You can also loop over all job artifacts and compare their `size` attribute to see whether they match
-the size threshold. When a matching job has been found, it is marked for deletion. Because of the
-analysis that happens when the script loops through job attributes, the job can be marked as deleted
-only. When the collection loops remove the object locks, all marked as deleted jobs can actually be deleted.
+In the following example, the script loops through the job attributes and marks matching job artifacts for deletion.
+After the collection loops release the object locks, the script deletes the job artifacts that are marked for deletion.
```python
for project in projects:
@@ -489,18 +491,22 @@ only. When the collection loops remove the object locks, all marked as deleted j
# Print collection summary (removed for readability)
```
-The full example of the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` is located in the [GitLab API Python project](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/).
-
-#### Delete all job artifacts for a project
+### Delete all job artifacts for a project
If you do not need the project's [job artifacts](../ci/jobs/job_artifacts.md), you can
-use the following command to delete them all. This action cannot be reverted.
+use the following command to delete all job artifacts. This action cannot be reverted.
+
+Artifact deletion happens asynchronously and can take several minutes or hours, depending on the number of artifacts to delete.
+Subsequent analysis queries against the API might return the deleted artifacts as false-positive results.
+To avoid confusion, do not immediately run additional API requests.
-Job artifact deletion happens asynchronously in GitLab and can take a while to complete in the background. Subsequent analysis queries against the API can still return the artifacts as a false-positive result. Artifact deletion can take minutes or hours, depending on the artifacts to delete. To avoid confusion with results, do not run immediate additional API requests.
+The [artifacts for the most recent successful jobs](../ci/jobs/job_artifacts.md#keep-artifacts-from-most-recent-successful-jobs) are kept by default.
-The [artifacts for the most recent successful jobs](../ci/jobs/job_artifacts.md#keep-artifacts-from-most-recent-successful-jobs) are also kept by default.
+To delete all job artifacts for a project:
-Example with curl:
+::Tabs
+
+:::TabTitle curl
```shell
export GL_PROJECT_ID=48349590
@@ -508,7 +514,7 @@ export GL_PROJECT_ID=48349590
curl --silent --header "Authorization: Bearer $GITLAB_TOKEN" --request DELETE "https://gitlab.com/api/v4/projects/$GL_PROJECT_ID/artifacts"
```
-Example with the [GitLab CLI](../editor_extensions/gitlab_cli/index.md):
+:::TabTitle GitLab CLI
```shell
glab api --method GET projects/$GL_PROJECT_ID/jobs | jq --compact-output '.[]' | jq --compact-output '.id, .artifacts'
@@ -516,17 +522,19 @@ glab api --method GET projects/$GL_PROJECT_ID/jobs | jq --compact-output '.[]' |
glab api --method DELETE projects/$GL_PROJECT_ID/artifacts
```
-Example with the [`python-gitlab` library](https://python-gitlab.readthedocs.io/en/stable/gl_objects/pipelines_and_jobs.html#jobs):
+:::TabTitle Python
```python
project.artifacts.delete()
```
+::EndTabs
+
### Delete job logs
When you delete a job log you also [erase the entire job](../api/jobs.md#erase-a-job).
-Example with the [GitLab CLI](../editor_extensions/gitlab_cli/index.md):
+Example with the GitLab CLI:
```shell
glab api --method GET projects/$GL_PROJECT_ID/jobs | jq --compact-output '.[]' | jq --compact-output '.id'
@@ -541,9 +549,9 @@ glab api --method POST projects/$GL_PROJECT_ID/jobs/4836226180/erase | jq --comp
"success"
```
-In the `python-gitlab` API library, you must use [`job.erase()`](https://python-gitlab.readthedocs.io/en/stable/gl_objects/pipelines_and_jobs.html#jobs) instead of `job.delete_artifacts()`.
+In the `python-gitlab` API library, use [`job.erase()`](https://python-gitlab.readthedocs.io/en/stable/gl_objects/pipelines_and_jobs.html#jobs) instead of `job.delete_artifacts()`.
To prevent this API call from being blocked, set the script to sleep for a short amount of time between calls
-that delete the job artifact.
+that delete the job artifact:
```python
for job in jobs_marked_delete_artifacts:
@@ -555,20 +563,101 @@ that delete the job artifact.
time.sleep(1)
```
-The full example of the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` is located in the [GitLab API with Python project](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/).
-
Support for creating a retention policy for job logs is proposed in [issue 374717](https://gitlab.com/gitlab-org/gitlab/-/issues/374717).
-### Inventory of job artifacts expiry settings
+### Delete old pipelines
+
+Pipelines do not add to the overall storage consumption, but if required you can delete them with the following methods.
+
+Automatic deletion of old pipelines is proposed in [issue 338480](https://gitlab.com/gitlab-org/gitlab/-/issues/338480).
+
+Example with the GitLab CLI:
+
+```shell
+export GL_PROJECT_ID=48349590
+
+glab api --method GET projects/$GL_PROJECT_ID/pipelines | jq --compact-output '.[]' | jq --compact-output '.id,.created_at'
+960031926
+"2023-08-08T22:09:52.745Z"
+959884072
+"2023-08-08T18:59:47.581Z"
+
+glab api --method DELETE projects/$GL_PROJECT_ID/pipelines/960031926
+
+glab api --method GET projects/$GL_PROJECT_ID/pipelines | jq --compact-output '.[]' | jq --compact-output '.id,.created_at'
+959884072
+"2023-08-08T18:59:47.581Z"
+```
+
+The `created_at` key must be converted from a timestamp to Unix epoch time,
+for example with `date -d '2023-08-08T18:59:47.581Z' +%s`. In the next step, the
+age can be calculated with the difference between now, and the pipeline creation
+date. If the age is larger than the threshold, the pipeline should be deleted.
+
+The following example uses a Bash script that expects `jq` and the GitLab CLI to be installed and authorized, and the `GL_PROJECT_ID` environment variable to be exported.
+
+The full script `get_cicd_pipelines_compare_age_threshold_example.sh` is located in the [GitLab API with Linux Shell](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-linux-shell) project.
+
+```shell
+#!/bin/bash
+
+CREATED_AT_ARR=$(glab api --method GET projects/$GL_PROJECT_ID/pipelines | jq --compact-output '.[]' | jq --compact-output '.created_at' | jq --raw-output @sh)
+
+for row in ${CREATED_AT_ARR[@]}
+do
+ stripped=$(echo $row | xargs echo)
+ #echo $stripped #DEBUG
+
+ CREATED_AT_TS=$(date -d "$stripped" +%s)
+ NOW=$(date +%s)
+
+ AGE=$(($NOW-$CREATED_AT_TS))
+ AGE_THRESHOLD=$((90*24*60*60)) # 90 days
+
+ if [ $AGE -gt $AGE_THRESHOLD ];
+ then
+ echo "Pipeline age $AGE older than threshold $AGE_THRESHOLD, should be deleted."
+ # TODO call glab to delete the pipeline. Needs an ID collected from the glab call above.
+ else
+ echo "Pipeline age $AGE not older than threshold $AGE_THRESHOLD. Ignore."
+ fi
+done
+```
+
+You can use the [`python-gitlab` API library](https://python-gitlab.readthedocs.io/en/stable/gl_objects/pipelines_and_jobs.html#project-pipelines) and
+the `created_at` attribute to implement a similar algorithm that compares the pipeline age:
+
+```python
+ # ...
+
+ for pipeline in project.pipelines.list(iterator=True):
+ pipeline_obj = project.pipelines.get(pipeline.id)
+ print("DEBUG: {p}".format(p=json.dumps(pipeline_obj.attributes, indent=4)))
+
+ created_at = datetime.datetime.strptime(pipeline.created_at, '%Y-%m-%dT%H:%M:%S.%fZ')
+ now = datetime.datetime.now()
+ age = (now - created_at).total_seconds()
+
+ threshold_age = 90 * 24 * 60 * 60
+
+ if (float(age) > float(threshold_age)):
+ print("Deleting pipeline", pipeline.id)
+ pipeline_obj.delete()
+```
+
+### List expiry settings for job artifacts
To manage artifact storage, you can update or configure when an artifact expires.
The expiry setting for artifacts is configured in each job configuration in the `.gitlab-ci.yml` file.
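For example, a job can define its own expiry with `artifacts:expire_in` (the job name and paths here are illustrative):

```yaml
build-job:
  script:
    - make build
  artifacts:
    paths:
      - dist/
    expire_in: 30 days
```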
-If you have multiple projects, and depending on how job definitions are organized in the CI/CD configuration, it may be difficult to locate the expiry setting. You can use a script to search the entire CI/CD configuration. This includes access to objects that are resolved after inheriting values, like `extends` or `!reference`.
+If there are multiple projects, and based on how job definitions are organized in the CI/CD configuration, it might be difficult
+to locate the expiry setting. You can use a script to search the entire CI/CD configuration. This includes access to objects that
+are resolved after they inherit values, like `extends` or `!reference`.
+
The script retrieves merged CI/CD configuration files and searches for the artifacts key to:
-- Identify the jobs that don't have an expiry setting.
-- Return the expiry setting for jobs that have the artifact expiry configured.
+- Identify jobs that do not have an expiry setting.
+- Return expiry settings for jobs that have the artifact expiry configured.
The following process describes how the script searches for the artifact expiry setting:
@@ -626,7 +715,16 @@ The following process describes how the script searches for the artifact expiry
print(f'| [{ details["project_name"] }]({details["project_web_url"]}) | { details["job_name"] } | { details["artifacts_expiry"] if details["artifacts_expiry"] is not None else "❌ N/A" } |')
```
-The script generates a Markdown summary table with project name and URL, job name, and the `artifacts:expire_in` setting, or `N/A` if not existing. It does not print job templates starting with a `.` character which are not instantiated as runtime job objects that would generate artifacts.
+The script generates a Markdown summary table with:
+
+- Project name and URL.
+- Job name.
+- The `artifacts:expire_in` setting, or `N/A` if there is no setting.
+
+The script does not print job templates that start with a `.` character, because
+templates are not instantiated as runtime job objects that generate artifacts.
```shell
export GL_GROUP_ID=56595735
@@ -660,9 +758,9 @@ glab api --method GET projects/$GL_PROJECT_ID/search --field "scope=blobs" --fie
For more information about the inventory approach, see [How GitLab can help mitigate deletion of open source container images on Docker Hub](https://about.gitlab.com/blog/2023/03/16/how-gitlab-can-help-mitigate-deletion-open-source-images-docker-hub/).
-### Set the default expiry for job artifacts in projects
+### Set default expiry for job artifacts
-Based on the output of the `get_all_cicd_config_artifacts_expiry.py` script, you can define the [default artifact expiration](../ci/yaml/index.md#default) in your `.gitlab-ci.yml` configuration.
+To set the default expiry for job artifacts in a project, specify the `expire_in` value in the `.gitlab-ci.yml` file:
```yaml
default:
@@ -670,93 +768,17 @@ default:
expire_in: 1 week
```
-### Delete old pipelines
-
-Pipelines do not add to the overall storage consumption, but if you want to delete them you can use the following methods.
-
-Example using the [GitLab CLI](../editor_extensions/gitlab_cli/index.md):
-
-```shell
-export GL_PROJECT_ID=48349590
-
-glab api --method GET projects/$GL_PROJECT_ID/pipelines | jq --compact-output '.[]' | jq --compact-output '.id,.created_at'
-960031926
-"2023-08-08T22:09:52.745Z"
-959884072
-"2023-08-08T18:59:47.581Z"
-
-glab api --method DELETE projects/$GL_PROJECT_ID/pipelines/960031926
-
-glab api --method GET projects/$GL_PROJECT_ID/pipelines | jq --compact-output '.[]' | jq --compact-output '.id,.created_at'
-959884072
-"2023-08-08T18:59:47.581Z"
-```
-
-The `created_at` key must be converted from a timestamp to Unix epoch time,
-for example with `date -d '2023-08-08T18:59:47.581Z' +%s`. In the next step, the
-age can be calculated with the difference between now, and the pipeline creation
-date. If the age is larger than the threshold, the pipeline should be deleted.
-
-The following example uses a Bash script that expects `jq` and the [GitLab CLI](../editor_extensions/gitlab_cli/index.md) installed, and authorized, and the exported environment variable `GL_PROJECT_ID`.
-
-The full script `get_cicd_pipelines_compare_age_threshold_example.sh` is located in the [GitLab API with Linux Shell](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-linux-shell) project.
-
-```shell
-#/bin/bash
-
-CREATED_AT_ARR=$(glab api --method GET projects/$GL_PROJECT_ID/pipelines | jq --compact-output '.[]' | jq --compact-output '.created_at' | jq --raw-output @sh)
-
-for row in ${CREATED_AT_ARR[@]}
-do
- stripped=$(echo $row | xargs echo)
- #echo $stripped #DEBUG
-
- CREATED_AT_TS=$(date -d "$stripped" +%s)
- NOW=$(date +%s)
-
- AGE=$(($NOW-$CREATED_AT_TS))
- AGE_THRESHOLD=$((90*24*60*60)) # 90 days
-
- if [ $AGE -gt $AGE_THRESHOLD ];
- then
- echo "Pipeline age $AGE older than threshold $AGE_THRESHOLD, should be deleted."
- # TODO call glab to delete the pipeline. Needs an ID collected from the glab call above.
- else
- echo "Pipeline age $AGE not older than threshold $AGE_THRESHOLD. Ignore."
- fi
-done
-```
-
-You can use the [`python-gitlab` API library](https://python-gitlab.readthedocs.io/en/stable/gl_objects/pipelines_and_jobs.html#project-pipelines) and
-the `created_at` attribute to implement a similar algorithm that compares the job artifact age:
+## Manage container registry storage
-```python
- # ...
+Container registries are available [in a project](../api/container_registry.md#within-a-project) or [in a group](../api/container_registry.md#within-a-group). You can analyze both locations to implement a cleanup strategy.
- for pipeline in project.pipelines.list(iterator=True):
- pipeline_obj = project.pipelines.get(pipeline.id)
- print("DEBUG: {p}".format(p=json.dumps(pipeline_obj.attributes, indent=4)))
+### List container registries
- created_at = datetime.datetime.strptime(pipeline.created_at, '%Y-%m-%dT%H:%M:%S.%fZ')
- now = datetime.datetime.now()
- age = (now - created_at).total_seconds()
+To list container registries in a project:
- threshold_age = 90 * 24 * 60 * 60
+::Tabs
- if (float(age) > float(threshold_age)):
- print("Deleting pipeline", pipeline.id)
- pipeline_obj.delete()
-```
-
-The full example of the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` is located in the [GitLab API with Python project](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/).
-
-Automatically deleting old pipelines in GitLab is tracked in [this feature proposal](https://gitlab.com/gitlab-org/gitlab/-/issues/338480).
-
-## Manage storage for Container Registries
-
-Container registries are available [in a project](../api/container_registry.md#within-a-project) or [in a group](../api/container_registry.md#within-a-group). Both locations require analysis and cleanup strategies.
-
-The following example uses `curl` and `jq` for a project:
+:::TabTitle curl
```shell
export GL_PROJECT_ID=48057080
@@ -771,7 +793,7 @@ curl --silent --header "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com
3401613
```
-The following example uses the [GitLab CLI](../editor_extensions/gitlab_cli/index.md) for a project:
+:::TabTitle GitLab CLI
```shell
export GL_PROJECT_ID=48057080
@@ -794,9 +816,9 @@ glab api --method GET projects/$GL_PROJECT_ID/registry/repositories/4435617/tags
3401613
```
-A similar automation shell script is created in the [delete old pipelines](#delete-old-pipelines) section.
+::EndTabs
-The `python-gitlab` API library provides bulk deletion interfaces explained in the next section.
+A similar automation shell script is shown in the [delete old pipelines](#delete-old-pipelines) section.
### Delete container images in bulk
@@ -810,7 +832,7 @@ you can configure:
WARNING:
On GitLab.com, due to the scale of the Container Registry, the number of tags deleted by this API is limited.
If your Container Registry has a large number of tags to delete, only some of them are deleted. You might need
-to call the API multiple times. To schedule tags for automatic deletion, use a [cleanup policy](#cleanup-policy-for-containers) instead.
+to call the API multiple times. To schedule tags for automatic deletion, use a [cleanup policy](#create-a-cleanup-policy-for-containers) instead.
The following example uses the [`python-gitlab` API library](https://python-gitlab.readthedocs.io/en/stable/gl_objects/repository_tags.html) to fetch a list of tags, and calls the `delete_in_bulk()` method with filter parameters.
@@ -828,18 +850,17 @@ The following example uses the [`python-gitlab` API library](https://python-gitl
repository.tags.delete_in_bulk(name_regex_delete="v.+", keep_n=2)
```
-The full example of the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` is located
-in the [GitLab API with Python](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/) project.
+### Create a cleanup policy for containers
-### Cleanup policy for containers
+Use the project REST API endpoint to [create cleanup policies](packages/container_registry/reduce_container_registry_storage.md#use-the-cleanup-policy-api) for containers. After you set the cleanup policy, all container images that match your specifications are deleted automatically. You do not need additional API automation scripts.
-Use the project REST API endpoint to [create cleanup policies](packages/container_registry/reduce_container_registry_storage.md#use-the-cleanup-policy-api). The following example uses the [GitLab CLI](../editor_extensions/gitlab_cli/index.md) to create a cleanup policy.
-
-To send the attributes as a body parameter, you must:
+To send the attributes as a body parameter:
- Use the `--input -` parameter to read from standard input.
- Set the `Content-Type` header.
+The following example uses the GitLab CLI to create a cleanup policy:
+
```shell
export GL_PROJECT_ID=48057080
@@ -859,19 +880,17 @@ echo '{"container_expiration_policy_attributes":{"cadence":"1month","enabled":tr
```
-After you set up the cleanup policy, all container images that match your specifications are deleted automatically. You do not need additional API automation scripts.
-
### Optimize container images
You can optimize container images to reduce the image size and overall storage consumption in the container registry. Learn more in the [pipeline efficiency documentation](../ci/pipelines/pipeline_efficiency.md#optimize-docker-images).
-## Manage storage for Package Registry
+## Manage Package Registry storage
Package registries are available [in a project](../api/packages.md#within-a-project) or [in a group](../api/packages.md#within-a-group).
### List packages and files
-The following example shows fetching packages from a defined project ID using the [GitLab CLI](../editor_extensions/gitlab_cli/index.md). The result set is an array of dictionary items that can be filtered with the `jq` command chain.
+The following example shows how to fetch packages from a specified project ID using the GitLab CLI. The result set is an array of dictionary items that you can filter with a `jq` command chain.
```shell
# https://gitlab.com/gitlab-de/playground/container-package-gen-group/generic-package-generator
@@ -923,7 +942,7 @@ and loops over its package files to print the `file_name` and `size` attributes.
[Deleting a file in a package](../api/packages.md#delete-a-package-file) can corrupt the package. Instead, you should delete the entire package when performing automated cleanup maintenance.
-To delete a package, use the [GitLab CLI](../editor_extensions/gitlab_cli/index.md) to change the `--method`
+To delete a package, use the GitLab CLI to change the `--method`
parameter to `DELETE`:
```shell
@@ -981,18 +1000,39 @@ Package size: 20.0033
Package size 20.0033 > threshold 10.0000, deleting package.
```
-The full example of the script `get_all_projects_top_level_namespace_storage_analysis_cleanup_example.py` is located in the [GitLab API with Python](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-api-python/) project.
-
### Dependency Proxy
Review the [cleanup policy](packages/dependency_proxy/reduce_dependency_proxy_storage.md#cleanup-policies) and how to [purge the cache using the API](packages/dependency_proxy/reduce_dependency_proxy_storage.md#use-the-api-to-clear-the-cache).
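For example, purging a group's Dependency Proxy cache is a single REST API call. The following sketch uses `curl`; the group ID `1234567` is a placeholder, and the endpoint schedules the deletion asynchronously:

```shell
export GL_GROUP_ID=1234567   # hypothetical group ID, replace with your own

# Schedule deletion of the cached Dependency Proxy images for the group.
# The API responds with HTTP 202 Accepted when the purge is scheduled.
curl --silent --request DELETE --header "Authorization: Bearer $GITLAB_TOKEN" \
  "https://gitlab.com/api/v4/groups/$GL_GROUP_ID/dependency_proxy/cache"
```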
-## Community resources
+## Improve output readability
-These resources are not officially supported. Ensure to test scripts and tutorials before running destructive cleanup commands that may not be reverted.
+You might need to convert timestamps in seconds into a duration format, or print raw byte values
+in a human-readable format. You can use the following helper functions to transform these values:
-- Forum topic: [Storage management automation resources](https://forum.gitlab.com/t/storage-management-automation-resources/)
-- Script: [GitLab Storage Analyzer](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-storage-analyzer), unofficial project by the [GitLab Developer Evangelism team](https://gitlab.com/gitlab-de/). You find similar code examples in this documentation how-to here.
+```shell
+# Current Unix timestamp
+date +%s
+
+# Convert `created_at` date time with timezone to Unix timestamp
+date -d '2023-08-08T18:59:47.581Z' +%s
+```
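You can also render raw byte values in a human-readable unit directly in the shell. This sketch assumes GNU coreutils (for `numfmt`) is installed:

```shell
# Convert raw bytes (for example, an artifact `size` attribute)
# into a human-readable IEC unit
numfmt --to=iec-i --suffix=B 3401613   # => 3.3MiB

# Convert an age in seconds into whole days
AGE_SECONDS=7776000
echo "$((AGE_SECONDS / 86400)) days"   # => 90 days
```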
+
+Example with Python that uses the `python-gitlab` API library:
+
+```python
+import datetime
+
+def render_size_mb(v):
+    return "%.4f" % (v / 1024 / 1024)
+
+def render_age_time(v):
+    return str(datetime.timedelta(seconds=v))
+
+# Calculate the age in seconds of a `created_at` datetime string (UTC)
+def calculate_age(created_at_datetime):
+    created_at_ts = datetime.datetime.strptime(created_at_datetime, '%Y-%m-%dT%H:%M:%S.%f%z')
+    now = datetime.datetime.now(datetime.timezone.utc)
+    return (now - created_at_ts).total_seconds()
+```
## Testing for storage management automation
@@ -1143,3 +1183,10 @@ Use the following projects to test storage usage with [cost factors for forks](u
- Fork [`gitlab-org/gitlab`](https://gitlab.com/gitlab-org/gitlab) into a new namespace or group (includes LFS, Git repository).
- Fork [`gitlab-com/www-gitlab-com`](https://gitlab.com/gitlab-com/www-gitlab-com) into a new namespace or group.
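Creating these forks can also be automated with the REST API. The following sketch uses the GitLab CLI; the project ID `278964` is assumed to belong to `gitlab-org/gitlab`, and `GL_TARGET_NAMESPACE` is a hypothetical group you can fork into:

```shell
export GL_PROJECT_ID=278964                 # assumed project ID of gitlab-org/gitlab
export GL_TARGET_NAMESPACE=my-test-group    # hypothetical target group

# Fork the project into the target namespace
glab api --method POST "projects/$GL_PROJECT_ID/fork?namespace_path=$GL_TARGET_NAMESPACE"
```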
+
+## Community resources
+
+The following resources are not officially supported. Be sure to test scripts and tutorials before running destructive cleanup commands that cannot be reverted.
+
+- Forum topic: [Storage management automation resources](https://forum.gitlab.com/t/storage-management-automation-resources/)
+- Script: [GitLab Storage Analyzer](https://gitlab.com/gitlab-de/use-cases/gitlab-api/gitlab-storage-analyzer), an unofficial project by the [GitLab Developer Evangelism team](https://gitlab.com/gitlab-de/). It contains code examples similar to those in this how-to.