From 9395d198f9b9ec59858d2f316e58cda22ab80050 Mon Sep 17 00:00:00 2001
From: Nick Thomas
Date: Mon, 19 Nov 2018 15:03:58 +0000
Subject: Use BFG object maps to clean projects

---
 .../project/repository/img/repository_cleanup.png  | Bin 0 -> 20833 bytes
 .../repository/reducing_the_repo_size_using_git.md | 109 ++++++++++++++++-----
 2 files changed, 83 insertions(+), 26 deletions(-)
 create mode 100644 doc/user/project/repository/img/repository_cleanup.png
(limited to 'doc')

diff --git a/doc/user/project/repository/img/repository_cleanup.png b/doc/user/project/repository/img/repository_cleanup.png
new file mode 100644
index 00000000000..2749392ffa4
Binary files /dev/null and b/doc/user/project/repository/img/repository_cleanup.png differ
diff --git a/doc/user/project/repository/reducing_the_repo_size_using_git.md b/doc/user/project/repository/reducing_the_repo_size_using_git.md
index d534c8cbe4b..672567a8d7d 100644
--- a/doc/user/project/repository/reducing_the_repo_size_using_git.md
+++ b/doc/user/project/repository/reducing_the_repo_size_using_git.md
@@ -1,43 +1,105 @@
 # Reducing the repository size using Git
 
 A GitLab Enterprise Edition administrator can set a [repository size limit][admin-repo-size]
-which will prevent you to exceed it.
+which will prevent you from exceeding it.
 
 When a project has reached its size limit, you will not be able to push to it,
 create a new merge request, or merge existing ones. You will still be able to
 create new issues, and clone the project though. Uploading LFS objects will
 also be denied.
 
-In order to lift these restrictions, the administrator of the GitLab instance
-needs to increase the limit on the particular project that exceeded it or you
-need to instruct Git to rewrite changes.
-
 If you exceed the repository size limit, your first thought might be to remove
-some data, make a new commit and push back to the repository. Unfortunately,
-it's not so easy and that workflow won't work. Deleting files in a commit doesn't
-actually reduce the size of the repo since the earlier commits and blobs are
-still around. What you need to do is rewrite history with Git's
-[`filter-branch` option][gitscm].
+some data, make a new commit and push back to the repository. Perhaps you can
+move some blobs to LFS, or remove some old dependency updates from history.
+Unfortunately, it's not so easy and that workflow won't work. Deleting files in
+a commit doesn't actually reduce the size of the repo since the earlier commits
+and blobs are still around. What you need to do is rewrite history with Git's
+[`filter-branch` option][gitscm], or a tool like the [BFG Repo-Cleaner][bfg].
 
 Note that even with that method, until `git gc` runs on the GitLab side, the
-"removed" commits and blobs will still be around. And if a commit was ever
-included in an MR, or if a build was run for a commit, or if a user commented
-on it, it will be kept around too. So, in these cases the size will not decrease.
-
-The only fool proof way to actually decrease the repository size is to prune all
-the unneeded stuff locally, and then create a new project on GitLab and start
-using that instead.
+"removed" commits and blobs will still be around. You also need to be able to
+push the rewritten history to GitLab, which may be impossible if you've already
+exceeded the maximum size limit.
 
-With that being said, you can try reducing your repository size with the
-following method.
-
-## Using `git filter-branch` to purge files
+In order to lift these restrictions, the administrator of the GitLab instance
+needs to increase the limit on the particular project that exceeded it, so it's
+always better to spot that you're approaching the limit and act proactively to
+stay underneath it. If you hit the limit, and your admin can't - or won't -
+temporarily increase it for you, your only option is to prune all the unneeded
+stuff locally, and then create a new project on GitLab and start using that
+instead.
+
+If you can continue to use the original project, we recommend [using the
+BFG Repo-Cleaner](#using-the-bfg-repo-cleaner). It's faster and simpler than
+`git filter-branch`, and GitLab can use its account of what has changed to clean
+up its own internal state, maximizing the space saved.
 
 > **Warning:**
 > Make sure to first make a copy of your repository since rewriting history will
 > purge the files and information you are about to delete. Also make sure to
 > inform any collaborators to not use `pull` after your changes, but use `rebase`.
 
+> **Warning:**
+> This process is not suitable for removing sensitive data like password or keys
+> from your repository. Information about commits, including file content, is
+> cached in the database, and will remain visible even after they have been
+> removed from the repository.
+
+## Using the BFG Repo-Cleaner
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab-ce/issues/19376) in GitLab 11.6.
+
+1. [Install BFG](https://rtyley.github.io/bfg-repo-cleaner/).
+
+1. Navigate to your repository:
+
+   ```
+   cd my_repository/
+   ```
+
+1. Change to the branch you want to remove the big file from:
+
+   ```
+   git checkout master
+   ```
+
+1. Create a commit removing the large file from the branch, if it still exists:
+
+   ```
+   git rm path/to/big_file.mpg
+   git commit -m 'Remove unneeded large file'
+   ```
+
+1. Rewrite history:
+
+   ```
+   bfg --delete-files path/to/big_file.mpg
+   ```
+
+   An object map file will be written to `object-id-map.old-new.txt`. Keep it
+   around - you'll need it for the final step!
+
+1. Force-push the changes to GitLab:
+
+   ```
+   git push --force-with-lease origin master
+   ```
+
+   If this step fails, someone has changed the `master` branch while you were
+   rewriting history. You could restore the branch and re-run BFG to preserve
+   their changes, or use `git push --force` to overwrite their changes.
+
+1. Navigate to **Project > Settings > Repository > Repository Cleanup**:
+
+   ![Repository settings cleanup form](img/repository_cleanup.png)
+
+   Upload the `object-id-map.old-new.txt` file and press **Start cleanup**.
+   This will remove any internal git references to the old commits, and run
+   `git gc` against the repository. You will receive an email once it has
+   completed.
+
+## Using `git filter-branch`
+
 1. Navigate to your repository:
 
    ```
@@ -70,11 +132,6 @@ following method.
 
 Your repository should now be below the size limit.
 
-> **Note:**
-> As an alternative to `filter-branch`, you can use the `bfg` tool with a
-> command like: `bfg --delete-files path/to/big_file.mpg`. Read the
-> [BFG Repo-Cleaner][bfg] documentation for more information.
-
 [admin-repo-size]: https://docs.gitlab.com/ee/user/admin_area/settings/account_and_limit_settings.html#repository-size-limit
 [bfg]: https://rtyley.github.io/bfg-repo-cleaner/
 [gitscm]: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch
-- 
cgit v1.2.3
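The updated page recommends spotting that a repository is approaching its size limit before hitting it, and the BFG steps assume you already know which large file to remove. Below is a minimal shell sketch for finding candidates. It assumes a local clone with full history and Bash with `git`, `sed`, `sort`, `head`, and `awk` on the `PATH`. The helper names `largest_blobs` and `packed_size` are hypothetical and are not part of Git, GitLab, or BFG. The `git count-objects` figure is only a local approximation and may not match the size GitLab reports for the project.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for spotting what makes a Git repository large.
# largest_blobs and packed_size are hypothetical names, not Git/GitLab/BFG commands.
# Run them from inside a local clone that has the full history.

# Print the N largest blobs (default 10) reachable from any ref, biggest first,
# one per line as "<size-in-bytes> <object-id> <path>".
#
# Pipeline stages:
#   1. git rev-list --objects --all lists every reachable object, with the path
#      it was found at (blobs and trees) or no path (commits).
#   2. git cat-file --batch-check annotates each object with its type and
#      uncompressed size; %(rest) echoes the path back unchanged.
#   3. sed keeps blob lines only and strips the type column, so paths that
#      contain spaces survive intact.
#   4. sort orders numerically on the size column, largest first.
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k1,1 -n -r |
    head -n "$count"
}

# Print the on-disk size of packed objects, as reported by git count-objects.
# This is a local approximation, not necessarily the figure GitLab shows.
packed_size() {
  git count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

For example, running `largest_blobs 5` inside `my_repository/` prints the five biggest blobs from any ref, including files that a later commit already deleted. These are exactly the blobs that a plain `git rm` does not reclaim. The path column is what `git rm` needs. According to the BFG usage text, `--delete-files` matches on the file name rather than on the path within the repository.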