
gitlab.com/gitlab-org/gitaly.git
Age | Commit message | Author
2024-01-24 | Add repacking support to transaction manager [qmnguyen0711/add-repacking-support-to-transaction-manager] | Quang-Minh Nguyen
Gitaly has a sophisticated housekeeping system. That system packs loose objects, reorganizes the on-disk layout of packfiles, prunes unreachable objects, and so on. It aims to keep the repository optimal, performant, and cost-effective, which makes it a crucial component of Gitaly.

The current housekeeping approach works on the repository concurrently with the TransactionManager. This is a problem because the TransactionManager is expected to be the single writer in the repository. We thus need a different method for repacking a repository's objects that is synchronized with all other access. The WAL manager handles concurrent requests very differently, so the repacking task must adapt to the new architecture.

The manager handles a repacking task in three stages: preparation, verification, and application.

When a transaction is committed, the transaction's goroutine runs the repacking preparation. This stage runs the `git-repack(1)` command with different parameters depending on the desired strategy. Afterward, it attaches the list of new files and the list of deleted files to the transaction. This stage can take minutes or even hours. While it runs, the manager can accept other update transactions.

When the preparation stage finishes, the repacking transaction is submitted to the manager and verified. Verification is head-of-line blocking for each repository. The manager checks whether the repacking task conflicts with any transactions accepted beforehand. There are two types of conflicts:

- Another transaction points new references to pruned objects.
- Another transaction includes a change that depends on pruned objects.

Both cases require examining the list of transactions committed since the repacking task started. The manager collects the reference tips and verifies whether they are still accessible from the repository or from any new packfiles produced by other transactions.
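A minimal Go sketch of the reference-tip check, under simplifying assumptions: `ObjectID`, `committedTxn`, and `verifyRepack` are hypothetical names, the set of objects retained after the repack is given up front, and a tip counts as accessible when the tip object itself is present. A real check would walk reachability from each tip; the dependency check is not modeled.

```go
package main

import "fmt"

// ObjectID is a hypothetical alias for a hex object ID.
type ObjectID string

// committedTxn is a hypothetical summary of a transaction committed while the
// repack was being prepared.
type committedTxn struct {
	ReferenceTips []ObjectID // new targets of references the transaction updated
	PackedObjects []ObjectID // objects in the packfile the transaction introduced
}

// verifyRepack rejects the repack if any concurrent transaction points a
// reference at an object that is neither retained by the repack nor shipped
// in another transaction's packfile.
func verifyRepack(retained map[ObjectID]bool, concurrent []committedTxn) error {
	// Packfiles from concurrent transactions are placed next to the repacked
	// one(s), so their objects stay available.
	available := make(map[ObjectID]bool, len(retained))
	for oid := range retained {
		available[oid] = true
	}
	for _, txn := range concurrent {
		for _, oid := range txn.PackedObjects {
			available[oid] = true
		}
	}
	for i, txn := range concurrent {
		for _, tip := range txn.ReferenceTips {
			if !available[tip] {
				return fmt.Errorf("transaction %d points a reference to pruned object %s", i, tip)
			}
		}
	}
	return nil
}

func main() {
	retained := map[ObjectID]bool{"a1": true}
	ok := []committedTxn{{ReferenceTips: []ObjectID{"a1", "b2"}, PackedObjects: []ObjectID{"b2"}}}
	bad := []committedTxn{{ReferenceTips: []ObjectID{"c3"}}}
	fmt.Println(verifyRepack(retained, ok))
	fmt.Println(verifyRepack(retained, bad))
}
```

Rejecting on the first conflict mirrors the all-or-nothing policy the manager currently applies.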
The dependency check is not supported yet. In the future, when Git supports extracting the objects related to a pruned object, we can resolve conflicts more intelligently. At present, the manager rejects the repacking task if it finds any conflict.

If the task passes verification, the manager appends its log entry to the WAL. Finally, the log entry is applied: the manager removes redundant packfiles and links the new ones. If any concurrent transactions introduced file changes, their resulting packfiles sit next to the repacked one(s).

At this stage, we don't want to modify the housekeeping scheduler. The scheduler decides when and how a housekeeping task runs on a repository, and it picks a repacking strategy depending on the repository's state. The manager handles each strategy accordingly. There are currently four:

- IncrementalWithUnreachable: this strategy packs unreachable objects into a single packfile. In a WAL transaction, all changes are already packed by default, so this strategy is a no-op.
- Geometric: this strategy rearranges the packfiles according to a geometric progression without taking reachability into account. It doesn't prune objects either.
- FullWithUnreachable: this strategy merges all packfiles into a single packfile, removing any loose objects at the same time. Unreachable objects are then appended to the end of this unified packfile.
- FullWithCruft: in traditional housekeeping, unreachable objects are removed via a full repack with cruft. All unreachable objects are pushed into a cruft packfile that tracks each object's mtime, and those older than a grace period are cleaned up. The grace period ensures housekeeping doesn't accidentally delete an object that is about to become reachable. With the WAL, the manager can examine the list of applied transactions instead, so object expiry and cruft packs don't need to be considered. This strategy therefore triggers a normal full repack without cruft packing.
We keep the FullWithCruft strategy name for backward compatibility.

These strategies have increasing costs and correspondingly greater effects, so the lower-cost ones are triggered more frequently. Only the last strategy prunes objects; the others are safe to run concurrently.
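The strategy handling amounts to a small lookup table. This Go sketch is illustrative only: the `plan` type and `planFor` function are hypothetical, and they record how the manager treats each strategy rather than which `git-repack(1)` flags Gitaly passes.

```go
package main

import "fmt"

// Strategy names mirror the housekeeping strategies described above.
type Strategy string

const (
	IncrementalWithUnreachable Strategy = "IncrementalWithUnreachable"
	Geometric                  Strategy = "Geometric"
	FullWithUnreachable        Strategy = "FullWithUnreachable"
	FullWithCruft              Strategy = "FullWithCruft"
)

// plan is a hypothetical summary of how the transaction manager treats a strategy.
type plan struct {
	NoOp       bool // WAL changes are already packed, so nothing to do
	FullRepack bool // merge all packfiles into one
	Prunes     bool // unreachable objects are dropped, so conflict checks matter
}

func planFor(s Strategy) (plan, error) {
	switch s {
	case IncrementalWithUnreachable:
		return plan{NoOp: true}, nil
	case Geometric:
		// Rearranges packfiles geometrically; no pruning.
		return plan{}, nil
	case FullWithUnreachable:
		return plan{FullRepack: true}, nil
	case FullWithCruft:
		// Name kept for backward compatibility: a normal full repack
		// without cruft packing, which prunes unreachable objects.
		return plan{FullRepack: true, Prunes: true}, nil
	default:
		return plan{}, fmt.Errorf("unknown strategy %q", s)
	}
}

func main() {
	for _, s := range []Strategy{IncrementalWithUnreachable, Geometric, FullWithUnreachable, FullWithCruft} {
		p, err := planFor(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: %+v\n", s, p)
	}
}
```

Only the pruning strategy needs the conflict verification; the others can be applied without it.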
2024-01-24 | Add repacking task to log entry | Quang-Minh Nguyen
This commit adds a Repack task to the transaction log entry. This task models the repacking housekeeping task. It includes two major fields, NewFiles and DeletedFiles:
- NewFiles are the new packfiles that will be added to the repository. They are the result of the `git-repack(1)` command.
- DeletedFiles are the redundant packfiles whose objects are already packed into the new packfiles above. We keep this list to avoid deleting new packfiles introduced by other concurrent updates.
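A minimal sketch of why the explicit DeletedFiles list matters when the entry is applied. It assumes a hypothetical `RepackTask` struct and models the pack directory as a set of file names: only files named in DeletedFiles are removed, so a packfile written by a concurrent transaction survives.

```go
package main

import (
	"fmt"
	"sort"
)

// RepackTask is a hypothetical Go shape of the Repack task in a log entry.
type RepackTask struct {
	NewFiles     []string // packfiles produced by git-repack(1)
	DeletedFiles []string // redundant packfiles already folded into NewFiles
}

// apply updates a set of packfile names. Deleting by explicit name, rather
// than "everything except NewFiles", leaves concurrent packfiles in place.
func (t RepackTask) apply(packDir map[string]bool) {
	for _, f := range t.DeletedFiles {
		delete(packDir, f)
	}
	for _, f := range t.NewFiles {
		packDir[f] = true
	}
}

func main() {
	packDir := map[string]bool{"pack-old1.pack": true, "pack-old2.pack": true, "pack-concurrent.pack": true}
	task := RepackTask{
		NewFiles:     []string{"pack-repacked.pack"},
		DeletedFiles: []string{"pack-old1.pack", "pack-old2.pack"},
	}
	task.apply(packDir)

	names := make([]string, 0, len(packDir))
	for n := range packDir {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names) // [pack-concurrent.pack pack-repacked.pack]
}
```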
2024-01-24 | Add an assertion to assert state of packfiles in transactions | Quang-Minh Nguyen
This commit implements a new asserter for examining a repository's objects in more detail. This assert helper allows the tests to:
- Assert all packfiles in the repository's objects/pack directory.
- Assert the content and verify the index of each packfile.
- Assert the multi-pack-index file.
- Assert "invisible" objects, which are not located on disk but are present from Git's perspective. Typically, this is used in an alternates setup where a pool member can see the objects of the pool.
This asserter is a superset of the existing Objects asserter. However, it's too verbose: the majority of tests should not be concerned with the on-disk packfile layout of objects. So, the existing Objects asserter is kept.
2024-01-24 | Generalize packRefsDirectoryEntry to anyDirectoryEntry | Quang-Minh Nguyen
Previously, we introduced packRefsDirectoryEntry to assert the on-disk content of the pack-refs file in the WAL's luggage (the files attached to a log entry). That assertion verifies that the file exists but doesn't look into its content; the content is verified at a later stage. We'll do the same for packfiles. This commit generalizes this assert helper so it can be reused in later commits.
2024-01-24 | Expose packfile index parser | Quang-Minh Nguyen
This commit extracts and exposes the index parser of the packfile package. This parser parses the output of the `git-show-index(1)` command and invokes the provided callback for each object it finds. It will be used in the following commits to read the index files attached to WAL transactions.
2024-01-24 | housekeeping: Expose some housekeeping repacking functions | Quang-Minh Nguyen
This commit extracts and exposes some housekeeping utility functions. They will be shared with the transaction manager in the following commits.
2024-01-23 | Merge branch 'kn-update-protolint' into 'master' [HEAD, master] | John Cai
tools/protolint: update package to latest
Closes #5749
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6632
Merged-by: John Cai <jcai@gitlab.com>
Approved-by: John Cai <jcai@gitlab.com>
Co-authored-by: Karthik Nayak <knayak@gitlab.com>
2024-01-23 | Merge branch 'deterministic-gpg-signature' into 'master' | Toon Claes
fix: deterministic gpg signature
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6621
Merged-by: Toon Claes <toon@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
Approved-by: Toon Claes <toon@gitlab.com>
Reviewed-by: karthik nayak <knayak@gitlab.com>
Co-authored-by: Adrien <adrien@huggingface.co>
2024-01-23 | signature: extend CreateSignature to accept timestamp | Adrien Carreira
Since the GPG signature is time-dependent, aligning its timestamp with the author's date makes the signature consistent across all Gitaly nodes.
2024-01-23 | tools/protolint: Update package to latest | Karthik Nayak
Update the "protolint" package from v0.46.1 to v0.47.5. This updates most of its dependencies, including "gopkg.in/yaml". Versions of "gopkg.in/yaml" below v2.2.4 [1] contain a vulnerability [2], which this update removes.
[1]: https://github.com/go-yaml/yaml/releases/tag/v2.2.4
[2]: https://pkg.go.dev/vuln/GO-2022-0956
2024-01-22 | Update VERSION files [v16.9.0-rc1, gitaly-ci-jobs-48759, gitaly-ci-jobs-48758] | GitLab Release Tools Bot
2024-01-22 | Merge branch 'wc/filter-repo' into 'master' | Will Chandler
cleanup: Add RewriteHistory RPC
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6615
Merged-by: Will Chandler <wchandler@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Reviewed-by: Will Chandler <wchandler@gitlab.com>
Reviewed-by: karthik nayak <knayak@gitlab.com>
2024-01-22 | Merge branch 'toon-git243-only' into 'master' | Toon Claes
git: Use Git version 2.43 only
Closes #5739
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6624
Merged-by: Toon Claes <toon@gitlab.com>
Approved-by: Eric Ju <eju@gitlab.com>
Approved-by: Sami Hiltunen <shiltunen@gitlab.com>
2024-01-22 | coordinator: Enable transactions for new RewriteHistory RPC | Will Chandler
Enable transactions for the new RewriteHistory RPC.
2024-01-22 | cleanup: Validate repo was not modified before fetching | Will Chandler
On large repositories, git-filter-repo(1) may take a significant amount of time to run. Should a write occur after the git-fast-export(1) portion of the task has completed, the repository history may not be fully rewritten. To guard against this condition, we checksum the repository before and after running filter-repo. If the checksums do not match, we abort and do not fetch the updated history into the repository.
2024-01-22 | cleanup: Don't run filter-repo in-place | Will Chandler
git-filter-repo(1) uses git-fast-import(1) to import the rewritten repository history. This unpacks the new objects, then iterates over references serially and updates them using reference transactions. This does not update the references atomically [0], so an interruption during this final stage results in partially applied changes.
To mitigate this risk, create a temporary staging repository to write the updated history into, then atomically force-fetch it into the original repository. This is slower than modifying the repository in place [1], but improving safety for a high-risk operation like this is a greater priority.
[0] https://gitlab.com/gitlab-org/git/-/blob/d4dbce1db5cd227a57074bcfc7ec9f0655961bba/builtin/fast-import.c#L1659-1668
[1] https://github.com/newren/git-filter-repo/issues/66#issuecomment-602100316
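The final step can be sketched as a single `git fetch --atomic` from the staging repository, so all reference updates land in one transaction or not at all. This Go snippet only builds the command; the function name, the `+refs/*:refs/*` refspec, and the assumption that both repositories are bare are illustrative, not the exact invocation Gitaly uses.

```go
package main

import (
	"fmt"
	"os/exec"
)

// atomicFetchCommand builds a command that force-fetches every reference from
// the staging repository into repoPath in one atomic reference transaction.
// Both paths are assumed to be bare repositories.
func atomicFetchCommand(repoPath, stagingPath string) *exec.Cmd {
	return exec.Command("git", "-C", repoPath, "fetch", "--atomic", "--force", stagingPath, "+refs/*:refs/*")
}

func main() {
	cmd := atomicFetchCommand("/var/opt/repo.git", "/tmp/staging.git")
	fmt.Println(cmd.Args)
	// cmd.Run() would perform the fetch; it is not executed here.
}
```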
2024-01-22 | cleanup: Add RewriteHistory RPC | Will Chandler
Historically, we have advised users who need to rewrite history to do so locally and force-push their changes to GitLab. However, upcoming changes may prevent a user from pushing in scenarios where they need to remove a large blob from their repository's history.
To handle this scenario, we introduce a new `RewriteHistory` RPC, which invokes git-filter-repo(1) on the target repository. filter-repo has a large number of options, but we support only two:
--strip-blobs-with-ids
    Given a file containing a list of newline-delimited object IDs, rewrite history to remove them from all commits.
--replace-text
    Given a file of literals and patterns, replace all matching instances in history with '***REMOVED***'.
filter-repo works by fetching the repository contents via git-fast-export(1), making the requested changes, and writing the changes back via git-fast-import(1). As filter-repo uses the '--force' flag [0], the repository must be made read-only before calling this RPC.
filter-repo is currently incompatible with SHA256 repositories.
[0] https://git-scm.com/docs/git-fast-import#_parallel_operation
Changelog: added
2024-01-22 | git: Add filter-repo subcommand | Will Chandler
Add git-filter-repo(1) as a recognized subcommand.
2024-01-22 | ci: Install Python3 for FIPS jobs | Will Chandler
The UBI image used for FIPS testing does not have Python3 installed by default. Install it as part of the `before_script` steps.
2024-01-19 | Makefile: Add git-filter-repo for testing | Will Chandler
We will shortly begin using git-filter-repo(1) to rewrite repository history. This is a Python script that requires Python 3.5+, but it has no external dependencies and can be downloaded as a single file. We have added Python3 and installed git-filter-repo in Omnibus and CNG GitLab, but still need it to be present for local testing and CI.
Add a target to clone filter-repo as a dependency and copy the script into ${BUILD_DIR}/bin, which is added to PATH for our tests. We add a '.version' file for filter-repo to be consistent with the other dependencies, but this isn't strictly required as we're not running a build process in its source directory.
2024-01-19 | Merge branch 'jliu-fix-restore-order' into 'master' | Toon Claes
backup: Check for backup before deleting repo
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6620
Merged-by: Toon Claes <toon@gitlab.com>
Approved-by: Toon Claes <toon@gitlab.com>
Approved-by: Sami Hiltunen <shiltunen@gitlab.com>
Co-authored-by: James Liu <jliu@gitlab.com>
2024-01-19 | Makefile: Stop compiling and installing Git v2.42 | Toon Claes
In the previous commit we dropped all use of Git v2.42, so now we can stop building it.
2024-01-19 | git: Use Git version 2.43 only | Toon Claes
Remove the feature flag to toggle the use of v2.43, enable it by default, and remove the use of v2.42.
Label: maintenance::dependency
2024-01-19 | Merge branch 'toon-collapse-pb-go' into 'master' | karthik nayak
.gitattributes: Mark .pb.go files as generated
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6623
Merged-by: karthik nayak <knayak@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
Reviewed-by: karthik nayak <knayak@gitlab.com>
Co-authored-by: Toon Claes <toon@gitlab.com>
2024-01-19 | .gitattributes: Mark .pb.go files as generated | Toon Claes
To make code reviews easier, mark "*.pb.go" files as generated, which makes them collapsed by default in GitLab.
See: https://docs.gitlab.com/ee/user/project/merge_requests/changes.html#collapse-generated-files
2024-01-19 | Merge branch 'feat/issue-5700-dump-trace2-events-using-feature-flag' into 'master' | Quang-Minh Nguyen
Create new hook to export git trace 2 events in gitaly logs
Closes #5700
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6570
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: Sami Hiltunen <shiltunen@gitlab.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: Emily Chui <echui@gitlab.com>
Co-authored-by: Emily Chui <echui@gitlab.com>
2024-01-19 | Create new hook to export git trace 2 events in gitaly logs | Emily Chui
2024-01-19 | Merge branch 'jc/automate-rails-pipeline' into 'master' | Justin Tobler
.gitlab-ci.yml: start rails spec and cleanup automatically
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6618
Merged-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: Lin Jen-Shin <jen-shin@gitlab.com>
Approved-by: Justin Tobler <jtobler@gitlab.com>
Co-authored-by: John Cai <jcai@gitlab.com>
2024-01-18 | .gitlab-ci.yml: start rails spec and cleanup automatically | John Cai
2024-01-18 | backup: Check for backup before deleting repo | James Liu
Reorders the repository restore logic so that we do not remove the repo until we've checked that it has an existing backup. If the repo has no backup, it's skipped by the restore and will be removed later by the caller.
The previous order of operations caused issues when performing a full restore that included a dangling repo (a repo created after the backup was taken). When the `Restore` function was executed against this repo, it was removed immediately. Then the remainder of the restore was skipped since no backup existed for that repo. Since the repo was not marked as restored, the caller of the `Restore` function tried to remove it again, leading to a "repository not found" error because it had already been erased.
The "missing backup" test case for restoring a specific backup has been updated to expect that the repo still exists, in line with this logic.
2024-01-18 | Merge branch 'manifest_latest' into 'master' [qmnguyen0711/rework-metrics-and-logs-of-housekeeping-tasks] | Patrick Steinhardt
Write latest manifest file
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6613
Merged-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: Will Chandler <wchandler@gitlab.com>
Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Co-authored-by: James Fargher <jfargher@gitlab.com>
2024-01-18 | Merge branch 'smh-create-fork-partitioning' into 'master' | Justin Tobler
Partition fork with source repository in CreateFork
Closes #5762
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6612
Merged-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: Will Chandler <wchandler@gitlab.com>
Approved-by: Justin Tobler <jtobler@gitlab.com>
Co-authored-by: Sami Hiltunen <shiltunen@gitlab.com>
2024-01-18 | Merge branch 'xx/fix-typos' into 'master' | Will Chandler
fix: Fix a collection of typos found by typos-cli
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6594
Merged-by: Will Chandler <wchandler@gitlab.com>
Approved-by: Evan Read <eread@gitlab.com>
Co-authored-by: Xing Xin <xingxin.xx@bytedance.com>
2024-01-17 | fix: Fix a collection of typos found by typos-cli | Xing Xin
Fix typos found by typos-cli (https://github.com/crate-ci/typos). Some affected tests are adjusted. A number of other typos are ignored, including:
* CHANGELOG.md
* NOTICE
* internal/.../migrations/20201208163237_cleanup_notifications_payload.go
* other intentional typos or false positives
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
2024-01-17 | Update changelog for 16.8.0 | GitLab Release Tools Bot
[ci skip]
2024-01-17 | Merge branch 'jliu-track-restored-repos-second-attempt' into 'master' | Quang-Minh Nguyen
backup: Track repos that have been processed (re-attempt)
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6614
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Co-authored-by: James Liu <jliu@gitlab.com>
2024-01-17 | backup: Remove test exceptions for WAL | James Liu
Now that backups no longer invoke the RemoveAll RPC, we can remove the exemption for these tests when the WAL is enabled.
2024-01-17 | praefect: Intercept WalkRepos RPC | James Liu
Adds a handler to Praefect to intercept calls to the WalkRepos RPC. The handler provides an alternate implementation of listing repositories in a storage, which queries the Praefect DB rather than walking the filesystem on disk. This is required so when the RPC is invoked via Praefect, the DB is used as the source of truth rather than a random Gitaly node. The only user-facing difference between this and the original implementation is that the `modification_time` attribute of the response message is left empty, as this cannot be determined via the DB.
2024-01-17 | proto: Deprecate RemoveAll | James Liu
Now that we've adjusted the restore mechanism to delete individual repos as needed, this RPC is no longer required. See the following issues for more context:
- https://gitlab.com/gitlab-org/gitaly/-/issues/5357
- https://gitlab.com/gitlab-org/gitaly/-/issues/5269
Changelog: deprecated
2024-01-17 | backup: Delete RemoveAllRepositories from Strategy | James Liu
This is now deprecated in favour of removing individual repos.
2024-01-17 | backup: Only remove "dangling" repositories | James Liu
Instead of invoking the RemoveAll() RPC prior to the restore, we now compare the set of existing repositories to the set of repositories contained in the backup being restored. The repositories in the difference between the two sets (the "dangling" repositories) are removed individually. This ensures the state of repos in Gitaly matches the worldview held by the Rails DB after a GitLab instance restore.
Also fixes the existing tests so that all repositories are created with `gittest.CreateRepository` and are thus visible when we later query `ListRepositories` in the restore logic.
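Computing the dangling repositories is a set difference: existing minus backed up. A minimal Go sketch, with a hypothetical `danglingRepos` helper and illustrative relative paths:

```go
package main

import (
	"fmt"
	"sort"
)

// danglingRepos returns repositories that exist on storage but are absent
// from the backup being restored.
func danglingRepos(existing, inBackup []string) []string {
	keep := make(map[string]struct{}, len(inBackup))
	for _, r := range inBackup {
		keep[r] = struct{}{}
	}
	var dangling []string
	for _, r := range existing {
		if _, ok := keep[r]; !ok {
			dangling = append(dangling, r)
		}
	}
	sort.Strings(dangling) // deterministic order for removal and logging
	return dangling
}

func main() {
	existing := []string{"@hashed/aa/bb.git", "@hashed/cc/dd.git", "@hashed/ee/ff.git"}
	inBackup := []string{"@hashed/aa/bb.git", "@hashed/ee/ff.git"}
	fmt.Println(danglingRepos(existing, inBackup)) // [@hashed/cc/dd.git]
}
```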
2024-01-16 | Merge branch '5743-fix-rails-trigger' into 'master' | John Cai
Do not pass Gitaly variables to Rails pipeline
Closes #5743
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6617
Merged-by: John Cai <jcai@gitlab.com>
Approved-by: John Cai <jcai@gitlab.com>
Co-authored-by: Lin Jen-Shin <jen-shin@gitlab.com>
2024-01-16 | Do not pass Gitaly variables to Rails pipeline | Lin Jen-Shin
2024-01-16 | backup: Try read latest manifest when available | James Fargher
Now that a latest manifest is written, we can use this manifest when loading the "latest" backup.
2024-01-16 | backup: Write latest manifest file | James Fargher
Manifests currently cannot restore the "latest" backup because doing so would require an expensive object-storage directory traversal, so they defer to the pointer layout instead. The problem is that you then lose manifest-only features, such as setting the default branch properly.
Here we write two manifests: the normal backup manifest as before, and an additional "latest" manifest file. The latest manifest is overwritten each time a backup is taken. For WORM we would ideally not overwrite any files, but until that is implemented we need something to fill the latest-restore gap.
Changelog: changed
2024-01-15 | backup: Stop sending invalid requests | James Liu
Modifies the Restore CLI tests so we stop sending an invalid restore command as the final JSON object into stdin. This is required because a subsequent commit moves the repository removal logic to execute after the Pipeline completes successfully; if the tests purposely cause the Pipeline to fail, the restore logic never executes. The Pipeline's error messaging is covered separately in the Pipeline's unit tests.
2024-01-15 | backup: Add ListRepositories to the strategy | James Liu
Adds a new method to the Strategy interface used by regular and server-side backups for performing repository backups and restores. This new method calls the internal WalkRepos() RPC to fetch a list of repos in a given storage.
2024-01-15 | backup: Add RemoveRepository to the strategy | James Liu
Adds a new method to the Strategy interface used by regular and server-side backups for performing repository backups and restores. This new method removes a single repository from its storage, and will eventually replace the existing RemoveAllRepositories method.
2024-01-15 | backup: Track repos that have been processed | James Liu
Adds a map to the Pipeline to track repos that have been restored or backed up. A mutex is used to synchronise access to the map, as entries are added by goroutines running in the workers. The signature of Done() is modified to return the map; the returned value is intentionally ignored in the actual backup and restore logic for now. A subsequent commit will utilise the map for restore operations.
2024-01-14 | Partition fork with source repository in CreateFork | Sami Hiltunen
Gitaly's TransactionManager requires all object pool members to be in the same partition. Currently, we generally ensure this by extracting the additional repository from the request and partitioning the RPC's target repository with it. The additional repository is used in the ObjectPoolService's RPCs to tag the other repository accessed in pool-related operations; depending on the RPC, it may be either the object pool or one of its member repositories. When a repository is first accessed, we also check whether it has an existing alternate link on disk and, if so, place the repository in the same partition as its alternate. These two methods are generally enough to ensure that pools and their members are partitioned together.
One exception is CreateFork. The created fork should be placed in the same partition as the origin repository, as both will eventually be connected to the same pool. CreateFork does not tag the source repository as an additional repository, so the general handling does not apply to it. Tagging it as the additional repository won't work with Praefect, because Praefect rewrites the paths of additional repositories. The additional repository is fetched through the API, so it needs to keep its original relative path intact.
This commit introduces special handling for CreateFork. The transaction middleware checks whether the request is a CreateForkRequest. If so, it extracts the source repository and partitions the newly created fork with it.
Along with the behavior changes, we add a test that exercises the entire fork creation flow as it is typically performed. It turns out Gitaly didn't have a test covering this scenario at all.
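The special case can be sketched as a type switch in the middleware. The request types below are hypothetical stand-ins for the protobuf messages, and `partitionHint` is an illustrative name, not Gitaly's function.

```go
package main

import "fmt"

// Hypothetical stand-ins for Gitaly's protobuf request messages.
type Repository struct{ RelativePath string }

type CreateForkRequest struct {
	Repository       *Repository // the fork being created
	SourceRepository *Repository // the origin repository
}

type ObjectPoolRequest struct {
	Repository           *Repository
	AdditionalRepository *Repository
}

// partitionHint returns the repository the target should share a partition
// with, or nil if the request carries no such hint.
func partitionHint(req any) *Repository {
	switch r := req.(type) {
	case *CreateForkRequest:
		// The source is not tagged as an additional repository, so it is
		// extracted directly.
		return r.SourceRepository
	case *ObjectPoolRequest:
		return r.AdditionalRepository
	default:
		return nil
	}
}

func main() {
	fork := &CreateForkRequest{
		Repository:       &Repository{RelativePath: "@hashed/fork.git"},
		SourceRepository: &Repository{RelativePath: "@hashed/origin.git"},
	}
	if hint := partitionHint(fork); hint != nil {
		fmt.Println("partition with", hint.RelativePath)
	}
}
```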