Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-31Remove implicit pool creation on link behaviorsmh-remove-implicit-pool-creationSami Hiltunen
If a repository is being linked to an object pool that does not exist, the current behavior is to create the pool implicitly. This doesn't bode well with Praefect as it doesn't know whether a pool was or wasn't created and can't thus create a metadata record for the pool. The functionality was removed behind a feature flag. It doesn't seem like anything relies on the implicit pool creation, so this commit removes the feature flag and returns an error when linking a repository to a non-existent pool. Changelog: changed
2022-03-31Merge branch 'pks-proto-introduce-maintenance-operations' into 'master'Patrick Steinhardt
proto: Introduce new "maintenance" RPC type Closes #4079 See merge request gitlab-org/gitaly!4399
2022-03-31Merge branch 'pks-git-grep-limit-threading' into 'master'James Fargher
git: Globally limit number of threads Closes #4120 See merge request gitlab-org/gitaly!4443
2022-03-30Merge branch 'jc-repo-cleaner-grace-period' into 'master'John Cai
repocleaner: Only log repositories that have a mod time older than 24 hours and are not in the Praefect DB See merge request gitlab-org/gitaly!4449
2022-03-30internalgitaly: WalkRepos to return last accessed timestampJohn Cai
In order to make a decision on which repositories to clean up, Praefect's repocleaner will match repository paths with what it sees in the database. If it doesn't see a record in the database, that means Praefect doesn't know about this repository. However, it could be the case that a repository has just been created and the record for it doesn't yet exist in the database. In order for repocleaner to know, it needs the information to be returned from Gitaly. Add a field in the RPC response, and return the last modified date on the refs/ dir so we can use it as a proxy for when the repository was created.
2022-03-30proto: Mark related RPCs as maintenance operationsPatrick Steinhardt
We've got all the infrastructure in place now to handle maintenance operations specially in our infrastructure. Mark all RPCs which are maintenance-related as maintenance operations to switch them to use the new infrastucture. Note that we've still got a feature flag which toggles between old and new routing behaviour for these RPCs. Changelog: changed
2022-03-30housekeeping: Skip repacking empty repositoriesPatrick Steinhardt
When a repository does not have any bitmaps or in case it is missing bloom filters in the commit graph we always try to do a full repack in order to generate these data structures. This logic is required because: - Bitmaps are only generated by git-repack(1) for full repacks. This limitation exists because there can only be one bitmap per repository. - In case commit-graphs exist but missing bloom filters we need to completely rewrite them in order to enable this extension. While the logic makes sense, it also causes us to repack repositories which are empty: they don't contain either of these data structures, and consequentially we think we need a full repack. While this is not a huge problem because git-repack(1) would finish fast anyway in case the repo doesn't contain any objects, we still needlessly spawn a process. Also, our metrics report that we're doing a lot of full repacks, which can be confusing. Improve our heuristics by taking into account whether the repository has objects at all. If there aren't any, we can avoid all this busywork and just skip repacking altogether. Changelog: changed
2022-03-29Merge branch 'pks-sidechannel-migrate-to-runtime-dir' into 'master'Toon Claes
sidechannel: Convert to use runtime directory to store sockets See merge request gitlab-org/gitaly!4428
2022-03-29git: Globally limit threads used by git-pack-objects(1)Patrick Steinhardt
By default, git-pack-objects(1) will spawn as many threads as there are logical CPU cores on the machine. While this typically makes sense on a client's machine because a client will want the command to finish as fast as possible, on the server-side this is less of a good idea because it can easily impact other, concurrently running Git commands. This is why we have started to limit the number of threads this command may use when performing repository maintenance. Packing of objects is in no way limited to maintenance though, but it's also executed e.g. when serving packfiles. Let's globally limit the number of threads git-pack-objects(1) uses by injecting the `pack.threads` configuration into all commands which may generate packfiles. Changelog: performance
2022-03-28ssh: Inject feature flags via the test contextPatrick Steinhardt
When setting up a push for our tests, we also have the ability to inject feature flags by setting the `featureFlags` field. This has a major downside though: the feature flags injected via the context and the feature flags executed at a later point when actually running the command may easily drift apart. This inconsistency is something that may break assumptions on our side though, where some parts may now evaluate a flag to `true` while others evaluate it to `false`. Fix this inconsistency by instead accepting feature flags via a context. This both simplifies the code and fixes such an upcoming incompatibility when we're about to default-enable Git v2.35.1.gl1.
2022-03-28ssh: Refactor tests to not create worktreePatrick Steinhardt
Our SSH tests need to set up a repository with some new objects so that it can verify that these objects end up on the remote side after the push. To create those objects, we currently use a worktree and then execute git-commit(1). This usage pattern is outdated though: we have a `gittest.WriteCommit()` helper which does not require a worktree. Refactor tests to use this new helper so that we can get rid of using a worktree.
2022-03-24Makefile: Upgrade gofumpt to v0.3.1Patrick Steinhardt
Upgrade gofumpt to v0.3.1. Besides a very small number of new rules, the most important benefit is that the new version parallelizes formatting of files. It is thus much faster, also for our codebase: $ hyperfine --warmup=1 'make format' 'make format GOFUMPT_VERSION=0.3.0' Benchmark #1: make format Time (mean ± σ): 1.238 s ± 0.060 s [User: 1.337 s, System: 0.084 s] Range (min … max): 1.125 s … 1.312 s 10 runs Benchmark #2: make format GOFUMPT_VERSION=0.3.0 Time (mean ± σ): 239.5 ms ± 26.0 ms [User: 1.804 s, System: 0.109 s] Range (min … max): 180.6 ms … 284.7 ms 14 runs Summary 'make format GOFUMPT_VERSION=0.3.0' ran 5.17 ± 0.61 times faster than 'make format' Reformat our code with the new version.
2022-03-24Merge branch 'jv-get-archive-request-hash' into 'master'Toon Claes
GetArchive: log request hash See merge request gitlab-org/gitaly!4425
2022-03-23Merge branch 'jc-rebase-cofirmable-structured-errors' into 'master'Toon Claes
repository: Structured errors for UserRebaseConfirmable See merge request gitlab-org/gitaly!4419
2022-03-23sidechannel: Convert to use runtime directory to store socketspks-sidechannel-migrate-to-runtime-dirPatrick Steinhardt
The sidechannel code is used to create sidechannels that circumvent gRPC for efficiency's sake. This socket is currently created in the system's temporary directory as returned by `os.MkdirTemp()`. This has the very real downside that it easily leaks created files and directories in case we have a logic bug anywhere. We have recently introduced a new runtime directory that can help us in this situation: the runtime directory is created once at a central place and will be cleaned up when shutting down Gitaly. Consequentially, even if we leaked and sidechannel files, we'd still remove them on shutdown. And even if we didn't: the runtime directory is designed so that we can check whether it's used because it has the current process's PID as part of its component. So if a runtime directory exists whose PID doesn't refer to any existing process it's safe to remove. While we don't have any such logic yet, it can easily be added at a later point and have all code which started to use the runtime directory benefit at the same time. Migrate the sidechannel code to create sockets in a subdirectory within the runtime directory called "sidechannel.d" if the runtime directory is set via the hooks payload. Changelog: changed
2022-03-22repository: Structured errors for UserRebaseConfirmableJohn Cai
Currently, UserRebaseConfirmable will return gRPC OK even when there is an error during rebase or if there is an error during PreReceive when it calls the GitLab API for an access check. This change modifies the error handling to return a detailed gRPC structured error in either of these failure modes. Changelog: changed
2022-03-21GetArchive: log request hashJacob Vosmaer
This will allow us to determine in the future if it is worth adding a cache for GetArchive.
2022-03-21operations: Always use structured errors for UserSquashPatrick Steinhardt
With d88d29185 (operations: Return error from UserSquash when resolving revisions fails, 2022-03-01), we have converted UserSquash to return structured errors in some cases where it previously returned no errors at all. Instead, the RPC used to embed the error in the response struct via the `GitError` field. This has made us blind for actual issues which happen in production, and furthermore it trips Praefect's logic which decides whether we need to create repliction jobs or not. Rails has been adapted already in c7bcaadf193 (gitaly_client: Handle detailed errors for UserSquash, 2022-03-02), and the changes have been rolled out to production. While we required one follow-up fix to the new error codes we returned in 6c63998ad (operations: Fix wrong error code when UserSquash conflicts, 2022-03-11), the rollout has otherwise been successful. Remove the feature flag altogether and deprecated the `GitError` field, which isn't used anymore. Changelog: changed
2022-03-21git: Disable generation of server info in git-repack(1)Patrick Steinhardt
In order to be able to serve Git repositories via a dumb HTTP server, Git needs to create a bunch of metadata that tells clients what they need to fetch. This metadata is generated via git-update-server-info(1), which is can be automatically called via both git-receive-pack(1) and git-repack(1). While the former doesn't do so by default, the latter does. For us this is a waste of resources because we don't ever serve repos via the dumb HTTP protocol. Generating this information thus wastes both precious CPU cycles and disk space for data that is ultimately never used by anything. The waste of disk space is even more pronounced because git-repack(1) doesn't always clean up the temporary files it uses to atomically update the final files. So when the command gets killed, we may accumulate more and more temporary files. In extreme cases we have seen in production, a repository whose on-disk size of actual data was less than 5GB had accumulated about 35GB of these temporary files. Stop generating this information in git-repack(1) completely. Ideally, we'd do so by injecting configuration into all repack commands, but there is no such config option right now. Instead, we need to pass the `-n` flag everywhere we execute git-repack(1). Note that this doesn't stop generating the data in all places: commands like git-gc(1) invoke git-repack(1), but we have no ability to tell it to pass `-n` to git-repack(1). Neither is there a config option which would allow us to globally disable the generation. The current approach is thus only a best-effort one as a stop-gap solution while we're in the process of upstreaming patches which introduce such a config option. The cleanup logic for existing repositories is added in a later step in this patch series. Changelog: performance
2022-03-18Merge branch 'pks-operations-squash-commit-voting' into 'master'Toon Claes
operations: Fix missing votes on squashed commits Closes #4109 See merge request gitlab-org/gitaly!4417
2022-03-18operations: Fix missing votes on squashed commitsPatrick Steinhardt
The UserSquash RPC computes a squashed commit by first rebasing a range of commits on top of another commit, and then collapsing these into a single commit. This RPC is notably different from almost all of our other RPCs because it never writes any references to disk, and neither does it ever execute any access checks as the other User RPCs do. This design is quite weird: - There is a known race window where the new objects are not referenced, so they could be pruned by maintenance calls. - We accept objects into the repository which may not be sanctioned by our access checks. - Replication jobs cannot replicate the squashed commit because they aren't referenced. - We never perform transactional voting because no references are updated. Together these problems show that the RPC call is misdesigned, but fixing this design would require a bigger refactoring to make it work alright in Rails. In this commit we fix the last bullet point though: because this RPC never performs transactional voting, we're always creating replication jobs after the call finishes because Praefect didn't observe any transactional votes. As mentioned though, we don't have any reference to vote on, so the best we can do is vote on the commit ID of the newly written squash commit to make sure that it is the same across nodes. This commit does so by introducing a quarantine directory that is used to stage all new objects first before they're migrated to the final repository. We then vote on the object ID of the staged squash commit. Only if this vote is successful will we successfully commit the object to disk. This commit thus solves two things: first it fixes the missing transactional voting. And second it causes us to discard all objects in case the RPC errors. Changelog: fixed
2022-03-18housekeeping: Fix Prometheus metrics' latency bucketspks-housekeeping-track-total-count-of-optimizationsPatrick Steinhardt
The housekeeping manager is hosting a latency metric which tracks how long specific tasks of the housekeeping manager take. But while latency metrics should typically use the GRPC latency buckets as configured in the Gitaly configuration, we accidentally didn't in the context of the housekeeping manager. Fix this bug by passing in the Prometheus configuration.
2022-03-16Merge branch 'fix-error-handling-in-GetTreeEntries' into 'master'John Cai
Fix error handling in GetTreeEntries See merge request gitlab-org/gitaly!4414
2022-03-16Fix error handling in GetTreeEntriesHordur Freyr Yngvason
2022-03-15Repository: allow opting for --mirror for CreateRepositoryFromURLGabriel Mazetto
Changelog: added
2022-03-15Merge branch 'pks-repository-creation-fix-indeterministic-voting' into 'master'John Cai
repository: Fix indeterministic voting when creating new repos Closes #4100 See merge request gitlab-org/gitaly!4402
2022-03-15Merge branch 'pks-user-squash-fix-error-code-with-improved-error-handling' ↵Sami Hiltunen
into 'master' operations: Fix wrong error code when UserSquash conflicts See merge request gitlab-org/gitaly!4403
2022-03-15repository: Fix indeterministic voting when creating new repospks-repository-creation-fix-indeterministic-votingPatrick Steinhardt
When creating repositories we use transactional voting to determine that the repositories have been created the same on all nodes part of the transaction. This voting happens after we have seeded the repository, and the vote is computed by walking through the repository's directory and hashing all its files. We need to be careful though to skip files which we know to be indeterministic: - FETCH_HEAD may contain URLs which are different for each of the nodes. - Object packfiles contained in the object database are not deterministic, mostly because it may use multiple threads to compute deltas. Luckily, we do not have to rely on either of both types of files in order to ensure that the user-visible state of the repository is the same, so we can indeed just skip them. While we already have the logic to skip these files, this logic didn't work alright because we embarassingly forgot to actually return `fs.SkipDir` in case we see the object directory. So even though we thought we skipped these files, in reality we didn't. This bug has been manifesting in production in form of CreateFork, which regularly fails to reach quorum at random on a subset of nodes. The root cause here is that we use git-clone(1) to seed repository contents of the fork, which triggers exactly the case of indeterministic packfiles noted above. So any successful CreateFork RPC call really only succeeded by pure luck. Fix this issue by correctly skipping over "object" directories. While at it, fix how we skip over FETCH_HEAD by returning `nil`: it's a file and not a directory, so it doesn't make much sense to return `fs.SkipDir`. Changelog: fixed
2022-03-15Merge branch 'nanmu42-update-head' into 'master'Toon Claes
repository: Add updateHeadFromBundle in CreateRepositoryFromBundle Closes #4086 See merge request gitlab-org/gitaly!4401
2022-03-15repository: Add updateHeadFromBundle in CreateRepositoryFromBundleLI Zhennan
2022-03-14Merge branch 'smh-extend-invalid-source-repository' into 'master'James Fargher
Extend invalid metadata deletion logic to repos existin on target Closes #4083 See merge request gitlab-org/gitaly!4396
2022-03-11operations: Fix wrong error code when UserSquash conflictspks-user-squash-fix-error-code-with-improved-error-handlingPatrick Steinhardt
With the new improved error hadnling in UserSquash we're now returning errors in some cases where we previously didn't. One of those cases is when the rebase performed during the squash results in a merge conflict. While it is correct to return an error in this case, we're using an Internal error code for this case, which indicates that Gitaly is to blame instead of the parameters which have been passed by the user. Fix the error code to instead be FailedPrecondition. This error code is special-cased by our monitoring infrastructure to not raise any alerts. Note that this change is only fixing issues with monitoring: Rails handles the error alright by inspecting the error details instead of the error code. Changelog: fixed
2022-03-10Merge branch 'smh-no-implicit-pool-creation' into 'master'Toon Claes
Disable implicit pool creation on link behind a feature flag See merge request gitlab-org/gitaly!4397
2022-03-09Merge branch 'pks-user-squash-structured-errors' into 'master'John Cai
operations: Implement structured errors for UserSquash See merge request gitlab-org/gitaly!4374
2022-03-08Disable implicit pool creation on link behind a feature flagsmh-no-implicit-pool-creationSami Hiltunen
This commit adds a feature flag to disable the implicit pool creation when joining a repository to a non-existent object pool. It looks like this behavior is not needed given Rails should create the object pool prior to considering it ready for joining. Disabling this behavior makes it easier to handle pools in Praefect as Praefect needs to create metadata records for all of the repositories. This implicit behavior might create a repository without Praefect's knowledge. Special handling would be required for this particular RPC only, so let's see if we can remove the behavior rather than implement it in Praefect.
2022-03-07Extend invalid metadata deletion logic to repos existin on targetsmh-extend-invalid-source-repositorySami Hiltunen
Praefect contains logic to delete metadata record of a non-existent source repository during replication. Praefect checks for the ErrInvalidSourceRepository error, and if it receives it, it deletes the metadata record of the source repository. Right now, the error is only returned from ReplicateRepository if the source repository didn't exist when the target is creating the repository for the first time. This means Praefect's clean up logic doesn't run if the target already has the repository on the disk but the source repository doesn't exist. This can lead to reconciliation loops where Praefect keeps trying to replicate from the non-existent source repository over and over again as all of the replications fail. This commit changes ReplicateRepository to return the error when the source doesn't exist but the target repository does. This logic was initially implemented to break the reonciliation loops that occurred due to Praefect creating invalid metadata records when a repository creation failed. As DeleteObjectPool RPC was not handled as a repository deleting RPC in Praefect, the metadata records were left in place which is currently causing reconciliation loops. Changing the error return allows for Praefect to clean up those invalid metadata records. In the long run, Praefect's background metadata verifier would capture such issues and this logic could be removed. Changelog: fixed
2022-03-07repository: Allow CreateRepository to take default_branchjc-create-repo-default-branchJohn Cai
In GitLab rails, there is code to ensure that the default branch on a new repository is set. This is currently done with an extra WriteRef call. This is done in a racy way, which causes problems with transaction voting. To fix this, we can simplify the logic altogether and avoid adding another roundtrip, add a parameter in CreateRepositoryRequest to pass a default_branch. Changelog: changed
2022-03-07proto: Deprecate PreReceiveError field in UserMergeToRefResponsePatrick Steinhardt
The PreReceiveError field in the UserMergeToRefResponse isn't ever set because the RPC call doesn't in fact perform any access checks at all. Digging back through history this doesn't seem to be an oversight in the Go port, but was the case in the Ruby implementation already. While the initial implementation of UserMergeToRef in 8fbc337b8 (Allow merging to a ref without changing the target branch, 2019-01-25) still used to call hooks, this was explicitly changed with 19fdfa12a (Skip hooks for UserMergeToRef RPC, 2019-06-14) to not do so anymore. The field thus hasn't been set anymore for more than two years by now. Deprecate the PreReceiveError field so that we can eventually remove it. Changelog: deprecated
2022-03-02Merge branch 'fj-add-load-content-param-to-wiki-find-page' into 'master'John Cai
Add skip_content param to wiki find page call See merge request gitlab-org/gitaly!4373
2022-03-02Add skip_content param to wiki find page callFrancisco Javier López
2022-03-01operations: Return error from UserSquash when merging failspks-user-squash-structured-errorsPatrick Steinhardt
We do not return an error in case creating the squashed merge commit in UserSquash call fails. This makes us blind for the error conditions in this RPC, which is in turn creating problems in Praefect setups where we have to rely on errors in order to decide whether we have to create replication jobs or not. Return an error in case the merge fails. Note that we don't return a structured error in this case: the range of commits we're about to squash-merge has already been rebased, so we do not expect to ever see a merge conflict here. As a consequence, a failure to create the merge commit is unexpected and thus doesn't warrant its own error type. Changelog: changed
2022-03-01operations: Return error from UserSquash on conflictPatrick Steinhardt
We do not return an error in case resolving rebasing commits in UserSquash call fails. This makes us blind for the error conditions in this RPC, which is in turn creating problems in Praefect setups where we have to rely on errors in order to decide whether we have to create replication jobs or not. Return structured errors in case the error condition triggers such that callers can tell what kind of error they have been hitting. Changelog: changed
2022-03-01operations: Return error from UserSquash when resolving revisions failsPatrick Steinhardt
We do not return an error in case resolving either the start or end revision of a UserSquash call fails. This makes us blind for the error conditions in this RPC, which is in turn creating problems in Praefect setups where we have to rely on errors in order to decide whether we have to create replication jobs or not. Return structured errors in case the error condition triggers such that callers can tell what kind of error they have been hitting. Changelog: changed
2022-03-01operations: Document the UserSquash implementationPatrick Steinhardt
Add a short documentation snippet to UserSquash so that we can get rid of the previous linting exception.
2022-02-28Merge branch 'pks-ff-remove-fetch-internal-with-sidechannel' into 'master'Sami Hiltunen
localrepo: Remove flag to switch to sidechannels for internal fetches Closes #4065 See merge request gitlab-org/gitaly!4375
2022-02-28Merge branch 'brodock/gitaly-clone-improvements' into 'master'Christian Couder
Enable CloneRepositoryFromURL to authenticate with custom token See merge request gitlab-org/gitaly!4239
2022-02-28localrepo: Remove flag to switch to sidechannels for internal fetchesPatrick Steinhardt
In commit 8e5cb6419 (git: Convert internal fetches to use sidechannel, 2022-02-16), we have converted internalf teches to use the sidechannel. With this change, communication exchanged between git-upload-pack(1) and git-fetch(1) don't use gRPC encoding anymore, but instead use pkt-line encoding such that we can avoid having to serialize and deserialize lots of gRPC messages. This avoids the overhead induced by grpc-go and should thus reduce the load while serving an internal fetch. This change has been rolled out to production on February 23rd without any issues observed. Remove the feature flag to unconditionally enable use of the sidechannel. Changelog: changed
2022-02-25localrepo: Do transactional vote in SetDefaultBranchJohn Cai
SetDefaultBranch sets HEAD to a reference using git-symbolic-ref. However, git-symbolic-ref does not call the reference-transaction hook so we have no way of voting. Praefect will detect that the vote failed and schedule a replication job each time. This leads to unnecessary replication jobs being scheduled. Modify SetDefaultBranch to do its own operation using a safe locking file writer which supports transaction voting instead of calling git-symbolic-ref. Update all the callsites to call SetDefaultBranch with a transaction manager. Protect this change behind a feature flag write_ref_manual_voting. Changelog: changed
2022-02-25git: Refactor CheckRefFormatJohn Cai
CheckRefFormat calls git-check-ref-format(1) and if there is an error from the command, returns a separate CheckRefFormat error. This doesn't help much however. It swallows up the context of the original error. But also, the implementation of git-check-ref-format(1) is such that if the format is not valid, it will return with an exit code of 1. Clean up the interface of this function by instead, checking if the exit code of git-check-ref-format(1) is 1. If it is, then we know the format is invalid. If the exit code is anything else, something went wrong so we bubble up the error to be handled by the caller. We can also get rid of CheckRefFormatError. Update the one callsite that checks for CheckRefFormatError
2022-02-23Update quarantine repository tests for relative path rewritingSami Hiltunen
Quarantine directories are set in Gitaly to receive objects into a separate directory from the repository's main object database. The tests are not currently account for relative path rewriting that Praefect would do. To do so, this commit changes the tests to set the quarantine directories based on the rewritten relative path. The API still needs to be called with the relative path so Praefect can route the request correctly.