Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-21Remove Maintenance routing feature flagJohn Cai
Since the maintenance routing feature flag has been running in production without issue, we can now remove it. Changelog: changed
2022-04-07Merge branch 'smh-delete-object-pool-type' into 'master'Sami Hiltunen
Handle DeleteObjectPool calls in Praefect Closes #3742 and #4078 See merge request gitlab-org/gitaly!4395
2022-04-07Merge branch 'smh-remove-implicit-pool-creation' into 'master'Sami Hiltunen
Remove implicit pool creation on link behavior See merge request gitlab-org/gitaly!4455
2022-04-07Merge branch 'jc-remove-write-ref-ff' into 'master'James Fargher
featureflag: Remove TransactionalSymbolicRefUpdates featureflag See merge request gitlab-org/gitaly!4467
2022-04-07Merge branch 'jc-rate-limiter' into 'master'John Cai
Add RateLimiting Closes #4026 See merge request gitlab-org/gitaly!4427
2022-04-06server: Allow multiple limit handlersJohn Cai
Now that we are adding a second limit handle, adjust the code to allow for multiple limit handlers to be passed into a server invocation.
2022-04-06featureflag: Remove TransactionalSymbolicRefUpdates featureflagJohn Cai
This feature flag has been set to default on, and deleted from production. There have been no observable issues, so it's now safe to fully remove the feature flag. Changelog: changed
2022-04-06commit: Add CheckObjectsExist RPCJohn Cai
When pushing commits to a repository, access checks are run. In order to use the quarantine directory, we need a way to filter out revisions that a repository already has in the case that a packfile sends over objects that already exists on the server. In this case, we don't need to check the access. Add an RPC that when given a list of revisions, returns the ones that already exist in the repository, and the ones that do not exist in the repository. Changelog: added
2022-04-05limithandler: Pave the way for a second limiter typeJohn Cai
A future commit will add a new middleware that will limit based on the rate rather than concurrent calls. There is a good amount of logic currently used by the concurrency limiter that can be reused since a rate limiter is also operating on incoming requests based on RPC name. To make easier to add this new limiter type in the future, refactor the code by adding some abstractions easier to add another type of limiter.
2022-04-05Allow Commit.RawBlame to take a Range parameterNick Thomas
Git supports a wide array of range options for this option, but for now we restrict the formats we support to the very simplest, which is of the form "1,100". We can use this to implement limits on how many lines of the blame we retrieve GitLab-side, which should help to reduce the resource usage of this particular RPC. Changelog: added
2022-04-04Merge branch 'jc-accurate-calculation-of-repo-size' into 'master'John Cai
Use rev-list --all --objects --disk-usage to calculate repository usage See merge request gitlab-org/gitaly!4430
2022-04-04repository: Use Size() to calculate repo size behind feature flagJohn Cai
In the previous commit, we added a Size() method in localrepo that calls git rev-list --all --objects --disk-usage to calculate the size of the repository. Add a feature flag that, when toggled on, will cause RepositorySize to use this new way of calculating repository size. Changelog: added
2022-04-04ssh: Clean up output when pre-receive hook failsToon Claes
When the pre-receive hook failed, git-receive-pack(1) exits with code 0. It's arguable whether this is the expected behavior, but anyhow it means cmd.Wait() did not error out. On the other hand, the gitaly-hooks command did stop the transaction upon failure. So this final vote fails. To avoid this error being presented to the end user, ignore it when the transaction was stopped. Issue: https://gitlab.com/gitlab-org/gitaly/-/issues/3636 Changelog: fixed
2022-04-04ssh: Add test with custom pre-receive hookToon Claes
Add test to check if the error message from a failing custom pre-receive hook is presented to the user.
2022-04-04ssh: Split out the push command in testToon Claes
In a future commit we'd like to process the stdout and stderr output from the ssh push command separately. With this commit the setting up of the command is extracted into a function so in the future it's possible to specify where to write stdout/stderr to.
2022-04-04Don't disable Praefect in RemoveRepository testsSami Hiltunen
Praefect is currently disabled in RemoveRepository tests which allows for Praefect's RemoveRepository behavior to deviate from the Gitaly's. This commit enables Praefect for these tests so we can be sure we catch behavior deviations between the two handlers.
2022-04-04Handle DeleteObjectPool calls in PraefectSami Hiltunen
Praefect currently proxies DeleteObjectPool calls to Gitalys like any other mutating RPC. This has the same problem as RemoveRepository previously had; a pool can be deleted from the disk and it's metadata record still left in the database. This can then cause problems as Praefect believes a pool replica still exists where one doesn't exist. Praefect doesn't even treat DeleteObjectPool as a repository removing RPC. This means the deletions have never been replicated to the secondaries and the pool metadata records have been left in place. This then causes the reconciler to repeatedly attempt to reconcile from a non-existing primary pool replica to the secondaries. This commit fixes both issues by handling pool deletions in Praefect. Similarly to RemoveRepository, the metadata record of the pool is first deleted and only then the pool is attempted to remove from the Gitalys that have it. This ensures atomicity of the removals when Praefect is rewriting the replica paths. With the behavior fixed, the Praefect specific test asserations can be removed as the behavior now matches what how a plain Gitaly would behave. The handler in Praefect is tested via the same tests that test the handler in Gitaly. Adding separate tests doesn't make sense as external behavior of the the handlers should match and the handler shouldn't exists in Praefect if it is removed from Gitaly. Changelog: fixed
2022-04-04Add test case for missing pool in DeleteObjectPoolSami Hiltunen
DeleteObjectPool's tests do not currently test what happens if the request is missing the pool repository. This commit adds the missing test case. While at it, it exports the functionality to extract a pool from the request so Praefect's DeleteObjectPool handler can reuse it later and pass the same test.
2022-04-04Verify object pool existence after deletion in testsSami Hiltunen
TestDelete does not verify whether the object pool exists or not after the call to DeleteObjectPool. This is somewhat of an imporant assertion to make to ensure the deletion logic is working correctly. This commit adds that missing assertion.
2022-03-31Merge branch 'pks-smarthttp-refactor-flaky-gitconfig-test' into 'master'John Cai
smarthttp: Fix test race where response buffer may not be filled See merge request gitlab-org/gitaly!4447
2022-03-31Remove implicit pool creation on link behaviorsmh-remove-implicit-pool-creationSami Hiltunen
If a repository is being linked to an object pool that does not exist, the current behavior is to create the pool implicitly. This doesn't bode well with Praefect as it doesn't know whether a pool was or wasn't created and can't thus create a metadata record for the pool. The functionality was removed behind a feature flag. It doesn't seem like anything relies on the implicit pool creation, so this commit removes the feature flag and returns an error when linking a repository to a non-existent pool. Changelog: changed
2022-03-31Merge branch 'pks-proto-introduce-maintenance-operations' into 'master'Patrick Steinhardt
proto: Introduce new "maintenance" RPC type Closes #4079 See merge request gitlab-org/gitaly!4399
2022-03-31Merge branch 'pks-git-grep-limit-threading' into 'master'James Fargher
git: Globally limit number of threads Closes #4120 See merge request gitlab-org/gitaly!4443
2022-03-30Merge branch 'jc-repo-cleaner-grace-period' into 'master'John Cai
repocleaner: Only log repositories that have a mod time older than 24 hours and are not in the Praefect DB See merge request gitlab-org/gitaly!4449
2022-03-30internalgitaly: WalkRepos to return last accessed timestampJohn Cai
In order to make a decision on which repositories to clean up, Praefect's repocleaner will match repository paths with what it sees in the database. If it doesn't see a record in the database, that means Praefect doesn't know about this repository. However, it could be the case that a repository has just been created and the record for it doesn't yet exist in the database. In order for repocleaner to know, it needs the information to be returned from Gitaly. Add a field in the RPC response, and return the last modified date on the refs/ dir so we can use it as a proxy for when the repository was created.
2022-03-30proto: Mark related RPCs as maintenance operationsPatrick Steinhardt
We've got all the infrastructure in place now to handle maintenance operations specially in our infrastructure. Mark all RPCs which are maintenance-related as maintenance operations to switch them to use the new infrastucture. Note that we've still got a feature flag which toggles between old and new routing behaviour for these RPCs. Changelog: changed
2022-03-30housekeeping: Skip repacking empty repositoriesPatrick Steinhardt
When a repository does not have any bitmaps or in case it is missing bloom filters in the commit graph we always try to do a full repack in order to generate these data structures. This logic is required because: - Bitmaps are only generated by git-repack(1) for full repacks. This limitation exists because there can only be one bitmap per repository. - In case commit-graphs exist but missing bloom filters we need to completely rewrite them in order to enable this extension. While the logic makes sense, it also causes us to repack repositories which are empty: they don't contain either of these data structures, and consequentially we think we need a full repack. While this is not a huge problem because git-repack(1) would finish fast anyway in case the repo doesn't contain any objects, we still needlessly spawn a process. Also, our metrics report that we're doing a lot of full repacks, which can be confusing. Improve our heuristics by taking into account whether the repository has objects at all. If there aren't any, we can avoid all this busywork and just skip repacking altogether. Changelog: changed
2022-03-30smarthttp: Fix test race where response buffer may not be filledpks-smarthttp-refactor-flaky-gitconfig-testPatrick Steinhardt
We have observed a racy test in production where fetching a hidden ref via `PostUploadPack` isn't failing in the way we expect it to. While the error code is there and the RPC call itself has failed, the response buffer we see is completely empty. This race can easily be caused by our testing infrastructure though: we use `makePostUploadPackWithSidechannelRequest()` to set up a sidechannel and then copy the sidechannel data via a separate Goroutine. In case we see an error though while handling the sidechannel (which is the case: we expect to fail fetching the hidden ref), we don't synchronize with the Goroutine copying the response. Fix this race by properly synchronizing the Goroutine via a wait group.
2022-03-29smarthttp: Refactor flaky test which checks that GitConfig is appliedPatrick Steinhardt
One of our tests is verifying that the `GitConfigOptions` which can be passed to us via the `PostUploadPackRequest` is applied as expected. This is done by creating a reference and then injecting config which tells git-upload-pack(1) to hide this reference. If this config is applied as expected, then we should see that the server refuses to give us the object pointed to by the hidden reference. Refactor this test to be less reliant on the repository state, which is kind of confusing. The object pointed to by the hidden reference should in fact already be pointed to by another visible reference, which is quite puzzling. Furthermore, we also hard-code how many objects we expect to receive, which is one more thing that depends on this specific repository. Convert the test to use an empty repository instead and create two commits ad-hoc.
2022-03-29Merge branch 'pks-sidechannel-migrate-to-runtime-dir' into 'master'Toon Claes
sidechannel: Convert to use runtime directory to store sockets See merge request gitlab-org/gitaly!4428
2022-03-29git: Globally limit threads used by git-pack-objects(1)Patrick Steinhardt
By default, git-pack-objects(1) will spawn as many threads as there are logical CPU cores on the machine. While this typically makes sense on a client's machine because a client will want the command to finish as fast as possible, on the server-side this is less of a good idea because it can easily impact other, concurrently running Git commands. This is why we have started to limit the number of threads this command may use when performing repository maintenance. Packing of objects is in no way limited to maintenance though, but it's also executed e.g. when serving packfiles. Let's globally limit the number of threads git-pack-objects(1) uses by injecting the `pack.threads` configuration into all commands which may generate packfiles. Changelog: performance
2022-03-28ssh: Inject feature flags via the test contextPatrick Steinhardt
When setting up a push for our tests, we also have the ability to inject feature flags by setting the `featureFlags` field. This has a major downside though: the feature flags injected via the context and the feature flags executed at a later point when actually running the command may easily drift apart. This inconsistency is something that may break assumptions on our side though, where some parts may now evaluate a flag to `true` while others evaluate it to `false`. Fix this inconsistency by instead accepting feature flags via a context. This both simplifies the code and fixes such an upcoming incompatibility when we're about to default-enable Git v2.35.1.gl1.
2022-03-28ssh: Refactor tests to not create worktreePatrick Steinhardt
Our SSH tests need to set up a repository with some new objects so that it can verify that these objects end up on the remote side after the push. To create those objects, we currently use a worktree and then execute git-commit(1). This usage pattern is outdated though: we have a `gittest.WriteCommit()` helper which does not require a worktree. Refactor tests to use this new helper so that we can get rid of using a worktree.
2022-03-24Makefile: Upgrade gofumpt to v0.3.1Patrick Steinhardt
Upgrade gofumpt to v0.3.1. Besides a very small number of new rules, the most important benefit is that the new version parallelizes formatting of files. It is thus much faster, also for our codebase: $ hyperfine --warmup=1 'make format' 'make format GOFUMPT_VERSION=0.3.0' Benchmark #1: make format Time (mean ± σ): 1.238 s ± 0.060 s [User: 1.337 s, System: 0.084 s] Range (min … max): 1.125 s … 1.312 s 10 runs Benchmark #2: make format GOFUMPT_VERSION=0.3.0 Time (mean ± σ): 239.5 ms ± 26.0 ms [User: 1.804 s, System: 0.109 s] Range (min … max): 180.6 ms … 284.7 ms 14 runs Summary 'make format GOFUMPT_VERSION=0.3.0' ran 5.17 ± 0.61 times faster than 'make format' Reformat our code with the new version.
2022-03-24Merge branch 'jv-get-archive-request-hash' into 'master'Toon Claes
GetArchive: log request hash See merge request gitlab-org/gitaly!4425
2022-03-23Merge branch 'jc-rebase-cofirmable-structured-errors' into 'master'Toon Claes
repository: Structured errors for UserRebaseConfirmable See merge request gitlab-org/gitaly!4419
2022-03-23sidechannel: Convert to use runtime directory to store socketspks-sidechannel-migrate-to-runtime-dirPatrick Steinhardt
The sidechannel code is used to create sidechannels that circumvent gRPC for efficiency's sake. This socket is currently created in the system's temporary directory as returned by `os.MkdirTemp()`. This has the very real downside that it easily leaks created files and directories in case we have a logic bug anywhere. We have recently introduced a new runtime directory that can help us in this situation: the runtime directory is created once at a central place and will be cleaned up when shutting down Gitaly. Consequentially, even if we leaked and sidechannel files, we'd still remove them on shutdown. And even if we didn't: the runtime directory is designed so that we can check whether it's used because it has the current process's PID as part of its component. So if a runtime directory exists whose PID doesn't refer to any existing process it's safe to remove. While we don't have any such logic yet, it can easily be added at a later point and have all code which started to use the runtime directory benefit at the same time. Migrate the sidechannel code to create sockets in a subdirectory within the runtime directory called "sidechannel.d" if the runtime directory is set via the hooks payload. Changelog: changed
2022-03-22repository: Structured errors for UserRebaseConfirmableJohn Cai
Currently, UserRebaseConfirmable will return gRPC OK even when there is an error during rebase or if there is an error during PreReceive when it calls the GitLab API for an access check. This change modifies the error handling to return a detailed gRPC structured error in either of these failure modes. Changelog: changed
2022-03-21GetArchive: log request hashJacob Vosmaer
This will allow us to determine in the future if it is worth adding a cache for GetArchive.
2022-03-21operations: Always use structured errors for UserSquashPatrick Steinhardt
With d88d29185 (operations: Return error from UserSquash when resolving revisions fails, 2022-03-01), we have converted UserSquash to return structured errors in some cases where it previously returned no errors at all. Instead, the RPC used to embed the error in the response struct via the `GitError` field. This has made us blind for actual issues which happen in production, and furthermore it trips Praefect's logic which decides whether we need to create repliction jobs or not. Rails has been adapted already in c7bcaadf193 (gitaly_client: Handle detailed errors for UserSquash, 2022-03-02), and the changes have been rolled out to production. While we required one follow-up fix to the new error codes we returned in 6c63998ad (operations: Fix wrong error code when UserSquash conflicts, 2022-03-11), the rollout has otherwise been successful. Remove the feature flag altogether and deprecated the `GitError` field, which isn't used anymore. Changelog: changed
2022-03-21git: Disable generation of server info in git-repack(1)Patrick Steinhardt
In order to be able to serve Git repositories via a dumb HTTP server, Git needs to create a bunch of metadata that tells clients what they need to fetch. This metadata is generated via git-update-server-info(1), which is can be automatically called via both git-receive-pack(1) and git-repack(1). While the former doesn't do so by default, the latter does. For us this is a waste of resources because we don't ever serve repos via the dumb HTTP protocol. Generating this information thus wastes both precious CPU cycles and disk space for data that is ultimately never used by anything. The waste of disk space is even more pronounced because git-repack(1) doesn't always clean up the temporary files it uses to atomically update the final files. So when the command gets killed, we may accumulate more and more temporary files. In extreme cases we have seen in production, a repository whose on-disk size of actual data was less than 5GB had accumulated about 35GB of these temporary files. Stop generating this information in git-repack(1) completely. Ideally, we'd do so by injecting configuration into all repack commands, but there is no such config option right now. Instead, we need to pass the `-n` flag everywhere we execute git-repack(1). Note that this doesn't stop generating the data in all places: commands like git-gc(1) invoke git-repack(1), but we have no ability to tell it to pass `-n` to git-repack(1). Neither is there a config option which would allow us to globally disable the generation. The current approach is thus only a best-effort one as a stop-gap solution while we're in the process of upstreaming patches which introduce such a config option. The cleanup logic for existing repositories is added in a later step in this patch series. Changelog: performance
2022-03-18Merge branch 'pks-operations-squash-commit-voting' into 'master'Toon Claes
operations: Fix missing votes on squashed commits Closes #4109 See merge request gitlab-org/gitaly!4417
2022-03-18operations: Fix missing votes on squashed commitsPatrick Steinhardt
The UserSquash RPC computes a squashed commit by first rebasing a range of commits on top of another commit, and then collapsing these into a single commit. This RPC is notably different from almost all of our other RPCs because it never writes any references to disk, and neither does it ever execute any access checks as the other User RPCs do. This design is quite weird: - There is a known race window where the new objects are not referenced, so they could be pruned by maintenance calls. - We accept objects into the repository which may not be sanctioned by our access checks. - Replication jobs cannot replicate the squashed commit because they aren't referenced. - We never perform transactional voting because no references are updated. Together these problems show that the RPC call is misdesigned, but fixing this design would require a bigger refactoring to make it work alright in Rails. In this commit we fix the last bullet point though: because this RPC never performs transactional voting, we're always creating replication jobs after the call finishes because Praefect didn't observe any transactional votes. As mentioned though, we don't have any reference to vote on, so the best we can do is vote on the commit ID of the newly written squash commit to make sure that it is the same across nodes. This commit does so by introducing a quarantine directory that is used to stage all new objects first before they're migrated to the final repository. We then vote on the object ID of the staged squash commit. Only if this vote is successful will we successfully commit the object to disk. This commit thus solves two things: first it fixes the missing transactional voting. And second it causes us to discard all objects in case the RPC errors. Changelog: fixed
2022-03-18housekeeping: Fix Prometheus metrics' latency bucketspks-housekeeping-track-total-count-of-optimizationsPatrick Steinhardt
The housekeeping manager is hosting a latency metric which tracks how long specific tasks of the housekeeping manager take. But while latency metrics should typically use the GRPC latency buckets as configured in the Gitaly configuration, we accidentally didn't in the context of the housekeeping manager. Fix this bug by passing in the Prometheus configuration.
2022-03-16Merge branch 'fix-error-handling-in-GetTreeEntries' into 'master'John Cai
Fix error handling in GetTreeEntries See merge request gitlab-org/gitaly!4414
2022-03-16Fix error handling in GetTreeEntriesHordur Freyr Yngvason
2022-03-15Repository: allow opting for --mirror for CreateRepositoryFromURLGabriel Mazetto
Changelog: added
2022-03-15Merge branch 'pks-repository-creation-fix-indeterministic-voting' into 'master'John Cai
repository: Fix indeterministic voting when creating new repos Closes #4100 See merge request gitlab-org/gitaly!4402
2022-03-15Merge branch 'pks-user-squash-fix-error-code-with-improved-error-handling' ↵Sami Hiltunen
into 'master' operations: Fix wrong error code when UserSquash conflicts See merge request gitlab-org/gitaly!4403
2022-03-15repository: Fix indeterministic voting when creating new repospks-repository-creation-fix-indeterministic-votingPatrick Steinhardt
When creating repositories we use transactional voting to determine that the repositories have been created the same on all nodes part of the transaction. This voting happens after we have seeded the repository, and the vote is computed by walking through the repository's directory and hashing all its files. We need to be careful though to skip files which we know to be indeterministic: - FETCH_HEAD may contain URLs which are different for each of the nodes. - Object packfiles contained in the object database are not deterministic, mostly because it may use multiple threads to compute deltas. Luckily, we do not have to rely on either of both types of files in order to ensure that the user-visible state of the repository is the same, so we can indeed just skip them. While we already have the logic to skip these files, this logic didn't work alright because we embarassingly forgot to actually return `fs.SkipDir` in case we see the object directory. So even though we thought we skipped these files, in reality we didn't. This bug has been manifesting in production in form of CreateFork, which regularly fails to reach quorum at random on a subset of nodes. The root cause here is that we use git-clone(1) to seed repository contents of the fork, which triggers exactly the case of indeterministic packfiles noted above. So any successful CreateFork RPC call really only succeeded by pure luck. Fix this issue by correctly skipping over "object" directories. While at it, fix how we skip over FETCH_HEAD by returning `nil`: it's a file and not a directory, so it doesn't make much sense to return `fs.SkipDir`. Changelog: fixed