Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-27catfile: Fix losing all context information on decorrelationpks-catfile-fix-flaky-process-termination-testPatrick Steinhardt
When creating a new cached catfile process we need to decorrelate it from the parent context. For one this means that we need to remove the correlation ID and any tracing spans from the context. And second it means we need to suppress cancellation of the parent so that our cached process doesn't get killed when the RPC context is cancelled. The way we do this right now is to use `context.Background()`: this just creates a completely new background. Funny enough, we still try to strip information from this new context, which doesn't make any sense given that it's empty already. But this hints at an actual bug: the intent is not to get a completely new context, but by using `context.Background()` we also lose information such as feature flags here. Fix this bug by not creating a completely new context. Instead, we just suppress context cancellation via `helper.SuppressCancellation()` and then continue to decorrelate it like we always used to do. Only that now it actually does something. Changelog: fixed
2022-04-26catfile: Make process cancellation synchronous to fix test racePatrick Steinhardt
The catfile cache is using Goroutines to clean up processes. This is not deterministic and thus frequently causes our tests to flake when we lose one of the races caused by this. Furthermore, the asynchronicity makes the code quite hard to understand. Refactor the code to use synchronous cancellation. Instead of waiting for the context to get cancelled, it is now the caller that is responsible for killing the process. The caller transparently either gets a cancellation function that - kills the process directly in case it is an uncachable process. - or returns the process to the cache in case it is a cachable process. This significantly simplifies the code and also makes it deterministic, which fixes the flaky tests we have observed. Furthermore, this also allows us to get rid of one more Goroutine per cached catfile process. Note that this allows us to get rid of the signalling logic we had in place previous to this change: there is no need to wait for processes to return to the cache in our tests anymore because this is now guaranteed after the cancellation function was called.
2022-04-26catfile: Deduplicate logic to tear down processesPatrick Steinhardt
Both the `CatfileReader` and `CatfileInfoReader` have the same logic to tear down the process when the context gets cancelled. This logic can easily be unified into the cache: it already got a Goroutine that waits for the context to get cancelled to decrement the process counter, so it is a perfect fit to also host the other logic. Refactor the code to do this. While this reduces complexity, it also cuts the number of Goroutines in half.
2022-04-26catfile: Replace mentions of "batch" in the cachePatrick Steinhardt
The catfile cache still mentions "batch" in a lot of places, even though what it's actually handling is processes that may or may not be a batch process. Adjust the variable names to instead say "process".
2022-04-26Merge branch 'pks-user-squash-drop-quarantined-voting-ff' into 'master'James Fargher
operations: Remove feature flag for transactional voting in UserSquash Closes #4119 See merge request gitlab-org/gitaly!4496
2022-04-25Merge branch 'smh-verification-trigger-only-generation' into 'master'Sami Hiltunen
Ignore verification columns for read-only cache updates Closes #4159 See merge request gitlab-org/gitaly!4468
2022-04-25Merge branch 'jv-yamux-close-timeout' into 'master'Sami Hiltunen
sidechannel: lower yamux close timeout See merge request gitlab-org/gitaly!4491
2022-04-25operations: Remove feature flag for transactional voting in UserSquashPatrick Steinhardt
With af4ea3258 (operations: Fix missing votes on squashed commits, 2022-03-18) we have introduced the use of quarantine directories in the `UserSquash()` RPC to enable transactional voting. This change in behaviour has been rolled out to production, and the change was enabled by default in e8413304d (operations: Always use structured errors for UserSquash, 2022-03-21). We haven't seen any issues with this change. Remove the feature flag to always enable it. Changelog: removed
2022-04-25Merge branch 'jc-restore-custom-hooks-vote' into 'master'Toon Claes
repository: RestoreCustomHooks to do transaction voting Closes #4081 See merge request gitlab-org/gitaly!4481
2022-04-22Update VERSION filesv15.0.0-rc1GitLab Release Tools Bot
[ci skip]
2022-04-22sidechannel: lower yamux close timeoutjv-yamux-close-timeoutJacob Vosmaer
Lower the yamux StreamCloseTimeout setting from 5 minutes to 1 second for sidechannel connections. The motivation for changing this is to make it easier to test client side cancelation in Workhorse tests.
2022-04-22Merge branch 'fix-typo-proto-file' into 'master'Toon Claes
Fix typo in documentation of UserCommitFilesRequestHeader See merge request gitlab-org/gitaly!4422
2022-04-22Merge branch 'jc-remove-maintenance-ff' into 'master'James Fargher
Remove Maintenance routing feature flag See merge request gitlab-org/gitaly!4486
2022-04-21Update VERSION filesGitLab Release Tools Bot
[ci skip]
2022-04-21Merge branch 'smh-release-stale-leases' into 'master'John Cai
Release expired verification leases periodically See merge request gitlab-org/gitaly!4478
2022-04-21Update changelog for 14.10.0GitLab Release Tools Bot
[ci skip]
2022-04-21repository: RestoreCustomHooks to do transaction votingjc-restore-custom-hooks-voteJohn Cai
RestoreCustomHooks changes data in a repository, so it should do voting. Otherwise, each time we create a replication job for it, which is useless because we don't replicate hooks. Add transactional voting to the RestoreCustomHooks RPC by integrating with the new LockingDirectory. Changelog: changed
2022-04-21safe: Add new LockingDirectoryJohn Cai
Sometimes it's useful to be able to lock a directory before modifying it. This is similar to LockingFileWriter, except it just gives the ability to lock a directory so another process cannot also lock the directory. This will be immediately useful in RestoreCustomHooks where we are about ot add transactional voting to it. With transactional voting, the semantics are that the `Prepared` vote is made once the data that is about to be changed is locked. This will enable us to lock the custom hooks directory. Changelog: added
2022-04-21Remove Maintenance routing feature flagJohn Cai
Since the maintenance routing feature flag has been running in production without issue, we can now remove it. Changelog: changed
2022-04-20Release expired verification leases periodicallySami Hiltunen
The background verifier sets a lease time on a replica when it picks it up for verification. If the worker dies for some reason, the lease will remain in place and no other worker will pick up the replica for verification again until the lease is cleared. The lease itself tells the maximum time the worker itself would be working on the replica. After it has been passed, it would be safe for another worker to pick up the replica for verification again. This commit adds a background goroutine that periodically releases expired leases so other workers can take up the work if the original worker failed and did not release the lease. The 'verificaton_leases' index is added so the query can efficiently find the replicas with leases acquired to find the stale ones.
2022-04-20Merge branch 'eread/update-version-of-danger-files-dependency' into 'master'James Fargher
Update version of danger-files dependency See merge request gitlab-org/gitaly!4485
2022-04-20Update version of danger-files dependencyEvan Read
2022-04-20Merge branch 'jc-docs-backpressure' into 'master'John Cai
docs: Document Gitaly backpressure See merge request gitlab-org/gitaly!4469
2022-04-20docs: Document Gitaly backpressureJohn Cai
There are a number of knobs in Gitaly to tune backpressure Gitaly can impose on services that call it. This commit documents these.
2022-04-20Merge branch 'sh-fips-mode' into 'master'James Fargher
Add support for FIPS encryption See merge request gitlab-org/gitaly!4482
2022-04-19Merge branch 'jc-list-repo-to-use-shorter-grace-period' into 'master'John Cai
repocleaner: Allow NewWalker to receive grace period parameter Closes #4164 See merge request gitlab-org/gitaly!4474
2022-04-19praefect: Set 6 hour grace period in list_untracked_repositoriesJohn Cai
A default grace period of 6 hours is sufficient for the subcommand.
2022-04-19Merge branch 'smh-verify-subcmd' into 'master'Sami Hiltunen
Implement 'praefect verify' subcommand Closes #4091 See merge request gitlab-org/gitaly!4463
2022-04-19Implement 'praefect verify' subcommandSami Hiltunen
Praefect periodically verifies the repository metadata in the background. The interval may be too long especially if there was an incident, say a disk failure. After recovering the Gitaly node, the disk may be completely empty or may contains an older snapshot which does not contain all expected repositories. In such cases, it would be great if Praefect could be manually instructed to verify the storage again as soon as possible rather than waiting for the next scheduled verification interval to pass. This commit adds the 'praefect verify' subcommand that allows for doing that. It takes in either a repository id, a virtual storage or a (virtual storage, storage) tuple and marks all replicas matching the selector as being unverified. Praefect's metadata verifier will then prioritize verifying these repositories over other repositories pending verification. This allows administrators to speed up the verification process and thus recovery. Changelog: added
2022-04-19Add proto definitions for MarkUnverified RPCSami Hiltunen
With the introduction of metadata verification, Praefect needs a tool to manually mark a repository as needing verification immediately rather than after the specified verification interval has passed. That tool will require a new RPC that it can call achieve its goal. This commit adds the proto definitions for MarkUnverified RPC which can be called to either mark a single repository by ID, a whole virtual storage, or a whole storage as needing verification. Changelog: added
2022-04-19Merge branch 'jc-limit-error-proto' into 'master'Sami Hiltunen
proto: Add LimitError as a structured error See merge request gitlab-org/gitaly!4476
2022-04-17Add support for FIPS encryptionsh-fips-modeStan Hu
This commit adds support of using a FIPS-validated SSL library with compiled Go executables when `FIPS_MODE=1 make` is run. A Go compiler that supports BoringSSL either directly (e.g. the `dev.boringcrypto` branch) or with a dynamically linked OpenSSL (e.g. https://github.com/golang-fips/go) is required. This is similar to the changes to support FIPS in GitLab Runner and in GitLab Pages: https://gitlab.com/gitlab-org/gitlab-pages/-/merge_requests/716 Changelog: added
2022-04-14Merge branch 'toon-go-fixes' into 'master'Toon Claes
Makefile: Make GITALY_EXECUTABLES deferred again See merge request gitlab-org/gitaly!4477
2022-04-14Merge branch 'jc-update-gitaly-toml-with' into 'master'Toon Claes
Add new examples for concurrency and rate limiters See merge request gitlab-org/gitaly!4472
2022-04-14Makefile: Make GITALY_EXECUTABLES deferred againToon Claes
Recently, in b5c9c7efc (Makefile: Rename find_commands to GITALY_EXECUTABLES, 2022-03-25), we've changed the variable that holds the names of all Gitaly executable to be an immediate variable. While this is a good idea in general, it causes trouble in CI. In CI the compiled executables are put in cache, but the source files are not. So when files are pulled from cache, and any make target is built, it will expand GITALY_EXECUTABLES. Now source files are not pulled from the cache, so the `cmd` directory is missing. And therefore we revert it back to be a deferred variable.
2022-04-14proto: Add LimitError as a structured errorjc-limit-error-protoJohn Cai
When Gitaly enforces a limit, either due rate limiting or concurrency limiting, it needs to be able to return an error to its clients to provide context into why it failed so that clients can then inform its callers of why the call failed. Changelog: added
2022-04-14Merge branch 'smh-verify-metadata-output' into 'master'Sami Hiltunen
Expose last verification time in 'praefect metadata' Closes #4092 See merge request gitlab-org/gitaly!4466
2022-04-13Ignore verification columns for read-only cache updatessmh-verification-trigger-only-generationSami Hiltunen
Read-only cache receives invalidations on record updates via triggers in Postgres. Currently the notifications are sent for any modification to the records. The verification related columns are not relevant to the operation of the cache so this commit ignores the changes to the columns in the triggers. Changelog: changed
2022-04-13Expose last verification time in 'praefect metadata'Sami Hiltunen
Administrator's may want to know when Praefect has last verified a replica. This commit exposes that information via the 'praefect metadata' command. Changelog: changed
2022-04-13Fetch verified_at as part of GetRepositoryMetadataSami Hiltunen
GetRepositoryMetadata fetches a repository's metadata from the database. This commit expands the query to also fetch the newly added verified_at column so we can expose it in the 'praefect metadata' command to the admins.
2022-04-13Expose VerifiedAt via GetRepositoryMetadata proto definitionsSami Hiltunen
Administrators may want to know when a replica has been last verified by Praefect. GetRepositoryMetadata RPC is called by the 'metadata' sub-command to retrieve infromation about a repository and its replicas from Praefect's database. This commit adds the proto definitions for exposing the last verification time of replicas to the metadata sub-command. Changelog: changed
2022-04-13Merge branch 'smh-background-verifier' into 'master'Sami Hiltunen
Initial implementation of a metadata verifier See merge request gitlab-org/gitaly!4459
2022-04-13Wire metadata verifier in Praefect's mainSami Hiltunen
This commit wires the metadata verifier in Praefect's main so it can actually be configured for use. It's default disabled still as it still is missing some functionality that should be in place before generally enabling it, for example tooling like metrics, integration in to the 'praefect metadata' tool and a background routine to release stale leases. Changelog: added
2022-04-13Initial implementation of a metadata verifierSami Hiltunen
This commit adds an initial implementation of a metadata verifier to Praefect. Praefect stores metadata of the repositories stored on the cluster in Postgres. These metadata records may become out of sync with the disks if changes occur on the disks without going through Praefect, for example due to disk failures or manual modifications. Right now, Praefect only contains some temporary logic to clean up invalid metadata records when replication is attempted using a non-existent source repository. This was mostly put in place to stop reconciliation loops where Praefect keeps scheduling replication jobs from the non-existent repository that will never succeed. While this performs some clean up, it's not sufficient to catch cases where something happens in the background without prompting replication. The metadata verifier introduced in this commit aims to catch these issues by verifying the metadata eveynow and then in the background with the state on the disks. For now, only the existence of the replica is verified, not the actual contents by checksumming. Each replica contains a 'verified_at' timestamp in the database that tells Praefect when the metadata record was last verified. If it exceeds a configurable threshold, the replica is considered to be due for reverification. Praefect then asks the Gitaly hosting the replica whether the replica still exists. If it doesn't the invalid metadata record is deleted and the removal is logged. To avoid multiple Praefects verifying the same replica concurrently, Praefect acquires the verification lease on the replica in the database prior to verifying the existence of the repository. The scheduling is fairly simplistic at the moment with each Praefect acquiring a batch of work every two seconds. This also serves as a crude way to rate limit the background verification work rather to avoid consuming too many resources while doing it. This should be sufficient for now althoug could later be improved. Praefect leaves the repository's record in place even if all of its replicas have been lost. This ensures no data loss goes unnoticed and that the loss needs to be acknowledged by removing the repository manually. Changelog: added
2022-04-13Add migrations for background verification schemaSami Hiltunen
This commit adds the necessary schema changes for the metadata background verification. Each replica receives two new columns: 1. 'verified_at' which contains the timestamp of the last successful verification of the replica. This effectively allows for identifying replicas that are in need of reverification. 2. 'verification_leased_until' which contains a timestamp until which a worker has acquired a lease to reverify the repository. This prevents multiple workers from picking the same repository for reverification at the same time. 'verification_queue' index is added to index replicas which have not been acquired by any worker. This allows for efficientl querying replicas that are in need of reverification later. Changelog: other
2022-04-12Update changelog for 14.9.3GitLab Release Tools Bot
[ci skip]
2022-04-12Merge branch 'jc-backpressure-training' into 'master'John Cai
README: Add link to backpressure video See merge request gitlab-org/gitaly!4475
2022-04-12README: Add link to backpressure videoJohn Cai
Add a link in the presentations section of the README to a video explaining the different ways to configure request limits (or backpressure) in Gitaly.
2022-04-08repocleaner: Allow NewWalker to receive grace period parameterJohn Cai
The Walker is used by both the periodic job that checks for untracked repositories as well as the list-untracked-repositories subcommand. It is useful to be able to give it a different grace period value. Add this parameter to the signature.
2022-04-08Merge branch 'command-stats-metrics' into 'master'Toon Claes
Expose command stats (rusage) metrics via prometheus See merge request gitlab-org/gitaly!4464