Commit log for gitlab.com/gitlab-org/gitaly.git, mirrored at ThFree Co, Russian Federation.
2021-10-07  Enable Praefect in PostUploadPackWithSidechannel tests  [tag: git-2.31]  (Quang-Minh Nguyen)
Changelog: other
2021-10-07  Merge branch 'pks-catfile-cache-split' into 'master'  (Patrick Steinhardt)
Split up the catfile cache. Closes #3763. See merge request gitlab-org/gitaly!3886
2021-10-07  Merge branch 'jv-sidechannel-praefect' into 'master'  (James Fargher)
Praefect: add sidechannel support. See merge request gitlab-org/gitaly!3862
2021-10-06  Praefect: proxy sidechannels  (Jacob Vosmaer)
This commit adds backchannel support to the main gRPC listener of Praefect. If clients make gRPC calls with sidechannels, Praefect will now proxy these to the Gitaly backend. Changelog: added
2021-10-06  Merge branch 'pks-drop-fetch-internal-remote' into 'master'  (Pavlo Strokov)
remote: Drop FetchInternalRemote RPC. Closes #3667. See merge request gitlab-org/gitaly!3898
2021-10-06  blob: Convert `GetBlob()` to use object reader  (Patrick Steinhardt)
The `GetBlob()` RPC retrieves a single blob by object ID from the repository's object database. This is implemented via the catfile cache by first retrieving object info via `Batch.Info()`, and then retrieving the object data via another call to `Batch.Blob()` in case the info says it is indeed a blob. This is inefficient: we spawn two processes and have two round trips to those processes. The only upside is that we avoid reading objects which are not a blob, but given that non-blob objects are typically not large, it is debatable whether this has much of an impact at all. Refactor the code to use our new cached ObjectReader interface and just read the object directly. This only requires a single process and avoids one round trip. Changelog: performance
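The refactoring described above can be sketched as follows. The `object` type and the reader function are simplified stand-ins for illustration, not Gitaly's actual API:

```go
package main

import "fmt"

// Illustrative sketch: instead of an Info() round trip followed by a
// Blob() round trip, read the object once and discard it if it turns
// out not to be a blob.
type object struct {
	kind string
	data []byte
}

func getBlob(read func(oid string) object, oid string) ([]byte, error) {
	obj := read(oid) // single process, single round trip
	if obj.kind != "blob" {
		// The object was read needlessly, but non-blob objects are
		// typically small, so this is cheaper than two round trips.
		return nil, fmt.Errorf("object %s is a %s, not a blob", oid, obj.kind)
	}
	return obj.data, nil
}

func main() {
	read := func(oid string) object { return object{kind: "blob", data: []byte("hello")} }
	data, err := getBlob(read, "abc123")
	fmt.Println(string(data), err)
}
```
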
2021-10-06  catfile: Fix race with done-notifications  (Patrick Steinhardt)
The notification that a process is done and closed gets sent out right at the beginning of `returnWhenDone()` and is thus sent out _before_ the process really is done. It thus doesn't quite signal that the process is done, but just that it is about to be processed. Defer the signal to the end such that it's only sent when the process has really been processed.
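The fix described above amounts to moving the send into a deferred call. A minimal sketch, with function and channel names assumed rather than copied from Gitaly:

```go
package main

import "fmt"

// Before the fix, the done-notification was sent at the start of
// returnWhenDone(), before cleanup had actually finished. Deferring
// the send guarantees it happens last.
func returnWhenDone(done chan<- struct{}, cleanup func()) {
	// The send used to happen here, racing with cleanup.
	defer func() { done <- struct{}{} }() // now: signal only when truly done
	cleanup()
}

func main() {
	done := make(chan struct{}, 1)
	cleaned := false
	returnWhenDone(done, func() { cleaned = true })
	<-done
	fmt.Println(cleaned) // cleanup is guaranteed to have run by now
}
```
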
2021-10-06  catfile: Implement `BatchProcess()` in terms of both readers  (Patrick Steinhardt)
The `BatchProcess()` function either retrieves a `Batch` process from the cache, or alternatively it creates both the object and object info readers. Refactor this to reuse the caches of both these new readers such that the `Batch` is constructed from those cached readers and remove the old `Batch` cache.
2021-10-06  catfile: Provide cacheable object info readers  (Patrick Steinhardt)
Provide cacheable object info readers such that callers do not have to always create two separate git-cat-file(1) processes even though they only need the functionality to read object infos.
2021-10-06  catfile: Provide cacheable object readers  (Patrick Steinhardt)
Provide cacheable object readers such that callers do not have to always create two separate git-cat-file(1) processes even though they only need the functionality to read objects.
2021-10-06  catfile: Allow reading objects with unknown type  (Patrick Steinhardt)
The `objectReader` structure currently requires the caller to know about the object's type before it is actually being read: if the type does not match, then the object's data is discarded and an error is returned. This forces callers to first learn about the object type, e.g. via `git cat-file --batch-check`, which is additional overhead that is often not necessary at all. Refactor the interface to not require a type anymore. Like this, callers can use the interface without knowing about the object type beforehand. Instead, callers may still discard the object in case they realize it is not the type they expected.
2021-10-06  catfile: Make `close()` a private function  (Patrick Steinhardt)
We do not allow callers outside of the catfile package to call `Close()` on returned processes given that we want to be the only ones which manage the lifetime of potentially cached processes. Make the function private to better highlight this. This allows us to embed it into the reader interfaces without exposing `close()` to outside callers, such that we can easily access the `close()` function on interfaces internally.
2021-10-06  catfile: Allow for generic cached values  (Patrick Steinhardt)
We're about to split up the catfile cache into two different caches for object readers and object info readers. Refactor the code to use a generic interface `cacheable` such that we can easily reuse the logic to manage caches of different types of cached processes.
2021-10-06  catfile: Extract logic handling cached processes  (Patrick Steinhardt)
Extract the low-level logic which handles the management of processes such that we can reuse it for different types.
2021-10-06  catfile: Convert reporting of cache members to use a gauge vector  (Patrick Steinhardt)
We will split up the cache into two caches, one for object readers and one for object info readers. As a result, the current gauge we use for tracking the number of cached batch processes will not be as informative anymore, given that it can only report the total count of processes but cannot distinguish the types. Convert the gauge to a vector with a "type" label such that we can distinguish cached processes. While at it, this also pulls out handling of the metric from the generic parts which manage the stack of cached processes and instead moves it into a separate function. This has the benefit that we can easily split out the generic logic for the second cache, and it removes handling of metrics from the critical section.
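The labelling idea described above can be illustrated with a stand-in. The real code uses a Prometheus GaugeVec; this map-based sketch only shows how a "type" label distinguishes the two reader kinds:

```go
package main

import "fmt"

// gaugeVec is a toy stand-in for prometheus.GaugeVec, keyed by the
// value of a single "type" label.
type gaugeVec map[string]int

func (g gaugeVec) inc(label string) { g[label]++ }
func (g gaugeVec) dec(label string) { g[label]-- }

func main() {
	members := gaugeVec{}
	members.inc("object_reader")
	members.inc("object_reader")
	members.inc("object_info_reader")
	members.dec("object_reader") // one cached reader evicted
	fmt.Println(members["object_reader"], members["object_info_reader"])
}
```
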
2021-10-06  catfile: Provide function to safely count cache members  (Patrick Steinhardt)
The `len()` function provided by the `ProcessCache` structure counts entries without taking a lock, which isn't all that helpful: we can just call `len()` directly. The more interesting case though is to count cache members where we're required to take the lock due to possible concurrent updates, and we're about to add a few callsites which need this. Remove the old `len()` function and replace it with `entryCount()`, which counts members under the mutex. This is a preparatory commit for the subsequent commit which changes the cache-members Prometheus metric into a vector.
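A minimal sketch of the `entryCount()` helper described above; the entry type is a simplified stand-in for cached cat-file processes:

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessCache is updated concurrently, so counting members must
// happen under the mutex, unlike a bare len() on the slice.
type ProcessCache struct {
	mu      sync.Mutex
	entries []string // stand-in for cached cat-file processes
}

func (c *ProcessCache) add(e string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = append(c.entries, e)
}

// entryCount counts cache members while holding the mutex.
func (c *ProcessCache) entryCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.entries)
}

func main() {
	c := &ProcessCache{}
	c.add("object-reader")
	c.add("object-info-reader")
	fmt.Println(c.entryCount())
}
```
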
2021-10-06  catfile: Disentangle metrics from managing the cache  (Patrick Steinhardt)
We will split up the cache into two caches, one for object readers and one for object info readers. For this, we will want to reuse the "stack" that keeps track of cached processes. But due to it currently being entangled with Prometheus metrics, this refactoring is harder to do than one would want it to be. Pull up metrics into the high-level functions such that we can easily split out the low-level stack at a later point. This also has the benefit that we're not handling metrics in the critical section anymore.
2021-10-06  catfile: Rename `BatchCache` to `ProcessCache`  (Patrick Steinhardt)
We're about to extend the `BatchCache` to not only be a cache for `Batch` objects, but also for `ObjectReader`s and `ObjectInfoReader`s. Rename the cache to `ProcessCache` to reflect this upcoming change.
2021-10-06  Merge branch 'pks-datastore-cleanup-timezone' into 'master'  (Pavlo Strokov)
datastore: Fix storage cleanup's handling of timezones. See merge request gitlab-org/gitaly!3930
2021-10-06  remote: Drop FetchInternalRemote RPC  (Patrick Steinhardt)
The FetchInternalRemote RPC had been used internally to replicate repositories across different Gitaly nodes. At some point in time, this was converted to do a direct fetch without an additional RPC call because, as it turned out, doing inter-Gitaly mutating RPC calls is problematic in combination with transactions, given that the remote side would then try to cast votes against another Gitaly. Nowadays, there are no callers left which use this RPC. Remove the deprecated `FetchInternalRemote()` call. The backing logic, which is still called internally, remains for use in other parts of Gitaly. We may eventually want to find a better place for it to live. Changelog: removed
2021-10-06  remote: Refactor `FetchInternalRemote()` to use remoterepo interface  (Patrick Steinhardt)
Refactor `FetchInternalRemote()` to use the new function to get the default branch provided by the remoterepo package.
2021-10-06  remoterepo: Implement `GetDefaultBranch()`  (Patrick Steinhardt)
We have an ad-hoc implementation of `GetDefaultBranch()` for remote repositories in the `FetchInternalRemote()` RPC implementation. Given that this RPC call simply maps to `localrepo.GetDefaultBranch()`, it makes sense to pull up the implementation into the remoterepo interface to make it generally available. Implement `remoterepo.GetDefaultBranch()` and move tests of this function from localrepo-specific tests into the shared test suite.
2021-10-06  remote: Modernize tests for `FetchInternalRemote`  (Patrick Steinhardt)
The tests for `FetchInternalRemote` are a bit dated and hard to follow. Refactor them to reflect a more modern style. This is also preparation for the upcoming change to drop the RPC call, where we only retain its internal non-RPC backing logic.
2021-10-06  repository: Fix test for failing fetch in ReplicateRepository  (Patrick Steinhardt)
While the testcase for ReplicateRepository which exercises failing fetches fails as expected, the reason for the failure is not the failing fetch, but that we do not have the required set of binaries installed. Fixing this surfaces another issue: because fetches nowadays happen internally instead of calling the `FetchInternalRemote()` RPC, the mock remote server which stubs out the RPC with an error isn't invoked at all. Fix the test setup by setting up a "real" gRPC server, where the fetch failure is provoked by writing garbage into the source repo's HEAD.
2021-10-06  datastore: Fix storage cleanup's handling of timezones  (Patrick Steinhardt)
Timestamps in the `storage_cleanups` table are stored without timezones, but our timezone handling in `AcquireNextStorage()` is inconsistent: we typically just use `NOW()` to get the current timestamp, but in one case we use `NOW() AT TIME ZONE 'UTC'`. In some setups, this leads to timestamps which cannot be correctly compared with each other. Fix the bug by consistently storing timestamps without time zones. This bug is similar to 4a2ac0ed4 (datastore: Fix acknowledgement of stale jobs considering timezones, 2021-08-20). Changelog: fixed
2021-10-06  Merge branch 'ps-replication-parallel-processing' into 'master'  (Pavlo Strokov)
replication: Process replication events for storages in parallel. See merge request gitlab-org/gitaly!3894
2021-10-06  Merge branch 'pks-catfile-fixed-goroutine-leakage' into 'master'  (Patrick Steinhardt)
catfile: Reintroduce refactoring and plug Goroutine leak in the cache. Closes gitlab-com/gl-infra/production#5566. See merge request gitlab-org/gitaly!3905
2021-10-06  catfile: Test for Goroutine leaks  (Patrick Steinhardt)
A recent refactoring of the catfile cache has caused an incident via a new Goroutine leak. Assert that we do not leak any Goroutines via `MustHaveNoGoroutines()` now that the leak has been plugged such that we cannot have new regressions there.
2021-10-06  catfile: Plug Goroutine leak in the cache  (Patrick Steinhardt)
When constructing processes as part of the catfile cache, we must create a separate context in case the process is cacheable such that it does not get killed when the original request context gets cancelled. Starting with commit 36c166a17 (catfile: Explicitly close processes, 2021-09-09), we have converted the code to always explicitly close processes instead of doing so via the context. As a result though, this separate detached context will never get cancelled at all, causing us to leak Goroutines which listen on context cancellation to perform various cleanup operations. While we could simply revert this commit, the original implementation has been quite indirect given that all process cancellation happened via this context. The context has thus been passed to the `BatchProcess`, which knew to cancel it as required. This put the burden on the children of the cache, which is an inversion of responsibilities: it should be the cache which manages lifetimes, not the processes. Fix the Goroutine leak by instead stowing away the separate context's cancellation function as part of the cache entries. Like this, the cache can retrieve and cancel them whenever it evicts entries from the cache, and it can cancel them when a process cannot be returned to the cache because it is full. In the other case of uncacheable processes, we do not replace the surrounding context's done function at all and can thus continue to rely on the caller to eventually cancel the context. Fixes: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5566 Changelog: fixed
2021-10-06  catfile: Explicitly stop the catfile cache in tests  (Patrick Steinhardt)
Explicitly stop the catfile cache whenever we construct it in our test suite such that we shut down the monitoring Goroutine and evict any processes hosted by it.
2021-10-06  testhelper: Introduce new function to check for leaking Goroutines  (Patrick Steinhardt)
In some packages, we have started to assert that we do not leak Goroutines via the goleak package. One exception we need to always have present is for the opencensus package given that it starts a Goroutine in its `init()` function which we have no way to stop. Provide a central function `MustHaveNoGoroutines()` in the testhelper package such that we do not have to duplicate this exception whenever we use the goleak package.
2021-10-06  Revert "Merge branch 'sh-revert-3853' into 'master'"  (Patrick Steinhardt)
This reverts commit f99dae3b6e630e3a251a3af2f3029320c594d829, reversing changes made to 15c4a53fb5a093291e0331331efadb52d76f0871.
2021-10-06  Merge branch 'update_activesupport' into 'master'  (John Cai)
Update ruby gem activesupport to v6.1.4.1. Closes #3750. See merge request gitlab-org/gitaly!3929
2021-10-06  Update ruby gem activesupport to v6.1.4.1  (James Fargher)
Since gitlab-labkit also depends on activesupport, this needed updating too. These gems have been updated to match gitlab-rails. Changelog: changed
2021-10-05  Merge branch 'jc-praefect-command-add-repo' into 'master'  (Toon Claes)
Add add-repository praefect subcmd. Closes #3773. See merge request gitlab-org/gitaly!3918
2021-10-05  Merge branch 'jv-sidechannel-client' into 'master'  (John Cai)
client: add sidechannel support. Closes gitlab-com/gl-infra/scalability#1303. See merge request gitlab-org/gitaly!3900
2021-10-05  client: add sidechannel support  (Jacob Vosmaer)
This change adds publicly exported code to the gitaly/client package that allows Gitaly clients to accept sidechannel connections coming back from a Gitaly server they have connected to. This is done via a new dial function DialSidechannel. Changelog: added
2021-10-05  Merge branch 'check_empty_bundle' into 'master'  (John Cai)
Determine when CreateBundleFromRefList would generate an empty bundle. See merge request gitlab-org/gitaly!3923
2021-10-05  Add track-repository praefect subcmd  (John Cai)
Adds a track-repository subcommand that allows an admin to add a repository that exists on one of the gitaly nodes to the praefect database. Does some safety checks before inserting the records. Changelog: added
2021-10-05  replication: Process replication events with configurable number of workers  (Pavlo Strokov)
Given that we can configure the number of workers used to process replication events, we add that capability into the ReplMgr. It now starts the configured number of workers per virtual storage to process replication events. If the number of workers is higher than the minimum number of gitaly storages, it will be aligned with the latter and a log message about it will be issued. If the number is not set, the default value of 1 will be used (the old behaviour). Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2915
2021-10-05  replication: Configure number of replication workers per virtual storage  (Pavlo Strokov)
The new configuration parameter sets the number of workers used to process replication jobs. It is a safe incremental move towards rolling out parallel storage processing by the replication manager. The default value of 1 behaves the same as the replication manager without these changes (the old behaviour). Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2915
2021-10-05  replication: Process replication events for storages in parallel  (Pavlo Strokov)
The current implementation iterates over gitaly storages and executes replication events on each sequentially. As some replication events can take a significant amount of time to complete, this results in a long replication lag for the other storages, as their replication is blocked. With this change we start to process replication events for each gitaly storage in parallel, so there is no replication lag between them. If one storage is blocked on processing a replication event, all others can still run their replication events in the meantime. Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2915 Changelog: changed
2021-10-05  replication: Backoff function provider  (Pavlo Strokov)
To support parallel processing of the replication events we need to refactor usage of the backoff function. It is now a factory that creates backoff functions on demand. With that we can create a backoff function for each goroutine, so they are independent of one another.
2021-10-04  Determine when CreateBundleFromRefList would generate an empty bundle  (James Fargher)
In order to gracefully handle incremental backups where nothing has changed, we need to determine when `git bundle create` failed because the bundle was empty. This isn't an error for incremental backups; it simply means there were no updates. The FailedPrecondition gRPC code was chosen here since it best matches the litmus test: > (c) Use FailedPrecondition if the client should not retry until the system state has been explicitly fixed. E.g., if an "rmdir" fails because the directory is non-empty, FailedPrecondition should be returned since the client should not retry unless they have first fixed up the directory by deleting files from it. Changelog: changed
2021-10-04  Merge branch 'restore_incremental' into 'master'  (Christian Couder)
Incremental restores. See merge request gitlab-org/gitaly!3915
2021-10-04  backup: Restore from incremental backups  (James Fargher)
Changelog: added
2021-10-04  backup: Extract shared restore test setup into individual cases  (James Fargher)
We need to be able to specify the locator since the legacy locator will never support incrementals.
2021-10-04  Accept patterns for BundleTestRepo test helper  (James Fargher)
Previously this helper would create bundles for all refs in a repository but to validate incremental backups we will need to be able to generate bundles that only have a selection of refs/objects. To maintain compatibility with existing call sites the default behaviour (no patterns) will continue to bundle all refs.
2021-10-04  backup: Write LATEST file for increments  (James Fargher)
There's no reason to distinguish between a full backup and an increment. So let's just treat all steps as increments.
2021-10-04  backup: Remodel backups to be a series of steps  (James Fargher)
To allow restoring from incremental backups we need to be able to apply a list of bundles. So we change the type representing a backup from the previous `Full` type to a new `Backup` type, that better represents this list of steps.