|
Changelog: other
|
|
Split up the catfile cache
Closes #3763
See merge request gitlab-org/gitaly!3886
|
|
Praefect: add sidechannel support
See merge request gitlab-org/gitaly!3862
|
|
This commit adds backchannel support to the main gRPC listener of
Praefect. If clients make gRPC calls with sidechannels, Praefect will
now proxy these to the Gitaly backend.
Changelog: added
|
|
remote: Drop FetchInternalRemote RPC
Closes #3667
See merge request gitlab-org/gitaly!3898
|
|
The `GetBlob()` RPC retrieves a single blob by object ID from the
repository's object database. This is implemented via the catfile cache
by first retrieving object info via `Batch.Info()`, and then the object
data is retrieved via another call to `Batch.Blob()` in case the info
says it is indeed a blob. This is inefficient: we spawn two processes
and have two round trips to those processes. The only upside is that we
avoid reading objects which are not a blob, but given that non-blob
objects are typically not all that large, it is debatable whether this
really has much of an impact at all.
Refactor the code to use our new cached ObjectReader interface and read
the object directly. This only requires a single process and avoids one
of the round trips.
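A rough Go sketch of the new read path, assuming a hypothetical
`ObjectReader` whose `Object()` method returns the object's type along
with a reader for its data (names are illustrative, not the actual
Gitaly interfaces):

    // readBlob reads the object directly and only then checks its type:
    // one process, one round trip, and non-blobs are simply discarded.
    func readBlob(ctx context.Context, reader ObjectReader, rev string, w io.Writer) error {
        object, err := reader.Object(ctx, rev)
        if err != nil {
            return err
        }
        if object.Type != "blob" {
            // We already started reading, so drain the data and report
            // the type mismatch instead of failing up front.
            io.Copy(io.Discard, object)
            return fmt.Errorf("expected blob, got %q", object.Type)
        }
        _, err = io.Copy(w, object)
        return err
    }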
Changelog: performance
|
|
The notification that a process is done and closed gets sent out right
at the beginning of `returnWhenDone()`, and thus _before_ the process
really is done. It doesn't quite signal that the process is done, but
only that it is about to be processed.
Defer the signal to the end such that it's sent only when the process
has really been processed.
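In Go terms, the fix boils down to moving the signal into a `defer` so
that it only fires once the function has finished its work; a sketch
with illustrative names:

    func (e *entry) returnWhenDone(done <-chan struct{}) {
        // Previously the done-channel was closed right here, before any
        // processing happened. Deferring it means the signal is only
        // sent after the process has really been handled.
        defer close(e.processedCh)

        <-done
        // ... return the process to the cache or close it ...
    }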
|
|
The `BatchProcess()` function either retrieves a `Batch` process from
the cache or creates both the object and object info readers. Refactor
this to reuse the caches of both these new readers such that the `Batch`
is constructed from those cached readers, and remove the old `Batch`
cache.
|
|
Provide cacheable object info readers such that callers do not have to
always create two separate git-cat-file(1) processes even though they
only need the functionality to read object infos.
|
|
Provide cacheable object readers such that callers do not have to always
create two separate git-cat-file(1) processes even though they only need
the functionality to read objects.
|
|
The `objectReader` structure currently requires the caller to know the
object's type before the object is actually read: if the type does not
match, then the object's data is discarded and an error is returned.
This forces callers to first learn about the object type, e.g. via `git
cat-file --batch-check`, which is additional overhead that is often not
necessary at all.
Refactor the interface to not require a type anymore. Like this, callers
can use the interface without knowing the object type beforehand, and
they may still discard the object in case they realize it is not the
type they expected.
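A before/after sketch of the calling convention, with illustrative
names:

    // Before: the expected type had to be passed in, and a mismatch was
    // an error decided by the reader itself:
    //   object, err := reader.reader(ctx, revision, "commit")
    // After: the reader hands back whatever the revision resolves to,
    // and the caller decides what to do with unexpected types.
    object, err := reader.Object(ctx, revision)
    if err != nil {
        return err
    }
    if object.Type != "commit" {
        io.Copy(io.Discard, object) // discard the data and move on
        return errUnexpectedType
    }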
|
|
We do not allow callers outside of the catfile package to call `Close()`
on returned processes, given that we want to be the only ones who manage
the lifetime of potentially cached processes.
Make the function private to better highlight this. This allows us to
embed it into the reader interfaces without exposing `close()` to
outside callers, while we can still easily access the `close()` function
on the interfaces internally.
|
|
We're about to split up the catfile cache into two different caches for
object readers and object info readers. Refactor the code to use a
generic interface `cacheable` such that we can easily reuse the logic to
manage caches of different types of cached processes.
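A minimal sketch of what such a generic interface might look like
(illustrative, not the exact Gitaly definition):

    // cacheable is implemented by all processes the cache manages,
    // whether they read objects or object info.
    type cacheable interface {
        // close releases the process's resources; only the cache itself
        // may call it.
        close()
        // isDirty reports whether the process is still mid-request and
        // must thus not be returned to the cache.
        isDirty() bool
    }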
|
|
Extract the low-level logic which handles the management of processes
such that we can reuse it for different types.
|
|
We will split up the cache into two caches, one for object readers and
one for object info readers. As a result, the current gauge we use for
tracking the number of cached batch processes will not be as informative
anymore given that it can only report the total count of processes, but
cannot distinguish the types.
Convert the gauge to a vector with a "type" label such that we can
distinguish cached processes. While at it, this also pulls handling of
the metric out of the generic parts which manage the stack of cached
processes and moves it into a separate function. This has the benefit
that we can easily split out the generic logic for the second cache,
and it removes handling of metrics from the critical section.
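With the Prometheus Go client this is the standard move from
`prometheus.NewGauge` to `prometheus.NewGaugeVec`; a sketch with an
illustrative metric name:

    catfileCacheMembers := prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "gitaly_catfile_cache_members",
            Help: "Gauge of catfile cache members by process type",
        },
        []string{"type"},
    )

    // The two process types can now be reported separately.
    catfileCacheMembers.WithLabelValues("object_reader").Inc()
    catfileCacheMembers.WithLabelValues("object_info_reader").Inc()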
|
|
The `len()` function provided by the `ProcessCache` structure counts
entries without taking a lock, which isn't all that helpful: we can just
call `len()` directly. The more interesting case though is to count
cache members where we're required to take the lock due to possible
concurrent updates, and we're about to add a few callsites which need
this.
Remove the old `len()` function and replace it with `entryCount()`,
which counts members under the mutex. This is a preparatory commit for
the subsequent commit which changes the cache-members Prometheus metric
into a vector.
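A sketch of the replacement, assuming the cache guards its entries with
a mutex (field names illustrative):

    // entryCount counts the cache's members while holding the mutex,
    // making it safe against concurrent updates, unlike the old len().
    func (c *ProcessCache) entryCount() int {
        c.entriesMutex.Lock()
        defer c.entriesMutex.Unlock()
        return len(c.entries)
    }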
|
|
We will split up the cache into two caches, one for object readers and
one for object info readers. For this, we will want to reuse the "stack"
that keeps track of cached processes. But due to it currently being
entangled with Prometheus metrics, this refactoring is harder to do than
one would want it to be.
Pull up metrics into the high-level functions such that we can easily
split out the low-level stack at a later point. This also has the
benefit that we're not handling metrics in the critical section anymore.
|
|
We're about to extend the `BatchCache` to not only be a cache for
`Batch` objects, but also for `ObjectReader`s and `ObjectInfoReader`s.
Rename the cache to `ProcessCache` to reflect this upcoming change.
|
|
datastore: Fix storage cleanup's handling of timezones
See merge request gitlab-org/gitaly!3930
|
|
The FetchInternalRemote RPC had been used internally to replicate
repositories across different Gitaly nodes. At some point in time, this
was converted to do a direct fetch without an additional RPC call: as it
turned out, doing inter-Gitaly mutating RPC calls is problematic in
combination with transactions, given that the remote side would try to
cast votes against another Gitaly. Nowadays, there are no callers left
which use this RPC.
Remove the deprecated `FetchInternalRemote()` call. The backing logic,
which is still called internally, remains for use in other parts of
Gitaly. We may eventually want to find a better place for it to live.
Changelog: removed
|
|
Refactor `FetchInternalRemote()` to use the new function to get the
default branch provided by the remoterepo package.
|
|
We have an ad-hoc implementation of `GetDefaultBranch()` for remote
repositories in the `FetchInternalRemote()` RPC implementation. Given
that this RPC call simply maps to `localrepo.GetDefaultBranch()`, it
makes sense to pull up the implementation into the remoterepo interface
to make it generally available.
Implement `remoterepo.GetDefaultBranch()` and move tests of this
function from localrepo-specific tests into the shared test suite.
|
|
The tests for `FetchInternalRemote` are a bit dated and hard to follow.
Refactor them to reflect a more modern style. This also prepares for the
upcoming change to drop the RPC call, where we only retain its internal
non-RPC backing logic.
|
|
While the testcase for ReplicateRepository which exercises failing
fetches fails as expected, the reason for the failure is not the failing
fetch, but that we do not have the required set of binaries
installed. Fixing this surfaces another issue: because fetches nowadays
happen internally instead of calling the `FetchInternalRemote()` RPC,
the mock remote server which stubs out the RPC with an error isn't
invoked at all.
Fix the test setup by setting up a "real" gRPC server, where the fetch
failure is provoked by writing garbage into the source repo's HEAD.
|
|
Timestamps in the `storage_cleanups` table are stored without timezones,
but our timezone handling in `AcquireNextStorage()` is inconsistent: we
typically just use `NOW()` to get the current timestamp, but in one case
we use `NOW() AT TIME ZONE 'UTC'`. In some setups, this leads to
timestamps which cannot be correctly compared with each other.
Fix the bug by consistently storing timestamps without time zones. This
bug is similar to 4a2ac0ed4 (datastore: Fix acknowledgement of stale
jobs considering timezones, 2021-08-20).
Changelog: fixed
|
|
replication: Process replication events for storages in parallel
See merge request gitlab-org/gitaly!3894
|
|
catfile: Reintroduce refactoring and plug Goroutine leak in the cache
Closes gitlab-com/gl-infra/production#5566
See merge request gitlab-org/gitaly!3905
|
|
A recent refactoring of the catfile cache has caused an incident via a
new Goroutine leak. Assert that we do not leak any Goroutines via
`MustHaveNoGoroutines()` now that the leak has been plugged such that we
cannot have new regressions there.
|
|
When constructing processes as part of the catfile cache, we must
create a separate context in case the process is cacheable such that it
does not get killed when the original request context gets cancelled.
Starting with commit 36c166a17 (catfile: Explicitly close processes,
2021-09-09), we have converted the code to always explicitly close
processes instead of doing so via the context. As a result though, this
separate detached context will never get cancelled at all, causing us to
leak Goroutines which listen on context cancellation to perform various
cleanup operations.
While we could simply revert this commit, the original implementation
was quite indirect given that all process cancellation happened via this
context. The context was thus passed to the `BatchProcess`, which knew
to cancel it as required. This put the burden on the children of the
cache, which is an inversion of responsibilities: it should be the cache
which manages lifetimes, not the processes.
Fix the Goroutine leak by instead stowing away the separate context's
cancellation function as part of the cache entries. Like this, the cache
can retrieve and cancel them whenever it evicts entries, and it can
cancel them when a process cannot be returned to the cache because the
cache is full. In the other case of uncacheable processes, we do not
replace the surrounding context's cancellation function at all and can
thus continue to rely on the caller to eventually cancel the context.
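A sketch of the idea, with illustrative names: the cache keeps the
detached context's cancel function next to the process and invokes it
whenever it takes the process out of circulation.

    // entry ties a cached process to the cancellation function of its
    // detached context, so only the cache controls its lifetime.
    type entry struct {
        value  cacheable
        cancel context.CancelFunc
    }

    func (c *ProcessCache) evict(e *entry) {
        e.value.close()
        // Cancelling the detached context stops the Goroutines that
        // wait on it for cleanup, which is exactly what leaked before.
        e.cancel()
    }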
Fixes: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5566
Changelog: fixed
|
|
Explicitly stop the catfile cache whenever we construct it in our test
suite such that we shut down the monitoring Goroutine and evict any
processes hosted by it.
|
|
In some packages, we have started to assert that we do not leak
Goroutines via the goleak package. One exception we need to always have
present is for the opencensus package given that it starts a Goroutine
in its `init()` function which we have no way to stop.
Provide a central function `MustHaveNoGoroutines()` in the testhelper
package such that we do not have to duplicate this exception whenever we
use the goleak package.
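A sketch of such a helper on top of the goleak package
(go.uber.org/goleak), with the opencensus exception baked in:

    // MustHaveNoGoroutines asserts that no Goroutines leaked, modulo
    // the opencensus worker we cannot stop.
    func MustHaveNoGoroutines() {
        if err := goleak.Find(
            goleak.IgnoreTopFunction("go.opencensus.io/stats/view.(*worker).start"),
        ); err != nil {
            panic(err)
        }
    }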
|
|
This reverts commit f99dae3b6e630e3a251a3af2f3029320c594d829, reversing
changes made to 15c4a53fb5a093291e0331331efadb52d76f0871.
|
|
Update ruby gem activesupport to v6.1.4.1
Closes #3750
See merge request gitlab-org/gitaly!3929
|
|
Since gitlab-labkit also depends on activesupport, this needed updating
too.
These gems have been updated to match gitlab-rails.
Changelog: changed
|
|
Add add-repository praefect subcmd
Closes #3773
See merge request gitlab-org/gitaly!3918
|
|
client: add sidechannel support
Closes gitlab-com/gl-infra/scalability#1303
See merge request gitlab-org/gitaly!3900
|
|
This change adds publicly exported code to the gitaly/client package
that allows Gitaly clients to accept sidechannel connections coming
back from a Gitaly server they have connected to. This is done via a
new dial function DialSidechannel.
Changelog: added
|
|
Determine when CreateBundleFromRefList would generate an empty bundle
See merge request gitlab-org/gitaly!3923
|
|
Adds a track-repository subcommand that allows an admin to add a
repository that exists on one of the Gitaly nodes to the Praefect
database. Some safety checks are performed before inserting the
records.
Changelog: added
|
|
Given that the number of workers used to process replication events is
now configurable, we add that capability to the ReplMgr. It now starts
the configured number of workers per virtual storage to process
replication events. If the number of workers is higher than the minimum
number of Gitaly storages, it is capped to the latter and a log message
about this is issued. If the number is not set, a default value of 1 is
used (the old behaviour).
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2915
|
|
Add a new configuration parameter to set the number of workers used to
process replication jobs. It is a safe, incremental move towards rolling
out parallel storage processing in the replication manager. The default
value of 1 behaves the same as the unchanged replication manager (the
old behaviour).
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2915
|
|
The current implementation iterates over Gitaly storages and executes
replication events on each sequentially. As some replication events can
take a significant amount of time to complete, this results in a long
replication lag for the other storages, as their replication is blocked.
With this change we start to process replication events for each Gitaly
storage in parallel, so there is no replication lag between storages. If
one storage is blocked on processing a replication event, all others are
still allowed to run their replication events in the meantime.
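Conceptually, the loop changes from sequential processing to one
Goroutine per storage; a rough sketch where `processEvents`, `storages`
and `ctx` are illustrative:

    // Each Gitaly storage gets its own Goroutine, so a slow event on
    // one storage no longer blocks replication for the others.
    var wg sync.WaitGroup
    for _, storage := range storages {
        wg.Add(1)
        go func(storage string) {
            defer wg.Done()
            processEvents(ctx, storage)
        }(storage)
    }
    wg.Wait()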
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2915
Changelog: changed
|
|
To support parallel processing of the replication events we need to
refactor the usage of the backoff function. It is now a factory that
creates backoff functions on demand. With that we can create a backoff
function for each goroutine, so they are independent of one another.
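A sketch of the factory shape, with illustrative names:

    // backoffFactory creates backoff functions on demand, so that every
    // worker Goroutine gets its own pair and thus its own independent
    // retry state.
    type backoffFactory func() (backoff func() time.Duration, reset func())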
|
|
In order to gracefully handle incremental backups where nothing has
changed, we need to determine when `git bundle create` failed because
the bundle was empty. This isn't an error for incremental backups; it
simply means there were no updates.
The FailedPrecondition gRPC code was chosen here since it best matches
the litmus test:
> (c) Use FailedPrecondition if the client should not retry until
> the system state has been explicitly fixed. E.g., if an "rmdir"
> fails because the directory is non-empty, FailedPrecondition
> should be returned since the client should not retry unless
> they have first fixed up the directory by deleting files from it.
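A sketch of how this might look on both sides of the RPC, using the
standard grpc-go status package (variable names illustrative):

    // Server side: report an empty bundle as a precondition failure.
    if bundleIsEmpty {
        return status.Error(codes.FailedPrecondition, "refusing to create empty bundle")
    }

    // Backup client: an empty bundle is not an error for incremental
    // backups; it simply means there is nothing new to back up.
    if status.Code(err) == codes.FailedPrecondition {
        return nil
    }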
Changelog: changed
|
|
Incremental restores
See merge request gitlab-org/gitaly!3915
|
|
Changelog: added
|
|
We need to be able to specify the locator since the legacy locator will
never support incrementals.
|
|
Previously this helper would create bundles for all refs in a
repository, but to validate incremental backups we need to be able to
generate bundles that contain only a selection of refs/objects.
To maintain compatibility with existing call sites the default behaviour
(no patterns) will continue to bundle all refs.
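A sketch of the helper's new shape; the name, signature and exact
git-bundle(1) arguments are assumptions for illustration:

    // WriteBundle bundles the given patterns; with no patterns it falls
    // back to bundling all refs, matching the old behaviour.
    func WriteBundle(t *testing.T, repoPath, bundlePath string, patterns ...string) {
        args := []string{"-C", repoPath, "bundle", "create", bundlePath}
        if len(patterns) == 0 {
            args = append(args, "--all")
        } else {
            args = append(args, patterns...)
        }
        require.NoError(t, exec.Command("git", args...).Run())
    }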
|
|
There's no reason to distinguish between a full backup and an
increment, so let's just treat all steps as increments.
|
|
To allow restoring from incremental backups we need to be able to apply
a list of bundles. So we change the type representing a backup from the
previous `Full` type to a new `Backup` type that better represents this
list of steps.
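A sketch of the type change, with illustrative field names:

    // Backup represents a repository backup as an ordered list of steps
    // to apply; a full backup is simply the first step of the list.
    type Backup struct {
        Steps []Step
    }

    // Step holds what is needed to apply a single increment.
    type Step struct {
        BundlePath string
        RefPath    string
    }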
|