Age | Commit message (Collapse) | Author |
|
|
|
[ci skip]
|
|
Disable pruning for FetchBundle
See merge request gitlab-org/gitaly!3949
|
|
It turns out that incremental backup bundles do not contain enough
information for git to sensibly prune.
The tests had to be substantially refactored to allow this to be tested
properly. Now the final repo has its checksum verified against a known
standard.
Changelog: fixed
|
|
This helper method saves us having to actually clone the repo to figure
out what the checksum is.
|
|
When converting a checksum to a string it is often convenient to get a
full zero-padded hex output when the checksum is zero. So here we
encapsulate this logic into a method.
|
|
Stop calling PackObjectsHook from gitaly-hooks
See merge request gitlab-org/gitaly!3991
|
|
Add FindRefs RPC
Closes #3826
See merge request gitlab-org/gitaly!3947
|
|
Adds implementation and tests for FindRefsByOID RPC.
Changelog: added
|
|
Adds the proto definition and generated go and ruby code for a new RPC
FindRefsByOID used to find refs that point to a specific object ID with
some flags: patterns the refs need to match, a limit of how many results
are returned, and a field by which to sort the results.
See https://git-scm.com/docs/git-for-each-ref as a reference for how
these flags are used in git for-each-ref.
Changelog: added
|
|
In order to support more flags when calling git for-each-ref, implement
a options pattern for these flags. This way, more options can be defined
without having to change the ForEachRef method signature each time.
Changelog: changed
|
|
Add Pagination to FindAllTags PRC
See merge request gitlab-org/gitaly!3861
|
|
Makefile: Stop installing binaries into source dir
See merge request gitlab-org/gitaly!3990
|
|
With commit eb6fd6056 (Makefile: Stop installing binaries into source
dir, 2021-05-07), we stopped installing Gitaly binaries into its root
directory given that all users should either use the installed location,
or alternatively use the directory in "_build/bin". This change had been
reverted due to a regression, where the init.d script in Rails was still
using the old location of Gitaly [1]. The issue has since been fixed by
adjusting paths in the script to point into our build directory.
Reintroduce the change and stop installing binaries into the root
directory. To avoid existing installations from silently continuing to
use the old, outdated binaries, we also delete them from the source
directory.
[1]: https://gitlab.com/gitlab-org/gitlab/-/issues/331758
|
|
Perform joins using repository_id in valid_primaries view
Closes #3853
See merge request gitlab-org/gitaly!3988
|
|
Remove backwards_compatibility CI job
See merge request gitlab-org/gitaly!3993
|
|
Contributes to https://gitlab.com/gitlab-org/gitlab/-/issues/340591
**Problem**
FindAllTags request does not support pagination parameters. Lack of
pagination causes performance issues for repositories with many tags.
**Solution**
Add pagination parameter support similar to what we use for
FindLocalBranches.
Changelog: added
|
|
gitpipe: Drop useless pipeline steps
See merge request gitlab-org/gitaly!3980
|
|
The `RevisionTransform()` pipeline step isn't used anymore starting with
the preceding commit. Drop it.
|
|
The `FindAllTags()` RPC call has four pipeline steps:
1. It executes git-for-each-ref(1) to enumerate all refs.
2. The output of git-for-each-ref(1) is transformed such that for
each tag, we request object info both of the tag and its peeled
non-tag object via '^{}'.
3. git-cat-file(1) is used to see whether any ref is an annotated
tag or not. If the current object is an annotated tag, then we
pass along both the annotated tag and its peeled object info.
Otherwise, we skip sending along the peeled object info.
4. git-cat-file(1) will then retrieve objects. If it's a tag, it
knows to also request the second object and parse it such that we
can return both information about the tag, and the object it
references.
This is a lot of work to do, and FindAllTags is an RPC that's executed
quite often. In repos with many tags, it's shown to be hugely expensive
with lots of small allocations and bumping back-and-forth between the
Goroutines.
We can get rid of the second and third pipeline steps though by playing
a trick: git-for-each-ref(1) has the ability to conditionally print
things depending on what the current object is. Most importantly, it
allows us to print another line if the object type is a tag. We can thus
combine steps one and two into a single step by always printing the
current object, and printing the current object's OID with "^{}"
appended if the current object is a tag. Furthermore, this allow us to
skip inspecting the object type and filtering in the third step given
that git-for-each-ref(1) has already asserted object types for us.
Convert the code to do this. This both reduces the pipeline steps by
half and gets rid of one of two git-cat-file(1) processes. Together, we
should see a notable performance improvement with this.
Changelog: performance
|
|
The GetTagSignatures RPC uses a three-level pipeline from `Revlist()` to
`CatfileInfo()` to `CatfileObject()`. The second step isn't needed
anymore though given that `Revlist()`'s iterator can now be directly
passed to the `CatfileObject()` pipeline step.
Get rid of the useless pipeline step and thus avoid spawning a catfile
process.
Changelog: performance
|
|
The ListCommits RPC uses a three-level pipeline from `Revlist()` to
`CatfileInfo()` to `CatfileObject()`. The second step isn't needed
anymore though given that `Revlist()`'s iterator can now be directly
passed to the `CatfileObject()` pipeline step.
Get rid of the useless pipeline step and thus avoid spawning a catfile
process.
Changelog: performance
|
|
The ListLFSPointers RPC uses a three-level pipeline from `Revlist()` to
`CatfileInfo()` to `CatfileObject()`. The second step isn't needed
anymore though given that `Revlist()`'s iterator can now be directly
passed to the `CatfileObject()` pipeline step.
Get rid of the useless pipeline step and thus avoid spawning a catfile
process.
Changelog: performance
|
|
Depending on whether object data was requested in ListBlobs and
ListAllBlobs, we either only iterate over the `CatfileInfo()` pipeline
step to extrat information or we use `CatfileObject()` to also extract
the blobs' data. In the latter case we spawn `CatfileInfo()` regardless
of whether we actually require it though, and use it as intermediate
step between `Revlist()` and `CatfileObject()` that doesn't serve any
purpose.
Now that we can pass the `Revlist()` iterator into `CatfileObject()`
directly, we can get rid of this pipeline step altogether and thus
improve performance by getting rid of its process.
Changelog: performance
|
|
The pipeline architecture in the gitpipe package is currently quite
strict given that we cannot use e.g. the `Revlist()` iterator as input
to the `CatfileObject()` step. It's thus hard to mix-and-match pipeline
steps.
Lift this limitation by introducing a new ObjectIterator interface which
is common to all iterators: all it provides is access to the object ID
and name. It's thus sufficient to be used as input iterator for all of
the pipeline steps which do accept an input iterator.
|
|
Add a new option which allows the caller to specify the format to be
used by git-for-each-ref(1). While the actual format must always be in
the format '%(objectname) %(refname)', the caller can play some tricks
to inject additional lines into the format via conditional format items
or may change the refname, if he intends to.
The new option is not yet used.
|
|
The `RevisionFilter()` pipeline step is currently an extra pipeline step
with its own set of channels. This is inefficient given that we have an
extra pair of of channels on which we have to receive data from and send
data on. Given that filtering itself should typically be fast, the added
complexity isn't really worth it at all.
Convert the filter into an option for the `Revlist()` pipeline step such
that it runs before sending it down the pipeline. Given that it's been
unobvious whether "filtering" needs to return `true` or `false` to skip
over an object, the new option now has "skipping" semantics, which
should avoid that confusion. Note that this reverses the filter logic
though.
|
|
The `CatfileInfoFilter()` pipeline step is currently an extra pipeline
step with its own set of channels. This is inefficient given that we
have an extra pair of Go channels on which we have to receive data from
and send data on. Given that filtering itself should typically be fast,
the added complexity isn't really worth it at all.
Convert the filter into an option for the `CatfileInfo()` pipeline step
such that it runs before sending it down the pipeline. Given that it's
been unobvious whether "filtering" needs to return `true` or `false` to
skip over an object, the new option now has "skipping" semantics, which
should avoid that confusion. Note that this reverses the filter logic
though.
|
|
This removes a workaround for a bug in Git. The bug got fixed in Git
2.31.0 by
https://gitlab.com/gitlab-org/git/-/commit/ad5df6b782a13854c9ae9d273dd03c5b935ed7cb.
Because the minimum supported Git version in Gitaly is currently
2.31.0, we no longer need the workaround in Gitaly.
Changelog: other
|
|
This removes the pack_objects_hook_with_sidechannel feature flag and
makes gitaly-hooks always call PackObjectsHookWithSidechannel instead
of PackObjectsHook. The server implementation of PackObjectsHook will
be removed in a later GitLab release.
Changelog: removed
|
|
This commit updates the valid_primaries view to perform joins using
repository_id. This allows postgres to optimize the query by pushing the
repository_id filtering deeper into the query which avoids fetching unneeded
rows. The version of the query prior to this commit ends up getting valid
primary candidates for all repositories prior to filtering by repository id
which kills the performance.
|
|
The backwards_compatibility CI job runs the tests with the latest
migrations against the test suite from the previous version. While
testing the backwards compatibility of the migrations is important,
the current approach has problems that make it unfeasible in practice.
1. The CI job figures out the previous version by reading the VERSION
file from Gitaly's repo. This version file appears to be unupdated
in Gitaly's master branch. As it is, it points to 14.3.0-rc2 when
in fact it should point to some release candidate of 14.4. This means
the tests are not against the latest release but the one before that.
2. Some assertions the tests make fail with newer migrations even if
everything would be fine with a real setup. Some tests haven't setup
their test state to fully match the production setup, and thus fail
when the newer migrations are applied. In the same fashion, some tests
fail with newer migrations even if they've setup their test state
correctly. They may however make assertions on how the state looks like
after running some code, yet the new migrations cause the state to look
different from the asserted state. As an example, Praefect's deletion
handling was changed in gitlab-org/gitaly!3906. The new code removes all
records of a repository when the repository is deleted. The old code
testing deletion fails as it still expects the records to be in place.
The new code is backwards compatible though and would process deletion
jobs scheduled by the old code.
While testing for backwards compatibility is important, this commit removes
the job as it's not clear how to solve the problems. This unblocks further
fixes from being merged.
|
|
catfile: Fix cache eviction potentially hanging
Closes #3854
See merge request gitlab-org/gitaly!3987
|
|
Extract checksum calculation
See merge request gitlab-org/gitaly!3963
|
|
Upgrade libgit2 to v1.2.0
See merge request gitlab-org/gitaly!3966
|
|
git: Update to v2.31.1
See merge request gitlab-org/gitaly!3976
|
|
Create incremental backups
See merge request gitlab-org/gitaly!3937
|
|
Fix confused behavior of creating and moving files overwriting directory
Closes #3284 and #3762
See merge request gitlab-org/gitaly!3844
|
|
When evicting catfile processes from the cache, then we first try to
close the cached process via calling `close()`, and then cancel the
process's context. Given that the command is created with `SetupStdin`,
the command package will make sure to close stdin for us. But it's not
closing stdout, which means that if the process has any data pending
which it is trying to write to stdout, then it may hang forever.
In theory, only processes which are not dirty will ever be part of the
catfile cache. But given that git-cat-file(1) always appends a newline
to each object, we will also return processes to the cache which have
exactly one byte pending, which is the expected newline. So it can
definitely happen that a process ends up in the cache even though it's
still got pending data, and this is even the default assumption. What's
unclear is whether a process should in fact block in this write if it
got a single byte pending -- but given that we use pipes to communicate
with the process which are indeed synchronous, I can imagine that to
happen. And we did see pipeline failures where eviction of the cache
was hanging with a git-cat-file(1) process still lingering around.
Fix the potential deadlock by first cancelling the context. This will
cause us to send a SIGKILL signal to the process, which should unblock
it and thus also eviction of the cache.
Changelog: fixed
|
|
The `stack` structure is tracking a set of processes in a quasi-stack.
The name is weirdly chosen though and doesn't really help in
understanding what it's supposed to do.
Rename it to `processes` to hopefully make it easier to understand.
|
|
praefect: Fix formatting issue
See merge request gitlab-org/gitaly!3986
|
|
catfile: Drop inefficient Batch interface
Closes #3841
See merge request gitlab-org/gitaly!3969
|
|
Fix failing tests
See merge request gitlab-org/gitaly!3982
|
|
Reformat info service tests with gofumpt to fix a formatting issue.
|
|
It wasn't clear why the error unwrapping was needed, and this resulted
in an ugly anonymous error type assertion. This was cleaned up by
immediately converting the error to a known sentinel.
|
|
Now that incremental backups can be created, we add the boilerplate in
order to expose this to the commandline.
|
|
The created incremental backups are bundles where the targets of the
refs of the previous backup have been negated. This means that `git
bundle create` will stop traversing once it finds these targets and
hence the created bundle should not contain any of the commits from the
previous backup.
Changelog: added
|
|
Fix flaky test TestReconciler
Closes #3850
See merge request gitlab-org/gitaly!3984
|
|
A TestReconciler's subtest is asserting that only one 'delete-replica' type
job is scheduled even if multiple candidates are available. As there are two
candidates, the test can schedule a 'delete-replica' job for either of the
candidates. Previously this was not flaky as the query happened to be scheduling
the job always for the storage-2 which came lexicographically before storage-3.
This ordering is implicit in the query and thus not guaranteed. Since then, the
repository ID was added to the query which changed the implicit ordering. This
commit fixes the flaky test by adding 'storage-3' as the other possible delete-replica
receiving storage in the test case. Now the assertion is that either the storage-2
or storage-3 receive the job but not both.
|
|
TestReconciler is currently flaky as some of the tests cases contain
randomness. In order to test these random cases, we need to be able
to assert against multiple sets of replication jobs. This allows for
asserting against the random outcome by defining the produced replication
jobs should match one of the expected sets.
|