Historically, FetchInternalRemote returned a boolean to indicate
whether the fetch failed. This makes debugging ReplicateRepository more
difficult than it needs to be. Instead, we now return the error so it
can be diagnosed without needing to scrape logs.
Changelog: changed
|
|
Setting GITALY_DISABLE_FETCH_INTERNAL_REMOTE_ERRORS to 1 will disable
the feature.
|
|
Convert remaining LFS RPCs to use new pipeline code
See merge request gitlab-org/gitaly!3588
|
|
Extract typed environment variable helpers
See merge request gitlab-org/gitaly!3580
|
|
Write commit graph using bloom filter
Closes #3545
See merge request gitlab-org/gitaly!3545
|
|
Bloom filters speed up execution of 'git log' and other commands
that take a pathspec argument. Let's enable Bloom filters in the
commit graph by passing the --changed-paths flag to git commit-graph.
If Bloom filters were not used for the existing commit-graph files,
we have to remove them and re-create the commit-graph files, as the
`--changed-paths` flag would otherwise be skipped for them.
The --split=replace strategy retroactively computes Bloom filters
over the earlier layers and replaces the old commit-graph file or
chain with a new set of files.
If commit-graph creation is already in progress, the GarbageCollect
call will fail, as it is not allowed to run multiple graph-file
creation operations for the same repository; the operation uses a
lock file to handle concurrent commit-graph creation.
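Schematically, the resulting invocation boils down to something like the
following (a simplified sketch, not the exact flag set used here):

```go
package housekeeping

import (
	"context"
	"os/exec"
)

// writeCommitGraph is a minimal sketch of the invocation described above,
// assuming a plain exec call; the real code goes through Gitaly's command
// factory and takes the lock mentioned in the commit message.
func writeCommitGraph(ctx context.Context, repoPath string) error {
	cmd := exec.CommandContext(ctx, "git", "-C", repoPath,
		"commit-graph", "write",
		"--reachable",     // include every commit reachable from the refs
		"--changed-paths", // compute Bloom filters for modified paths
		"--split=replace", // merge all layers, recomputing filters for old ones
	)
	return cmd.Run()
}
```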
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/3545
|
|
Fix Unix socket address handling following gRPC upgrade
Closes #3664
See merge request gitlab-org/gitaly!3592
|
|
In 29f8bbbc, we upgraded our gRPC library version. The upgrade
introduced some differences in how Unix socket addresses are handled.
We previously always prefixed Unix addresses with 'unix://'. In the
new version of gRPC, this does not work for relative paths, as the first
segment following the slashes is interpreted as an authority. Instead,
we should use 'unix:relative/path', 'unix:/absolute/path' or
'unix:///absolute/path'. As it is, Gitaly configurations that use
relative paths for the internal socket directory do not work, as they
result in the address 'unix://relative/path'. This commit changes the
two non-test locations I found by grepping for 'unix://' to prefix the
socket only with 'unix:' so that both relative and absolute paths work.
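A minimal sketch of the fix at the dialling side (simplified from the
real call sites):

```go
package bootstrap

import "google.golang.org/grpc"

// dialInternalSocket is a simplified sketch of the fix: prefixing the path
// with "unix:" (and not "unix://") lets gRPC resolve both relative and
// absolute socket paths, whereas with "unix://" the first path segment
// would be parsed as an authority.
func dialInternalSocket(socketPath string) (*grpc.ClientConn, error) {
	return grpc.Dial("unix:"+socketPath, grpc.WithInsecure())
}
```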
Changelog: fixed
|
|
Reduce memory allocations in diff parser
See merge request gitlab-org/gitaly!3576
|
|
Add version suffix to gitaly-git2go binary
See merge request gitlab-org/gitaly!3587
|
|
These helpers should make parsing typed environment variables more
consistent.
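For illustration only, a typed helper could look roughly like this (the
helper name and signature here are hypothetical):

```go
package env

import (
	"fmt"
	"os"
	"strconv"
)

// GetBool is a hypothetical example of such a typed helper: it parses a
// boolean environment variable and falls back to a default value when the
// variable is unset, returning an error only for unparsable values.
func GetBool(name string, fallback bool) (bool, error) {
	value, ok := os.LookupEnv(name)
	if !ok {
		return fallback, nil
	}

	parsed, err := strconv.ParseBool(value)
	if err != nil {
		return fallback, fmt.Errorf("get bool %q: %w", name, err)
	}

	return parsed, nil
}
```

A setting like GITALY_DISABLE_FETCH_INTERNAL_REMOTE_ERRORS=1 then becomes a
single, consistently parsed lookup.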
|
|
Zero-downtime deployment replaces the binaries on disk and then sends
a HUP. When the gitaly-git2go binary has been replaced with a new
major version but the gitaly process is still from the previous
release, gob fails to process the marshalled data because it operates
on structs with a different import path (the module's major version is
part of the path). This change introduces a version suffix for the
gitaly-git2go binary, which allows gitaly to call the matching binary.
At deployment time the new binary is placed next to the old one, which
remains untouched: the still-running old gitaly process keeps
referring to the old gitaly-git2go binary, and after the HUP the new
gitaly binary refers to the new version of gitaly-git2go.
The BinaryPath function supports one corner case: for gitlab.com the
deployment process has not yet been changed, but it already ships a
new v14 binary of gitaly-git2go. We therefore first check whether the
versioned binary is present and, if not, fall back to the raw name
without a suffix. That ad-hoc fix should be removed once the
deployment process is changed.
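A rough sketch of that lookup (the real function signature differs):

```go
package git2go

import (
	"os"
	"path/filepath"
)

// BinaryPath is a simplified sketch of the fallback described above: prefer
// the version-suffixed gitaly-git2go binary and fall back to the unsuffixed
// name until all deployments ship the suffixed one.
func BinaryPath(binDir, majorVersion string) string {
	versioned := filepath.Join(binDir, "gitaly-git2go-"+majorVersion)
	if _, err := os.Stat(versioned); err == nil {
		return versioned
	}
	return filepath.Join(binDir, "gitaly-git2go")
}
```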
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3647
|
|
[ci skip]
|
|
transaction: Drop fallback code to connect to Praefect via server info
See merge request gitlab-org/gitaly!3585
|
|
Convert `GetLFSPointers()` to use our new pipelining code. This will
allow us to eventually get rid of the old code in favor of this new and
much more flexible architecture. Like `ListLFSPointers()`, the new code
is currently gated by a feature flag.
|
|
Convert `ListAllLFSPointers()` to use our new pipelining code. This will
allow us to eventually get rid of the old code in favor of this new and
much more flexible architecture. Like `ListLFSPointers()`, the new code
is currently gated by a feature flag.
|
|
Right now, we only have a single RPC which uses the new pipeline code to
enumerate LFS pointers, but we're about to also convert the remaining
ones to use it. These RPCs will require the same logic to send out LFS
pointers as `ListLFSPointers()`.
Extract the code into a separate function to make it reusable and avoid
code duplication.
|
|
We're about to convert `ListAllLFSPointers()` to use our new pipeline
code. What this RPC does is enumerate all existing blobs via `git
cat-file --batch-all-objects`, regardless of whether they are referenced
by anything or not.
Unfortunately, our catfile cache doesn't allow for such usage, and thus
we can't reuse or adapt any of the existing pipeline steps. This commit
thus introduces a new pipeline step which uses the above command to
return all objects that are part of the repository.
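A hedged sketch of what such a step looks like (not the actual Gitaly
code, which goes through the command factory and shared result types):

```go
package pipeline

import (
	"bufio"
	"context"
	"os/exec"
)

// catfileAllObjects is a simplified sketch of the step described above: it
// spawns `git cat-file --batch-check --batch-all-objects` directly instead
// of going through the catfile cache, and streams one line of object info
// per object over a channel.
func catfileAllObjects(ctx context.Context, repoPath string) <-chan string {
	out := make(chan string)

	go func() {
		defer close(out)

		cmd := exec.CommandContext(ctx, "git", "-C", repoPath,
			"cat-file", "--batch-check", "--batch-all-objects", "--unordered")

		stdout, err := cmd.StdoutPipe()
		if err != nil {
			return
		}
		if err := cmd.Start(); err != nil {
			return
		}

		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			select {
			case out <- scanner.Text(): // "<oid> <type> <size>"
			case <-ctx.Done():
				return
			}
		}

		_ = cmd.Wait()
	}()

	return out
}
```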
|
|
Refactor blob service to use Go pipelines for LFS objects
See merge request gitlab-org/gitaly!3584
|
|
Now that the pipelining code is in place, convert `ListLFSPointers()` to
use the new pipeline. The change is currently implemented behind a
feature flag to catch any issues with it, but eventually we can get rid
of the old implementation and only retain the new and much more flexible
pipeline architecture.
|
|
We're about to add a second implementation alongside the existing
`ListLFSPointers()` implementation. Given that both callsites will
require validation of the specified revisions, let's move that
validation into `ListLFSPointers()` directly so that the code is shared.
Given that no other callers of `findLFSPointersByRevision()` exist, this
doesn't relax any semantics.
|
|
Now that we have the infrastructure in place to set up a complete
pipeline which reads a set of revisions, extracts information about each
of the objects and then finally reads the objects from disk, this commit
adds a set of tests which exercise the assembled pipeline end to end.
|
|
While we now have a fully functional pipeline in place, it's not yet
able to filter down objects based on the information we got. This is now
implemented via a set of filters which can be inserted as intermediate
steps to reduce the number of objects passed onto the next stage.
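Schematically, such a filter is just another channel stage (types and
names here are hypothetical):

```go
package pipeline

import "context"

// ObjectInfo is a hypothetical stand-in for the result type produced by the
// `cat-file --batch-check` step; the real pipeline uses its own result type
// which also carries errors.
type ObjectInfo struct {
	Oid  string
	Type string
	Size int64
}

// filterObjects is a sketch of such an intermediate step: it forwards only
// the objects matching the given predicate from one stage to the next, so
// later (more expensive) stages see fewer objects.
func filterObjects(ctx context.Context, in <-chan ObjectInfo, keep func(ObjectInfo) bool) <-chan ObjectInfo {
	out := make(chan ObjectInfo)

	go func() {
		defer close(out)

		for info := range in {
			if !keep(info) {
				continue
			}

			select {
			case out <- info:
			case <-ctx.Done():
				return
			}
		}
	}()

	return out
}
```

An LFS-pointer filter would then keep only blobs up to the maximum pointer
size before the expensive `git cat-file --batch` stage runs.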
|
|
In order to unify infrastructure when filtering down a list of revisions
to a list of objects, we're creating a set of pipeline steps which first
enumerate all objects, then extract information about these objects and
finally read the whole objects. This pipeline will allow us to flexibly
put together steps and filter down the object set at intermediate steps.
This commit implements the final pipeline step, which receives as input
a channel of object infos and reads each of the associated objects
fully via `git cat-file --batch`.
|
|
In order to unify infrastructure when filtering down a list of revisions
to a list of objects, we're creating a set of pipeline steps which first
enumerate all objects, then extract information about these objects and
finally read the whole objects. This pipeline will allow us to flexibly
put together steps and filter down the object set at intermediate steps.
This commit implements the second pipeline step, which receives as input
a channel of object IDs and outputs information about each of the
objects via `git cat-file --batch-check`.
|
|
In order to unify infrastructure when filtering down a list of revisions
to a list of objects, we're creating a set of pipeline steps which first
enumerate all objects, then extract information about these objects and
finally read the whole objects. This pipeline will allow us to flexibly
put together steps and filter down the object set at intermediate steps.
This commit implements the first pipeline step, which enumerates all
objects reachable from a set of revisions via git-rev-list(1).
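A hedged sketch of this first step, following the same channel-based shape
as the later steps (not the actual Gitaly code):

```go
package pipeline

import (
	"bufio"
	"context"
	"os/exec"
)

// revlist is a simplified sketch of the first step: it enumerates all
// objects reachable from the given revisions via git-rev-list(1) and streams
// them over a channel so that later steps can consume them lazily.
func revlist(ctx context.Context, repoPath string, revisions ...string) <-chan string {
	out := make(chan string)

	go func() {
		defer close(out)

		args := append([]string{"-C", repoPath, "rev-list", "--objects"}, revisions...)
		cmd := exec.CommandContext(ctx, "git", args...)

		stdout, err := cmd.StdoutPipe()
		if err != nil {
			return
		}
		if err := cmd.Start(); err != nil {
			return
		}

		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			select {
			case out <- scanner.Text(): // "<oid> [<path>]" per object
			case <-ctx.Done():
				return
			}
		}

		_ = cmd.Wait()
	}()

	return out
}
```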
|
|
Given that all transactional voting nowadays happens via the
backchannel ID, there is no need anymore for the old Praefect server
structure. Let's delete it.
|
|
We do not use the Praefect server structure for anything anymore. Let's
strip it out from our tests such that we can remove it altogether.
|
|
Given that all transactional voting happens via the backchannel now,
there is no need to inject the Praefect server info anymore. Let's drop
it.
|
|
Now that all transactional voting uses the backchannel to connect back
to Praefect, we do not need the Praefect server info anymore. Let's drop
it.
|
|
Now that we only use the backchannel to connect back to Praefect for
transactional voting, there is no need to inject Praefect server info
into the Ruby sidecar anymore. Let's drop it.
|
|
We do not need the Praefect server info in any of the transactional
voting functions anymore given that we now always use the backchannel,
and the backchannel ID is redundantly encoded both in the Praefect
server structure and in the transaction information. Drop the Praefect
server parameter from voting functions.
|
|
Given that the backchannel voting code has been unconditionally enabled
in v13.12, we can now drop the old fallback code which was using the
Praefect server info if no backchannel was set.
|
|
The `CloneFromPoolInternal()` RPC is responsible for first creating the
desired target repo by cloning from a pool repository and then updating
the pool via a fetch from the source repo. While the former step happens
via a direct call to git-clone(1), we connect to the remote service's
`FetchInternalRemote()` RPC for the latter.
This second step is problematic in the context of transactions though:
we call `FetchInternalRemote()` with the incoming context having been
converted to an outgoing context. The outgoing context thus retains
information about any ongoing transactions and also about the
backchannel ID. Given that this is a mutating RPC, the remote side will
now try to vote on the transaction and dial back via the backchannel,
but given that the backchannel ID referred to the original Praefect node
and not to the now-intermediate Gitaly node, this cannot work and will
thus fail.
Now that we have `FetchInternalRemote()` exposed directly without having
to connect to the remote service, we can easily fix the issue by calling
this function directly.
No new tests are added because this failure will be exposed by
subsequent commits which remove the `PraefectServer` structure.
|
|
The `git.WithRefTxHook()` nowadays only requires a `repository.GitRepo`
structure as input, but we still perform a cast to `gitalypb.Repository`
in `cloneFromPool()`. Drop this useless cast and pass in the repo
directly.
|
|
We'll need to be able to call `FetchInternalRemote()` on the same Gitaly
node without creating a connection to the remote service. Split out the
logic of `FetchInternalRemote()` into a separate non-receiver function
to allow for this.
|
|
We'll need to call `getRemoteDefaultBranch()` from a context where we
don't have a remote server as receiver. Pass in the connection pool
explicitly and make it a normal function to prepare for this.
|
|
Instead of having to pass around the command factory, we nowadays often
use the `git.RepositoryExecutor` interface which gets implemented by our
localrepo structure. Let's refactor `SetDefaultBranchRef()` to use this
interface.
|
|
blackbox: Code cleanups for blackbox and related stats package
See merge request gitlab-org/gitaly!3578
|
|
The "analyzehttp.go" file's most prominent member is the `Clone`
structure. Let's rename the file to "clone.go" to better match its
contents. This requires us to fix one last missing comment on the
`Clone` structure.
|
|
The `Clone` structure currently holds both parameters and results. Such
usage is always a code smell as it conflates two different concerns and
makes it harder to use correctly for the caller.
Let's disentangle both concerns by instead accepting parameters as part
of a refactored `PerformClone()` function. This function is not a
receiver anymore, but instead returns a newly constructed `Clone`
structure which only contains results.
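Schematically, the split looks like this (field and type names are
hypothetical stand-ins, not the actual stats API):

```go
package stats

import "time"

// CloneParams and Clone illustrate the pattern rather than the real fields:
// inputs live in a parameters struct, outputs in a separate results struct
// returned by the function.
type CloneParams struct {
	URL         string
	Interactive bool
}

type Clone struct {
	ReferenceDiscovery time.Duration
	FetchPack          time.Duration
}

// PerformClone consumes parameters and returns a freshly constructed result,
// instead of being a method that reads and writes fields on one conflated
// struct.
func PerformClone(params CloneParams) (Clone, error) {
	start := time.Now()
	// ... perform reference discovery and the fetch-pack request against
	// params.URL, measuring each stage ...
	return Clone{
		ReferenceDiscovery: time.Since(start),
		FetchPack:          time.Since(start),
	}, nil
}
```

Callers that only need the results never see the parameters, and vice versa.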
|
|
The `Post` structure hosts statistics about a POST request against
git-upload-pack(1). Without knowing details about how git-upload-pack(1)
works though, one wouldn't be able to tell what it actually represents:
everything related to fetching the packfile.
Let's rename the structure to `HTTPFetchPack`, which more clearly labels
what it actually is about. While at it, this also refactors the way it's
constructed: instead of being constructed via the `Clone` structure,
it's now a standalone structure and can be constructed via
`performFetchPack()`. This disentangles concerns and makes it
easier to reason about the code.
|
|
The `Get` structure hosts statistics about a GET request against
git-upload-pack(1). Without knowing details about how git-upload-pack(1)
works though, one wouldn't be able to tell what it actually represents:
everything related to the reference discovery.
Let's rename the structure to `HTTPReferenceDiscovery`, which more
clearly labels what it actually is about. While at it, this also
refactors the way it's constructed: instead of being constructed via the
`Clone` structure, it's now a standalone structure and can be
constructed via `performReferenceDiscovery()`. This disentangles
concerns and makes it easier to reason about the code.
|
|
When doing a POST to git-upload-pack(1), we send along a set of
"want"s which tell the remote side which references we want to receive.
So while this depends on us knowing which references exist in the first
place, it still is inarguably a property of the POST and not of the GET
which discovers available refs.
Let's make the relation more explicit by 1. moving wants into the `Post`
structure and 2. accepting the announced references as a parameter when
doing the POST.
|
|
Right now, it's really hard to reason about which functions on the
`Clone`, `Get` and `Post` structures come from where given that we use
inlined structures all the time. To clean this up a bit and make code
easier to follow, this commit de-inlines all embedded structures into
proper member variables.
|
|
The code to parse the git-fetch-pack(1) body is tightly coupled with the
code which drives the actual request. This makes it hard to reason about
both concerns and also makes it hard to just parse a fetch pack body
without doing the actual request.
Refactor the code and split out a new `FetchPack` structure which
contains all logic to parse from an `io.Reader` without having to care
about the request itself. This structure is the equivalent of the
already existing `ReferenceDiscovery` structure.
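Schematically (with hypothetical fields and a stubbed-out parser), the
decoupled shape looks like this:

```go
package stats

import (
	"io"
	"time"
)

// FetchPack is a sketch of the decoupled parser: it only knows how to
// consume a fetch-pack response body from an io.Reader, so it can be driven
// from an HTTP request, a recorded fixture or a test without any request
// plumbing.
type FetchPack struct {
	ResponseBody time.Duration
	PayloadSize  int64
}

// Parse reads the whole body and records simple aggregate statistics; the
// real parser additionally understands pkt-lines, sideband packets and
// progress output.
func (f *FetchPack) Parse(body io.Reader) error {
	start := time.Now()

	n, err := io.Copy(io.Discard, body)
	if err != nil {
		return err
	}

	f.PayloadSize = n
	f.ResponseBody = time.Since(start)
	return nil
}
```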
|
|
Now that `printInteractive()` doesn't append a newline anymore, we can
convert other callers which up to now had to manually check whether
the `interactive` flag was set.
|
|
When `printInteractive()` is called, the assumption is that all lines
should be terminated by a newline. While that's true right now, this
assumption keeps us from using this function in other places which thus
need to manually check whether the `interactive` flag is set.
Do not append the newline in `printInteractive()` and adjust callers
accordingly to prepare for wider use of that function.
|
|
The gitaly-debug executable can be used to clone from a remote
repository and print several statistics about the clone itself, like for
example how long several stages took. Because of this, the stats package
supports an "interactive" flag which causes it to print various events
to stdout: only if it's set to true will `printInteractive()` actually
print anything.
One code smell we have right now is that `printInteractive()` returns an
error, but it's never actually checked. And we probably don't even want
to: it's only used for gitaly-debug to print to stdout, so it's harmless
enough if it would fail.
Get rid of the error code and just ignore when `fmt.Printf()` fails and
remove the golang-ci exemption.
|
|
Now that we have a Blackbox structure which can host all dependencies,
let's move Prometheus metrics into this structure to get rid of one more
source of globals.
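For illustration, with approximate names, moving the metrics off package
globals looks roughly like this:

```go
package blackbox

import "github.com/prometheus/client_golang/prometheus"

// Blackbox is a sketch of the dependency-carrying structure: metrics become
// fields instead of package-level globals and get registered against the
// registerer that is passed in.
type Blackbox struct {
	getFirstPacket prometheus.ObserverVec
}

// New wires up the metrics and registers them, so tests can pass their own
// registry and no global state leaks between instances.
func New(registerer prometheus.Registerer) *Blackbox {
	bb := &Blackbox{
		getFirstPacket: prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name: "gitaly_blackbox_git_http_get_first_packet_seconds",
			Help: "Time to first Git packet in the GET /info/refs response",
		}, []string{"probe"}),
	}
	registerer.MustRegister(bb.getFirstPacket)
	return bb
}
```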
|