Age | Commit message (Collapse) | Author |
|
Signed-off-by: Balasankar "Balu" C <balasankar@gitlab.com>
|
|
Set HEAD from backup bundle files
Closes #3856
See merge request gitlab-org/gitaly!4144
|
|
commit: Always enable efficient recursive tree listings
Closes #3915
See merge request gitlab-org/gitaly!4146
|
|
glsql: Compile DSN from configuration
See merge request gitlab-org/gitaly!4138
|
|
Add 'praefect metadata' subcommand
Closes #3481
See merge request gitlab-org/gitaly!4122
|
|
git: Allow to build and deploy Gitaly with bundled Git binaries
See merge request gitlab-org/gitaly!4046
|
|
Praefect doesn't yet have any way of fetching a repository's metadata.
With the addition of replica path, this is becoming increasingly important
as users may need to figure out where on the disk a given repository is
stored. They may also want to figure out which repository on the disk a
replica actually belongs to. Likewise, it's currently not possible to
really see which Gitaly node is acting as a primary of a repository. The
dataloss subcommand does provide some of this information but it only
returns information when a repository is not fully replicated. To address
this gap, this commit adds 'praefect metadata' subcommand that can be used
to fetch a repository's metadata. Users can fetch the metadata either by
the repository's ID or by its virtual storage and relative path to accommodate
all possible queries.
Changelog: added
|
|
Praefect has methods in the datastore to retrieve a repository's
metadata. This commit adds RPC definitions that allow for retrieving
the metadata from cluster. This RPC endpoint is going to be used by
a command line tool to retrieve a repository's metadata and by Gitaly
tests that run with Praefect in front and peek in to the filesystem.
Changelog: added
|
|
Add a new test job that exercises our ability to use bundled Git
binaries to drive our test suite.
|
|
Now that we can build bundled Git binaries, this commit introduces
support into the Git command factory to support them. If a new option
"git.use_bundled_binaries" is set to `true`, then we automatically
derive the Git binary's path from the binary directory of Gitaly.
Note that we must go through some hoops though given that the bundled
binaries have a "gitaly-" prefix. While it's not much of a problem when
executing Git directly, some Git commands require it to find auxiliary
helper binaries like "git-remote-http" or "git-receive-pack". To support
these cases, we thus create a temporary directory and symlink all
binaries Git expects to be present into it. By pointing GIT_EXEC_PATH at
this directory, we can thus trick Git into picking the correct set of
binaries.
This new bundled mode will required configuration changes by admins.
|
|
The `SetGitPath()` function is quite hard to read given that the
different cases are intermingled without clear separation from each
other. Refactor it to use a switch statement to more clearly show which
cases there are and make it readily extensible.
|
|
In order to allow the use of feature flags to toggle different Git
versions at runtime, Gitaly needs to take much more ownership of
binaries such that it knows where to find the Git binaries without
having the admin to always set up different paths whenever there are two
different versions of Git. As a first step towards taking more
ownership, this commit introduces the new `WITH_BUNDLED_GIT` variable to
our Makefile: if set, we will build and install required Git binaries.
In order to not accidentally overwrite any preexisting binaries, the
required binaries all have a "gitaly-" prefix.
Note that ideally, we'd want to install only the "git" binary, which
nowadays contains almost all of the functionality required by us. One
exception though is cURL-based remote helpers, which are still split out
into a separate binary "git-remote-http". Given that we do interact with
HTTP-based Git servers we thus have to build and install both binaries.
This target is currently completely optional, and we want to continue to
support running Gitaly with an external Git executable not built by
Gitaly. It may eventually replace the current "make git" target though,
which sets up a complete Git installation for us.
|
|
Our rspecs raise a bunch of warnings when creating repositories about
missing template directories if the template directory is indeed missing
from the host it's running on. We do not care about templates though and
have moved away from them already, where Go already knows to not use
them at all anymore. The upcoming change to bundle Git with Gitaly will
make it the norm that there are no templates anymore, and thus we'd
always start to hit this warning.
Fix the issue by explicitly telling Git that it shouldn't use the
template dir when initializing test repos.
|
|
With the upcoming bundled Git changes, we'll start to see failures in
the ssh tests. The test failures are such that some tests seemingly
weren't able to find the Git executable anymore, but interestingly
enough output also contained references to a temporary directory which
ought to contain an "output" file. As it turns out, this output stems
from a testcase which writes its own gitaly-hooks script. So somehow,
this script is leaking into other tests. And given that this script also
contains the path to the Git binary hosted in our current temporary Git
execution env, it's not much of a surprise that subsequent tests cannot
find it anymore.
As it turns out, the root cause for this is the recent introduction of
using hardlinks for linking binaries which were built just-in-time into
place in a073666ab (testcfg: Try to fix ETXTBSY errors with
gitaly-git2go executable, 2021-11-03). Because the test is first
building gitaly-hooks and then overwrites the resulting binary again,
the end result is that we overwrite the shared gitaly-hooks binary.
Fix this issue by not building the gitaly-hooks binary anymore.
Furthermore, to help avoid such issues in the future,
`WriteExecutable()` as changed such that it will fail if the target
file exists already. This requires one more fix for merge tests, which
write the same script Git hook twice.
|
|
Before we've reimplemented recursive listings of tree entries to use
git-ls-tree(1), the `treeEntries()` helper function was used to list
entries both for recursive and non-recursive queries. But with the
feature flag removed, the helper is only ever used to do non-recursive
listings.
Remove the arguments which had been used to implement the recursion and
the recursive call itself.
|
|
In commit 08507b227 (commit: Convert GetTreeEntries to use
git-ls-tree(1), 2021-11-08), we have introduced an alternative
implementation for the GetTreeEntries RPC for recursive listings, which
uses git-ls-tree(1) instead of manually parsing and walking down the
trees. This implementation has been rolled out on November 24th with the
expected results: recursive tree listings on production are ~5-7x faster
now.
Remove the feature flag and enable this new implementation by default.
Changelog: performance
|
|
git: Globally disable HTTP redirects
See merge request gitlab-org/gitaly!4131
|
|
glsql: increase db cleanup timeout
See merge request gitlab-org/gitaly!4141
|
|
While we've been careful to disable HTTP redirects in RPCs which
interact with remote repositories where the URL is user-supplied, it
feels a bit like whack-a-mole to always have to remember to supply the
required config option when adding any new calls. It's fragile, and
given that it can have security implications to allow redirects, this is
not an area of code we want to be fragile.
Improve this by instead globally disabling HTTP redirects for all Git
commands we're using which interact with a remote repository. Namely,
this includes git-clone, git-fetch, git-ls-remote, git-push and
git-remote.
Changelog: security
|
|
Previously backup relied on git-clone setting HEAD from bundle files but
now that we apply a series of bundles, the bundles are applied using
git-fetch instead. This means we need to apply our own HEAD setting
logic based off of the bundle.
ChecksumTestRepo was incorrectly calculating repository checksums
without HEAD.
Changelog: fixed
|
|
Add UpdateHead to FetchBundle
See merge request gitlab-org/gitaly!4076
|
|
Document object storage support in gitaly-backup
See merge request gitlab-org/gitaly!4133
|
|
[ci skip]
|
|
praefect: Add database read/write check
Closes gitlab#328493
See merge request gitlab-org/gitaly!4121
|
|
[ci skip]
|
|
Return a proper response on WikiUpdatePage failing on DuplicatePageError
Closes #3884
See merge request gitlab-org/gitaly!4033
|
|
on ci, we are hitting the timeout set here. It may be that pgbouncer is
taking a long to release these connections. Increasing the timeout to
account for this.
|
|
ci: Winter cleanup
See merge request gitlab-org/gitaly!4136
|
|
The only way to get a repository's metadata from Praefect is currently
the dataloss subcommand. However, the dataloss subcommand only works
if the repository is not fully replicated. This is not good as the admins
may be interested in the repository's metadata even if it is fully
replicated. This has become increasingly important with repository-specific
primaries and in the near future with Praefect generated relative paths.
To make it possible to get a repository's complete metadata always, this
commit generalizes the GetPartiallyAvailableRepositories and adds two
more methods for getting the metadata either by a repository's ID or by
it's (virtual_storage, relative_path). The new methods will be used in an
RPC that serves a new cli tool for getting the repository's metadata. The
query and the tests are shared between all of the methods to ensure they
stay in sync.
|
|
Add timeout to health check database inserts
See merge request gitlab-org/gitaly!4099
|
|
HealthManager currently does not apply a timeout to its database
updates. This can cause hard to diagnose issues where the health
checks are considered immediately outdated if the inserting takes
longer than the failover timeout. Currently the failover timeout is
10 seconds, so any inserts that take longer than 10 seconds are
immediately considered unhealthy when they complete. The inserts
really shouldn't take long. This commit adds a timeout for the inserts
so database writes taking too long end up being canceled and thus
logged as an error.
Changelog: fixed
|
|
HealthManager is now only sending an update on the first update. This
notification is used by Praefect in the main to ensure the first round
of health checks have been performed before proceeding to start the
servers. As we only ever send to the channel once and it is buffered,
this simplifies the send by removing the default case as this never
blocks.
|
|
HealthManager previously used to query for the health consensus and
see if it has changed. If it had, it signaled on a channel that would
then trigger an election run. We've since moved on to lazy elections
which do not require this sort of eventing. This commit removes the
consensus querying as it is not used for anything anymore.
The tests of the health consensus are still left in place though. The
healthy_storages view and it's logic still needs to be tested and this
is the most natural place to do so.
|
|
WaitForQueries is currently just checking the database has run the
expected number of queries as the last query from the given number of
connections. The tests that use this helper want to wait for a
query to be waiting on a lock. This ensures there is another
instance of the query actually blocked. This commit changes the function
to block until there is a given query blocked on a lock to better ensure
the correct case is hit. Additionally, the helper is not currently checking
the query is in the same database as the test is running. Parallel tests
running a matching query in a different database can cause this condition
to flake. This commit thus includes and additional check to ensure the query
is running in the same database as the current test.
|
|
catfile: Ensure structs are properly aligned in memory for 32-bit CPUs
See merge request gitlab-org/gitaly!4139
|
|
Specify the location of the junit report in the test_template anchor so
every job using the anchor has it.
|
|
The junit report is mentioned as normal artifact *and* report. This is
not needed, so remove it from the regular artifacts.
|
|
While we have a global default image, this default image is not
parameterized with the Ruby and Go version. Because of this, we have to
redefine the image in multiple locations, while we don't in others. This
is confusing, and ultimately completely unnecessary: we can just add the
variables in the default image given that we also specify both Ruby and
Go versions as default variables anyway.
Remove the duplicate image definitions and instead parameterize the
default image.
|
|
Nowadays, all tests use the Postgres database even when not using the
"test-with-praefect" target. As a result, each test job we have defined
includes both the "test_definition" and the "postgres_definition".
Remove this duplication by inlining the "postgres_definition" into the
"test_definition". While at it, also inline the "pgbouncer_definition"
into the only job which is using it.
|
|
The Praefect smoke test job sets up the Postgres service on its own,
even though we have the `postgres_definition` which would already go
most of the way of setting it up correctly.
Convert the test to use the shared Postgres service definition instead
to avoid code duplication. The only thing that needs to change for the
definition is that we didn't have a database name defined at all, which
is added as part of this commit.
|
|
The "coverage" job is executing tests just as all the other targets we
have, but it doesn't include the "test_definition". Fix it to instead
include "test_definition" such that we can iterate more easily on all
test jobs at once. This also lets us get rid of the stage definition
given that it's already part of the test definition.
|
|
The verify job runs, amongst other things, the lint target in the
Makefile. So the job to run `make lint` is redundant, therefore this
change removes the lint job.
|
|
We have two jobs which execute the "verify" and the "proto" targets,
respectively to check whether Go modules are tidy, whether the NOTICE
file is up-to-date, whether Protobuf files have been generated and to
run linting. The "verify" target does all the "proto" target already
does though, so the second job to explicitly call this target is
duplicating functionality we already have. The only additional thing it
brings to the table is that it will upload changed Protobuf files as
artifacts if this target fails.
Merge the "proto" job into the "verify" job to fix this duplication.
|
|
We don't use the "publish" stage at all. Drop it.
|
|
Job names we have are quite inconsistent, making it hard to spot jobs of
specific stages and jobs which are related to each other. Rename them to
have a common stage-specific prefix.
Ideally, we'd do the same for "analyze"-style jobs like linting or
static analysis. Most of these are included via external templates
though, so changing their names wouldn't work.
|
|
The CI jobs we create have grown all over the place without a strict
schema. This makes it harder than necessary to spot related jobs, e.g.
all jobs which execute Gitaly tests.
Reorder the jobs by stage and by functionality to make this easier.
|
|
hook: Fix custom hook errors not propagating correctly
Closes gitlab#342122 and gitlab#342607
See merge request gitlab-org/gitaly!4120
|
|
In GitLab 14.5, the use of `atomic.LoadInt64` caused a `panic: unaligned
64-bit atomic operation` on a Raspberry Pi2 platform. While we don't
officially support 32-bit processors, we have been supporting armfh
builds for years now.
From https://pkg.go.dev/sync/atomic#pkg-note-BUG: On ARM, 386, and
32-bit MIPS, it is the caller's responsibility to arrange for 64-bit
alignment of 64-bit words accessed atomically. The first word in a
variable or in an allocated struct, array, or slice can be relied upon
to be 64-bit aligned.
We can fix this problem by listing the int64 and int32 atomic fields in
order to ensure they are properly aligned. This does not change the
overall size of the structure.
For example, with Object:
Before: https://go.dev/play/p/MCk9TnxhkCY (80 bytes)
After: https://go.dev/play/p/83jdBOP-wpA (80 bytes)
Relates to https://gitlab.com/gitlab-org/gitaly/-/issues/3938
Changelog: fixed
|
|
glsql: wait for connections to be released
See merge request gitlab-org/gitaly!4132
|
|
In commit c68213377 (updateref: Generalize `PreReceiveError` according
to its usage, 2021-08-25), we have changed how pre-receive errors get
propagated to callers, where it's now instead using a more generalized
`HookError`. Part of this change was a refactoring of the error message
to be more descriptive: instead of printing hook errors printed to their
stdout/stderr as-is, we prepend the actual error we have been seeing to
pinpoint why hook execution failed.
As it turns out though, Rails requires the messages printed by custom
hooks to be returned as-is. This is to support a feature where hooks can
print messages either with a "GitLab:" or with a "GL-HOOK-ERR:" prefix,
which indicates to Rails that it should present the error message to the
user. Prepending the error message with unrelated metadata obviously
breaks this feature.
We are in the process of fixing the whole architecture to use structured
error messages instead, which contain error information in a more
detailed way and don't rely on parsing and/or prefixing error messages.
This doesn't work easily with custom hooks though: we have no control of
stdin and stderr of the command given that these may be connected to
git-receive-pack(1), which would in turn stream anything printed there
to the user. So we have no easy way of inspecting the error messages and
thus cannot properly bubble them up in a custom error type, either.
While we could introduce e.g. a `io.TeeReader` to enable for this, this
introduces additional complexity that may not be worth the hassle given
we're slowly moving towards a deprecation of custom hooks in the first
place.
So let's mark custom hook errors as such and add some special handling
in our hook updater so that both stderr and stdout are returned as-is.
At a later point, we should likely go the road via the `io.TeeReader`
such that we can properly use structured errors, too.
Changelog: fixed
|