Age | Commit message (Collapse) | Author |
|
This change should not be merged.
|
|
The test case TestStorageCleanup_AcquireNextStorage is flaky. To stop it
from blocking deploys, put it in quarantine.
Issue: https://gitlab.com/gitlab-org/gitaly/-/issues/3828
|
|
The verify job runs, amongst other things, the lint target in the
Makefile. So the job to run `make lint` is redundant, therefore this
change removes the lint job.
|
|
This change adds a jobs that also runs the quarantined tests, but the
job is marked to allow failures so it doesn't block deploys when the
pipeline on master is red due to flaky tests.
|
|
Specify the location of the junit report in the test_template anchor so
every job using the anchor has it.
|
|
The CI yaml file specifies default versions for tools like Ruby and
Golang. But the default image still uses older versions of these tools.
This change uses the default versions of these tools in the default
image. This removes the need the need to specify the image in many jobs.
|
|
The junit report is mentioned as normal artifact *and* report. This is
not needed, so remove it from the regular artifacts.
|
|
We are having issues with some tests being flaky and blocking deploys
because the pipeline on master is failing. This annoying, but often it's
not easy to quickly fix the flaky test.
This change allows us to put flaky tests in quarantine. The quarantined
tests are run in a separate job, but that job is marked to allow to
fail. This allows us to make the pipeline green again quickly, but still
have the tests running.
|
|
git: Bump minimum required Git version to v2.33.0
Closes #3815
See merge request gitlab-org/gitaly!3931
|
|
testserver: Fix Praefect resource exhaustion and assert Gitaly's healthiness
See merge request gitlab-org/gitaly!3945
|
|
Now that the minimum required Git version is v2.33.0 and newer, we can
assume that git-rev-list(1) can always handle object type filters.
Drop `SupportsObjectTypeFilter()` and adjust callsites accordingly.
Given that we now always set up those filters unconditionally, this
commit also gets rid of any additional pipeline steps which verify that
the current object's type is indeed the expected type.
Note that in some cases, we can get rid of the `CatfileInfo()` step
completely because we already know the type of the object, which we
could thus directly pass to the `CatfileObject()` pipeline step. This
refactoring is deferred to a later point in time though such that
pending refactorings for the catfile cache can land first.
|
|
Recently, we have seen a lot of issues with git-cat-file(1) due to
inefficiencies. In many cases, we can improve the code to elide
additional filtering by using new object type filters which have been
introduced for git-rev-list(1) in Git v2.33.0, which in some cases may
even give us the ability to drop some pipeline steps altogether.
Bump the minimum required Git version to v2.33.0 such that we can always
make use of these object type filters.
Changelog: deprecated
|
|
Tests for `FlushesUpdaterefStatus()` exercise the wrong function by
accident. Fix them to test the correct function.
|
|
We use the test-boot script to determine whether the Gitaly server
successfully boots up with a given Gitaly configuration. This config
currently not contain the path to the Git binary though, which is why it
will instead pick up the system-provided Git binary. This wouldn't
typically be the correct version though, given that we'd likely want to
boot Gitaly with the Git executable produced by `make git`. The result
is that Gitaly would refuse to boot if the system-provided version of
Git does not meet the minimum required version.
Fix the issue by setting the Git binary path.
|
|
We're about to drop support for Git versions older than v2.33.0. Drop
tests for these versions from the CI definitions.
|
|
Enable Praefect in PostUploadPackWithSidechannel tests
See merge request gitlab-org/gitaly!3938
|
|
test: Refactor for better readability
See merge request gitlab-org/gitaly!3932
|
|
Now that we always assert that Gitaly is healthy before returning from
both `RunGitalyServer()` and `StartGitalyServer()`, the implementations
of both functions are exactly the same, except that one returns only the
address of the started server while the other returns a structure.
Merge both implementations.
|
|
When starting Gitaly with Praefect, then we always wait until Gitaly
becomes healthy before we return from the function such that Praefect
can correctly route information to it. It does make sense to wait for
Gitaly to be healthy in the general case though such that we don't ever
run with Gitaly not yet being able to server requests.
Improve the situation by always asserting that Gitaly is healthy when
`RunGitalyServer()` or `StartGitalyServer()` returns. This is likely to
fix some flaky tests where we have seen Gitaly to not be able to serve
requests.
|
|
Both `StartGitalyServer()` and `RunGitalyServer()` essentially do the
same thing -- the only thing that's differen tis that one returns the
address of the started server, while the other one returns a
`GitalyServer` structure. Move them next to each other to group them
together.
This is a preparatory patch towards unifying both implementations.
|
|
The Praefect proxy is started via a separate executable which we'll
automatically inject when a certain environment variable is set. If the
Praefect proxy fails to come up and become healthy though, we leak this
executable because we don't kill it if the test fails. As a result,
there is a misleading panic at the end of the test run hinting at a
leaked process, while the real issue is that Praefect failed to become
healthy at all.
Fix this by queueing shutdown of the process before checking for
Praefect to become healthy.
|
|
One of the connectivity tests for gitaly-ssh uses a TLS-based setup to
assert that we're able to connect to them. The self-signed TLS
certificate we're using for this setup keeps us from connecting to the
Gitaly server directly. While we don't do that right now in the first
place, it will break as soon as we start to do a health check for test
Gitaly servers because we cannot connect to it in the first place.
Fix this issue by injecting self-signed certificates into the system
store. While at it, generate certs on the fly instead of using
pre-seeded test data.
|
|
We're using the grpc package directly to dial to the health server when
waiting for it to become healthy. This dialer doesn't know to set up
various things for us: it doesn't understand the "tcp://" and "tls://"
schemas, it won't handle setup of transport credentials and doesn't set
up interceptors. This has worked fine until now because we only used it
to check the Praefect server's health, but it will fail as soon as we
want to also check the Gitaly server's health.
Use our own "client" package to dial the health service such that we can
extend support for `waitHealthy()` to Gitaly servers.
|
|
When executing `waitHealthy()`, then we retry connecting to the health
client, where each retry is limited to 1 second. This shouldn't be
necessary though:
- `grpc.WaitForReady()` ensures that we retry connecting to the
service until we're either successful or until the context times
out. It's thus useless to re-dial.
- When instantiating the health server, it will always be
initialized in "serving" state for the default service.
Furthermore, we only ever check for the default service and never
modify the serving status of the server. As such, if we ever
receive an answer from the server, then it must be that it is
serving. It's thus useless to retry the RPC call.
In combination, it means that we do not need to loop at all in
`waitHealthy()`.
Simplify the function to not loop, but instead use a 3 second timeout
both for dialing the server and executing the health check.
|
|
We always pass the same timeout parameters to `waitHealthy()`, so there
is not much of a point to keep them.
Hardcode timeouts in `waitHealthy()` itself and get rid of these
parameters.
|
|
When using the "test-with-praefect" target, then most tests will fork
an extra Praefect instance which proxies connections to Gitaly. With our
recent push to use `t.Parallel()`, this means that there are now
potentially hundreds of concurrent Praefect instances created which both
consume a lot of memory and exhaust the connection pool of the Postgres
database. The result is out-of-memory situations and tests failing
because of too many clients connected to Postgres.
Fix this by limiting concurrency to at most 16 instances of Praefect.
|
|
supervisor: Close writers before signalling process is done
See merge request gitlab-org/gitaly!3940
|
|
When a supervised process is going down, then we need to close its
associated stdout and stderr loggers. This is done after we signal to
the caller that the process is done though by closing the wait channel,
which in turn means that any Goroutines spawned by the writers may still
be active at the time we're exiting.
Fix this by instead deferring the close. While this fixes the issue, it
also avoids any deadlocks due to an unclosed wait channel in case any of
the invoked function calls panics.
|
|
Supervisor tests require a test server executable to have been built,
which is done in the global test suite configuration. This results in
global state we don't really want to have and leaves behind temporary
directories.
Refactor the code to build the test server on demand using
`BuildBinary()`. The end result is both cleaner test setup and faster
tests given that we can now run tests in parallel.
|
|
The `buildBinary()` helper handles building of Go executables, where it
makes sure to build each executable once only. This function is used to
build a set of Gitaly executables which we expect to be present. But in
fact, it's more generally useful to also build for example Go test
servers.
Make the function public such that it can be reused in other packages.
|
|
We allow for the case where a test tries to build a binary with the same
target path multiple times. This is bound to be problematic though: if
we ever were to blindly parallelize these tests, then we'd potentially
copy over the target path while it's still being executed, which would
likely lead to ETXTBSY errors. It's furthermore an antipattern to reuse
the same configuration for multiple test runs due to potentially shared
state.
Fix all tests which do this. Next to a few tests which accidentally
build gitaly-hooks multiple times, the most common case is the
"praefect" binary which gets built automatically whenever we run a
new Gitaly server. Add a safeguard which fails the test as soon as it
tries to build the same binary twice.
|
|
The testhelper function to build binaries is a bit hard to read. One of
the sources is the variable names which point to the shared binary
paths, which are rather generically named and thus don't tell what these
really are about. And the other confusing part is the deferred function
call, which isn't really necessary at all.
Improve readability by renaming the variables and inlining the deferred
call.
|
|
list-untracked-repositories: Praefect sub-command to show untracked repositories
Closes #3792
See merge request gitlab-org/gitaly!3926
|
|
Code simplified to use a field of the glsql.DB struct
instead of extracting database name directly with SQL.
|
|
The test code is hard to read and understand. The change
simplifies things and provides additional comments into
the test code with explanations of what and why is done.
|
|
Praefect uses prometheus to export metrics from inside.
It relies on the defaults from the prometheus library
to gather set of metrics and register a new metrics.
Because of it the new metrics got registered on the
DefaultRegisterer - a global pre-configured registerer.
Because of that we can't call 'run' function multiple
times (for testing purposes) as it results to the metrics
registration error. To omit that problem the 'run' function
extended with prometheus.Registerer parameter that is used
to register praefect custom metrics. The production code
still uses the same DefaultRegisterer as it was before.
And the test code creates a new instance of the registerer
for each 'run' invocation, so there are no more duplicates.
|
|
The old implementation of the bootstrapper initialization
does not allow calling the 'run' function to start a service
because the tableflip library doesn't support multiple
instances to be created for one process.
Starting the Praefect service is required in tests to verify
sub-command execution. The bootstrapper initialization
extracted out of 'run' function. It allows using a new
Noop bootstrapper to run service without tableflip
support.
|
|
Let's reuse the repository creation function
introduced with the list-untracked-repositories
Praefect sub-command in a previous commit. This
function uses the test name in the repository
relative path. This can avoid possible side effects
from one test to affect other tests.
|
|
Praefect extended with a new sub-command 'list-untracked-repositories'.
It walks over all repositories of all accessible gitaly nodes and
checks if repositories exist in the praefect database. If repository
doesn't exist in the database the information about its location
will be printed on STDOUT in JSON format terminated with a newline
(it is configurable with --delimiter flag). STDERR is used to write
log messages and errors if any.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/3792
Changelog: added
|
|
an error
The ExecOnRepositories method accepts a callback function that
is called once the batch is ready for the processing. The signature
of the callback function changed to return an error. If the error
occurs the ExecOnRepositories stops iteration over the repositories
and returns an error back to the caller. That change is a preparation
for the new 'list-untracked-repositories' sub-command.
|
|
The ExecOnRepositories method is extracted into a new Walker type.
It will be re-used for the new 'list-untracked-repositories' sub-command.
|
|
In order to not have to repeatedly rebuild the same executables during
test setups, the testhelper package provides a bunch of functions which
build executables via `sync.Once` and then copy them into place. One
shortcoming though is that we have a separate `sync.Once` for each of
the binaries though, which is easy to get wrong.
Refactor the code to use a `sync.Map` containing `sync.Once`s keyed by
source path. Like this, we can create them ad-hoc in a raceless way and
don't need to manage them globally.
|
|
Fail Read if objectReader is closed
Closes #3823
See merge request gitlab-org/gitaly!3944
|
|
A flaky test exposed the issue whereby an object reader can still be
read from after it is closed. This is because we are using a buffered
reader for reading the data from the command, so there could be some
buffered data that is still readable even though the objectReader is
closed.
The fix is to check if the objectReader has been closed in the Read
function, and if so to return an error immediately.
|
|
streamcache: Fix flaky TestCache_diskCleanup
Closes #3822
See merge request gitlab-org/gitaly!3942
|
|
Because implementation was used single sleep function for
both filestore cleanup and clean function invocation the
close of the channel to trigger "single" invocation of the
clean was broken. This is due to close will make channel
to return default results immediately. But for the test we
need to run it only once. We can't send two elements on
the single channel as we can't be sure both won't be consumed
by one of those functions. That is why we add separate sleep
function for each of those functions. By sending on each channel
we are sure each function will be triggered exactly once.
|
|
testhelper: Enable Goroutine leak checks by default
Closes #3790
See merge request gitlab-org/gitaly!3909
|
|
sql-migrate: Update storage_repositories table
Closes #3806
See merge request gitlab-org/gitaly!3927
|
|
In a recent incident, a refactoring of the catfile cache has caused an
incident due to leaking Goroutines. While we did already check for
leaking Goroutines in a subset of our test suite before via the goleak
check, this was opt-in the same way as `mustHaveNoChildProcess()` was.
It also was used a lot less often given that we did have hundreds of
actual Goroutine leaks in our test suite.
Introduce leak checking by default now that almost all of the Goroutine
leaks have been fixed. There is only two packages remaining which still
leak Goroutines: the catfile package, which will get fixed via an MR
which is currently in flight. And the backup package, which is leaking
Goroutines in a package external to us. To the best of my knowledge,
there is no way to plug those.
In any case, having on-by-default leak checking should put us in a much
better position to detect such regressions in the future.
|
|
When creating gRPC connections, then we spawn a set of Goroutines which
listen on these connections. As a result, if they are never closed,
those Goroutines are leaked.
Fix this by closing connections.
|