Age | Commit message (Collapse) | Author |
|
We've seen users, and CI failing [1] on the changes made in [2], so
revert it.
1. https://gitlab.com/gitlab-org/gitlab/-/merge_requests/72178
2. 34de8b796 (Makefile: Never ruby bundle as deployment, 2021-10-11)
Issue: https://gitlab.com/gitlab-org/gitlab-development-kit/-/issues/1315
|
|
datastore: Fix flaky test for AcquireNextStorage
See merge request gitlab-org/gitaly!3962
|
|
testserver: Block dialing until connection is up
Closes #3835
See merge request gitlab-org/gitaly!3959
|
|
testserver: Fix dialing Praefect server if socket does not yet exist
See merge request gitlab-org/gitaly!3964
|
|
When running the test-with-praefect target, then we're running the
Praefect executable as a transparent proxy in front of Gitaly servers.
We frequently observe test failures though because we fail to establish
the server's healthiness, where the error message typically indicates
that the server's socket path does not exist. This may have two root
causes: either we're really taking too long to spawn Praefect, or we
just fail to dial Praefect correctly in case the socket really hasn't
been created yet.
Plug one of these two potential races by precreating the Praefect socket
path. If this doesn't fix the issue, then the only thing left we can do
is to bump the three second timeout.
|
|
Makefile: Never ruby bundle as deployment
Closes gitlab-development-kit#1315
See merge request gitlab-org/gitaly!3960
|
|
The time period that triggers updates for the acquired
storage is too high and cause test to fail in some cases.
The change reduces the update period to omit that problem.
It still can appear in case of high load on CPU, but it
should happen rarely.
|
|
Join RepositoryStore queries using repository ID
See merge request gitlab-org/gitaly!3907
|
|
During the bundle step of the gitaly-ruby gems it tries to guess whether
it is part of a gitlab-development-kit. If it isn't it installs gems as
a "deployment", which will install the gems in vendor/bundle instead of
in $GEM_HOME. To detect this, it looks for .gdk-install-root in the
parent directory. This behavior is unexpected to many and also seems to
cause issues for some users.
Added to that, recently a change [1] was made to gitlab-org/gitlab to
install gems in $GEM_HOME when running tests. This speeds up the
installation of Gitaly. This change made it so the gems are never
installed as a "deployment"
This change removes the code to determine whether it's a deployment and
makes the gems install in $GEM_HOME unconditionally.
1. https://gitlab.com/gitlab-org/gitlab/-/commit/77f41fff0363ec0b1bc687e8df02995729e8218e
Issue: https://gitlab.com/gitlab-org/gitlab-development-kit/-/issues/1315
|
|
Fix incorrect pagination token detection for branches
See merge request gitlab-org/gitaly!3851
|
|
When waiting for spawned servers to become healthy, we use a context
timeout and then first `DialContext()` and second invoke its health
service. `DialContext()` will return immediately though without waiting
for the connection to come up if it's not executed via `WithBlock()`.
While this isn't much of a problem given that we use `WaitForReady()` in
the call to the health service, this moves errors caused by a failure to
establish the connection from dialing to invoking the RPC call. As a
consequence, errors can be misleading and we may have less information
available.
Improve this by using a blocking dial.
|
|
|
|
|
|
The testcase which exercises the logic with invalid pagination tokens
dosen't explicitly check for returned errors and data, which means that
it could in fact return unexpected values. Fix this by explicitly
verifying we get what we expect.
|
|
The FindLocalBranches accepts a page token in it's request. The caller
can fill this in to get the records from a certain offset. Prior to this
change a lexicographical comparison was used to determine the records
that come *after* the page token. This was done to make sure the page
token will work even if some records were removed in mean time.
This method works fine when the refs are requested in alphabetical
order, but that's not always the case. With this change the comparison
will use an exact match on the page token. When the page token is not
found, it raises an error indicating the page token was not found.
The change is behind a default-disabled feature flag and rolled out
through [1].
1. https://gitlab.com/gitlab-org/gitaly/-/issues/3817
|
|
This commit updates SetGeneration to use repository ID to identify
a repository instead of the virtual storage and relative path. Logic
is the same but some tests had minor changes to their setups as they
did not setup the repository IDs correctly. SetGeneration is used to
set a repository's generation after successfully applying a replication
job to it. Using the repository ID instead of the virtual storage and
the relative path guarantee any jobs targeting a previous deleted or
renamed repository at the same virtual storage and relative path do
not affect the repository that is now identified by them.
|
|
To prepare for Praefect generating disk paths for replicas, this commit
changes GetGeneration to identify the repository by its ID rather
than the virtual storage and relative path.
|
|
To prepare for Praefect generating disk paths for replicas, this commit
changes GetReplicatedGeneration to identify the repository by its ID rather
than the virtual storage and relative path. Functionality is the same but
the DowngradeAttemptedError had to be changed as the method no longer has
the information about virtual storage and relative path. This is fine though
as the virtual storage and relative path are available for logging higher up
in the call chain.
|
|
DeleteInvalidRepository is currently identifying the database records via
their virtual storage and relative path. As the replicas will soon have
different paths from the externally used relative path, this commit
updates DeleteInvalidRepository to identify the repository's database
records via its repository id.
|
|
TestReplicatorInvalidSourceRepository test hasn't been working correctly
as it does not assert a record exists for the repository prior to deleting
the record and asserting a record doesn't exist. SetGeneration was updated
earlier not to create records in the 'repositories' table anymore. Since then,
the SetGeneration call in the test hasn't been enough to create the database
record the test expects to delete. This commit fixes the issue by replacing
the 'SetGeneration' call with a 'CreateRepository' call that creates the
database record and asserts the record actually exists prior to testing the
deletion.
|
|
IncrementGeneration currently identifies the records to modify by the
virtual storage and relative path tuple. This needs to be changed as
the replicas will soon have Praefect generated relative paths which don't
match the relative path the client uses to access the repository. Relying
on the external relative path is also prone to cause races as it or the
repository it refers to could change during an in-flight request. Relying
on the repository ID retrieved at the beginning of the request handling
ensures all database operations always refer to the correct repository.
|
|
operations: Fix flakiness due to EOF timeout
Closes #3830
See merge request gitlab-org/gitaly!3955
|
|
migrations: Fix data race due to accessing global state
Closes #3827
See merge request gitlab-org/gitaly!3951
|
|
In our merge-related tests, we're using testhelper.ReceiveEOFWithTimeout
to wait for the EOF error. If we fail to receive it after more than a
second, then the test fails. It's debatable what the real merit of this
timeout is: the code is not timing sensitive, but we just want to assert
that it does indeed succeed. Furthermore, given our recent shift to use
`t.Parallel()` more, we have seen that this timeout may be exceeded on
occasion.
Remove these timeouts and simply assert that we do get an EOF error and
drop the now-unused helper function.
|
|
We have recently hit a data race in our migration tests due to two
different Goroutines accessing the global `IgnoreUnknown` flag in the
sql-migrations package. As it turns out, sql-migrations provides two
different APIs: a "simple" API which uses a global `MigrationSet` struct
to keep data in. And an explicit API where the caller needs to create
the `MigrationSet` by himself. Given that we use the former, it's clear
that we easily run into races as soon as two concurrent tests want to do
migrations.
Fix the data race by converting all callsites to use an explicit
`MigrationSet` structure without any global state.
|
|
The database backwards-compatibility job is testing whether an old
version of Praefect continues to work correctly in case it's running
with a database which has been migrated to a newer schema. To do so, we
check out the previous minor release, but restore migrations from the
current version. While this works alright in case there are only changes
to the migrations themselves (e.g. only migrations have been added), it
falls apart when the setup of migrations itself changes.
An upcoming change will remove setup of the migration package in
"migrations.go" and move it into local state. Naturally, if we run an
older version which does not yet set up that local state in the other
packages with a "migrations.go" file which doesn't set up the table name
anymore, the only result can be failure.
Fix the issue by not modifying "migrations.go" anymore.
|
|
git: Bump minimum required Git version to v2.33.0
Closes #3815
See merge request gitlab-org/gitaly!3931
|
|
testserver: Fix Praefect resource exhaustion and assert Gitaly's healthiness
See merge request gitlab-org/gitaly!3945
|
|
Now that the minimum required Git version is v2.33.0 and newer, we can
assume that git-rev-list(1) can always handle object type filters.
Drop `SupportsObjectTypeFilter()` and adjust callsites accordingly.
Given that we now always set up those filters unconditionally, this
commit also gets rid of any additional pipeline steps which verify that
the current object's type is indeed the expected type.
Note that in some cases, we can get rid of the `CatfileInfo()` step
completely because we already know the type of the object, which we
could thus directly pass to the `CatfileObject()` pipeline step. This
refactoring is deferred to a later point in time though such that
pending refactorings for the catfile cache can land first.
|
|
Recently, we have seen a lot of issues with git-cat-file(1) due to
inefficiencies. In many cases, we can improve the code to elide
additional filtering by using new object type filters which have been
introduced for git-rev-list(1) in Git v2.33.0, which in some cases may
even give us the ability to drop some pipeline steps altogether.
Bump the minimum required Git version to v2.33.0 such that we can always
make use of these object type filters.
Changelog: deprecated
|
|
Tests for `FlushesUpdaterefStatus()` exercise the wrong function by
accident. Fix them to test the correct function.
|
|
We use the test-boot script to determine whether the Gitaly server
successfully boots up with a given Gitaly configuration. This config
currently not contain the path to the Git binary though, which is why it
will instead pick up the system-provided Git binary. This wouldn't
typically be the correct version though, given that we'd likely want to
boot Gitaly with the Git executable produced by `make git`. The result
is that Gitaly would refuse to boot if the system-provided version of
Git does not meet the minimum required version.
Fix the issue by setting the Git binary path.
|
|
We're about to drop support for Git versions older than v2.33.0. Drop
tests for these versions from the CI definitions.
|
|
Enable Praefect in PostUploadPackWithSidechannel tests
See merge request gitlab-org/gitaly!3938
|
|
test: Refactor for better readability
See merge request gitlab-org/gitaly!3932
|
|
Now that we always assert that Gitaly is healthy before returning from
both `RunGitalyServer()` and `StartGitalyServer()`, the implementations
of both functions are exactly the same, except that one returns only the
address of the started server while the other returns a structure.
Merge both implementations.
|
|
When starting Gitaly with Praefect, then we always wait until Gitaly
becomes healthy before we return from the function such that Praefect
can correctly route information to it. It does make sense to wait for
Gitaly to be healthy in the general case though such that we don't ever
run with Gitaly not yet being able to server requests.
Improve the situation by always asserting that Gitaly is healthy when
`RunGitalyServer()` or `StartGitalyServer()` returns. This is likely to
fix some flaky tests where we have seen Gitaly to not be able to serve
requests.
|
|
Both `StartGitalyServer()` and `RunGitalyServer()` essentially do the
same thing -- the only thing that's differen tis that one returns the
address of the started server, while the other one returns a
`GitalyServer` structure. Move them next to each other to group them
together.
This is a preparatory patch towards unifying both implementations.
|
|
The Praefect proxy is started via a separate executable which we'll
automatically inject when a certain environment variable is set. If the
Praefect proxy fails to come up and become healthy though, we leak this
executable because we don't kill it if the test fails. As a result,
there is a misleading panic at the end of the test run hinting at a
leaked process, while the real issue is that Praefect failed to become
healthy at all.
Fix this by queueing shutdown of the process before checking for
Praefect to become healthy.
|
|
One of the connectivity tests for gitaly-ssh uses a TLS-based setup to
assert that we're able to connect to them. The self-signed TLS
certificate we're using for this setup keeps us from connecting to the
Gitaly server directly. While we don't do that right now in the first
place, it will break as soon as we start to do a health check for test
Gitaly servers because we cannot connect to it in the first place.
Fix this issue by injecting self-signed certificates into the system
store. While at it, generate certs on the fly instead of using
pre-seeded test data.
|
|
We're using the grpc package directly to dial to the health server when
waiting for it to become healthy. This dialer doesn't know to set up
various things for us: it doesn't understand the "tcp://" and "tls://"
schemas, it won't handle setup of transport credentials and doesn't set
up interceptors. This has worked fine until now because we only used it
to check the Praefect server's health, but it will fail as soon as we
want to also check the Gitaly server's health.
Use our own "client" package to dial the health service such that we can
extend support for `waitHealthy()` to Gitaly servers.
|
|
When executing `waitHealthy()`, then we retry connecting to the health
client, where each retry is limited to 1 second. This shouldn't be
necessary though:
- `grpc.WaitForReady()` ensures that we retry connecting to the
service until we're either successful or until the context times
out. It's thus useless to re-dial.
- When instantiating the health server, it will always be
initialized in "serving" state for the default service.
Furthermore, we only ever check for the default service and never
modify the serving status of the server. As such, if we ever
receive an answer from the server, then it must be that it is
serving. It's thus useless to retry the RPC call.
In combination, it means that we do not need to loop at all in
`waitHealthy()`.
Simplify the function to not loop, but instead use a 3 second timeout
both for dialing the server and executing the health check.
|
|
We always pass the same timeout parameters to `waitHealthy()`, so there
is not much of a point to keep them.
Hardcode timeouts in `waitHealthy()` itself and get rid of these
parameters.
|
|
When using the "test-with-praefect" target, then most tests will fork
an extra Praefect instance which proxies connections to Gitaly. With our
recent push to use `t.Parallel()`, this means that there are now
potentially hundreds of concurrent Praefect instances created which both
consume a lot of memory and exhaust the connection pool of the Postgres
database. The result is out-of-memory situations and tests failing
because of too many clients connected to Postgres.
Fix this by limiting concurrency to at most 16 instances of Praefect.
|
|
supervisor: Close writers before signalling process is done
See merge request gitlab-org/gitaly!3940
|
|
When a supervised process is going down, then we need to close its
associated stdout and stderr loggers. This is done after we signal to
the caller that the process is done though by closing the wait channel,
which in turn means that any Goroutines spawned by the writers may still
be active at the time we're exiting.
Fix this by instead deferring the close. While this fixes the issue, it
also avoids any deadlocks due to an unclosed wait channel in case any of
the invoked function calls panics.
|
|
Supervisor tests require a test server executable to have been built,
which is done in the global test suite configuration. This results in
global state we don't really want to have and leaves behind temporary
directories.
Refactor the code to build the test server on demand using
`BuildBinary()`. The end result is both cleaner test setup and faster
tests given that we can now run tests in parallel.
|
|
The `buildBinary()` helper handles building of Go executables, where it
makes sure to build each executable once only. This function is used to
build a set of Gitaly executables which we expect to be present. But in
fact, it's more generally useful to also build for example Go test
servers.
Make the function public such that it can be reused in other packages.
|
|
We allow for the case where a test tries to build a binary with the same
target path multiple times. This is bound to be problematic though: if
we ever were to blindly parallelize these tests, then we'd potentially
copy over the target path while it's still being executed, which would
likely lead to ETXTBSY errors. It's furthermore an antipattern to reuse
the same configuration for multiple test runs due to potentially shared
state.
Fix all tests which do this. Next to a few tests which accidentally
build gitaly-hooks multiple times, the most common case is the
"praefect" binary which gets built automatically whenever we run a
new Gitaly server. Add a safeguard which fails the test as soon as it
tries to build the same binary twice.
|
|
The testhelper function to build binaries is a bit hard to read. One of
the sources is the variable names which point to the shared binary
paths, which are rather generically named and thus don't tell what these
really are about. And the other confusing part is the deferred function
call, which isn't really necessary at all.
Improve readability by renaming the variables and inlining the deferred
call.
|