Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-12Makefile: Revert changes to ruby bundlerevert-bundle-deploymentToon Claes
We've seen users, and CI failing [1] on the changes made in [2], so revert it. 1. https://gitlab.com/gitlab-org/gitlab/-/merge_requests/72178 2. 34de8b796 (Makefile: Never ruby bundle as deployment, 2021-10-11) Issue: https://gitlab.com/gitlab-org/gitlab-development-kit/-/issues/1315
2021-10-12Merge branch 'ps-fix-flaky-acquirenextstorage' into 'master'Patrick Steinhardt
datastore: Fix flaky test for AcquireNextStorage See merge request gitlab-org/gitaly!3962
2021-10-12Merge branch 'pks-testserver-block-until-healthy' into 'master'Pavlo Strokov
testserver: Block dialing until connection is up Closes #3835 See merge request gitlab-org/gitaly!3959
2021-10-12Merge branch 'pks-testserver-precreate-praefect-socket' into 'master'Pavlo Strokov
testserver: Fix dialing Praefect server if socket does not yet exist See merge request gitlab-org/gitaly!3964
2021-10-12testserver: Precreate socket to fix health check race with PraefectPatrick Steinhardt
When running the test-with-praefect target, then we're running the Praefect executable as a transparent proxy in front of Gitaly servers. We frequently observe test failures though because we fail to establish the server's healthiness, where the error message typically indicates that the server's socket path does not exist. This may have two root causes: either we're really taking too long to spawn Praefect, or we just fail to dial Praefect correctly in case the socket really hasn't been created yet. Plug one of these two potential races by precreating the Praefect socket path. If this doesn't fix the issue, then the only thing left we can do is to bump the three second timeout.
2021-10-11Merge branch 'toon-no-bundle-hack' into 'master'James Fargher
Makefile: Never ruby bundle as deployment Closes gitlab-development-kit#1315 See merge request gitlab-org/gitaly!3960
2021-10-11datastore: Fix flaky test for AcquireNextStoragePavlo Strokov
The time period that triggers updates for the acquired storage is too high and cause test to fail in some cases. The change reduces the update period to omit that problem. It still can appear in case of high load on CPU, but it should happen rarely.
2021-10-11Merge branch 'smh-cluster-repository-id-14-4' into 'master'Pavlo Strokov
Join RepositoryStore queries using repository ID See merge request gitlab-org/gitaly!3907
2021-10-11Makefile: Never ruby bundle as deploymentToon Claes
During the bundle step of the gitaly-ruby gems it tries to guess whether it is part of a gitlab-development-kit. If it isn't it installs gems as a "deployment", which will install the gems in vendor/bundle instead of in $GEM_HOME. To detect this, it looks for .gdk-install-root in the parent directory. This behavior is unexpected to many and also seems to cause issues for some users. Added to that, recently a change [1] was made to gitlab-org/gitlab to install gems in $GEM_HOME when running tests. This speeds up the installation of Gitaly. This change made it so the gems are never installed as a "deployment" This change removes the code to determine whether it's a deployment and makes the gems install in $GEM_HOME unconditionally. 1. https://gitlab.com/gitlab-org/gitlab/-/commit/77f41fff0363ec0b1bc687e8df02995729e8218e Issue: https://gitlab.com/gitlab-org/gitlab-development-kit/-/issues/1315
2021-10-11Merge branch '299529_fix_incorrect_page_token_detection' into 'master'Toon Claes
Fix incorrect pagination token detection for branches See merge request gitlab-org/gitaly!3851
2021-10-11testserver: Block dialing until connection is upPatrick Steinhardt
When waiting for spawned servers to become healthy, we use a context timeout and then first `DialContext()` and second invoke its health service. `DialContext()` will return immediately though without waiting for the connection to come up if it's not executed via `WithBlock()`. While this isn't much of a problem given that we use `WaitForReady()` in the call to the health service, this moves errors caused by a failure to establish the connection from dialing to invoking the RPC call. As a consequence, errors can be misleading and we may have less information available. Improve this by using a blocking dial.
2021-10-11Add test to verify sequential pagination requestsVasilii Iakliushin
2021-10-11Provide feature flag to TestFindLocalBranches* testsVasilii Iakliushin
2021-10-11ref: Explicitly test for data returned by FindLocalBranchesPatrick Steinhardt
The testcase which exercises the logic with invalid pagination tokens dosen't explicitly check for returned errors and data, which means that it could in fact return unexpected values. Fix this by explicitly verifying we get what we expect.
2021-10-11ref: Use exact comparison on page tokenVasilii Iakliushin
The FindLocalBranches accepts a page token in it's request. The caller can fill this in to get the records from a certain offset. Prior to this change a lexicographical comparison was used to determine the records that come *after* the page token. This was done to make sure the page token will work even if some records were removed in mean time. This method works fine when the refs are requested in alphabetical order, but that's not always the case. With this change the comparison will use an exact match on the page token. When the page token is not found, it raises an error indicating the page token was not found. The change is behind a default-disabled feature flag and rolled out through [1]. 1. https://gitlab.com/gitlab-org/gitaly/-/issues/3817
2021-10-11Update SetGeneration to use repository IDSami Hiltunen
This commit updates SetGeneration to use repository ID to identify a repository instead of the virtual storage and relative path. Logic is the same but some tests had minor changes to their setups as they did not setup the repository IDs correctly. SetGeneration is used to set a repository's generation after successfully applying a replication job to it. Using the repository ID instead of the virtual storage and the relative path guarantee any jobs targeting a previous deleted or renamed repository at the same virtual storage and relative path do not affect the repository that is now identified by them.
2021-10-11Use repository ID in GetGenerationSami Hiltunen
To prepare for Praefect generating disk paths for replicas, this commit changes GetGeneration to identify the repository by its ID rather than the virtual storage and relative path.
2021-10-11Use repository ID in GetReplicatedGenerationSami Hiltunen
To prepare for Praefect generating disk paths for replicas, this commit changes GetReplicatedGeneration to identify the repository by its ID rather than the virtual storage and relative path. Functionality is the same but the DowngradeAttemptedError had to be changed as the method no longer has the information about virtual storage and relative path. This is fine though as the virtual storage and relative path are available for logging higher up in the call chain.
2021-10-11Use repository ID in DeleteInvalidRepositorySami Hiltunen
DeleteInvalidRepository is currently identifying the database records via their virtual storage and relative path. As the replicas will soon have different paths from the externally used relative path, this commit updates DeleteInvalidRepository to identify the repository's database records via its repository id.
2021-10-11Fix TestReplicatorInvalidSourceRepository testSami Hiltunen
TestReplicatorInvalidSourceRepository test hasn't been working correctly as it does not assert a record exists for the repository prior to deleting the record and asserting a record doesn't exist. SetGeneration was updated earlier not to create records in the 'repositories' table anymore. Since then, the SetGeneration call in the test hasn't been enough to create the database record the test expects to delete. This commit fixes the issue by replacing the 'SetGeneration' call with a 'CreateRepository' call that creates the database record and asserts the record actually exists prior to testing the deletion.
2021-10-11Use repository IDs in IncrementGenerationSami Hiltunen
IncrementGeneration currently identifies the records to modify by the virtual storage and relative path tuple. This needs to be changed as the replicas will soon have Praefect generated relative paths which don't match the relative path the client uses to access the repository. Relying on the external relative path is also prone to cause races as it or the repository it refers to could change during an in-flight request. Relying on the repository ID retrieved at the beginning of the request handling ensures all database operations always refer to the correct repository.
2021-10-11Merge branch 'pks-operations-merge-eof-flakiness' into 'master'Patrick Steinhardt
operations: Fix flakiness due to EOF timeout Closes #3830 See merge request gitlab-org/gitaly!3955
2021-10-11Merge branch 'pks-glsql-migration-data-race' into 'master'Pavlo Strokov
migrations: Fix data race due to accessing global state Closes #3827 See merge request gitlab-org/gitaly!3951
2021-10-11operations: Fix flakiness due to EOF timeoutPatrick Steinhardt
In our merge-related tests, we're using testhelper.ReceiveEOFWithTimeout to wait for the EOF error. If we fail to receive it after more than a second, then the test fails. It's debatable what the real merit of this timeout is: the code is not timing sensitive, but we just want to assert that it does indeed succeed. Furthermore, given our recent shift to use `t.Parallel()` more, we have seen that this timeout may be exceeded on occasion. Remove these timeouts and simply assert that we do get an EOF error and drop the now-unused helper function.
2021-10-11migrations: Fix data race due to accessing global statePatrick Steinhardt
We have recently hit a data race in our migration tests due to two different Goroutines accessing the global `IgnoreUnknown` flag in the sql-migrations package. As it turns out, sql-migrations provides two different APIs: a "simple" API which uses a global `MigrationSet` struct to keep data in. And an explicit API where the caller needs to create the `MigrationSet` by himself. Given that we use the former, it's clear that we easily run into races as soon as two concurrent tests want to do migrations. Fix the data race by converting all callsites to use an explicit `MigrationSet` structure without any global state.
2021-10-11ci: Fix backwards-compatibility job with changing setupPatrick Steinhardt
The database backwards-compatibility job is testing whether an old version of Praefect continues to work correctly in case it's running with a database which has been migrated to a newer schema. To do so, we check out the previous minor release, but restore migrations from the current version. While this works alright in case there are only changes to the migrations themselves (e.g. only migrations have been added), it falls apart when the setup of migrations itself changes. An upcoming change will remove setup of the migration package in "migrations.go" and move it into local state. Naturally, if we run an older version which does not yet set up that local state in the other packages with a "migrations.go" file which doesn't set up the table name anymore, the only result can be failure. Fix the issue by not modifying "migrations.go" anymore.
2021-10-11Merge branch 'pks-git-mandatory-v2.33.0' into 'master'Patrick Steinhardt
git: Bump minimum required Git version to v2.33.0 Closes #3815 See merge request gitlab-org/gitaly!3931
2021-10-11Merge branch 'pks-testhelper-gitaly-health-checks' into 'master'Toon Claes
testserver: Fix Praefect resource exhaustion and assert Gitaly's healthiness See merge request gitlab-org/gitaly!3945
2021-10-11git: Always assume we can handle object type filtersPatrick Steinhardt
Now that the minimum required Git version is v2.33.0 and newer, we can assume that git-rev-list(1) can always handle object type filters. Drop `SupportsObjectTypeFilter()` and adjust callsites accordingly. Given that we now always set up those filters unconditionally, this commit also gets rid of any additional pipeline steps which verify that the current object's type is indeed the expected type. Note that in some cases, we can get rid of the `CatfileInfo()` step completely because we already know the type of the object, which we could thus directly pass to the `CatfileObject()` pipeline step. This refactoring is deferred to a later point in time though such that pending refactorings for the catfile cache can land first.
2021-10-11git: Bump minimum required Git version to v2.33.0Patrick Steinhardt
Recently, we have seen a lot of issues with git-cat-file(1) due to inefficiencies. In many cases, we can improve the code to elide additional filtering by using new object type filters which have been introduced for git-rev-list(1) in Git v2.33.0, which in some cases may even give us the ability to drop some pipeline steps altogether. Bump the minimum required Git version to v2.33.0 such that we can always make use of these object type filters. Changelog: deprecated
2021-10-11git: Fix test for FlushesUpdaterefStatus exercising wrong functionPatrick Steinhardt
Tests for `FlushesUpdaterefStatus()` exercise the wrong function by accident. Fix them to test the correct function.
2021-10-11test-boot: Set up Git binary pathPatrick Steinhardt
We use the test-boot script to determine whether the Gitaly server successfully boots up with a given Gitaly configuration. This config currently not contain the path to the Git binary though, which is why it will instead pick up the system-provided Git binary. This wouldn't typically be the correct version though, given that we'd likely want to boot Gitaly with the Git executable produced by `make git`. The result is that Gitaly would refuse to boot if the system-provided version of Git does not meet the minimum required version. Fix the issue by setting the Git binary path.
2021-10-11ci: Drop tests for older Git versionsPatrick Steinhardt
We're about to drop support for Git versions older than v2.33.0. Drop tests for these versions from the CI definitions.
2021-10-11Merge branch 'qmnguyen0711/enable-praefect-sidechannel-test' into 'master'James Fargher
Enable Praefect in PostUploadPackWithSidechannel tests See merge request gitlab-org/gitaly!3938
2021-10-11Merge branch 'ps-repocleaner-test-enhance' into 'master'James Fargher
test: Refactor for better readability See merge request gitlab-org/gitaly!3932
2021-10-08testserver: Merge functions which start Gitaly serverPatrick Steinhardt
Now that we always assert that Gitaly is healthy before returning from both `RunGitalyServer()` and `StartGitalyServer()`, the implementations of both functions are exactly the same, except that one returns only the address of the started server while the other returns a structure. Merge both implementations.
2021-10-08testserver: Always assert that Gitaly can serve requestsPatrick Steinhardt
When starting Gitaly with Praefect, then we always wait until Gitaly becomes healthy before we return from the function such that Praefect can correctly route information to it. It does make sense to wait for Gitaly to be healthy in the general case though such that we don't ever run with Gitaly not yet being able to server requests. Improve the situation by always asserting that Gitaly is healthy when `RunGitalyServer()` or `StartGitalyServer()` returns. This is likely to fix some flaky tests where we have seen Gitaly to not be able to serve requests.
2021-10-08testserver: Move `StartGitalyServer()` next to `RunGitalyServer()`Patrick Steinhardt
Both `StartGitalyServer()` and `RunGitalyServer()` essentially do the same thing -- the only thing that's differen tis that one returns the address of the started server, while the other one returns a `GitalyServer` structure. Move them next to each other to group them together. This is a preparatory patch towards unifying both implementations.
2021-10-08testserver: Clean up Praefect proxy if it fails to become healthyPatrick Steinhardt
The Praefect proxy is started via a separate executable which we'll automatically inject when a certain environment variable is set. If the Praefect proxy fails to come up and become healthy though, we leak this executable because we don't kill it if the test fails. As a result, there is a misleading panic at the end of the test run hinting at a leaked process, while the real issue is that Praefect failed to become healthy at all. Fix this by queueing shutdown of the process before checking for Praefect to become healthy.
2021-10-08gitaly-ssh: Inject test certificate into system storePatrick Steinhardt
One of the connectivity tests for gitaly-ssh uses a TLS-based setup to assert that we're able to connect to them. The self-signed TLS certificate we're using for this setup keeps us from connecting to the Gitaly server directly. While we don't do that right now in the first place, it will break as soon as we start to do a health check for test Gitaly servers because we cannot connect to it in the first place. Fix this issue by injecting self-signed certificates into the system store. While at it, generate certs on the fly instead of using pre-seeded test data.
2021-10-08testserver: Use "client" package to dial health serverPatrick Steinhardt
We're using the grpc package directly to dial to the health server when waiting for it to become healthy. This dialer doesn't know to set up various things for us: it doesn't understand the "tcp://" and "tls://" schemas, it won't handle setup of transport credentials and doesn't set up interceptors. This has worked fine until now because we only used it to check the Praefect server's health, but it will fail as soon as we want to also check the Gitaly server's health. Use our own "client" package to dial the health service such that we can extend support for `waitHealthy()` to Gitaly servers.
2021-10-08testserver: Drop retry when waiting for server to become healthyPatrick Steinhardt
When executing `waitHealthy()`, then we retry connecting to the health client, where each retry is limited to 1 second. This shouldn't be necessary though: - `grpc.WaitForReady()` ensures that we retry connecting to the service until we're either successful or until the context times out. It's thus useless to re-dial. - When instantiating the health server, it will always be initialized in "serving" state for the default service. Furthermore, we only ever check for the default service and never modify the serving status of the server. As such, if we ever receive an answer from the server, then it must be that it is serving. It's thus useless to retry the RPC call. In combination, it means that we do not need to loop at all in `waitHealthy()`. Simplify the function to not loop, but instead use a 3 second timeout both for dialing the server and executing the health check.
2021-10-08testserver: Drop parameters for `waitHealthy()`Patrick Steinhardt
We always pass the same timeout parameters to `waitHealthy()`, so there is not much of a point to keep them. Hardcode timeouts in `waitHealthy()` itself and get rid of these parameters.
2021-10-08testserver: Fix resource exhaustion with too many Praefect instancesPatrick Steinhardt
When using the "test-with-praefect" target, then most tests will fork an extra Praefect instance which proxies connections to Gitaly. With our recent push to use `t.Parallel()`, this means that there are now potentially hundreds of concurrent Praefect instances created which both consume a lot of memory and exhaust the connection pool of the Postgres database. The result is out-of-memory situations and tests failing because of too many clients connected to Postgres. Fix this by limiting concurrency to at most 16 instances of Praefect.
2021-10-08Merge branch 'pks-supervisor-synchronize-kill' into 'master'Toon Claes
supervisor: Close writers before signalling process is done See merge request gitlab-org/gitaly!3940
2021-10-08supervisor: Close writers before signalling process is donePatrick Steinhardt
When a supervised process is going down, then we need to close its associated stdout and stderr loggers. This is done after we signal to the caller that the process is done though by closing the wait channel, which in turn means that any Goroutines spawned by the writers may still be active at the time we're exiting. Fix this by instead deferring the close. While this fixes the issue, it also avoids any deadlocks due to an unclosed wait channel in case any of the invoked function calls panics.
2021-10-08supervisor: Drop global test statePatrick Steinhardt
Supervisor tests require a test server executable to have been built, which is done in the global test suite configuration. This results in global state we don't really want to have and leaves behind temporary directories. Refactor the code to build the test server on demand using `BuildBinary()`. The end result is both cleaner test setup and faster tests given that we can now run tests in parallel.
2021-10-08testhelper: Expose helper to build Go executablesPatrick Steinhardt
The `buildBinary()` helper handles building of Go executables, where it makes sure to build each executable once only. This function is used to build a set of Gitaly executables which we expect to be present. But in fact, it's more generally useful to also build for example Go test servers. Make the function public such that it can be reused in other packages.
2021-10-08testhelper: Reject trying to build same binary multiple timesPatrick Steinhardt
We allow for the case where a test tries to build a binary with the same target path multiple times. This is bound to be problematic though: if we ever were to blindly parallelize these tests, then we'd potentially copy over the target path while it's still being executed, which would likely lead to ETXTBSY errors. It's furthermore an antipattern to reuse the same configuration for multiple test runs due to potentially shared state. Fix all tests which do this. Next to a few tests which accidentally build gitaly-hooks multiple times, the most common case is the "praefect" binary which gets built automatically whenever we run a new Gitaly server. Add a safeguard which fails the test as soon as it tries to build the same binary twice.
2021-10-08testhelper: Refactor building of binaries for improved readabilityPatrick Steinhardt
The testhelper function to build binaries is a bit hard to read. One of the sources is the variable names which point to the shared binary paths, which are rather generically named and thus don't tell what these really are about. And the other confusing part is the deferred function call, which isn't really necessary at all. Improve readability by renaming the variables and inlining the deferred call.