
gitlab.com/gitlab-org/gitaly.git
Age  Commit message  [refs]  Author
2021-11-16  fix concurrency testing  [smh-14-0-stable-concurrency-test]  Sami Hiltunen
2021-11-05  Update VERSION files  [v14.0.12, 14-0-stable]  GitLab Release Tools Bot
[ci skip]
2021-11-05  Update changelog for 14.0.12  GitLab Release Tools Bot
[ci skip]
2021-11-01  Merge branch 'ps-praefect-cmd-port-14-0' into '14-0-stable'  John Cai
Backport praefect sub-commands to 14.0. See merge request gitlab-org/gitaly!4018
2021-10-27  Add track-repository praefect subcommand  Pavlo Strokov
Adds a track-repository subcommand that allows an admin to add a repository that exists on one of the gitaly nodes to the praefect database. It performs some safety checks before inserting the records. Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3773 Changelog: added
2021-10-27  list-untracked-repositories: Praefect sub-command to show untracked repositories  Pavlo Strokov
The change is a backport of the functionality implemented to resolve https://gitlab.com/gitlab-org/gitaly/-/issues/3792. It adds a new sub-command for the praefect binary. On run, it connects to all gitaly storages set in the configuration file and receives from each a list of the repositories existing on disk. Each repository is then checked against the praefect database; if it is not present, the location of the repository is printed in JSON format to the stdout of the process. Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3792 Changelog: added
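
A minimal sketch of that core loop in Go, assuming a stand-in "tracked" callback in place of the real database lookup; the struct and JSON field names are illustrative, not the command's actual output format:

    package main

    import (
        "encoding/json"
        "os"
    )

    // repository describes one repository found on a Gitaly storage's disk.
    type repository struct {
        VirtualStorage string `json:"virtual_storage"`
        Storage        string `json:"storage"`
        RelativePath   string `json:"relative_path"`
    }

    // printUntracked checks every repository a Gitaly storage reports on disk
    // against the database and prints each unknown repository as JSON on stdout.
    func printUntracked(onDisk []repository, tracked func(repository) bool) error {
        enc := json.NewEncoder(os.Stdout)
        for _, repo := range onDisk {
            if tracked(repo) {
                continue
            }
            if err := enc.Encode(repo); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        onDisk := []repository{
            {"default", "gitaly-1", "@hashed/aa/bb/known.git"},
            {"default", "gitaly-1", "@hashed/cc/dd/orphaned.git"},
        }
        known := map[string]bool{"@hashed/aa/bb/known.git": true}
        _ = printUntracked(onDisk, func(r repository) bool { return known[r.RelativePath] })
    }
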
2021-10-27  remove-repository: A new sub-command for the praefect to remove repository  Pavlo Strokov
Users of the praefect are missing tools to manage the state of the cluster. One such tool is a repository removal CLI. It should be used to remove any repository from the cluster. The removal covers cleanup of the database and removal from the gitaly storages. The command tries to remove as much as possible, first removing the repository from the praefect database and the replication event queue, then from each gitaly node configured in the provided config file. Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3771 Changelog: added
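
A rough sketch of that cleanup order, with stand-in callbacks for the actual database and node operations (all identifiers here are hypothetical):

    package main

    import "fmt"

    // removeRepository deletes the repository's database rows and replication
    // events first, then removes it from every configured Gitaly node.
    func removeRepository(
        deleteFromDatabase func() error,
        deleteReplicationEvents func() error,
        removeFromNode map[string]func() error,
    ) error {
        if err := deleteFromDatabase(); err != nil {
            return fmt.Errorf("removing from praefect database: %w", err)
        }
        if err := deleteReplicationEvents(); err != nil {
            return fmt.Errorf("removing replication events: %w", err)
        }
        // Node removals are best-effort: report failures but keep going, so
        // that as much state as possible gets cleaned up.
        for node, remove := range removeFromNode {
            if err := remove(); err != nil {
                fmt.Printf("failed removing repository from %q: %v\n", node, err)
            }
        }
        return nil
    }

    func main() {
        noop := func() error { return nil }
        _ = removeRepository(noop, noop, map[string]func() error{"gitaly-1": noop, "gitaly-2": noop})
    }
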
2021-09-24  Update VERSION files  [v14.0.11]  GitLab Release Tools Bot
[ci skip]
2021-09-24  Update changelog for 14.0.11  GitLab Release Tools Bot
[ci skip]
2021-09-09  Merge branch 'smh-fs-id-14-0' into '14-0-stable'  Sami Hiltunen
Derive virtual storage's filesystem id from its name (14.0). See merge request gitlab-org/gitaly!3834
2021-09-03  Workaround Rails filesystem ID tests in Praefect  Sami Hiltunen
Rails tests configure Praefect in front of the tests that exercise the Rugged direct git access code. As Praefect now derives the filesystem IDs from the names of the virtual storages, the filesystem ID checks fail and thus the tests fail. This is not a problem in practice, as one wouldn't use Rugged in a real-world setup with Praefect. This commit works around the tests by returning the filesystem ID from the Gitaly node if a virtual storage has only one Gitaly node configured. This matches the setup the tests use and thus lets them pass. The workaround and the filesystem ID code can be removed in 15.0 once the rugged patches and NFS support are dropped.
2021-09-03  Derive virtual storage's filesystem id from its name  Sami Hiltunen
Gitaly storages contain a UUID filesystem ID that is generated by Gitaly for each of its storages. The ID is used to determine which storages can be accessed by Rails directly when rugged patches are enabled, and to see whether two different storages point to the same directory when doing repository moves.

When repository moves are performed, the worker first checks whether the repository's destination and source storage are the same. If they are, the move is not performed. The check is performed by comparing the filesystem IDs of the storages. As Praefect currently routes the server info RPC to a random Gitaly node, the filesystem ID can differ between calls, as each of the Gitalys has its own ID. This causes the repository moving worker to occasionally delete repositories from the virtual storage, as it receives two different IDs on sequential calls.

The filesystem ID can identify cases where two storages refer to the same directory on a Gitaly node, as the ID is stored in a file in the storage. This is not really possible with Praefect. The storages are only identified by the virtual storage's name. If the name changes, we can't correlate the ID between the different names, as Praefect would consider them different storages. Praefect also supports multiple virtual storages, so it's not possible to generate a single ID and use it for all of the virtual storages.

Given this, the approach taken here is to derive a stable filesystem ID from the virtual storage's name. This guarantees that calls to a given virtual storage always return the same filesystem ID. Configuring two storages that point to the same filesystem should be considered an invalid configuration anyway. Historically, there have been cases where that was done for plain Gitalys. This is not done for Praefect and wouldn't work, as Praefect wouldn't find the repositories with an alternative virtual storage name. With that in mind, we don't have to consider the case where two virtual storages of different names point to the same backing Gitaly storages.

The use cases for the filesystem ID seem to be limited and we may be able to remove it in the future once the rugged patches are removed. Changelog: fixed
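
A minimal sketch of such a derivation: hash the virtual storage's name and format part of the digest as a UUID, so the same name always yields the same ID. The function name and exact encoding are assumptions, not Gitaly's actual implementation:

    package main

    import (
        "crypto/sha256"
        "fmt"
    )

    // deriveFilesystemID derives a stable, UUID-shaped filesystem ID from a
    // virtual storage's name. Repeated calls for the same name agree, so the
    // repository-move worker no longer sees two different IDs in a row.
    func deriveFilesystemID(virtualStorage string) string {
        hash := sha256.Sum256([]byte(virtualStorage))
        return fmt.Sprintf("%x-%x-%x-%x-%x", hash[0:4], hash[4:6], hash[6:8], hash[8:10], hash[10:16])
    }

    func main() {
        // Identical input produces an identical ID on every call.
        fmt.Println(deriveFilesystemID("default"))
        fmt.Println(deriveFilesystemID("default"))
    }
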
2021-09-02  Update VERSION files  [v14.0.10]  GitLab Release Tools Bot
[ci skip]
2021-09-02  Update changelog for 14.0.10  GitLab Release Tools Bot
[ci skip]
2021-09-02  Merge branch 'pks-coordinator-replication-v14.0' into '14-0-stable'  Henri Philipps
Backport improved replication logic (v14.0). See merge request gitlab-org/gitaly!3824
2021-09-01  coordinator: Only schedule replication for differing error states  Patrick Steinhardt
When finalizing a transaction, we always schedule replication jobs in case the primary has returned an error. Given that there are many RPCs which are expected to return errors in a controlled way, e.g. if a commit is missing, this causes us to create replication jobs in many contexts where they aren't necessary at all.

Thinking about the issue, what we really care about is not whether an RPC failed. It's whether primary and secondary nodes behaved the same. If both primary and secondaries succeeded, we're good. And if both failed with the same error, we're good too as long as all transactions have been committed: quorum was reached on all votes and nodes failed in the same way, so we can assume that the nodes did indeed perform the same changes.

This commit thus relaxes the error condition: we no longer schedule replication jobs just because the primary failed, but only schedule them for any node whose error differs from the primary's. This has the advantage that we only need to selectively schedule jobs for disagreeing nodes instead of targeting all secondaries, and it avoids scheduling jobs in many cases where we do hit errors. Changelog: performance (cherry picked from commit 73839029f79d4ebdbc8d96475cf9bd0e2a599b2b)
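
A sketch of the relaxed condition, assuming errors are compared by message; the types and identifiers are illustrative:

    package main

    import (
        "errors"
        "fmt"
    )

    // nodeResult pairs a storage node with the error its RPC returned (nil on success).
    type nodeResult struct {
        storage string
        err     error
    }

    // outdatedSecondaries returns only the secondaries whose error state
    // differs from the primary's: same success or same failure means the nodes
    // are assumed to have performed the same changes, so no replication job.
    func outdatedSecondaries(primary nodeResult, secondaries []nodeResult) []string {
        var outdated []string
        for _, secondary := range secondaries {
            if !sameError(primary.err, secondary.err) {
                outdated = append(outdated, secondary.storage)
            }
        }
        return outdated
    }

    func sameError(a, b error) bool {
        if a == nil || b == nil {
            return a == b
        }
        return a.Error() == b.Error()
    }

    func main() {
        primary := nodeResult{"gitaly-1", errors.New("commit not found")}
        secondaries := []nodeResult{
            {"gitaly-2", errors.New("commit not found")}, // same failure: no job
            {"gitaly-3", nil},                            // diverged: schedule job
        }
        fmt.Println(outdatedSecondaries(primary, secondaries)) // [gitaly-3]
    }
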
2021-09-01  transactions: Remove `DidCommitAnySubtransactions()`  Patrick Steinhardt
The interface function `DidCommitAnySubtransactions()` isn't used by any callsite anymore. Drop it. (cherry picked from commit 528e8e926ef90d35ca85601f1eb28a947d63bea3)
2021-09-01  coordinator: Create replication jobs if the primary cast a vote  Patrick Steinhardt
Starting with commit d87747c8 (Consider primary modified only if a subtransaction was committed, 2021-05-14), we consider primaries to not have been modified unless at least one subtransaction was committed. The intent of this change was to avoid queueing replication jobs in case an RPC returned an error without having modified any on-disk state.

As it turns out, this optimization had unintended side effects: if an RPC fails on the first vote because of inconsistent state across all nodes, then we wouldn't ever schedule a replication job to fix this inconsistency. In some cases, this will keep us from making any progress at all because we will never converge towards the same state, for example in object pools. The current condition is clearly insufficient: if the initial vote fails, then we must schedule a replication job because we cannot tell much about the reason for its failure.

This commit thus tightens the check: instead of requiring at least one committed subtransaction to consider the primary dirty, we now consider it dirty whenever it cast a vote. With this change, we can still avoid replication jobs if secondaries created a subtransaction while the primary dropped out before casting a vote, given that secondaries couldn't have reached quorum without the primary. But in all the other cases where the primary did cast a vote, we'll now go through our typical updated-outdated logic and will thus also know to replicate changes in case the first vote fails. Changelog: fixed (cherry picked from commit cbf4ed8ffcb9e2600bdb20da420bc694c8523d8f)
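
A sketch contrasting the old and new heuristics, with hypothetical vote-tracking types:

    package main

    import "fmt"

    // transaction tracks, per node, whether a subtransaction was committed and
    // whether the node cast any vote at all (types and names are illustrative).
    type transaction struct {
        committedSubtransaction map[string]bool
        castVote                map[string]bool
    }

    // Old heuristic: dirty only after a committed subtransaction. This misses
    // RPCs that fail on the very first vote due to inconsistent state, so no
    // replication job would ever repair the inconsistency.
    func primaryDirtyOld(tx transaction, primary string) bool {
        return tx.committedSubtransaction[primary]
    }

    // New heuristic: dirty whenever the primary cast a vote, so a failed
    // initial vote still goes through the updated-outdated replication logic.
    func primaryDirtyNew(tx transaction, primary string) bool {
        return tx.castVote[primary]
    }

    func main() {
        // First vote failed before any subtransaction could be committed.
        tx := transaction{
            committedSubtransaction: map[string]bool{},
            castVote:                map[string]bool{"gitaly-1": true},
        }
        fmt.Println(primaryDirtyOld(tx, "gitaly-1")) // false: replication skipped
        fmt.Println(primaryDirtyNew(tx, "gitaly-1")) // true: replication scheduled
    }
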
2021-09-01  transactions: Implement function which tells whether nodes did vote  Patrick Steinhardt
It's currently not possible to tell whether a given node has cast any vote or not. Implement this; we'll need it as a heuristic to determine whether the primary has been dirtied. (cherry picked from commit 56b43c97077bbf6084ace172b22c9d7b70b9f91f)
2021-09-01  coordinator: Explicitly exit early if primary is not dirty  Patrick Steinhardt
Under some circumstances, the primary node will not be considered dirty after a transactional mutator. No matter what, we do not have to schedule any replication jobs in these cases given that there shouldn't be any changes anyway. But even though we know early on that the primary is not dirty, we still collect updated and outdated nodes and return them to the caller. This code flow is a bit confusing and hard to reason about. Refactor the code to return early in case the primary is not dirty. (cherry picked from commit c12022da6d1c65bebd303786a7abcf04ee7d67b3)
2021-09-01  coordinator: Combine node state log messages into a single message  Patrick Steinhardt
When determining updated and outdated secondaries for transactional mutators, we write several log messages stating why certain nodes are considered outdated or updated. Having this information split up across multiple messages is quite a pain if one wants to get a quick overview of why nodes are outdated, given that one now has to search for multiple log messages. Combine these log messages into a single message which has the secondary node states as its metadata. (cherry picked from commit 10a91e71f8f0b6130cf40911e325964b49ecf4c2)
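
A sketch of such a combined message using the logrus library; the field name and state values are illustrative, not the exact keys Praefect logs:

    package main

    import "github.com/sirupsen/logrus"

    // logNodeStates emits one log message carrying all secondary node states
    // as structured metadata, instead of one message per node.
    func logNodeStates(logger logrus.FieldLogger, states map[string]string) {
        logger.WithField("secondaries", states).Info("deciding updated and outdated secondaries")
    }

    func main() {
        logNodeStates(logrus.StandardLogger(), map[string]string{
            "gitaly-2": "updated",
            "gitaly-3": "outdated: did not cast a vote",
        })
    }
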
2021-09-01  praefect: Fix incorrect error tracking for secondaries  blanet
We use `nodeErrors` to track secondary errors of transaction-involving RPCs, but there is a bug: the `ErrHandler` of each secondary shares the same variable `secondary` from the for-range loop. Thus we would always mark (and only mark) the last secondary's error once there are any secondary errors, and what we would get from `getUpdatedAndOutdatedSecondaries` later is probably wrong. We can fix this by creating a new variable with the same name in the for-range loop. Changelog: fixed (cherry picked from commit b76154dd7dce4aa7b5d3fa7416d1697965b656d0)
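
The underlying Go pitfall and the one-line fix, reproduced in a self-contained example (identifiers are illustrative, not the actual Praefect code):

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        secondaries := []string{"gitaly-2", "gitaly-3"}
        errByNode := map[string]error{}
        var mu sync.Mutex

        // Every closure created in the loop would otherwise capture the same
        // loop variable, so all handlers would record against whichever
        // secondary the variable held last.
        var handlers []func(error)
        for _, secondary := range secondaries {
            secondary := secondary // the fix: shadow the loop variable per iteration
            handlers = append(handlers, func(err error) {
                mu.Lock()
                defer mu.Unlock()
                errByNode[secondary] = err
            })
        }

        for i, handle := range handlers {
            handle(fmt.Errorf("error %d", i))
        }

        // Without the shadowing line, both errors would land on "gitaly-3";
        // with it, each secondary gets its own entry.
        fmt.Println(errByNode)
    }
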
2021-09-01  coordinator: Reflect primary's dirty-status when logging repl cause  Patrick Steinhardt
With commit d87747c82 (Consider primary modified only if a subtransaction was committed, 2021-05-14), we changed the replication logic to only trigger in case the primary is considered to have been dirtied. This was done so that we don't need to schedule replication jobs to secondaries in cases where we're sure that no user-visible change was performed anyway. The preexisting logging statements which inform about replication reasons haven't been adapted though, making it hard to see from the dirty status whether replication jobs were actually created or not. Improve the situation by discarding log messages in case the primary has not been dirtied. (cherry picked from commit 3664de9fc2472c320c2be2e435e9c6c2c45eafdd)
2021-08-31  Merge remote-tracking branch 'dev/14-0-stable' into 14-0-stable  GitLab Release Tools Bot
2021-08-31  Merge branch '14-0-sh-pack-objects-hook-cfg' into '14-0-stable'  Sami Hiltunen
[14.0] Only activate Git pack-objects hook if cache is enabled. See merge request gitlab-org/gitaly!3813
2021-08-31  Update VERSION files  [v14.0.9]  GitLab Release Tools Bot
[ci skip]
2021-08-31  Update changelog for 14.0.9  GitLab Release Tools Bot
[ci skip]
2021-08-31  Only activate Git pack-objects hook if cache is enabled  Stan Hu
In https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3301, we dropped the `upload_pack_gitaly_hooks` feature flag because it was confusing to have to enable this feature flag on top of the pack objects cache setting in `config.toml`. However, we have found that spawning the gitaly-hooks environment can add significant CPU load due to the overhead of transferring data over gRPC and spawning gitaly-hooks processes. We now only enable this hook if the pack objects cache is enabled. Relates to https://gitlab.com/gitlab-org/gitaly/-/issues/3754 Changelog: performance
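
A sketch of the conditional wiring; `uploadpack.packObjectsHook` is the real git configuration key, while the helper and config types here are stand-ins:

    package main

    import "fmt"

    // configPair stands in for Gitaly's key/value git configuration entries.
    type configPair struct{ key, value string }

    // packObjectsHookConfig only injects the pack-objects hook, which routes
    // pack generation through gitaly-hooks, when the cache is enabled.
    func packObjectsHookConfig(cacheEnabled bool, hooksBinary string) []configPair {
        if !cacheEnabled {
            return nil // no cache, no hook: avoid gRPC and process-spawning overhead
        }
        return []configPair{{key: "uploadpack.packObjectsHook", value: hooksBinary}}
    }

    func main() {
        fmt.Println(packObjectsHookConfig(false, "/usr/local/bin/gitaly-hooks")) // []
        fmt.Println(packObjectsHookConfig(true, "/usr/local/bin/gitaly-hooks"))
    }
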
2021-08-25  Update VERSION files  [v14.0.8]  GitLab Release Tools Bot
[ci skip]
2021-08-25  Update changelog for 14.0.8  GitLab Release Tools Bot
[ci skip]
2021-08-03  Merge remote-tracking branch 'dev/14-0-stable' into 14-0-stable  GitLab Release Tools Bot
2021-08-03  Update VERSION files  [v14.0.7]  GitLab Release Tools Bot
[ci skip]
2021-08-03  Update changelog for 14.0.7  GitLab Release Tools Bot
[ci skip]
2021-08-03  Merge branch 'handle_long_commit_headers_14' into '14-0-stable'  GitLab Release Tools Bot
Allow parsing of long git commit headers. See merge request gitlab-org/security/gitaly!37
2021-07-20  Update VERSION files  [v14.0.6]  GitLab Release Tools Bot
[ci skip]
2021-07-20  Update changelog for 14.0.6  GitLab Release Tools Bot
[ci skip]
2021-07-20  Merge branch 'pks-tx-remove-repository-v14.0' into '14-0-stable'  Mayra Cabrera
coordinator: Fix repo creation/removal race for up-to-date secondaries for 14-0-stable. See merge request gitlab-org/gitaly!3675
2021-07-16  coordinator: Fix repo creation/removal race for up-to-date secondaries  Patrick Steinhardt
When creating repositories, Rails will in some cases first schedule a call to `RemoveRepository()` to make sure that there's no old repository obstructing the path we are about to create the repository at. Given that `RemoveRepository()` isn't transactional, this action will be replicated to secondaries asynchronously. With our current design, these replication jobs will not cause a bump of the repository generation, given that we may not even have an entry for the repository. Combined with the fact that we do not take outstanding replication jobs into account when computing whether a node is up-to-date or not, this means that secondaries will be considered up-to-date even though they have deletions pending.

This results in a race between the replication jobs which are about to delete the target repository and the subsequent call to `CreateRepository()` et al. If this race is lost (that is, the deletion wasn't scheduled before `CreateRepository()` gets executed, which is almost always the case), then we will end up with the repository getting deleted during or after its creation.

This race needs to be fixed via two changes:

1. We need to take outstanding modifying replication jobs into account when computing whether a node is up to date or not. Otherwise, we may be in the middle of processing a transactional RPC when a replication job kicks in and modifies repository state while we're operating on it.

2. With (1) implemented, we need to make `RemoveRepository()` transactional, otherwise it would always cause secondaries to be considered out-of-date and repository creation wouldn't use transactions in most cases.

This commit implements (2) and enables transactional behaviour for `RemoveRepository()`, which has been implemented in the preceding commit. Note that this requires a modification of `TestRemoveRepository()`: previously, we didn't set up the backchannel handshaker and thus couldn't accept votes from Gitaly nodes. Now that we have transactions enabled for this RPC, we must be able to handle votes, or otherwise the RPC is slated for failure. We thus manually need to set up the node manager to have a backchannel handshaker. Changelog: fixed
2021-07-16  repository: Support transactional voting in `RemoveRepository()`  Patrick Steinhardt
The `RemoveRepository()` RPC currently does not support transactional voting. This is now implemented by casting a vote both before removing the repository and after the repository either has been removed or been found not to exist. Transactional behaviour is not yet enabled in Praefect and thus this new logic isn't used; this will be handled in a subsequent commit. Changelog: added
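
A minimal sketch of that two-vote flow, with a stand-in vote helper; the names, vote payloads, and exact flow are illustrative, not the actual RPC implementation:

    package main

    import (
        "crypto/sha1"
        "fmt"
    )

    // castVote stands in for the transactional vote Gitaly sends to Praefect,
    // here simply hashing the state being agreed on.
    func castVote(state string) {
        fmt.Printf("casting vote %x on %q\n", sha1.Sum([]byte(state)), state)
    }

    // removeRepository votes once before touching disk, so all nodes agree the
    // removal should happen, and once after the repository has either been
    // removed or been found not to exist, so all nodes agree on the outcome.
    func removeRepository(remove func() (existed bool, err error)) error {
        castVote("pre-remove")

        existed, err := remove()
        if err != nil {
            return fmt.Errorf("removing repository: %w", err)
        }

        if existed {
            castVote("post-remove: removed")
        } else {
            castVote("post-remove: did not exist")
        }
        return nil
    }

    func main() {
        _ = removeRepository(func() (bool, error) {
            fmt.Println("removing repository from disk")
            return true, nil
        })
    }
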
2021-07-16  repository: Refactor `RemoveRepository()` for easier readability  Patrick Steinhardt
This commit contains some preliminary refactorings to make the `RemoveRepository()` RPC follow best practices and make it a bit more readable. No functional changes are expected from this commit.
2021-07-16  coordinator: Clarify why transactional behaviour is disabled for RPCs  Patrick Steinhardt
The map which specifies mutating RPCs and whether they should use transactions or not currently claims that the reason for disabled transactional behaviour is that the RPCs at hand simply don't modify references and thus shouldn't use transactions. This line of reasoning doesn't hold anymore: we have several RPCs which don't modify refs, but instead perform manual voting on different state. Split up the list of disabled RPCs into two:

- The first list contains all the RPCs where we didn't implement manual voting logic yet, but which may be changed in the future to support transactional behaviour.

- The second list contains RPCs which are considered idempotent "optimizing" RPCs: while they do modify on-disk state, the result as visible to the user should always remain the same. Those RPCs are special-cased to not cause a repository generation bump and thus retaining non-transactional behaviour is fine there.

This should hopefully represent the current state much more accurately.
2021-07-16  praefect: Avoid double-proxied Gitaly nodes  Patrick Steinhardt
Some of our Praefect server tests use `testserver.RunGitalyServer()` to set up their Gitaly nodes. While this isn't a problem by itself, this function will by default set up a Praefect proxy for the Gitaly nodes if running the "test-with-praefect" target. The result is that Gitaly nodes are proxied by two Praefects: first the one set up by the above function, and then the one we're setting up ourselves in the tests. This is not a supported configuration and may cause RPCs to fail in case Gitaly needs to reach out to Praefect, e.g. for transactional voting. Fix the issue by passing the `WithDisablePraefect()` option.
2021-07-16  testserver: Ensure Gitaly is healthy when using Praefect proxy  Patrick Steinhardt
When using the Praefect proxy for our tests, Praefect will use Gitaly's health information to inform routing decisions. As a result, we must make sure that Gitaly is in fact healthy before the tests start, otherwise routing may fail. Call `waitHealthy()` on the Gitaly connection to ensure that this is the case. In order to avoid code duplication, `waitHealthy()` is refactored to perform the dialling itself. This change requires us to get rid of the manually configured listening address in `TestCoordinator_grpcErrorHandling()`, as it keeps us from connecting to the Gitaly node's health service. It doesn't seem to serve any immediate purpose anyway, so it's not much of an issue in the first place.
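
A sketch of such a health wait using the standard gRPC health service; the polling interval and function shape are assumptions, not the actual `waitHealthy()` implementation:

    package main

    import (
        "context"
        "fmt"
        "time"

        "google.golang.org/grpc"
        healthpb "google.golang.org/grpc/health/grpc_health_v1"
    )

    // waitHealthy dials the node itself and polls the standard gRPC health
    // service until it reports SERVING, so that health-based routing has an
    // up node before the tests run.
    func waitHealthy(ctx context.Context, addr string) error {
        conn, err := grpc.DialContext(ctx, addr, grpc.WithInsecure())
        if err != nil {
            return fmt.Errorf("dialling %s: %w", addr, err)
        }
        defer conn.Close()

        client := healthpb.NewHealthClient(conn)
        for {
            resp, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
            if err == nil && resp.Status == healthpb.HealthCheckResponse_SERVING {
                return nil
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(50 * time.Millisecond):
            }
        }
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if err := waitHealthy(ctx, "localhost:9999"); err != nil {
            fmt.Println("node not healthy:", err)
        }
    }
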
2021-07-08  Update VERSION files  [v14.0.5]  GitLab Release Tools Bot
[ci skip]
2021-07-08  Update changelog for 14.0.5  GitLab Release Tools Bot
[ci skip]
2021-07-07  Merge remote-tracking branch 'dev/14-0-stable' into 14-0-stable  GitLab Release Tools Bot
2021-07-07  Update VERSION files  [v14.0.4]  GitLab Release Tools Bot
[ci skip]
2021-07-07  Update changelog for 14.0.4  GitLab Release Tools Bot
[ci skip]
2021-07-06  Update VERSION files  [v14.0.3]  GitLab Release Tools Bot
[ci skip]
2021-07-06  Update changelog for 14.0.3  GitLab Release Tools Bot
[ci skip]