gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
Age | Commit message | Author
2023-06-14 | gitaly: Move write-ahead log logic into separate package | Patrick Steinhardt
The write-ahead log logic is currently contained in the rather generic `internal/gitaly` package. Since its introduction it has grown in scope, though, and now has a more clearly defined purpose: managing all access to the Git repositories of a particular storage. Furthermore, the current naming schema creates some confusion due to the ever so tiny difference between the write-ahead log's `internal/gitaly/transaction_manager.go` and `internal/gitaly/transaction/manager.go`, which is part of our transactional voting logic. Move the implementation into a separate `storagemgr` package to more clearly define the purpose of the files.
2023-06-13 | Merge branch 'smh-backwards-compatible-hooks' into 'master' | Sami Hiltunen
Extract hooks also into their non-WAL location See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5842 Merged-by: Sami Hiltunen <shiltunen@gitlab.com> Approved-by: James Fargher <proglottis@gmail.com> Approved-by: John Cai <jcai@gitlab.com>
2023-06-02 | Merge branch 'smh-error-dbl-commit-rollbak' into 'master' | Toon Claes
Error when double committing or rollbacking a transaction See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5841 Merged-by: Toon Claes <toon@gitlab.com> Approved-by: Will Chandler <wchandler@gitlab.com> Co-authored-by: Sami Hiltunen <shiltunen@gitlab.com>
2023-06-02 | Merge branch 'pks-go-badger-v4' into 'master' | Sami Hiltunen
go.mod: Update Badger to v4.1.0 Closes #4890 See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5864 Merged-by: Sami Hiltunen <shiltunen@gitlab.com> Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com> Approved-by: Sami Hiltunen <shiltunen@gitlab.com> Co-authored-by: Patrick Steinhardt <psteinhardt@gitlab.com>
2023-06-02 | Merge branch 'smh-move-packs-in-place' into 'master' | Sami Hiltunen
Apply logged pack files to repository without copying Closes #5046 See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5846 Merged-by: Sami Hiltunen <shiltunen@gitlab.com> Approved-by: karthik nayak <knayak@gitlab.com> Approved-by: Patrick Steinhardt <psteinhardt@gitlab.com> Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com> Reviewed-by: Sami Hiltunen <shiltunen@gitlab.com> Reviewed-by: karthik nayak <knayak@gitlab.com>
2023-06-01 | go.mod: Update Badger to v4.1.0 | Patrick Steinhardt
Badger has bumped its major version from v3 series to v4. This bump is mostly uninteresting, but there are two interesting changes in there:
- They have upgraded the minimum required Go version to 1.19. This has kept us from upgrading Badger given that we still supported Go 1.18. This has changed with a17fb7823 (go.mod: Use Go 1.19 as minimum required version, 2023-05-23) though, so we're unblocked with the upgrade.
- They have changed their version schema from CalVer to SemVer, meaning that future releases should be more indicative of real major changes.
Other than that there are no real changes that would be interesting to us.
2023-06-01 | Apply logged pack files to repository without copying | Sami Hiltunen
TransactionManager is currently unpacking pack files in order to apply them to the repository from the log. This is inefficient:
1. The objects are copied from the pack file when they are unpacked.
2. Accessing unpacked objects is less efficient than packed ones.
3. It's more work to repack the objects later.
To avoid these problems, this commit instead hard links the logged pack files from the log to the repository. This is done as follows:
1. When staging the transaction, a pack file is computed that contains objects that are newly reachable from the new reference tips. An index is also computed for the pack file.
2. The pack file and its index are logged.
3. The pack file and the index are hard linked from the log directory to the repository's 'objects/pack' directory.
The pack file and the index are logged with the name Git gives them, so `pack-<digest>.{pack,idx}`. It would be simpler to log the files with static names so they are always named the same for all transactions, say `transaction.{pack,idx}`. They could then be linked to the repository's object directory under the log entry's index, so for example `objects/pack/<log_index>.{pack,idx}`. This would be simpler as we wouldn't have to pipe the pack prefix through the log. The problem is that Git doesn't seem to automatically remove these packs when it's doing a full repack with `git repack -ad`. This would lead to the packs accumulating in the repository. For that reason, we use the Git generated names.
Applying pack files directly doesn't come without downsides either. With each write resulting in an additional pack file in the repository, looking up objects from the packs becomes slower. This is particularly a problem if there is a large number of small writes. We might want to later have a look at using a threshold on the object count in the pack to decide whether we apply it directly or unpack it, similarly to what Git does when receiving packs. We'll leave this for later though, once we know this is a problem.
2023-06-01 | Log a pack file index along side transaction's pack file | Sami Hiltunen
The objects a transaction makes reachable are collected into a pack file that then ultimately gets logged. On log application, that pack file is currently unpacked into the repository. A more efficient way to apply the pack file would be to hard link it into the repository directly from the log to avoid copying. In order for the objects to be available for reading, the pack file needs an index which we're currently not computing. To prepare for linking the pack files into place, this commit computes an index for the transaction's pack file prior to logging the transaction. The index is already logged alongside the pack file but we don't yet make use of it.
2023-06-01 | Log a directory instead of a single pack file | Sami Hiltunen
Each transaction has a staging directory where it can stage files for commit. Currently it is used to store two things:
- stagingDir/quarantine stores the transaction's quarantine directory. This is where all the RPC handlers write the Git objects while staging the transaction prior to a commit.
- stagingDir/transaction.pack is created right before committing the transaction to create a pack file that can be logged.
The current assumption is that we're only ever logging a single file, which is the pack file containing the transaction's objects. However, there are other files we could want to log alongside the pack file. For example, we could compute and log the pack file's index to avoid having to do so when applying the log entry. This commit changes the transaction.pack to be placed in stagingDir/wal-files/transaction.pack. Instead of logging only the pack file, we log the entire wal-files directory. This makes room for logging other files alongside the pack file as well. There are no changes in external behavior.
2023-05-31 | Merge branch 'pks-go-v1.20' into 'master' | Sami Hiltunen
go: Use Go 1.19 as minimum required version Closes #5324 See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5831 Merged-by: Sami Hiltunen <shiltunen@gitlab.com> Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com> Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com> Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com> Co-authored-by: Patrick Steinhardt <psteinhardt@gitlab.com>
2023-05-30 | Merge branch 'smh-validate-hooks' into 'master' | Justin Tobler
Validate custom hook archive prior to accepting a transaction Closes #5126 See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5835 Merged-by: Justin Tobler <jtobler@gitlab.com> Approved-by: Justin Tobler <jtobler@gitlab.com> Approved-by: Toon Claes <toon@gitlab.com> Co-authored-by: Sami Hiltunen <shiltunen@gitlab.com>
2023-05-30 | gitaly: Fix deadlock with Go 1.20 in transaction manager test | Patrick Steinhardt
In the transaction manager we have a set of tests that verify whether asynchronous deletion of repositories works as expected. These tests have started to deadlock nondeterministically with Go 1.20.
Bisecting this regression points to upstream commit 8477562ce5 (cmd/compile: be more careful about pointer incrementing in range loops, 2022-11-11). This commit changes how pointers are incremented in loops so that there never is a Go pointer that points past the backing array. This is done by using a `uintptr` to track the next array member at the end of the loop, which is thus not getting treated as a valid pointer by Go.
Interestingly, the regression is fixed with 0384235a15 (cmd/compile: don't mark uintptr->unsafe.Pointer conversion unsafe points, 2023-01-11), which changes semantics around conversions between `uintptr` and unsafe pointers. Previously, code preceding any such conversion was considered to be an unsafe point in the code flow. This had the consequence that Go didn't allow preemption of Goroutines between any such conversion and a preceding function call. In combination with the above commit that introduces the regression, it could now happen that a complete loop is considered to be unsafe where it previously was safe for preemption. The second commit then fixes the issue because it stops treating `uintptr` to unsafe pointer conversions as unsafe.
We seemingly have such a case in our transaction manager tests where a loop is now busy spinning without ever being preempted when trying to delete repositories asynchronously. It is not exactly clear where this is happening, but we can seemingly work around the deadlock by changing a non-ranged loop to stop busy spinning.
Fix the deadlock by converting the `admitted` field of transaction from a boolean to a channel. This allows us to wait for transactions to be admitted without busy spinning and thus avoids the deadlock. It is more of a workaround than a proper fix, but this should be good enough for now given that the compiler regression is about to be fixed with Go 1.21 anyway.
2023-05-30 | gitaly: Cleanup stale reflocks in `prepareReferenceTransaction` | Karthik Nayak
When running `prepareReferenceTransaction`, there is a possibility that it fails due to the existence of stale reference locks in the repository. Since TransactionManager is the only process apart from housekeeping which writes into the repository, we should be okay with cleaning these stale reference locks. So in `prepareReferenceTransaction`, if we encounter a stale reference lock, we add an inhibitor that keeps housekeeping from running git-pack-refs(1). This prevents new reference locks from being created. Then we clear the existing stale reference locks before continuing further. To enable this we expose the internal function `addPackRefsInhibitor` as `AddPackRefsInhibitor` in the housekeeping package via the `HousekeepingManager`. We also add a new step type, `ModifyFiles`, in `TestTransactionManager` which allows us to create files in the repository.
2023-05-30 | gitaly: Add `RepositoryManager` to `TransactionManager` | Karthik Nayak
In the `TransactionManager` we want to clean up stale lock files if we run into them (but not when housekeeping is running). To do this we need the `housekeeping.RepositoryManager`, so let's add this field to the TransactionManager. The following commits will utilize the field.
2023-05-30 | gitaly: Remove the `repository` interface | Karthik Nayak
The interface `repository` was added to allow us to add hooks for testing. In the previous commit, we removed this dependency. Let's remove the interface altogether and pass `localrepo.Repo` everywhere. We need this for the upcoming commits wherein we'll integrate `housekeeping.Manager` which expects us to pass a `localrepo.Repo` variable.
2023-05-25 | Validate custom hook archive prior to accepting a transaction | Sami Hiltunen
TransactionManager is currently not validating the custom hook archive in any way prior to accepting and logging the transaction. This could lead to an invalid hook archive being logged, which would prevent applying the log entry of the transaction and halt transaction processing. This commit verifies the archive by extracting it on the disk to the staging directory prior to logging. These extracted files can then later also be used for computing a vote to Praefect from the hook files. For now, no other validation is performed than just ensuring the hooks can be extracted on the disk. This matches the behavior in SetCustomHooks which also doesn't verify anything. This is something we should improve later. However, if the hooks extract successfully, the log processing won't fail because of them. If the hooks fail to execute, it will fail the hook execution but that's outside of the scope of the TransactionManager. The hooks can still be fixed by committing new ones.
2023-05-25 | Extract hooks also into their non-WAL location | Sami Hiltunen
The TransactionManager is storing new versions of custom hooks always into a new directory. This ensures we can isolate transactions reading custom hooks from concurrent updates. This creates a problem if we want the WAL logic to be safe to toggle off without losing data. If the hooks are only written to the new location, they won't be used anymore if the WAL logic is subsequently disabled. To make the WAL safe to toggle on and off, this commit writes the custom hooks from the log also to their old location in `<repo>/custom_hooks`. This ensures that the hook updates remain in place even if we have to disable the WAL after enabling it for some time.
2023-05-25 | Error when double committing or rollbacking a transaction | Sami Hiltunen
Transaction currently doesn't check whether it has already had Commit() or Rollback() called on it when either is called. This currently leads to failures and unexpected behavior. While there are currently no callers doing so, this will be a common pattern as we start integrating the transactions in the RPC handlers. They'll generally defer a rollback immediately after beginning a transaction and commit the transaction at the end of the RPC if it is successful. The common sequence is thus calling commit, followed by the deferred rollback call. This commit changes Commit() and Rollback() to return an appropriate error if the transaction is already committed or rollbacked. The caller can then ignore the error if it doesn't care about it. There will be a helper later to handle logging rollback errors. That logging helper will ignore the 'already committed' errors given they are expected, but not other errors.
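The commit-then-deferred-rollback pattern described above could look roughly like this. It is a sketch under assumptions: the error variables, the `state` type, and the `rollbackIgnoringCommitted` helper are hypothetical names invented for the example, not Gitaly's identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var (
	errAlreadyCommitted  = errors.New("transaction already committed")
	errAlreadyRollbacked = errors.New("transaction already rollbacked")
)

type state int

const (
	stateOpen state = iota
	stateCommitted
	stateRollbacked
)

type transaction struct {
	state state
}

// finish moves an open transaction into its final state or reports why
// that is no longer possible.
func (tx *transaction) finish(next state) error {
	switch tx.state {
	case stateCommitted:
		return errAlreadyCommitted
	case stateRollbacked:
		return errAlreadyRollbacked
	}
	tx.state = next
	return nil
}

func (tx *transaction) Commit() error   { return tx.finish(stateCommitted) }
func (tx *transaction) Rollback() error { return tx.finish(stateRollbacked) }

// rollbackIgnoringCommitted mirrors the helper the commit message
// anticipates: 'already committed' is expected, anything else is returned.
func rollbackIgnoringCommitted(tx *transaction) error {
	if err := tx.Rollback(); err != nil && !errors.Is(err, errAlreadyCommitted) {
		return err
	}
	return nil
}

// handleRPC shows the expected call sequence in an RPC handler.
func handleRPC() error {
	tx := &transaction{}
	defer func() {
		if err := rollbackIgnoringCommitted(tx); err != nil {
			fmt.Println("rollback failed:", err)
		}
	}()

	// ... the actual work of the RPC would happen here ...

	return tx.Commit()
}

func main() {
	fmt.Println("handler error:", handleRPC())
}
```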
2023-05-25 | Simplify default branch setting in TransactionManager | Sami Hiltunen
TransactionManager is currently applying the default branch update with a call to localrepo.Repo.SetDefaultBranch. This is not ideal:
1. It calls `check-ref-format` to verify the reference. This is not necessary when applying a log entry as the reference has already been validated prior to logging the update.
2. It performs the update manually on the disk.
3. It contains voting related logic which isn't needed when applying a log entry.
Simplify and perform the update with a call to `git symbolic-ref` instead.
2023-05-25 | Rename updateDefaultBranch to applyDefaultBranchUpdate | Sami Hiltunen
This commit renames updateDefaultBranch to applyDefaultBranchUpdate for more consistent naming with the other log application methods.
2023-05-25 | Remove unnecessary validation from default branch updates with WAL | Sami Hiltunen
The TransactionManager is currently performing some extra validation on default branch updates. Namely, it checks that the reference HEAD is being pointed to exists. This deviates from the non-WAL behavior which updates the HEAD to point to the reference regardless of its existence. This is also a problematic consistency guarantee to uphold. When a repository is created, HEAD points to `refs/heads/main` but the branch doesn't necessarily exist, meaning the starting state is already violating the consistency guarantee the updates attempt to uphold. Moreover, there's nothing preventing deleting the reference that is pointed to by HEAD. Other application code also already has to handle the case when the reference pointed to by HEAD doesn't exist. Given these consistency guarantees are not upheld, and they are not needed, this commit removes the checks. The default branch is updated as long as the reference format is valid.
2023-05-25 | Inject git.CommandFactory to TransactionManager | Sami Hiltunen
This commit injects a git.CommandFactory instance into TransactionManager. This will later be needed to spawn git commands that don't run in a given repository.
2023-05-24 | Ensure TransactionManager has applied all log entries before assertions | Sami Hiltunen
There's a race currently in the TransactionManager tests where we're not always guaranteed to have all state applied on the disk before asserting it if the repository is being deleted by a concurrent transaction.
Generally this is guaranteed due to transactions being processed one by one. The `checkManagerError` is queuing a failing transaction to see if the manager picks it for processing. When it does, the manager would have already applied all preceding log entries.
Repository deletions are different as we can't delete the repository before all transactions with a pre-deletion snapshot are done, as otherwise we'd delete the data they access. To avoid a deadlock where the manager is waiting for the transactions to finish and the transactions are waiting for the manager to admit them while it is stuck applying the deletion, the applyRepositoryDeletion is actively rejecting transactions that are waiting to commit when it is applying a deletion. Once all have been rejected, the deletion is applied.
The above logic is the source of the race in the tests. The test transaction queued in `checkManagerError` is rejected before the deletion is applied. This unblocks the main test goroutine which then proceeds to run the assertions on the disk state before the deletion is fully applied.
Fix this race by always beginning a new transaction if the manager is running in `checkManagerError`. Transactions must wait for all committed data to be applied before they can begin, so beginning and rolling back a transaction works to ensure all committed data has been applied.
Transactions beginning in a non-existent repository were previously early exiting without waiting for the logged data to be applied. This was fine as if the repository doesn't exist, there's no data to be applied and returning a not found error early was fine. This commit removes the special casing so we also wait for the deletion to be applied before returning from Begin. This leads to no external behavior change but allows us to ensure the log has been applied by starting a transaction.
2023-05-23 | Allow forcing reference updates in TransactionManager | Sami Hiltunen
TransactionManager is currently always verifying the reference's old value matches the expected one in the update. There are some RPCs in Gitaly that perform updates forcefully without checking the reference's old value. One such example is WriteRef. To support such use cases, this commit implements a Force flag in ReferenceUpdate that allows for applying the reference update regardless of the old value.
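As a rough illustration of how such a flag changes verification, consider the sketch below. `ReferenceUpdate` and `Force` are named in the commit message. The `OldOID` and `NewOID` fields and the `verifyReferenceUpdate` function are assumptions for this example and may not match the real types.

```go
package main

import "fmt"

// ReferenceUpdate describes a single reference change. The OID field names
// are hypothetical and chosen only for this sketch.
type ReferenceUpdate struct {
	// Force skips the check of the reference's old value.
	Force  bool
	OldOID string
	NewOID string
}

// verifyReferenceUpdate compares the reference's actual value with the
// expected old value unless the update is forced.
func verifyReferenceUpdate(reference, actualOID string, update ReferenceUpdate) error {
	if update.Force {
		return nil
	}
	if actualOID != update.OldOID {
		return fmt.Errorf("reference %q: expected old value %q, got %q",
			reference, update.OldOID, actualOID)
	}
	return nil
}

func main() {
	update := ReferenceUpdate{OldOID: "aaaa", NewOID: "bbbb"}

	// The reference currently points to "cccc", so the update is rejected.
	fmt.Println(verifyReferenceUpdate("refs/heads/main", "cccc", update))

	// With Force set, the old value no longer matters.
	update.Force = true
	fmt.Println(verifyReferenceUpdate("refs/heads/main", "cccc", update))
}
```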
2023-05-23 | Include custom hook path in a transaction's snapshot | Sami Hiltunen
TransactionManager manages custom hooks via MVCC. Custom hooks are always written into a new directory to isolate transactions from concurrent writes. When executing or reading the custom hooks, the transactions should use the custom hooks included in the snapshot. While the log index of the custom hooks is already included in the snapshot, it's not useful for integration as it doesn't yet say where exactly the custom hooks should be executed from. Solve this problem by including an absolute path to the custom hooks on the disk. This will later be used by HookManager to execute the correct custom hooks for a transaction, and GetCustomHooks to fetch the correct version of the custom hooks. Backwards compatibility logic is included to execute custom hooks from the repository if none have yet been written via WAL.
2023-05-23 | Extract custom hook path generation into a function | Sami Hiltunen
This commit extracts the custom hook path generation into a function. Extracting a function will later make it easier in tests to assert we have the correct path when we add the custom hook path to the transaction's snapshot. While at it, update the parent directory to be synced with the recently introduced helper designed for it.
2023-05-23 | Call hooks custom hooks in TransactionManager | Sami Hiltunen
This commit updates references to hooks to talk about custom hooks in TransactionManager. This makes it clearer we are talking about the custom hooks the users can write in the repository, not the shims used by git to call back to Gitaly.
2023-05-21 | Prevent transaction beginning if initialization fails | Sami Hiltunen
Begin currently waits for the TransactionManager to initialize but it doesn't check whether the initialization was successful. As the transaction will not anyway succeed if the initialization failed, return an appropriate error from Begin instead. This way the transaction doesn't end up doing any useless work and the error message will be clearer than having a random failure at some point during the transaction.
2023-05-21 | Write-ahead log repository deletions | Sami Hiltunen
Repository deletions need to be write-ahead logged as well. This ensures their atomicity and that they can be replicated later as part of the log. Once a repository deletion has been committed, all subsequent Begin and Commit calls will fail with a 'repository not found' error. The repository is logically deleted but not yet physically. The physical deletion needs to wait for open transactions to finish so we don't remove the files they are operating on. For now, it's possible to set other updates in the Transaction even if it ultimately removes the repository. Technically this is fine but it's a bit nonsensical and we don't have a use case for it in Gitaly. We'll probably later improve the interface by splitting out different transaction types so UpdateReferences() can't be called on the same transaction that deletes the repository.
2023-05-21 | Track open transactions in TransactionManager | Sami Hiltunen
The TransactionManager needs to keep track of open transactions in order to avoid removing data they still need. Given the sequence:
- *latest hooks are from index 1*
- Begin TX2
- Begin TX3
- Commit TX3 storing new hooks
The TransactionManager can't prune hooks from index 1 before TX2 has finished as TX2 may still read them. Once all transactions that may be reading the old version of hooks have finished, the TransactionManager should prune the old version of the hooks to ensure we don't keep filling up the disk. This commit adds a list to keep track of open transactions. This list can later be used to synchronize with open transactions and wait until all open transactions using the data have finished prior to removing it. As Begin is now registering the transactions and thus also writing to the TransactionManager's state from a different goroutine than Run(), the mutex is changed to a normal mutex from RWMutex.
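The bookkeeping described above might be sketched as follows. This is an illustration only: the `manager` and `transaction` types, the `begin` and `canPruneHooks` helpers, and the rule that a hook version is prunable once no open transaction's snapshot refers to it are assumptions made for the example.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// transaction records which version of the hooks its snapshot reads.
type transaction struct {
	hookIndex uint64
}

type manager struct {
	// A plain mutex suffices because begin writes to the shared state too.
	mu               sync.Mutex
	openTransactions *list.List
}

func newManager() *manager {
	return &manager{openTransactions: list.New()}
}

// begin registers an open transaction and returns a function that
// unregisters it once the transaction commits or rolls back.
func (m *manager) begin(hookIndex uint64) (finish func()) {
	m.mu.Lock()
	defer m.mu.Unlock()

	elem := m.openTransactions.PushBack(&transaction{hookIndex: hookIndex})
	return func() {
		m.mu.Lock()
		defer m.mu.Unlock()
		m.openTransactions.Remove(elem)
	}
}

// canPruneHooks reports whether no open transaction still reads the hooks
// stored at the given log index.
func (m *manager) canPruneHooks(index uint64) bool {
	m.mu.Lock()
	defer m.mu.Unlock()

	for e := m.openTransactions.Front(); e != nil; e = e.Next() {
		if e.Value.(*transaction).hookIndex == index {
			return false
		}
	}
	return true
}

func main() {
	m := newManager()

	finishTX2 := m.begin(1) // TX2 reads hooks from index 1
	finishTX3 := m.begin(1) // TX3 starts from the same snapshot
	finishTX3()             // TX3 commits and stores new hooks

	fmt.Println("prune index 1 while TX2 is open:", m.canPruneHooks(1))
	finishTX2()
	fmt.Println("prune index 1 after TX2 finished:", m.canPruneHooks(1))
}
```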
2023-05-18 | Acknowledge transactions only after application | Sami Hiltunen
TransactionManager is currently acknowledging transaction commits as soon as the transaction has been logged. This is the ultimate behavior we want to end up with. However, when we roll out the WAL, we want the configuration toggle to be safe to toggle back and forth. The current behavior means that if the WAL is toggled off before the log has been fully applied, we'd effectively lose writes. We can avoid this by only acknowledging the writes once they've been successfully applied to the repository. This way the write will be present even if the WAL is toggled off, or it won't have been acknowledged as committed. This also makes it easier to integrate the WAL in our tests. Many of the tests inspect disk state directly and do not interrogate the test state through RPCs. This means that the transactions may not be applied to the repository before the tests run their assertions on the state. Acknowledging writes only after they've been applied also synchronizes the tests as a side effect.
2023-05-10 | Inject a factory for localrepo.Repos instead of a Repo | Sami Hiltunen
TransactionManager is currently taking in a localrepo.Repo as a parameter. This works fine enough if the repository exists. We'll soon be handling repository creations and deletions as well. For those operations the same localrepo.Repo instance may not work:
1. If the repository is being created, we'll still need a Git repository to stage the transaction in. Certain commands like 'rev-list' and 'pack-objects' require a repository. We are also using Git to verify the references which we can't do without a repository. Repository creations will use a temporary staging repository to run these commands. The factory being injected here will be used to construct its localrepo.Repo instance.
2. localrepo.Repo is caching the object hash information. If a repository is deleted and recreated, the repository may be recreated with a different object format, in which case the cached format would be wrong. This likely never happens in the context of GitLab Rails, but since the API allows for it, it needs to be handled. With the factory we can recreate the localrepo.Repo, and thus check the object format again after the recreation.
This commit thus plugs in a localrepo.Repo factory instead of a particular instance to lay the ground for addressing both of the points.
2023-05-10 | Centralize transaction quarantining logic into a staging repository | Sami Hiltunen
The transaction's quarantine directory is currently being applied to the repository in multiple places. This commit stores a quarantined repository in the transaction that is used by all of the staging and verification logic. This centralizes the logic in a single place, and makes it easier to plug more logic needed for example when the repository doesn't exist yet.
2023-05-10 | Instantiate log entry in processTransaction | Sami Hiltunen
We're currently instantiating the log entry in verifyReferences. Change it to be instantiated directly in processTransaction so it's easier to compose the log entry if verifyReferences is not the first method called in processTransaction.
2023-05-10 | Move default branch verification to processTransaction | Sami Hiltunen
Transaction's default branch update is currently being verified in verifyReference. If the verification passes, the branch update is set in the returned log entry. We're soon going to change verifyReferences to not return a log entry but just the reference updates. This makes it easier to compose the log entry from various parts of the transaction. To facilitate this, move default branch verification to happen directly in processTransaction.
2023-05-10 | Drop storage from database keys | Sami Hiltunen
The database keys used by the TransactionManager are currently in the form of <storage_name>:<relative_path>. Gitaly lacks a persistent identifier for the repositories, and they are generally identified by the relative path. For now, it's not really good either given the relative path can be changed through RenameRepository. This will be fixed in the future by removing RenameRepository, after which the relative path is stable. The storage part has additional considerations. First of all, it's not stable given the storage name could technically be changed. It's also unnecessary. The storages are independent of each other, and can be for example moved to a different node. They're also independent failure domains, given the failure of a storage doesn't affect another storage. Given this, each storage will have its own database instance once we integrate the write-ahead logging in Gitaly. Given each TransactionManager works within a single storage, the database is unique to a single storage, and a repository's relative path uniquely identifies a repository within a storage, the storage name is unnecessary in the database keys. This commit thus removes it from the keys. The PartitionManager itself is not yet updated to handle separate databases for the storages. This will be done in a follow up.
2023-05-10 | Pass storage and relative path directly to TransactionManager | Sami Hiltunen
TransactionManager is currently retrieving the repository path through the localrepo.Repo. This won't work in the future anymore once we WAL repository creations and deletions. Repo.Path() checks whether the directory is a Git repository, which won't be the case if the repository was deleted or hasn't been created yet. We need the path when initializing the manager so we can load its state correctly. We're taking the storage path and relative path instead of a GitRepo as it works better in the context of the TransactionManager:
1. The TransactionManager always works within a single storage, given it operates on a single repository. The typical interface of using a Locator to find the path of a repository is thus needlessly complex, and would force error handling to check that a non-existent storage is not accessed. Taking the path directly doesn't require unnecessary error handling here, and forces the caller to handle it.
2. For most cases, just taking the repository's absolute path would be enough. However, we also need an identifier to use in the database keys. Currently the identifier is <storage>:<relative_path>. Each storage will be an independent failure domain and will have its own database instance. Given that, a relative path is enough to uniquely identify a repository within a storage. We also don't have a better identifier available yet so we use a relative path. After RenameRepository is removed, the relative path will be stable and okay to use as an identifier.
The PartitionManager is extended to take in the configured storages so it can pass the required parameters when constructing the TransactionManagers.
2023-05-10 | Release mutex when TransactionManager errors out | Sami Hiltunen
There's an error condition in TransactionManager that is not releasing the mutex like the success path does. This can leave other goroutines stuck waiting on the mutex. This is mostly a theoretical problem as the error condition should never occur. For thoroughness, let's fix it anyway.
2023-05-10 | go: Bump module version from v15 to v16 | Patrick Steinhardt
We're about to release Gitaly v16.0. As we've landed a bunch of previously announced removals it's thus time to bump our Go module version from v15 to v16.
2023-05-09Implement write-ahead logging for objectsSami Hiltunen
All writes must be write-ahead logged prior to being applied to the repository. Objects are currently not being logged, which can be a source of inconsistencies and also performance problems. This commit implements logging support for objects that are needed by a transaction. Objects can be included in a transaction by storing them in the tranasction's quarantine directory. When the transaction is being committed, the TransactionManager computes a pack file from the new reference tips set in the transaction. The pack file includes objects that are unreachable from the current set of references, so it includes both new objects from the quarantine directory and objects that are already present in the repository but are unreachable. This ensures that the pack file contains all objects that are needed to go from the current set of references to the new set of references after the transaction. This is important as the unreachable objects needed could be otherwise pruned, leading to the pack file no longer applying to the repository. As objects always flow through the log, this also means that only commited objects end up in the repository. This is an important property for backups. The repository will get into a consistent state by applying the write-ahead log. If objects could end up in the repository without being logged, some logged reference changes could fail once a repository is being recovered from a snapshot + log as neither the snapshot nor the log would be guaranteed to include the objects being newly referenced in a log entry. For replicated setups later, the fact that only committed objects end up in the repository means that all replicas are guaranteed to have received the same objects at some point. If objects from failed writes could end up in the repository, the leader could have a different set of objects from the replicas due to these objects which are not replicated. 
As the pack files are computed to include also unreachable objects, the pack file is guaranteed to apply on another replica regardless of whether it has garbage collected the objects. The pack files will apply even if the unreachable objects are pruned while they are sitting in the log. However, the current approach is not enough if there are concurrent transactions, as nothing holds on to the old tips of the references the pack file was computed against. This will be fixed in a follow-up by maintaining internal references to the old tips of references until all dependent pack files have been applied. The pack file computation is computationally expensive but should be behaviorally correct. This is an iteration for now that allows us to proceed. We'll later need to update the approach to a less computationally heavy one, for example by just packing the quarantined objects and holding internal references to the objects the pack file depends on in the repository.
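The object selection described in this commit message can be illustrated with a minimal sketch. It assumes an in-memory object graph rather than a real repository, and the names `objectGraph`, `reachable` and `packObjects` are hypothetical and not taken from Gitaly. The pack contents are everything reachable from the new reference tips minus everything reachable from the current references.

```go
package main

import (
	"fmt"
	"sort"
)

// objectGraph maps an object ID to the IDs it references directly.
// It is an illustrative stand-in for a real object database.
type objectGraph map[string][]string

// reachable returns every object reachable from the given tips.
func reachable(g objectGraph, tips []string) map[string]bool {
	seen := map[string]bool{}
	stack := append([]string(nil), tips...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		stack = append(stack, g[id]...)
	}
	return seen
}

// packObjects lists, in sorted order, the objects reachable from newTips
// that are not reachable from currentTips.
func packObjects(g objectGraph, currentTips, newTips []string) []string {
	have := reachable(g, currentTips)
	var pack []string
	for id := range reachable(g, newTips) {
		if !have[id] {
			pack = append(pack, id)
		}
	}
	sort.Strings(pack)
	return pack
}

func main() {
	g := objectGraph{
		"c1": nil,
		"c2": {"c1"},
		// u1 already exists in the repository but no reference reaches it.
		"u1": {"c1"},
		// q1 is a new object from the quarantine directory.
		"q1": {"c2", "u1"},
	}
	// The current reference points at c2, the transaction moves it to q1.
	fmt.Println(packObjects(g, []string{"c2"}, []string{"q1"}))
}
```

In this example the pack would contain `q1` and `u1`: the quarantined object and the previously unreachable object it depends on, but not `c1` or `c2`, which the current references already keep alive.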
2023-04-26Merge branch 'jt-partition-manager' into 'master'Will Chandler
gitaly: Introduce partition managers See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5600 Merged-by: Will Chandler <wchandler@gitlab.com> Approved-by: Will Chandler <wchandler@gitlab.com> Reviewed-by: Sami Hiltunen <shiltunen@gitlab.com> Reviewed-by: Justin Tobler <jtobler@gitlab.com> Co-authored-by: Justin Tobler <jtobler@gitlab.com>
2023-04-26repoutil: Move ExtractHook helperJames Fargher
This helper used to live in the "internal/gitaly/hook" package but this package is primarily concerned with running hooks of all kinds rather than repository management. So we move this helper function to group all custom hook management together.
2023-04-24gitaly: Introduce transaction finalizersJustin Tobler
Currently `TransactionManager` is explicitly started before generating transactions. It also must be explicitly stopped once all transactions are complete. A future commit will introduce `PartitionManager` which will manage the lifecycle for all instances of `TransactionManager`. In order to know that a transaction is finished a signal must be sent back to `PartitionManager`. This change introduces `transactionFinalizer` to `TransactionManager` which executes on transaction commit and rollback. This can be leveraged by the `PartitionManager` to perform `TransactionManager` cleanup.
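As a rough sketch of the finalizer idea, assuming a heavily simplified `transaction` type: the finalizer is a callback that runs on both commit and rollback, so an owner such as a partition manager can track when a transaction is done. The `finish` helper, the `errAlreadyFinished` error and the guard against finishing twice are assumptions made for this illustration, not code from this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyFinished is a hypothetical error returned on repeated Commit or Rollback.
var errAlreadyFinished = errors.New("transaction already finished")

// transaction is a simplified stand-in for a real transaction type.
type transaction struct {
	finished  bool
	finalizer func() // invoked once when the transaction commits or rolls back
}

// finish marks the transaction as done and runs the finalizer exactly once.
func (t *transaction) finish() error {
	if t.finished {
		return errAlreadyFinished
	}
	t.finished = true
	if t.finalizer != nil {
		t.finalizer()
	}
	return nil
}

// Commit and Rollback both end the transaction and trigger the finalizer.
func (t *transaction) Commit() error   { return t.finish() }
func (t *transaction) Rollback() error { return t.finish() }

func main() {
	// open mimics a counter an owning manager might keep for cleanup decisions.
	open := 1
	tx := &transaction{finalizer: func() { open-- }}

	err := tx.Commit()
	fmt.Println(err, open)

	err = tx.Rollback()
	fmt.Println(err, open)
}
```

Once the counter drops to zero, the owner knows no transaction is using the manager anymore and could stop it.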
2023-04-24gitaly: Generalize repository ID generationJustin Tobler
Currently `getRepositoryID()` requires a `repository` as an argument. In a future commit a repository ID will also need to be generated directly from a `repository.GitRepo`. Since `repository` already implements `repository.GitRepo` and `getRepositoryID` does not rely on anything additional, the argument type is changed from `repository` to `repository.GitRepo`.
2023-04-17Return an error from RollbackSami Hiltunen
Rollback will soon have logic to clean up after the Transaction. Change the signature to return an error as well so we can return any clean up errors to the caller.
2023-04-17Extract transaction processing logic into a methodSami Hiltunen
The Run method is growing bigger and handling more and more. This commit extracts the transaction processing logic into a separate method. This keeps the size of Run in check and makes it easier to defer transaction-related clean up logic.
2023-04-17Create TransactionManager's directories in initializeSami Hiltunen
TransactionManager stores state in directories in the repositories. Currently those directories are not guaranteed to exist when the TransactionManager is running. This forces the code to handle the fact. This commit introduces createDirectories which creates the expected directories if they don't exist when initializing. We then simplify locations that had to handle the case when an expected directory did not exist. For now there is just the hooks directory but we'll soon have a directory for pack files as well. The tests are updated to assert the default state automatically so we don't have to repeat it unnecessarily.
2023-04-11Enable snapshot reads of hooksSami Hiltunen
When a transaction begins, its read snapshot is determined. The snapshot is a consistent view of the data at a given point in time. Every operation throughout the transaction should read this snapshot regardless of other writes happening and committing concurrently. To do so without blocking concurrent modifications, databases generally use multiversion concurrency control, a technique wherein multiple versions of a given data point are retained until they are no longer needed by open transactions. Gitaly's TransactionManager is currently storing multiple versions of custom hooks on the disk to enable snapshot reads of them. However, there's currently no way to know which hooks are included in a given transaction's snapshot. This commit addresses that issue by including the right version of the hooks in the transaction's snapshot. When a transaction begins, it records the current hookIndex. The hookIndex tracks the log index of the latest log entry that updated the hooks. As the hooks are stored in the repository at `<repo>/wal/hooks/<log_index>`, this is enough to determine the latest committed hooks in the repository at the time the transaction began. This is then exposed through the transaction which allows for calling code to invoke the correct hooks to do a snapshot read. This will eventually be piped to the HookManager which would use the value to invoke the correct hooks for the transaction, and should also be considered during other reads of the hooks such as backups. The assumption is that hookIndex always points to the latest hooks. This holds during normal operation as the hookIndex is always updated when new hooks are logged. To make this hold even after restarts, logic is added in `initialize` to determine the latest hooks by peeking in the write-ahead log and the repository. The hooks are currently not pruned from the repository after being replaced, so the hooks are guaranteed to stay in place while the Transaction runs.
Now that we can track which hooks each transaction is using, we can later in a follow up implement pruning of old hooks once there are no open transactions using them.
2023-03-30Synchronize transaction beginning with log applicationSami Hiltunen
Begin is called to start a new transaction. A transaction should see all data that was committed prior to the transaction beginning. As it is currently, Begin has no synchronization with the log entry application. It's possible that a transaction begins before all committed data has been applied to the repository and is thus available for the transaction to read. This commit addresses the problem by waiting for committed data to be applied before the transaction can begin. When Begin is called, it determines the transaction's read index. The read index is the log index at which the data is to be read. When a transaction begins, we thus have to ensure all of the data logged prior to the transaction starting has been applied to the repository. This is achieved by the TransactionManager maintaining notification channels through which it broadcasts log entry application to all waiters. These channels are listened on in Begin, and used to block the transaction from beginning prior to the committed data being applied. This guarantees the transaction will read all data committed prior to its start. As Begin now waits for all committed data to be available, we no longer have to wait for a log entry to be applied prior to releasing a Commit call. The writer is guaranteed to see the data it wrote when it begins a new transaction. Tests are updated to reflect this. Tests are also updated to assert the transaction snapshot matches what is expected.
2023-03-24Remove transactionFuture typeSami Hiltunen
transactionFuture type feels quite unnecessary as the result channel could be just stored on the Transaction as a private field like the other fields. Remove the unnecessary type to simplify the type structures a bit.