Age | Commit message (Collapse) | Author |
|
Each transaction has a read snapshot that maintains its isolation
from concurrent writes. The snapshot includes also the version of
custom hooks. When the hooks are executed or read, the operation
should be targeted at the correct version of hooks. The transaction's
snapshot includes the absolute path to the custom hooks on the disk.
This commit teaches HookManager to execute the snapshotted hooks
from the directory included in the transaction.
While the HookManager knows how to execute the correct hooks now,
not all of the call sites pass a transaction yet.
|
|
TransactionManager manages hooks via MVCC. Hooks are always written
into a new directory to isolate transactions from concurrent writes.
When executing or reading the hooks, the transactions should use
the hooks included in the snapshot. While the log index of the hooks
is already included in the snapshot, it's not useful for integration
as it doesn't yet say where exactly the hooks should be executed from.
Solve this problem by including an absolute path to the hooks on the
disk. This will later be used by HookManager to execute the correct
hooks for a transaction.
Backwards compatibility logic is included to execute hooks from the
repository if none have yet been written via WAL.
|
|
This commit extracts the hook path generation into a function.
Extracting a function will later make it easier in tests to assert
we have the correct path when we add the hook path to the transaction's
snapshot.
While at it, update the parent directory to be synced with the
recently introduced helper designed for it.
|
|
Transaction has a read snapshot that includes a specific version of
custom hooks. TransactionManager is managing hooks via multiversion
concurrency control, that is, hooks are always written as an entirely
new version and never update in place. The HookManager needs to take
in the Transaction so it can determine which version of the custom
hooks to execute. This commit wires the Transaction to the HookManager
but doesn't do any logic changes yet.
|
|
This commit adds the write-ahead log test targets to our CI jobs so
they get tested as well.
|
|
This commit adds 'test-wal' and 'test-with-praefect-wal' targets to
the makefile for running tests with the WAL enabled.
|
|
This commit changes UserCommitFiles to use the WAL for comitting its
writes if enabled. When enabled, a transaction is begun through
PartitionManager for the repository, the objects are written into the
transaction's quarantine directory, and ultimately both reference
changes and objects are committed through TransactionManager.
WAL is enabled if PartitionManager is set on the server instance. For
now it is only set when the `GITALY_TEST_WAL` environment variable is
set when running the test. A Makefile target will later be added for
testing the WAL and a CI job.
There are slight behavior changes with the hooks which are left be for
now:
- 'reference-transaction prepared' is currently not acquiring locks nor
verifying the reference changes. We'll likely have to extend the
TransactionManager to handle this before we can roll WAL out given
this affects Praefect's behavior.
- 'update' hook is invoked with the objects still in the quarantine as
opposed to the non-WAL variant. This will likely have to remain so.
The objects are only written into the repository when the transaction
is committed. For local execution, this shouldn't matter much since
the quarantine directory is configured in the invocation. For loopback
calls like what Rails does with pre-receive hook, this would be a
breaking change.
|
|
PartitionManager is the entry point for the WAL logic. It handles
routing transaction to the correct partitions and the lifecycle of
TransactionManagers responsible for those partitions. As a first step
into integrating it into our tests, wire the PartitionManager into
OperationService tests. As we eventually want run the tests with both
WAL enabled and disabled, the PartitionManager is configured only if
'GITALY_TEST_WAL' environment variable is set. All of the users of the
WAL will have to check whether or not it is set. This remains the same
when we eventually make it possible to enable the WAL logic through the
configuration. If set, we'll configure and dependency inject. If not set,
we'll leave it nil.
testPackObjectsConcurrency was changed to use a unique config for each
test case so each of the tests have a unique storages for the PartitionManager
to initialize state in.
|
|
We're about to wire PartitionManager into Gitaly's test server. On
initialization, it creates expected directories and opens a database
handle for each storage's database. This does not work in some tests
as they configure invalid storages which don't exist. Let's change the
tests to break the storages after initialization. Gitaly should anyway
be ensure that the storages exist on boot.
|
|
TestUserCommitFiles doesn't assert that the RPC didn't error if it
isn't expected. This leads to a panic in the later assertions when
accessing fields of the response when the response is nil. Add the
missing error assertion.
|
|
Transaction's staging directory could currently be initialized multiple
times if it is called more than once. Right now the only way would be
to call Transaction.QuarantineDirectory() multiple times. This can be
convenient in some contexts though, so let's support it.
|
|
TransactionManager is currently ackonwledging transaction commits
as soon as the transaction has been logged. This is the ultimate
behavior we want to end up with. However, when we rollout the WAL,
we want the configuration toggle to be safe to toggle back and forth.
The current behavior means that if the WAL is toggled off before the
log has been fully applied, we'd effectively lose writes. We can avoid
this by only acknowledging the writes once they've been successfully
applied to the repository. This way the write will be present even if
the WAL is toggled off, or it won't have been acknowledged as committed.
This also makes it easier to integrate the WAL in our tests. Many of the
testst inspect disk state directly and do not interrogate the test state
through RPCs. This means that the transactions may not be applied to the
repository before the tests run their assertiosn on the state. Acknowledging
writes only after they've been applied also synchronizes the tests as a
side effect.
|
|
We're about to integrate Gitaly's WAL logic into the first tests.
This is about to create a cyclic import between the ìnternal/gitaly
and internal/git/updateref packages. TransactionManager uses the
updateref package to perform the refrence updates. UpdaterWithHooks
is used from the operations service package to commit reference
changes and execute hooks. In order to handle Transaction's from the
TransactionManager, the updateref package would have to import them
creating a cyclic dependency.
UpdaterWithHooks is only used from Gitaly's services, and is not really
a utility for working with Git itself. Move it's implementation to
internal/gitaly/hook/updateref to break the cycle. It's places as a
subpackage in the hook package as it uses it to run the hooks.
|
|
PartitionManager is currently not logging the partition's ID when
logging error messages related to it. This makes it difficult to
identify which partition faced errors. Include the ID in the logs.
|
|
This commit extends PartitionManager's test to cover:
1. Multiple storages receiving transactions
2. Transaction for a non-existent storage.
3. Relative path cleaning and validation
|
|
Since all storages now have independent state for partitions, it is
no longer necessary to prefix the partition keys with the storage
name. Remove the storage name component from the partition keys.
|
|
Gitaly's PartitionManager is responsible for managing partitions. This
entails spawning TransactionManagers as needed to process transactions.
Currently all of the partitions share the same database and staging
directory. This is not correct as the storages generally reside on
different disks.
Doing copy free moves of files and hard links across disks is not possible.
TransactionManager relies on this functionality to log pack files without
copying them.
Each storage should have their own embedded database. The storages are meant
to be fully independent of each other and should keep functioning even if one
of them fails or is removed, so they need to contain all of the data they need
to function. If they share the database, all storages will lose their state if
the disk hosting the storage holding the shared database fails. There's also a
performance benefit for having a database separately for each storage, as the
database performance on a single storage doesn't affect other storages if all
of them have their own.
It's also not necessary to synchronize reads or writes between storages. If the
transactions target different storages, they are clearly working on separate
pieces of data.
This commit addresses the above points and makes the storages fully independent
of each other by:
- Giving each storage their own staging directory.
- Giving each storage their own database.
- Using separate locks for accessing data on different storages.
Together, these reduce unnecessary contention transactions to differnet storages.
As each storage is self contains all state, they become independent failure
domains. They can also be moved between nodes freely as they don't depend on data
in other storages.
|
|
praefect: migrate verify sub-command
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5767
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: Toon Claes <toon@gitlab.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Co-authored-by: Pavlo Strokov <pstrokov@gitlab.com>
|
|
gitaly-hooks: Quarantine flaky test
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5793
Merged-by: Toon Claes <toon@gitlab.com>
Approved-by: Toon Claes <toon@gitlab.com>
Co-authored-by: Justin Tobler <jtobler@gitlab.com>
|
|
Migration of the 'verify' sub-command from the old
implementation to the new approach with usage of the
third party library. The invocation of the command remains
the same, but it has more descriptive representation when
help is invoked. Also, it is possible now to invoke help
only for the sub-command to see more details about it and
its usage.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/5001
|
|
praefect: Migrate track-repository sub-command
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5778
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: Sami Hiltunen <shiltunen@gitlab.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Co-authored-by: Pavlo Strokov <pstrokov@gitlab.com>
|
|
hook: Stop overmatching hooks errors
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5781
Merged-by: Will Chandler <wchandler@gitlab.com>
Approved-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: Pavlo Strokov <pstrokov@gitlab.com>
Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Reviewed-by: Pavlo Strokov <pstrokov@gitlab.com>
|
|
Currently the `hook.CustomHookError` type is an alias for the `error`
interface. As a result, when we run `errors.As(err, &customHookErr)`,
this will match against all errors. This leads us to treat all hooks
failures, such as internal errors returned from the `internal/allowed`
API endpoint as custom hook errors.
Convert `hook.CustomHookError` to a struct so that the `errors.As`
matching is constrained to just this type.
Changelog: fixed
|
|
The `resource_exhausted_without_LimitError_detail` subtest in
`TestGitalyServerReturnsError_packObjects` is flaky. This change updates
the existing quarantine to exclude this additional subtest.
|
|
docs: Add protobuf documentation gitlab pages link
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5780
Merged-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: Evan Read <eread@gitlab.com>
Reviewed-by: Will Chandler <wchandler@gitlab.com>
Co-authored-by: Evan Read <eread@gitlab.com>
Co-authored-by: John Cai <jcai@gitlab.com>
|
|
|
|
go: Update module github.com/beevik/ntp to v0.3.2
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5744
Merged-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: James Fargher <proglottis@gmail.com>
Co-authored-by: GitLab Renovate Bot <gitlab-bot@gitlab.com>
|
|
Internal BackupRepos RPC
Closes #4940
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5721
Merged-by: John Cai <jcai@gitlab.com>
Approved-by: Christian Couder <chriscool@tuxfamily.org>
Approved-by: John Cai <jcai@gitlab.com>
Co-authored-by: James Fargher <jfargher@gitlab.com>
|
|
go: Update module github.com/miekg/dns to v1.1.54
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5733
Merged-by: Justin Tobler <jtobler@gitlab.com>
Approved-by: Sami Hiltunen <shiltunen@gitlab.com>
Approved-by: Justin Tobler <jtobler@gitlab.com>
Co-authored-by: GitLab Renovate Bot <gitlab-bot@gitlab.com>
|
|
Add a link to our gitlab pages site that has generated HTML docs for
each of our RPCs.
|
|
'qmnguyen0711/return-resource-exhausted-instead-of-internal-when-spawn-token' into 'master'
Return ResourceExhausted instead of Internal for Spawn token timeout
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5746
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
Reviewed-by: James Fargher <proglottis@gmail.com>
Reviewed-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Reviewed-by: karthik nayak <knayak@gitlab.com>
|
|
Migration of the 'track-repository' sub-command from the old
implementation to the new approach with usage of the
third party library. The invocation of the command remains
the same, but it has more descriptive representation when
help is invoked. Also, it is possible now to invoke help
only for the sub-command to see more details about it and
its usage.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/5001
|
|
The new type represents an error and exists to mimic the same error
message as one generated by the github.com/urfave/cli/v2 package.
We can't use type from the package because it's unexportable.
|
|
'master'
gitpipe: Close channel only after queue cleanup
Closes #5102
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5743
Merged-by: karthik nayak <knayak@gitlab.com>
Approved-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: John Cai <jcai@gitlab.com>
Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Reviewed-by: karthik nayak <knayak@gitlab.com>
|
|
'master'
objectpool: Enable rev-list based connectivity checks
Closes #4489
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5764
Merged-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: Will Chandler <wchandler@gitlab.com>
Approved-by: Pavlo Strokov <pstrokov@gitlab.com>
|
|
into 'master'
featureflag: Fix missing issue for `FixRoutingWithAdditionalRepository`
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5774
Merged-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
|
|
git/stats: Expose last modification time of the gitalternates file
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5775
Merged-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Approved-by: James Fargher <proglottis@gmail.com>
Approved-by: John Cai <jcai@gitlab.com>
|
|
Backup fixes
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5751
Merged-by: Christian Couder <chriscool@tuxfamily.org>
Approved-by: Toon Claes <toon@gitlab.com>
Approved-by: Christian Couder <chriscool@tuxfamily.org>
Co-authored-by: James Fargher <jfargher@gitlab.com>
|
|
When disconnecting a repository from its object pool we first copy
over all objects and then remove the gitalternates file. In order to
verify that there wasn't any race during the update we will perform a
connectivity check after the gitalternates file was removed, and if it
fails, we move the gitalternates file back into place.
This connectivity check was implemented via git-fsck(1). But after some
upstream discussion in the Git project [1] it was noted that doing
connectivity checks is both faster and less memory intensive when using
git-rev-list(1) instead.
Quite a while ago, in v15.5.0, we have thus introduced the new rev-list
based connectivity check, but didn't roll the flag out due to whatever
reason. We have now done so almost a week ago and didn't observe any
issues caused by the new connectivity check. So let's remove the feature
flag and unconditionally enable the new, more performant connectivity
check.
[1]: https://lore.kernel.org/git/YyoljwDIn7PxRlC9@coredump.intra.peff.net/T/#t
Changelog: performance
|
|
We're about to add new heuristics to our repository houskeeping logic
that will repack all objects when alternate object directories have been
changed since the last full repack. Start exposing this information in
the git/stats package.
|
|
Our parsing of the `objects/info/alternates` file isn't quite robust and
has multiple bugs:
- It returns the path of the repository's object directory when the
alternates file is empty or when it contains empty lines.
- It does not know to treat lines starting with "#" as comments.
Refactor the code to use a scanner and handle both empty and commented
lines explicitly to fix these bugs.
Changelog: fixed
|
|
The `RepositoryInfo` structure has a field that exposes alternate object
directories to the caller. In the context of repository housekeeping
we'll also need the information when the alternates file was last
modified.
Let's prepare for this change by converting the simple field into a
structure `AlternatesInfo` that is self-contained. Add tests that
exercise parsing of alternates more thoroughly, where some of the tests
highlight buggy behaviour.
|
|
We're about to add more testcases that care about file mtimes. Let's
thus wrap this functionality into a simple test helper.
|
|
The `FixRoutingWithAdditionalRepository` feature flag is missing its
issue link. Add it.
|
|
into 'master'
praefect/router: Fix creation of repos with additional referenced repo
Closes #5094
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5741
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: Justin Tobler <jtobler@gitlab.com>
Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Reviewed-by: Justin Tobler <jtobler@gitlab.com>
Co-authored-by: Patrick Steinhardt <psteinhardt@gitlab.com>
|
|
Add pages site with Gitaly protobuf docs
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5770
Merged-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Approved-by: James Fargher <proglottis@gmail.com>
Approved-by: Quang-Minh Nguyen <qmnguyen@gitlab.com>
Co-authored-by: Will Chandler <wchandler@gitlab.com>
|
|
Various small changes to TransactionManager in preparation for bigger changes
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5692
Merged-by: James Fargher <proglottis@gmail.com>
Approved-by: James Fargher <proglottis@gmail.com>
Approved-by: Toon Claes <toon@gitlab.com>
Co-authored-by: Sami Hiltunen <shiltunen@gitlab.com>
|
|
Fix flaky TransactionManager test
Closes #5131
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5765
Merged-by: James Fargher <proglottis@gmail.com>
Approved-by: James Fargher <proglottis@gmail.com>
Co-authored-by: Sami Hiltunen <shiltunen@gitlab.com>
|
|
The official gRPC documentation
(https://grpc.github.io/grpc/core/md_doc_statuscodes.html) clearly
distinguishes between different status codes, in particular:
- RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a
per-user quota, or perhaps the entire file system is out of space.
- INTERNAL: Internal errors. This means that some invariants expected by
the underlying system have been broken. This error code is reserved
for serious errors.
The spawn token system, developed by our team, serves as a means of
managing underlying fork/exec operations. When a process is unable to
create due to spawn token shortage, it can be viewed as a resource
issue, aligning with the definition of the ResourceExhausted error code.
This type of error is common in comparable scenarios, making it a
predictable outcome. It would be appropriate to reserve the Internal
error code for unexpected occurrences instead.
One more thing. In https://gitlab.com/groups/gitlab-org/-/epics/7891,
I'm currently implementing a pushback feature for clients who encounter
specific error codes. By converting these errors to ResourceExhausted,
clients will be forced to perform transparent retries in an exponential
and automatic manner when we add the support in client-side. At present,
we temporarily disable this ability by setting the retry-after time to
0. This will ultimately have a positive impact on the system.
|
|
Spawn token occasionally dumps some logs in two cases:
- The queuing time exceeds a certain threshold
- Fail to acquire the spawn token
Those logs are also dedicated logs outside of major gRPC logs. This
makes debugging and investigating a little bit hard. This commit moves
this kind of logs to gRPC logs. This change consolidates the logs in the
same place. It also removes redundant "spawn token acquired" logs.
|