Age | Commit message | Author |
|
|
|
Amend the docs I added in 86c8480e8 (Feature flag rollout doc: expand
on post-100% steps, 2021-01-12) to mention the example of
0ff3ee285 (Remove on-by-default go_user_delete_{branch,tag} feature
flags, 2021-01-21) instead of fbc9f83ab (Remove Ruby code for 100% on
user_delete_{branch,tag} in Go feature, 2021-01-15).
As noted in 0ff3ee285, what I did in fbc9f83ab was wrong and
dangerous. Let's not recommend it.
|
|
I screwed this up in [1] and [2], and as a result the feature didn't
get enabled at 100% on January 14th like I thought, but on January 27th
when the changes [3][4] to remove the feature flag itself went
live ([4] being the relevant change).
1. https://gitlab.com/gitlab-org/gitaly/-/issues/3412#note_485304117
2. https://gitlab.com/gitlab-org/gitaly/-/issues/3413#note_485303898
3. https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3035
4. https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3033
|
|
Re-flow a paragraph I added in 40f953069 (Feature flag rollout doc:
rewrite & make better use of template, 2020-12-15).
|
|
Makefile: trivial cleanup of git build flags
See merge request gitlab-org/gitaly!3184
|
|
Prevent concurrent reconciliation
See merge request gitlab-org/gitaly!3182
|
|
Fix racy test TestCoordinator_grpcErrorHandling
Closes #3464
See merge request gitlab-org/gitaly!3191
|
|
blob: Port GetAllLFSPointers and GetLFSPointers to Go
See merge request gitlab-org/gitaly!3173
|
|
|
|
This commit ports the Ruby implementation of the GetLFSPointers RPC to
Go. The new implementation is hidden behind a feature flag.
|
|
This commit ports the Ruby implementation of the GetAllLFSPointers RPC
to Go. The new implementation is hidden behind a feature flag.
|
|
All three RPCs to find and return LFS pointers are streaming RPCs. Due
to RPC message size limitations, we need to make sure to only send a
subset of found LFS pointers in each message.
This commit introduces a new function `sliceLFSPointers()` which does
exactly that: given a slice of LFS pointers, it'll execute a callback
function with batches of this slice. The maximum slice size is the exact
same as in the Ruby code, namely 100.
Except for the newly added tests, this function is not yet used.
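A minimal sketch of such a batching helper (the name, the element type, and the callback signature are stand-ins for illustration, not the actual `sliceLFSPointers()` implementation):

```go
package main

import "fmt"

// lfsPointerBatchSize mirrors the maximum batch size mentioned above;
// the Ruby implementation also used 100.
const lfsPointerBatchSize = 100

// sliceChunks invokes fn once per batch of at most batchSize pointers,
// so each gRPC message stays within the size limit.
func sliceChunks(pointers []string, batchSize int, fn func(batch []string) error) error {
	for len(pointers) > 0 {
		n := batchSize
		if len(pointers) < n {
			n = len(pointers)
		}
		if err := fn(pointers[:n]); err != nil {
			return err
		}
		pointers = pointers[n:]
	}
	return nil
}

func main() {
	pointers := make([]string, 250)
	var sizes []int
	sliceChunks(pointers, lfsPointerBatchSize, func(batch []string) error {
		sizes = append(sizes, len(batch))
		return nil
	})
	fmt.Println(sizes) // prints [100 100 50]
}
```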
|
|
We're about to port the implementation of the blob service's LFS-related
RPCs to Go. The most important building block for this is to be able to
identify LFS pointers, either by walking a revision graph or by reading
blob IDs directly.
This commit introduces two new functions which implement this with
`findLFSPointersByRevisions()` and `readLFSPointers()`. While the former
one searches the revision graph for LFS pointers, the latter accepts
a list of potential LFS pointers and returns those which are in fact
valid LFS pointers.
Except for the newly added tests, neither function is used yet.
|
|
Of the three LFS-pointer-related RPCs we have, two call a
function to validate the request while the third one doesn't.
Furthermore, the third one doesn't annotate its error message with its
RPC name like the others do.
While this doesn't hurt, it is at least inconsistent. This commit thus
fixes the inconsistency by moving the validation into a separate
helper function.
|
|
The BlobService server does not currently have access to the Gitaly
config. This commit makes it available such that we can use it in a
follow-up commit.
|
|
The LFS pointers tests have quite a lot of repetition of test data.
Refactor this to instead share a single global map of LFS pointers to
make tests more focussed and easier to extend.
|
|
The LFS pointer RPCs are currently undocumented. This commit adds
documentation for all associated functions and their messages.
|
|
featureflag: Enable first batch of transactional RPCs
See merge request gitlab-org/gitaly!3189
|
|
TestCoordinator_grpcErrorHandling has a race in the
"primary error gets forwarded" case. The mock server acknowledges
a call as done before setting the flag indicating the node was called.
The primary might thus return an error before the
secondary calls have been recorded as having arrived, causing the
test to flake: the primary error is returned and the assertions
run before the secondary calls have been recorded.
This commit fixes the issue by recording the call as having arrived
prior to decrementing the counter of the WaitGroup.
|
|
Add unique index for delete_replica replication events
See merge request gitlab-org/gitaly!3183
|
|
Praefect's reconciler might run concurrently on multiple Praefect
nodes. It's using the default isolation level of read committed. Due
to this, none of the concurrent transactions will see the jobs queued by
other transactions. The queries will see the same state and schedule the
same jobs. To fix this, this commit introduces a transaction scoped
advisory lock that the reconciliation queries attempt to acquire. If they
fail to acquire the lock, they won't schedule any jobs. This prevents them
from scheduling duplicate jobs.
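The acquire-or-skip pattern can be illustrated in-process (in Praefect the lock is a Postgres transaction-scoped advisory lock via `pg_try_advisory_xact_lock()`; this sketch only simulates its semantics, and the lock key and names are made up):

```go
package main

import (
	"fmt"
	"sync"
)

// advisoryLocks simulates Postgres advisory locks: tryLock returns true
// for the first holder of a key and false for everyone else.
type advisoryLocks struct {
	mu   sync.Mutex
	held map[int64]bool
}

func (l *advisoryLocks) tryLock(key int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held[key] {
		return false
	}
	l.held[key] = true
	return true
}

// reconcile schedules jobs only when it wins the lock; a losing
// reconciler skips this run instead of queueing duplicate jobs.
func reconcile(locks *advisoryLocks, key int64) string {
	if !locks.tryLock(key) {
		return "skipped"
	}
	return "scheduled"
}

func main() {
	locks := &advisoryLocks{held: map[int64]bool{}}
	const reconcilerLockKey = 42 // assumed key; the real one is defined in Praefect
	fmt.Println(reconcile(locks, reconcilerLockKey)) // prints scheduled
	fmt.Println(reconcile(locks, reconcilerLockKey)) // prints skipped
}
```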
|
|
coordinator: Log cause of queued replication events
See merge request gitlab-org/gitaly!3190
|
|
It is hard to figure out why a given replication event was scheduled as
we do not provide any log messages about queued events at all. Fix this
by adding a log message which mentions both type and cause of queued
replication events.
|
|
Enable transactional behaviour for the first batch of RPCs. The toggled
RPCs have all been enabled in production either since January 27th or
since February 1st.
|
|
There should always be only one 'delete_replica' event scheduled to
prevent accidentally deleting too many replicas of a repository. If
we allow more than one, we could schedule multiple deletion jobs if
the assignments are shuffled around. This could lead to Praefect
deleting all of the replicas. To prevent this from happening, this
commit adds a unique index that ensures we only have a single
delete_replica job scheduled for any given (virtual_storage, relative_path)
tuple. This index can additionally be used by the reconciler to check
whether a deletion job already exists or not.
|
|
Don't elect replicas pending deletion in per repository elector
See merge request gitlab-org/gitaly!3177
|
|
ruby: Drop ported RPCs
See merge request gitlab-org/gitaly!3167
|
|
Fix handling of libgit2 build updates
See merge request gitlab-org/gitaly!3186
|
|
The Repository#git_env function sets up both the GL_PROTOCOL and
GL_REPOSITORY environment variables. Nowadays, hooks do not use these
values anymore, as they instead resort to the HooksPayload structure.
Furthermore, the two remaining callsites which use this function
shouldn't invoke hooks at all.
Remove the function.
|
|
The repository implementation and its specs have a set of helpers which
are unused. Remove them.
|
|
The feature flag for the Go port of FetchSourceBranch is gone since
v13.9. Remove the Ruby implementation.
|
|
The feature flag for the Go port of UserMergeToRef is gone since v13.7.
Remove the Ruby implementation.
|
|
The feature flag for the Go port of UserMergeBranch is gone since v13.9.
Remove the Ruby implementation.
|
|
If the libgit2 version is updated from an existing directory in
`_build/deps/libgit2`, previously we would fetch the branch but the
current version would remain checked out. We now call `git switch` to
switch to the proper branch.
This came out of
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/3181. To test,
switch between `master` and that branch and run `make test`.
|
|
[ci skip]
|
|
Removal of the config.Config from production code
See merge request gitlab-org/gitaly!3179
|
|
Revert fd4d7395b (makegen: Fix building Git on musl-based platforms,
2020-05-14) and stop passing NO_REGEX=YesPlease.
This means we build with an old version of awk's regex engine, so at
the very least we shouldn't do so outside of musl libc.
But I understand from @pks-t that the musl libc experiment is
something he abandoned, so we can just drop this for now.
As can be seen from the reverted commit, the comment I'm removing only
applied to that line; it just became ambiguous in a refactoring that
happened in eace779a9 (Makefile: Get rid of templating, 2020-06-18).
|
|
This was added in 0c1f53ee0 (Makefile: Always build Git with PCRE2
support, 2020-09-08), but since my upstream
git-vcs/git@e6c531b8081 (Makefile: make USE_LIBPCRE=YesPlease mean v2,
not v1, 2018-03-11) the USE_LIBPCRE=YesPlease form has been
preferred. That change has shipped in all versions of git since 2.18.0,
well before our current 2.29.0 requirement.
USE_LIBPCRE2=YesPlease is currently just an alias for
USE_LIBPCRE=YesPlease, so this changes nothing. It's just
future-proofing and using the switch that's advertised in the
documentation.
|
|
Removal of config.Config from cache package
See merge request gitlab-org/gitaly!3172
|
|
doc: Add documentation on object pools
See merge request gitlab-org/gitaly!3060
|
|
Infinite attempts for delete_replica jobs
See merge request gitlab-org/gitaly!3174
|
|
delete_replica type jobs are emitted by the reconciler for deleting
repository replicas from unassigned storages. When processing the jobs,
Praefect first deletes the repository from the disk and then updates the
database state to match. If deleting the repository from the disk
succeeds but updating the database state fails, Praefect would reattempt
the job. If all of the attempts fail and Praefect drops the job, the
replica would have been deleted from the disk but the database state
still implies that the storage should contain a replica of the repository.
We can't allow that to happen, as the reconciler could schedule additional
delete_replica jobs that delete the last copies of the repository while it
still sees the invalid database state indicating that a repository exists on
the storage it was deleted from. To avoid this scenario, we give
delete_replica type jobs infinite attempts to ensure we never delete a
replica from the disk without a record indicating the deletion might have
happened or is about to happen. The reconciler allows only one delete_replica
job to be scheduled for a given repository, which avoids the scenario where
we delete all replicas based on inconsistent database state.
|
|
Reconciler tests were using PostgresReplicationQueue to manipulate
replication job states. This does not work properly when one wants
to test historic states like cancelled, dead and completed as the
queue deletes them instead of storing them in the database. We may
have some historic state lying around, though. To test against all
possible values, let's simply set the database state we want in the
test with a custom query.
This also changes the tests to query the replication jobs directly from
the queue instead of using PostgresReplicationQueue to dequeue the jobs.
PostgresReplicationQueue hides some states as it performs job deduplication
with update jobs.
|
|
featureflag: Remove UserFFBranch feature gate
See merge request gitlab-org/gitaly!3180
|
|
Through b3a047a56 (Merge branch 'go_user_ff_branch_by_default' into 'master', 2021-01-27)
the UserFFBranch flag defaults to true, which means the feature
gate is up for removal if the changes are stable, which they are.
This change thus removes the feature gate and makes sure the tests
succeed. In the next release the Ruby implementation can be removed.
|
|
After a couple of rounds of refactoring we can completely
remove the config.Config global variable from the production
code. The configuration should now be passed as a parameter or
injected as a dependency wherever it is required.
We also mark the variable as deprecated.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2699
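The injection pattern can be sketched as follows (the type and field names are stand-ins for Gitaly's real `config.Cfg`, purely for illustration):

```go
package main

import "fmt"

// Cfg is a stand-in for Gitaly's configuration struct.
type Cfg struct {
	Storages []string
}

// Server receives its configuration via constructor injection rather
// than reading a package-level config.Config global.
type Server struct {
	cfg Cfg
}

func NewServer(cfg Cfg) *Server { return &Server{cfg: cfg} }

func (s *Server) StorageCount() int { return len(s.cfg.Storages) }

func main() {
	// The config is constructed once at startup and handed down explicitly,
	// which keeps the server testable with arbitrary configurations.
	srv := NewServer(Cfg{Storages: []string{"default"}})
	fmt.Println(srv.StorageCount()) // prints 1
}
```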
|
|
coordinator: Fix replication and early failures when proxying transactional RPCs
See merge request gitlab-org/gitaly!3158
|
|
blob: Fix filtering of new LFS pointers
See merge request gitlab-org/gitaly!3175
|
|
|
|
When proxying streams in a transaction, we do frame-wise proxying to
both primary and all secondaries at once. This is done by kicking off a
bunch of Goroutines to forward from client to primary/secondaries, and
from primary to client. This brings up some interesting questions about
error handling, though. It's clear that we should abort the proxying
operation if the client errors out, as the channel is then essentially
broken, and it's equally clear that we need to cancel the stream if
proxying to the primary fails. But right now, we also cancel streams if
proxying to the secondaries fails.
This last failure mode is quite debatable, and in fact it can cause
weird bugs. While we wait for all secondaries to terminate even if any
of them errors out, the same isn't true for the primary: we'll happily
cancel the stream once proxying to the secondaries has concluded with
at least one error, even if proxying to the primary is still ongoing.
As a mitigation, this commit thus starts to ignore any errors caused by
proxying to secondaries -- we already store their errors in a local map
anyway and consider them when creating replication jobs. It's not an
ideal fix though: when enough secondaries fail, then we know that
transactions cannot reach quorum anymore and should thus be able to just
abort the transaction. If necessary, we can do this in a follow-up fix.
|