author | Justin Tobler <jtobler@gitlab.com> | 2022-10-12 23:20:20 +0300 |
---|---|---|
committer | Justin Tobler <jtobler@gitlab.com> | 2022-10-13 02:01:34 +0300 |
commit | 714e7c99b0cb930ba08ecd0660834144cebeeabe (patch) | |
tree | 52f261e5ada750e236ecb765f2f26caf3ceac772 | |
parent | c4efe30d38cc916682aa405228a06c3d7c91caff (diff) |
Praefect: Update voter state on failed node RPC
jt-praefect-transaction-error-handler
Currently, when a secondary node RPC fails, the error is ignored and the
transaction continues waiting for additional votes even if there are not
enough outstanding votes to reach quorum. If the failed node makes quorum
impossible, the transaction hangs until the context is canceled. This is
undesirable: once it is established that there are not enough outstanding
votes to reach the threshold required by the transaction, the transaction
should be canceled.

This change adapts the `ErrHandler` function of the secondary nodes to
cancel the voter in the transaction associated with the failed RPC. In
this process, the voter's result state is updated to `VoteCanceled` and
the subtransaction is checked to see whether quorum can still be
achieved. If quorum is impossible, the vote fails and the voters are
unblocked. If quorum is still possible, the voters remain blocked,
waiting for further votes to decide the outcome.
-rw-r--r-- | internal/praefect/coordinator.go | 15 |
1 files changed, 11 insertions, 4 deletions
diff --git a/internal/praefect/coordinator.go b/internal/praefect/coordinator.go
index e000c24cf..680ba7eb5 100644
--- a/internal/praefect/coordinator.go
+++ b/internal/praefect/coordinator.go
@@ -467,10 +467,17 @@ func (c *Coordinator) mutatorStreamParameters(ctx context.Context, call grpcCall
 				ctxlogrus.Extract(ctx).WithError(err).
 					Error("proxying to secondary failed")
 
-				// For now, any errors returned by secondaries are ignored.
-				// This is mostly so that we do not abort transactions which
-				// are ongoing and may succeed even with a subset of
-				// secondaries bailing out.
+				// Cancels failed node's voter in its current subtransaction.
+				// Also updates internal state of subtransaction to fail and
+				// release blocked voters if quorum becomes impossible.
+				if err := c.txMgr.CancelTransactionNodeVoter(transaction.ID(), secondary.Storage); err != nil {
+					ctxlogrus.Extract(ctx).WithError(err).
+						Error("canceling secondary voter failed")
+				}
+
+				// The error is ignored, so we do not abort transactions
+				// which are ongoing and may succeed even with a subset
+				// of secondaries bailing out.
 				return nil
 			},
 		})