Age | Commit message (Collapse) | Author |
|
The `forwardClientToServers()` function forwards frames received from
the client to all servers. This forwarding happens async, where any
errors are reported back via an error channel.
This channel has a buffer of 1, and the single consumer of this buffer
will only ever consume a single error from that channel. It follows that
we mustn't ever try to send multiple errors via the same channel, or
otherwise the Goroutine would deadlock trying to send if there's no
consumer.
But we do try to send multiple errors in the case where we fail to
properly receive a message from the client: we try to send both this
error and the error of the errgroup through the channel. This has caused
multiple incidents already where so many Goroutines were deadlocked that
we accumulated multiple gigabytes of memory and eventually ran out of
memory.
Fix the leak by only sending a single error: if receiving the message
failed, we'll return this error. Otherwise, we'll return the error of
the errgroup. With the goleak detector, we can furthermore demonstrate
that there are no more Goroutine leaks after this patch.
|
|
We have been bitten by Goroutine leaks in the GRPC proxying code in some
cases, but until now we didn't have any meaningful way to detect this
kind of leak.
Fix the test gap by using go.uber.org/goleak to detect any leaked
Goroutines after the test. This nicely demonstrates that we currently do
have a Goroutine leak in the proxying code in `forwardClienToServers()`.
|
|
Some of the proxy handler tests do not properly shut down client
connections, which in turn leads to leaks of the GRPC resolver
Goroutines.
Plug the leak by closing all client connections.
|
|
While the error propagation tests already set up table-driven tests with
a nice description for each of the table entries, it's not using
`t.Run()` to execute them.
Refactor the testcase to use `t.Run()`.
|
|
When proxying streams to both primaries and secondaries, then we cancel
all streams as soon as either the client or the primary stream has
caused an error. Because streams are handled in Goroutines, this
cancellation happens asynchronously. We ignore this fact in the proxying
handler and simply return, which in effect means that streams may still
get proxied after the handler returns.
Naturally, this can cause races. Most importantly, it can happen that
the context is still getting accessed even though it's already about to
be wrapped up. One common race observed was that the primary stream was
calling `sentryhandler.MarkToSkip()` while secondaries were still being
proxied and to log a message via `ctxlogrus`. Given that `MarkToSkip()`
modifies the context while `ctxlogrus` wants to access its values, this
is a data race and thus causes a runtime panic.
Fix this race by synchronizing stream cancellation: we do not return
early and also do not modify the context before all streams have been
cancelled.
|
|
Originally, the GRPC proxy code only handled a single destination. With
the extension to also handle proxying to secondary nodes, its error
handling started to become kind of weird: we only get a subset of errors
and have no control over how these errors are handled. This brings along
multiple issues: first, we cannot know downstream in the coordinator
whether a node has failed or not and thus cannot know when to schedule
replication jobs. And second, when proxying to any secondary fails while
proxying to the primary succeeds, the proxy code will regard it as an
overall failure.
It's unfortunately hard to pin down exact semantics for error handling
here. One may wish to ignore errors from secondaries, but potentially
not in all cases. E.g. for transactional RPCs, we may only want to
ignore errors from secondaries until we know that its impossible to
reach quorum anymore because too many nodes have failed already. Coding
this behaviour into the proxy handler doesn't really make a lot of
sense.
This commit instead implements a new parameter for proxy destinations,
which is an optional error handler. If set, this function will be called
for all non-EOF errors. This allows both for custom handling of errors
in the coordinator as well as for filtering of errors.
The error handler is not yet used by any callsite.
|
|
The proxying code where the "real magic" happens currently only gets the
destination's peeked header message as well as the stream it's supposed
to proxy on. We'll need more information than this though in a later
commit, so let's refactor the code to instead pass down the complete
Destination structure.
No change in behaviour is expected from this commit.
|
|
The grpc-proxy proxy tests explicitly print logs both when we're
starting GRPC server and proxy as well as when handling requests. This
doesn't help in normal test runs, and when an error is hit in those
tests then the log messages probably wouldn't help either.
Let's drop those messages altogether to reduce the amount of distracting
messages.
|
|
The grpc-proxy proxy handler tests set up a separate logger for GRPC log
messages, but we're already setting one up as part of the testhelper
configuration. This leads to a spew of log messages when those tests get
executed.
Fix this by just dropping the separate logger in favor of the testhelper
one. While that one is configured to only print messages with the
"panic" log level by default, this is rather considered a feature and
not a bug.
|
|
GrpcErrorHasMessage returns a boolean flag and not always
used in a right way. We change it's signature to exclude
its misuse. Refactored implementation showed a bug in one
of the tests that was fixed: without sending response back
to the stream the client returns io.EOF error for each request.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2699
|
|
Makefile: Fix conflicting module graphs for external build tools
See merge request gitlab-org/gitaly!3040
|
|
This commit adds an additional test that asserts the error propagation
behavior of the grpc-proxy.
|
|
The gocover-cobertura tools doesn't currently depend on go.mod, so if
one invokes it in a clean working tree we'd instead use the go.mod of
the parent project, which is Gitaly itself. Let's add the missing
dependency to fix this. This also changes how the test.proto for
Praefect is generated to be more consistent with our other protobuf
files.
|
|
The `testhelper.GetTemporaryGitalySocketFileName()` function currently
returns an error. Given that all callers use `require.NoError()` on that
error, let's convert it to instead receive a `testing.TB` and not return
an error.
|
|
The golangci-lint project has a set of default excludes for linters
which are applied by default. This means that we cannot choose ourselves
which linting messages we want to include, as some of them cannot be
controlled by us.
One noteworthy linting check which is excluded right now checks whether
comments for types and functions are prefixed with their respective
name. This is something which we try to enforce in Gitaly, too, but
naturally there's cases where it slips through review.
So let's disable the default excludes such that we can enable this
linting check. As it turns out, there's quite a lot of cases where we
failed to adhere to this style, which this commit also fixes. One thing
which we don't do though is to have per-package comments, so let's
actively ignore it for now.
|
|
|
|
The function `testhelper.GetTemporaryGitalySocketFileName()` generates a
new filename which can then be used for the Gitaly socket. Currently,
it's always using `/tmp` as the directory where those sockets are
generated. As they typically don't get cleaned up after tests, it thus
clutters that directory quite quickly. This commit instead moves them
into the global temporary test directory.
|
|
It is decided to go with all features enabled by default behaviour
for tests. This should help us identify potential problems with
feature flag combinations enabled in production. To verify implementation
without feature flag enabled it should be disabled explicitly in the
test.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/2702
|
|
|
|
|
|
This reverts commit efef3a43c95cfd0762a64fc477cef195407b41b9, reversing
changes made to 76bde563c7d6fb5964b5953e97b0f05750e6ac6c.
|
|
|
|
lint must use Go1.13 as it is highest supported version.
Fix deprecated use of `grpc_logrus.Extract`.
New linters introduced and replacement of old ones in favor
of new ones with same purpose.
Fix of new linter errors.
|
|
Test setup for checking of replication events processing
was incorrect and it lead to failing tests from time to
time (especially with a race detector).
Use of properly initialized global gitay-ruby instance.
Right order of binaries preparation and config setup.
It is imbossible to use multiple Gitaly instances in one
or multiple tests because of global Config used throw the
project.
gRPC services must be registered before Serve call for it.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/2592
|
|
govet tool helps to identify a lot of common mistakes such as
unused\dead\unreachable code, wrong context usage\copy by value
of the locking primitives, etc.
It is enabled with disabled check for `uses unkeyed fields`.
|
|
The function `grpclog.SetLogger()` has been deprecated upstream in favor
of a new and more flexible `grpclog.SetLoggerV2()` that can
differentiate the log level. As the standard `log.Logger` interface does
not have the `Error` family of functions implemented, we cannot use it
out of the box as a logger for grpclog. We could implement these
functions ourselves on top of `log.Logger`, but that'd mean we have to
implement three functions. Instead, let's make use of the helper
function `grpclog.NewLoggerV2()`, that simply accepts an `io.Writer`.
|
|
The `grpc.WithTimeout()` function has been deprecated upstream in favor
of using the `context` package. This commit converts all uses of the
derpecated function with using a context created via
`context.WithTimeout()`. As we have abstracted away dialing Gitaly via
`client.Dial()`, this commit also introduces a new function
`client.DialContext()` and makes `client.Dial()` a thin wrapper around
the latter.
|
|
the old `grpc.Errorf` and `grpc.ErrorDesc` functions have been
deprecated in favor of the `grpc.status` package. This commit converts
the former ones with calls to `status.Errorf` and
`status.Convert(err).Message()` instead.
|
|
As the `grpc.Codec` interface has been deprecated upstream, this commit
converts to use the new `encoding.Codec` interface instead.
|
|
The `grpc.Codec` interface has been deprecated upstream with commit
5ba054bf (encoding: Introduce new method for registering and choosing
codecs (#1813), 2018-01-23) in favor of the new `encoding.Codec`
interface. We're still using the old interface throughout Gitaly, and in
fact we cannot currently fully migrate to using the old interface as
there is no way to create `grpc.ServerOption`s for the new interface.
In order to allow the conversion of most of our code to the new
interface, let our own codec type satisfy both the old `grpc.Codec` and
new `encoding.Codec` interface. The only thing that has changed between
both is the `String()` function, which has been renamed to `Name()`. It
is thus trivial to support both at the same time.
As the new `Codec` type collides with the `Codec()` function, rename the
latter to `NewCodec()`.
|
|
|
|
|
|
As golang.org/x/net is outdated and we don't need to use
a context package from it anymore it is replaced with
context from standard library.
Makefile changed to use proper protoc compiler.
|
|
Stream interceptor made unexported to omit potential use
in production code.
Closes: https://gitlab.com/gitlab-org/gitaly/issues/2434
|
|
New job 'lint' added to 'test' stage to perform static code analysis
using a common approach with golangci-lint tool described at
https://docs.gitlab.com/ee/development/go_guide/#automatic-linting
Closes: https://gitlab.com/gitlab-org/gitaly/issues/2253
|
|
|
|
The stream director has too many return values that makes it hard to
read. We want to add more values the stream director can return, so it's
cleaner to have it return something that encapsulates those other return
values.
|
|
|
|
|
|
|
|
|
|
|
|
This change absorbs the previously vendored library grpc-proxy into
the Gitaly project for breaking changes needed for Praefect
requirements. There were slight modifications needed to make the
project work within Gitaly:
- Go import paths were rewritten to refer to the new Gitaly location
- Test service code was moved to testdata folder to prevent CI linter warnings
- CI-lint overrides were added to exclude inherited technical debt (see follow up issue #1663)
- Removal of unused/ineffective code detected by linter.
|