Age | Commit message (Collapse) | Author |
|
limithandler: Refactor package to not use global state
See merge request gitlab-org/gitaly!4197
|
|
The functions `testdb.NewDB()` and `testdb.GetDBConfig()` both stutter.
Rename them to `testdb.New()` and `testdb.GetConfig()`, respectively.
|
|
The glsql package hosts both the implementation used to connect to a
"normal" production database as well as the logic to connect to a test
database. Mixing production and test code like this is bad practice
though.
Move test-related functions into the "testdb" package to fix this. This
change requires the glsql tests to use a separate package to break a
cyclic dependency between the "glsql" and "testdb" package. And
furthermore, the linter kicks in after this move and (legitly) complains
about missing error checks and closes for rows.
|
|
The limithandler package uses global variables to both track Prometheus
metrics and to handle its configuration. Especially the latter is really
fragile, where we need to set up state of the limithandler in various
places by calling global functions.
Fix this by making the `LimiterMiddleware` self-contained, where it
hosts all configuration as well as the Prometheus metrics.
|
|
The limit handler is currently created ad-hoc by Gitaly's server code.
We're about to change the way it hosts Prometheus metrics though such
that we can get rid of a set of global variables, and this will require
us to make sure that there's only a single instance of the limiter.
Inject the limit handler as a dependency when creating the server to
prepare for this.
|
|
praefect: add missing primaries check
See merge request gitlab-org/gitaly!4176
|
|
Add a praefect startup check to get the number of repositories that have
a missing primary. This check fails if there are any repositories
are deemed unavailable.
Changelog: added
|
|
We're about to disallow use of "normal" contexts created via the
`context` package given that they do not perform sanity checks for
feature flags. Instead, contexts should be created via the testhelper
package.
Convert tests to use the testhelper context.
|
|
bootstrap: Rewrite tests to not use timeouts
See merge request gitlab-org/gitaly!4188
|
|
praefect: replicate immediately after track-repository
Closes #3892 and #3909
See merge request gitlab-org/gitaly!4183
|
|
Adds some text that explains what is happening with replication. The
command either creates replication jobs in the queue, or it replicates
immediately. In both of these cases, print out helpful messages.
This also adds a writer in the command so we don't rely on the logger so
the output is more reader friendly.
Changelog: added
|
|
track-repository is a subcommand that tracks a repository in the
praefect database. This is generally used by an admin when a repository
is on disk but not tracked in the praefect db. Once tracked, replication
to the other nodes takes about 5 minutes and requires the Praefect
process to be running. This MR adds a flag that, if set, will immediately
process a relication job and wait for it to finish.
Changelog: added
|
|
The bootstrapping tests still need to use timeouts because of the grace
period, which is the duration we give the server to exit before we force
the upgrade. Convert the interface to instead accept tickers such that
we can get rid of using timeouts in our tests.
|
|
track-repository inserts database records for praefect to track the
repository, but doesn't insert any replication jobs. When a mutator RPC
comes through, it will trigger replication but we want to make sure that
no matter what the reositories are replicated to its secondaries.
Also refactors the tests to no longer test sql replication, as that is
no longer supported.
Changelog: fixed
|
|
The backup restoration tests aren't testing feature flags for atomic
creation and removal of repositories. This went undetected because the
tests used `context.Background()` to create the context instead of using
`testhelper.Context()`, which would've set up additional sanity checks
for feature flags.
Exercise the feature flags. This uncovers a bug in the test setup if
running with Praefect: because the repository entry does not exist in
the database, Praefect intercepted the repository removal and never
forwarded it to Gitaly. As a result, at the time we try to create the
repository the old repository still exists. This used to work alright,
but now fails with atomic repository creation enabled because the target
repository must not exist yet.
Fix this setup by creating the repository with `CreateRepository()`
first.
|
|
Tests should always use the testhelper package to create contexts such
that we can benefit from the feature flag sanity checks. Convert backup
tests to do so.
|
|
Use better parallelization for tests by adding `t.Parallel()` calls as
required. Like this, tests drop from 28 seconds of execution time to
only 8 seconds.
|
|
lint: Forbid use of context timeouts in tests
See merge request gitlab-org/gitaly!4162
|
|
The gitaly-lfs-smudge command is a smudge filter for Git which will
replace contents of LFS pointers with the actual LFS object's contents.
To do so, we need to request the object's contents from Rails via an
HTTP request. The tests exercising this code all of a sudden started
failing due to leaking Goroutines, where the leak happens in the HTTP
handling code. And sure enough: we never close the `http.Response` body,
which may likely be the root cause here.
Fix this by always closing the body. While I have no idea why leaks
started to happen just now, chances are high that this fixes the new
flake.
|
|
The function which processes stale jobs takes a configurable duration
for how often the monitoring loop should run. This makes it hard to
control the ticker from the outside, most importantly in tests.
Refactor the code to instead accept a ticker and amend tests to use a
manual ticker to get rid of timeouts.
|
|
So far we used JSON serialization to invoke submodule
sub-command of the git2go executable. We are moving
towards usage of the gob serialization to do the same
and use stdin and stdout of the process to get an input
and return the result of the execution. gob doesn't
require us to do mapping of the fields and allows
passing typed errors between the processes.
In order to support smooth migration we should support
both serialization formats. We can drop support of
the JSON in the next release.
Part of: https://gitlab.com/gitlab-org/gitaly/-/issues/3232
Changelog: changed
|
|
remove-repository is a destructive command. In order to make it safer to
use, make it print what it intends to destroy by default. This is a "dry
run". Only when --destroy is passed in will the command actuall remove
database records and delete the repository on disk.
Changelog: changed
|
|
The naming of the `GrpcEqualErr()` helper function is quite irregular
compared to other helper functions we have. Most importantly, it's
significantly different from `RequireGrpcCode()` and may thus be easy to
miss.
Rename it to `RequireGrpcError()` to clarify this.
|
|
Personally, whenever I want to compare gRPC errors and/or messages, I
stand there asking myself where the helper I want is located again. This
is caused by a split we have in test helpers related to gRPC: while we
do have `testhelper.RequireGrpcError()`, checking for actual errors is
done via `testassert.GrpcEqualErr()`. I always have to look up which of
both does what and where it's located.
Fix this by absorbing the helpers from the testassert package into the
testhelper package.
|
|
Consolidate the feature flags context APIs such that we have the same
set of functions for creating feature flags in incoming and outgoing
contexts.
|
|
Some tests use CreateRepository et al, but don't verify that they still
work with the corresponding feature flag which enables atomic repository
creation. Some don't care but will start to fail when tests are required
to explicitly inject all feature flags which are used, and others do
care but weren't addressed.
Add missing feature flags as required to prepare for explicitly required
feature flags.
|
|
The testhelper has functions to create test loggers for us. These are
awkardly named though, and furthermore we have a global variable which
contains a reference to one of the functions which creates the logger
for us, which is bad design. Furthermore, this global variable isn't
modified by anything anymore, making it essentially the same as the
function it's aliasing.
Rename functions to match Go best practices and remove the global
variable.
|
|
glsql: Compile DSN from configuration
See merge request gitlab-org/gitaly!4138
|
|
Add 'praefect metadata' subcommand
Closes #3481
See merge request gitlab-org/gitaly!4122
|
|
Praefect doesn't yet have any way of fetching a repository's metadata.
With the addition of replica path, this is becoming increasingly important
as users may need to figure out where on the disk a given repository is
stored. They may also want to figure out which repository on the disk a
replica actually belongs to. Likewise, it's currently not possible to
really see which Gitaly node is acting as a primary of a repository. The
dataloss subcommand does provide some of this information but it only
returns information when a repository is not fully replicated. To address
this gap, this commit adds 'praefect metadata' subcommand that can be used
to fetch a repository's metadata. Users can fetch the metadata either by
the repository's ID or by its virtual storage and relative path to accommodate
all possible queries.
Changelog: added
|
|
praefect: Add database read/write check
Closes gitlab#328493
See merge request gitlab-org/gitaly!4121
|
|
dial-nodes: add timeout flag
Closes #3936
See merge request gitlab-org/gitaly!4123
|
|
A recent refactor got rid of a default timeout for the dial nodes
subcommand. This change adds a flag -timeout that can be passed into the
command so that it doesn't have to wait the full 30 seconds to get a
timeout error back.
Changelog: changed
|
|
The change removes ToPQString method and replaces
its usages with glsql.DSN function.
|
|
In order to check if praefect can read/write to a database, we can use
the hello_world table. We can try inserting a row, and reading from it.
Changelog: added
|
|
One of our testcases creates a Gitaly repository twice, and this will
lead to an error when we convert `CreateRepository()` to disallow
recreation of repositories. Fix the test to only create it once.
|
|
praefect: Add text clarifying what list-untracked-repositories output signfiies
Closes #3890
See merge request gitlab-org/gitaly!4055
|
|
Collect summary of backup errors
See merge request gitlab-org/gitaly!4049
|
|
Use verbose flag in Checks
See merge request gitlab-org/gitaly!4060
|
|
When an admin runs the praefect startup checks, it would be useful to
print more information about each check. This change refactors some of
the code to allow that to happen, and also a -q flag that the user can
pass to silence this detailed outpt.
Changelog: changed
|
|
list-untracked-repositories currently outputs json rows for repositories
that exist on disk but are untracked by the database. It's unclear what
these rows are. Add helper text to signal to the user what these rows
mean.
Changelog: changed
|
|
Add a test to ensure that the
PrometheusExcludeDatabaseFromDefaultMetrics controls whether or not it
gets registered in the main metrics registry.
|
|
By default, when metrics are enabled, then each Praefect will expose
information about how many read-only repositories there are, which
requires Praefect to query the database. First, this will result in the
same metrics being exposed by every Praefect given that the database is
shared between all of them. And second, this will cause one query per
Praefect per scraping run. This cost does add up and generate quite some
load on the database, especially so if there is a lot of repositories in
that database, up to a point where it may overload the database
completely.
Fix this issue by splitting metrics which hit the database into a
separate endpoint "/db_metrics". This allows admins to set up a separate
scraper with a different scraping interval for this metric, and
furthermore it gives the ability to only scrape this metric for one of
the Praefect instances so the work isn't unnecessarily duplicated.
Given that this is a breaking change which will get backported, we must
make this behaviour opt-in for now. We thus include a new configuration
key "prometheus_use_database_endpoint" which enables the new behaviour
such that existing installations' metrics won't break on a simple point
release. The intent is to eventually remove this configuration though
and enable it for all setups on a major release.
Changelog: added
|
|
commit: Refactor recursive `GetTreeEntries()` for efficiency
Closes #3888
See merge request gitlab-org/gitaly!4052
|
|
Start collecting errors so that these can be summarised at the end of
the backup run. Previously errors were only reported in logging which is
often ignored.
Changelog: changed
|
|
praefect: Praefect should not hang if db is not reachable
Closes #3678
See merge request gitlab-org/gitaly!4034
|
|
Most of the linting exception we have are about missing documentation of
public variables or comments. Let's inline all of these -- it's hard to
keep track of exceptions and keep the list up-to-date. By having a
`//nolint` directive at the site where the document is missing it
becomes a breeze to directly clean it up. Furthermore, it also acts as a
reminder that one might want to add a comment when one stumbles over any
of these comments.
|
|
Clean up various exceptions for the errcheck linter, either by removing
exceptions which don't apply anymore, by handling the errors or by
ignoring errors.
|
|
The Praefect printed log message about listening addresses
too far from the actual point when it starts to listen on
them. This change moves the log message to the proper place.
|
|
If the database is not accessible the at the time of Praefect
start it hangs forever and doesn't report much information
about the reason. The attempt to fix it by using PingContext
with timeout set on the context failed because of the issue
in the lib/pq that is not yet fixed. The problem is solved
by a separate goroutine that issues the PingContext and main
goroutine watching the timeout. If timeout occurs it returns
an error up to the caller. Another goroutine won't be over
and hangs, but we don't care much as it is a critical error
and the service will be terminated.
Closes: https://gitlab.com/gitlab-org/gitaly/-/issues/3678
|