Fix typos found by typos-cli (https://github.com/crate-ci/typos). Some
affected tests are adjusted.
A number of other typos are ignored, including:
* CHANGELOG.md
* NOTICE
* internal/.../migrations/20201208163237_cleanup_notifications_payload.go
* other intended typos or false positives
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
|
|
git: Remove the test repository
See merge request https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6273
Merged-by: Will Chandler <wchandler@gitlab.com>
Approved-by: karthik nayak <knayak@gitlab.com>
Approved-by: Will Chandler <wchandler@gitlab.com>
Reviewed-by: Patrick Steinhardt <psteinhardt@gitlab.com>
Co-authored-by: Patrick Steinhardt <psteinhardt@gitlab.com>
|
|
The benchmarking deployments for Gitaly still use the gitlab-shell
configuration. This setting is not required anymore though, and we
already configure the GitLab secret explicitly. Remove the section.
|
|
We have converted all of our tests to generate their test data at
runtime. Furthermore, all of our benchmarks use a dedicated benchmarking
repository. This means that our test repository is completely unused
now.
Remove the Makefile target to clone and set up the test repository.
|
|
A while ago we introduced the `ignore_gitconfig` configuration. If set,
we override GIT_CONFIG_SYSTEM, GIT_CONFIG_GLOBAL and XDG_CONFIG_HOME so
that Git won't pick up gitconfig files found in any of these scopes. The
goal is that we only ever use the Git configuration that is found either
in Gitaly's `config.toml` or in the repository-local gitconfig.
This toggle has already been enabled unconditionally in all
distributions and was scheduled for removal in v16.0. So let's remove
the toggle and unconditionally ignore any global- or system-level
gitconfig files.
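A minimal Go sketch of what "unconditionally ignore global and system gitconfig" means for a spawned Git process. This is not Gitaly's actual code: `hermeticGitEnv`, `gitConfigOverrides` and the choice of `/dev/null` as the override value are assumptions for illustration. Git honours GIT_CONFIG_SYSTEM and GIT_CONFIG_GLOBAL since v2.32.

```go
package main

import (
	"fmt"
	"strings"
)

// gitConfigOverrides lists the variables we force, in a fixed order.
// Assumption: /dev/null for all three. Git then reads the system and
// global config as empty, and there is no git/ directory below
// XDG_CONFIG_HOME to fall back to.
var gitConfigOverrides = [][2]string{
	{"GIT_CONFIG_SYSTEM", "/dev/null"},
	{"GIT_CONFIG_GLOBAL", "/dev/null"},
	{"XDG_CONFIG_HOME", "/dev/null"},
}

// hermeticGitEnv returns a copy of env with any inherited values of the
// variables above dropped and the overrides appended. Git then only sees
// configuration that is passed explicitly or stored in the repository.
func hermeticGitEnv(env []string) []string {
	overridden := make(map[string]bool, len(gitConfigOverrides))
	for _, kv := range gitConfigOverrides {
		overridden[kv[0]] = true
	}

	result := make([]string, 0, len(env)+len(gitConfigOverrides))
	for _, entry := range env {
		key, _, _ := strings.Cut(entry, "=")
		if overridden[key] {
			continue // drop the inherited value
		}
		result = append(result, entry)
	}
	for _, kv := range gitConfigOverrides {
		result = append(result, kv[0]+"="+kv[1])
	}
	return result
}

func main() {
	env := hermeticGitEnv([]string{"HOME=/home/git", "GIT_CONFIG_GLOBAL=/home/git/.gitconfig"})
	fmt.Println(strings.Join(env, "\n"))
}
```

A caller would assign the result to `exec.Cmd.Env`, for example `cmd.Env = hermeticGitEnv(os.Environ())`.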
Changelog: removed
|
|
Instead of using Ruby, build the praefect config file with `envsubst`.
|
|
Replace the Ruby script _support/test-boot with a small tool written in
Go.
Issue: https://gitlab.com/gitlab-org/gitaly/-/issues/4636
|
|
|
|
One of our CI jobs is testing that Gitaly works correctly when invalid
proxies have been configured. This test only checks the `rubyserver`
package though, which is about to be removed. The tests are thus not
required anymore.
Remove the CI jobs and the infrastructure supporting it.
|
|
|
|
Currently we clone Gitaly at HEAD and then peel the requested revision
and check that out. This works fine if you're using HEAD as your rev,
but a branch name will fail as we haven't created a local branch with
that name yet.
To resolve this, perform the initial clone using the requested revision
before peeling.
|
|
With 51ea0f5 (Add support for the gssapi-with-mic auth method,
2023-01-23), gitlab-shell now requires native gssapi libraries to build.
Update the ansible task to install the required packages.
|
|
The `test-network` network prevents multiple benchmarking instances from
being created and is unused. Remove it.
|
|
Add description of benchmarking output to the README.
|
|
Now that we have scripts to run benchmarks and profile Gitaly, we can
update the `benchmark` role to invoke them.
By default we clear the kernel page cache and run the profiling script,
but if needed these steps can be disabled with `./run-benchmarks
--extra-vars "profile=false clear_page_cache=false"`.
`bench_duration` defaults to slightly longer than `profile_duration`
to ensure that `ghz` is sending traffic for the full time we're
profiling.
`ghz_wait_duration` controls how long to wait before the `Run ghz` task
is considered to have failed. When writing HTML output, `ghz` may take
30+ seconds to finish, so a sizeable wait period helps prevent spurious
failures without adding delays if it exits sooner. We currently use JSON
output, which does not add this delay.
|
|
Understanding where Gitaly and Git are spending their time, as well as
general system health, is critical to useful benchmarking. Add a script
to the Gitaly node to run `perf` and a number of `libbpf-tools`
utilities while the node is under load.
Running this introduces a performance overhead of ~10%, mostly from
`perf`, which runs twice simultaneously. One instance profiles only Gitaly
using `--call-graph=fp`, which works well with Go. The other profiles the
system as a whole using `--call-graph=dwarf`, which is more accurate for
Git and other C programs. The DWARF output is ~10x larger than the frame
pointer output, so flamegraphs built from it take proportionately longer
to generate, typically longer than the duration profiled.
The `libbpf-tools` utilities used are a bit of a grab bag, but quite
lightweight to run. These are BPF CO-RE utilities that are much lighter
than `bcc`, which can be a resource hog. They focus primarily on
measuring how much delay block I/O imposes, which may be useful in
determining how much of a penalty slower storage imposes on Gitaly.
Currently the only RPC being tested is `FindCommit`, which, being
read-only, hits the kernel page cache 100% of the time after the first
request.
- biolatency: Histogram of the latency of block I/O operations for each
attached disk.
https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt
- biotop: List of processes performing the most block I/O.
https://github.com/iovisor/bcc/blob/master/tools/biotop_example.txt
- execsnoop: List of all processes forked by Gitaly and their
arguments.
https://github.com/iovisor/bcc/blob/master/tools/execsnoop_example.txt
- cpudist: Histogram of how long tasks ran on-CPU or, with the
`--offcpu` flag, how long they spent off-CPU.
https://github.com/iovisor/bcc/blob/master/tools/cpudist_example.txt
- cachestat: Statistics regarding kernel page cache hit rate.
https://github.com/iovisor/bcc/blob/master/tools/cachestat_example.txt
Note that the links above are to the `bcc` documentation for each tool
used. The arguments the `bcc` version takes may vary a bit from what
`libbpf-tools` allows, but they perform the same task.
Further work is needed for this to be fully usable, most notably tracking
CPU and memory utilization. This is difficult with polling tools like
Prometheus's `node-exporter`, as most of the system load is typically
from short-lived Git processes that may spawn and exit between polling
intervals.
|
|
Using `ghz` requires a large number of parameters which would be
quite painful to write in YAML. Create a wrapper script for the client
host that does some basic parameter verification and then invokes `ghz`.
Currently the `--concurrency`[0] and `--rps`[1] values used are arbitrary.
In the future we should look into configuring these appropriately per
RPC. For example, 100 `OptimizeRepository` requests per second is not a
useful scenario.
[0] https://ghz.sh/docs/options#-c---concurrency
[1] https://ghz.sh/docs/options#-r---rps
|
|
To run requests via `ghz`, we need to provide a JSON-formatted file with
the parameters being passed to the RPC. As an initial example, let's add
a `FindCommit` request for `git.git`.
Revision `bWFzdGVy` is `master` in base64.
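The base64 form comes from the protobuf JSON mapping: `revision` is a `bytes` field, so its JSON value is base64-encoded. Go's `encoding/json` encodes `[]byte` the same way, so the request body can be produced with a minimal Go sketch. The storage name `default` and relative path `git.git` are placeholder values, and these structs only mirror the fields needed here rather than using Gitaly's generated types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repository mirrors the two Repository fields this request needs.
type repository struct {
	StorageName  string `json:"storage_name"`
	RelativePath string `json:"relative_path"`
}

// findCommitRequest mirrors FindCommitRequest. Revision is a protobuf
// bytes field, so its JSON form is base64; encoding/json encodes []byte
// as standard base64 too.
type findCommitRequest struct {
	Repository repository `json:"repository"`
	Revision   []byte     `json:"revision"`
}

// findCommitJSON builds the JSON body ghz would read from its data file.
func findCommitJSON(storage, relativePath, revision string) ([]byte, error) {
	return json.Marshal(findCommitRequest{
		Repository: repository{StorageName: storage, RelativePath: relativePath},
		Revision:   []byte(revision),
	})
}

func main() {
	out, err := findCommitJSON("default", "git.git", "master")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"repository":{"storage_name":"default","relative_path":"git.git"},"revision":"bWFzdGVy"}
}
```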
|
|
We need to loop over each RPC bench and its associated repos, but
Ansible's syntax to dynamically reuse tasks is a bit annoying and
requires that we split out each section that will be repeated into a
separate file and use `include_tasks`. To do this, we invoke
`rpc_loop.yml` from `main.yml` for each RPC, and then `bench.yml` for
each repo we're testing with the RPC.
|
|
`.rubocop_todo.yml` is loaded from the ruby directory via `make
rubocop`. It's easier just to fix these 3 errors than to try to fix
the script.
|
|
|
|
Document the basic steps for using the benchmarking scripts.
Changelog: added
|
|
Output from `ghz` provides raw latency numbers, but understanding where
Gitaly is spending its time requires more detail. To provide this, we
will profile it with Linux `perf` and `libbpf-tools`. The latter are
C versions of the older `bcc` BPF monitoring tools and are extremely
lightweight to run in terms of both memory and CPU. `bcc` uses a
Python wrapper, which can add significant load, particularly if multiple
tools are run at once.
`perf` output will be converted into flamegraphs.
|
|
Directly running Gitaly via Ansible as an asynchronous task would be
painful, as such tasks require a hard deadline and can be a bit flaky. In
addition, setting resource limits to the correct values would be a pain.
To avoid this, create a systemd service for Gitaly so we can use the
`ansible.builtin.systemd` module to control it. Logs can still be
easily retrieved using `journalctl`, and we can directly set resource
limits to match those used in Omnibus GitLab[0].
[0] https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/73892599/config/templates/runit/runsvdir-start.erb#L20-37
|
|
Build Gitaly itself, along with Gitaly-Ruby and gitlab-shell. The
versions to use for Go, Ruby, and Gitaly are all parsed on the client
node, so that role must be run first.
To support installing arbitrary versions of Ruby, we use ruby-build to
compile it from source. To ensure we are using the most relevant Git
version, we build Gitaly with the bundled Git option enabled. Currently
gitlab-shell is built so Gitaly can validate its presence, but is not
used.
|
|
The client node will need to send gRPC traffic to Gitaly. We will use
ghz[0] to do this, as it has extensive and well-documented[1] options,
and supports streaming RPCs, unlike `k6` used by the GPT. To send RPCs
we need the protobuf definition files, so the Gitaly repo must be cloned
on the client as well for reference.
To ensure we are using the same commit on all hosts, we peel the
requested revision after cloning and persist this as a fact that can be
referred to from other hosts. The Go and Ruby versions used by the
designated commit are also parsed from `.tool-versions` and saved for
use on the Gitaly node.
We will also need to copy the `ghz` output over to the Gitaly node for
collection with the other benchmark outputs. In preparation for that,
create a new SSH key and save its pubkey as another fact so we can trust
it on the Gitaly node.
[0] https://github.com/bojand/ghz
[1] https://ghz.sh
|
|
Add an additional role to tear down a benchmarking instance when it is
no longer required.
This also removes the destroyed instances' host keys from known hosts
as GCP will frequently re-use the same IP address on new nodes. This
causes host key verification errors if we don't clean up known hosts.
|
|
Basic benchmarking will require two hosts, a Gitaly instance and a
client instance to send traffic from. Gitaly Cluster is beyond the scope
of this initial effort.
Create a terraform job that creates both hosts, with port 8075 open on
the Gitaly node for traffic. We use a `t2d` instance for Gitaly as these
provide 4 physical cores, as opposed to 2 hyperthreaded cores. In theory
this could reduce performance jitter, though I have not measured this to
be sure.
A disk image containing the test repositories is attached to the Gitaly
node on creation. These repositories are:
- git.git - A smaller repository with a fair amount of history.
- gitlab.git - Uses an object pool and has ~6,000,000 refs.
- linux.git - A large and well-groomed repository.
- homebrew-core.git - Has very large trees.
- chromium.git - Extremely large (40 GiB), with ~4,000,000 refs.
This task borrows its structure from the old Gitaly Cluster demo script in
`_support/terraform`.
|
|
Ansible can run slowly when performing a large number of operations, and
determining which tasks are slow is difficult with the default output.
Mitigate these issues by enabling pipelining[0], which speeds things up
dramatically and is compatible with the Ubuntu 22.04 hosts we're using,
and the `profile_tasks` callback[1], which prints the start time of each
task during execution and a summary of task times on completion.
[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#ansible-pipelining
[1] https://docs.ansible.com/ansible/latest/collections/ansible/posix/profile_tasks_callback.html
|
|
Remove infrastructure to clone the "gitlab-test-mirror.git" and
"gitlab-git-test.git" seed repositories. They are not used anymore.
|
|
Right now, the way we apply Git patches is by adding them to the Gitaly
project and using git-apply(1) to apply them ad-hoc. This is starting to
show its limits though:
- It is hard for us to provide a simple pointer to the Git sources
that we distribute to the customer.
- It is hard to execute tests for the patched Git version in an
automated fashion.
- It is hard to work on top of the already-patched Git distribution,
for example to apply more patches.
These limitations are getting more noticeable now that we have split up
the Gitaly team into two teams, where we potentially want to backport
patches more aggressively.
With the split we have now made the repository at [1] the canonical Git
repository for all our efforts. This indeed opens up a much better way
to use custom Git versions: instead of hosting the patches in the Gitaly
repository, we start to tag Gitaly-specific releases in that repository.
These releases then carry all the additional patches on top.
This makes it trivial to use normal workflows:
- We can point customers to the Gitaly-specific tags which hold our
patches on top.
- We can automatically trigger CI pipelines on top of patched
Gitaly-specific releases by just pushing branches or tags.
- You can just clone the repository and check out the specific
tags.
Drop the infrastructure to patch releases in-place in favor of this new
architecture.
[1]: https://gitlab.com/gitlab-org/git.git
|
|
We've got multiple scripts that are required to generate Ruby code from
our Protobuf definitions in the `_support` directory. This has multiple
smells:
- It's out-of-line with all the other tools, which nowadays are
located in the `tools` directory.
- It's hard to discover and find out which parts logically form a
unit.
- We are reusing the Gemfile of the Ruby sidecar to pin the
`grpc-tools` dependency to a specific version.
Move the tooling into its own `tools/protogem` directory that's got its
own Gemfile to fix these points. This also allows us to auto-update
dependencies via the Renovate bot like we do for our other tools.
|
|
In commit cc04215eb we removed the flag to enable Git v2.37.0, making it
the default Git version. We can now remove the older Git version
v2.35.0. This commit removes it from the Makefile, which means it will no
longer be bundled with Gitaly.
Also remove the patches added for git v2.35.0, which are no longer
required.
|
|
We're about to stop installing Git into our current default location.
Instead, tests are supposed to use the binary wrappers provided by the
Git project so that we don't have to install it in the first place.
Adapt the test-boot script to use them.
|
|
The `check` subcommand has been relocated from `gitaly-hooks` to the
main `gitaly` binary. References to the subcommand were updated to
reflect this change.
|
|
Update our bundled Git version to v2.37.1. This both pulls in the latest
changes from the v2.37 release and includes the fixes for CVE-2022-29187,
which is another variant of opening repositories owned by a different
user leading to privilege escalation.
To the best of my knowledge, Gitaly is not impacted by this specific
vulnerability. It does not perform repository discovery by walking up
the filesystem hierarchy and thus wouldn't pick up repositories in any
of the parent directories of the storage root. And if an adversary is in
a position to change the owner of repositories contained in Gitaly's
storage root, they would already have other ways to attack the host.
Also note that we're upgrading the bundled Git version v2.36.1 in-place.
This can be done because its feature flag is not yet default-enabled and
hasn't been rolled out anywhere due to a set of incompatibilities.
Changelog: changed
|
|
We have changed the Postgres client version due to our update to a more
recent GitLab Build Image, which caused some minor changes in the
Praefect schema. Update the schema to match.
|
|
Move the `noticegen` tool into the top-level `tools/` directory so that
all of our custom build tools are in one place. This also makes its
sources discoverable for our formatter.
|
|
Move the `module-updater` tool into the top-level `tools/` directory so
that all of our custom build tools are in one place. This also makes its
sources discoverable for our formatter.
|
|
The Protoc plugins we use are hidden away deep in the `proto/`
directory, which makes it very hard to discover them when one doesn't
already know about their existence. Let's move them into a new top-level
`tools/` directory.
|
|
There is no real reason why the `protoc-gen-gitaly-lint` package
requires another internal package to provide the actual logic.
Furthermore, we want to move this plugin into a top-level `tools`
directory to make it easier to discover.
Absorb the `linter` package to make it easier to move the code around.
|
|
Install bundled Git v2.36.0.gl1 alongside v2.35.1.gl1. Note that we
carry forward a set of patches from the old version which haven't made
it into the final release yet.
Changelog: added
|
|
Makefile: Drop bundled Git v2.33.1.gl3
See merge request gitlab-org/gitaly!4495
|
|
Ignore verification columns for read-only cache updates
Closes #4159
See merge request gitlab-org/gitaly!4468
|
|
We have finished the migration to bundled Git v2.35.1.gl1 in v14.10. Due
to concerns with zero-downtime upgrades we couldn't yet remove the old
version though. But now that we have waited for a release we can finally
remove the old version.
Remove the infrastructure to build and install bundled Git v2.33.1.gl3.
Changelog: removed
|
|
The background verifier sets a lease time on a replica when it picks
it up for verification. If the worker dies for some reason, the lease
will remain in place and no other worker will pick up the replica for
verification again until the lease is cleared. The lease indicates the
maximum time the worker would spend working on the replica. Once it has
passed, it is safe for another worker to pick up the replica for
verification again. This commit adds a background goroutine that
periodically releases expired leases so other workers can take up the
work if the original worker failed and did not release the lease. The
'verification_leases' index is added so the query can efficiently find
the replicas with acquired leases and identify the stale ones.
|
|
The read-only cache receives invalidations on record updates via
triggers in Postgres. Currently the notifications are sent for any
modification to the records. The verification-related columns are not
relevant to the operation of the cache, so this commit ignores changes
to those columns in the triggers.
Changelog: changed
|
|
This commit adds the necessary schema changes for the metadata
background verification. Each replica receives two new columns:
1. 'verified_at' which contains the timestamp of the last successful
verification of the replica. This effectively allows for identifying
replicas that are in need of reverification.
2. 'verification_leased_until' which contains a timestamp until which
a worker has acquired a lease to reverify the repository. This prevents
multiple workers from picking the same repository for reverification at
the same time.
The 'verification_queue' index is added to index replicas which have not
been acquired by any worker. This allows for efficiently querying
replicas that are in need of reverification later.
Changelog: other
|
|
The current set of Git patches got quite big, and consequently it's
hard to see which patches belong to which version. Reorder them into
per-version subdirectories so that the grouping is clear. Furthermore,
this allows us to find all patches via wildcards instead of having to
manually list them in our Makefile.
|
|
|