Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-11-08git: Use spaces instead of the NUL-byteextract_refs_codingJames Fargher
There is a bug that prevents the NUL-byte being outputted for some variations of `--format`. So instead, since ref names cannot have spaces anyway, switch to using spaces.
2021-11-08gitpipe: Replace manual for-each-ref decodingJames Fargher
Previously ref parsing was done in place. This replaces the parsing with a shared implementation. To enable the git output to be decoded with the same code the format was changed to a consistent NUL byte separator. `PEELED` had to be lower-cased as git was interpreting %00PEELED as an escape code.
2021-11-08localrepo: Replace manual for-each-ref decoding in localrepoJames Fargher
Previously ref parsing was done in place. This replaces the parsing with a shared implementation.
2021-11-08git: Add ref decoder constructor for decoding for-each-ref outputJames Fargher
2021-11-08git: Rename NewRefsDecoder to NewShowRefDecoderJames Fargher
In anticipation of interpreting more reference listings from git, we rename the constructor to be specific to the type of output that is being decoded. Most other refs output use a null-byte as a separator. So we make the initial changes to make this separator configurable.
2021-11-08Move RefsDecoder into git packageJames Fargher
RefsDecoder decodes the output from git-show-ref. There is nothing specific to backups here and moving it to the git package should allow us to reuse the logic elsewhere.
2021-11-05Update changelog for 14.0.12GitLab Release Tools Bot
[ci skip]
2021-11-05Merge branch 'pks-objectpool-in-memory-fetch' into 'master'Sami Hiltunen
objectpool: Convert to use in-memory fetches See merge request gitlab-org/gitaly!4038
2021-11-05Merge branch 'docs-backup' into 'master'Patrick Steinhardt
Adds a description of how backups work See merge request gitlab-org/gitaly!3924
2021-11-04Add instructions on how to create/restore backups using gitaly-backupJames Fargher
2021-11-04Adds a description of how backups workJames Fargher
2021-11-04Merge branch 'pks-gitaly-git2go-etxtbsy' into 'master'Pavlo Strokov
testcfg: Try to fix ETXTBSY errors with gitaly-git2go executable Closes #3869 See merge request gitlab-org/gitaly!4041
2021-11-04Merge branch 'pks-git-no-templates' into 'master'Sami Hiltunen
git: Do not install templates when creating repos Closes #3285 See merge request gitlab-org/gitaly!4027
2021-11-04Merge branch 'smh-unique-conflict-stg' into 'master'Sami Hiltunen
Update relative path in SetGeneration Closes #3876 See merge request gitlab-org/gitaly!4019
2021-11-04Merge branch 'pks-catfile-drop-decorrelated-traces' into 'master'Sami Hiltunen
catfile: Stop creating decorrelated spans See merge request gitlab-org/gitaly!4042
2021-11-04Merge branch 'pks-supervisor-timeout-fixes' into 'master'Sami Hiltunen
supervisor: Fix bugs related to timeouts See merge request gitlab-org/gitaly!4029
2021-11-04reconciler: Consider replicas pending rename outdatedSami Hiltunen
Reconciler is scheduling update jobs to replicas that are pending renames. This began happening when we updated reconciler to join the records by repository id in 2b19d8c5. As the renames previously disjointed the records in `repositories` and `storage_repositories` due to the `relative_path` changing, the reconciler didn't target replicas that were pending rename. This commit changes the query to only use replicas which are not pending a rename to act as a source node for replication. Combined with a previous commit to update the relative path in SetGeneration if necessary, this will restore the same behavior the reconciler had earlier with renames; never using replicas pending rename as a source but possibly creating a new replica.
2021-11-04datastore: Consider replicas pending rename outdatedSami Hiltunen
GetConsistentStorages has a regression where a replica pending rename is considered up to date. Requests routed to the replica would fail as the replica is still stored at the old path. Previously the records were joined by the (virtual_storage, relative_path) which naturally considered the rename pending replicas outdated. Until the relative_path is completely removed from storage_repositories, this commit updates the query to ensure the replica's relative path is the same as the repository's. This indicates the rename operation has been applied.
2021-11-04datastore: Update relative path in SetGenerationSami Hiltunen
Praefect previously allowed multiple replica records of a single repository on a storage. This was possible if the repository was renamed, as the primary key was the repository's virtual storage and relative path. When the repository was renamed, the replica records that were not yet renamed were considered to be a different repository. Usually this disparity would be fixed by the rename replication job Praefect would schedule on renaming. If the job failed, the repository would be left in it's old path. The reconciler would eventually replicate the repository to the new path from another Gitaly node. Praefect is now using repository IDs and is enforcing that each Gitaly node has a single copy of a repository due to the uniqueness of the repository ID. This causes problems with the above flow as the replica left at the old path would have the same repository ID as the new replica reconciler is replicating to. This is causing problems on staging where some repositories had only some of the replicas renamed and the reconciler is now facing unique constraint issues when attempting to reconcile the repository to the new location. After the migration to unique relative paths is completed, this won't really be an issue anymore as we won't be moving the repositories on disks anymore and the renames are done atomically in the database. For now though, Praefect needs to workaround this by updating the replicas record to point to the relative path the replication job contained.
2021-11-04Merge branch 'ps-praefect-cmd-fix' into 'master'Pavlo Strokov
praefect: Fix inconsistencies in praefect sub-commands See merge request gitlab-org/gitaly!4026
2021-11-04git: Drop Remote interfacePatrick Steinhardt
The only implementation of the Remote interface, which is part of the localrepo interface, isn't used by anything anymore, and in fact we don't want to gain any new users either: on-disk remotes are not used in Gitaly anymore. Remove both the implementation and its interface.
2021-11-04objectpool: Stop fetching tagsPatrick Steinhardt
We already fetch tags into object pool via the refspec we pass to git-fetch(1), which include tags. In addition though we also fetch tags into the standard tag namespace "refs/tags/", which is simply wrong. Fix this bug by passing `--no-tags` to git-fetch(1) such that we only use our refspecs to fetch tags. Changelog: fixed
2021-11-04objectpool: Convert to use fetches without remotePatrick Steinhardt
There is only a single callsite left in our codebase which creates on-disk remotes, which is the `FetchIntoObjectPool` RPC. We do not need this remote for anything though, and we have in the past made the move to only ever use in-memory remotes. Convert the implementation to fetch from the repository path directly without creating a remote first. Note that this implicitly fixes a bug: when creating a remote, Git sets up a default fetchspec which we use in addition to the explicit fetchspec we pass to git-fetch(1), and as a result we create two refs for each branch we fetch. Now that we don't create a remote anymore this bug is fixed for good. Changelog: fixed
2021-11-04objectpool: Demonstrate broken fetching of refsPatrick Steinhardt
When fetching into an object pool, then we accidentally end up with duplicated branches and tags because of how refspecs are set up. Demonstrate these bugs by adding a new testcase.
2021-11-04testcfg: Try to fix ETXTBSY errors with gitaly-git2go executablePatrick Steinhardt
Our build helpers which build commands on the fly as tests require them first build the executable in a shared place and then copy them into the per-test binary directory. In our CI, we have occasionally seen errors when trying to execute the resulting binaries though, where execution always fails with ETXTBSY. This error happens when trying to exec into an executable that's been concurrently modified. This is quite a curious failure mode: we only ever build commands once due to using `sync.Once`, and the copied file is also closed before returning. So in theory, there shouldn't be anything that's still modifying the file. While digging a bit I stumbled over [1] by chance: as it turns out, there used to be an issue with Docker where executing a file that's just been chmod'ed will cause ETXTBSY. And that's exactly what we're doing: we copy binaries into place and chmod them afterwards such that they're executable. The root cause of this seems to be an issue with the overlay filesystem in use, which handled this inconsistently from how it's handled with a "normal" filesystem. Let's try to fix the bug by hardlinking the file into place instead. [1]: https://github.com/moby/moby/issues/9547
2021-11-03Update changelog for 13.12.15GitLab Release Tools Bot
[ci skip]
2021-11-03Update changelog for 13.12.14GitLab Release Tools Bot
[ci skip]
2021-11-03catfile: Stop creating decorrelated spansPatrick Steinhardt
Processes spawned by the catfile cache are potentially decorrelated from the main context such that these processes can be kept around after the RPC context has been cancelled for later reuse. In order to make these cached processes observable, we create all tracing spans for requests to the catfile process both in the current per-RPC context and in the decorrelated context hosted by the cache. We have since observed that creating traces is comparatively expensive for us given that it creates many allocations and also has quite some runtime overhead. Doing this work twice per request only makes this problem worse for us. While one way to fix this is the ongoing work to allow batching requests, another angle to tackle this problem is to only start creating one context. Naturally, we want to keep the per-RPC spans such that one can easily see what's going on in a given RPC. The information about how the catfile process is used globally across different RPC calls isn't really all that useful in the first place: while it gives a global picture, we wouldn't ever see any activity outside of the context of an RPC anyway. Drop support for creating decorrelated spans in the context of the cache. This not only simplifies the code, but also optimizes away some allocations and gives us a slight speedup. Before this change: BenchmarkListAllBlobs/ListAllBlobs-16 8 133269398 ns/op 155884456 B/op 27954 allocs/op BenchmarkListAllCommits/ListAllCommits-16 40 43940502 ns/op 9231545 B/op 32434 allocs/op BenchmarkFindAllTags/FindAllTags-16 6 185349615 ns/op 31240846 B/op 134990 allocs/op After this change: BenchmarkListAllBlobs/ListAllBlobs-16 8 130457695 ns/op 155643219 B/op 25795 allocs/op BenchmarkListAllCommits/ListAllCommits-16 39 40942865 ns/op 9171181 B/op 30244 allocs/op BenchmarkFindAllTags/FindAllTags-16 6 172061555 ns/op 30800354 B/op 122930 allocs/op We have roughly 10% less memory allocations and a slight speedup. Changelog: performance
2021-11-03supervisor: Fix RSS monitor retriggering too fastPatrick Steinhardt
When we supervise a running process, then regularly poll consumed memory by the process such that the user of the supervisor can kill the process if it blows up too much. The intent is to do this once every 15 seconds, but because we create the timer ticker before we have actually received information about the process to be monitored it can cause us to poll the process twice in a row when it takes more than 15 seconds to get that information. Fix the issue by resetting the timer before doing the select such that we always wait for 15 seconds. Note that this also converts us to use a `helper.TimerTicker`: the benefit is that its reset function knows to drain already-pending events. Changelog: fixed
2021-11-03supervisor: Fix crash and notification timers clashingPatrick Steinhardt
When a supervised process is currently running, we have two different timeouts on which we are listening: one to reset the crash counter when the command has been running long enough, and one to re-send the "up" notification to any potential listeners which is required because sending the notification is done with a timeout. Given that we set up both timeouts in the same `select` call it means that they are naturally clashing with each other: if the crash reset timer is configured to be shorter than the notification timer, then notifications won't ever be resent. Otherwise, if the crash reset timer is configured to be longer than the notification timer, then we won't ever reset the crash timer. And if they're the same, then both timers race with each other. Fix the issue by creating timer tickers instead which persist across loops. Changelog: fixed
2021-11-03helper: Drain timer ticker on resetPatrick Steinhardt
Resetting `time.Timer`s is quite tricky in cases where the tick hasn't been consumed by the caller given that one first has to stop the ticker, then consume the event in case there is any and finally the ticker can be reenabled by calling reset. Given that this is not quite obvious without taking a closer look at `time.Timer`'s documentation this is easy enough to miss. Adjust our own `Reset()` wrapper in the helper package to do this dance such that `Reset()` will reset the timer in all cases.
2021-11-03Merge branch 'pks-objectpool-drop-unlink-repository' into 'master'Pavlo Strokov
objectpool: Drop UnlinkRepostioryFromObjectPool RPC See merge request gitlab-org/gitaly!4037
2021-11-03Merge branch 'pks-catfile-batched-tracing' into 'master'Patrick Steinhardt
catfile: Refactor tracing to allow for batching See merge request gitlab-org/gitaly!4031
2021-11-03Merge branch 'ps-update-x-sys-pkg' into 'master'Patrick Steinhardt
git2go: Fix failure on MacOS X systems See merge request gitlab-org/gitaly!4039
2021-11-03git2go: Fix failure on MacOS X systemsPavlo Strokov
Because of the issue in the Go v1.17 https://github.com/golang/go/issues/46763 the git2go dependent tests was failing the execution with fatal error. Update of the golang.org/x/sys to newer version fixes this problem.
2021-11-03objectpool: Drop UnlinkRepostioryFromObjectPool RPCPatrick Steinhardt
The UnlikRepostioryFromObjectPool RPC is not used by anything anymore, and the implementation from it is dangerous given that it doesn't actually unlink a repository from its object pool: it only tries to remove a remote named after the pool's project path, which we wouldn't ever create in the first place. The RPC has thus been deprecated in release v14.3. Remove the RPC and its backing code. Changelog: removed
2021-11-03gittest: Do not install templates when creating reposPatrick Steinhardt
Given that we now don't install templates when creating repos anymore in our production code, we shouldn't do so either within our tests. Change the gittest helper to also set "init.templateDir=" to adapt and fix tests as required.
2021-11-03git: Do not install templates when creating reposPatrick Steinhardt
When executing git-init(1) or git-clone(1), then Git will first create the target repository and then copy all files found in a specific template directory into the target repository. The default templates, which are typically installed in "/usr/share/git-core/templates", contain example hooks, a default description and an exclude file which explains the syntax of excludes. None of these files are required at all, and they take up precious disk space without serving any immediate purpose. Furthermore, an upcoming change will give us the ability to distribute Git versions as two executables, only. As a consequence, in such a setup we wouldn't have any template directory available, which would lead Git to complain about it not having been found. Stop installing the template directory by injecting "init.templateDir=" into both git-init(1) and git-clone(1) to prepare for this change and stop wasting disk space. Changelog: changed
2021-11-03repository: Stop calling CreateRepository to create hooks symlinkPatrick Steinhardt
Back in the days, we used to set up Git hooks by creating a symlink to our directory containing the actual hook scripts. This has long be replaced by a much more robust mechanism which injects "core.hooksPath" into Git processes to point to that directory instead. Since then, we don't really need to setup hooks on-disk anymore, except if an admin wants to create per-project server-side hooks. Back when we used to create the symlink though we had to make sure that all repos which got newly initialized did in fact have the symlink. To do so, we used to call `CreateRepository()` after having set up repos, which created the symlink for us. Given that the call is idempotent, it was totally fine if the repository was already initialized: git-init(1) simply re-copies the templates into the repository and creates anything that's missing. Nowadays this call is superfluous though: we have just created the repo, we don't want to set up a symlink and thus it ultimately doesn't serve any purpose at all anymore. So let's get rid of this useless step and drop calls to `CreateRepository()` in `CreateRepositoryFromBundle()`, `CreateRepositoryFromURL()` and `CreateFork()` and start to assert that the hook directory is a directory, like it would be created by Git. Note that this also removes the transactional voting step as performed by CreateRepository. The voting in there is broken anyway given that it doesn't guarantee any atomicity anyway. This is being remodelled in another MR in the future which updates creation of repos to become atomic, and dropping the CreateRepository call now goes into the same direction.
2021-11-03catfile: Fix double-accounting of traces for uncached processesPatrick Steinhardt
In the catfile cache, we need to discern processes which are cached and those which aren't cached, e.g. because they are not created in a cacheable context. So while a catfile process always has two different contexts with the per-RPC context and the decorrelated context stored in the cache, these contexts may indeed be the same. But in fact, our tracing infrastructure in that package will always create tracing spans for both contexts, even if they're the same, which thus leads to double-accounting of each request. Fix the bug by only creating the secondary span in case it's different from the primary per-RPC context.
2021-11-03catfile: Refactor tracing to allow for batchingPatrick Steinhardt
When using the catfile interface, then each request to it will both create an opentracing span and increment a counter. In total, this results in quite a bunch of small short-lived allocations, where we have observed in production that across all RPCs, 12% of all of our allocations where done by opentracing. While useful, this cost is simply too big to warrant us keeping it. We're about to refactor the catfile interface to allow for more efficient I/O patterns by using batched requests and buffering requests to the process by using a request queue. Like this we optimize how we write to the process so we have less syscalls and thus less context switches. In this world, we can also refactor tracing: instead of creating one span per requested object, we can instead create a single span across the whole lifetime of this batched request. Naturally, we cannot log as much details as we do right now, where each span contains the revision we did request. Instead, we can tag the spans with how many objects of each type we have requested in total. While less detailed, this should be a good-enough tradeoff such that the spans don't loose all of their benefits. Refactor the catfile tracing architecture to prepare for this change and to allow for batching of traces by storing per-request-type counters in a new `trace` structure. The following benchmarks without and with this change show a small reduction in allocations of about 2-3%, which probably stems from the fact that we're not sending the revision anymore: BenchmarkListAllBlobs/ListAllBlobs-16 8 128331259 ns/op 155930660 B/op 28539 allocs/op BenchmarkListAllCommits/ListAllCommits-16 39 37256301 ns/op 9203049 B/op 33162 allocs/op BenchmarkFindAllTags/FindAllTags-16 6 172584078 ns/op 30899380 B/op 138956 allocs/op BenchmarkListAllBlobs/ListAllBlobs-16 8 127551661 ns/op 155772826 B/op 27936 allocs/op BenchmarkListAllCommits/ListAllCommits-16 38 38921473 ns/op 9223457 B/op 32429 allocs/op BenchmarkFindAllTags/FindAllTags-16 6 195437102 ns/op 30978034 B/op 135009 allocs/op So while this change is not really worth it on its own, it does prepare us to more readily refactor the catfile cache to batch requests while retaining useful tracing info. Changelog: performance
2021-11-03catfile: Fix correlation IDs for tracing spansPatrick Steinhardt
When creating a new tracing span in the catfile package, then we need to create those spans for two different contexts: the usual per-RPC context and the decorrelated context that's stored as part of the cache. In order to associate spans created in the decorrelated context with their originating RPC, we tag the span with the correlation ID of the per-RPC context. But because we pass arguments to `startSpan()` in the wrong order, we accidentally did it the other way round: the per-RPC context ends up with the correlation ID of the cached context. Fix this bug by switching the order in which we pass the contexts.
2021-11-01Merge branch 'pks-tests-fix-direct-git-invocations' into 'master'Pavlo Strokov
git: Prepare tests using direct Git invocations for bundled Git See merge request gitlab-org/gitaly!4028
2021-11-01ssh: Convert clones to avoid executing Git directlyPatrick Steinhardt
The SSH upload-pack tests execute Git directly. This will stop working as soon as we migrate to use a bundled version of Git given that we wouldn't be able to locate git-upload-pack(1) anymore. Convert the tests to instead use a `git.ExecCommandFactory()` to prepare the clone commands and execute them to be prepared for this change.
2021-11-01gittest: Convert HTTP server to not spawn Git directlyPatrick Steinhardt
The test HTTP server set up by the gittest package is using `exec.Cmd` to spawn a Git executable directly. This will stop working when we migrate to a bundled Git version because we'd be unable to locate the git-http-backend helper. The HTTP server requires the ability to set up stdout of the command such that it can serve as a CGI handler. Extend the `gittest.ExecOpts` helper to support setup of stdout and convert the HTTP server setup to use it.
2021-11-01gitaly-hooks: Do not execute Git binary directlyPatrick Steinhardt
One of our gitaly-hooks tests which exercises the pack-objects hook is using `os.Exec()` directly to spawn a Git command. This will stop working when we migrate to using a bundled Git version given that we wouldn't be able to locate Git's exec prefix anymore. While this testcase is a good candidate to use `gittest.ExecOpts()`, what is still missing is the ability to pass additional environment variables to the spawned command. Extend the config struct to allow for this and convert the test.
2021-11-01gittest: Convert `ExecStream()` to take a configurationPatrick Steinhardt
The `gittest.ExecStream()` helper takes an additional reader which gets used as the spawned command's standard input. In order to make more use of this helper though we'll need to add additional arguments, which would eventually become quite unwieldy to use. Convert the function to instead take the reader via a config struct such that it's more readily extensible.
2021-11-01repository: Do not execute Git binary directlyPatrick Steinhardt
The testcase for cloning from a redirecting server uses `exec.Cmd` to spawn the Git executable directly. While this works now, it will stop working as soon as we bundle Git as a binary given that we wouldn't be able to find the git-remote-http helper anymore. Prepare for this change by using a `git.ExecCommandFactory` instead to spawn the command.
2021-11-01server: Fix test setup to construct a proper configPatrick Steinhardt
Some of our tests in the server authentication tests are running a Gitaly server with an empty config, which is not a valid configuration. Fix this by using the testcfg package to build a valid config for us.
2021-11-01gitaly-hooks: Use temporary directory to store test logsPatrick Steinhardt
A test for the pack-objects hook is writing logs into Gitaly's source tree. This is something we don't want to do nowadays, where all tests should instead use the temporary test directory to create their data. Convert the test to store the log in a `testhelper.TempDir()` instead.