diff options
Diffstat (limited to 'doc/administration/gitaly/troubleshooting.md')
-rw-r--r-- | doc/administration/gitaly/troubleshooting.md | 58 |
1 files changed, 23 insertions, 35 deletions
diff --git a/doc/administration/gitaly/troubleshooting.md b/doc/administration/gitaly/troubleshooting.md index 5f05b6322b0..dbbd348556c 100644 --- a/doc/administration/gitaly/troubleshooting.md +++ b/doc/administration/gitaly/troubleshooting.md @@ -66,7 +66,7 @@ for details. ### Client side gRPC logs Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC -client has its own log file which may contain useful information when +client has its own log file which may contain helpful information when you are seeing Gitaly errors. You can control the log level of the gRPC client with the `GRPC_LOG_LEVEL` environment variable. The default level is `WARN`. @@ -85,7 +85,7 @@ check for an SSL or TLS problem: ``` Check whether `Verify return code` field indicates a -[known Omnibus GitLab configuration problem](https://docs.gitlab.com/omnibus/settings/ssl.html). +[known Omnibus GitLab configuration problem](https://docs.gitlab.com/omnibus/settings/ssl/index.html). If `openssl` succeeds but `gitlab-rake gitlab:gitaly:check` fails, check [certificate requirements](configure_gitaly.md#certificate-requirements) for Gitaly. @@ -123,27 +123,6 @@ sudo cat /proc/$PID/environ | tr '\0' '\n' | grep ^CORRELATION_ID= This method isn't reliable for `git cat-file` processes, because Gitaly internally pools and re-uses those across RPCs. -### Observing `gitaly-ruby` traffic - -[`gitaly-ruby`](configure_gitaly.md#gitaly-ruby) is an internal implementation detail of Gitaly, -so, there's not that much visibility into what goes on inside -`gitaly-ruby` processes. - -If you have Prometheus set up to scrape your Gitaly process, you can see -request rates and error codes for individual RPCs in `gitaly-ruby` by -querying `grpc_client_handled_total`. - -All gRPC calls made by `gitaly-ruby` itself are internal calls from the main Gitaly process to one of its `gitaly-ruby` -sidecars. - -Assuming your `grpc_client_handled_total` counter only observes Gitaly, -the following query shows you RPCs are (most likely) internally -implemented as calls to `gitaly-ruby`: - -```prometheus -sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0 -``` - ### Repository changes fail with a `401 Unauthorized` error If you run Gitaly on its own server and notice these conditions: @@ -345,7 +324,7 @@ You might see the following in Gitaly and Praefect logs: } ``` -This is a gRPC call +This information in the logs is a gRPC call [error response code](https://grpc.github.io/grpc/core/md_doc_statuscodes.html). If this error occurs, even though @@ -358,7 +337,7 @@ server to keep them synchronized. ### Gitaly not listening on new address after reconfiguring -When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may +When updating the `gitaly['configuration'][:listen_addr]` or `gitaly['configuration'][:prometheus_listen_addr]` values, Gitaly may continue to listen on the old address after a `sudo gitlab-ctl reconfigure`. When this occurs, run `sudo gitlab-ctl restart` to resolve the issue. This should no longer be @@ -401,6 +380,12 @@ push, which causes a significant delay. If Git pushes are too slow when Dynatrace is enabled, disable Dynatrace. +### `gitaly check` fails with `401` status code + +`gitaly check` can fail with `401` status code if Gitaly can't access the internal GitLab API. + +One way to resolve this is to make sure the entry is correct for the GitLab internal API URL configured in `gitlab.rb` with `gitlab_rails['internal_api_url']`. + ## Gitaly fails to fork processes stored on `noexec` file systems Because of changes [introduced](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/5999) in GitLab 14.10, applying the `noexec` option to a mount @@ -492,7 +477,7 @@ in sync so the token check succeeds. This check helps identify the root cause of `permission denied` [errors being logged by Praefect](#permission-denied-errors-appearing-in-gitaly-or-praefect-logs-when-accessing-repositories). -For offline environments where access to public [`pool.ntp.org`](https://pool.ntp.org) servers is not possible, the Praefect `check` sub-command fails this +For offline environments where access to public `pool.ntp.org` servers is not possible, the Praefect `check` sub-command fails this check with an error message similar to: ```plaintext @@ -514,9 +499,9 @@ Here are common errors and potential causes: - 500 response code - `ActionView::Template::Error (7:permission denied)` - - `praefect['auth_token']` and `gitlab_rails['gitaly_token']` do not match on the GitLab server. + - `praefect['configuration'][:auth][:token]` and `gitlab_rails['gitaly_token']` do not match on the GitLab server. - `Unable to save project. Error: 7:permission denied` - - Secret token in `praefect['storage_nodes']` on GitLab server does not match the + - Secret token in `praefect['configuration'][:virtual_storage]` on GitLab server does not match the value in `gitaly['auth_token']` on one or more Gitaly servers. - 503 response code - `GRPC::Unavailable (14:failed to connect to all addresses)` @@ -530,7 +515,7 @@ Here are common errors and potential causes: Some common reasons for the Praefect database to experience elevated CPU usage include: - Prometheus metrics scrapes [running an expensive query](https://gitlab.com/gitlab-org/gitaly/-/issues/3796). If you have GitLab 14.2 - or above, set `praefect['separate_database_metrics'] = true` in `gitlab.rb`. + or above, set `praefect['configuration'][:prometheus_exclude_database_from_default_metrics] = true` in `gitlab.rb`. - [Read distribution caching](praefect.md#reads-distribution-caching) is disabled, increasing the number of queries made to the database when user traffic is high. Ensure read distribution caching is enabled. @@ -544,9 +529,8 @@ To determine the primary node of a repository: - With legacy election strategies in GitLab 13.12 and earlier, the primary was the same for all repositories in a virtual storage. To determine the current primary Gitaly node for a specific virtual storage: - - Use the `Shard Primary Election` [Grafana chart](praefect.md#grafana) on the + - (Recommended) Use the `Shard Primary Election` [Grafana chart](praefect.md#grafana) on the [`Gitlab Omnibus - Praefect` dashboard](https://gitlab.com/gitlab-org/grafana-dashboards/-/blob/master/omnibus/praefect.json). - This is recommended. - If you do not have Grafana set up, use the following command on each host of each Praefect node: @@ -650,7 +634,7 @@ If the supplied value for `-virtual-storage` is incorrect, the command returns t get metadata: rpc error: code = NotFound desc = repository not found ``` -The documented examples specify `-virtual-storage default`. Check the Praefect server setting `praefect['virtual_storages']` in `/etc/gitlab/gitlab.rb`. +The documented examples specify `-virtual-storage default`. Check the Praefect server setting `praefect['configuration'][:virtual_storage]` in `/etc/gitlab/gitlab.rb`. ### Check that repositories are in sync @@ -669,7 +653,7 @@ However, the Praefect database tables are not created on initial reconfigure and errors that relations do not exist if either: - The `gitlab-ctl reconfigure` command isn't executed. -- There are errors during the execution. +- Errors occur during the execution. For example: @@ -693,7 +677,7 @@ praefect sql-migrate: OK (applied 21 migrations) This indicates that the virtual storage name used in the [Praefect configuration](praefect.md#praefect) does not match the storage name used in -[`git_data_dirs` setting](praefect.md#gitaly) for GitLab. +[`gitaly['configuration'][:storage][<index>][:name]` setting](praefect.md#gitaly) for GitLab. Resolve this by matching the virtual storage names used in Praefect and GitLab configuration. @@ -715,9 +699,13 @@ Possible solutions: - Provision larger VMs to gain access to larger network traffic allowances. - Use your cloud service's monitoring and logging to check that the Praefect nodes are not exhausting their traffic allowances. +### `gitlab-ctl reconfigure` fails with error: `STDOUT: praefect: configuration error: error reading config file: toml: cannot store TOML string into a Go int` + +This error occurs when `praefect['database_port']` or `praefect['database_direct_port']` are configured as a string instead of an integer. + ## Profiling Gitaly -Gitaly exposes several of the Golang built-in performance profiling tools on the Prometheus listen port. For example, if Prometheus is listening +Gitaly exposes several of the Go built-in performance profiling tools on the Prometheus listen port. For example, if Prometheus is listening on port `9236` of the GitLab server: - Get a list of running `goroutines` and their backtraces: |