gitlab.com/gitlab-org/gitaly.git
author Patrick Steinhardt <psteinhardt@gitlab.com> 2022-05-09 16:07:04 +0300
committer Patrick Steinhardt <psteinhardt@gitlab.com> 2022-05-09 16:19:14 +0300
commit ca71bbd189a528700f4583ee621b36bbb2b89ec7
tree 810bcbf2a4e49506d84c8156f6f3f4f7c4bfdf7e
parent 847581896f9710369a385584b62ca85e6389aa1d
doc: Modernize our Protobuf documentation (pks-proto-style-update)
Modernize our Protobuf documentation to clearly document our current architecture and changed style guidelines.
-rw-r--r-- doc/protobuf.md 333
1 files changed, 197 insertions, 136 deletions
diff --git a/doc/protobuf.md b/doc/protobuf.md
index 87bcb1121..f7e188323 100644
--- a/doc/protobuf.md
+++ b/doc/protobuf.md
@@ -56,69 +56,122 @@ gRPC provides an implementation framework based on these Protobuf concepts.
single connection to a gRPC server. In principle, a multi-threaded
gRPC client needs only one connection to a gRPC server.
-## Design decisions
-
-1. In Gitaly's case there is one server application
- https://gitlab.com/gitlab-org/gitaly which implements all services
- in the protocol.
-1. In default GitLab installations each Gitaly client interacts with
- exactly 1 Gitaly server, on the same host, via a Unix domain socket.
- In a larger installation each Gitaly client will interact with many
- different Gitaly servers (one per GitLab storage shard) via TCP
- connections.
-1. Gitaly uses
- [grpc.Errorf](https://godoc.org/google.golang.org/grpc#Errorf) to
- return meaningful
- [errors](https://godoc.org/google.golang.org/grpc/codes#Code) to its
- clients.
-1. Each RPC `FooBar` has its own `FooBarRequest` and `FooBarResponse`
- message types. Try to keep the structure of these messages as flat as
- possible. Only add abstractions when they have a practical benefit.
-1. We never make backwards incompatible changes to an RPC that is
- already implemented on either the client side or server side.
- Instead we just create a new RPC call and start a deprecation
- procedure (see below) for the old one.
-1. It is encouraged to put comments (starting with `//`) in .proto files.
- Please put comments on their own lines. This will cause them to be
- treated as documentation by the protoc compiler.
-1. When choosing an RPC name don't use the service name as context.
- Good: `service CommitService { rpc CommitExists }`. Bad:
- `service CommitService { rpc Exists }`.
+## Gitaly RPC Server Architecture
+
+Gitaly consists of two different server applications which implement services:
+
+- Gitaly hosts all the logic required to access and modify Git repositories.
+ This is the place where actual repositories reside and where the Git commands
+ get executed.
+
+- Praefect is a transparent proxy that routes requests to one or more Gitaly
+ nodes. This server allows for load-balancing and high availability by keeping
+ multiple Gitaly nodes up-to-date with the same data.
+
+Gitaly clients either interact with Praefect or with a single Gitaly server. For
+the most part, the client does not need to know which of the two types of server
+it is currently interacting with: Praefect transparently proxies requests to
+Gitaly servers so that it behaves the same as a standalone Gitaly server.
+
+Servers can be sharded for larger installations so that only a subset of data is
+stored on each of the Gitaly servers.
+
+## Design
+
+### RPC definitions
+
+Each RPC `FooBar` has its own `FooBarRequest` and `FooBarResponse` message
+types. Try to keep the structure of these messages as flat as possible. Only add
+abstractions when they have a practical benefit.
+
+We never make backwards incompatible changes to an RPC that is already
+implemented on either the client side or server side. Instead we just create a
+new RPC call and start a deprecation procedure (see below) for the old one.
+
+### Comments
+
+Services, RPCs, messages and their fields declared in `.proto` files must have
+comments. This documentation must be sufficient to let potential callers figure
+out why this RPC exists and what the behaviour of an RPC is without looking up
+its implementation. Special error cases should be documented.
+
+### Errors
+
+Gitaly uses [error codes](https://pkg.go.dev/google.golang.org/grpc/codes) to
+indicate basic error classes. In case error codes are not sufficient for clients
+to make specific error cases actionable, Gitaly uses the [rich error
+model](https://www.grpc.io/docs/guides/error/#richer-error-model) provided by
+gRPC. With this error model, Gitaly can embed Protobuf messages into returned
+errors and thus provide exact information about error conditions to the client.
+In case the RPC needs to use the rich error model, it should have its own
+`FooBarError` message type.
+
+RPCs must return an error if the action failed. It is disallowed to return
+specific error cases via the RPC's normal response. This is required so that
+Praefect can correctly handle any such errors.
+
+### RPC concepts
+
+RPCs should not be focussed on a single usecase only, but instead they should be
+implemented with the underlying Git concept in mind. If they directly map to the
+way Git handles specific data instead of directly mapping to the usecase at
+hand, then chances are high that the RPC will be reusable for other, yet-unknown
+usecases.
+
+Common concepts that can be considered:
+
+- Accept revisions as documented in gitrevisions(5) instead of object IDs or
+ references. If possible, accept an array of revisions instead of a single
+ revision only so that callers can easily specify revision ranges without
+ requiring a separate RPC. Furthermore, accept pseudo-revisions like `--not`
+ and `--all`.
+- Accept fully-qualified references instead of branch names. This avoids issues
+ with ambiguity and makes it possible to use RPCs for references which are not
+ branches.
### RPC naming conventions
-Gitaly-Proto has RPCs that are resource based, for example when querying for a
-commit. Another class of RPCs are operations, where the result might be empty
-or one of the RPC error codes but the fact that the operation took place is
-of importance.
+Gitaly has RPCs that are resource based, for example when querying for a commit.
+Another class of RPCs are operations, where the result might be empty or one of
+the RPC error codes but the fact that the operation took place is of importance.
For all RPCs, start the name with a verb, followed by an entity, and if required
followed by a further specification. For example:
-- GetCommit
+
+- ListCommits
- RepackRepositoryIncremental
- CreateRepositoryFromBundle
-For resource RPCs the verbs in use are limited to: Get, List, Create, Update,
-Delete, or Is. Where both Get and List as verbs denote these operations have no side
-effects. These verbs differ in terms of the expected number of results the query
-yields. Get queries are limited to one result, and are expected to return one
-result to the client. List queries have zero or more results, and generally will
-create a gRPC stream for their results. When the `Is` verb is used, this RPC
-is expected to return a boolean, or an error. For example: `IsRepositoryEmpty`.
+For resource RPCs the verbs in use are limited to:
+
+- Get
+- List
+- Is
+- Create
+- Update
+- Delete
+
+The verbs Get and List denote that the operation has no side effects. They
+differ in the number of results the query is expected to yield. Get queries
+return exactly one result to the client. List queries have zero or more
+results, and generally will create a gRPC stream for their results.
+When the `Is` verb is used, the RPC is expected to return a boolean, or an
+error. For example: `IsRepositoryEmpty`.
-When an operation based RPC is defined, the verb should map to the first verb in
-the Git command it represents. Example; FetchRemote.
+When an operation-based RPC is defined, the verb should map to the first verb in
+the Git command it represents, e.g. `FetchRemote`.
-Note that the current interface defined in this repository does not yet abide
-fully to these conventions. Newly defined RPCs should, though, so eventually
-gitaly-proto converges to a common standard.
+Note that large parts of the current Gitaly RPC interface do not fully abide by
+these conventions. Newly defined RPCs should, though, so that the interface
+eventually converges to a common standard.
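
As a rough illustration of the naming rules above, the following Go sketch checks whether an RPC name starts with one of the resource verbs. The helper `resourceVerb` is hypothetical and not part of Gitaly; names that do not match (such as `FetchRemote`) would fall into the operation-based class.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceVerbs lists the verbs allowed for resource RPCs, as given above.
var resourceVerbs = []string{"Get", "List", "Is", "Create", "Update", "Delete"}

// resourceVerb returns the resource verb an RPC name starts with, if any.
// The verb must be followed by an upper-case letter that begins the entity,
// so that a name like "Listen" is not mistaken for the verb "List".
func resourceVerb(rpcName string) (string, bool) {
	for _, verb := range resourceVerbs {
		if !strings.HasPrefix(rpcName, verb) {
			continue
		}
		rest := rpcName[len(verb):]
		if rest != "" && rest[0] >= 'A' && rest[0] <= 'Z' {
			return verb, true
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{"ListCommits", "IsRepositoryEmpty", "FetchRemote"} {
		verb, ok := resourceVerb(name)
		fmt.Printf("%s: verb=%q resource=%t\n", name, verb, ok)
	}
}
```

A check like this only covers the verb prefix; whether the entity and further specification are sensible still needs human review.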
### Common field names and types
-As a general principle, remember that Git does not enforce encodings on
-most data inside repositories, so we can rarely assume data to be a
-Protobuf "string" (which implies UTF-8).
+As a general principle, remember that Git does not enforce encodings on most
+data inside repositories, so we can rarely assume data to be a Protobuf "string"
+(which implies UTF-8).
1. `bytes revision`: for fields that accept any of branch names / tag
names / commit ID's. Uses `bytes` to be encoding agnostic.
@@ -132,7 +185,11 @@ Protobuf "string" (which implies UTF-8).
### Stream patterns
-These are some patterns we already use, or want to use going forward.
+Protobuf supports streaming RPCs which allow for multiple request or response
+messages to be sent in a single RPC call. We use these whenever it is expected
+that an RPC may be invoked with lots of input parameters or when it may generate
+a lot of data. This is required by limitations in the gRPC framework where
+messages should not typically be larger than 1MB.
#### Stream response of many small items
@@ -147,10 +204,10 @@ message FooBarResponse {
}
```
-A typical example of an "Item" would be a commit. To avoid the penalty
-of network IO for each Item we return, we batch them together. You can
-think of this as a kind of buffered IO at the level of the Item
-messages. In Go, to ease the bookkeeping you can use
+A typical example of an "Item" would be a commit. To avoid the penalty of
+network IO for each Item we return, we batch them together. You can think of
+this as a kind of buffered IO at the level of the Item messages. In Go, to ease
+the bookkeeping you can use
[gitlab.com/gitlab-org/gitaly/internal/helper/chunker](https://godoc.org/gitlab.com/gitlab-org/gitaly/internal/helper/chunker).
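
The batching idea can be sketched in plain Go as follows. This is a minimal illustration and not the API of the chunker package linked above; the function `chunkItems` and the byte limit are assumptions made for the example.

```go
package main

import "fmt"

// chunkItems groups items into batches whose combined size stays within
// limit bytes. Each batch would be sent as one response message. An item
// that is larger than limit on its own still gets a batch to itself.
func chunkItems(items [][]byte, limit int) [][][]byte {
	var batches [][][]byte
	var current [][]byte
	size := 0
	for _, item := range items {
		// Flush the current batch if adding this item would exceed the limit.
		if len(current) > 0 && size+len(item) > limit {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, item)
		size += len(item)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	items := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cc"), []byte("dddddd")}
	for i, batch := range chunkItems(items, 8) {
		fmt.Printf("message %d carries %d item(s)\n", i, len(batch))
	}
}
```

In a real RPC the limit would be chosen well below the message size that gRPC handles comfortably, and each batch would populate the repeated field of one response message.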
#### Single large item split over multiple messages
@@ -170,13 +227,12 @@ message FooBarResponse {
}
```
-A typical example of a large item would be the contents of a Git blob.
-The header might contain the blob OID and the blob size. Only the first
-message in the response stream has `header` set, all others have `data`
-but no `header`.
+A typical example of a large item would be the contents of a Git blob. The
+header might contain the blob OID and the blob size. Only the first message in
+the response stream has `header` set, all others have `data` but no `header`.
-In the particular case where you're sending back raw binary data from
-Go, you can use
+In the particular case where you're sending back raw binary data from Go, you
+can use
[gitlab.com/gitlab-org/gitaly/streamio](https://godoc.org/gitlab.com/gitlab-org/gitaly/streamio)
to turn your gRPC response stream into an `io.Writer`.
@@ -203,78 +259,104 @@ message FooBarResponse {
}
```
-This looks the same as the "single large item" case above, except
-whenever a new large item begins, we send a new message with a non-empty
-`header` field.
+This looks the same as the "single large item" case above, except whenever a new
+large item begins, we send a new message with a non-empty `header` field.
#### Footers
-If the RPC requires it we can also send a footer using `oneof`. But by
-default, we prefer headers.
+If the RPC requires it we can also send a footer using `oneof`. But by default,
+we prefer headers.
### RPC Annotations
-In preparation for Gitaly Cluster, we are now requiring all RPC's to be annotated
-with an appropriate designation. All methods must contain one of the following lines:
+Gitaly Cluster needs to know about the nature of RPCs in order to decide how a
+specific request needs to be routed:
+
+- Accessors may be routed to any one Gitaly node which has an up-to-date
+ repository to allow for load-balancing reads. These RPCs must not have any
+ side effects.
+- Mutators will be routed to all Gitaly nodes which have an up-to-date
+ repository so that changes are performed on all nodes at once. Each node is
+ expected to cast transactional votes so that the actual data that is written
+ to disk is verified to be the same for all of them.
+- Maintenance RPCs are not deemed mission critical. They are routed on a
+ best-effort basis to all online nodes which have a specific repository.
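
The three routing rules above can be summarised in a small Go sketch. It is only an illustration of the described behaviour, not Praefect's implementation: the `node` type, its fields and the `route` function are hypothetical, and the sketch assumes that only online nodes are eligible for any request.

```go
package main

import "fmt"

type opType int

const (
	accessor opType = iota
	mutator
	maintenance
)

// node is a hypothetical view of one Gitaly node for a given repository.
type node struct {
	Name          string
	Online        bool
	HasRepository bool
	UpToDate      bool
}

// route returns the nodes a request of the given type would be sent to.
func route(op opType, nodes []node) []node {
	var targets []node
	for _, n := range nodes {
		switch op {
		case accessor:
			// Any single up-to-date node will do. A real router would
			// balance load instead of taking the first match.
			if n.Online && n.UpToDate {
				return []node{n}
			}
		case mutator:
			// All up-to-date nodes receive the change at once.
			if n.Online && n.UpToDate {
				targets = append(targets, n)
			}
		case maintenance:
			// Best effort: every online node that has the repository.
			if n.Online && n.HasRepository {
				targets = append(targets, n)
			}
		}
	}
	return targets
}

func main() {
	nodes := []node{
		{Name: "gitaly-1", Online: true, HasRepository: true, UpToDate: true},
		{Name: "gitaly-2", Online: true, HasRepository: true, UpToDate: false},
		{Name: "gitaly-3", Online: true, HasRepository: true, UpToDate: true},
	}
	fmt.Println(len(route(accessor, nodes)), len(route(mutator, nodes)), len(route(maintenance, nodes)))
}
```
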
+
+To classify RPCs, each declaration must contain one of the following lines:
- `option (op_type).op = ACCESSOR;`
- - Designates an RPC as being read-only (i.e. side effect free)
- `option (op_type).op = MUTATOR;`
- - Designates that an RPC modifies the repository
-
-Failing to designate an RPC correctly will result in a CI error. For example:
+- `option (op_type).op = MAINTENANCE;`
-`--gitaly_out: server.proto: Method ServerInfo missing op_type option`
+We use a custom `protoc` plugin to verify that all RPCs do in fact have such a
+declaration. This plugin can be executed via `make lint-proto`.
-Additionally, all mutator RPC's require additional annotations to clearly
+Additionally, all mutator RPCs require additional annotations to clearly
indicate what is being modified:
-- When an RPC modifies a server-wide resource, the scope should specify `SERVER`.
-- When an RPC modifies a storage-wide resource, the scope should specify `STORAGE`.
- - Additionally, every request should contain field marked with `storage` annotation.
-- When an RPC modifies a specific repository, the scope should specify `REPOSITORY`.
- - Additionally, every RPC with `REPOSITORY` scope, should also specify the target repository
- and may specify the additional repository.
+- Server-scoped RPCs modify server-wide resources.
+- Storage-scoped RPCs modify data in a specific storage.
+- Repository-scoped RPCs modify data in a specific repository.
-The target repository represents the location or address of the repository
-being modified by the operation. This is needed by Praefect (Gitaly Cluster) in
-order to properly schedule replications to keep repository replicas up to date.
+To declare the scope, mutators must contain one of the following lines:
-The target repository annotation marks where the target repository can be
-found in the message. The annotation is added near `gitaly.Repository` field
-(e.g. `Repository repository = 1 [(target_repository)=true];`). If annotated field isn't
-`gitaly.Repository` type then it has to contain field annotated `[(repository)=true]` with
-correct type. Having separate `repository` annotation allows to have same field in child
-message annotated as both `target_repository` and `additional_repository` depending on parent
-message.
+- `option(op_type).scope = SERVER;`
+- `option(op_type).scope = STORAGE;`: The associated request must have a field
+ tagged with `[(storage)=true]` that indicates the storage's name.
+- `option(op_type).scope = REPOSITORY;`: This is the default scope and thus
+ doesn't need to be explicitly declared. The associated request must have a
+ field tagged with `[(target_repository)=true]` that indicates the repository's
+ location.
-The additional repository is annotated similarly to target repository but annotation
-is named `additional_repository`
+The target repository represents the location or address of the repository being
+modified by the operation. This is needed by Praefect (Gitaly Cluster) in order
+to properly schedule replications to keep repository replicas up to date.
+
+The target repository annotation marks where the target repository can be found
+in the message. The annotation is added to a `gitaly.Repository` field (e.g.
+`Repository repository = 1 [(target_repository)=true];`). If the annotated field
+is not of type `gitaly.Repository`, it has to contain a field of that type
+annotated with `[(repository)=true]`. Having a separate `repository` annotation
+allows the same field in a child message to be treated as either
+`target_repository` or `additional_repository` depending on the parent message.
+
+The additional repository is annotated in the same way as the target repository,
+but the annotation is named `additional_repository`.
See our examples of [valid](go/internal/cmd/protoc-gen-gitaly-lint/testdata/valid.proto) and
[invalid](go/internal/cmd/protoc-gen-gitaly-lint/invalid.proto) proto annotations.
+### Transactions and Atomicity
+
+With Gitaly Cluster, mutating RPCs will get routed to multiple Gitaly nodes at
+once. Each node must then vote on the changes it intends to perform, and only if
+quorum was reached on the change should it be persisted to disk. For this to
+work correctly, all mutating RPCs need to follow a set of rules:
+
+- Every mutator needs to vote at least twice on the data it is about to write: a
+ first preparatory vote must happen before data is visible to the user so that
+ data can be discarded in case nodes disagree without any impact on the
+ repository itself. And a second committing vote must happen to let Praefect
+ know that changes have indeed been committed to disk.
+- In the general case, the vote should be computed from all data that is to be
+ written.
+- Changes should be atomic: either all changes are persisted to disk or none
+ are.
+- The number of transactional votes should be kept at a minimum and should not
+ scale with the number of changes performed. Every vote incurs costs, which may
+ become prohibitively expensive in case a vote is executed per change.
+- Mutators must return an error in case anything unexpected happens. This error
+ needs to be deterministic so that Praefect can assert that a failing RPC call
+ has failed in the same way across nodes.
+
### Go Package
-If adding new protobuf files, make sure to correctly set the `go_package` option
-near the top of the file:
+All Protobuf files hosted in the Gitaly project must have their Go package
+declared. This is done via the `go_package` option:
`option go_package = "gitlab.com/gitlab-org/gitaly/v14/proto/go/gitalypb";`
-This allows other protobuf files to locate and import the Go generated stubs. If
-you forget to add a `go_package` option, you may receive an error similar to:
-
-`blob.proto is missing the go_package option`
-
-### Documentation
-
-New or updated RPCs and message types should be accompanied by comment strings.
-Good comment strings will explain why the RPC exists and how it behaves. Good
-message type comments will explain what the message is communicating. Each updated
-message field should have a comment.
-
-Refer to official protobuf documentation for
-[how to add comments](https://developers.google.com/protocol-buffers/docs/proto#adding_comments).
+This allows other protobuf files to locate and import the Go generated stubs.
## Contributing
@@ -288,43 +370,22 @@ the output of gzip). **Use the same Go version as .gitlab-ci.yml (Go
[DCO + License](CONTRIBUTING.md)
-### Build process
+## Workflows
+
+### Generating Protobuf sources
-After you change or add a .proto file you need to re-generate the Go
-and Ruby libraries before committing your change.
+After you change or add a .proto file you need to re-generate the Go and Ruby
+libraries before committing your change.
```shell
# Re-generate Go and Ruby libraries
make proto
```
-## How to deprecate an RPC call
+### Deprecating an RPC call
-See [DEPRECATION.md](DEPRECATION.md).
+See [PROCESS.md](PROCESS.md#rpc-deprecation-process).
-## Release
+## Releasing Protobuf definitions
-This will tag and release the gitaly-proto library, including
-pushing the gem to rubygems.org
-
-```shell
-make release version=X.Y.Z
-```
-
-## How to manually push the gem
-
-If the release script fails the gem may not be pushed. This is how you can do that after the fact:
-
-```shell
-# Use a sub-shell to limit scope of 'set -e'
-(
- set -e
-
- # Replace X.Y.Z with the version you are pushing
- GEM_VERSION=X.Y.Z
-
- git checkout v$GEM_VERSION
- gem build gitaly.gemspec
- gem push gitaly-$GEM_VERSION.gem
-)
-```
+See [PROCESS.md](PROCESS.md#publishing-the-ruby-gem).