
gitlab.com/gitlab-org/gitaly.git
path: root/doc/rfcs
author    Paul Okstad <pokstad@gitlab.com>  2020-06-18 20:15:09 +0300
committer Paul Okstad <pokstad@gitlab.com>  2020-06-18 20:15:09 +0300
commit    56daa260cf0ff3ecedbf207524111b58e53fa124 (patch)
tree      721a443d9339f7a7676f8559140b451e4531c86c /doc/rfcs
parent    7c4e11c528db4889e34af7ad50c02bf86578178d (diff)
Gitaly RFC guide
Introduces a process for introducing RFCs. Closes https://gitlab.com/gitlab-org/gitaly/-/issues/2835
Diffstat (limited to 'doc/rfcs')
-rw-r--r--  doc/rfcs/README.md                  42
-rw-r--r--  doc/rfcs/praefect-queue-storage.md  181
-rw-r--r--  doc/rfcs/snapshot-storage.md        97
-rw-r--r--  doc/rfcs/template.md                8
4 files changed, 328 insertions, 0 deletions
diff --git a/doc/rfcs/README.md b/doc/rfcs/README.md
new file mode 100644
index 000000000..a7d2a0aa0
--- /dev/null
+++ b/doc/rfcs/README.md
@@ -0,0 +1,42 @@
+# Gitaly RFCs
+
+This directory contains all accepted Gitaly RFCs.
+
+This document describes the rationale and process for Request For Comments (RFCs) in the Gitaly project.
+
+## What is an RFC?
+
+A Gitaly RFC is a document that communicates something related to Gitaly. Once approved, the RFC acts as a guideline for Gitaly contributors. In this way, it is like a Gitaly-specific version of the [GitLab handbook](https://about.gitlab.com/handbook/).
+
+## Why RFCs?
+
+The motivation behind an RFC process is to address shortfalls in the typical GitLab workflow. GitLab embraces small iterations, boring solutions, and breadth of features. Gitaly is known to have complex and deep design goals (e.g. Praefect). These goals may traverse many merge requests, issues, epics and contributors.
+
+In order to preserve architectural integrity, we sometimes require thought frameworks to keep everything and everyone aligned while contributing. The inverse is also true. We sometimes end up with knowledge spread out across GitLab that we wish to collate and digest into a more accessible form.
+
+## When to use an RFC?
+
+- Use an RFC when the idea you are communicating is difficult to capture with traditional methods.
+- Use an RFC to vet a complex concept with your peers and reach consensus on an approach.
+- Use an RFC to retroactively capture the thinking behind design decisions.
+
+Note: this is not an exhaustive list. Be creative :)
+
+## When NOT to use an RFC?
+
+- Don't use an RFC when a different approach works better (e.g. an issue or MR).
+- An RFC is a heavier process that consumes more time. Use it only when long-term thinking and extensive peer review are desired.
+
+## Process
+
+### Creating a new RFC
+
+1. Start by duplicating the [RFC template](template.md) and replacing all required placeholders.
+1. Name the file using an abbreviated form of the RFC title.
+1. When ready for peer review, create a merge request using the `RFC` merge request template.
+1. Follow the template steps.
+
+### Updating an existing RFC
+
+When updating an RFC, use discretion when deciding between using the normal merge request process, or using the more process heavy RFC process. The RFC process should be used for more substantial revisions of an RFC.
+
diff --git a/doc/rfcs/praefect-queue-storage.md b/doc/rfcs/praefect-queue-storage.md
new file mode 100644
index 000000000..992f18ca8
--- /dev/null
+++ b/doc/rfcs/praefect-queue-storage.md
@@ -0,0 +1,181 @@
+# Storage for Praefect's replication queue
+
+## Rationale
+
+Praefect is the traffic router and replication manager for Gitaly Cluster.
+Praefect is currently (November 2019) under development and far from
+being a minimum viable HA solution. We are at a point where we think we
+need to add a database to Praefect's architecture.
+
+The router part of Praefect detects Gitaly calls that modify
+repositories, and submits jobs to a job queue indicating that the
+repository that got modified needs to have its replicas updated. The
+replication manager part consumes the job queue. Currently, this queue
+is implemented in-memory in the Praefect process.
+
+While useful for prototyping, this is unsuitable for real HA Gitaly for
+two reasons:
+
+1. The job queue must be **persistent**. Currently, the queue is
+ emptied if a Praefect process restarts. This can lead to data loss
+ in case we fail over away from a repository that is ahead of its
+ replicas.
+2. The job queue must be **shared**. We expect multiple Praefect
+ processes to be serving up the same Gitaly storage cluster. This is
+ so that Praefect itself is not a single point of failure. These
+ Praefect processes must all see and use the same job queue.
+
+## Does it have to be a queue?
+
+We don't strictly need a queue. We need a shared, persistent database
+that allows the router to mark a repository as being in need of
+replication, and that allows the replication manager to query for
+repositories that need to be replicated -- and to clear them as "needing
+replication" afterwards. A queue is just a way of modeling this
+communication pattern.
+
+## Does the queue database need to have special properties?
+
+Different types of databases make different trade-offs in their semantics
+and reliability. For our purposes, the most important thing is that
+**messages get delivered at least once**. Delivering more than once is
+wasteful but otherwise harmless: this is because we are doing idempotent
+Git fetches.
+
+If a message gets lost, that can lead to data loss.
+
+## What sort of throughput do we expect?
+
+Currently (November 2019), gitlab.com has about 5000 Gitaly calls per
+second. About 300 of those [are labeled as
+"mutators"](https://prometheus.gprd.gitlab.net/graph?g0.range_input=7d&g0.expr=sum(rate(gitaly_cacheinvalidator_optype_total%5B5m%5D))%20by%20(type)&g0.tab=0),
+which suggests that today we'd see about 300 replication jobs per
+second. Each job may need multiple writes as it progresses through
+different states; say 5 state changes. That makes 1500 writes per
+second.
+
+Note that we have room to maneuver with sharding. Contrary to the SQL
+database of GitLab itself, which is more or less monolithic across all
+projects, there is no functional requirement to co-locate any two
+repositories on the same Gitaly server, nor on the same Praefect
+cluster. So if you have 1 million repos, you could make 1 million
+Praefect clusters, with 1 million queue database instances (one behind
+each Praefect cluster). Each queue database would then see a very, very
+low job insertion rate.
+
+This scenario is impractical from an operational standpoint, but
+functionally, it would be OK. In other words, we have horizontal leeway
+to avoid vertically scaling the queue database. There will of course be
+practical limits on how many instances of the queue database we can
+run, especially because the queue database must be highly available.
+
+## The queue database must be highly available
+
+If the queue database is unavailable, Praefect should be forced into a
+read-only mode. This is undesirable, so I think we can say we want the
+queue database to be highly available itself.
+
+## Running the queue database should be operationally feasible
+
+As always at GitLab, we want to choose solutions that are suitable for
+self-managed GitLab installations.
+
+- Should be open source
+- Don't pick an open core solution, and rely on features that are not
+ in the core
+- Don't assume that "the cloud" makes problems go away; assume there
+ is no cloud
+- Running the queue database should require as little expertise as
+ possible, or it should be a commodity component
+
+## Do we have other database needs in Praefect?
+
+This takes us into YAGNI territory but it's worth considering.
+
+Praefect serves as a front end for a cluster of Gitaly servers (the
+"internal Gitaly nodes") that store the actual repository data. We will
+need some form of consensus over which internal Gitaly nodes are good
+(available) or bad (offline). This is not a YAGNI; we will need this.
+Like the queue this would be shared state. The most natural fit for
+this, within GitLab's current architecture, would be Consul. But Consul
+is not a good fit for storing the queue.
+
+We might want Praefect to have a catalogue of all repositories it is
+storing. With Gitaly, there is no such catalogue; the filesystem is the
+single source of truth. This strikes me as a YAGNI though. Even with
+Praefect, there will be filesystems "in the back" on the internal Gitaly
+nodes, and those could serve as the source of truth.
+
+## What are our options
+
+### Redis
+
+Pro:
+
+- Already used in GitLab
+- Has queue primitives
+
+Con:
+
+- Deployed with snapshot persistence (RDB dump) in GitLab, which is
+ not the durability I think we want
+
+### Postgres
+
+Pro:
+
+- Already used in GitLab
+- Gold standard for persistence
+- General purpose database: likely to be able to grow with us as we
+ develop other needs
+
+Con:
+
+- Can be used for queues, but not meant for it
+- Need to find queueing library, or develop SQL-backed queue ourselves
+ (hard, subtle)
+- Because not meant to be a queue, may have a lower ceiling where we
+  are forced to scale horizontally. When we hit the ceiling we would
+  have to run multiple Praefect clusters, each with their own HA
+  Postgres cluster behind it.
+
+### Kafka
+
+Pro:
+
+- Closely matches description of "durable queue"
+
+Con:
+
+- Would be new to GitLab: no development experience nor operational
+ experience
+
+### SQLite or BoltDB
+
+Embedded databases such as SQLite or BoltDB don't meet our requirements
+because we need shared access. Being embedded implies you don't have to
+go over a network, while going over a network is an essential feature
+for us: this enables us to have multiple machines running Praefect.
+
+### Consul
+
+Consul is something that GitLab already relies on. You could consider it
+a database, although it is not presented as such by its authors. The
+advertised use cases are service discovery and providing a service mesh.
+
+Consul does contain a key-value store you can use to store values
+smaller than 512KB in. But the [documentation
+states](https://www.consul.io/docs/install/performance.html#memory-requirements):
+
+> NOTE: Consul is not designed to serve as a general purpose database,
+> and you should keep this in mind when choosing what data are populated
+> to the key/value store.
+
+## Conclusion
+
+I am strongly leaning towards Postgres because it seems like a safe,
+boring choice. It has strong persistence and it is generic, which is
+useful because we don't know what our needs are yet.
+
+Running your own HA Postgres is challenging but it's a challenge you
+need to take on anyway when you deploy HA GitLab.
diff --git a/doc/rfcs/snapshot-storage.md b/doc/rfcs/snapshot-storage.md
new file mode 100644
index 000000000..a504c9877
--- /dev/null
+++ b/doc/rfcs/snapshot-storage.md
@@ -0,0 +1,97 @@
+# Proposal: snapshot storage for Git repositories
+
+## High level summary
+
+Gitaly as it exists today is a service that stores all its state on a
+local filesystem. This filesystem must therefore be durable. In this
+document we describe an alternative storage system which can store
+repository snapshots in SQL and object storage (e.g. S3).
+
+Key properties:
+
+- Use a SQL database as a catalogue of the repositories in snapshot storage
+- Git data (objects and refs) is stored as cold "snapshots" in object
+ storage
+- Snapshots can have a "parent", so a repository is stored as a linked
+  list of snapshots. The linked list relation is stored in SQL.
+- To use the repository, it must first be copied down to a local
+  filesystem
+
+Possible applications:
+
+- incremental repository backups
+- cold storage for repositories
+
+## Primitives
+
+### Git repository snapshots
+
+In [MR 1244](https://gitlab.com/gitlab-org/gitaly/merge_requests/1244)
+we have an example of how we can use Git plumbing commands to
+efficiently create incremental snapshots of an entire Git repository,
+where each snapshot may be stored as a single blob. We do this by
+combining a full dump of the ref database of the repository,
+concatenated with either a full or incremental Git packfile.
+
+A snapshot can either be full (no "parent") or it can be incremental
+relative to a previous snapshot (its parent). The snapshots are
+incremental in exactly the same way that `git fetch` is incremental.
+
+### Snapshot list
+
+Once we can make full and incremental snapshots of a repository, we can
+represent that repository as a linked list of snapshots where the first
+element must be a full snapshot, and each later element is incremental.
+
+Within this snapshot list, we can think of a project as a reference to
+its latest snapshot: it is the head of the list.
+
+### Rebuilding a repository from its snapshots
+
+To rebuild a repository from its snapshots we must "install" all
+packfiles in its list on the Gitaly server we are using. This means more
+than just downloading, because a snapshot only contains the data that
+goes in `.pack` files, and this data is useless without a corresponding
+`.idx`. This works just the same as `git clone` and `git fetch`, where
+it is up to the client (the user) to have their local computer compute
+`.idx` files. Once all the packfiles in the graph of the repository have
+been instantiated along with their `.idx` companions, we bulk-import the
+ref database from the most recent snapshot.
+
+After this it is possible that we have a lot of packfiles, which is not
+good for performance. We also won't have a `.bitmap` file. So a final
+`git repack -adb` will be needed for performance reasons.
+
+### Compacting a snapshot list
+
+The only reason we represent a repository as a list of multiple
+snapshots is that this makes it faster to make new snapshots. For faster
+restores, and to keep the total list size in check, we can collapse
+multiple snapshots into one. This comes down to restoring the repository
+in a temporary directory, up to a known snapshot. Then we make a new
+full (i.e. non-incremental) snapshot from that point-in-time copy, and
+replace all snapshots up to and including that point with a single
+(full) snapshot.
+
+### Snapshot graph representation
+
+We could represent snapshot lists with a SQL table `snapshots` with a
+1-to-1 relation mapping back into itself (the "parent" relation).
+
+Each record in the `snapshots` table would have a corresponding object
+storage blob at some immutable URL.
+
+We need this SQL table as a catalogue of our object storage objects.
+
+## Where to build this
+
+Considering that Praefect will have a SQL database tracking all its
+repositories, and that Praefect is aware of when repositories change and
+a new snapshot is warranted, it would be a candidate for managing
+snapshots.
+
+However, we could also build this in gitlab-rails. That should work fine
+for periodic snapshots, where we take snapshots regardless of whether we
+know/think there was a change in the repository.
+
+We probably don't want to build this in Gitaly itself.
diff --git a/doc/rfcs/template.md b/doc/rfcs/template.md
new file mode 100644
index 000000000..e58a17fa4
--- /dev/null
+++ b/doc/rfcs/template.md
@@ -0,0 +1,8 @@
+# RFC: <REPLACE TITLE>
+
+## Abstract
+
+<REPLACE ABSTRACT>
+
+<!--- Replace this line and start writing your RFC. Good luck! -->
+