gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/development/workhorse')
-rw-r--r-- doc/development/workhorse/channel.md 201
-rw-r--r-- doc/development/workhorse/configuration.md 218
-rw-r--r-- doc/development/workhorse/gitlab_features.md 73
-rw-r--r-- doc/development/workhorse/index.md 84
-rw-r--r-- doc/development/workhorse/new_features.md 78
5 files changed, 654 insertions, 0 deletions
diff --git a/doc/development/workhorse/channel.md b/doc/development/workhorse/channel.md
new file mode 100644
index 00000000000..33d7cc63f00
--- /dev/null
+++ b/doc/development/workhorse/channel.md
@@ -0,0 +1,201 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# WebSocket channel support for Workhorse
+
+In some cases, GitLab can provide the following through a WebSocket:
+
+- In-browser terminal access to an environment: a running server or container,
+ onto which a project has been deployed.
+- Access to services running in CI.
+
+Workhorse manages the WebSocket upgrade and the long-lived WebSocket connection,
+which frees up GitLab to process other requests. This document outlines
+the architecture of these connections.
+
+## Introduction to WebSockets
+
+WebSockets are an "upgraded" `HTTP/1.1` request. They permit bidirectional
+communication between a client and a server. **WebSockets are not HTTP**.
+Clients can send messages (known as frames) to the server at any time, and
+vice versa. Client messages are not necessarily requests, and server messages are
+not necessarily responses. WebSocket URLs have schemes like `ws://` (unencrypted) or
+`wss://` (TLS-secured).
+
+When requesting an upgrade to WebSocket, the browser sends an `HTTP/1.1`
+request like this:
+
+```plaintext
+GET /path.ws HTTP/1.1
+Connection: upgrade
+Upgrade: websocket
+Sec-WebSocket-Protocol: terminal.gitlab.com
+# More headers, including security measures
+```
+
+At this point, the connection is still HTTP, so this is a request.
+The server can send a normal HTTP response, such as `404 Not Found` or
+`500 Internal Server Error`.
+
+If the server decides to permit the upgrade, it sends an HTTP
+`101 Switching Protocols` response. From this point, the connection is no longer
+HTTP. It is now a WebSocket, and frames, not HTTP requests, flow over it. The connection
+persists until the client or server closes the connection.
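+
+As an illustration of the upgrade step, RFC 6455 requires the `101 Switching Protocols`
+response to carry a `Sec-WebSocket-Accept` header derived from the client's
+`Sec-WebSocket-Key` header. The Go sketch below shows only that derivation, based on
+the RFC rather than on Workhorse's code (in practice a WebSocket library performs
+this step). `acceptKey` is a hypothetical helper name:
+
+```go
+package main
+
+import (
+	"crypto/sha1"
+	"encoding/base64"
+	"fmt"
+)
+
+// websocketGUID is the fixed GUID defined in RFC 6455, section 1.3.
+const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
+
+// acceptKey (hypothetical helper) computes the Sec-WebSocket-Accept value
+// for a client's Sec-WebSocket-Key: base64(SHA-1(key + GUID)).
+func acceptKey(clientKey string) string {
+	sum := sha1.Sum([]byte(clientKey + websocketGUID))
+	return base64.StdEncoding.EncodeToString(sum[:])
+}
+
+func main() {
+	// Sample key from RFC 6455, section 1.3. The RFC gives the
+	// expected result as "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".
+	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
+}
+```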
+
+In addition to the sub-protocol negotiated during the upgrade (the
+`Sec-WebSocket-Protocol` header), individual WebSocket frames also
+specify a message type, such as:
+
+- `BinaryMessage`
+- `TextMessage`
+- `PingMessage`
+- `PongMessage`
+- `CloseMessage`
+
+Only binary frames can contain arbitrary data. Text frames are expected to be valid
+UTF-8 strings, in addition to any sub-protocol expectations.
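+
+A minimal Go sketch of that frame-level rule, independent of any sub-protocol.
+The numeric message types are the RFC 6455 opcodes, which the `gorilla/websocket`
+package also uses for its `*Message` constants. `validatePayload` is a hypothetical
+helper, and this sketch deliberately does not check control frames:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"unicode/utf8"
+)
+
+// Frame opcodes from RFC 6455, section 5.2.
+const (
+	TextMessage   = 1
+	BinaryMessage = 2
+	CloseMessage  = 8
+	PingMessage   = 9
+	PongMessage   = 10
+)
+
+// validatePayload (hypothetical) applies the protocol-level rule only;
+// sub-protocol rules would be checked separately.
+func validatePayload(msgType int, payload []byte) error {
+	switch msgType {
+	case BinaryMessage:
+		// Binary frames may carry arbitrary bytes.
+		return nil
+	case TextMessage:
+		if !utf8.Valid(payload) {
+			return errors.New("text frame is not valid UTF-8")
+		}
+		return nil
+	case CloseMessage, PingMessage, PongMessage:
+		// Control frames are not checked in this sketch.
+		return nil
+	default:
+		return fmt.Errorf("unknown message type %d", msgType)
+	}
+}
+
+func main() {
+	fmt.Println(validatePayload(BinaryMessage, []byte{0xff, 0xfe}))
+	fmt.Println(validatePayload(TextMessage, []byte{0xff, 0xfe}))
+}
+```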
+
+## Browser to Workhorse
+
+Using the terminal as an example:
+
+1. GitLab serves a JavaScript terminal emulator to the browser on a URL like
+ `https://gitlab.com/group/project/-/environments/1/terminal`.
+1. The page opens a WebSocket connection to
+ `wss://gitlab.com/group/project/-/environments/1/terminal.ws`.
+ This endpoint exists only in Workhorse, not in GitLab.
+1. When receiving the connection, Workhorse first performs a `preauthentication`
+ request to GitLab to confirm the client is authorized to access the requested terminal:
+ - If the client has the appropriate permissions and the terminal exists, GitLab
+ responds with a successful response that includes details of the terminal
+ the client should be connected to.
+ - Otherwise, Workhorse returns an appropriate HTTP error response.
+1. If GitLab returns valid terminal details, Workhorse:
+ 1. Connects to the specified terminal.
+ 1. Upgrades the browser to a WebSocket.
+ 1. Proxies between the two connections for as long as the browser's credentials are valid.
+ 1. Sends regular `PingMessage` control frames to the browser, to prevent intervening
+ proxies from terminating the connection while the browser is present.
+
+The browser must request an upgrade with a specific sub-protocol:
+
+- [`terminal.gitlab.com`](#terminalgitlabcom)
+- [`base64.terminal.gitlab.com`](#base64terminalgitlabcom)
+
+### `terminal.gitlab.com`
+
+This sub-protocol considers `TextMessage` frames to be invalid. Control frames,
+such as `PingMessage` or `CloseMessage`, have their usual meanings.
+
+- `BinaryMessage` frames sent from the browser to the server are
+ arbitrary text input.
+- `BinaryMessage` frames sent from the server to the browser are
+ arbitrary text output.
+
+These frames are expected to contain ANSI text control codes
+and may be in any encoding.
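+
+A minimal Go sketch of how a server might treat data frames from the browser under
+this sub-protocol. The message-type values are the RFC 6455 opcodes, and
+`terminalInput` is a hypothetical helper name:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// Data-frame message types as numbered by RFC 6455.
+const (
+	textMessage   = 1
+	binaryMessage = 2
+)
+
+var errTextFrame = errors.New("terminal.gitlab.com: text frames are invalid")
+
+// terminalInput (hypothetical) returns the bytes to forward to the terminal
+// for a data frame received from the browser. Binary payloads pass through
+// unchanged, because they may use any encoding.
+func terminalInput(msgType int, payload []byte) ([]byte, error) {
+	switch msgType {
+	case binaryMessage:
+		return payload, nil
+	case textMessage:
+		return nil, errTextFrame
+	default:
+		return nil, fmt.Errorf("not a data frame: type %d", msgType)
+	}
+}
+
+func main() {
+	in, err := terminalInput(binaryMessage, []byte("ls -la\r"))
+	fmt.Printf("%q %v\n", in, err)
+}
+```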
+
+### `base64.terminal.gitlab.com`
+
+This sub-protocol considers `BinaryMessage` frames to be invalid.
+Control frames, such as `PingMessage` or `CloseMessage`, have
+their usual meanings.
+
+- `TextMessage` frames sent from the browser to the server are
+ base64-encoded arbitrary text input. The server must
+ base64-decode them before inputting them.
+- `TextMessage` frames sent from the server to the browser are
+ base64-encoded arbitrary text output. The browser must
+ base64-decode them before outputting them.
+
+In their base64-encoded form, these frames are expected to
+contain ANSI terminal control codes, and may be in any encoding.
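+
+A sketch of the base64 step on each side. It assumes the standard base64 alphabet
+with padding (Go's `base64.StdEncoding`), which the text above does not specify;
+the helper names are hypothetical:
+
+```go
+package main
+
+import (
+	"encoding/base64"
+	"fmt"
+)
+
+// decodeBase64Input (hypothetical) turns the payload of a TextMessage frame
+// sent by the browser into raw terminal input. Assumes StdEncoding.
+func decodeBase64Input(payload []byte) ([]byte, error) {
+	return base64.StdEncoding.DecodeString(string(payload))
+}
+
+// encodeBase64Output (hypothetical) turns raw terminal output into the
+// payload of a TextMessage frame for the browser. Assumes StdEncoding.
+func encodeBase64Output(output []byte) []byte {
+	return []byte(base64.StdEncoding.EncodeToString(output))
+}
+
+func main() {
+	// ANSI color codes survive the round trip unchanged.
+	frame := encodeBase64Output([]byte("\x1b[32mok\x1b[0m\r\n"))
+	fmt.Println(string(frame))
+	raw, err := decodeBase64Input(frame)
+	fmt.Printf("%q %v\n", raw, err)
+}
+```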
+
+## Workhorse to GitLab
+
+Using the terminal as an example, before upgrading the browser,
+Workhorse sends a normal HTTP request to GitLab on a URL like
+`https://gitlab.com/group/project/-/environments/1/terminal.ws/authorize`.
+This returns a JSON response containing details of where the
+terminal can be found, and how to connect to it. In particular,
+the following details are returned in case of success:
+
+- WebSocket URL to connect to, such as `wss://example.com/terminals/1.ws?tty=1`.
+- WebSocket sub-protocols to support, such as `["channel.k8s.io"]`.
+- Headers to send, such as `Authorization: Token xxyyz`.
+- Optional. Certificate authority to verify `wss` connections with.
+
+Workhorse periodically rechecks this endpoint. If it receives an error response,
+or the details of the terminal change, it terminates the websocket session.
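+
+The re-check rule can be sketched as a small decision function. `settings` and
+`shouldTerminate` are hypothetical, and comparing the whole struct with
+`reflect.DeepEqual` is only one possible way to detect that the details changed:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"reflect"
+)
+
+// settings is a hypothetical, trimmed-down view of the authorize response.
+type settings struct {
+	URL          string
+	Subprotocols []string
+	Header       map[string][]string
+}
+
+// shouldTerminate reports whether a periodic re-check means the session
+// must end: the check failed, returned nothing, or returned new details.
+func shouldTerminate(current settings, fresh *settings, checkErr error) bool {
+	if checkErr != nil || fresh == nil {
+		return true
+	}
+	return !reflect.DeepEqual(current, *fresh)
+}
+
+func main() {
+	current := settings{URL: "wss://example.com/terminals/1.ws?tty=1", Subprotocols: []string{"channel.k8s.io"}}
+	changed := current
+	changed.URL = "wss://example.com/terminals/2.ws?tty=1"
+	fmt.Println(shouldTerminate(current, &current, nil))
+	fmt.Println(shouldTerminate(current, &changed, nil))
+	fmt.Println(shouldTerminate(current, nil, errors.New("403 Forbidden")))
+}
+```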
+
+## Workhorse to the WebSocket server
+
+In GitLab, environments or CI jobs may have a deployment service (like
+`KubernetesService`) associated with them. This service knows
+where the terminals or the service for an environment may be found, and GitLab
+returns these details to Workhorse.
+
+These URLs are also WebSocket URLs. GitLab tells Workhorse which sub-protocols to
+speak over the connection, along with any authentication details required by the
+remote end.
+
+Before upgrading the browser's connection to a websocket, Workhorse:
+
+1. Opens an HTTP client connection, according to the details given to it by GitLab.
+1. Attempts to upgrade that connection to a websocket.
+ - If it fails, an error response is sent to the browser.
+ - If it succeeds, the browser is also upgraded.
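+
+The ordering matters because the browser can only receive an ordinary HTTP error
+while it has not yet been upgraded. A Go sketch of that ordering, where
+`openChannelThenUpgrade` and its three callbacks are hypothetical stand-ins for the
+real dialing and upgrading code:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// errChannelUnavailable is a hypothetical example error for the demo.
+var errChannelUnavailable = errors.New("channel unavailable")
+
+// openChannelThenUpgrade (hypothetical) connects to the channel first and
+// upgrades the browser only if that succeeded.
+//   - dialChannel: open the HTTP connection to the channel and upgrade it.
+//   - sendHTTPError: answer the browser with a normal HTTP error.
+//   - upgradeBrowser: upgrade the browser's connection to a WebSocket.
+func openChannelThenUpgrade(dialChannel func() error, sendHTTPError func(error), upgradeBrowser func() error) error {
+	if err := dialChannel(); err != nil {
+		// The browser is still speaking HTTP, so it can get an error response.
+		sendHTTPError(err)
+		return err
+	}
+	return upgradeBrowser()
+}
+
+func main() {
+	err := openChannelThenUpgrade(
+		func() error { return errChannelUnavailable },
+		func(err error) { fmt.Println("HTTP error to browser:", err) },
+		func() error { fmt.Println("upgrading browser"); return nil },
+	)
+	fmt.Println("result:", err)
+}
+```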
+
+Workhorse now has two WebSocket connections, albeit with differing sub-protocols.
+It then:
+
+- Decodes incoming frames from the browser, re-encodes them to the channel's
+ sub-protocol, and sends them to the channel.
+- Decodes incoming frames from the channel, re-encodes them to the browser's
+ sub-protocol, and sends them to the browser.
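+
+Each direction can be pictured as the same decode and re-encode loop with a codec
+for each side, one loop per direction. This Go sketch models connections as Go
+channels of frame payloads; the `codec` type and `relay` function are hypothetical,
+and a real proxy also has to handle control frames and shutting down the other
+direction, which this sketch omits:
+
+```go
+package main
+
+import "fmt"
+
+// codec (hypothetical) converts between raw terminal bytes and one
+// side's frame format.
+type codec struct {
+	decode func(frame []byte) ([]byte, error)
+	encode func(data []byte) []byte
+}
+
+// relay decodes each frame from one side and re-encodes it for the other,
+// stopping at the first undecodable frame or when the input closes.
+func relay(in <-chan []byte, out chan<- []byte, from, to codec) error {
+	defer close(out)
+	for frame := range in {
+		data, err := from.decode(frame)
+		if err != nil {
+			return err
+		}
+		out <- to.encode(data)
+	}
+	return nil
+}
+
+func main() {
+	// Browser side: terminal.gitlab.com carries raw bytes.
+	browser := codec{
+		decode: func(f []byte) ([]byte, error) { return f, nil },
+		encode: func(d []byte) []byte { return d },
+	}
+	// Channel side: channel.k8s.io prefixes the fd byte (0x00 = STDIN).
+	k8sStdin := codec{
+		decode: func(f []byte) ([]byte, error) {
+			if len(f) == 0 {
+				return nil, fmt.Errorf("empty frame")
+			}
+			return f[1:], nil
+		},
+		encode: func(d []byte) []byte { return append([]byte{0x00}, d...) },
+	}
+	in := make(chan []byte, 1)
+	out := make(chan []byte, 1)
+	in <- []byte("ls\n")
+	close(in)
+	if err := relay(in, out, browser, k8sStdin); err != nil {
+		fmt.Println("error:", err)
+	}
+	for frame := range out {
+		fmt.Printf("%q\n", frame)
+	}
+}
+```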
+
+When either connection closes or enters an error state, Workhorse detects the error
+and closes the other connection, terminating the channel session. If the browser
+is the connection that has disconnected, Workhorse sends an ANSI `End of Transmission`
+control code (the `0x04` byte) to the channel, encoded according to the appropriate
+sub-protocol. To avoid being disconnected, Workhorse replies to any websocket ping
+frame sent by the channel.
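+
+A sketch of the `End of Transmission` frame for each supported channel sub-protocol,
+following the framing described in the `channel.k8s.io` and `base64.channel.k8s.io`
+sections. It assumes the byte is written to `fd 0` (`STDIN`), which this document
+does not state explicitly, and the standard padded base64 alphabet; `eotFrame` is a
+hypothetical helper:
+
+```go
+package main
+
+import (
+	"encoding/base64"
+	"fmt"
+)
+
+const eot = 0x04 // ANSI End of Transmission
+
+// eotFrame (hypothetical) returns the payload sent to the channel when the
+// browser disconnects, assuming it is written to fd 0 (STDIN).
+func eotFrame(subprotocol string) ([]byte, error) {
+	switch subprotocol {
+	case "channel.k8s.io":
+		// BinaryMessage: fd byte 0x00, then the raw data.
+		return []byte{0x00, eot}, nil
+	case "base64.channel.k8s.io":
+		// TextMessage: fd as the character '0', then base64 data.
+		return []byte("0" + base64.StdEncoding.EncodeToString([]byte{eot})), nil
+	default:
+		return nil, fmt.Errorf("unsupported sub-protocol %q", subprotocol)
+	}
+}
+
+func main() {
+	for _, p := range []string{"channel.k8s.io", "base64.channel.k8s.io"} {
+		frame, err := eotFrame(p)
+		fmt.Printf("%s: %q %v\n", p, frame, err)
+	}
+}
+```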
+
+Workhorse only supports the following sub-protocols:
+
+- [`channel.k8s.io`](#channelk8sio)
+- [`base64.channel.k8s.io`](#base64channelk8sio)
+
+Supporting a new deployment service requires adding support for its sub-protocols.
+
+### `channel.k8s.io`
+
+Used by Kubernetes, this sub-protocol defines a simple multiplexed channel.
+
+Control frames have their usual meanings. `TextMessage` frames are
+invalid. `BinaryMessage` frames represent I/O to a specific file
+descriptor.
+
+The first byte of each `BinaryMessage` frame represents the file
+descriptor (`fd`) number, as a `uint8`. For example:
+
+- `0x00` corresponds to `fd 0`, `STDIN`.
+- `0x01` corresponds to `fd 1`, `STDOUT`.
+
+The remaining bytes represent arbitrary data. For frames received
+from the server, they are bytes that have been received from that
+`fd`. For frames sent to the server, they are bytes that should be
+written to that `fd`.
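+
+A minimal Go sketch of this framing. The helper names are hypothetical, and only
+the two file descriptors mentioned above are defined:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+const (
+	fdStdin  = 0
+	fdStdout = 1
+)
+
+// encodeK8sFrame (hypothetical) builds a channel.k8s.io BinaryMessage
+// payload: one fd byte followed by the raw data.
+func encodeK8sFrame(fd byte, data []byte) []byte {
+	frame := make([]byte, 0, len(data)+1)
+	frame = append(frame, fd)
+	return append(frame, data...)
+}
+
+// decodeK8sFrame (hypothetical) splits a channel.k8s.io BinaryMessage
+// payload into its fd and data.
+func decodeK8sFrame(frame []byte) (byte, []byte, error) {
+	if len(frame) == 0 {
+		return 0, nil, errors.New("empty channel.k8s.io frame")
+	}
+	return frame[0], frame[1:], nil
+}
+
+func main() {
+	fmt.Printf("%q\n", encodeK8sFrame(fdStdin, []byte("ls\n")))
+	fd, data, err := decodeK8sFrame([]byte("\x01total 0\n"))
+	fmt.Println(fd, string(data), err)
+}
+```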
+
+### `base64.channel.k8s.io`
+
+Also used by Kubernetes, this sub-protocol defines a similar multiplexed
+channel to `channel.k8s.io`. The main differences are:
+
+- `TextMessage` frames are used instead of `BinaryMessage` frames.
+- The first byte of each `TextMessage` frame represents the file
+ descriptor as a numeric UTF-8 character, so the character `U+0030`,
+ or "0", is `fd 0`, `STDIN`.
+- The remaining bytes represent base64-encoded arbitrary data.
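+
+The same framing in its base64 form, as a Go sketch. It assumes fd numbers below 10
+(so each fits in one digit) and the standard padded base64 alphabet; the helper
+names are hypothetical:
+
+```go
+package main
+
+import (
+	"encoding/base64"
+	"errors"
+	"fmt"
+)
+
+// encodeBase64K8sFrame (hypothetical) builds a base64.channel.k8s.io
+// TextMessage payload: the fd as an ASCII digit, then base64 data.
+// Assumes fd < 10.
+func encodeBase64K8sFrame(fd byte, data []byte) []byte {
+	return append([]byte{'0' + fd}, base64.StdEncoding.EncodeToString(data)...)
+}
+
+// decodeBase64K8sFrame (hypothetical) reverses encodeBase64K8sFrame.
+func decodeBase64K8sFrame(frame []byte) (byte, []byte, error) {
+	if len(frame) == 0 || frame[0] < '0' || frame[0] > '9' {
+		return 0, nil, errors.New("missing fd digit")
+	}
+	data, err := base64.StdEncoding.DecodeString(string(frame[1:]))
+	if err != nil {
+		return 0, nil, err
+	}
+	return frame[0] - '0', data, nil
+}
+
+func main() {
+	frame := encodeBase64K8sFrame(0, []byte("ls\n"))
+	fmt.Println(string(frame))
+	fd, data, err := decodeBase64K8sFrame(frame)
+	fmt.Printf("%d %q %v\n", fd, data, err)
+}
+```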
diff --git a/doc/development/workhorse/configuration.md b/doc/development/workhorse/configuration.md
new file mode 100644
index 00000000000..7f9331e6f1e
--- /dev/null
+++ b/doc/development/workhorse/configuration.md
@@ -0,0 +1,218 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Workhorse configuration
+
+For historical reasons, Workhorse uses:
+
+- Command line flags.
+- A configuration file.
+- Environment variables.
+
+Add any new Workhorse configuration options into the configuration file.
+
+## CLI options
+
+```plaintext
+ gitlab-workhorse [OPTIONS]
+
+Options:
+ -apiCiLongPollingDuration duration
+ Long polling duration for job requesting for runners (default 50ns)
+ -apiLimit uint
+ Number of API requests allowed at single time
+ -apiQueueDuration duration
+ Maximum queueing duration of requests (default 30s)
+ -apiQueueLimit uint
+ Number of API requests allowed to be queued
+ -authBackend string
+ Authentication/authorization backend (default "http://localhost:8080")
+ -authSocket string
+ Optional: Unix domain socket to dial authBackend at
+ -cableBackend string
+ Optional: ActionCable backend (default authBackend)
+ -cableSocket string
+ Optional: Unix domain socket to dial cableBackend at (default authSocket)
+ -config string
+ TOML file to load config from
+ -developmentMode
+ Allow the assets to be served from Rails app
+ -documentRoot string
+ Path to static files content (default "public")
+ -listenAddr string
+ Listen address for HTTP server (default "localhost:8181")
+ -listenNetwork string
+ Listen 'network' (tcp, tcp4, tcp6, unix) (default "tcp")
+ -listenUmask int
+ Umask for Unix socket
+ -logFile string
+ Log file location
+ -logFormat string
+ Log format to use defaults to text (text, json, structured, none) (default "text")
+ -pprofListenAddr string
+ pprof listening address, e.g. 'localhost:6060'
+ -prometheusListenAddr string
+ Prometheus listening address, e.g. 'localhost:9229'
+ -proxyHeadersTimeout duration
+ How long to wait for response headers when proxying the request (default 5m0s)
+ -secretPath string
+ File with secret key to authenticate with authBackend (default "./.gitlab_workhorse_secret")
+ -version
+ Print version and exit
+```
+
+The 'auth backend' refers to the GitLab Rails application. The name is
+a holdover from when GitLab Workhorse only handled `git push` and `git pull` over
+HTTP.
+
+GitLab Workhorse can listen on either a TCP or a Unix domain socket. It
+can also open a second TCP listening socket with the Go
+[`net/http/pprof` profiler server](http://golang.org/pkg/net/http/pprof/).
+
+GitLab Workhorse can listen on Redis build and runner registration events if you
+pass a valid TOML configuration file through the `-config` flag.
+In a regular setup, this file only needs the `[redis]` section described in
+[Redis](#redis), with the example URL replaced by the path to your actual socket.
+
+## Redis
+
+GitLab Workhorse integrates with Redis to do long polling for CI build
+requests. To configure it:
+
+- Configure Redis settings in the TOML configuration file.
+- Control polling behavior for CI build requests with the `-apiCiLongPollingDuration`
+ command-line flag.
+
+You can enable Redis in the configuration file while leaving CI polling
+disabled. This configuration results in an idle Redis Pub/Sub connection. The
+opposite is not possible: CI long polling requires a correct Redis configuration.
+
+For example, the `[redis]` section in the configuration file could contain:
+
+```plaintext
+[redis]
+URL = "unix:///var/run/gitlab/redis.sock"
+Password = "my_awesome_password"
+Sentinel = [ "tcp://sentinel1:23456", "tcp://sentinel2:23456" ]
+SentinelMaster = "mymaster"
+```
+
+- `URL` - A string in the format `unix://path/to/redis.sock` or `tcp://host:port`.
+- `Password` - Required only if your Redis instance is password-protected.
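+- `Sentinel` - A list of Sentinel addresses. Required if you use Sentinel, together with `SentinelMaster`, the name of the master that Sentinel monitors.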
+
+If both `Sentinel` and `URL` are given, only `Sentinel` is used.
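+
+A Go sketch of how the `[redis]` fields above could be resolved into addresses to
+dial, including the rule that `Sentinel` takes precedence over `URL`. The
+`redisConfig` struct and `redisTargets` function are hypothetical, not Workhorse's
+own types:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+)
+
+// redisConfig (hypothetical) holds the [redis] fields shown above.
+type redisConfig struct {
+	URL            string
+	Password       string
+	Sentinel       []string
+	SentinelMaster string
+}
+
+// redisTargets returns the network and addresses to dial.
+func redisTargets(cfg redisConfig) (string, []string, error) {
+	raw := []string{cfg.URL}
+	if len(cfg.Sentinel) > 0 {
+		// Sentinel wins over URL. These are Sentinel node addresses,
+		// which a client then asks for the master named SentinelMaster.
+		raw = cfg.Sentinel
+	}
+	network := ""
+	var addrs []string
+	for _, r := range raw {
+		u, err := url.Parse(r)
+		if err != nil {
+			return "", nil, err
+		}
+		switch u.Scheme {
+		case "unix":
+			network = "unix"
+			addrs = append(addrs, u.Path)
+		case "tcp":
+			network = "tcp"
+			addrs = append(addrs, u.Host)
+		default:
+			return "", nil, fmt.Errorf("unsupported scheme %q in %q", u.Scheme, r)
+		}
+	}
+	return network, addrs, nil
+}
+
+func main() {
+	cfg := redisConfig{
+		URL:            "unix:///var/run/gitlab/redis.sock",
+		Sentinel:       []string{"tcp://sentinel1:23456", "tcp://sentinel2:23456"},
+		SentinelMaster: "mymaster",
+	}
+	fmt.Println(redisTargets(cfg))
+}
+```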
+
+Optional fields:
+
+```plaintext
+[redis]
+DB = 0
+MaxIdle = 1
+MaxActive = 1
+```
+
+- `DB` - The database to connect to. Defaults to `0`.
+- `MaxIdle` - How many idle connections can be in the Redis pool at once. Defaults to `1`.
+- `MaxActive` - How many connections the pool can keep. Defaults to `1`.
+
+## Relative URL support
+
+If you mount GitLab at a relative URL (like `example.com/gitlab`), use this
+relative URL in the `authBackend` setting:
+
+```plaintext
+gitlab-workhorse -authBackend http://localhost:8080/gitlab
+```
+
+## Interaction of authBackend and authSocket
+
+The interaction between `authBackend` and `authSocket` can be confusing.
+If `authSocket` is set, it overrides the host portion of `authBackend`, but not
+the relative path.
+
+In table form:
+
+| authBackend | authSocket | Workhorse connects to | Rails relative URL |
+|--------------------------------|-------------------|-----------------------|--------------------|
+| unset | unset | `localhost:8080` | `/` |
+| `http://localhost:3000` | unset | `localhost:3000` | `/` |
+| `http://localhost:3000/gitlab` | unset | `localhost:3000` | `/gitlab` |
+| unset | `/path/to/socket` | `/path/to/socket` | `/` |
+| `http://localhost:3000` | `/path/to/socket` | `/path/to/socket` | `/` |
+| `http://localhost:3000/gitlab` | `/path/to/socket` | `/path/to/socket` | `/gitlab` |
+
+The same applies to `cableBackend` and `cableSocket`.
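+
+The table can be read as a small function. This Go sketch reproduces each row;
+`backendTarget` is a hypothetical name, and the default `http://localhost:8080`
+comes from the `-authBackend` flag description above:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+)
+
+const defaultAuthBackend = "http://localhost:8080"
+
+// backendTarget (hypothetical) returns what Workhorse dials and the
+// relative URL root at which Rails is assumed to be mounted.
+func backendTarget(authBackend, authSocket string) (dial, relativeURL string, err error) {
+	if authBackend == "" {
+		authBackend = defaultAuthBackend
+	}
+	u, err := url.Parse(authBackend)
+	if err != nil {
+		return "", "", err
+	}
+	relativeURL = u.Path
+	if relativeURL == "" {
+		relativeURL = "/"
+	}
+	dial = u.Host
+	if authSocket != "" {
+		// The socket replaces only the host portion; the path is kept.
+		dial = authSocket
+	}
+	return dial, relativeURL, nil
+}
+
+func main() {
+	fmt.Println(backendTarget("http://localhost:3000/gitlab", "/path/to/socket"))
+}
+```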
+
+## Error tracking
+
+GitLab Workhorse supports remote error tracking with [Sentry](https://sentry.io).
+To enable this feature, set the `GITLAB_WORKHORSE_SENTRY_DSN` environment variable.
+You can also set the `GITLAB_WORKHORSE_SENTRY_ENVIRONMENT` environment variable to
+use the Sentry environment feature to separate staging, production and
+development.
+
+Omnibus GitLab (`/etc/gitlab/gitlab.rb`):
+
+```ruby
+gitlab_workhorse['env'] = {
+ 'GITLAB_WORKHORSE_SENTRY_DSN' => 'https://foobar',
+ 'GITLAB_WORKHORSE_SENTRY_ENVIRONMENT' => 'production'
+}
+```
+
+Source installations (`/etc/default/gitlab`):
+
+```plaintext
+export GITLAB_WORKHORSE_SENTRY_DSN='https://foobar'
+export GITLAB_WORKHORSE_SENTRY_ENVIRONMENT='production'
+```
+
+## Distributed tracing
+
+Workhorse supports distributed tracing through [LabKit](https://gitlab.com/gitlab-org/labkit/)
+using [OpenTracing APIs](https://opentracing.io).
+
+By default, no tracing implementation is linked into the binary. You can link in
+different OpenTracing providers with [build tags](https://golang.org/pkg/go/build/#hdr-Build_Constraints)
+or build constraints by setting the `BUILD_TAGS` make variable.
+
+For more details of the supported providers, refer to LabKit. For example, to
+include Jaeger tracing support, set the build tags like this:
+
+```shell
+make BUILD_TAGS="tracer_static tracer_static_jaeger"
+```
+
+After you compile Workhorse with an OpenTracing provider, configure the tracing
+configuration with the `GITLAB_TRACING` environment variable, like this:
+
+```shell
+GITLAB_TRACING=opentracing://jaeger ./gitlab-workhorse
+```
+
+## Continuous profiling
+
+Workhorse supports continuous profiling through [LabKit](https://gitlab.com/gitlab-org/labkit/)
+using [Stackdriver Profiler](https://cloud.google.com/profiler). By default, the
+Stackdriver Profiler implementation is linked in the binary using
+[build tags](https://golang.org/pkg/go/build/#hdr-Build_Constraints), though it's not
+required and can be skipped. For example:
+
+```shell
+make BUILD_TAGS=""
+```
+
+After you compile Workhorse with continuous profiling, set the profiler configuration
+with the `GITLAB_CONTINUOUS_PROFILING` environment variable. For example:
+
+```shell
+GITLAB_CONTINUOUS_PROFILING="stackdriver?service=workhorse&service_version=1.0.1&project_id=test-123" ./gitlab-workhorse
+```
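+
+The value has the shape `driver?key=value&...`. As a rough Go sketch (LabKit's own
+parser may differ, and `parseProfilingConfig` is a hypothetical helper), it splits
+into a driver name and URL-style options:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+	"strings"
+)
+
+// parseProfilingConfig (hypothetical) splits a GITLAB_CONTINUOUS_PROFILING
+// value into its driver name and query-string options.
+func parseProfilingConfig(value string) (string, url.Values, error) {
+	parts := strings.SplitN(value, "?", 2)
+	driver := parts[0]
+	if driver == "" {
+		return "", nil, fmt.Errorf("no profiler driver in %q", value)
+	}
+	query := ""
+	if len(parts) == 2 {
+		query = parts[1]
+	}
+	opts, err := url.ParseQuery(query)
+	if err != nil {
+		return "", nil, err
+	}
+	return driver, opts, nil
+}
+
+func main() {
+	// In practice the value comes from the GITLAB_CONTINUOUS_PROFILING
+	// environment variable; a literal is used here.
+	driver, opts, err := parseProfilingConfig("stackdriver?service=workhorse&service_version=1.0.1&project_id=test-123")
+	fmt.Println(driver, opts.Get("service"), opts.Get("project_id"), err)
+}
+```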
+
+## Related topics
+
+- [LabKit monitoring documentation](https://gitlab.com/gitlab-org/labkit/-/blob/master/monitoring/doc.go).
diff --git a/doc/development/workhorse/gitlab_features.md b/doc/development/workhorse/gitlab_features.md
new file mode 100644
index 00000000000..2aa8d9d2399
--- /dev/null
+++ b/doc/development/workhorse/gitlab_features.md
@@ -0,0 +1,73 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Features that rely on Workhorse
+
+Workhorse itself is not a feature, but there are several features in
+GitLab that would not work efficiently without Workhorse.
+
+To put the efficiency benefit in context, consider that in 2020Q3 on
+GitLab.com [we see](https://thanos-query.ops.gitlab.net/graph?g0.range_input=1h&g0.max_source_resolution=0s&g0.expr=sum(ruby_process_resident_memory_bytes%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)%20%2F%20sum(puma_max_threads%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)&g0.tab=1&g1.range_input=1h&g1.max_source_resolution=0s&g1.expr=sum(go_memstats_sys_bytes%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)%2Fsum(go_goroutines%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)&g1.tab=1)
+Rails application threads using on average
+about 200MB of RSS vs about 200KB for Workhorse goroutines.
+
+Examples of features that rely on Workhorse:
+
+## 1. `git clone` and `git push` over HTTP
+
+Git clone, pull and push are slow because they transfer large amounts
+of data and because each is CPU intensive on the GitLab side. Without
+Workhorse, HTTP access to Git repositories would compete with regular
+web access to the application, requiring us to run many more Rails
+application servers.
+
+## 2. CI runner long polling
+
+GitLab CI runners fetch new CI jobs by polling the GitLab server.
+Workhorse acts as a kind of "waiting room" where CI runners can sit
+and wait for new CI jobs. Because of Go's efficiency we can fit a lot
+of runners in the waiting room at little cost. Without this waiting
+room mechanism we would have to add a lot more Rails server capacity.
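+
+A minimal Go sketch of why the waiting room is cheap: each waiting runner costs
+one blocked goroutine rather than a Rails thread. `waitForJob` is hypothetical, and
+passing a job name through a Go channel is a simplification of the Redis-based
+notification that Workhorse uses for CI long polling:
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// waitForJob (hypothetical) parks a runner's request until a notification
+// arrives or the long-polling duration expires.
+func waitForJob(notify <-chan string, timeout time.Duration) (string, bool) {
+	timer := time.NewTimer(timeout)
+	defer timer.Stop()
+	select {
+	case job := <-notify:
+		return job, true
+	case <-timer.C:
+		// Nothing arrived in time; the runner simply polls again later.
+		return "", false
+	}
+}
+
+func main() {
+	notify := make(chan string)
+	go func() {
+		time.Sleep(10 * time.Millisecond)
+		notify <- "job 42"
+	}()
+	fmt.Println(waitForJob(notify, time.Second))
+	fmt.Println(waitForJob(make(chan string), 10*time.Millisecond))
+}
+```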
+
+## 3. File uploads and downloads
+
+File uploads and downloads may be slow either because the file is
+large or because the user's connection is slow. Workhorse can handle
+the slow part for Rails. This improves the efficiency of features such
+as CI artifacts, package repositories, LFS objects, etc.
+
+## 4. Websocket proxying
+
+Features such as the web terminal require a long-lived connection
+between the user's web browser and a container inside GitLab that is
+not directly accessible from the internet. Dedicating a Rails
+application thread to proxying such a connection would cost much more
+memory than it costs to have Workhorse look after it.
+
+## Quick facts (how does Workhorse work)
+
+- Workhorse can handle some requests without involving Rails at all:
+ for example, JavaScript files and CSS files are served straight
+ from disk.
+- Workhorse can modify responses sent by Rails: for example, if you use
+ `send_file` in Rails, GitLab Workhorse opens the file on
+ disk and sends its contents as the response body to the client.
+- Workhorse can take over requests after asking permission from Rails.
+ Example: handling `git clone`.
+- Workhorse can modify requests before passing them to Rails. Example:
+ when handling a Git LFS upload Workhorse first asks permission from
+ Rails, then it stores the request body in a tempfile, then it sends
+ a modified request containing the tempfile path to Rails.
+- Workhorse can manage long-lived WebSocket connections for Rails.
+ Example: handling the terminal websocket for environments.
+- Workhorse does not connect to PostgreSQL, only to Rails and (optionally) Redis.
+- We assume that all requests that reach Workhorse pass through an
+ upstream proxy such as NGINX or Apache first.
+- Workhorse does not accept HTTPS connections.
+- Workhorse does not clean up idle client connections.
+- We assume that all requests to Rails pass through Workhorse.
+
+For more information see ['A brief history of GitLab Workhorse'](https://about.gitlab.com/2016/04/12/a-brief-history-of-gitlab-workhorse/).
diff --git a/doc/development/workhorse/index.md b/doc/development/workhorse/index.md
new file mode 100644
index 00000000000..f7ca16e0f31
--- /dev/null
+++ b/doc/development/workhorse/index.md
@@ -0,0 +1,84 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# GitLab Workhorse
+
+GitLab Workhorse is a smart reverse proxy for GitLab. It handles
+"large" HTTP requests such as file downloads, file uploads, Git
+push/pull and Git archive downloads.
+
+Workhorse itself is not a feature, but there are [several features in
+GitLab](gitlab_features.md) that would not work efficiently without Workhorse.
+
+The canonical source for Workhorse is
+[`gitlab-org/gitlab/workhorse`](https://gitlab.com/gitlab-org/gitlab/tree/master/workhorse).
+Prior to [epic #4826](https://gitlab.com/groups/gitlab-org/-/epics/4826), it was
+[`gitlab-org/gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse/tree/master),
+but that repository is no longer used for development.
+
+## Install Workhorse
+
+To install GitLab Workhorse you need [Go 1.15 or newer](https://golang.org/dl) and
+[GNU Make](https://www.gnu.org/software/make/).
+
+To install into `/usr/local/bin` run `make install`.
+
+```plaintext
+make install
+```
+
+To install into `/foo/bin`, set the `PREFIX` variable.
+
+```plaintext
+make install PREFIX=/foo
+```
+
+On some operating systems, such as FreeBSD, you may have to use
+`gmake` instead of `make`.
+
+*NOTE*: Some features depend on build tags. Make sure to check
+[Workhorse configuration](configuration.md) to enable them.
+
+### Run time dependencies
+
+Workhorse uses [Exiftool](https://www.sno.phy.queensu.ca/~phil/exiftool/) for
+removing EXIF data (which may contain sensitive information) from uploaded
+images. If you installed GitLab:
+
+- Using the Omnibus package, you're all set.
+ *NOTE*: If you are using CentOS Minimal, you may need to install the `perl`
+ package: `yum install perl`
+- From source, make sure `exiftool` is installed:
+
+ ```shell
+ # Debian/Ubuntu
+ sudo apt-get install libimage-exiftool-perl
+
+ # RHEL/CentOS
+ sudo yum install perl-Image-ExifTool
+ ```
+
+## Testing your code
+
+Run the tests with:
+
+```plaintext
+make clean test
+```
+
+Each feature in GitLab Workhorse should have an integration test that
+verifies that the feature 'kicks in' on the right requests and leaves
+other requests unaffected. It is better to also have package-level tests
+for specific behavior, but the high-level integration tests should have
+the first priority during development.
+
+It is OK if a feature is only covered by integration tests.
+
+<!--
+## License
+
+This code is distributed under the MIT license, see the [LICENSE](LICENSE) file.
+-->
diff --git a/doc/development/workhorse/new_features.md b/doc/development/workhorse/new_features.md
new file mode 100644
index 00000000000..3ad15c1de16
--- /dev/null
+++ b/doc/development/workhorse/new_features.md
@@ -0,0 +1,78 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Adding new features to Workhorse
+
+GitLab Workhorse is a smart reverse proxy for GitLab. It handles
+[long HTTP requests](#what-are-long-requests), such as:
+
+- File downloads.
+- File uploads.
+- Git pushes and pulls.
+- Git archive downloads.
+
+Workhorse itself is not a feature, but [several features in GitLab](gitlab_features.md)
+would not work efficiently without Workhorse.
+
+At first glance, Workhorse appears to be just a pipeline for processing HTTP
+streams to reduce the amount of logic in your Ruby on Rails controller. However,
+don't treat it that way. Engineers trying to offload a feature to Workhorse often
+find it takes more work than originally anticipated:
+
+- It's written in Go, a different programming language from the Rails codebase, and only a few engineers at GitLab are Go developers.
+- Workhorse has demanding requirements:
+ - It's stateless.
+ - Memory and disk usage must be kept under tight control.
+ - The request should not be slowed down in the process.
+
+## Avoid adding new features
+
+We suggest adding new features only if absolutely necessary and no other options exist.
+Splitting a feature between the Rails codebase and Workhorse is a deliberate choice
+to introduce technical debt. It adds complexity to the system, and coupling between
+the two components:
+
+- Building features using Workhorse has a considerable complexity cost, so you should
+ prefer designs based on Rails requests and Sidekiq jobs.
+- Even when using Rails-and-Sidekiq is more work than using Rails-and-Workhorse,
+ Rails-and-Sidekiq is easier to maintain in the long term. Workhorse is unique
+ to GitLab, while Rails-and-Sidekiq is an industry standard.
+- For global behaviors around web requests, consider using a Rack middleware
+ instead of Workhorse.
+- Generally speaking, use Rails-and-Workhorse only if the HTTP client expects
+ behavior that is not reasonable to implement in Rails, like long requests.
+
+## What are long requests?
+
+Puma uses about an order of magnitude more RAM than Workhorse. Having a connection
+open for longer than a few milliseconds is problematic due to the amount of RAM
+it monopolizes after it reaches the Ruby on Rails controller. We've identified two classes
+of long requests: data transfers and HTTP long polling. Some examples:
+
+- `git push`.
+- `git pull`.
+- Uploading or downloading an artifact.
+- A CI runner waiting for a new job.
+
+With the rise of cloud-native installations, Workhorse's feature set was extended
+to add object storage direct-upload. This change removed the need for the shared
+Network File System (NFS) drives.
+
+If you still think we should add a new feature to Workhorse, open an issue for the
+Workhorse maintainers and explain:
+
+1. What you want to implement.
+1. Why it can't be implemented in our Ruby codebase.
+
+The Workhorse maintainers can help you assess the situation.
+
+## Related topics
+
+- In 2020, `@nolith` presented the talk
+ ["Speed up the monolith. Building a smart reverse proxy in Go"](https://archive.fosdem.org/2020/schedule/event/speedupmonolith/)
+ at FOSDEM. The talk includes more details on the history of Workhorse and the NFS removal.
+- The [uploads development documentation](../uploads.md) contains the most common
+ use cases for adding a new type of upload.