gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/development/geo/proxying.md')
-rw-r--r-- doc/development/geo/proxying.md 356
1 files changed, 356 insertions, 0 deletions
diff --git a/doc/development/geo/proxying.md b/doc/development/geo/proxying.md
new file mode 100644
index 00000000000..41c7f426c6f
--- /dev/null
+++ b/doc/development/geo/proxying.md
@@ -0,0 +1,356 @@
+---
+stage: Systems
+group: Geo
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Geo proxying
+
+With Geo proxying, secondaries proxy web requests through Workhorse to the primary. Users navigating to the
+secondary see a read-write UI and can perform every operation available on the primary.
+
+## Request life cycle
+
+### Top-level view
+
+The proxying interaction can be explained at a high level through the following diagram:
+
+```mermaid
+sequenceDiagram
+actor client
+participant secondary
+participant primary
+
+client->>secondary: GET /explore
+secondary-->>primary: GET /explore (proxied)
+primary-->>secondary: HTTP/1.1 200 OK [..]
+secondary->>client: HTTP/1.1 200 OK [..]
+```
+
+### Proxy detection mechanism
+
+When Geo is enabled, Workhorse polls the internal API to learn whether it should proxy requests to the primary
+and, if so, the URL of the primary (which is stored in the database). When proxying should be enabled, the internal
+API responds with the primary URL and JWT-signed data that is passed on to the primary with every request.
+
+```mermaid
+sequenceDiagram
+ participant W as Workhorse (secondary)
+ participant API as Internal Rails API
+ loop Poll every 10 seconds
+ W->API: GET /api/v4/geo/proxy (internal)
+ API-->W: {geo_proxy_primary_url, geo_proxy_extra_data}, update config
+ end
+```
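As a rough illustration of the polling step above, the following Python sketch turns one poll response into a proxy configuration. Only the field names `geo_proxy_primary_url` and `geo_proxy_extra_data` come from the diagram. The JSON shape, the `GeoProxyConfig` type, and the rule that a missing or empty URL disables proxying are assumptions made for the example, not the Workhorse implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoProxyConfig:
    """Hypothetical holder for the values returned by one poll."""

    primary_url: Optional[str]  # None means "do not proxy" (assumption)
    extra_data: Optional[str]   # opaque JWT-signed string forwarded to the primary

    @property
    def proxy_enabled(self) -> bool:
        return self.primary_url is not None


def parse_geo_proxy_response(body: str) -> GeoProxyConfig:
    """Parse a poll response body; a missing or empty URL is treated as 'proxying off'."""
    payload = json.loads(body)
    url = payload.get("geo_proxy_primary_url") or None
    extra = payload.get("geo_proxy_extra_data") or None
    return GeoProxyConfig(primary_url=url, extra_data=extra)
```

A caller would invoke `parse_geo_proxy_response` on each poll (every 10 seconds in the diagram) and swap in the returned configuration.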
+
+### In-depth request flow and local data acceleration compared with proxying
+
+In more detail, Workhorse on the secondary (requested) site decides whether to proxy the request or not. If it
+can "accelerate" the data type (that is, serve it locally to save a roundtrip request), it returns the data
+immediately. Otherwise, traffic is sent to the primary's internal URL and served by Workhorse on the primary exactly
+as a direct request would be. The response is then proxied back to the user through the secondary Workhorse in the
+same connection.
+
+```mermaid
+flowchart LR
+ A[Client]--->W1["Workhorse (secondary)"]
+ W1 --> W1C[Serve data locally?]
+ W1C -- "Yes" ----> W1
+ W1C -- "No (proxy)" ----> W2["Workhorse (primary)"]
+ W2 --> W1 ----> A
+```
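The decision in the flowchart above can be sketched in a few lines of Python. The function name `route_request`, the `can_serve_locally` flag, and the choice to serve locally when no primary URL is known are all hypothetical. How Workhorse actually decides which data types it can accelerate is not described here.

```python
from typing import Optional


def route_request(path: str, can_serve_locally: bool, primary_url: Optional[str]) -> str:
    """Return "local" or the primary URL the request would be proxied to.

    Assumption: when proxying is disabled (no primary URL), everything is served locally.
    """
    if can_serve_locally or primary_url is None:
        return "local"
    # Join the primary's internal URL and the request path with exactly one slash.
    return primary_url.rstrip("/") + "/" + path.lstrip("/")
```

For example, `route_request("/explore", False, "https://primary.example.com")` yields the proxied target, while any request that can be accelerated stays on the secondary.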
+
+## Sign-in
+
+### Requests proxied to the primary requiring authorization
+
+```mermaid
+sequenceDiagram
+autoNumber
+participant Client
+participant Secondary
+participant Primary
+
+Client->>Secondary: `/group/project` request
+Secondary->>Primary: proxy /group/project
+opt primary not signed in
+Primary-->>Secondary: 302 redirect
+Secondary-->>Client: proxy 302 redirect
+Client->>Secondary: /users/sign_in
+Secondary->>Primary: proxy /users/sign_in
+Note right of Primary: authentication happens, POST to same URL etc
+Primary-->>Secondary: 302 redirect
+Secondary-->>Client: proxy 302 redirect
+Client->>Secondary: /group/project
+Secondary->>Primary: proxy /group/project
+end
+Primary-->>Secondary: /group/project logged in response (session on primary created)
+Secondary-->>Client: proxy full response
+```
+
+### Requests requiring a user session on the secondary
+
+At the moment, this flow only applies to Project Replication Details and Design Replication Details in the Geo Admin
+Area. For more context, see
+[View replication data on the primary site](../../administration/geo/index.md#view-replication-data-on-the-primary-site).
+
+```mermaid
+sequenceDiagram
+autoNumber
+participant Client
+participant Secondary
+participant Primary
+
+Client->>Secondary: `admin/geo/replication/projects` request
+opt secondary not signed in
+Secondary-->>Client: 302 redirect
+Client->>Secondary: /users/auth/geo/sign_in
+Secondary-->>Client: 302 redirect
+Client->>Secondary: /oauth/geo/auth/geo/sign_in
+Secondary-->>Client: 302 redirect
+Client->>Secondary: /oauth/authorize
+Secondary->>Primary: proxy /oauth/authorize
+opt primary not signed in
+Primary-->>Secondary: 302 redirect
+Secondary-->>Client: proxy 302 redirect
+Client->>Secondary: /users/sign_in
+Secondary->>Primary: proxy /users/sign_in
+Note right of Primary: authentication happens, POST to same URL etc
+end
+Primary-->>Secondary: 302 redirect
+Secondary-->>Client: proxy 302 redirect
+Client->>Secondary: /oauth/geo/callback
+Secondary-->>Client: 302 redirect
+Client->>Secondary: admin/geo/replication/projects
+end
+Secondary-->>Client: admin/geo/replication/projects logged in response (session on both primary and secondary)
+```
+
+## Git pull
+
+For historical reasons, the `push_from_secondary` path is used to forward a Git pull. There is [an issue proposing to
+rename this route](https://gitlab.com/gitlab-org/gitlab/-/issues/292690) to avoid confusion.
+
+### Git pull over HTTP(s)
+
+#### Accelerated repositories
+
+When a repository exists on the secondary and we detect it is up to date with the primary, we serve it directly
+instead of proxying.
+
+```mermaid
+sequenceDiagram
+participant C as Git client
+participant Wsec as "Workhorse (secondary)"
+participant Rsec as "Rails (secondary)"
+participant Gsec as "Gitaly (secondary)"
+C->>Wsec: GET /foo/bar.git/info/refs/?service=git-upload-pack
+Wsec->>Rsec: <internal API check>
+note over Rsec: decide that the repo is synced and up to date
+Rsec-->>Wsec: 401 Unauthorized
+Wsec-->>C: <response>
+C->>Wsec: GET /foo/bar.git/info/refs/?service=git-upload-pack
+Wsec->>Rsec: <internal API check>
+Rsec-->>Wsec: Render Workhorse OK
+Wsec-->>C: 200 OK
+C->>Wsec: POST /foo/bar.git/git-upload-pack
+Wsec->>Rsec: GitHttpController#git_upload_pack
+Rsec-->>Wsec: Render Workhorse OK
+Wsec->>Gsec: Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, UploadPack RPC (check the proto for details)
+Gsec-->>Wsec: Return a stream of Proto messages
+Wsec-->>C: Pipe messages to the Git client
+```
+
+#### Proxied repositories
+
+If a requested repository isn't synced, or we detect it is not up to date, the request is proxied to the primary
+to get the latest version of the changes.
+
+```mermaid
+sequenceDiagram
+participant C as Git client
+participant Wsec as "Workhorse (secondary)"
+participant Rsec as "Rails (secondary)"
+participant W as "Workhorse (primary)"
+participant R as "Rails (primary)"
+participant G as "Gitaly (primary)"
+C->>Wsec: GET /foo/bar.git/info/refs/?service=git-upload-pack
+Wsec->>Rsec: <internal API check>
+note over Rsec: decide that the repo is out of date
+Rsec-->>Wsec: 302 Redirect to /-/push_from_secondary/2/foo/bar.git/info/refs?service=git-upload-pack
+Wsec-->>C: <response>
+C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-upload-pack
+Wsec->>W: <proxied request>
+W->>R: <data>
+R-->>W: 401 Unauthorized
+W-->>Wsec: <proxied response>
+Wsec-->>C: <response>
+C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-upload-pack
+note over W: proxied
+Wsec->>W: <proxied request>
+W->>R: <data>
+R-->>W: Render Workhorse OK
+W-->>Wsec: <proxied response>
+Wsec-->>C: <response>
+C->>Wsec: POST /-/push_from_secondary/2/foo/bar.git/git-upload-pack
+Wsec->>W: <proxied request>
+W->>R: GitHttpController#git_upload_pack
+R-->>W: Render Workhorse OK
+W->>G: Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, UploadPack RPC (check the proto for details)
+G-->>W: Return a stream of Proto messages
+W-->>Wsec: Pipe messages to the Git client
+Wsec-->>C: Return piped messages from Git
+```
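The choice between the accelerated and proxied HTTP(S) flows can be summarized with a small Python sketch. The redirect path format is taken from the diagram above. The function name and the `synced` and `up_to_date` flags are hypothetical stand-ins for the check that Rails performs on the secondary.

```python
def git_pull_info_refs_location(repo_path: str, synced: bool, up_to_date: bool, secondary_id: int) -> str:
    """Return the path the Git client ends up requesting for the ref advertisement."""
    service = "info/refs?service=git-upload-pack"
    if synced and up_to_date:
        # Accelerated: the secondary serves the repository itself.
        return f"/{repo_path}/{service}"
    # Proxied: the client is redirected to the push_from_secondary path,
    # which the secondary Workhorse forwards to the primary.
    return f"/-/push_from_secondary/{secondary_id}/{repo_path}/{service}"
```

With `repo_path="foo/bar.git"` and `secondary_id=2`, an out-of-date repository produces the same redirect target as the one shown in the diagram.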
+
+### Git pull over SSH
+
+As SSH operations go through GitLab Shell instead of Workhorse, they are not proxied through the mechanism used for
+Workhorse requests. Instead, the secondary Rails internal API proxies them to the primary site as Git HTTP
+requests.
+
+#### Accelerated repositories
+
+When a repository exists on the secondary and we detect it is up to date with the primary, we serve it directly
+instead of proxying.
+
+```mermaid
+sequenceDiagram
+participant C as Git client
+participant S as GitLab Shell (secondary)
+participant I as Internal API (secondary Rails)
+participant G as Gitaly (secondary)
+C->>S: git pull
+S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
+I-->>S: HTTP/1.1 200 OK
+S->>G: InfoRefs:UploadPack RPC
+G-->>S: stream Git response back
+S-->>C: stream Git response back
+C-->>S: stream Git negotiation data (wants and haves)
+S->>G: UploadPack RPC
+G-->>S: stream Git response back
+S-->>C: stream Git response back
+```
+
+#### Proxied repositories
+
+If a requested repository isn't synced, or we detect it is not up to date, the request is proxied to the primary
+to get the latest version of the changes.
+
+```mermaid
+sequenceDiagram
+participant C as Git client
+participant S as GitLab Shell (secondary)
+participant I as Internal API (secondary Rails)
+participant P as Primary API
+C->>S: git pull
+S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
+I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo}
+S->>I: POST /api/v4/geo/proxy_git_ssh/info_refs_upload_pack
+I->>P: POST $PRIMARY/foo/bar.git/info/refs/?service=git-upload-pack
+P-->>I: HTTP/1.1 200 OK
+I-->>S: <response>
+S-->>C: return Git response from primary
+C-->>S: stream Git negotiation data (wants and haves)
+S->>I: POST /api/v4/geo/proxy_git_ssh/upload_pack
+I->>P: POST $PRIMARY/foo/bar.git/git-upload-pack
+P-->>I: HTTP/1.1 200 OK
+I-->>S: <response>
+S-->>C: return Git response from primary
+```
+
+## Git push
+
+### Unified URLs
+
+With unified URLs, a push is redirected to a local path formatted as `/-/push_from_secondary/$SECONDARY_ID/*`. Further
+requests through this path are proxied to the primary, which handles the push.
+
+#### Git push over SSH
+
+As SSH operations go through GitLab Shell instead of Workhorse, they are not proxied through the mechanism used for
+Workhorse requests. Instead, the secondary Rails internal API proxies them to the primary site as Git HTTP
+requests.
+
+```mermaid
+sequenceDiagram
+participant C as Git client
+participant S as GitLab Shell (secondary)
+participant I as Internal API (secondary Rails)
+participant P as Primary API
+C->>S: git push
+S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..)
+I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo}
+S->>I: POST /api/v4/geo/proxy_git_ssh/info_refs_receive_pack
+I->>P: POST $PRIMARY/foo/bar.git/info/refs/?service=git-receive-pack
+P-->>I: HTTP/1.1 200 OK
+I-->>S: <response>
+S-->>C: return Git response from primary
+C-->>S: stream Git data to push
+S->>I: POST /api/v4/geo/proxy_git_ssh/receive_pack
+I->>P: POST $PRIMARY/foo/bar.git/git-receive-pack
+P-->>I: HTTP/1.1 200 OK
+I-->>S: <response>
+S-->>C: return Git response from primary
+```
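The SSH diagrams for pull and push follow the same two-step pattern: a ref advertisement request, then the pack exchange. The Python sketch below lists, for a given Git service, the internal endpoint on the secondary and the path requested on the primary. The endpoint and path strings are copied from the diagrams. The helper name `ssh_proxy_steps` and the lookup table are hypothetical.

```python
from typing import List, Tuple

PROXY_GIT_SSH_PREFIX = "/api/v4/geo/proxy_git_ssh"

# Git service name -> suffix used in the internal endpoint names.
_SERVICE_SUFFIX = {
    "git-upload-pack": "upload_pack",    # git pull
    "git-receive-pack": "receive_pack",  # git push
}


def ssh_proxy_steps(service: str, repo_path: str) -> List[Tuple[str, str]]:
    """Return (internal endpoint, path on the primary) pairs in diagram order."""
    if service not in _SERVICE_SUFFIX:
        raise ValueError(f"unsupported Git service: {service}")
    suffix = _SERVICE_SUFFIX[service]
    return [
        (f"{PROXY_GIT_SSH_PREFIX}/info_refs_{suffix}", f"/{repo_path}/info/refs/?service={service}"),
        (f"{PROXY_GIT_SSH_PREFIX}/{suffix}", f"/{repo_path}/{service}"),
    ]
```

Calling `ssh_proxy_steps("git-receive-pack", "foo/bar.git")` reproduces the two `POST` pairs in the push diagram.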
+
+#### Git push over HTTP(s)
+
+```mermaid
+sequenceDiagram
+participant C as Git client
+participant Wsec as Workhorse (secondary)
+participant W as Workhorse (primary)
+participant R as Rails (primary)
+participant G as Gitaly (primary)
+C->>Wsec: GET /foo/bar.git/info/refs/?service=git-receive-pack
+Wsec->>C: 302 Redirect to /-/push_from_secondary/2/foo/bar.git/info/refs?service=git-receive-pack
+C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack
+Wsec->>W: <proxied request>
+W->>R: <data>
+R-->>W: 401 Unauthorized
+W-->>Wsec: <proxied response>
+Wsec-->>C: <response>
+C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack
+Wsec->>W: <proxied request>
+W->>R: <data>
+R-->>W: Render Workhorse OK
+W-->>Wsec: <proxied response>
+Wsec-->>C: <response>
+C->>Wsec: POST /-/push_from_secondary/2/foo/bar.git/git-receive-pack
+Wsec->>W: <proxied request>
+W->>R: GitHttpController:git_receive_pack
+R-->>W: Render Workhorse OK
+W->>G: Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, ReceivePack RPC
+G-->>W: Return a stream of Proto messages
+W-->>Wsec: Pipe messages to the Git client
+Wsec-->>C: Return piped messages from Git
+```
+
+### Git push over HTTP with separate URLs
+
+With separate URLs, the secondary will redirect to a URL formatted like `$PRIMARY/-/push_from_secondary/$SECONDARY_ID/*`.
+
+```mermaid
+sequenceDiagram
+participant Wsec as Workhorse (secondary)
+participant C as Git client
+participant W as Workhorse (primary)
+participant R as Rails (primary)
+participant G as Gitaly (primary)
+C->>Wsec: GET $SECONDARY/foo/bar.git/info/refs/?service=git-receive-pack
+Wsec->>C: 302 Redirect to $PRIMARY/-/push_from_secondary/2/foo/bar.git/info/refs?service=git-receive-pack
+C->>W: GET $PRIMARY/-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack
+W->>R: <data>
+R-->>W: 401 Unauthorized
+W-->>C: <response>
+C->>W: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack
+W->>R: <data>
+R-->>W: Render Workhorse OK
+W-->>C: <response>
+C->>W: POST /-/push_from_secondary/2/foo/bar.git/git-receive-pack
+W->>R: GitHttpController:git_receive_pack
+R-->>W: Render Workhorse OK
+W->>G: Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, ReceivePack RPC
+G-->>W: Return a stream of Proto messages
+W-->>C: Pipe messages to the Git client
+```
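The only difference between the unified and separate URL cases is whether the redirect target is a local path or an absolute URL on the primary. A minimal Python sketch, assuming a hypothetical helper `push_redirect_location` where passing `primary_url` signals separate URLs:

```python
from typing import Optional


def push_redirect_location(repo_path: str, secondary_id: int, primary_url: Optional[str] = None) -> str:
    """Build the 302 target for the first request of a Git push over HTTP."""
    path = f"/-/push_from_secondary/{secondary_id}/{repo_path}/info/refs?service=git-receive-pack"
    if primary_url is None:
        # Unified URLs: stay on the same host; later requests are proxied to the primary.
        return path
    # Separate URLs: send the client straight to the primary.
    return primary_url.rstrip("/") + path
```

With separate URLs the Git client talks to the primary directly after the redirect, so the secondary Workhorse does not appear in the rest of the exchange.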