---
status: proposed
creation-date: "2023-09-12"
authors: [ "@oelmekki", "@jpcyiza" ]
coach: "@tkuah"
approvers: [ "@derekferguson" ]
owning-stage: ""
participating-stages: [ "~section::dev" ]
---

<!-- Blueprints often contain forward-looking statements -->
<!-- vale gitlab.FutureTense = NO -->

# ActivityPub support

## Summary

The end goal of this proposal is to build interoperability features into
GitLab so that a user on one GitLab instance can open a merge request
against a project hosted on another instance, connecting all willing
instances into a global network.

To achieve that, we propose to use ActivityPub, the W3C standard used by
the Fediverse. This will allow us to build upon a robust and battle-tested
protocol, and it will open GitLab to a wider community.

Before implementing cross-instance merge requests, we want to start with
smaller steps that help us build up domain knowledge about ActivityPub and
create the underlying architecture that will support the more advanced
features. For that reason, we propose to start by implementing social
features, allowing people on the Fediverse to subscribe to activities on
GitLab: for example, to be notified on their social network of choice when
their favorite project hosted on GitLab makes a new release. As a bonus,
this is an opportunity to make GitLab more social and grow its audience.

## Description of the related tech and terms

Feel free to jump to [Motivation](#motivation) if you already know what
ActivityPub and the Fediverse are.

As part of the push for [decentralization of the web](https://en.wikipedia.org/wiki/Decentralized_web),
several projects have tried different protocols, each built on different ideals.
Some examples:

- [Secure Scuttlebutt](https://en.wikipedia.org/wiki/Secure_Scuttlebutt) (or SSB for short)
- [Dat](https://en.wikipedia.org/wiki/Dat_%28software%29)
- [IPFS](https://en.wikipedia.org/wiki/InterPlanetary_File_System)
- [Solid](https://en.wikipedia.org/wiki/Solid_%28web_decentralization_project%29)

One of them has recently gained traction: [ActivityPub](https://en.wikipedia.org/wiki/ActivityPub),
better known through the colloquial [Fediverse](https://en.wikipedia.org/wiki/Fediverse) built
on top of it, with applications like
[Mastodon](https://en.wikipedia.org/wiki/Mastodon_%28social_network%29)
(which could be described as a sort of decentralized Facebook) or
[Lemmy](https://en.wikipedia.org/wiki/Lemmy_%28software%29) (which could be
described as a sort of decentralized Reddit).

ActivityPub has several advantages that make it attractive
to implementers and could explain its current success:

- **It's built on top of HTTP**. You don't need to install new software or
  tinker with TCP/UDP to implement ActivityPub. If you have a webserver
  or an application that provides an HTTP API (like a Rails application),
  you already have everything you need.
- **It's built on top of JSON**. All communications are basically JSON
  objects, which web developers already know well, simplifying adoption.
- **It's a W3C standard and already has multiple implementations**. Being
  stewarded by the W3C is a strong signal of stability and quality. Through
  their work on HTML, CSS, and other web standards, they have repeatedly
  demonstrated that we can build on top of their work without fearing it
  will become deprecated or irrelevant after a few years.

### The Fediverse

The core idea behind Mastodon and Lemmy is called the Fediverse. Rather
than full decentralization, those applications rely on federation, in the
sense that there are still servers and clients. It's not P2P like SSB,
Dat, and IPFS, but a galaxy of servers chatting with each other instead of
central servers controlled by a single entity.

The user signs up to one of those servers (called **instances**), and they
can then interact with users either on this instance, or on other ones.
From the perspective of the user, they access a global network, and not
only their instance. They see the articles posted on other instances, they
can comment on them, upvote them, etc.

What happens behind the scenes is that the user's instance knows where the
user they reply to is hosted. It contacts that other instance to let it
know there is a message for that user, somewhat like SMTP. Similarly, when
a user subscribes to a feed, their instance informs the instance hosting
the feed of this subscription. That target instance then posts messages
back when new activities are created. This allows for a push model, rather
than the constant polling model of RSS. Of course, this describes only the
happy path; moderation, validation, and fault tolerance happen at every
step.
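
To make the push model concrete, here is a toy in-memory sketch in Python.
It is not how any Fediverse app is implemented: real instances exchange
ActivityPub messages over HTTP, and the names `Instance`, `subscribe`, and
`publish` are hypothetical. It only shows that the subscription is
registered once on the feed's instance, which then delivers new messages
without being polled.

```python
from collections import defaultdict


class Instance:
    """A toy federated instance (hypothetical; no HTTP, moderation or retries)."""

    def __init__(self, host):
        self.host = host
        # feed name on this instance -> set of (remote instance, remote user)
        self.followers = defaultdict(set)
        # local user -> messages pushed to them by other instances
        self.inboxes = defaultdict(list)

    def subscribe(self, user, remote, feed):
        """Tell the remote instance hosting `feed` that `user` follows it."""
        remote.followers[feed].add((self, user))

    def publish(self, feed, message):
        """Push a new message to every subscriber of `feed`, wherever they live."""
        for instance, user in self.followers[feed]:
            instance.inboxes[user].append((self.host, feed, message))


home = Instance("social.example")
forge = Instance("gitlab.example.org")
home.subscribe("alice", forge, "bob/project releases")
forge.publish("bob/project releases", "v1.0 released")
```

After `publish`, Alice's inbox on `social.example` holds the release
message, although her instance never asked for it again.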

### ActivityPub

Behind the Fediverse is the ActivityPub protocol. It's an HTTP API that
attempts to be as general a social network implementation as possible,
while remaining extensible.

The basic idea is that an `actor` sends and receives `activities`. Activities
are structured JSON messages with well-defined properties that can be
extended to cover any need. An actor is defined by four endpoints, which are
contacted with the
`application/ld+json; profile="https://www.w3.org/ns/activitystreams"` HTTP Accept header:

- `GET /inbox`: used by the actor to find new activities intended for them.
- `POST /inbox`: used by instances to push new activities intended for the actor.
- `GET /outbox`: used by anyone to read the activities created by the actor.
- `POST /outbox`: used by the actor to publish new activities.
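
As a rough illustration, this Python sketch prepares (without sending) two
of those requests with the standard library's `urllib.request`. The actor
URLs are hypothetical. Per the ActivityPub specification, a `POST` also
carries the same media type in its `Content-Type` header.

```python
import json
import urllib.request

ACTIVITY_STREAMS = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def actor_request(url, method="GET", activity=None):
    """Build (but do not send) a request to an actor endpoint."""
    headers = {"Accept": ACTIVITY_STREAMS}
    data = None
    if activity is not None:
        # Activities are posted as JSON bodies with the ActivityStreams media type.
        data = json.dumps(activity).encode("utf-8")
        headers["Content-Type"] = ACTIVITY_STREAMS
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# GET /outbox: read the activities created by the actor.
read_outbox = actor_request("https://social.example/users/alice/outbox")

# POST /inbox: push an activity intended for the actor.
push = actor_request(
    "https://social.example/users/alice/inbox",
    method="POST",
    activity={"@context": "https://www.w3.org/ns/activitystreams", "type": "Follow"},
)
```

Sending either request would be a matter of passing it to
`urllib.request.urlopen`.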

Among those, Mastodon and Lemmy only use `POST /inbox` and `GET /outbox`, which
are the minimum needed to implement federation:

- Instances push new activities for the actor on the inbox.
- Reading the outbox allows reading the feed of an actor.

Additionally, Mastodon and Lemmy implement a `GET /` endpoint (with the
same Accept header). This endpoint responds with general information about
the actor, like its name and the URLs of its inbox and outbox. While not
required by the standard, it makes discovery easier.
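
For illustration, here is what such a profile might look like for a
hypothetical user on a Mastodon-like server, and how an instance could read
the inbox and outbox URLs from it in Python. `endpoints.sharedInbox` is an
optional ActivityPub property, relevant to the delivery grouping discussed
later; all URLs are made up.

```python
import json

# A hypothetical actor profile, as returned for the actor's URL with the
# ActivityStreams Accept header.
PROFILE_JSON = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://social.example/users/alice",
  "type": "Person",
  "name": "Alice",
  "inbox": "https://social.example/users/alice/inbox",
  "outbox": "https://social.example/users/alice/outbox",
  "endpoints": {"sharedInbox": "https://social.example/inbox"}
}
"""


def discover_endpoints(document):
    """Return (inbox, outbox, shared_inbox) from an actor profile document."""
    actor = json.loads(document)
    # sharedInbox is optional, so fall back to None when it is absent.
    shared_inbox = actor.get("endpoints", {}).get("sharedInbox")
    return actor["inbox"], actor["outbox"], shared_inbox


inbox, outbox, shared_inbox = discover_endpoints(PROFILE_JSON)
```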

While a person is the main use case for an actor, an actor does not
necessarily map to a person. Anything can be an actor: a topic, a
subreddit, a group, an event. For GitLab, anything with activities (in the sense
of what GitLab means by "activity") can be an ActivityPub actor. This includes
items like projects, groups, and releases. In those more abstract examples,
an actor can be thought of as an actionable feed.

ActivityPub by itself does not cover everything that is needed to implement
the Fediverse. Most notably, these are left for the implementers to figure out:

- Dealing with spam, which is handled by authorizing or
  blocking ("defederating") other instances.
- Discovering new instances.
- Performing network-wide searches.

## Motivation

Why would a social media protocol be useful for GitLab? People want a single,
global GitLab network to interact with various projects, without having to
register on each of their hosts.

Several very popular discussions around this have already happened:

- [Share events externally via ActivityPub](https://gitlab.com/gitlab-org/gitlab/-/issues/21582)
- [Implement cross-server (federated) merge requests](https://gitlab.com/gitlab-org/gitlab/-/issues/14116)
- [Distributed merge requests](https://gitlab.com/groups/gitlab-org/-/epics/260)

The ideal workflow would be:

1. Alice registers on her favorite GitLab instance, like `gitlab.example.org`.
1. She looks for a project on a given topic, and sees Bob's project, even though
   Bob is on `gitlab.com`.
1. Alice selects **Fork**, and `gitlab.com/Bob/project.git` is
   forked to `gitlab.example.org/Alice/project.git`.
1. She makes her edits, and opens a merge request, which appears in Bob's
   project on `gitlab.com`.
1. Alice and Bob discuss the merge request, each one from their own GitLab
   instance.
1. Bob can send additional commits, which are picked up by Alice's instance.
1. When Bob accepts the merge request, his instance picks up the code from
   Alice's instance.

In this process, ActivityPub would help in:

- Letting Bob know a fork happened.
- Sending the merge request to Bob.
- Enabling Alice and Bob to discuss the merge request.
- Letting Alice know the code was merged.

It does _not_ help in these cases, which need specific implementations:

- Implementing a network-wide search.
- Implementing cross-instance forks. (Not needed, thanks to Git.)

Why use ActivityPub here rather than implementing cross-instance merge requests
in a custom way? Two reasons:

1. **Building on top of a standard helps reach beyond GitLab**.
   While the workflow presented above only mentions GitLab, building on top
   of a W3C standard means other forges can follow GitLab
   there, and build a massive Fediverse of code sharing.
1. **An opportunity to make GitLab more social**. To prepare the
   architecture for the workflow above, smaller steps can be taken, allowing
   people to subscribe to activity feeds from their Fediverse social
   network. Anything that has an RSS feed could become an ActivityPub feed.
   People on Mastodon could follow their favorite developer, project, or topic
   from GitLab and see the news in their feed on Mastodon, hopefully raising
   engagement with GitLab.

### Goals

- allow sharing interesting events on ActivityPub-based social media
- allow opening an issue and discussing it from one instance to another
- allow forking a project from one instance to another
- allow opening a merge request, discussing it, and merging it from one instance to another
- allow performing a network-wide search?

### Non-Goals

- federation of private resources
- allow performing a network-wide search?

## Proposal

The idea of this implementation path is not to take the fastest route to
the feature with the most added value (cross-instance merge requests), but
to take the smallest useful step at each iteration, making sure each step
brings something immediately useful.

1. **Implement ActivityPub for social following**.
   After this, the Fediverse can follow activities on GitLab instances.
    1. ActivityPub to subscribe to project releases.
    1. ActivityPub to subscribe to project creation in topics.
    1. ActivityPub to subscribe to project activities.
    1. ActivityPub to subscribe to group activities.
    1. ActivityPub to subscribe to user activities.
1. **Implement cross-instance forks** to enable forking a project from another instance.
1. **Implement ActivityPub for cross-instance discussions** to enable discussing
   issues and merge requests from another instance:
    1. In issues.
    1. In merge requests.
1. **Implement ActivityPub to submit cross-instance merge requests** to enable
   submitting merge requests to other instances.
1. **Implement cross-instance search** to enable discovering projects on other instances.

It's open to discussion if this last step should be included at all.
Currently, in most Fediverse apps, when you want to display a resource from
an instance that your instance does not know about (typically a user you
want to follow), you paste the URL of the resource in the search box of
your instance, and it fetches and displays the remote resource, now
actionable from your instance. We plan to do that at first.

The question is: do we keep it at that? This UX introduces significant
friction, especially for users not used to Fediverse UX patterns (which is
probably most GitLab users). On the other hand, distributed search is
complicated enough to deserve its own blueprint (although it's not as
complicated as it used to be, now that decentralization protocols and
applications have worked on it for a while).

## Design and implementation details

First, it's a good idea to get familiar with the specifications of the
three standards we're going to use:

- [ActivityPub](https://www.w3.org/TR/activitypub/) defines the HTTP
  requests happening to implement federation.
- [ActivityStreams](https://www.w3.org/TR/activitystreams-core/) defines the
  format of the JSON messages exchanged by the users of the protocol.
- [Activity Vocabulary](https://www.w3.org/TR/activitystreams-vocabulary/)
  defines the various messages recognized by default.

Feel free to ping @oelmekki if you have questions or find the documents too
dense to follow.

### Production readiness

TBC

### The social following part

This part lays the groundwork for
[adding new ActivityPub actors](../../../development/activitypub/actors/index.md) to
GitLab.

There are 5 actors we want to implement:

- the `releases` actor, to be notified when a given project makes a new
  release
- the `topic` actor, to be notified when a new project is added to a topic
- the `project` actor, regarding all activities from a project
- the `group` actor, regarding all activities from a group
- the `user` actor, regarding all activities from a user

We're only dealing with public resources for now. Allowing federation of
private resources is a tricky subject that will be solved later, if it's
possible at all.

#### Endpoints

Each actor needs 3 endpoints:

- the profile endpoint, containing basic info, like name, description, but
  also including links to the inbox and outbox
- the outbox endpoint, listing previous activities for an actor
- the inbox endpoint, to which follow and unfollow requests are posted
  (among other things we won't use for now).

The controllers providing those endpoints are in
`app/controllers/activity_pub/`. We decided to use this namespace to avoid
mixing the ActivityPub JSON responses with the ones meant for the
frontend, and because we may need further namespacing later, as the way we
format activities may differ between Fediverse apps and for our later
cross-instance features. This namespace also allows us to easily toggle
what we need on all endpoints, like making sure no private project can be
accessed.

#### Serializers

The serializers in `app/serializers/activity_pub/` are the meat of our
implementation, as they provide the ActivityStreams objects. The abstract
class `ActivityPub::ActivityStreamsSerializer` does all the heavy lifting
of validating developer-provided data, setting up the common fields, and
providing pagination.
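
To show the shape of what a serializer produces for an outbox, here is a
Python sketch of one page of an ActivityStreams `OrderedCollectionPage`
using offset pagination. It is not the GitLab serializer; the `?page=` URL
scheme and the function name are hypothetical.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def outbox_page(outbox_url, activities, page, per_page):
    """Serialize page `page` (1-based) of `activities` with offset pagination."""
    # Offset pagination: skip all the activities of the previous pages.
    offset = (page - 1) * per_page
    doc = {
        "@context": AS_CONTEXT,
        "id": f"{outbox_url}?page={page}",
        "type": "OrderedCollectionPage",
        "partOf": outbox_url,
        "orderedItems": activities[offset:offset + per_page],
    }
    if offset + per_page < len(activities):
        doc["next"] = f"{outbox_url}?page={page + 1}"
    if page > 1:
        doc["prev"] = f"{outbox_url}?page={page - 1}"
    return doc


outbox_url = "https://gitlab.example.org/activity_pub/projects/42/outbox"
activities = [{"type": "Create", "id": f"{outbox_url}/{n}"} for n in range(5)]
first_page = outbox_page(outbox_url, activities, 1, 2)
```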

That pagination part is done through `Gitlab::Serializer::Pagination`, which
uses offset pagination.
[We need to allow it to do keyset pagination](https://gitlab.com/gitlab-org/gitlab/-/issues/424148).
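
Offset pagination has to skip over every earlier row, and new activities
shift page boundaries while a remote server pages through an outbox. A
minimal keyset sketch in Python, over an in-memory list standing in for a
table ordered by descending `id` (the function name and cursor format are
hypothetical):

```python
def keyset_page(activities, per_page, after_id=None):
    """Return (items, next_cursor) for activities ordered by descending id.

    In SQL this corresponds to
      WHERE id < :after_id ORDER BY id DESC LIMIT :per_page + 1
    which an index on id can serve without reading the skipped rows.
    """
    ordered = sorted(activities, key=lambda a: a["id"], reverse=True)
    if after_id is not None:
        # Resume strictly after the last id the client has already seen.
        ordered = [a for a in ordered if a["id"] < after_id]
    items = ordered[:per_page]
    # One row beyond the page tells us whether another page exists.
    has_more = len(ordered) > per_page
    next_cursor = items[-1]["id"] if has_more else None
    return items, next_cursor
```

Because the cursor is the last `id` seen, an activity created between two
page requests does not push already-seen items onto the next page.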

#### Subscription

Subscription to a resource is done by posting a
[Follow activity](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-follow)
to the actor inbox. When receiving a Follow activity,
[we should generate an Accept or Reject activity in return](https://www.w3.org/TR/activitypub/#follow-activity-inbox),
sent to the subscriber's inbox.
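
A minimal Python sketch of that reply, assuming the Follow arrives as a
parsed JSON dict. Using the original Follow as the Accept's `object` is
what the ActivityPub specification describes; the `id` scheme and function
name are hypothetical.

```python
import uuid

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def accept_follow(follow, actor_id):
    """Build the Accept activity to deliver to the actor of `follow`."""
    if follow.get("type") != "Follow":
        raise ValueError("only Follow activities can be accepted here")
    return {
        "@context": AS_CONTEXT,
        # Hypothetical id scheme: any unique URL under the actor would do.
        "id": f"{actor_id}#accepts/{uuid.uuid4()}",
        "type": "Accept",
        "actor": actor_id,
        # The Follow being accepted is the object of the Accept.
        "object": follow,
    }


follow = {
    "@context": AS_CONTEXT,
    "id": "https://social.example/activities/1",
    "type": "Follow",
    "actor": "https://social.example/users/alice",
    "object": "https://gitlab.example.org/activity_pub/projects/42/releases",
}
accept = accept_follow(follow, follow["object"])
```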

The general workflow of the implementation is as follows:

- A POST request is made to the inbox endpoint, with the Follow activity
  encoded as JSON.
- If the activity received is not of a supported type (for example, someone
  tries to comment on the activity), we ignore it. Otherwise:
- We create an `ActivityPub::Subscription` with the profile URL of the
  subscriber.
- We queue a job to resolve the subscriber's inbox URL, in which:
  - we perform an HTTP request to the subscriber profile to find
    their inbox URL (and the shared inbox URL, if any)
  - we store that URL in the subscription record
- We queue a job to accept the subscription, in which:
  - we perform an HTTP request to the subscriber inbox to post an
    Accept activity
  - we update the state of the subscription to `:accepted`
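
The workflow above can be sketched in Python as follows, with the two
background jobs run inline for brevity. `Subscription` is a plain
dataclass standing in for the model, and `fetch_profile` and
`post_to_inbox` are hypothetical callables standing in for the HTTP
requests; the string `"accepted"` plays the role of the `:accepted` state.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SUPPORTED_TYPES = {"Follow"}


@dataclass
class Subscription:
    subscriber_url: str
    inbox_url: Optional[str] = None
    shared_inbox_url: Optional[str] = None
    status: str = "requested"


def handle_inbox_post(activity: dict,
                      fetch_profile: Callable[[str], dict],
                      post_to_inbox: Callable[[str, dict], None]) -> Optional[Subscription]:
    """Process a POSTed activity; return the subscription, or None if ignored."""
    if activity.get("type") not in SUPPORTED_TYPES:
        return None  # unsupported activity type: ignore it
    subscription = Subscription(subscriber_url=activity["actor"])

    # Job 1: resolve the subscriber's inbox (and shared inbox, if any).
    profile = fetch_profile(subscription.subscriber_url)
    subscription.inbox_url = profile["inbox"]
    subscription.shared_inbox_url = profile.get("endpoints", {}).get("sharedInbox")

    # Job 2: post an Accept to the subscriber's inbox, then mark it accepted.
    accept = {"type": "Accept", "actor": activity["object"], "object": activity}
    post_to_inbox(subscription.inbox_url, accept)
    subscription.status = "accepted"
    return subscription
```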

`ActivityPub::Subscription` is a new abstract model from which the models
related to each of our actors inherit, each with its own table:

- `ActivityPub::ReleasesSubscription`, table `activity_pub_releases_subscriptions`
- `ActivityPub::TopicSubscription`, table `activity_pub_topic_subscriptions`
- `ActivityPub::ProjectSubscription`, table `activity_pub_project_subscriptions`
- `ActivityPub::GroupSubscription`, table `activity_pub_group_subscriptions`
- `ActivityPub::UserSubscription`, table `activity_pub_user_subscriptions`

The reason for using multiple models rather than, say, a simpler `actor`
enum in the Subscription model with a single table is that we need
specific associations and validations for each (an
`ActivityPub::ProjectSubscription` belongs to a Project, an
`ActivityPub::UserSubscription` does not). It also gives us more room for
extensibility in the future.

#### Unfollow

When receiving
[an Undo activity](https://www.w3.org/TR/activitypub/#undo-activity-inbox)
referencing a previous Follow, we remove the subscription from our database.

We are not required to send back any activity, so we don't need a worker
here: we can directly remove the record from the database.
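
A Python sketch of that direct removal, assuming the Undo embeds the
original Follow (as Mastodon typically sends it) and using a dict keyed by
`(subscriber, followed actor)` as a hypothetical stand-in for the
subscription tables. An Undo that only references the Follow by `id` would
need a lookup instead.

```python
def handle_undo(activity, subscriptions):
    """Delete the subscription targeted by an Undo of a Follow.

    Returns True if a subscription was removed. No reply activity is sent.
    """
    follow = activity.get("object")
    if activity.get("type") != "Undo" or not isinstance(follow, dict):
        return False
    if follow.get("type") != "Follow":
        return False
    # Only the actor who sent the Follow may undo it.
    if follow.get("actor") != activity.get("actor"):
        return False
    key = (follow["actor"], follow.get("object"))
    return subscriptions.pop(key, None) is not None
```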

#### Sending activities out

When specific events (which ones?) related to our actors happen, we should
queue events to issue activities to the subscribers' inboxes (the activities
are the same as the ones we display in the actor's outbox).

We're supposed to deduplicate the subscriber list to make sure we don't
send an activity twice to the same person, although this is probably better
handled by a uniqueness validation in the model when receiving the Follow
activity.

More importantly, we should group requests for the same host: if ten users
are all on `https://mastodon.social/`, we should issue a single request to
the shared inbox it provides, adding all the users as recipients, rather
than sending one request per user.
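
Both points can be sketched together in Python: each subscriber is reached
through its host's shared inbox when it advertised one, otherwise through
its own inbox, and a set removes duplicate recipients. The tuple layout
and function name are hypothetical, and building the actual request is
left out.

```python
from collections import defaultdict


def plan_deliveries(subscriptions):
    """Group recipients by the inbox URL that should receive one request.

    `subscriptions` yields (subscriber_url, inbox_url, shared_inbox_url or None).
    Returns {target_inbox_url: sorted list of subscriber URLs}.
    """
    plan = defaultdict(set)
    for subscriber, inbox, shared_inbox in subscriptions:
        # Prefer the per-host shared inbox so one request covers many users.
        target = shared_inbox or inbox
        plan[target].add(subscriber)  # the set drops duplicate subscribers
    return {target: sorted(recipients) for target, recipients in plan.items()}


subscriptions = [
    ("https://mastodon.social/users/a", "https://mastodon.social/users/a/inbox", "https://mastodon.social/inbox"),
    ("https://mastodon.social/users/b", "https://mastodon.social/users/b/inbox", "https://mastodon.social/inbox"),
    ("https://mastodon.social/users/a", "https://mastodon.social/users/a/inbox", "https://mastodon.social/inbox"),
    ("https://lemmy.example/u/c", "https://lemmy.example/u/c/inbox", None),
]
plan = plan_deliveries(subscriptions)
```

Here the four rows collapse into two requests: one to the
`mastodon.social` shared inbox for two recipients, and one to the personal
inbox of the user without a shared inbox.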

#### [Webfinger](https://gitlab.com/gitlab-org/gitlab/-/issues/423079)

Mastodon
[requires instances to implement the Webfinger protocol](https://docs.joinmastodon.org/spec/webfinger/).
This protocol adds an endpoint at a well-known location that allows
querying for a resource name and mapping it to whatever URL we want (so
basically, it's used for discovery). Mastodon uses this to query other
Fediverse apps for actor names, in order to find their profile URLs.

Actually, GitLab already implements the Webfinger protocol endpoint through
Doorkeeper
([this is the action that maps to its route](https://github.com/doorkeeper-gem/doorkeeper-openid_connect/blob/5987683ccc22262beb6e44c76ca4b65288d6067a/app/controllers/doorkeeper/openid_connect/discovery_controller.rb#L14-L16)),
implemented in GitLab
[in JwksController](https://gitlab.com/gitlab-org/gitlab/-/blob/efa76816bd0603ba3acdb8a0f92f54abfbf5cc02/app/controllers/jwks_controller.rb).

There is no incompatibility here: we can just extend this controller.
However, we'll probably have to rename it, as it won't be related to Jwks
alone anymore.

One difficulty we may have is that, unlike Mastodon, we don't only deal
with users. So we need to figure out how to differentiate asking for a
user from asking for a project, for example. One obvious way would be to
use a prefix, like `user-<username>`, `project-<project_name>`, etc. I'm
pondering this from afar: we haven't implemented much code in the epic yet
and I haven't dug deep into Webfinger's specs, so this remark may be
outdated when we reach the actual implementation.
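
To make the prefix idea concrete, here is a Python sketch of a lookup that
maps `acct:<prefix>-<name>@<host>` resources to a JSON Resource Descriptor.
The prefixes come from the paragraph above; the actor URL layout and the
function name are hypothetical, and namespaced project paths are not
handled. The `self` link with type `application/activity+json` is what
Mastodon looks for.

```python
# Hypothetical URL layout for each actor prefix.
ACTOR_PATHS = {
    "user": "/activity_pub/users/{name}",
    "project": "/activity_pub/projects/{name}",
    "group": "/activity_pub/groups/{name}",
}


def webfinger(resource, host):
    """Return a JRD dict for `resource`, or None if it is unknown."""
    if not resource.startswith("acct:"):
        return None
    account, _, account_host = resource[len("acct:"):].rpartition("@")
    # Split the prefix from the name at the first dash: project-my-app.
    prefix, sep, name = account.partition("-")
    if account_host != host or not sep or prefix not in ACTOR_PATHS or not name:
        return None
    href = f"https://{host}" + ACTOR_PATHS[prefix].format(name=name)
    return {
        "subject": resource,
        "links": [
            {"rel": "self", "type": "application/activity+json", "href": href},
        ],
    }
```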

#### [HTTP signatures](https://gitlab.com/gitlab-org/gitlab/-/issues/423083)

Mastodon
[requires HTTP signatures](https://docs.joinmastodon.org/spec/security/#http),
which is yet another standard, in order to make sure no spammer can
impersonate a given server.

This is asymmetric cryptography, with a private key and a public key,
like SSH or PGP. We will need to implement both signing requests and
verifying them. This will be of considerable help when we want various
GitLab instances to communicate later in the epic.
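
The parts of signing that need only the Python standard library can be
sketched as follows, using the draft-cavage HTTP Signatures format that
Mastodon documents. The actual `rsa-sha256` signature over the signing
string needs an RSA library and is left out; the key id, path, and host
are hypothetical. Verification rebuilds the same signing string from the
received headers and checks the signature against the sender's public key,
which Mastodon publishes in the actor profile.

```python
import base64
import hashlib
from email.utils import formatdate


def digest_header(body: bytes) -> str:
    """Value of the Digest header: base64 of the body's SHA-256 hash."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def signing_string(method: str, path: str, host: str, date: str, digest: str) -> str:
    """The text that gets signed, covering (request-target), host, date and digest."""
    return "\n".join([
        f"(request-target): {method.lower()} {path}",
        f"host: {host}",
        f"date: {date}",
        f"digest: {digest}",
    ])


def signature_header(key_id: str, signature_b64: str) -> str:
    """Value of the Signature header once the signing string has been signed."""
    return (
        f'keyId="{key_id}",algorithm="rsa-sha256",'
        f'headers="(request-target) host date digest",'
        f'signature="{signature_b64}"'
    )


body = b'{"type": "Accept"}'
date = formatdate(usegmt=True)  # RFC 1123 date in GMT, also sent as the Date header
to_sign = signing_string("POST", "/users/alice/inbox", "social.example", date, digest_header(body))
# The rsa-sha256 signature of `to_sign`, base64-encoded, goes into signature_header().
```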

### Host allowlist and denylist

To give GitLab instance owners control over potential spam, we need to
let them maintain two mutually exclusive lists of hosts:

- the allowlist: only hosts mentioned in this list can be federated with.
- the denylist: all hosts can be federated with except the ones mentioned
  in that list.

A setting should allow the owner to switch between the allowlist and the denylist.
In the beginning, this can be managed in the Rails console, but it will
ultimately need a section in the admin interface.
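
The check these two lists imply can be sketched in Python. The mode names
and function name are hypothetical; hosts are compared case-insensitively,
and a URL without a host is rejected in both modes.

```python
from urllib.parse import urlsplit

ALLOWLIST, DENYLIST = "allowlist", "denylist"


def can_federate_with(url, mode, hosts):
    """Return True if the host of `url` may be federated with under `mode`."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # malformed or relative URL: never federate
    listed = host in {h.lower() for h in hosts}
    if mode == ALLOWLIST:
        return listed  # only listed hosts are accepted
    if mode == DENYLIST:
        return not listed  # every host except the listed ones
    raise ValueError(f"unknown federation mode: {mode}")
```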

### Limits and rollout

In order to control the load when releasing the feature in the first
months, we're going to set `gitlab.com` to use the allowlist and roll out
federation to a few Fediverse servers at a time, so that we can see how it
handles the load progressively, before ultimately switching to the denylist
(note: there are
[some ongoing discussions](https://gitlab.com/gitlab-org/gitlab/-/issues/426373#note_1584232842)
about whether federation should be activated on `gitlab.com` at all).

We also need to implement limits to make sure the federation is not abused:

- a limit on the number of subscriptions a resource can receive.
- a limit on the number of subscriptions a third-party server can generate.
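
Both limits can be sketched as counters checked before a subscription is
created. This Python sketch keeps the counts in memory; the class name is
hypothetical and the default values are placeholders, not numbers this
proposal has settled on.

```python
from collections import Counter
from urllib.parse import urlsplit

# Placeholder values: the actual limits are not decided.
MAX_SUBSCRIPTIONS_PER_RESOURCE = 10_000
MAX_SUBSCRIPTIONS_PER_REMOTE_HOST = 1_000


class SubscriptionLimits:
    """Tracks subscription counts and refuses ones that exceed either limit."""

    def __init__(self, per_resource=MAX_SUBSCRIPTIONS_PER_RESOURCE,
                 per_host=MAX_SUBSCRIPTIONS_PER_REMOTE_HOST):
        self.per_resource = per_resource
        self.per_host = per_host
        self.by_resource = Counter()
        self.by_host = Counter()

    def try_subscribe(self, resource, subscriber_url):
        """Record the subscription and return True, or return False if a limit is hit."""
        host = urlsplit(subscriber_url).hostname
        if self.by_resource[resource] >= self.per_resource:
            return False  # this resource already has too many subscribers
        if self.by_host[host] >= self.per_host:
            return False  # this remote server generated too many subscriptions
        self.by_resource[resource] += 1
        self.by_host[host] += 1
        return True
```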

### The cross-instance issues and merge requests part

We'll wait until the social following part is done before designing this
part, so that we have hands-on experience with ActivityPub.