---
stage: none
group: unassigned
comments: false
description: 'Object storage: direct_upload consolidation - architecture blueprint.'
---

# Object storage: `direct_upload` consolidation

## Abstract

GitLab stores three classes of user data: database records, Git repositories, and user-uploaded files (referred to as file storage throughout this blueprint).

The user and contributor experience for our file storage has room for significant improvement:

- The initial GitLab setup experience requires the creation and configuration of 13 buckets instead of just 1.
- Features using file storage require contributors to think about both local storage and object storage, which leads to friction and complexity. This often results in broken features and security issues.
- Contributors who work on file storage often also have to write code for Workhorse, Omnibus, and cloud native GitLab (CNG).

## Problem definition

Object storage is a fundamental component of GitLab, providing the underlying implementation for shared, distributed, highly available (HA) file storage.

Over time, we have built support for object storage across the application, solving specific problems in a [multitude of iterations](https://about.gitlab.com/company/team/structure/working-groups/object-storage/#company-efforts-on-uploads). This has led to increased complexity across the board, from development (new features and bug fixes) to installation:

- New GitLab installations require the creation and configuration of several object storage buckets instead of just one, because each group of features requires its own (see the configuration sketch after this list). This hurts the installation experience and new feature adoption, and takes us further away from boring solutions.
- The release of cloud native GitLab required the removal of NFS shared storage and the development of direct upload, a feature that was expanded, milestone after milestone, to several types of uploads, but never enabled globally.
- Today, GitLab supports both local storage and object storage. Local storage only works on single-box installations or with an NFS, which [we no longer recommend](../../../administration/nfs.md) to our users and which is no longer in use on GitLab.com.
- Understanding all the moving parts and the flow is extremely complicated: CarrierWave, Fog, and the Golang S3/Azure SDKs are all in use, and that complicates testing as well.
- Fog and CarrierWave are not maintained to the level of the native SDKs (for example, the AWS S3 SDK), so we have to maintain or monkey-patch those tools to support requested customer features (for example, [issue #242245](https://gitlab.com/gitlab-org/gitlab/-/issues/242245)) that would normally come "for free".
- In many cases, we needlessly copy object storage files around (for example, [issue #285597](https://gitlab.com/gitlab-org/gitlab/-/issues/285597)). As a result, large files (LFS, packages, and so on) are slow to finalize or don't work at all.
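To make the bucket sprawl concrete, the following is a minimal sketch of today's storage-specific (per-feature) style in Omnibus, where every feature class declares its own bucket and repeats its own connection settings. Bucket names are illustrative; the administration documentation is authoritative for the exact keys your version supports.

```ruby
# /etc/gitlab/gitlab.rb (storage-specific style, illustrative values).
# Every feature class declares its own bucket...
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'gitlab-artifacts'

gitlab_rails['lfs_object_store_enabled'] = true
gitlab_rails['lfs_object_store_remote_directory'] = 'gitlab-lfs'

gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_remote_directory'] = 'gitlab-uploads'

# ...and its own connection hash (artifacts_object_store_connection,
# lfs_object_store_connection, and so on), repeated again for packages,
# external diffs, Terraform state, and the rest of the 13 buckets.
```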
## Improvements over the current situation

The following is a brief description of the main directions we can take to remove the pain points affecting our object storage implementation.

This is also available as [a YouTube video](https://youtu.be/X9V_w8hsM8E) recorded for the [Object Storage Working Group](https://about.gitlab.com/company/team/structure/working-groups/object-storage/).

### Simplify GitLab architecture by shipping MinIO

In the beginning, object storage support was a Premium feature, not part of our CE distribution. Because of that, we had to support both local storage and object storage.

With local storage, there is the assumption of a shared storage between components. This can be achieved by having a single-box installation, without HA, or with an NFS, which [we no longer recommend](../../../administration/nfs.md).

We have a testing gap on object storage: exercising it requires Workhorse and MinIO, which are not present in our pipelines, so too much is replaced by a mock implementation. Furthermore, the presence of a shared disk, both in CI and in local development, often hides broken implementations until we deploy on an HA environment.

Shipping MinIO as part of the product will reduce the differences between a cloud and a local installation, standardizing our file storage on a single technology.

The removal of local disk operations will reduce the complexity of development and mitigate several security attack vectors, as we no longer write user-provided data to local storage.

It will also reduce human error: a local object storage will always run in development mode, and any local file disk access should raise a red flag during merge request review.

This effort is described in [this epic](https://gitlab.com/groups/gitlab-org/-/epics/6099).

### Enable direct upload by default on every upload

Because every group of features requires its own bucket, we don't have direct upload enabled everywhere. Contributing a new upload requires coding it in both Ruby on Rails and Go.

Implementing a new feature that does not yet have a dedicated bucket requires the developer to also create a merge request in Omnibus and CNG, as well as coordinate with SREs to configure the new bucket for our own environments.

This also slows down feature adoption, because our users need to reconfigure GitLab and prepare a new bucket in their infrastructure. It also makes the initial installation more complex, feature after feature.

Enabling direct upload by default, with a [consolidated object storage configuration](../../../administration/object_storage.md#consolidated-object-storage-configuration), will reduce the number of merge requests needed to ship a new feature from four to one. It will also remove the need for SRE intervention, as the bucket will always be the same (see the sketch after this section).

This will simplify our development and review processes, as well as the GitLab configuration file. And every user will immediately have access to new features without infrastructure chores.
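For contrast with the storage-specific sketch above, this is roughly what the consolidated configuration looks like: one shared connection, with per-type bucket entries that the catchall-bucket iteration described below would collapse into a single bucket. Provider, credentials, and bucket names are illustrative; the linked administration documentation is authoritative.

```ruby
# /etc/gitlab/gitlab.rb (consolidated style, illustrative values).
# One connection hash shared by every object type.
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'aws_access_key_id' => '<ACCESS_KEY>',
  'aws_secret_access_key' => '<SECRET_KEY>'
}
# Each object type still names its bucket under the shared connection;
# a catchall bucket would reduce these to a single entry.
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'
```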
### Simplify object storage code

Our implementation is built on top of a 3rd-party framework where every object storage client is a 3rd-party library. Unfortunately, some of them are unmaintained. [We have customers who cannot push 5 GB Git LFS objects](https://gitlab.com/gitlab-org/gitlab/-/issues/216442), but with such a vital feature implemented in 3rd-party libraries we are slowed down in fixing it, and we also rely on external maintainers to merge and release fixes.

Before the introduction of direct upload, using the [CarrierWave](https://github.com/carrierwaveuploader/carrierwave) library, _"a gem that provides a simple and extremely flexible way to upload files from Ruby applications"_, was the boring solution. However, this is no longer our use case: we upload files from Workhorse, and we had to [patch CarrierWave's internals](https://gitlab.com/gitlab-org/gitlab/-/issues/285597#note_452696638) to support direct upload.

A brief proposal covering CarrierWave removal and a new streamlined internal upload API is described [in this issue comment](https://gitlab.com/gitlab-org/gitlab/-/issues/213288#note_325358026).

Ideally, we wouldn't need to duplicate object storage clients in Go and Ruby. By removing CarrierWave, we can make use of the officially supported native clients when the provider's S3 compatibility level is not sufficient.
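As a rough sketch of that direction, the following hedged Ruby example presigns a direct upload URL with the official `aws-sdk-s3` gem. The helper name, key layout, and response shape are illustrative assumptions, not the blueprint's actual API or Workhorse's exact authorization contract:

```ruby
require 'aws-sdk-s3'
require 'securerandom'

# Hypothetical helper: when Workhorse asks Rails to authorize a direct
# upload, Rails could respond with a presigned PUT URL produced by the
# official AWS SDK, with no CarrierWave or Fog involved.
def authorize_direct_upload(bucket: 'gitlab-objects')
  key = "tmp/uploads/#{SecureRandom.uuid}" # illustrative key layout

  presigner = Aws::S3::Presigner.new(
    client: Aws::S3::Client.new(region: 'us-east-1') # illustrative region
  )
  url = presigner.presigned_url(:put_object, bucket: bucket, key: key, expires_in: 3600)

  # Workhorse streams the user's file straight to this URL, so the Rails
  # process never touches the file contents. The field names below only
  # loosely mirror Workhorse's authorization response.
  { 'RemoteObject' => { 'ID' => key, 'StoreURL' => url, 'Timeout' => 3600 } }
end
```

Presigning with the native SDK removes both the Fog dependency and the need to proxy file contents through Rails; providers whose S3 compatibility is insufficient would get the same treatment through their own official SDKs.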
## Iterations

In this section we list some possible iterations. This is not intended to be the final roadmap, but rather a conversation starter for the Object Storage Working Group.

1. Create a new catchall bucket and a unified internal API for authorization without CarrierWave.
1. Ship MinIO with Omnibus (CNG images already include it).
1. Expand GitLab-QA to cover all the supported configurations.
1. Deprecate local disk access.
1. Deprecate configurations with multiple buckets.
1. Implement a bucket-to-bucket migration.
1. Migrate the current CarrierWave uploads to the new implementation.
1. On the next major release: remove support for local disk access and configurations with multiple buckets.

### Benefits of the current iteration plan

The current plan is designed to provide tangible benefits from the first step.

With the introduction of the catchall bucket, every upload currently not subject to direct upload will gain its benefits, and new features could be shipped with a single merge request.

Shipping MinIO with Omnibus will allow us to default new installations to object storage, and Omnibus could take care of creating buckets. This will simplify HA installations outside of Kubernetes.

Then we can migrate each CarrierWave uploader to the new implementation, up to the point where a GitLab installation requires only one bucket.

## Additional reading materials

- [Uploads development documentation: The problem description](../../../development/uploads.md#the-problem-description).
- [Speed up the monolith, building a smart reverse proxy in Go](https://archive.fosdem.org/2020/schedule/event/speedupmonolith/): a presentation explaining a bit of Workhorse history and the challenge we faced in releasing the first cloud-native installation.
- [Object Storage improvements epic](https://gitlab.com/groups/gitlab-org/-/epics/483).
- We are moving to the GraphQL API, but [we do not support direct upload](https://gitlab.com/gitlab-org/gitlab/-/issues/280819) there.

## Who

Proposal:

| Role                           | Who                     |
|--------------------------------|-------------------------|
| Author                         | Alessio Caiazza         |
| Architecture Evolution Coach   | Gerardo Lopez-Fernandez |
| Engineering Leader             | Marin Jankovski         |
| Domain Expert / Object storage | Stan Hu                 |
| Domain Expert / Security       | Joern Schneeweisz       |

DRIs:

The DRI for this blueprint is the [Object Storage Working Group](https://about.gitlab.com/company/team/structure/working-groups/object-storage/).