author | Ævar Arnfjörð Bjarmason <avarab@gmail.com> | 2020-12-15 19:22:55 +0300 |
---|---|---|
committer | Ævar Arnfjörð Bjarmason <avarab@gmail.com> | 2020-12-17 16:02:29 +0300 |
commit | 40f953069d4d762f489da2c160fae7c97df478ba | |
tree | 008604e2c2cdaa6c099b06c0b122a8603b85e401 /doc | |
parent | 17c95dc4bf0ae018a986d180aa131e0099fca1fe |
Feature flag rollout doc: rewrite & make better use of template
Rather than duplicating things between the issue template and process
doc, let's cross link the two.
Also elaborate on a lot of things I encountered when enabling my first
features. This is the result of feedback on the initial version of
this [1] and a video chat with @zj-gitlab this morning.
1. https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2910
Diffstat (limited to 'doc')
-rw-r--r-- | doc/PROCESS.md | 184 |
1 files changed, 168 insertions, 16 deletions
diff --git a/doc/PROCESS.md b/doc/PROCESS.md
index a876cc100..eccce5a66 100644
--- a/doc/PROCESS.md
+++ b/doc/PROCESS.md
@@ -6,30 +6,182 @@ Gitaly uses feature flags to safely roll out features in production. Feature
 flags are part of the `context.Context` of each RPC. The `featureflag` package
 will help you with flow control.
 
-Feature flags are [enabled through chatops][enable-flags]. For Gitaly, you have
-to prepend `gitaly_` to your feature flag when enabling or disabling. For example:
-to enable the feature flag "mep_mep", you run:
+Most of this documentation assumes operations on `gitlab.com`. For
+customers, an [HTTP API is available][ff-api].
 
-`/chatops run feature set gitaly_mep_mep true`
+In order to roll out feature flags to `gitlab.com`, you should follow
+the documented rollout process below.
 
-For customers, who don't use chatops, an [HTTP API is available][ff-api].
+Once you have [developed your feature][feature-development] you [start
+by creating an issue for the rollout][issue-for-feature-rollout].
 
-In order to roll out feature flags, you should always follow the documented
-[rollout process][rollout-process]. Most importantly, you should test the
-feature on preproduction environments first and monitor them. Only if no issues
-are observed for an extended amount of time (e.g. one whole day) should you
-incrementally enable the feature flag in production. To change feature flags in
-production, you need to create a change management issue as described in the
-[change management documentation][change-management].
+The "Feature Flag Roll Out" [template for the
+issue][feature-issue-template] has a checklist for the rest of the
+steps.
 
-[enable-flags]: https://docs.gitlab.com/ee/development/feature_flags/controls.html
 [ff-api]: https://docs.gitlab.com/ee/api/features.html#features-flags-api
-[rollout-process]: https://docs.gitlab.com/ee/development/feature_flags/controls.html#rolling-out-changes
-[change-management]: https://about.gitlab.com/handbook/engineering/infrastructure/change-management
+[feature-development]: https://docs.gitlab.com/ee/development/feature_flags/index.html
+[issue-for-feature-rollout]: https://gitlab.com/gitlab-org/gitaly/-/issues/new?issuable_template=Feature%20Flag%20Roll%20Out
+[feature-issue-template]: https://gitlab.com/gitlab-org/gitaly/-/blob/master/.gitlab/issue_templates/Feature%20Flag%20Roll%20Out.md
+
+#### Use and limitations
+
+Feature flags are [enabled through chatops][enable-flags] (which is
+just a consumer [of the API][ff-api]). In
+[`#chat-ops-test`][chan-chat-ops-test] try:
+
+    /chatops run feature list --match gitaly_
+
+If you get a permission error you need to request access first. That
+can be done [in the `#production` channel][production-request-acl].
+
+For Gitaly, you have to prepend `gitaly_` to your feature flag when
+enabling or disabling. For example: to check if
+[`gitaly_go_user_delete_tag`][chan-production] is enabled on staging
+run:
+
+    /chatops run feature get gitaly_go_user_delete_tag --staging
+
+Note that the full set of chatops features for the Rails environment
+does not work in Gitaly. E.g. the [`--user` argument does
+not][bug-user-argument], neither does [enabling by group or
+project][bug-project-argument].
+
+[enable-flags]: https://docs.gitlab.com/ee/development/feature_flags/controls.html
+[chan-chat-ops-test]: https://gitlab.slack.com/archives/CB2S7NNDP
+[production-request-acl]: https://gitlab.slack.com/archives/C101F3796
+[chan-production]: https://gitlab.com/gitlab-org/gitaly/-/issues/3371
+[bug-user-argument]: https://gitlab.com/gitlab-org/gitaly/-/issues/3385
+[bug-project-argument]: https://gitlab.com/gitlab-org/gitaly/-/issues/3386
+
+### Feature flags issue checklist
+
+The rest of this section is help for the individual checklist steps in
+[the issue template][feature-issue-template]. If this is your first
+time doing this you might want to first skip ahead to the help below,
+you'll likely need to file some access requests.
+
+#### Is the required code deployed?
+
+The [/help action on gitlab.com][help-action] shows the currently
+deployed hash. Copy that `HASH` and look at `GITALY_SERVER_VERSION` in
+[gitlab-org/gitlab.git][gitlab-git] to see what the embedded gitaly
+version is. Or in [a gitaly.git checkout][gitaly-git] run this to see
+what commits aren't deployed yet:
+
+    git fetch
+    git shortlog $(curl -s https://gitlab.com/gitlab-org/gitlab/-/raw/HASH/GITALY_SERVER_VERSION)..origin/master
+
+See the [documentation on releases below](#gitaly-releases) for more
+details on the tagging and release process.
+
+[help-action]: https://gitlab.com/help
+[gitlab-git]: https://gitlab.com/gitlab-org/gitlab/
+[gitaly-git]: https://gitlab.com/gitlab-org/gitaly/
+
+#### Enable on staging
+
+##### Prerequisites
+
+You'll need chatops access. See [above](#use-and-limitations).
+
+##### Steps
+
+Run:
+
+`/chatops run feature set gitaly_X true --staging`
+
+Where `X` is the name of your feature.
+
+#### Test on staging
+
+##### Prerequisites
+
+Access to https://staging.gitlab.com/users is not the same as on
+gitlab.com (or signing in with Google on the @gitlab.com account). You
+must [request access to it][staging-access-request].
+
+As of December 2020 clicking "Sign in" on
+https://about.staging.gitlab.com will redirect to https://gitlab.com,
+so make sure to use the `/users` link.
+
+[staging-access-request]: https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/new?issuable_template=Individual_Bulk_Access_Request
+
+##### Steps
+
+Manually use the feature in whatever way exercises the code paths
+being enabled.
+
+Then enable `X` on staging, with:
+
+    /chatops run feature set gitaly_X --staging
+
+##### Discussion
+
+It's a good idea to run the feature for a full day on staging, this is
+because there are daily smoke tests that run daily in that
+environment. These are handled by
+[gitlab-org/gitlab-qa.git][gitlab-qa-git]
+
+[gitlab-qa-git]: https://gitlab.com/gitlab-org/gitlab-qa#how-do-we-use-it
+
+#### Enable in production
+
+##### Prerequisites
+
+Have you waited enough time with the feature running in the staging
+environment? Good!
+
+##### Steps
+
+To enable your `X` feature at 5/25/50 percent, run:
+
+    /chatops run feature set gitaly_X 5
+    /chatops run feature set gitaly_X 25
+    /chatops run feature set gitaly_X 50
+
+And then finally when you're happy it works properly do:
+
+    /chatops run feature set gitaly_X 100
+
+Followed by:
+
+    /chatops run feature set gitaly_X true
+
+Note that you need both the `100` and `true` as separate commands. See
+[the documentation on actor
+gates][actor-gates]
+
+[actor-gates]: https://docs.gitlab.com/ee/development/feature_flags/controls.html#process
+
+##### Discussion
+
+What percentages should you pick and how long should you wait?
+
+It makes sense to be aggressive about getting to 50% and then 100% as
+soon as possible.
+
+You should use lower percentages only as a paranoia check to make sure
+that it e.g. doesn't spew errors at users unexpectedly at a high rate,
+or (e.g. if it invokes a new expensive `git` command) doesn't create
+runaway load on our servers.
+
+But say running at 5% for hours after we've already had sufficient
+data to demonstrate that we won't be spewing errors or taking down the
+site just means you're delaying getting more data to be certain that
+it works properly.
+
+Nobody's better off if you wait 10 hours at 1% to get error data you
+could have waited 1 hour at 10% to get, or just over 10 minutes with
+close monitoring at 50%.
 
 ### Gitaly Releases
 
-Gitaly releases are tagged automatically by [`release-tools`](https://gitlab.com/gitlab-org/release-tools) when a Release Manager tags a GitLab version.
+Gitaly releases are tagged automatically by
+[`release-tools`][release-tools] when a Release Manager tags a GitLab
+version.
+
+[release-tools]: https://gitlab.com/gitlab-org/release-tools
 
 #### Major or minor releases