Feature flag rollout doc: rewrite & make better use of template

Rather than duplicating things between the issue template and process doc, let's cross link the two. Also elaborate on a lot of things I encountered when enabling my first features. This is the result of feedback on the initial version of this [1] and a video chat with @zj-gitlab this morning. 1. https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2910
author: Ævar Arnfjörð Bjarmason <avarab@gmail.com> 2020-12-15 19:22:55 +0300
committer: Ævar Arnfjörð Bjarmason <avarab@gmail.com> 2020-12-17 16:02:29 +0300
commit: 40f953069d4d762f489da2c160fae7c97df478ba (patch)
tree: 008604e2c2cdaa6c099b06c0b122a8603b85e401 /doc
parent: 17c95dc4bf0ae018a986d180aa131e0099fca1fe (diff)
1 files changed, 168 insertions, 16 deletions
diff --git a/doc/PROCESS.md b/doc/PROCESS.md
index a876cc100..eccce5a66 100644
--- a/doc/PROCESS.md
+++ b/doc/PROCESS.md
@@ -6,30 +6,182 @@ Gitaly uses feature flags to safely roll out features in production. Feature
 flags are part of the `context.Context` of each RPC. The `featureflag` package
 will help you with flow control.
 
-Feature flags are [enabled through chatops][enable-flags]. For Gitaly, you have
-to prepend `gitaly_` to your feature flag when enabling or disabling. For example:
-to enable the feature flag "mep_mep", you run:
+Most of this documentation assumes operations on `gitlab.com`. For
+customers, an [HTTP API is available][ff-api].
 
-`/chatops run feature set gitaly_mep_mep true`
+In order to roll out feature flags to `gitlab.com`, you should follow
+the documented rollout process below.
 
-For customers, who don't use chatops, an [HTTP API is available][ff-api].
+Once you have [developed your feature][feature-development], [start
+by creating an issue for the rollout][issue-for-feature-rollout].
 
-In order to roll out feature flags, you should always follow the documented
-[rollout process][rollout-process]. Most importantly, you should test the
-feature on preproduction environments first and monitor them. Only if no issues
-are observed for an extended amount of time (e.g. one whole day) should you
-incrementally enable the feature flag in production. To change feature flags in
-production, you need to create a change management issue as described in the
-[change management documentation][change-management].
+The "Feature Flag Roll Out" [template for the
+issue][feature-issue-template] has a checklist for the rest of the
+steps.
 
-[enable-flags]: https://docs.gitlab.com/ee/development/feature_flags/controls.html
 [ff-api]: https://docs.gitlab.com/ee/api/features.html#features-flags-api
-[rollout-process]: https://docs.gitlab.com/ee/development/feature_flags/controls.html#rolling-out-changes
-[change-management]: https://about.gitlab.com/handbook/engineering/infrastructure/change-management
+[feature-development]: https://docs.gitlab.com/ee/development/feature_flags/index.html
+[issue-for-feature-rollout]: https://gitlab.com/gitlab-org/gitaly/-/issues/new?issuable_template=Feature%20Flag%20Roll%20Out
+[feature-issue-template]: https://gitlab.com/gitlab-org/gitaly/-/blob/master/.gitlab/issue_templates/Feature%20Flag%20Roll%20Out.md
+
+#### Use and limitations
+
+Feature flags are [enabled through chatops][enable-flags] (which is
+just a consumer [of the API][ff-api]). In
+[`#chat-ops-test`][chan-chat-ops-test] try:
+
+    /chatops run feature list --match gitaly_
+
+If you get a permission error, you need to request access first. You
+can do that [in the `#production` channel][production-request-acl].
+
+For Gitaly, you have to prepend `gitaly_` to your feature flag when
+enabling or disabling it. For example, to check if
+[`gitaly_go_user_delete_tag`][chan-production] is enabled on staging,
+run:
+
+    /chatops run feature get gitaly_go_user_delete_tag --staging
+
+Note that not every chatops feature that works for the Rails
+environment works for Gitaly. For example, the [`--user` argument does
+not][bug-user-argument], and neither does [enabling by group or
+project][bug-project-argument].
+
+[enable-flags]: https://docs.gitlab.com/ee/development/feature_flags/controls.html
+[chan-chat-ops-test]: https://gitlab.slack.com/archives/CB2S7NNDP
+[production-request-acl]: https://gitlab.slack.com/archives/C101F3796
+[chan-production]: https://gitlab.com/gitlab-org/gitaly/-/issues/3371
+[bug-user-argument]: https://gitlab.com/gitlab-org/gitaly/-/issues/3385
+[bug-project-argument]: https://gitlab.com/gitlab-org/gitaly/-/issues/3386
+
+### Feature flags issue checklist
+
+The rest of this section explains the individual checklist steps in
+[the issue template][feature-issue-template]. If this is your first
+time doing this, you might want to read through all of the help below
+before you start, since you'll likely need to file some access requests.
+
+#### Is the required code deployed?
+
+The [`/help` page on gitlab.com][help-action] shows the currently
+deployed hash. Copy that `HASH` and look at `GITALY_SERVER_VERSION` at
+that commit in [gitlab-org/gitlab.git][gitlab-git] to see which Gitaly
+version is embedded. Alternatively, in [a gitaly.git
+checkout][gitaly-git], run this to see which commits aren't deployed yet:
+
+    git fetch
+    git shortlog $(curl -s https://gitlab.com/gitlab-org/gitlab/-/raw/HASH/GITALY_SERVER_VERSION)..origin/master
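+
+The two commands above could be wrapped in a small shell helper like
+the following. This is only a sketch: the function names are
+hypothetical, and it assumes that `GITALY_SERVER_VERSION` contains
+either a commit SHA or a release version such as `13.7.0`, whose tag in
+gitaly.git is assumed to be prefixed with `v`.
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical helper: list gitaly.git commits that are not yet deployed
+# to gitlab.com. Function names are illustrative, not an existing tool.
+
+# Map the contents of GITALY_SERVER_VERSION to a gitaly.git revision.
+# Assumption: the file holds either a commit SHA or a release version
+# such as 13.7.0, and release tags in gitaly.git are prefixed with "v".
+gitaly_rev_from_version() {
+    local version="$1"
+    if [[ "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+ ]]; then
+        printf 'v%s\n' "$version"
+    else
+        printf '%s\n' "$version"
+    fi
+}
+
+# Raw URL of GITALY_SERVER_VERSION at a given gitlab-org/gitlab commit.
+gitaly_version_url() {
+    printf 'https://gitlab.com/gitlab-org/gitlab/-/raw/%s/GITALY_SERVER_VERSION\n' "$1"
+}
+
+# Run inside a gitaly.git checkout; $1 is the HASH shown on /help.
+undeployed_commits() {
+    local hash="$1" version rev
+    # -f makes curl fail on HTTP errors instead of printing an error page.
+    version="$(curl -fsS "$(gitaly_version_url "$hash")")" || return 1
+    rev="$(gitaly_rev_from_version "$version")"
+    git fetch origin || return 1
+    git shortlog "${rev}..origin/master"
+}
+```
+
+To use it, source the file from a gitaly.git checkout and run
+`undeployed_commits HASH`.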
+
+See the [documentation on releases below](#gitaly-releases) for more
+details on the tagging and release process.
+
+[help-action]: https://gitlab.com/help
+[gitlab-git]: https://gitlab.com/gitlab-org/gitlab/
+[gitaly-git]: https://gitlab.com/gitlab-org/gitaly/
+
+#### Enable on staging
+
+##### Prerequisites
+
+You'll need chatops access. See [above](#use-and-limitations).
+
+##### Steps
+
+Run:
+
+    /chatops run feature set gitaly_X true --staging
+
+Where `X` is the name of your feature.
+
+#### Test on staging
+
+##### Prerequisites
+
+Access to https://staging.gitlab.com/users is separate from access to
+gitlab.com (or from signing in with Google using your @gitlab.com
+account). You must [request access to it][staging-access-request].
+
+As of December 2020 clicking "Sign in" on
+https://about.staging.gitlab.com will redirect to https://gitlab.com,
+so make sure to use the `/users` link.
+
+[staging-access-request]: https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/new?issuable_template=Individual_Bulk_Access_Request
+
+##### Steps
+
+Manually use the feature in whatever way exercises the code paths
+being enabled.
+
+If it isn't already enabled, enable `X` on staging with:
+
+    /chatops run feature set gitaly_X true --staging
+
+##### Discussion
+
+It's a good idea to run the feature for a full day on staging, because
+smoke tests run daily in that environment. These are handled by
+[gitlab-org/gitlab-qa.git][gitlab-qa-git].
+
+[gitlab-qa-git]: https://gitlab.com/gitlab-org/gitlab-qa#how-do-we-use-it
+
+#### Enable in production
+
+##### Prerequisites
+
+Have you waited enough time with the feature running in the staging
+environment? Good!
+
+##### Steps
+
+To enable your `X` feature at 5/25/50 percent, run:
+
+    /chatops run feature set gitaly_X 5
+    /chatops run feature set gitaly_X 25
+    /chatops run feature set gitaly_X 50
+
+Finally, once you're confident that it works properly, run:
+
+    /chatops run feature set gitaly_X 100
+
+Followed by:
+
+    /chatops run feature set gitaly_X true
+
+Note that you need both `100` and `true`, as separate commands. See
+[the documentation on actor gates][actor-gates].
+
+[actor-gates]: https://docs.gitlab.com/ee/development/feature_flags/controls.html#process
+
+##### Discussion
+
+What percentages should you pick and how long should you wait?
+
+It makes sense to be aggressive about getting to 50% and then 100% as
+soon as possible.
+
+Use lower percentages only as a paranoia check, e.g. to make sure that
+the feature doesn't unexpectedly spew errors at users at a high rate,
+or (e.g. if it invokes a new expensive `git` command) doesn't create
+runaway load on our servers.
+
+But running at, say, 5% for hours after you already have enough data
+to show that it won't spew errors or take down the site just delays
+collecting the data you need to be certain that it works properly.
+
+Nobody's better off if you wait 10 hours at 1% to get error data you
+could have waited 1 hour at 10% to get, or just over 10 minutes with
+close monitoring at 50%.
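+
+As a back-of-the-envelope check, assume the amount of error data you
+collect grows with the percentage `p` of requests served by the new
+code multiplied by the time `t` it has been running. Under that
+simplifying assumption, all three options above yield the same amount:
+
+```math
+\begin{aligned}
+E &= p \cdot t \\
+1\% \cdot 10\,\text{h} &= 10\% \cdot 1\,\text{h} = 10\ \%\cdot\text{h} \\
+t_{50\%} &= \frac{10\ \%\cdot\text{h}}{50\%} = 0.2\,\text{h} = 12\,\text{min}
+\end{aligned}
+```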
 
 ### Gitaly Releases
 
-Gitaly releases are tagged automatically by [`release-tools`](https://gitlab.com/gitlab-org/release-tools) when a Release Manager tags a GitLab version.
+Gitaly releases are tagged automatically by
+[`release-tools`][release-tools] when a Release Manager tags a GitLab
+version.
+
+[release-tools]: https://gitlab.com/gitlab-org/release-tools
 
 #### Major or minor releases