---
stage: Growth
group: Telemetry
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
---

# Snowplow Guide

This guide provides details about how Snowplow works. It includes the following sections:

1. [What is Snowplow](#what-is-snowplow)
1. [Snowplow schema](#snowplow-schema)
1. [Enabling Snowplow](#enabling-snowplow)
1. [Snowplow request flow](#snowplow-request-flow)
1. [Implementing Snowplow JS (Frontend) tracking](#implementing-snowplow-js-frontend-tracking)
1. [Implementing Snowplow Ruby (Backend) tracking](#implementing-snowplow-ruby-backend-tracking)
1. [Developing and testing Snowplow](#developing-and-testing-snowplow)

For more information about Telemetry, see:

- [Telemetry Guide](index.md)
- [Usage Ping Guide](usage_ping.md)

More useful links:

- [Telemetry Direction](https://about.gitlab.com/direction/telemetry/)
- [Data Analysis Process](https://about.gitlab.com/handbook/business-ops/data-team/#-data-analysis-process)
- [Data for Product Managers](https://about.gitlab.com/handbook/business-ops/data-team/data-for-product-managers/)
- [Data Infrastructure](https://about.gitlab.com/handbook/business-ops/data-team/data-infrastructure/)

## What is Snowplow

Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application.

From [Snowplow's documentation](https://github.com/snowplow/snowplow), Snowplow consists of six loosely-coupled sub-systems:

- **Trackers** fire Snowplow events. Currently Snowplow has 12 trackers, covering web, mobile, desktop, server and IoT
- **Collectors** receive Snowplow events from trackers. Currently we have three different event collectors, sinking events either to Amazon S3, Apache Kafka or Amazon Kinesis
- **Enrich** cleans up the raw Snowplow events, enriches them and puts them into storage.
  Currently we have a Hadoop-based enrichment process, and a Kinesis- or Kafka-based process
- **Storage** is where the Snowplow events live. Currently we store the Snowplow events in a flat file structure on S3, and in the Redshift and PostgreSQL databases
- **Data modeling** is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We have data models for Redshift and Looker
- **Analytics** are performed on the Snowplow events or on the aggregate tables.

![snowplow_flow](../img/snowplow_flow.png)

## Snowplow schema

We currently have many definitions of Snowplow's schema. We have an active issue to [standardize this schema](https://gitlab.com/gitlab-org/gitlab/-/issues/207930) including the following definitions:

- Frontend and backend taxonomy as listed below
- [Feature instrumentation taxonomy](https://about.gitlab.com/handbook/product/feature-instrumentation/#taxonomy)
- [Self describing events](https://github.com/snowplow/snowplow/wiki/Custom-events#self-describing-events)
- [Iglu schema](https://gitlab.com/gitlab-org/iglu/)
- [Snowplow authored events](https://github.com/snowplow/snowplow/wiki/Snowplow-authored-events)

## Enabling Snowplow

Tracking can be enabled at:

- The instance level, which enables tracking on both the frontend and backend layers.
- The user level, though user tracking can be disabled on a per-user basis. GitLab tracking respects the [Do Not Track](https://www.eff.org/issues/do-not-track) standard, so any user who has enabled the Do Not Track option in their browser is also not tracked at the user level.

We use Snowplow for the majority of our tracking strategy, and it is enabled on GitLab.com. On a self-managed instance, Snowplow can be enabled by navigating to:

- **Admin Area > Settings > Integrations** in the UI.
- `admin/application_settings/integrations` in your browser.

The following configuration is required:

| Name          | Value                     |
| ------------- | ------------------------- |
| Collector     | `snowplow.trx.gitlab.net` |
| Site ID       | `gitlab`                  |
| Cookie domain | `.gitlab.com`             |

## Snowplow request flow

The following example shows a basic request/response flow between the Snowplow JS and Ruby trackers on GitLab.com, [the GitLab.com Snowplow Collector](https://about.gitlab.com/handbook/engineering/infrastructure/library/snowplow/), GitLab's S3 Bucket, GitLab's Snowflake Data Warehouse, and Sisense:

```mermaid
sequenceDiagram
    participant Snowplow JS (Frontend)
    participant Snowplow Ruby (Backend)
    participant GitLab.com Snowplow Collector
    participant S3 Bucket
    participant Snowflake DW
    participant Sisense Dashboards
    Snowplow JS (Frontend) ->> GitLab.com Snowplow Collector: FE Tracking event
    Snowplow Ruby (Backend) ->> GitLab.com Snowplow Collector: BE Tracking event
    loop Process using Kinesis Stream
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Log raw events
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Enrich events
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Write to disk
    end
    GitLab.com Snowplow Collector ->> S3 Bucket: Kinesis Firehose
    S3 Bucket->>Snowflake DW: Import data
    Snowflake DW->>Snowflake DW: Transform data using dbt
    Snowflake DW->>Sisense Dashboards: Data available for querying
```

## Implementing Snowplow JS (Frontend) tracking

GitLab provides `Tracking`, an interface that wraps the [Snowplow JavaScript Tracker](https://github.com/snowplow/snowplow/wiki/javascript-tracker) for tracking custom events. There are a few ways to use tracking, but each generally requires, at a minimum, a `category` and an `action`. Additional data can be provided that adheres to our [Feature instrumentation taxonomy](https://about.gitlab.com/handbook/product/feature-instrumentation/#taxonomy).
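As a rough illustration of the event shape described above (a `category`, an `action`, and optional additional data), the following JavaScript sketch assembles a plain event object. `buildTrackingEvent` is a hypothetical helper written for this guide, not part of the `Tracking` interface, and it does not send anything to Snowplow. The `projects:show` category is an assumed example value.

```javascript
// Hypothetical helper for illustration only; not GitLab's `Tracking` interface.
// It shows the minimum shape of an event: a category, an action, and optional data.
// The sketch requires an explicit category because it is meant to run outside a
// browser, where a page-derived default is not available.
function buildTrackingEvent(category, action = 'generic', data = {}) {
  if (!category) {
    throw new Error('A category is required in this sketch.');
  }
  // Keep only the optional fields named in the taxonomy, skipping unset ones.
  const { label, property, value, context } = data;
  const extras = Object.fromEntries(
    Object.entries({ label, property, value, context }).filter(([, v]) => v !== undefined),
  );
  return { category, action, ...extras };
}

// Example: a button click with a label and a property.
const event = buildTrackingEvent('projects:show', 'click_button', {
  label: 'template_preview',
  property: 'my-template',
});
console.log(event);
```

Keeping the optional fields in a separate `data` object means the two core fields stay positional and easy to scan, while the taxonomy fields can be added or omitted without changing the call shape.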
| field      | type   | default value                | description |
|:-----------|:-------|:-----------------------------|:------------|
| `category` | string | `document.body.dataset.page` | Page or subsection of a page that events are being captured within. |
| `action`   | string | `'generic'`                  | Action the user is taking. Clicks should be `click` and activations should be `activate`, so for example, focusing a form field would be `activate_form_input`, and clicking a button would be `click_button`. |
| `data`     | object | `{}`                         | Additional data such as `label`, `property`, `value`, and `context` as described [in our Feature Instrumentation taxonomy](https://about.gitlab.com/handbook/product/feature-instrumentation/#taxonomy). |

### Tracking in HAML (or Vue Templates)

When working within HAML (or Vue templates) we can add `data-track-*` attributes to elements of interest. All elements that have a `data-track-event` attribute will automatically have event tracking bound on clicks.

Below is an example of `data-track-*` attributes assigned to a button:

```haml
%button.btn{ data: { track: { event: "click_button", label: "template_preview", property: "my-template" } } }
```

```html