Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/sec/index.md')
-rw-r--r--doc/development/sec/index.md90
1 files changed, 90 insertions, 0 deletions
diff --git a/doc/development/sec/index.md b/doc/development/sec/index.md
index c9805044e58..fc13c960451 100644
--- a/doc/development/sec/index.md
+++ b/doc/development/sec/index.md
@@ -64,7 +64,97 @@ After the data is available as a Report Artifact it can be processed by the GitL
Depending on the context, the security reports may be stored either in the database or stay as Report Artifacts for on-demand access.
+#### Security report ingestion overview
+
+For details on how GitLab processes the reports generated by the scanners, see
+[Security report ingestion overview](security_report_ingestion_overview.md).
+
## CI/CD template development
While CI/CD templates are the responsibility of the Verify section, many are critical to the Sec Section's feature usage.
If you are working with CI/CD templates, please read the [development guide for GitLab CI/CD templates](../cicd/templates.md).
+
+## Importance of the primary identifier
+
+Within analyzer JSON reports, the [`identifiers` field](../integrations/secure.md#identifiers) contains a collection of types and categories by which
+a vulnerability can be described (that is, a CWE family).
+
+The first item in the `identifiers` collection is known as the [primary identifier](../../user/application_security/terminology#primary-identifier),
+a critical component to both describing and tracking vulnerabilities.
+
+In most other cases, the `identifiers` collection is unordered, where the remaining secondary identifiers act as metadata for grouping vulnerabilities
+(see [Analyzer vulnerability translation](#analyzer-vulnerability-translation) below for the exception).
+
+Any time the primary identifier changes and a project pipeline is re-run, ingestion of the new report will “orphan” the previous DB record.
+Because our processing logic relies on generating a delta of two different vulnerabilities, it can end up looking rather confusing. For example:
+
+[!Screenshot of primary identifier mismatch in MR widget](img/primary_identifier_changed_v15_6.png)
+
+After being [merged](../integrations/secure.md#tracking-and-merging-vulnerabilities), the previous vulnerability is listed as "remediated" and the introduced as ["detected"](../../user/application_security/vulnerabilities/index.md#vulnerability-status-values).
+
+### Guiding principles for ensuring primary identifier stability
+
+- A primary identifier should never change unless we have a compelling reason.
+- Analyzer supporting vulnerability translation must include the legacy primary identifiers in a secondary position to prevent “orphaning” of results.
+- Beyond the primary identifier, the order of secondary identifiers does not matter.
+- The identifier is unique based on a combination of the `Type` and `Value` fields (see [identifier fingerprint](https://gitlab.com/gitlab-org/gitlab/-/blob/v15.5.1-ee/lib/gitlab/ci/reports/security/identifier.rb#L63)).
+- If we change the primary identifier, rolling back analyzers to previous versions will not fix the orphaned results. The data previously ingested into our database is an artifact of previous jobs with few ways of automating data migrations.
+
+### Analyzer vulnerability translation
+
+In the case of SAST's semgrep analyzer, there is a secondary identifier of particular importance: the identifier linking the report’s vulnerability
+to the legacy analyzer (that is, bandit or eslint).
+
+To [enable vulnerability translation](../../user/application_security/sast/analyzers.md#vulnerability-translation)
+the semgrep analyzer relies on a secondary identifier exactly matching the primary identifier of the legacy analyzer.
+
+For example, when [`eslint`](https://gitlab.com/gitlab-org/security-products/analyzers/eslint) was previously used to generate vulnerability records,
+the [`semgrep`](https://gitlab.com/gitlab-org/security-products/analyzers/semgrep) analyzer must produce an identifier collection containing the
+original eslint primary identifier.
+
+Given the original `eslint` report:
+
+```json
+{
+ "version": "14.0.4",
+ "vulnerabilities": [
+ {
+ "identifiers": [
+ {
+ "type": "eslint_rule_id",
+ "name": "ESLint rule ID security/detect-eval-with-expression",
+ "value": "security/detect-eval-with-expression"
+ }
+ ]
+ }
+ ]
+}
+```
+
+The corresponding semgrep report must contain the `eslint_rule_id`:
+
+```json
+{
+ "version": "14.0.4",
+ "vulnerabilities": [
+ {
+ "identifiers": [
+ {
+ "type": "semgrep_id",
+ "name": "eslint.detect-eval-with-expression",
+ "value": "eslint.detect-eval-with-expression",
+ "url": "https://semgrep.dev/r/gitlab.eslint.detect-eval-with-expression"
+ },
+ {
+ "type": "eslint_rule_id",
+ "name": "ESLint rule ID security/detect-eval-with-expression",
+ "value": "security/detect-eval-with-expression"
+ }
+ ]
+ }
+ ]
+}
+```
+
+[Tracking of vulnerabilities](../integrations/secure.md#tracking-and-merging-vulnerabilities) relies on a combination of the two identifiers
+to remap DB records previously generated with the legacy analyzers to those generated with the new `semgrep` ones.