Add latest changes from gitlab-org/gitlab@master

author: GitLab Bot <gitlab-bot@gitlab.com> 2024-01-24 09:07:44 +0300
committer: GitLab Bot <gitlab-bot@gitlab.com> 2024-01-24 09:07:44 +0300
commit: 4dcdd5bebb55bd5522ec180070d4d265e00943b5
tree: 33760c353dd9dc97d0e5a64b107579b89d9110ea /doc
parent: 3f29b140ab13fd23ed35e759fd2bb6f41ba788ac
5 files changed, 69 insertions, 33 deletions
diff --git a/doc/ci/components/index.md b/doc/ci/components/index.md
index 727baa9be18..55f0cc67a87 100644
--- a/doc/ci/components/index.md
+++ b/doc/ci/components/index.md
@@ -27,6 +27,10 @@ but have several advantages:
 Instead of creating your own components, you can also search for published components
 that have the functionality you need in the [CI/CD Catalog](#cicd-catalog).
 
+<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
+For an introduction and hands-on examples, see [Efficient DevSecOps workflows with reusable CI/CD components](https://www.youtube.com/watch?v=-yvfSFKAgbA).
+<!-- Video published on 2024-01-22. DRI: Developer Relations, https://gitlab.com/groups/gitlab-com/marketing/developer-relations/-/epics/399 -->
+
 ## Component project
 
 A component project is a GitLab project with a repository that hosts one or more components.
@@ -301,7 +305,7 @@ For example:
 ```yaml
 include:
   # include the component located in the current project from the current SHA
-  - component: gitlab.com/$CI_PROJECT_PATH/my-component@$CI_COMMIT_SHA
+  - component: $CI_SERVER_HOST/$CI_PROJECT_PATH/my-component@$CI_COMMIT_SHA
     inputs:
       stage: build
 
@@ -315,7 +319,7 @@ ensure-job-added:
   image: badouralix/curl-jq
   script:
     - |
-      route="https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/jobs"
+      route="${CI_API_V4_URL}/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/jobs"
       count=`curl --silent --header "PRIVATE-TOKEN: $API_TOKEN" $route | jq 'map(select(.name | contains("component-job"))) | length'`
       if [ "$count" != "1" ]; then
         exit 1
diff --git a/doc/development/ai_features/duo_chat.md b/doc/development/ai_features/duo_chat.md
index d7f88997fca..1230b46f093 100644
--- a/doc/development/ai_features/duo_chat.md
+++ b/doc/development/ai_features/duo_chat.md
@@ -12,7 +12,7 @@ NOTE:
 Use [this snippet](https://gitlab.com/gitlab-org/gitlab/-/snippets/2554994) for help automating the following section.
 
 1. [Enable Anthropic API features](index.md#configure-anthropic-access).
-1. [Ensure the embedding database is configured](index.md#set-up-the-embedding-database).
+1. [Ensure the embedding database is configured](index.md#embeddings-database).
 1. Ensure that your current branch is up-to-date with `master`.
 1. Enable the feature in Rails console: `Feature.enable(:tanuki_bot_breadcrumbs_entry_point)`
 
diff --git a/doc/development/ai_features/glossary.md b/doc/development/ai_features/glossary.md
index be856639b83..6c3966a054a 100644
--- a/doc/development/ai_features/glossary.md
+++ b/doc/development/ai_features/glossary.md
@@ -39,6 +39,15 @@ to AI that you think could benefit from being in this list, add it!
   piece of information, which helps to clarify its meaning and implications.
   For GitLab Duo Chat, context is the attributes of the Issue or Epic being
   referenced in a user question.
+- **Embeddings**: In the context of machine learning and large language models,
+  embeddings refer to a technique used to represent words, phrases, or even
+  entire documents as dense numerical vectors in a continuous vector space.
+  At GitLab, [we use Vertex AI's Embeddings API](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/129930)
+  to create a vector representation of GitLab documentation. These
+  embeddings are stored in the `vertex_gitlab_docs` database table in the
+  `embeddings` database. The embeddings search is done in Postgres using the
+  `vector` extension. The vertex embeddings database is updated based on the
+  latest version of GitLab documentation on a daily basis by running `Llm::Embedding::GitlabDocumentation::CreateEmptyEmbeddingsRecordsWorker` as a cronjob.
 - **Golden Questions**: a small subset of the types of questions we think a user
   should be able to ask GitLab Duo Chat. Used to generate data for Chat evaluation.
   [Questions for Chat Beta](https://gitlab.com/groups/gitlab-org/-/epics/10550#what-the-user-can-ask).
diff --git a/doc/development/ai_features/index.md b/doc/development/ai_features/index.md
index 8f9ffa20fe7..f9431f076f1 100644
--- a/doc/development/ai_features/index.md
+++ b/doc/development/ai_features/index.md
@@ -77,20 +77,6 @@ RAILS_ENV=development bundle exec rake gitlab:duo:setup['<test-group-name>']
    1. For Vertex, follow the [instructions below](#configure-gcp-vertex-access).
    1. For Anthropic, create an access request
 
-### Set up the embedding database
-
-For features that use the embedding database, additional setup is needed.
-
-1. Enable [`pgvector`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/pgvector.md#enable-pgvector-in-the-gdk) in GDK
-1. Enable the embedding database in GDK
-
-   ```shell
-     gdk config set gitlab.rails.databases.embedding.enabled true
-   ```
-
-1. Run `gdk reconfigure`
-1. Run database migrations to create the embedding database
-
 ### Configure GCP Vertex access
 
 In order to obtain a GCP service key for local development, follow the steps below:
@@ -118,38 +104,71 @@ Gitlab::CurrentSettings.update(vertex_ai_project: PROJECT_ID)
 Gitlab::CurrentSettings.update!(anthropic_api_key: <insert API key>)
 ```
 
-### Populating embeddings and using embeddings fixture
+### Embeddings database
 
-Embeddings are generated through VertexAI text embeddings endpoint. The sections below explain how to populate
-embeddings in the DB or extract embeddings to be used in specs.
+Embeddings are generated through the [VertexAI text embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings). The sections
+below explain how to populate embeddings in the DB or extract embeddings to be
+used in specs.
 
-#### VertexAI embeddings
+#### Set up
 
-To seed your development database with the embeddings for GitLab Documentation,
-you may use the pre-generated embeddings and a Rake task.
+1. Enable [`pgvector`](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/pgvector.md#enable-pgvector-in-the-gdk) in GDK
+1. Enable the embedding database in GDK
+
+   ```shell
+     gdk config set gitlab.rails.databases.embedding.enabled true
+   ```
+
+1. Run `gdk reconfigure`
+1. Run database migrations to create the embedding database
+
+   ```shell
+     RAILS_ENV=development bin/rails db:migrate
+   ```
+
+#### Populate
+
+Seed your development database with the embeddings for GitLab Documentation
+using this Rake task:
 
 ```shell
 RAILS_ENV=development bundle exec rake gitlab:llm:embeddings:vertex:seed
 ```
 
-The DBCleaner gem we use clear the database tables before each test runs.
-Instead of fully populating the table `vertex_gitlab_docs` where we store VertexAI embeddings for the documentations,
-we can add a few selected embeddings to the table from a pre-generated fixture.
+This Rake Task populates the embeddings database with a vectorized
+representation of all GitLab Documentation. The file the Rake Task uses as a
+source is a snapshot of GitLab Documentation at some point in the past and is
+not updated regularly. As a result, it is helpful to know that this seed task
+creates embeddings based on GitLab Documentation that is out of date. Slightly
+outdated documentation embeddings are sufficient for the development
+environment, which is the use-case for the seed task.
 
-For instance, to test that the question "How can I reset my password" is correctly
-retrieving the relevant embeddings and answered, we can extract the top N closet embeddings
-to the question into a fixture and only restore a small number of embeddings quickly.
-To facilitate an extraction process, a Rake task has been written.
-You can add or remove the questions needed to be tested in the Rake task and run the task to generate a new fixture.
+When writing or updating tests related to embeddings, you may want to update the
+embeddings fixture file:
 
 ```shell
 RAILS_ENV=development bundle exec rake gitlab:llm:embeddings:vertex:extract_embeddings
 ```
 
-#### Using embeddings in specs
+#### Use embeddings in specs
+
+The `seed` Rake Task populates the development database with embeddings for all GitLab
+Documentation. The `extract_embeddings` Rake Task populates a fixture file with a subset
+of embeddings.
+
+The set of questions listed in the Rake Task itself determines
+which embeddings are pulled into the fixture file. For example, one of the
+questions is "How can I reset my password?" The `extract_embeddings` Task
+pulls the most relevant embeddings for this question from the development
+database (which has data from the `seed` Rake Task) and saves those embeddings
+in `ee/spec/fixtures/vertex_embeddings`. This fixture is used in tests related
+to embeddings.
+
+If you would like to change any of the questions supported in embeddings specs,
+update and re-run the `extract_embeddings` Rake Task.
 
 In the specs where you need to use the embeddings,
-use the RSpec config hook `:ai_embedding_fixtures` on a context.
+use the RSpec `:ai_embedding_fixtures` metadata.
 
 ```ruby
 context 'when asking about how to use GitLab', :ai_embedding_fixtures do
diff --git a/doc/user/project/push_options.md b/doc/user/project/push_options.md
index a129e6f6cd0..bf5bf856631 100644
--- a/doc/user/project/push_options.md
+++ b/doc/user/project/push_options.md
@@ -32,6 +32,10 @@ For server-side controls and enforcement of best practices, see
 
 You can use push options to skip a CI/CD pipeline, or pass CI/CD variables.
 
+NOTE:
+Push options are not available for merge request pipelines. For more information,
+see [issue 373212](https://gitlab.com/gitlab-org/gitlab/-/issues/373212).
+
 | Push option                    | Description | Example |
 |--------------------------------|-------------|---------|
 | `ci.skip`                      | Do not create a CI/CD pipeline for the latest push. Skips only branch pipelines and not [merge request pipelines](../../ci/pipelines/merge_request_pipelines.md). This does not skip pipelines for CI/CD integrations, such as Jenkins. | `git push -o ci.skip` |