---
stage: AI-powered
group: AI Framework
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# AI features based on 3rd-party integrations

[Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/117296) in GitLab 15.11.

## Features

- Async execution of long-running API requests
  - GraphQL action starts the request
  - Background workers execute the request
  - GraphQL subscriptions deliver results back in real time
- Abstraction layer for
  - OpenAI
  - Google Vertex AI
  - Anthropic
- Rate limiting
- Circuit breaker
- Multi-level feature flags
- License checks on the group level
- Snowplow execution tracking
- Tracking of tokens spent on Prometheus
- Configuration for moderation checks of inputs
- Automatic Markdown rendering of responses
- Centralized group-level settings for experiment and third-party features
- Experimental API endpoints for exploration of AI APIs by GitLab team members without the need for credentials
  - OpenAI
  - Google Vertex AI
  - Anthropic

## Feature flags

Apply the following two feature flags to any AI feature work:

- A general flag that applies to all AI features.
- A flag specific to that feature. The feature flag name [must be different](../feature_flags/index.md#feature-flags-for-licensed-features) than the licensed feature name.

See the [feature flag tracker](https://gitlab.com/gitlab-org/gitlab/-/issues/405161) for the list of all feature flags and how to use them.

## Implement a new AI action

To implement a new AI action, connect to the preferred AI provider. You can connect to this API using either the:

- Experimental REST API.
- Abstraction layer.

All AI features are experimental.

## Test AI features locally

NOTE:
Use [this snippet](https://gitlab.com/gitlab-org/gitlab/-/snippets/2554994) for help automating the following section.

1. Enable the required general feature flags:

   ```ruby
   Feature.enable(:ai_related_settings)
   Feature.enable(:openai_experimentation)
   ```

1. Set up the GDK to [simulate SaaS](../ee_features.md#simulate-a-saas-instance) and ensure the group you want to test has an Ultimate license.
1. Enable **Experiment features** and **Third-party AI services** (you can also set these from the Rails console, as shown in the sketch after this list):
   1. Go to the group with the Ultimate license.
   1. Select **Settings > General** and expand **Permissions and group features**.
   1. Enable **Experiment features**.
   1. Enable **Third-party AI services**.
1. Enable the specific feature flag for the feature you want to test.
1. Set the required access token. To receive an access token:
   1. For Vertex, follow the [instructions below](#configure-gcp-vertex-access).
   1. For all other providers, like Anthropic or OpenAI, create an access request where `@m_gill`, `@wayne`, and `@timzallmann` are the tech stack owners.
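If you prefer to toggle the group settings from step 3 in the Rails console instead of the UI, a sketch like the following may work. It assumes the two settings live on the group's `namespace_settings` record, matching the `experiment_features_enabled` and `third_party_ai_features_enabled` attributes referenced later in this document:

```ruby
# Console equivalent of the UI steps above (attribute names assumed,
# not verified; the group path is hypothetical).
group = Group.find_by_full_path('your-test-group')
group.namespace_settings.update!(
  experiment_features_enabled: true,
  third_party_ai_features_enabled: true
)
```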
### Set up the embedding database

NOTE:
Use [this snippet](https://gitlab.com/gitlab-org/gitlab/-/snippets/2554994) for help automating the following section.

For features that use the embedding database, additional setup is needed.

1. Enable [pgvector](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/pgvector.md#enable-pgvector-in-the-gdk) in GDK.
1. Enable the embedding database in GDK:

   ```shell
   gdk config set gitlab.rails.databases.embedding.enabled true
   ```

1. Run `gdk reconfigure`.
1. Run database migrations to create the embedding database.

### Setup for GitLab documentation chat (legacy chat)

To populate the embedding database for GitLab chat:

1. Open a Rails console.
1. Run [this script](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/10588#note_1373586079) to populate the embedding database.

### Configure GCP Vertex access

To obtain a GCP service key for local development, follow these steps:

- Create a sandbox GCP environment by visiting [this page](https://about.gitlab.com/handbook/infrastructure-standards/#individual-environment) and following the instructions, or request access to our existing group environment by using [this template](https://gitlab.com/gitlab-com/it/infra/issue-tracker/-/issues/new?issuable_template=gcp_group_account_iam_update_request).
- In the GCP console, go to **IAM & Admin > Service Accounts** and select **Create service account**.
- Name the service account something specific to what you're using it for. Select **Create and Continue**. Under **Grant this service account access to project**, select the role **Vertex AI User**. Select **Continue**, then **Done**.
- Select your new service account and **Manage keys > Add Key > Create new key**. This downloads the **private** JSON credentials for your service account.
- If you are using your own project, you may also need to enable the Vertex AI API:
  1. Go to **APIs & Services > Enabled APIs & services**.
  1. Select **+ Enable APIs and Services**.
  1. Search for `Vertex AI API`.
  1. Select **Vertex AI API**, then select **Enable**.
- Open the Rails console. Update the settings to:

```ruby
Gitlab::CurrentSettings.update(vertex_ai_credentials: File.read('/YOUR_FILE.json'))

# Note: These credential examples will not work locally for all models
Gitlab::CurrentSettings.update(vertex_ai_host: "<root-domain>") # Example: us-central1-aiplatform.googleapis.com
Gitlab::CurrentSettings.update(vertex_ai_project: "<project-id>") # Example: cloud-large-language-models
```

Internal team members can [use this snippet](https://gitlab.com/gitlab-com/gl-infra/production/-/snippets/2541742) for help configuring these endpoints.

### Configure OpenAI access

```ruby
Gitlab::CurrentSettings.update(openai_api_key: "<open-ai-key>")
```

### Configure Anthropic access

```ruby
Gitlab::CurrentSettings.update!(anthropic_api_key: "<anthropic-api-key>")
```

### Populating embeddings and using embeddings fixture

To seed your development database with the embeddings for GitLab documentation,
you can use the pre-generated embeddings and a Rake task:

```shell
RAILS_ENV=development bundle exec rake gitlab:llm:embeddings:seed_pre_generated
```

The DBCleaner gem we use clears the database tables before each test runs.
Instead of fully populating the `tanuki_bot_mvc` table, where we store embeddings for the documentation,
we can add a few selected embeddings to the table from a pre-generated fixture.

For instance, to test that the question "How can I reset my password" correctly
retrieves the relevant embeddings and is answered, we can extract the top N closest embeddings
to the question into a fixture and restore only that small number of embeddings quickly.
To facilitate this extraction process, a Rake task has been written.
You can add or remove the questions that need to be tested in the Rake task, then run the task to generate a new fixture:

```shell
RAILS_ENV=development bundle exec rake gitlab:llm:embeddings:extract_embeddings
```

In the specs where you need to use the embeddings,
use the RSpec config hook `:ai_embedding_fixtures` on a context:

```ruby
context 'when asking about how to use GitLab', :ai_embedding_fixtures do
  # ...examples
end
```

### Working with GitLab Duo Chat

View [guidelines](duo_chat.md) for working with GitLab Duo Chat.

## Experimental REST API

Use the [experimental REST API endpoints](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/api/ai/experimentation) to quickly experiment and prototype AI features.

The endpoints are:

- `https://gitlab.example.com/api/v4/ai/experimentation/openai/completions`
- `https://gitlab.example.com/api/v4/ai/experimentation/openai/embeddings`
- `https://gitlab.example.com/api/v4/ai/experimentation/openai/chat/completions`
- `https://gitlab.example.com/api/v4/ai/experimentation/anthropic/complete`
- `https://gitlab.example.com/api/v4/ai/experimentation/vertex/chat`

These endpoints are only for prototyping, not for rolling features out to customers.

In your local development environment, you can experiment with these endpoints with the feature flag enabled:

```ruby
Feature.enable(:ai_experimentation_api)
```

On production, the experimental endpoints are only available to GitLab team members. Use a
[GitLab API token](../../user/profile/personal_access_tokens.md) to authenticate.

## Abstraction layer

### GraphQL API

To connect to the AI provider API using the abstraction layer, use an extendable GraphQL API called
[`aiAction`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/graphql/mutations/ai/action.rb).
The `input` accepts key/value pairs, where the `key` is the action that needs to be performed.
We only allow one AI action per mutation request.

Example of a mutation:

```graphql
mutation {
  aiAction(input: {summarizeComments: {resourceId: "gid://gitlab/Issue/52"}}) {
    clientMutationId
  }
}
```

As an example, assume we want to build an "explain code" action. To do this, we extend the `input` with a new key,
`explainCode`. The mutation would look like this:

```graphql
mutation {
  aiAction(input: {explainCode: {resourceId: "gid://gitlab/MergeRequest/52", code: "foo() { console.log()" }}) {
    clientMutationId
  }
}
```

The GraphQL API then uses the [OpenAI client](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/gitlab/llm/open_ai/client.rb)
to send the request.

Remember that other clients are available, and you are not required to use OpenAI.

#### How to receive a response

The API requests to AI providers are handled in a background job. We therefore do not keep the request alive, and the frontend needs to match the request to the response from the subscription.

WARNING:
Determining the right response for a request can cause problems when only `userId` and `resourceId` are used. For example, when two AI features use the same `userId` and `resourceId`, both subscriptions receive the response from each other. To prevent this interference, we introduced the `clientSubscriptionId`.

To match a response on the `aiCompletionResponse` subscription, you can provide a `clientSubscriptionId` to the `aiAction` mutation.

- The `clientSubscriptionId` should be unique per feature and within a page to not interfere with other AI features. We recommend using a `UUID`.
- The `clientSubscriptionId` is used for broadcasting the `aiCompletionResponse` only when it is provided as part of the `aiAction` mutation.
- If the `clientSubscriptionId` is not provided, only `userId` and `resourceId` are used for the `aiCompletionResponse`.
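The ID is typically generated client-side, for example with JavaScript's `crypto.randomUUID()`. For illustration, the Ruby equivalent is a `SecureRandom` UUID:

```ruby
require 'securerandom'

# Generate a unique ID to pass as clientSubscriptionId in the mutation.
client_subscription_id = SecureRandom.uuid # e.g. "b4f6f170-8ed8-4ba6-b367-8c6ae4a0a0b5"
```

The value is then passed as `clientSubscriptionId` in the mutation, as in the example below.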
As an example mutation for summarizing comments, we provide a `randomId` as part of the mutation:

```graphql
mutation {
  aiAction(input: {summarizeComments: {resourceId: "gid://gitlab/Issue/52"}, clientSubscriptionId: "randomId"}) {
    clientMutationId
  }
}
```

In our component, we then listen for the `aiCompletionResponse` using the `userId`, `resourceId`, and `clientSubscriptionId` (`"randomId"`):

```graphql
subscription aiCompletionResponse($userId: UserID, $resourceId: AiModelID, $clientSubscriptionId: String) {
  aiCompletionResponse(userId: $userId, resourceId: $resourceId, clientSubscriptionId: $clientSubscriptionId) {
    content
    errors
  }
}
```

Note that the [subscription for chat](duo_chat.md#graphql-subscription) behaves differently.

To avoid many concurrent subscriptions, you should also subscribe only after the mutation is sent, by using [`skip()`](https://apollo.vuejs.org/guide/apollo/subscriptions.html#skipping-the-subscription).

#### Current abstraction layer flow

The following graph uses OpenAI as an example. You can use different providers.

```mermaid
flowchart TD
A[GitLab frontend] --> B[AiAction GraphQL mutation]
B --> C[Llm::ExecuteMethodService]
C --> D[One of services, for example: Llm::GenerateSummaryService]
D -->|scheduled| E[AI worker: Llm::CompletionWorker]
E --> F[::Gitlab::Llm::Completions::Factory]
F --> G[`::Gitlab::Llm::OpenAi::Completions::...` class using `::Gitlab::Llm::OpenAi::Templates::...` class]
G -->|calling| H[Gitlab::Llm::OpenAi::Client]
H -->|response| I[::Gitlab::Llm::OpenAi::ResponseService]
I --> J[GraphqlTriggers.ai_completion_response]
J --> K[::GitlabSchema.subscriptions.trigger]
```

## CircuitBreaker

The CircuitBreaker concern is a reusable module that you can include in any class that needs to run code with circuit breaker protection. The concern provides a `run_with_circuit` method that wraps a code block with circuit breaker functionality, which helps prevent cascading failures and improves system resilience. For more information about the circuit breaker pattern, see:

- [What is Circuit breaker](https://martinfowler.com/bliki/CircuitBreaker.html).
- [The Hystrix documentation on CircuitBreaker](https://github.com/Netflix/Hystrix/wiki/How-it-Works#circuit-breaker).

### Use CircuitBreaker

To use the CircuitBreaker concern, include it in a class. For example:

```ruby
class MyService
  include Gitlab::Llm::Concerns::CircuitBreaker

  def call_external_service
    run_with_circuit do
      # Code that interacts with the external service goes here

      raise InternalServerError
    end
  end
end
```

The `call_external_service` method is an example method that interacts with an external service.
By wrapping the code that interacts with the external service with `run_with_circuit`, the method is executed within the circuit breaker.
The circuit breaker is created and configured by the `circuit` method, which is called automatically when the `CircuitBreaker` module is included.
The method should raise an `InternalServerError` error, which is counted towards the error threshold if raised during the execution of the code block.

The circuit breaker tracks the number of errors and the rate of requests,
and opens the circuit if it reaches the configured error threshold or volume threshold.
If the circuit is open, subsequent requests fail fast without executing the code block, and the circuit breaker periodically allows a small number of requests through to test the service's availability before closing the circuit again.
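Callers should be prepared for the fast-fail case. A minimal sketch, assuming `run_with_circuit` returns `nil` when the circuit is open (the `Circuitbox` behavior when exceptions are disabled; verify against the concern before relying on this):

```ruby
class MyService
  include Gitlab::Llm::Concerns::CircuitBreaker

  def explain(code)
    result = run_with_circuit do
      response = request_explanation(code) # hypothetical external AI call
      raise InternalServerError if response.nil?

      response
    end

    # Fall back gracefully while the circuit is open and requests fail fast.
    result || 'AI service is temporarily unavailable'
  end
end
```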
### Configuration

The circuit breaker is configured with two constants, which control the number of errors and requests at which the circuit opens:

- `ERROR_THRESHOLD`
- `VOLUME_THRESHOLD`

You can adjust these values as needed for the specific service and usage pattern.
`InternalServerError` is the exception class counted towards the error threshold if raised during the execution of the code block.
This is the exception class that triggers the circuit breaker when raised by the code that interacts with the external service.

NOTE:
The `CircuitBreaker` module depends on the `Circuitbox` gem to provide the circuit breaker implementation. By default, the service name is inferred from the class name where the concern module is included. Override the `service_name` method if the name needs to be different.

### Testing

To test code that uses the `CircuitBreaker` concern, you can use `RSpec` shared examples and pass the `service` and `subject` variables:

```ruby
it_behaves_like 'has circuit breaker' do
  let(:service) { dummy_class.new }
  let(:subject) { service.dummy_method }
end
```

## How to implement a new action

### Register a new method

Go to `Llm::ExecuteMethodService` and add a new entry with the new service class you will create:

```ruby
class ExecuteMethodService < BaseService
  METHODS = {
    # ...
    amazing_new_ai_feature: Llm::AmazingNewAiFeatureService
  }.freeze

  # ...
end
```

### Create a Service

1. Create a new service under `ee/app/services/llm/` and inherit it from `BaseService`.
1. The `resource` is the object we want to act on. It can be any object that includes the `Ai::Model` concern. For example, it could be a `Project`, `MergeRequest`, or `Issue`.

```ruby
# ee/app/services/llm/amazing_new_ai_feature_service.rb

module Llm
  class AmazingNewAiFeatureService < BaseService
    private

    def perform
      ::Llm::CompletionWorker.perform_async(user.id, resource.id, resource.class.name, :amazing_new_ai_feature)
      success
    end

    def valid?
      super && Ability.allowed?(user, :amazing_new_ai_feature, resource)
    end
  end
end
```

### Authorization

We recommend using [policies](../policies.md) to deal with authorization for a feature. Currently, we need to make sure to cover the following checks:

1. The general AI feature flag is enabled.
1. The feature-specific feature flag is enabled.
1. The namespace has the required license for the feature.
1. The user is a member of the group/project.
1. The `experiment_features_enabled` and `third_party_ai_features_enabled` flags are set on the `Namespace`.

For our example, we need to implement the `allowed?(:amazing_new_ai_feature)` call. As an example, you can look at the [Issue Policy for the summarize comments feature](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/policies/ee/issue_policy.rb).
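You can verify the wiring from the Rails console with `Ability.allowed?`, the same call the service's `valid?` method uses. A minimal sketch (the user and resource ID are illustrative):

```ruby
user = User.find_by_username('root') # any local test user
issue = Issue.find(52)               # illustrative resource ID
Ability.allowed?(user, :amazing_new_ai_feature, issue)
# => false until the policy below is in place and all conditions hold
```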
In our example case, we want to implement the feature for issues as well:

```ruby
# ee/app/policies/ee/issue_policy.rb

module EE
  module IssuePolicy
    extend ActiveSupport::Concern
    prepended do
      with_scope :subject
      condition(:ai_available) do
        ::Feature.enabled?(:openai_experimentation)
      end

      with_scope :subject
      condition(:amazing_new_ai_feature_enabled) do
        ::Feature.enabled?(:amazing_new_ai_feature, subject_container) &&
          subject_container.licensed_feature_available?(:amazing_new_ai_feature)
      end

      rule do
        ai_available & amazing_new_ai_feature_enabled & is_project_member
      end.enable :amazing_new_ai_feature
    end
  end
end
```

### Pairing requests with responses

Because multiple users' requests can be processed in parallel, when receiving responses,
it can be difficult to pair a response with its original request. The `requestId`
field can be used for this purpose, because both the request and response are assured
to have the same `requestId` UUID.

### Caching

AI requests and responses can be cached. The cached conversation is used to
display the user's interactions with AI features. In the current implementation, this cache
is not used to skip consecutive calls to the AI service when a user repeats
their requests.

```graphql
query {
  aiMessages {
    nodes {
      id
      requestId
      content
      role
      errors
      timestamp
    }
  }
}
```

This cache is especially useful for chat functionality. For other services,
caching is disabled. (It can be enabled for a service by using the `cache_response: true`
option.)

Caching has the following limitations:

- Messages are stored in a Redis stream.
- There is a single stream of messages per user. This means that all services
  currently share the same cache. If needed, this could be extended to multiple
  streams per user (after checking with the infrastructure team that Redis can handle
  the estimated amount of messages).
- Only the last 50 messages (requests and responses) are kept.
- The stream expires 3 days after the last message is added.
- Users can access only their own messages. There is no authorization at the caching
  level; any authorization (if the messages are accessed by someone other than the
  current user) is expected at the service layer.

### Check if feature is allowed for this resource based on namespace settings

There are two settings on the root namespace level that restrict the use of AI features:

- `experiment_features_enabled`
- `third_party_ai_features_enabled`

To check if the feature is allowed for a given namespace, call:

```ruby
Gitlab::Llm::StageCheck.available?(namespace, :name_of_the_feature)
```

Add the name of the feature to the `Gitlab::Llm::StageCheck` class. There are arrays there that differentiate
between experimental and beta features.

This way we are ready for the following cases:

- If the feature is not in any array, the check returns `true`. For example, the feature was moved to GA and does not use a third-party setting.
- If the feature is in GA but uses a third-party setting, the class returns a proper answer based on the namespace's third-party setting.

To move the feature from the experimental phase to the beta phase, move the name of the feature from the `EXPERIMENTAL_FEATURES` array to the `BETA_FEATURES` array.

### Implement calls to AI APIs and the prompts

The `CompletionWorker` calls the `Completions::Factory`, which initializes the service and executes the actual call to the API.
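The factory also needs to know about your feature's completion and template classes (implemented below). A hedged sketch of what that registration might look like, assuming the factory keeps a name-to-class mapping similar to `ExecuteMethodService::METHODS` (check `ee/lib/gitlab/llm/completions/factory.rb` for the actual structure):

```ruby
# Hypothetical entry in the factory's mapping; the constant name and
# hash shape are assumptions, not the verified implementation.
COMPLETIONS = {
  amazing_new_ai_feature: {
    service_class: ::Gitlab::Llm::OpenAi::Completions::AmazingNewAiFeature,
    prompt_class: ::Gitlab::Llm::OpenAi::Templates::AmazingNewAiFeature
  }
}.freeze
```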
In our example, we use OpenAI and implement the two new classes:

```ruby
# ee/lib/gitlab/llm/open_ai/completions/amazing_new_ai_feature.rb

module Gitlab
  module Llm
    module OpenAi
      module Completions
        class AmazingNewAiFeature
          def initialize(ai_prompt_class)
            @ai_prompt_class = ai_prompt_class
          end

          def execute(user, issue, options)
            options = ai_prompt_class.get_options(options[:messages])

            ai_response = Gitlab::Llm::OpenAi::Client.new(user).chat(content: nil, **options)

            ::Gitlab::Llm::OpenAi::ResponseService.new(user, issue, ai_response, options: {}).execute(
              Gitlab::Llm::OpenAi::ResponseModifiers::Chat.new
            )
          end

          private

          attr_reader :ai_prompt_class
        end
      end
    end
  end
end
```

```ruby
# ee/lib/gitlab/llm/open_ai/templates/amazing_new_ai_feature.rb

module Gitlab
  module Llm
    module OpenAi
      module Templates
        class AmazingNewAiFeature
          TEMPERATURE = 0.3

          def self.get_options(messages)
            system_content = <<-TEMPLATE
              You are an assistant that writes code for the following input:
              """
            TEMPLATE

            {
              messages: [
                { role: "system", content: system_content },
                { role: "user", content: messages },
              ],
              temperature: TEMPERATURE
            }
          end
        end
      end
    end
  end
end
```

Because we support multiple AI providers, you may also use those providers for the same example:

```ruby
Gitlab::Llm::VertexAi::Client.new(user)
Gitlab::Llm::Anthropic::Client.new(user)
```

### Monitoring AI actions

- Error ratio and response latency apdex for each AI action can be found on the [Sidekiq Service dashboard](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1) under "SLI Detail: llm_completion".
- Tokens spent, usage of each AI feature, and other statistics can be found on the [Periscope dashboard](https://app.periscopedata.com/app/gitlab/1137231/Ai-Features).

### Add AI action to GraphQL

TODO

## Security

Refer to the [secure coding guidelines for Artificial Intelligence (AI) features](../secure_coding_guidelines.md#artificial-intelligence-ai-features).