Diffstat (limited to 'doc/development/ai_architecture.md')
-rw-r--r--  doc/development/ai_architecture.md  112
1 file changed, 44 insertions, 68 deletions
diff --git a/doc/development/ai_architecture.md b/doc/development/ai_architecture.md
index f497047ccce..84a2635b13c 100644
--- a/doc/development/ai_architecture.md
+++ b/doc/development/ai_architecture.md
@@ -4,7 +4,7 @@ group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---
-# AI Architecture (Experiment)
+# AI Architecture
GitLab has created a common set of tools to support our product groups and their utilization of AI. Our goals with this common architecture are:
@@ -13,79 +13,20 @@ GitLab has created a common set of tools to support our product groups and their
AI is moving very quickly, and we need to be able to keep pace with changes in the area. We have built an [abstraction layer](../../ee/development/ai_features.md) to do this, allowing us to take a more "pluggable" approach to the underlying models, data stores, and other technologies.
-The following diagram shows a simplified view of how the different components in GitLab interact. The abstraction layer helps avoid code duplication within the REST APIs within the `AI API` block.
-
-```plantuml
-@startuml
-skin rose
-
-package "Code Suggestions" {
- node "Model Gateway"
- node "Triton Inference Server" as Triton
-}
-
-package "Code Suggestions Models" as CSM {
- node "codegen"
- node "PaLM"
-}
-
-package "Suggested Reviewers" {
- node "Model Gateway (SR)"
- node "Extractor"
- node "Serving Model"
-}
-
-package "AI API" as AIF {
- node "OpenAI"
- node "Vertex AI"
-}
-
-package GitLab {
- node "Web IDE"
-
- package "Web" {
- node "REST API"
- node "GraphQL"
- }
-
- package "Jobs" {
- node "Sidekiq"
- }
-}
-
-package Databases {
- node "Vector Database"
- node "PostgreSQL"
-}
-
-node "VSCode"
-
-"Model Gateway" --> Triton
-Triton --> CSM
-GitLab --> Databases
-VSCode --> "Model Gateway"
-"Web IDE" --> "Model Gateway"
-"Web IDE" --> "GraphQL"
-"Web IDE" --> "REST API"
-"Model Gateway" -[#blue]--> "REST API": user authorized?
-
-"Sidekiq" --> AIF
-Web --> AIF
-
-"Model Gateway (SR)" --> "REST API"
-"Model Gateway (SR)" --> "Serving Model"
-"Extractor" --> "GraphQL"
-"Sidekiq" --> "Model Gateway (SR)"
-
-@enduml
-```
+The following diagram from the [architecture blueprint](../architecture/blueprints/ai_gateway/index.md) shows a simplified view of how the different components in GitLab interact. The abstraction layer helps avoid code duplication in the REST APIs within the `AI API` block.
+
+![architecture diagram](img/architecture.png)
## SaaS-based AI abstraction layer
-GitLab currently operates a cloud-hosted AI architecture. We are exploring how self-managed instances integrate with it.
+GitLab currently operates a cloud-hosted AI architecture. We will allow licensed self-managed instances to access it through the AI Gateway. See [the blueprint](../architecture/blueprints/ai_gateway) for details.
There are two primary reasons for this: the best AI models are cloud-based as they often depend on specialized hardware designed for this purpose, and operating self-managed infrastructure capable of AI at-scale and with appropriate performance is a significant undertaking. We are actively [tracking self-managed customers interested in AI](https://gitlab.com/gitlab-org/gitlab/-/issues/409183).
+## AI Gateway
+
+The AI Gateway (formerly the [model gateway](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist)) is a standalone service that will give all GitLab users access to AI features, no matter which instance they are using: self-managed, Dedicated, or GitLab.com. The SaaS-based AI abstraction layer will transition to connecting to this gateway rather than accessing cloud-based providers directly.
+
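+The sketch below is illustrative only: it shows the general shape of the abstraction layer calling a single AI Gateway endpoint instead of calling cloud providers directly. The URL, route, and headers are hypothetical placeholders, not the gateway's actual API.
+
+```python
+# Hypothetical client sketch: the abstraction layer talks to one AI Gateway
+# endpoint, and the gateway decides which upstream provider or model serves
+# the request. The names, route, and headers below are placeholders.
+import json
+import urllib.request
+
+AI_GATEWAY_URL = "https://ai-gateway.example.com"  # placeholder deployment URL
+
+
+def request_completion(prompt: str, instance_token: str) -> str:
+    """Send a prompt to a (hypothetical) gateway completions route."""
+    payload = json.dumps({"prompt": prompt}).encode("utf-8")
+    req = urllib.request.Request(
+        f"{AI_GATEWAY_URL}/v1/completions",  # placeholder route
+        data=payload,
+        headers={
+            "Content-Type": "application/json",
+            # The instance authenticates to the gateway; no provider keys are
+            # needed on the GitLab side in this model.
+            "Authorization": f"Bearer {instance_token}",
+        },
+    )
+    with urllib.request.urlopen(req) as response:
+        return json.load(response)["completion"]
+```
+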
## Supported technologies
As part of the AI working group, we have been investigating various technologies and vetting them. Below is a list of the tools which have been reviewed and already approved for use within the GitLab application.
@@ -127,3 +68,38 @@ For optimal `probes` and `lists` values:
- Use `lists` equal to `rows / 1000` for tables with up to 1 million rows and `sqrt(rows)` for larger datasets.
- For `probes`, start with `lists / 10` for tables with up to 1 million rows and `sqrt(lists)` for larger datasets (a small sketch of these heuristics follows this list).
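+
+The following is a small sketch of the heuristics above for choosing `lists` and `probes` from a table's row count. The thresholds and formulas come from the two bullets above; the helper function itself is illustrative and not part of GitLab.
+
+```python
+# Illustrative helper (not GitLab code): compute suggested `lists` and `probes`
+# values from a table's row count, using the heuristics documented above.
+import math
+
+
+def index_settings(rows: int) -> tuple[int, int]:
+    """Return suggested (lists, probes) for a table with `rows` rows."""
+    if rows <= 1_000_000:
+        # max() guards only avoid zero values for very small tables.
+        lists = max(1, rows // 1000)
+        probes = max(1, lists // 10)
+    else:
+        lists = int(math.sqrt(rows))
+        probes = int(math.sqrt(lists))
+    return lists, probes
+
+
+# Example: a table with 10 million rows -> lists == 3162, probes == 56.
+print(index_settings(10_000_000))
+```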
+
+### Code Suggestions
+
+Code Suggestions is being integrated into the GitLab-Rails repository, which will unify the architecture of Code Suggestions with the AI features that use the abstraction layer and offer self-managed support for the other AI features.
+
+The following table documents the functionality that Code Suggestions offers today and how it will change as part of the unification (an illustrative sketch of the request-processing flow follows the table):
+
+| Topic | Details | Where this happens today | Where this will happen going forward |
+| ----- | ------ | -------------- | ------------------ |
+| Request processing | | | |
+| | Receives requests from IDEs (VSCode, GitLab WebIDE, MS Visual Studio, IntelliJ, JetBrains, VIM, Emacs, Sublime), including code before and after the cursor | AI Gateway | Abstraction Layer |
| | Authenticates the current user and verifies they are authorized to use Code Suggestions for this project | AI Gateway | Abstraction Layer |
+| | Preprocesses the request to add context, such as including imports via TreeSitter | AI Gateway | Undecided |
+| | Routes the request to the AI Provider | AI Gateway | AI Gateway |
+| | Returns the response to the IDE | AI Gateway | Abstraction Layer |
| | Logs the request, including timestamp, response time, model, and so on | AI Gateway | Both |
+| Telemetry | | | |
+| | User acceptance or rejection in the IDE | AI Gateway | [Both](https://gitlab.com/gitlab-org/gitlab/-/issues/418282) |
+| | Number of unique users per day | [Abstraction Layer](https://app.periscopedata.com/app/gitlab/1143612/Code-Suggestions-Usage) | Undecided |
+| | Error rate, model usage, response time, IDE usage | [AI Gateway](https://log.gprd.gitlab.net/app/dashboards#/view/6c947f80-7c07-11ed-9f43-e3784d7fe3ca?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-6h,to:now))) | Both |
| | Suggestions per language | AI Gateway | [Both](https://gitlab.com/groups/gitlab-org/-/epics/11017) |
+| Monitoring | | Both | Both |
+| | | | |
+| Model Routing | | | |
| | We are not currently using this functionality, but Code Suggestions can route to multiple models based on a percentage of traffic | AI Gateway | Both |
+| Internal Models | | | |
| | Currently unmaintained: the ability to run models in our own infrastructure, serving them inside Triton and routing requests to our own models | AI Gateway | AI Gateway |
+
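+The sketch below is purely illustrative. It strings together the request-processing steps listed in the table (receive, authenticate, preprocess, route, return, log); none of these names exist in the AI Gateway codebase, and the call to an AI provider is replaced with a stub.
+
+```python
+# Illustrative request-processing flow for a Code Suggestions request.
+# All names are placeholders; the provider call is stubbed out.
+import logging
+import time
+from dataclasses import dataclass
+
+log = logging.getLogger("code_suggestions")
+
+
+@dataclass
+class SuggestionRequest:
+    user_token: str
+    project_id: int
+    content_above_cursor: str
+    content_below_cursor: str
+    language: str
+
+
+def handle_suggestion(req: SuggestionRequest) -> str:
+    start = time.monotonic()
+
+    # 1. Authenticate the user and verify they may use Code Suggestions here.
+    if not req.user_token:
+        raise PermissionError("Code Suggestions is not enabled for this user")
+
+    # 2. Preprocess the request to add context around the cursor.
+    prompt = f"{req.content_above_cursor}<CURSOR>{req.content_below_cursor}"
+
+    # 3. Route the request to an AI provider (stubbed out in this sketch).
+    suggestion = f"# completion for: {prompt[-40:]!r}"
+
+    # 4. Log the request: timestamp, response time, model, and so on.
+    log.info("served suggestion in %.0f ms", (time.monotonic() - start) * 1000)
+
+    # 5. Return the response to the IDE.
+    return suggestion
+```
+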
+#### Code Suggestions Latency
+
+Code Suggestions acceptance rates are _highly_ sensitive to latency. While writing code with an AI assistant, a user pauses only briefly before continuing to type out a block of code manually. As soon as the user presses another key, the existing suggestion is invalidated and a new request must be issued to the Code Suggestions endpoint. That request is, in turn, just as sensitive to latency.
+
+In the worst case, with enough latency the IDE could issue a string of requests, each of which is ignored as the user proceeds without waiting for the response. This adds no value for the user while still putting load on our services.
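+
+One common client-side mitigation for this wasted-request pattern is debouncing: only send a request once the user has paused typing. The sketch below is illustrative only and is not necessarily how the GitLab IDE extensions are implemented.
+
+```python
+# Illustrative debouncer: each keypress cancels the pending request instead of
+# issuing another one that would be ignored anyway.
+import threading
+
+
+class SuggestionDebouncer:
+    """Issue a request only after the user has paused typing for `delay` seconds."""
+
+    def __init__(self, send_request, delay: float = 0.3):
+        self._send_request = send_request
+        self._delay = delay
+        self._timer: threading.Timer | None = None
+
+    def on_keypress(self, buffer_snapshot: str) -> None:
+        # Cancel the pending request; its result would be stale after this keypress.
+        if self._timer is not None:
+            self._timer.cancel()
+        self._timer = threading.Timer(self._delay, self._send_request, [buffer_snapshot])
+        self._timer.start()
+```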
+
+See [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/418955) for discussion of how we plan to iterate on latency for this feature.