gitlab.com/gitlab-org/gitlab-foss.git
Diffstat (limited to 'doc/development/ai_architecture.md')
-rw-r--r-- doc/development/ai_architecture.md 23
1 files changed, 22 insertions, 1 deletions
diff --git a/doc/development/ai_architecture.md b/doc/development/ai_architecture.md
index ac62f50baf5..f497047ccce 100644
--- a/doc/development/ai_architecture.md
+++ b/doc/development/ai_architecture.md
@@ -40,7 +40,7 @@ package "AI API" as AIF {
node "Vertex AI"
}
-package GitLab {
+package GitLab {
node "Web IDE"
package "Web" {
@@ -106,3 +106,24 @@ The following models have been approved for use:
The following vector stores have been approved for use:
- [`pgvector`](https://github.com/pgvector/pgvector) is a Postgres extension adding support for storing vector embeddings and calculating ANN (approximate nearest neighbor).
+
+### Indexing Update
+
+We are currently using sequential scan, which provides perfect recall. We are considering adding an index if we can ensure that it still produces accurate results, as noted in the `pgvector` indexing [documentation](https://github.com/pgvector/pgvector#indexing).
+
+Because the table contains thousands of entries, an index with tuned `lists` and `probes` settings would likely speed up searches while keeping accuracy high. More testing is needed to confirm the best configuration for this dataset size before deploying to production.
+
+A [draft MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/122035) has been created to update the index.
+
+The index function has been updated to improve search quality. This was tested locally by setting `ivfflat.probes` to `10`, running the SQL `SET` statement through the model's database connection:
+
+```ruby
+Embedding::TanukiBotMvc.connection.execute("SET ivfflat.probes = 10")
+```
+
+Raising the `probes` value makes a query search more of the index's lists, which improves recall at the cost of speed, as per the neighbor [documentation](https://github.com/ankane/neighbor#indexing).
+
+As a starting point for the `lists` and `probes` values:
+
+- For `lists`, start with `rows / 1000` for tables with up to 1 million rows and `sqrt(rows)` for larger datasets.
+- For `probes`, start with `lists / 10` for tables with up to 1 million rows and `sqrt(lists)` for larger datasets.
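
The rules of thumb above can be sketched in Ruby. This is a minimal sketch, not GitLab code: `suggested_ivfflat_settings` is a hypothetical helper, and clamping both values to at least 1 is an assumption added so that very small tables still get usable settings.

```ruby
# Hypothetical helper (not part of GitLab, pgvector, or neighbor).
# Returns starting values for IVFFlat `lists` and `probes` based on row count.
ONE_MILLION = 1_000_000

def suggested_ivfflat_settings(rows)
  if rows <= ONE_MILLION
    # Up to 1 million rows: lists = rows / 1000, probes = lists / 10.
    # Clamping to 1 is an assumption for very small tables.
    lists = [rows / 1000, 1].max
    probes = [lists / 10, 1].max
  else
    # Larger datasets: lists = sqrt(rows), probes = sqrt(lists).
    lists = Math.sqrt(rows).round
    probes = Math.sqrt(lists).round
  end

  { lists: lists, probes: probes }
end

small = suggested_ivfflat_settings(50_000)     # lists: 50, probes: 5
large = suggested_ivfflat_settings(4_000_000)  # lists: 2000, probes: 45
puts "small table: lists=#{small[:lists]} probes=#{small[:probes]}"
puts "large table: lists=#{large[:lists]} probes=#{large[:probes]}"
```

These are only starting values. Recall and query time should still be compared against the sequential scan results before any index is adopted.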