Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSami Hiltunen <shiltunen@gitlab.com>2021-11-17 16:17:22 +0300
committerSami Hiltunen <shiltunen@gitlab.com>2021-11-18 12:50:44 +0300
commitab692bf6a4b35d0e9d536d33cb4e248d50449847 (patch)
tree5a784f5e0224227b35db0e187b49363541b6ba45
parentd69b465fdff3c04f8e3f395aed34aaa59f23fe76 (diff)
Materialize valid_primaries view in dataloss query
The dataloss query is extremely slow for bigger datasets. The problem is that for each row that the data loss query is returning, Postgres computes the full result of the valid_primaries view only to filter down to the correct record. This results in an o(n2) complexity which kills the performance as soon as the dataset size increases. It's not clear why the join parameters are not pushed down in to the view in the query. This commit optimizes the query by materializing the valid_primaries view. This ensures Postgres computes the full view only once and joins with the pre-computed result. Changelog: performance
-rw-r--r--internal/praefect/datastore/repository_store.go6
1 files changed, 5 insertions, 1 deletions
diff --git a/internal/praefect/datastore/repository_store.go b/internal/praefect/datastore/repository_store.go
index b9ee33ba3..cdacca0e8 100644
--- a/internal/praefect/datastore/repository_store.go
+++ b/internal/praefect/datastore/repository_store.go
@@ -670,6 +670,10 @@ func (rs *PostgresRepositoryStore) GetPartiallyAvailableRepositories(ctx context
// than the assigned ones.
//
rows, err := rs.db.QueryContext(ctx, `
+WITH valid_primaries AS MATERIALIZED (
+ SELECT repository_id, storage FROM valid_primaries
+)
+
SELECT
json_build_object (
'RelativePath', relative_path,
@@ -707,7 +711,7 @@ FROM (
) AS repository_assignments USING (repository_id, storage)
JOIN repositories USING (repository_id)
LEFT JOIN healthy_storages USING (virtual_storage, storage)
- LEFT JOIN ( SELECT repository_id, storage FROM valid_primaries ) AS valid_primaries USING (repository_id, storage)
+ LEFT JOIN valid_primaries USING (repository_id, storage)
WHERE virtual_storage = $1
ORDER BY relative_path, "primary", storage
) AS outdated_repositories