diff options
author | Sami Hiltunen <shiltunen@gitlab.com> | 2021-11-17 16:17:22 +0300 |
---|---|---|
committer | Sami Hiltunen <shiltunen@gitlab.com> | 2021-11-18 12:50:44 +0300 |
commit | ab692bf6a4b35d0e9d536d33cb4e248d50449847 (patch) | |
tree | 5a784f5e0224227b35db0e187b49363541b6ba45 | |
parent | d69b465fdff3c04f8e3f395aed34aaa59f23fe76 (diff) |
Materialize valid_primaries view in dataloss query
The dataloss query is extremely slow for bigger datasets. The problem
is that for each row that the data loss query is returning,
Postgres computes the full result of the valid_primaries view only to
filter down to the correct record. This results in an o(n2) complexity
which kills the performance as soon as the dataset size increases. It's
not clear why the join parameters are not pushed down in to the view in
the query.
This commit optimizes the query by materializing the valid_primaries view.
This ensures Postgres computes the full view only once and joins with the
pre-computed result.
Changelog: performance
-rw-r--r-- | internal/praefect/datastore/repository_store.go | 6 |
1 files changed, 5 insertions, 1 deletions
diff --git a/internal/praefect/datastore/repository_store.go b/internal/praefect/datastore/repository_store.go index b9ee33ba3..cdacca0e8 100644 --- a/internal/praefect/datastore/repository_store.go +++ b/internal/praefect/datastore/repository_store.go @@ -670,6 +670,10 @@ func (rs *PostgresRepositoryStore) GetPartiallyAvailableRepositories(ctx context // than the assigned ones. // rows, err := rs.db.QueryContext(ctx, ` +WITH valid_primaries AS MATERIALIZED ( + SELECT repository_id, storage FROM valid_primaries +) + SELECT json_build_object ( 'RelativePath', relative_path, @@ -707,7 +711,7 @@ FROM ( ) AS repository_assignments USING (repository_id, storage) JOIN repositories USING (repository_id) LEFT JOIN healthy_storages USING (virtual_storage, storage) - LEFT JOIN ( SELECT repository_id, storage FROM valid_primaries ) AS valid_primaries USING (repository_id, storage) + LEFT JOIN valid_primaries USING (repository_id, storage) WHERE virtual_storage = $1 ORDER BY relative_path, "primary", storage ) AS outdated_repositories |