diff options
author | Sami Hiltunen <shiltunen@gitlab.com> | 2021-11-17 16:17:22 +0300 |
---|---|---|
committer | Sami Hiltunen <shiltunen@gitlab.com> | 2021-11-17 16:17:22 +0300 |
commit | f505f80f3548d8e6e06f97125a18fb49a41b848f (patch) | |
tree | d101981d77546f487bef5b1898226ab1b304efa3 | |
parent | aa413c7afbc868f21c0c97da9d46ac040dfdc4c6 (diff) |
Materialize valid_primaries view in dataloss query
The dataloss query is extremely slow for bigger datasets. The problem
is that for each row that the data loss query is returning,
Postgres computes the full result of the valid_primaries view only to
filter down to the correct record. This results in an o(n2) complexity
which kills the performance as soon as the dataset size increases. It's
not clear why the join parameters are not pushed down in to the view in
the query.
This commit optimizes the query by materializing the valid_primaries view.
This ensures Postgres computes the full view only once and joins with the
pre-computed result.
Changelog: performance
-rw-r--r-- | internal/praefect/datastore/repository_store.go | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/internal/praefect/datastore/repository_store.go b/internal/praefect/datastore/repository_store.go index 278f4e3ea..48d4814bc 100644 --- a/internal/praefect/datastore/repository_store.go +++ b/internal/praefect/datastore/repository_store.go @@ -589,6 +589,10 @@ func (rs *PostgresRepositoryStore) GetPartiallyAvailableRepositories(ctx context // than the assigned ones. // rows, err := rs.db.QueryContext(ctx, ` +WITH valid_primaries AS MATERIALIZED ( + SELECT * FROM valid_primaries +) + SELECT json_build_object ( 'RelativePath', relative_path, |