Use extended statistics to estimate (Var op Var) clauses

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Use extended statistics to estimate (Var op Var) clauses
Дата	13 ноября 2020 г. 04:14:28
Msg-id	9e0a12e0-c05f-b193-ed3d-fe88bc1e5fe1@enterprisedb.com обсуждение исходный текст
Список	pgsql-hackers

Дерево обсуждения

Hi,

Attached is a patch to allow estimation of (Var op Var) clauses using
extended statistics. Currently we only use extended stats to estimate
(Var op Const) clauses, which is sufficient for most cases, but it's not
very hard to support this second type of clauses.

This is not an entirely new patch - I've originally included it in the
patch series in [1] but it's probably better to discuss it separately,
so that it does not get buried in that discussion.

[1]
https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development

To illustrate the purpose of this patch, consider this:

    db=# create table t (a int, b int);
    CREATE TABLE

    db=# insert into t select mod(i,10), mod(i,10)+1
           from generate_series(1,100000) s(i);
    INSERT 0 100000

    db=# analyze t;
    ANALYZE

    db=# explain select * from t where a < b;
                           QUERY PLAN
    --------------------------------------------------------
     Seq Scan on t  (cost=0.00..1693.00 rows=33333 width=8)
       Filter: (a < b)
    (2 rows)

    db=# explain select * from t where a > b;
                           QUERY PLAN
    --------------------------------------------------------
     Seq Scan on t  (cost=0.00..1693.00 rows=33333 width=8)
       Filter: (a > b)
    (2 rows)

    db=# create statistics s (mcv) on a,b from t;
    CREATE STATISTICS

    db=# analyze t;
    ANALYZE

    db=# explain select * from t where a < b;
                           QUERY PLAN
    ---------------------------------------------------------
     Seq Scan on t  (cost=0.00..1693.00 rows=100000 width=8)
       Filter: (a < b)
    (2 rows)

    db=# explain select * from t where a > b;
                         QUERY PLAN
    ----------------------------------------------------
     Seq Scan on t  (cost=0.00..1693.00 rows=1 width=8)
       Filter: (a > b)
    (2 rows)


I'm not entirely convinced this patch (on it's own) is very useful, for
a couple of reasons:

(a) Clauses of this form are not particularly common, at least compared
to the Var op Const clauses. (I don't recall slow-query reports from any
of our mailing lists that might be attributed to such clauses.)

(b) For known cases of such queries (e.g. several TPC-H queries do use
clauses like "l_commitdate < l_receiptdate" etc.) this is somewhat
hindered by extended statistics only supporting MCV lists, which may not
work particularly well for high-cardinality columns like dates etc.

But despite that it seems like a useful feature / building block, and
those limitations may be addressed in some other way (e.g. we may add
multi-dimensional histograms to address the second one).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: "tsunakawa.takay@fujitsu.com"
Дата: 13 ноября 2020 г., 03:50:30
Сообщение: RE: Detecting File Damage & Inconsistencies

Следующее

От: Thomas Munro
Дата: 13 ноября 2020 г., 05:20:26
Сообщение: Re: WIP: WAL prefetch (another approach)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Use extended statistics to estimate (Var op Var) clauses

Вложения

Предыдущее

Следующее