Re: Use extended statistics to estimate (Var op Var) clauses

Поиск

Список

Период

Сортировка

От	Mark Dilger
Тема	Re: Use extended statistics to estimate (Var op Var) clauses
Дата	11 августа 2021 г. 17:51:36
Msg-id	5E98F67D-73C7-4E7B-AEDE-DBABB13C09B0@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Use extended statistics to estimate (Var op Var) clauses (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы	Re: Use extended statistics to estimate (Var op Var) clauses (Mark Dilger <mark.dilger@enterprisedb.com>) Re: Use extended statistics to estimate (Var op Var) clauses (Tomas Vondra <tomas.vondra@enterprisedb.com>) Re: Use extended statistics to estimate (Var op Var) clauses (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список	pgsql-hackers

Дерево обсуждения

> On Aug 11, 2021, at 5:08 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> This feels like rather an artificial example though. Is there any real
> use for this sort of clause?

The test generated random combinations of clauses and then checked if any had consistently worse performance.  These
cameup.  I don't know that they represent anything real. 

What was not random in the tests was the data in the tables.  I've gotten curious if these types of clauses (with
columnscompared against themselves) would still be bad for random rather than orderly data sets.  I'll go check.... 

   testing....

Wow.  Randomizing the data makes the problems even more extreme.  It seems my original test set was actually playing to
thispatch's strengths, not its weaknesses.  I've changed the columns to double precision and filled the columns with
random()data, where column1 gets random()^1, column2 gets random()^2, etc.  So on average the larger numbered columns
willbe smaller, and the mcv list will be irrelevant, since values should not tend to repeat. 

Over all queries, 47791 have better estimates after the patch, but 34802 had worse estimates after the patch (with the
remaining17407 queries having roughly equal quality). 

The worst estimates are still ones that have a column compared to itself:

better:0, worse:33:  A <= B or A <= A or A <= A
better:0, worse:33:  A <= B or A = A or not A <> A
better:0, worse:33:  A <= B or A >= A or not A <> A
better:0, worse:33:  A <> B or A <= A
better:0, worse:33:  A <> B or A <= A or A <> A
better:0, worse:33:  A <> B or A <= A or A >= A
better:0, worse:33:  A <> B or A <= A or not A = A
better:0, worse:33:  A <> B or A > A or not A < A
better:0, worse:33:  A <> B or A >= A
better:0, worse:33:  A <> B or A >= A and A <= A
better:0, worse:33:  A = B or not A > A or not A > A
better:0, worse:33:  A >= B or not A <> A or A = A
better:0, worse:39:  B <= A or B <= B or B <= B
better:0, worse:39:  B <= A or B = B or not B <> B
better:0, worse:39:  B <= A or B >= B or not B <> B
better:0, worse:39:  B <> A or B <= B
better:0, worse:39:  B <> A or B <= B or B <> B
better:0, worse:39:  B <> A or B <= B or B >= B
better:0, worse:39:  B <> A or B <= B or not B = B
better:0, worse:39:  B <> A or B > B or not B < B
better:0, worse:39:  B <> A or B >= B
better:0, worse:39:  B <> A or B >= B and B <= B
better:0, worse:39:  B = A or not B > B or not B > B
better:0, worse:39:  B >= A or not B <> B or B = B

But there are plenty that got worse without that, such as the following examples:

better:25, worse:39:  A < B and A < B or B > A
better:10, worse:48:  A < B and A < C
better:10, worse:54:  A < B and A < C or C > A

I'll go test random data designed to have mcv lists of significance....

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее

От: "David G. Johnston"
Дата: 11 августа 2021 г., 17:41:04
Сообщение: Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

Следующее

От: Robert Haas
Дата: 11 августа 2021 г., 17:54:09
Сообщение: Re: Next Steps with Hash Indexes

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Use extended statistics to estimate (Var op Var) clauses

Предыдущее

Следующее