Re: Using multiple extended statistics for estimates

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Using multiple extended statistics for estimates
Дата
Msg-id 20191114214541.3ssauovlhh3vsasr@development
обсуждение исходный текст
Ответ на Re: Using multiple extended statistics for estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Nov 14, 2019 at 03:16:04PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> For the case with equal Const values that should be mostly obvious, i.e.
>> "a=1 AND a=1 AND a=1" has the same selectivity as "a=1".
>
>> The case with different Const values is harder, unfortunately. It might
>> seem obvious that "a=1 AND a=2" means there are no matching rows, but
>> that heavily relies on the semantics of the equality operator. And we
>> can't simply compare the Const values either, I'm afraid, because there
>> are cases with cross-type operators like
>>   a = 1::int AND a = 1.0::numeric
>> where the Consts are of different type, yet both conditions can be true.
>
>FWIW, there's code in predtest.c to handle exactly that, at least for
>types sharing a btree opfamily.  Whether it's worth applying that logic
>here is unclear, but note that we've had the ability to recognize
>redundant and contradictory clauses for a long time:
>
>regression=# explain select * from tenk1 where two = 1;
>                         QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1  (cost=0.00..470.00 rows=5000 width=244)
>   Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 1::bigint;
>                         QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1  (cost=0.00..470.00 rows=5000 width=244)
>   Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::bigint;
>                          QUERY PLAN
>---------------------------------------------------------------
> Result  (cost=0.00..470.00 rows=1 width=244)
>   One-Time Filter: false
>   ->  Seq Scan on tenk1  (cost=0.00..470.00 rows=1 width=244)
>         Filter: (two = 1)
>(4 rows)
>
>It falls down on
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::numeric;
>                        QUERY PLAN
>-----------------------------------------------------------
> Seq Scan on tenk1  (cost=0.00..520.00 rows=25 width=244)
>   Filter: ((two = 1) AND ((two)::numeric = '2'::numeric))
>(2 rows)
>
>because numeric isn't in the same opfamily, so these clauses can't be
>compared easily.
>
>            regards, tom lane

Yeah, and this logic still works - the redundant clauses won't even get
to the selectivity estimation, I think. So maybe the comment is not
quite necessary, because the problem does not even exist ...

Maybe we could do something about the cases that predtest.c can't solve,
but it's not clear if we can be much smarter for types with different
opfamilies.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Missing dependency tracking for TableFunc nodes
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Using multiple extended statistics for estimates