Re: [GENERAL] odd optimizer result, index condition "is not null" on column defined as "not null"

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [GENERAL] odd optimizer result, index condition "is not null" on column defined as "not null"
Дата
Msg-id 19032.1488562421@sss.pgh.pa.us
обсуждение исходный текст
Ответ на [GENERAL] odd optimizer result, index condition "is not null" on column definedas "not null"  (Martin F <pg@mfriebe.de>)
Ответы Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"  (Martin F <pg@mfriebe.de>)
Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"  (Martin F <pg@mfriebe.de>)
Список pgsql-general
Martin F <pg@mfriebe.de> writes:
>    Index Cond: (tbl_foo.id IS NOT NULL)
> only "id" is the pk, and declared "not null".
> So why this index condition?

The IS NOT NULL condition is generated as part of transforming a "min(x)"
query into an indexscan, on the basis that "select min(x) from ..." is
equivalent to "select x from ... where x is not null order by x limit 1".
Without the NOT NULL check, that's an incorrect transformation.
You're right that we could observe that the NOT NULL is implied by a table
constraint and drop it, but it seems unlikely to be worth the planner
cycles to do so; the condition doesn't cost much at runtime.  (Also,
plans that depend on table constraints for correctness have added
bookkeeping costs from tracking such dependency.)

> The select with filter choose an IMHO better plan
>> Index Only Scan using tbl_foo_date on public.tbl_foo

[ shrug... ]  Can't get too excited about that.  The first rule of working
with the Postgres planner is that planning results on toy tables do not
scale to large tables; too many of the cost factors are nonlinear.
But the bigger picture here, which would become more obvious if you were
working with a non-toy amount of data, is that you're asking the planner
to choose between two bad options.  Basically it can either scan the data
in id order (using the pkey index) and stop when it hits the first row
satisfying the created_at condition (which might be arbitrarily far in);
or it can scan all the data satisfying the created_at condition (possibly
using the other index to optimize that) and remember the smallest id seen
while doing so.  Your query with the aggregate FILTER condition is a
further-pessimized version of the second approach, because it has to scan
the *whole* index; a per-aggregate FILTER doesn't get applied as an index
condition, but only while executing that aggregate.

If you are concerned about the performance of this specific query shape,
what you actually want is an index on (id, created_at).  That allows
stopping at the first index entry satisfying the created_at condition,
knowing that it must have the min id value that does so.

On a toy table the performance of any of these variants is going to be so
close that it's not certain which one the planner will pick (and it will
hardly matter anyway).  On a large table the correctly-chosen index will
make an enormous difference.

            regards, tom lane


В списке pgsql-general по дате отправления:

Предыдущее
От: Martin F
Дата:
Сообщение: Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"
Следующее
От: Martin F
Дата:
Сообщение: Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"