Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"

Поиск
Список
Период
Сортировка
От Martin F
Тема Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"
Дата
Msg-id 59496c1c-781a-5225-f082-35e6b20cadea@mfriebe.de
обсуждение исходный текст
Ответ на Re: [GENERAL] odd optimizer result, index condition "is not null" on column defined as "not null"  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
On 03/03/2017 17:33, Tom Lane wrote:
> Martin F <pg@mfriebe.de> writes:
>> The select with filter choose an IMHO better plan
>>> Index Only Scan using tbl_foo_date on public.tbl_foo
> But the bigger picture here, which would become more obvious if you were
> working with a non-toy amount of data, is that you're asking the planner
> to choose between two bad options.
I agree "better" was the wrong term. "different"

And yes they are both bad.

And in fact after some more research, I think I found
https://wiki.postgresql.org/wiki/Index-only_scans#What_types_of_queries_may_be_satisfied_by_an_index-only_scan.3F
which explains why the aggregate-with-filter is potentially much worse
(as it accesses more rows)

Lets see, if I am closer to a correct understanding.
Lets see if my following assumptions are (closer to being) correct:

So the real missing feature here is the opposite of what I expected.
     select min(id) filter(where  created_at >= '2017-01-15') from tbl_foo
is NOT rewritten to
    select id from tbl_foo where created_at >= '2017-01-15' and id is
not null order by id limit 1
That is the filter is currently not transformed to a where.


On the other hand, looking at the explain of
>  explain analyze verbose select min(id) filter(where  created_at >=
> '2017-01-15') from tbl_foo;
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=13.28..13.29 rows=1 width=16) (actual
> time=0.799..0.804 rows=1 loops=1)
>    Output: min(id) FILTER (WHERE (created_at >= '2017-01-15
> 00:00:00'::timestamp without time zone))
>    ->  Index Only Scan using tbl_foo_id_date on public.tbl_foo
> (cost=0.14..13.00 rows=57 width=16) (actual time=0.024..0.437 rows=57
> loops=1)
>          Output: created_at, id
>          Heap Fetches: 57
>  Planning time: 0.080 ms
>  Execution time: 0.901 ms
57 heap fetches, so one for every row.

It seems that min() does a heap fetch for every row, even if the value
for that row is bigger than the current aggregated value. That is the
heap fetch happens, even if the value's visibility does not matter / the
value will be discarded anyway. (Of course that is because the function
can not affect the scanners decision if a row is required or not)

Are my above observation and conclusions correct, or am I missing
something crucially (apart from that I am only looking at a tiny
fraction of reality)


> If you are concerned about the performance of this specific query shape,
> what you actually want is an index on (id, created_at).  That allows
> stopping at the first index entry satisfying the created_at condition,
> knowing that it must have the min id value that does so.
>
Thanks indeed, taking in account the true nature of "index only", the
above is a good idea.

regards
Martin


В списке pgsql-general по дате отправления:

Предыдущее
От: Martin F
Дата:
Сообщение: Re: [GENERAL] odd optimizer result, index condition "is not null" oncolumn defined as "not null"
Следующее
От: "Sven R. Kunze"
Дата:
Сообщение: Re: [GENERAL] Querying JSON Lists