Re: Apparent missed query optimization with self-join and inner grouping

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Apparent missed query optimization with self-join and inner grouping
Дата
Msg-id 1934924.1596306669@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Apparent missed query optimization with self-join and inner grouping  (Zack Weinberg <zackw@panix.com>)
Ответы How can you find out what point logical replication is at?
Список pgsql-general
Zack Weinberg <zackw@panix.com> writes:
> For each (experiment_id, url_id) pair for some small subset of the
> experiment_ids, I need to query the full_url_id corresponding to the
> *largest* value of redirect_num.  The query planner does something
> reasonable with this SELECT:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and bm.experiment_id in (16, 43);

With that query, the "bm.experiment_id in (16, 43)" restriction is
pushed into the "bm" sub-select, greatly reducing the amount of work
the GROUP BY step has to do.

> But if I change the final part of the WHERE to reference
> b.experiment_id instead of bm.experiment_id, I get this much more
> expensive query plan:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and b.experiment_id in (16, 43);

With that query, the GROUP BY is evaluated in full, and it costs you.

In principle, given the nearby "where b.experiment_id = bm.experiment_id"
clause, we could derive "bm.experiment_id in (16, 43)" from the stated
clause.  But we don't.  The existing machinery around derivation of
implied equalities only works for simple equalities, not OR clauses.
Extending that would be a bit of a research project, and it's far from
clear that the benefits would be worth the additional planning costs.

> What is the best way to report this to the developers?  Should I file
> a bug report?  I'm using Postgres 12.2.

This is not a bug, and you should not hold your breath waiting
for it to change.

            regards, tom lane



В списке pgsql-general по дате отправления:

Предыдущее
От: Curt Kolovson
Дата:
Сообщение: how reliable is pg_rewind?
Следующее
От: David Rowley
Дата:
Сообщение: Re: bad JIT decision