Re: Bad selectivity estimate when using a sub query to determine WHERE condition

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Bad selectivity estimate when using a sub query to determine WHERE condition
Дата
Msg-id 6875.1581349171@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Bad selectivity estimate when using a sub query to determine WHERE condition  (Chris Borckholder <chris.borckholder@bitpanda.com>)
Ответы Re: Bad selectivity estimate when using a sub query to determineWHERE condition  (Chris Borckholder <chris.borckholder@bitpanda.com>)
Список pgsql-performance
Chris Borckholder <chris.borckholder@bitpanda.com> writes:
> When using a sub select or CTE to get the latest roll up time, the query
> planner fails to recognize that a most of the large table would be filtered
> out by the condition and tries a sequential scan instead of an index scan.
> When using the literal value for the WHERE condition, the plan correctly
> uses an index scan, which is much faster.

Yeah, a scalar sub-select is pretty much a black box to the planner.

> EXPLAIN (ANALYZE, BUFFERS) WITH current_rollup AS (
>     SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
>     FROM exchange.ledger_zerosum_rollup
> )
> SELECT *
> FROM exchange.ledger
> WHERE created > (SELECT cutoff FROM current_rollup);

Well, it's not that hard to get rid of that scalar sub-select: since
you're already relying on current_rollup to produce exactly one row,
you could write a plain join instead, something like

WITH current_rollup AS ...
SELECT l.*
FROM exchange.ledger l, current_rollup c
WHERE l.created > c.cutoff;

Unfortunately I doubt that will improve matters much, since the
planner also knows relatively little about MAX() and nothing about
COALESCE, so it's not going to be able to estimate what comes out
of the WITH.  I think you're going to have to cheat a bit.

The form of cheating that comes to mind is to wrap the sub-select
in a function that's marked STABLE:

create function current_rollup_cutoff() returns timestamp -- or whatever
stable language sql as $$
SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
FROM exchange.ledger_zerosum_rollup
$$;

SELECT *
FROM exchange.ledger
WHERE created > current_rollup_cutoff();

I have not actually tried this, but I think that since the function is
marked stable, the planner would test-run it to get an estimated value,
and then produce a plan similar to what you'd get with a literal constant.

Of course, then it's going to run the function once more when the query is
executed for-real, so this approach doubles the cost of getting the MAX().
That shouldn't be too awful if you have an index on window_end, though.

If you like living dangerously, you could cheat a LOT and mark the
function immutable so that its value gets substituted at plan time.
But that will only work for interactive submission of the outer
query --- if the plan gets cached and re-used, you'll have a stale
cutoff value.  Personally I wouldn't risk that.

            regards, tom lane



В списке pgsql-performance по дате отправления:

Предыдущее
От: Asya Nevra Buyuksoy
Дата:
Сообщение: Re: Fwd: TOAST table performance problem
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: Bad selectivity estimate when using a sub query to determineWHERE condition