Re: fix cost subqueryscan wrong parallel cost

From: Robert Haas
Subject: Re: fix cost subqueryscan wrong parallel cost
Msg-id CA+Tgmoa7Zv+5fs4zZ-0WSXeNpZpFJfDqpbtWdEUzsMc0Q6R7fw@mail.gmail.com
In reply to: Re: fix cost subqueryscan wrong parallel cost  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: fix cost subqueryscan wrong parallel cost  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Mon, May 2, 2022 at 5:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I did look at the rest of costsize.c for similar instances, and didn't
> find any.  In any case, I think we have two options:
>
> 1. Apply this fix, and in future fix any other places that we identify
> later.
>
> 2. Invent some entirely new scheme that we hope is less mistake-prone.
>
> Option #2 is unlikely to lead to any near-term fix, and we certainly
> wouldn't dare back-patch it.

Sure, although I think it's questionable whether we should back-patch
anyway, since there's no guarantee that every plan change anybody gets
will be a desirable one.

> I've wondered about that too, but it seems to depend on the assumption
> that clauses are estimated independently by clauselist_selectivity, which
> has not been true for a long time (and is getting less true not more so).
> So we could possibly apply something like this for parallelism, but not
> for parameterized paths, and that makes it less appealing ... IMO anyway.

I agree. We'd have to correct for that somehow, and that might be awkward.

> I have thought it might be good to explicitly mark partial paths with the
> estimated number of workers, which would be effectively the same thing
> as what you're talking about.  But I wonder if we'd not still be better off
> keeping the path rowcount as being number-of-rows-in-each-worker, and
> just scale it up by the multiplier for EXPLAIN output.  (And then also
> print the true total number of rows in EXPLAIN ANALYZE.)  If we do the
> inverse of that, then we risk bugs from failing to correct the rowcount
> during cost-estimation calculations.

That I don't like at all. I'm still of the opinion that it's a huge
mistake for EXPLAIN to print int(rowcount/loops) instead of just
rowcount. The division is never what I want and in my experience is
also not what other people want and often causes confusion. Both the
division and the rounding lose information about precisely what row
count was estimated, which makes it harder to figure out where in the
plan things went wrong. I am not at all keen on adding more ways for
what we print out to be different from the information actually stored
in the plan tree. I don't know for sure what we ought to be storing in
the plan tree, but I think whatever we store should also be what we
print. I think the fact that we've chosen to store something in the
plan tree is strong evidence that that exact value, and not some
quantity derived therefrom, is what's interesting.
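
To make the information loss concrete, here is a minimal sketch. It assumes the displayed figure is computed exactly as written above, int(rowcount/loops); the function name displayed_rows is hypothetical and this is an illustration, not PostgreSQL source code.

```python
def displayed_rows(total_rows: float, loops: int) -> int:
    """Hypothetical model of the per-loop figure: int(rowcount / loops)."""
    return int(total_rows / loops)


# Three quite different totals over 1000 loops all print the same figure,
# so the reader cannot tell them apart from the output alone.
for total in (1000, 1499, 1999):
    print(total, "->", displayed_rows(total, 1000))

# A node that produced 5 rows across 10 loops shows up as 0 rows.
print(displayed_rows(5, 10))
```

Under this assumption, totals of 1000, 1499, and 1999 rows are indistinguishable once divided by 1000 loops, and a node with a small but nonzero row count can appear to have returned nothing.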

> I kinda feel that the bottom line here is that cost estimation is
> hard, and we're not going to find a magic bullet that removes bugs.

Well, that much is certainly true.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


