Re: POC: GROUP BY optimization

From: Robert Haas
Subject: Re: POC: GROUP BY optimization
Msg-id CA+TgmoaAGPSzcetRQyBywVk=qSJjBpta5=Qz-1j2XJ5BkgkAVA@mail.gmail.com
In reply to: Re: POC: GROUP BY optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: POC: GROUP BY optimization
List: pgsql-hackers

On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost.  We simply do not have sufficiently accurate
> cost information.  The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

I'm late to the party, but I'd like to better understand what's being
argued here. If you're saying that, for some particular planner
problem, we should prefer a solution that doesn't need to know about
the relative cost of various sorts over one that does, I agree, for
exactly the reason that you state: our knowledge of sort costs won't
be reliable, and we will make mistakes. That's true in lots of
situations, not just ones involving sorts, because estimation is a
hard problem. Heuristics not based on cost are
going to be, in many cases, more accurate than heuristics based on
cost. They're also often cheaper, since they often let us reject
possible approaches very early, without all the bother of a cost
comparison.

But if you're saying that it's utterly impossible to know whether
sorting text will be cheaper or more expensive than sorting 4-byte
integers, and that if a particular problem can be solved only by
knowing which one is cheaper we should just give up, then I disagree.
In the absence of any other information, it must be right, at the very
least, to bank on varlena data types being more expensive to sort than
fixed-length data types. How much more expensive is hard to know,
because toasted blobs are going to be more expensive to sort than
short varlenas. But even before you reach the comparison function, a
pass-by-value datum has a significantly lower access cost than a
pass-by-reference datum. The fact that the pass-by-reference value
might be huge only compounds the problem.
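
To make the coarse ordering argued for above concrete, here is a minimal
sketch of a heuristic that ranks columns by type class rather than by any
cost estimate. It is not PostgreSQL code. The names SortClass, Column,
sort_class and order_cheapest_first are hypothetical; only typlen and
typbyval are meant to mirror the pg_type columns of the same name. Placing
fixed-length pass-by-reference types between the other two classes is an
assumption, and listing the cheaper class first is only for illustration,
not a claim about the right ordering for any particular planner problem.

```python
from dataclasses import dataclass
from enum import IntEnum


class SortClass(IntEnum):
    """Coarse, hypothetical ranking; lower is assumed cheaper to sort."""
    BY_VALUE = 0       # fixed length, passed by value (e.g. int4)
    FIXED_BY_REF = 1   # fixed length, reached through a pointer (e.g. uuid)
    VARLENA = 2        # variable length, possibly toasted (e.g. text)


@dataclass(frozen=True)
class Column:
    name: str
    typlen: int      # mirrors pg_type.typlen: negative means variable length
    typbyval: bool   # mirrors pg_type.typbyval


def sort_class(col: Column) -> SortClass:
    # No cost numbers are consulted: the decision uses only the type's shape.
    if col.typlen < 0:
        return SortClass.VARLENA
    return SortClass.BY_VALUE if col.typbyval else SortClass.FIXED_BY_REF


def order_cheapest_first(cols):
    # sorted() is stable, so columns in the same class keep their given order.
    return sorted(cols, key=sort_class)


cols = [
    Column("note", -1, False),   # text-like
    Column("id", 4, True),       # int4-like
    Column("token", 16, False),  # uuid-like
]
print([c.name for c in order_cheapest_first(cols)])  # ['id', 'token', 'note']
```

The ranking says nothing about how much more expensive one class is than
another, which is the part that is hard to know.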

--
Robert Haas
EDB: http://www.enterprisedb.com


