Re: Default setting for enable_hashagg_disk

From: Bruce Momjian
Subject: Re: Default setting for enable_hashagg_disk
Date: Fri, Jun 26, 2020
Msg-id: 20200626230020.GD3337@momjian.us
In reply to: Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
           Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Jun 26, 2020 at 07:45:13PM +0200, Tomas Vondra wrote:
> On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote:
> > I was thinking more of being able to allocate a single value to be
> > shared by all active sessions.
> 
> Not sure I understand. What "single value" do you mean?

I was thinking of a full-cluster work_mem maximum allocation that could
be given to various backends that request it.

Imagine we set the cluster-wide total of work_mem to 1GB.  If a session
asks for 100MB and there are no other active sessions, it can be granted
the entire 100MB.  If other sessions are running and 500MB has already
been allocated, maybe it is only given an active per-node work_mem of
50MB.  As the amount of unallocated cluster-wide work_mem gets smaller,
requests are granted smaller actual allocations.
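
As a rough sketch of that proportional-grant rule (hypothetical names
throughout; none of this is an existing PostgreSQL API, and a real
version would keep the counters in shared memory under a lock):

    #include <stdio.h>
    #include <stddef.h>

    /* Stand-ins for counters that would live in shared memory. */
    static size_t cluster_work_mem_total = (size_t) 1024 * 1024 * 1024;
    static size_t cluster_work_mem_used = 0;

    /*
     * Scale the grant by the fraction of cluster-wide work_mem still
     * free: with nothing allocated, a 100MB request gets the full
     * 100MB; with 500MB of 1GB already out, it gets roughly 50MB.
     */
    static size_t
    work_mem_grant(size_t request)
    {
        size_t free_mem = cluster_work_mem_total - cluster_work_mem_used;
        size_t grant = request * free_mem / cluster_work_mem_total;

        cluster_work_mem_used += grant;
        return grant;
    }

    int
    main(void)
    {
        size_t req = (size_t) 100 * 1024 * 1024;

        printf("idle cluster: %zu MB\n", work_mem_grant(req) >> 20);
        cluster_work_mem_used = (size_t) 500 * 1024 * 1024;
        printf("busy cluster: %zu MB\n", work_mem_grant(req) >> 20);
        return 0;
    }

The exact curve would be a policy decision; the point is only that
grants shrink as the free pool shrinks, rather than being refused
outright.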

What we do now makes little sense, because we might have lots of free
memory, but we force nodes to spill to disk when they exceed a fixed
work_mem.  I realize this is very imprecise, because you don't know what
future work_mem requests are coming, or how long until existing
allocations are freed, but it seems it would have to be better than what
we do now.

> Wasn't the idea to replace work_mem with something like query_mem?
> That'd be nice, but I think it's inherently circular - we don't know how
> to distribute this to different nodes until we know which nodes will
> need a buffer, but the buffer size is important for costing (so we need
> it when constructing the paths).
> 
> Plus then there's the question whether all nodes should get the same
> fraction, or less sensitive nodes should get smaller chunks, etc.
> Ultimately this would be based on costing too, I think, but it makes it
> so much more complex ...

Since work_mem affects the optimizer's choices, I can imagine this
getting complex: nodes would have to ask the global work_mem allocator
how much memory they _might_ get at plan time, but then request their
final work_mem during execution, and the two amounts might differ.
Still, our spill costs are so high for so many node types that reducing
spills seems like it would be a win, even if it sometimes causes poorer
plans.
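
To make that two-step handshake concrete, here is a sketch in the same
hypothetical style as above (again, none of these names exist in
PostgreSQL): the plan-time answer is non-binding and reserves nothing,
while the execution-time grant is binding and may come back smaller
than what the plan was costed with:

    #include <stdio.h>
    #include <stddef.h>

    static size_t cluster_work_mem_total = (size_t) 1024 * 1024 * 1024;
    static size_t cluster_work_mem_used = 0;

    /* Shared rule: scale the request by the free fraction. */
    static size_t
    scale_request(size_t request)
    {
        size_t free_mem = cluster_work_mem_total - cluster_work_mem_used;

        return request * free_mem / cluster_work_mem_total;
    }

    /*
     * Plan time: non-binding, used only for costing.  Nothing is
     * reserved, so the free pool can look different by execution time.
     */
    static size_t
    work_mem_estimate(size_t request)
    {
        return scale_request(request);
    }

    /*
     * Execution time: the binding grant.  If it comes back smaller
     * than the plan-time estimate, the node keeps the plan it was
     * costed with but spills to disk earlier than the costing assumed.
     */
    static size_t
    work_mem_acquire(size_t request)
    {
        size_t grant = scale_request(request);

        cluster_work_mem_used += grant;
        return grant;
    }

    int
    main(void)
    {
        size_t req = (size_t) 100 * 1024 * 1024;
        size_t estimate = work_mem_estimate(req);  /* plan costed with this */

        /* other sessions take memory between planning and execution */
        cluster_work_mem_used = (size_t) 500 * 1024 * 1024;

        printf("estimated %zu MB, granted %zu MB\n",
               estimate >> 20, work_mem_acquire(req) >> 20);
        return 0;
    }

A grant smaller than the estimate never invalidates the plan; it just
moves the spill point, which is the trade-off described above.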

> > Also, doesn't this blog entry also show that spilling to disk for ORDER
> > BY is similarly slow compared to hash aggs?
> > 
> >     https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012
> 
> The post does not mention hashagg at all, so I'm not sure how it could
> show that. But I think you're right that the spilling itself is not
> that far off in most cases (thanks to the recent fixes made by Jeff).

Yeah, I was just measuring ORDER BY spill, but its overhead seems
similar to hashagg spill, which is being singled out in this discussion
as particularly expensive, and I am questioning that.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



