Thread: Tweaking DSM and DSA limits


Tweaking DSM and DSA limits

From: Thomas Munro
Date:
Hello,

If you run a lot of parallel queries with big parallel hash joins
simultaneously (for example, when stress-testing concurrent parallel
queries), you can run out of DSM slots.  That's because we allow only
64 slots + 2 * MaxBackends, but allocating seriously large amounts of
dynamic shared memory requires a lot of slots.
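
For concreteness, here is a toy restatement of dsm.c's slot
arithmetic (not the real code; max_connections = 100 is just an
example value):

    #include <stdio.h>

    /* The two constants, as defined in dsm.c today. */
    #define PG_DYNSHMEM_FIXED_SLOTS        64
    #define PG_DYNSHMEM_SLOTS_PER_BACKEND  2

    int
    main(void)
    {
        int         max_backends = 100;     /* stand-in for MaxBackends */
        int         maxitems = PG_DYNSHMEM_FIXED_SLOTS +
            PG_DYNSHMEM_SLOTS_PER_BACKEND * max_backends;

        /* 64 + 2 * 100 = 264 slots for the whole cluster. */
        printf("DSM slots available: %d\n", maxitems);
        return 0;
    }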

Originally the DSM system was designed to support one segment per
parallel query, but now we also use one for the session and any number
for parallel executor nodes that want space limited by work_mem.

The number of slots it takes for a given total amount of shared memory
depends on the macro DSA_NUM_SEGMENTS_AT_EACH_SIZE.  Since DSM slots
are relatively scarce (we use inefficient algorithms to access them,
and we think that some operating systems won't like us if we create
too many, so we impose this scarcity on ourselves), each DSA area
allocates bigger and bigger segments as it goes, starting with 1MB.
The approximate number of segments required to allocate various sizes
incrementally using different values of DSA_NUM_SEGMENTS_AT_EACH_SIZE
can be seen in this table:

  size  N=1  N=2  N=3  N=4

   1MB    1    1    1    1
  64MB    6   10   13   16
 512MB    9   16   22   28
   1GB   10   18   25   32
   8GB   13   24   34   44
  16GB   14   26   37   48
  32GB   15   28   40   52
   1TB   20   38   55   72
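
Here is a throwaway model of that sizing rule, if you want to play
with other values of N.  It treats segment i as having size
1MB << (i / N); it agrees with the table above to within a segment or
so, because the real allocator also rounds individual requests up:

    #include <stdio.h>

    /*
     * Toy model of dsa.c's geometric sizing: segment i has size
     * 1MB << (i / N), i.e. sizes double after every N segments.
     * Returns how many segments it takes to reach target_mb.
     */
    static int
    segments_needed(unsigned long long target_mb, int n)
    {
        unsigned long long total = 0;
        int         count = 0;

        while (total < target_mb)
        {
            total += 1ULL << (count / n);   /* this segment's size in MB */
            count++;
        }
        return count;
    }

    int
    main(void)
    {
        unsigned long long sizes[] =
            {1, 64, 512, 1024, 8192, 16384, 32768, 1048576};

        printf("   size   N=1  N=2  N=3  N=4\n");
        for (int i = 0; i < 8; i++)
        {
            printf("%7lluMB", sizes[i]);
            for (int n = 1; n <= 4; n++)
                printf("  %3d", segments_needed(sizes[i], n));
            printf("\n");
        }
        return 0;
    }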

It's currently set to 4, but I now think that was too cautious.  It
tries to avoid fragmentation by ramping up slowly (that is, memory
allocated and in some cases committed by the operating system that we
don't turn out to need), but it's pretty wasteful of slots.  Perhaps
it should be set to 2?

Perhaps also the number of slots per backend should be configurable
via a GUC, so that you have the option to increase it from the
current hard-coded value of 2 if you don't want to increase
max_connections but find yourself running out of slots (the GUC was a
request from Andres, but the name was made up by me -- if someone has
a better suggestion I'm all ears).

Also, there are some outdated comments near
PG_DYNSHMEM_SLOTS_PER_BACKEND's definition that we might as well drop
along with the macro.

Draft patch attached.

-- 
Thomas Munro
https://enterprisedb.com

Attachments

Re: Tweaking DSM and DSA limits

From: Robert Haas
Date:
On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> It's currently set to 4, but I now think that was too cautious.  It
> tries to avoid fragmentation by ramping up slowly (that is, memory
> allocated and in some cases committed by the operating system that we
> don't turn out to need), but it's pretty wasteful of slots.  Perhaps
> it should be set to 2?

+1.  I think I said at the time that I thought that was too cautious...

> Perhaps also the number of slots per backend should be dynamic, so
> that you have the option to increase it from the current hard-coded
> value of 2 if you don't want to increase max_connections but find
> yourself running out of slots (this GUC was a request from Andres but
> the name was made up by me -- if someone has a better suggestion I'm
> all ears).

I am not convinced that we really need to GUC-ify this.  How about
just bumping the value up from 2 to say 5?  Between the preceding
change and this one we ought to buy ourselves more than 4x, and if
that is not enough then we can ask whether raising max_connections is
a reasonable workaround, and if that's still not enough then we can
revisit this idea, or maybe come up with something better.  The
problem I have with a GUC here is that nobody without a PhD in
PostgreSQL-ology will have any clue how to set it, and while that's
good for your employment prospects and mine, it's not so great for
PostgreSQL users generally.

As Andres observed off-list, it would also be a good idea to allow
things that are going to gobble memory like hash joins to have some
input into how much memory gets allocated.  Maybe preallocating the
expected size of the hash is too aggressive -- estimates can be wrong,
and it could be much smaller.  But maybe we should allocate at least,
say, 1/64th of that amount, and act as if
DSA_NUM_SEGMENTS_AT_EACH_SIZE == 1 until the cumulative memory
allocation is more than 25% of that amount.  So if we think it's gonna
be 1GB, start by allocating 16MB and double the size of each
allocation thereafter until we get to at least 256MB allocated.  So
then we'd have 16MB + 32MB + 64MB + 128MB + 256MB + 256MB + 512MB = 7
segments instead of the 32 required currently or the 18 required with
DSA_NUM_SEGMENTS_AT_EACH_SIZE == 2.
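
Here is a quick throwaway model of that ramp-up, just to check the
arithmetic (hypothetical sketch, not patch code):

    #include <stdio.h>

    /*
     * Model of the proposed ramp-up: start at 1/64th of the estimate,
     * double every segment until at least 25% of the estimate is
     * allocated, then double every two segments (N == 2 behavior).
     */
    int
    main(void)
    {
        long long   estimate = 1024;        /* estimated hash size, in MB */
        long long   seg = estimate / 64;    /* first segment: 16MB */
        long long   total = 0;
        int         nsegs = 0;
        int         at_this_size = 0;

        while (total < estimate)
        {
            total += seg;
            nsegs++;
            printf("segment %d: %lldMB (total %lldMB)\n", nsegs, seg, total);
            if (total < estimate / 4)
                seg *= 2;                   /* aggressive phase: N == 1 */
            else if (++at_this_size == 2)
            {
                seg *= 2;                   /* steady phase: N == 2 */
                at_this_size = 0;
            }
        }
        /* Prints 16 + 32 + 64 + 128 + 256 + 256 + 512 = 7 segments. */
        return 0;
    }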

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tweaking DSM and DSA limits

From: Andres Freund
Date:
Hi,

On 2019-06-20 14:20:27 -0400, Robert Haas wrote:
> On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Perhaps also the number of slots per backend should be dynamic, so
> > that you have the option to increase it from the current hard-coded
> > value of 2 if you don't want to increase max_connections but find
> > yourself running out of slots (this GUC was a request from Andres but
> > the name was made up by me -- if someone has a better suggestion I'm
> > all ears).
> 
> I am not convinced that we really need to GUC-ify this.  How about
> just bumping the value up from 2 to say 5?

I'm not sure either. Although it's not great if the only way out for a
user hitting this is to increase max_connections... But we should really
increase the default.


> As Andres observed off-list, it would also be a good idea to allow
> things that are going to gobble memory like hash joins to have some
> input into how much memory gets allocated.  Maybe preallocating the
> expected size of the hash is too aggressive -- estimates can be wrong,
> and it could be much smaller.

At least for the case of the hashtable itself, we allocate that at the
predicted size immediately. So a mis-estimation wouldn't change
anything. For the entries, yeah, something like you suggest would make
sense.

Greetings,

Andres Freund



Re: Tweaking DSM and DSA limits

From: David Fetter
Date:
On Thu, Jun 20, 2019 at 02:20:27PM -0400, Robert Haas wrote:
> On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > It's currently set to 4, but I now think that was too cautious.  It
> > tries to avoid fragmentation by ramping up slowly (that is, memory
> > allocated and in some cases committed by the operating system that we
> > don't turn out to need), but it's pretty wasteful of slots.  Perhaps
> > it should be set to 2?
> 
> +1.  I think I said at the time that I thought that was too cautious...
> 
> > Perhaps also the number of slots per backend should be dynamic, so
> > that you have the option to increase it from the current hard-coded
> > value of 2 if you don't want to increase max_connections but find
> > yourself running out of slots (this GUC was a request from Andres but
> > the name was made up by me -- if someone has a better suggestion I'm
> > all ears).
> 
> I am not convinced that we really need to GUC-ify this.  How about
> just bumping the value up from 2 to say 5?  Between the preceding
> change and this one we ought to buy ourselves more than 4x, and if
> that is not enough then we can ask whether raising max_connections is
> a reasonable workaround,

Is there perhaps a way to make raising max_connections not require a
restart? There are plenty of situations out there where restarts
aren't something that can be done on a whim.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Tweaking DSM and DSA limits

From: Robert Haas
Date:
On Thu, Jun 20, 2019 at 5:00 PM David Fetter <david@fetter.org> wrote:
> Is there perhaps a way to make raising max_connections not require a
> restart? There are plenty of situations out there where restarts
> aren't something that can be done on a whim.

Sure, if you want to make this take about 100x more work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tweaking DSM and DSA limits

From: Thomas Munro
Date:
On Fri, Jun 21, 2019 at 6:52 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-06-20 14:20:27 -0400, Robert Haas wrote:
> > On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Perhaps also the number of slots per backend should be dynamic, so
> > > that you have the option to increase it from the current hard-coded
> > > value of 2 if you don't want to increase max_connections but find
> > > yourself running out of slots (this GUC was a request from Andres but
> > > the name was made up by me -- if someone has a better suggestion I'm
> > > all ears).
> >
> > I am not convinced that we really need to GUC-ify this.  How about
> > just bumping the value up from 2 to say 5?
>
> I'm not sure either. Although it's not great if the only way out for a
> user hitting this is to increase max_connections... But we should really
> increase the default.

Ok, hard-to-explain GUC abandoned.  Here is a patch that just adjusts
the two constants.  DSM's array allows for 5 slots per connection (up
from 2), and DSA doubles its size after every two segments (down from
4).
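
That is, roughly (both macros were mentioned upthread):

    /* dsm.c: more slots per connection */
    #define PG_DYNSHMEM_SLOTS_PER_BACKEND   5   /* was 2 */

    /* dsa.c: double segment sizes twice as fast */
    #define DSA_NUM_SEGMENTS_AT_EACH_SIZE   2   /* was 4 */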

> > As Andres observed off-list, it would also be a good idea to allow
> > things that are going to gobble memory like hash joins to have some
> > input into how much memory gets allocated.  Maybe preallocating the
> > expected size of the hash is too aggressive -- estimates can be wrong,
> > and it could be much smaller.
>
> At least for the case of the hashtable itself, we allocate that at the
> predicted size immediately. So a mis-estimation wouldn't change
> anything. For the entries, yeah, something like you suggest would make
> sense.

At the moment the 32KB chunks are used as parallel granules for
various work (inserting, repartitioning, rebucketing).  I could
certainly allocate a much bigger piece based on estimates, and then
invent another kind of chunk inside that, or keep the existing
layering but find a way to hint to DSA what allocation stream to
expect in the future so it can get bigger underlying chunks ready.
One problem is that it'd result in large, odd-sized memory segments,
whereas the current scheme uses power-of-two sizes and might be more
amenable to a later segment-reuse scheme; or maybe that doesn't really
matter.

I have a long wish list of improvements I'd like to investigate in
this area, subject for future emails, but while I'm making small
tweaks, here's another small thing: there is no "wait event" while
allocating (in the kernel sense) POSIX shm on Linux, unlike the
equivalent IO when file-backed segments are filled with write() calls.
Let's just reuse the same wait event, so that you can see what's going
on in pg_stat_activity.
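
The shape of that change, for the POSIX path, is roughly the
following sketch (the helper name here is hypothetical; the real code
lives in dsm_impl.c):

    #include "postgres.h"

    #include <fcntl.h>

    #include "pgstat.h"

    /*
     * Sketch only: wrap the kernel-side allocation of a POSIX shm
     * segment in the wait event already used when file-backed segments
     * are zero-filled with write(), so that it shows up in
     * pg_stat_activity.
     */
    static int
    dsm_impl_posix_fill(int fd, off_t size)
    {
        int         rc;

        pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
        rc = posix_fallocate(fd, 0, size);
        pgstat_report_wait_end();

        return rc;
    }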

Attachments

Re: Tweaking DSM and DSA limits

From: Thomas Munro
Date:
On Mon, Oct 21, 2019 at 12:21 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Jun 21, 2019 at 6:52 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-06-20 14:20:27 -0400, Robert Haas wrote:
> > > I am not convinced that we really need to GUC-ify this.  How about
> > > just bumping the value up from 2 to say 5?
> >
> > I'm not sure either. Although it's not great if the only way out for a
> > user hitting this is to increase max_connections... But we should really
> > increase the default.
>
> Ok, hard-to-explain GUC abandoned.  Here is a patch that just adjusts
> the two constants.  DSM's array allows for 5 slots per connection (up
> from 2), and DSA doubles its size after every two segments (down from
> 4).

Pushed.  No back-patch for now: the risk/reward ratio doesn't seem
right for that.

> I have a long wish list of improvements I'd like to investigate in
> this area, subject for future emails, but while I'm making small
> tweaks, here's another small thing: there is no "wait event" while
> allocating (in the kernel sense) POSIX shm on Linux, unlike the
> equivalent IO when file-backed segments are filled with write() calls.
> Let's just reuse the same wait event, so that you can see what's going
> on in pg_stat_activity.

Also pushed.