Discussion: Tweaking DSM and DSA limits
Hello,

If you run a lot of parallel queries that use big parallel hash joins simultaneously, you can run out of DSM slots (for example, when testing many concurrent parallel queries). That's because we allow 64 slots + 2 * MaxBackends, but allocating seriously large amounts of dynamic shared memory requires lots of slots. Originally the DSM system was designed to support one segment per parallel query, but now we also use one for the session and any number for parallel executor nodes that want space limited by work_mem.

The number of slots it takes for a given total amount of shared memory depends on the macro DSA_NUM_SEGMENTS_AT_EACH_SIZE. Since DSM slots are relatively scarce (we use inefficient algorithms to access them, and we think that some operating systems won't like us if we create too many, so we impose this scarcity on ourselves), each DSA area allocates bigger and bigger segments as it goes, starting with 1MB. The approximate number of segments required to allocate various sizes incrementally using different values of DSA_NUM_SEGMENTS_AT_EACH_SIZE can be seen in this table:

           N = 1    2    3    4
    1MB        1    1    1    1
   64MB        6   10   13   16
  512MB        9   16   22   28
    1GB       10   18   25   32
    8GB       13   24   34   44
   16GB       14   26   37   48
   32GB       15   28   40   52
    1TB       20   38   55   72

It's currently set to 4, but I now think that was too cautious. It tries to avoid fragmentation by ramping up slowly (that is, memory allocated and in some cases committed by the operating system that we don't turn out to need), but it's pretty wasteful of slots. Perhaps it should be set to 2?

Perhaps also the number of slots per backend should be dynamic, so that you have the option to increase it from the current hard-coded value of 2 if you don't want to increase max_connections but find yourself running out of slots (this GUC was a request from Andres but the name was made up by me -- if someone has a better suggestion I'm all ears).
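The growth policy described above can be simulated with a few lines. This is a back-of-envelope sketch of the geometric ramp (a 1MB initial segment, each size repeated DSA_NUM_SEGMENTS_AT_EACH_SIZE times before doubling), not the actual dsa.c logic:

```python
def segments_needed(total_mb, segments_at_each_size):
    """Count segments created before cumulative size reaches total_mb.

    Models the DSA ramp-up: segments start at 1MB, each size is
    repeated segments_at_each_size times, then the size doubles.
    """
    size_mb = 1          # initial segment size is 1MB
    at_this_size = 0
    cumulative = 0
    count = 0
    while cumulative < total_mb:
        cumulative += size_mb
        count += 1
        at_this_size += 1
        if at_this_size == segments_at_each_size:
            size_mb *= 2
            at_this_size = 0
    return count

# Roughly reproduces the table above (the table rounds a little: ten
# segments of 1MB..512MB sum to 1023MB, which it counts as "1GB"):
for n in (1, 2, 4):
    print(n, segments_needed(1000, n))  # 1 -> 10, 2 -> 18, 4 -> 32
```

This makes the slot cost of the cautious setting concrete: halving DSA_NUM_SEGMENTS_AT_EACH_SIZE from 4 to 2 cuts the segments needed for ~1GB from 32 to 18.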
Also, there are some outdated comments near PG_DYNSHMEM_SLOTS_PER_BACKEND's definition that we might as well drop along with the macro. Draft patch attached. -- Thomas Munro https://enterprisedb.com
Attachments
On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> It's currently set to 4, but I now think that was too cautious. It
> tries to avoid fragmentation by ramping up slowly (that is, memory
> allocated and in some cases committed by the operating system that we
> don't turn out to need), but it's pretty wasteful of slots. Perhaps
> it should be set to 2?

+1. I think I said at the time that I thought that was too cautious...

> Perhaps also the number of slots per backend should be dynamic, so
> that you have the option to increase it from the current hard-coded
> value of 2 if you don't want to increase max_connections but find
> yourself running out of slots (this GUC was a request from Andres but
> the name was made up by me -- if someone has a better suggestion I'm
> all ears).

I am not convinced that we really need to GUC-ify this. How about just bumping the value up from 2 to say 5? Between the preceding change and this one we ought to buy ourselves more than 4x, and if that is not enough then we can ask whether raising max_connections is a reasonable workaround, and if that's still not enough then we can revisit this idea, or maybe come up with something better. The problem I have with a GUC here is that nobody without a PhD in PostgreSQL-ology will have any clue how to set it, and while that's good for your employment prospects and mine, it's not so great for PostgreSQL users generally.

As Andres observed off-list, it would also be a good idea to allow things that are going to gobble memory like hash joins to have some input into how much memory gets allocated. Maybe preallocating the expected size of the hash is too aggressive -- estimates can be wrong, and it could be much smaller. But maybe we should allocate at least, say, 1/64th of that amount, and act as if DSA_NUM_SEGMENTS_AT_EACH_SIZE == 1 until the cumulative memory allocation is more than 25% of that amount.
So if we think it's gonna be 1GB, start by allocating 16MB and double the size of each allocation thereafter until we get to at least 256MB allocated. So then we'd have 16MB + 32MB + 64MB + 128MB + 256MB + 256MB + 512MB = 7 segments instead of the 32 required currently or the 18 required with DSA_NUM_SEGMENTS_AT_EACH_SIZE == 2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
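The proposed ramp can be sketched the same way. This is a hypothetical simulation of the suggestion (start at 1/64th of the estimate, double every segment until at least a quarter of the estimate is allocated, then fall back to repeating each size twice), not patch code:

```python
def segments_with_estimate(estimate_mb):
    """Sketch of the proposed scheme: begin at estimate/64, act as if
    DSA_NUM_SEGMENTS_AT_EACH_SIZE == 1 until a quarter of the estimate
    has been allocated, then repeat each size twice (N == 2)."""
    size = estimate_mb // 64
    cumulative = 0
    at_this_size = 0
    sizes = []
    while cumulative < estimate_mb:
        sizes.append(size)
        cumulative += size
        at_this_size += 1
        # Double every segment during the fast ramp; every second
        # segment once a quarter of the estimate is allocated.
        if cumulative <= estimate_mb // 4 or at_this_size >= 2:
            size *= 2
            at_this_size = 0
    return sizes

print(segments_with_estimate(1024))
# [16, 32, 64, 128, 256, 256, 512] -- 7 segments, matching the example
```

With an accurate 1GB estimate this lands on exactly the 7 segments in Robert's arithmetic, versus 32 with the current constant or 18 with N == 2.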
Hi,

On 2019-06-20 14:20:27 -0400, Robert Haas wrote:
> On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Perhaps also the number of slots per backend should be dynamic, so
> > that you have the option to increase it from the current hard-coded
> > value of 2 if you don't want to increase max_connections but find
> > yourself running out of slots (this GUC was a request from Andres but
> > the name was made up by me -- if someone has a better suggestion I'm
> > all ears).
>
> I am not convinced that we really need to GUC-ify this. How about
> just bumping the value up from 2 to say 5?

I'm not sure either. Although it's not great if the only way out for a user hitting this is to increase max_connections... But we should really increase the default.

> As Andres observed off-list, it would also be a good idea to allow
> things that are going to gobble memory like hash joins to have some
> input into how much memory gets allocated. Maybe preallocating the
> expected size of the hash is too aggressive -- estimates can be wrong,
> and it could be much smaller.

At least for the case of the hashtable itself, we allocate that at the predicted size immediately. So a mis-estimation wouldn't change anything. For the entries, yea, something like you suggest would make sense.

Greetings,

Andres Freund
On Thu, Jun 20, 2019 at 02:20:27PM -0400, Robert Haas wrote:
> On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > It's currently set to 4, but I now think that was too cautious. It
> > tries to avoid fragmentation by ramping up slowly (that is, memory
> > allocated and in some cases committed by the operating system that we
> > don't turn out to need), but it's pretty wasteful of slots. Perhaps
> > it should be set to 2?
>
> +1. I think I said at the time that I thought that was too cautious...
>
> > Perhaps also the number of slots per backend should be dynamic, so
> > that you have the option to increase it from the current hard-coded
> > value of 2 if you don't want to increase max_connections but find
> > yourself running out of slots (this GUC was a request from Andres but
> > the name was made up by me -- if someone has a better suggestion I'm
> > all ears).
>
> I am not convinced that we really need to GUC-ify this. How about
> just bumping the value up from 2 to say 5? Between the preceding
> change and this one we ought to buy ourselves more than 4x, and if
> that is not enough then we can ask whether raising max_connections is
> a reasonable workaround,

Is there perhaps a way to make raising max_connections not require a restart? There are plenty of situations out there where restarts aren't something that can be done on a whim.

Best,
David.

-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, Jun 20, 2019 at 5:00 PM David Fetter <david@fetter.org> wrote:
> Is there perhaps a way to make raising max_connections not require a
> restart? There are plenty of situations out there where restarts
> aren't something that can be done on a whim.

Sure, if you want to make this take about 100x more work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 21, 2019 at 6:52 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-06-20 14:20:27 -0400, Robert Haas wrote:
> > On Tue, Jun 18, 2019 at 9:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Perhaps also the number of slots per backend should be dynamic, so
> > > that you have the option to increase it from the current hard-coded
> > > value of 2 if you don't want to increase max_connections but find
> > > yourself running out of slots (this GUC was a request from Andres but
> > > the name was made up by me -- if someone has a better suggestion I'm
> > > all ears).
> >
> > I am not convinced that we really need to GUC-ify this. How about
> > just bumping the value up from 2 to say 5?
>
> I'm not sure either. Although it's not great if the only way out for a
> user hitting this is to increase max_connections... But we should really
> increase the default.

Ok, hard-to-explain GUC abandoned. Here is a patch that just adjusts the two constants. DSM's array allows for 5 slots per connection (up from 2), and DSA doubles its size after every two segments (down from 4).

> > As Andres observed off-list, it would also be a good idea to allow
> > things that are going to gobble memory like hash joins to have some
> > input into how much memory gets allocated. Maybe preallocating the
> > expected size of the hash is too aggressive -- estimates can be wrong,
> > and it could be much smaller.
>
> At least for the case of the hashtable itself, we allocate that at the
> predicted size immediately. So a mis-estimation wouldn't change
> anything. For the entries, yea, something like you suggest would make
> sense.

At the moment the 32KB chunks are used as parallel granules for various work (inserting, repartitioning, rebucketing).
I could certainly allocate a much bigger piece based on estimates, and then invent another kind of chunks inside that, or keep the existing layering but find a way to hint to DSA what allocation stream to expect in the future so it can get bigger underlying chunks ready. One problem is that it'd result in large, odd-sized memory segments, whereas the current scheme uses power-of-two sizes and might be more amenable to a later segment reuse scheme; or maybe that doesn't really matter.

I have a long wish list of improvements I'd like to investigate in this area, subject for future emails, but while I'm making small tweaks, here's another small thing: there is no "wait event" while allocating (in the kernel sense) POSIX shm on Linux, unlike the equivalent IO when file-backed segments are filled with write() calls. Let's just reuse the same wait event, so that you can see what's going on in pg_stat_activity.
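For a sense of scale, the combined effect of the patch's two constant changes can be sketched with simple arithmetic. The slot formula (a fixed pool of 64 plus a per-backend allowance) comes from the first message in the thread; treating it as a standalone function here is purely illustrative:

```python
def dsm_slots(max_backends, slots_per_backend):
    # Total DSM control slots: 64 fixed plus a per-backend allowance.
    return 64 + slots_per_backend * max_backends

# With max_connections around 100 backends:
before = dsm_slots(100, 2)  # old hard-coded value of 2 slots per backend
after = dsm_slots(100, 5)   # patched value of 5 slots per backend
print(before, after)  # 264 564
```

Combined with DSA doubling after every two segments instead of every four (so ~1GB costs 18 segments rather than 32 per the earlier table), this is where the "more than 4x" headroom Robert mentioned comes from.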
Attachments
On Mon, Oct 21, 2019 at 12:21 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Jun 21, 2019 at 6:52 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-06-20 14:20:27 -0400, Robert Haas wrote:
> > > I am not convinced that we really need to GUC-ify this. How about
> > > just bumping the value up from 2 to say 5?
> >
> > I'm not sure either. Although it's not great if the only way out for a
> > user hitting this is to increase max_connections... But we should really
> > increase the default.
>
> Ok, hard-to-explain GUC abandoned. Here is a patch that just adjusts
> the two constants. DSM's array allows for 5 slots per connection (up
> from 2), and DSA doubles its size after every two segments (down from
> 4).

Pushed. No back-patch for now: the risk/reward ratio doesn't seem right for that.

> I have a long wish list of improvements I'd like to investigate in
> this area, subject for future emails, but while I'm making small
> tweaks, here's another small thing: there is no "wait event" while
> allocating (in the kernel sense) POSIX shm on Linux, unlike the
> equivalent IO when file-backed segments are filled with write() calls.
> Let's just reuse the same wait event, so that you can see what's going
> on in pg_stat_activity.

Also pushed.