Обсуждение: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 17757 Logged by: David Angel Email address: david_sisson@dell.com PostgreSQL version: 14.5 Operating system: Linux Description: On an OS where hugepages are enabled, if no hugepages resources are assigned in Kubernetes and the postgres instance is set to hugepages = off in the config then one would assume that the DB would not use hugepages. However, because the initdb process uses postgresql.conf.sample or postgresql.conf.template instead of the actual specified configuration the applied setting is actually hugepages = try during initdb. In these cases, the initdb phase will attempt to allocate huge pages that are available in the OS, but it will be denied access by Kubernetes and fail. Here is a PR with a possible fix: https://github.com/postgres/postgres/pull/114/files Here are some links for further information Ours: https://github.com/CrunchyData/postgres-operator/issues/3477 Others with the same having no solution to disable huge pages. https://github.com/CrunchyData/postgres-operator/issues/3039 https://github.com/CrunchyData/postgres-operator/issues/2258 https://github.com/CrunchyData/postgres-operator/issues/3126 https://github.com/CrunchyData/postgres-operator/issues/3421 Bitnami https://github.com/bitnami/charts/issues/7901
On 1/20/23 23:48, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 17757 > Logged by: David Angel > Email address: david_sisson@dell.com > PostgreSQL version: 14.5 > Operating system: Linux > Description: > > On an OS where hugepages are enabled, if no hugepages resources are assigned > in Kubernetes and the postgres instance is set to hugepages = off in the > config then one would assume that the DB would not use hugepages. There's no config at that point - it's initdb that creates it, by copying the .sample file, IIRC. So not sure which file you're modifying. > However, because the initdb process uses postgresql.conf.sample or > postgresql.conf.template instead of the actual specified configuration the > applied setting is actually hugepages = try during initdb. Specified how? > In these cases, the initdb phase will attempt to allocate huge pages that > are available in the OS, but it will be denied access by Kubernetes and > fail. Well, so how exactly this fails? Does that mean Kubernetes broke mmap() with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are not available, or what? Because that's the only explanation I can see, looking at the code. Or it just does not realize there are no hugepages, returns something and then crashes with SIGBUS later when trying to access it? > > Here is a PR with a possible fix: > https://github.com/postgres/postgres/pull/114/files > I doubt we want to just go straight to changing the default value for everyone. IMHO if the "try" logic is somehow broken, we should fix the try logic, not mess with the defaults. In the worst case, the operator can probably tweak the .sample config before calling initdb. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote: > On 1/20/23 23:48, PG Bug reporting form wrote: > > In these cases, the initdb phase will attempt to allocate huge pages that > > are available in the OS, but it will be denied access by Kubernetes and > > fail. > > Well, so how exactly this fails? Does that mean Kubernetes broke mmap() > with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are > not available, or what? Because that's the only explanation I can see, > looking at the code. Yea, that's what I was wondering about as well. > Or it just does not realize there are no hugepages, returns something > and then crashes with SIGBUS later when trying to access it? I assume that that's the case. There's references to bus errors in a bunch of the linked issues. E.g. https://github.com/CrunchyData/postgres-operator/issues/413 selecting default max_connections ... sh: line 1: 60 Bus error (core dumped) "/usr/pgsql-10/bin/postgres"--boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none< "/dev/null" > "/dev/null" 2>&1 It's possible that the problem would go away if we used MAP_POPULATE for the allocation. I'd guess that this is annoying cgroups stuff :( > I doubt we want to just go straight to changing the default value for > everyone. IMHO if the "try" logic is somehow broken, we should fix the > try logic, not mess with the defaults. Agreed. But we could disable huge pages explicitly inside initdb - there's really no point in using it there... Greetings, Andres Freund
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 1/20/23 23:48, PG Bug reporting form wrote: >> Here is a PR with a possible fix: >> https://github.com/postgres/postgres/pull/114/files > I doubt we want to just go straight to changing the default value for > everyone. Yeah, that proposal is a non-starter. I could see providing an initdb option to adjust the value applied during initdb, though. Ideally, maybe what we want is a generalized switch that could replace any variable in the sample config, along the lines of the server's "-c foo=bar". I recall having tried to do that and having run into quoting hazards, but I did not try very hard. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote: >> I doubt we want to just go straight to changing the default value for >> everyone. IMHO if the "try" logic is somehow broken, we should fix the >> try logic, not mess with the defaults. > Agreed. But we could disable huge pages explicitly inside initdb - there's > really no point in using it there... One of the things initdb is trying to do is establish a set of values that is known to allow the server to start. Not using the same settings that the server is expected to use would break that idea completely. regards, tom lane
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-21 18:33:03 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote: > >> I doubt we want to just go straight to changing the default value for > >> everyone. IMHO if the "try" logic is somehow broken, we should fix the > >> try logic, not mess with the defaults. > > Agreed. But we could disable huge pages explicitly inside initdb - there's > > really no point in using it there... > > One of the things initdb is trying to do is establish a set of values > that is known to allow the server to start. Not using the same settings > that the server is expected to use would break that idea completely. Yea, I'm not saying like the approach. OTOH, we don't provide a proper way to influence the configuration, which is bad, as this issue shows. Perhaps we should add an option to force MAP_POPULATE being used? I'm fairly certain that'd avoid the SIGBUS in this case. And it'd make sense to ensure that we can actually use the memory in initdb. Unfortunately it's not unproblematic to use it in general, because with large shared_buffers values it can be quite slow, because the kernel initializes the memory in a single thread. I've seen ~3GB/s on multi-socket machines. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Perhaps we should add an option to force MAP_POPULATE being used? I'm fairly > certain that'd avoid the SIGBUS in this case. And it'd make sense to ensure > that we can actually use the memory in initdb. > Unfortunately it's not unproblematic to use it in general, because with large > shared_buffers values it can be quite slow, because the kernel initializes the > memory in a single thread. I've seen ~3GB/s on multi-socket machines. Hmm ... but if we can't use it by default, we're still back to the problem of needing a way to tell initdb to do things differently. I'd just as soon keep that to "set huge_pages = off" rather than inventing whole new things. regards, tom lane
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-21 15:29:22 -0800, Andres Freund wrote: > On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote: > > On 1/20/23 23:48, PG Bug reporting form wrote: > > > In these cases, the initdb phase will attempt to allocate huge pages that > > > are available in the OS, but it will be denied access by Kubernetes and > > > fail. > > > > Well, so how exactly this fails? Does that mean Kubernetes broke mmap() > > with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are > > not available, or what? Because that's the only explanation I can see, > > looking at the code. > > Yea, that's what I was wondering about as well. > > > > Or it just does not realize there are no hugepages, returns something > > and then crashes with SIGBUS later when trying to access it? > > I assume that that's the case. There's references to bus errors in a bunch of > the linked issues. E.g. > https://github.com/CrunchyData/postgres-operator/issues/413 > > selecting default max_connections ... sh: line 1: 60 Bus error (core dumped) "/usr/pgsql-10/bin/postgres"--boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none< "/dev/null" > "/dev/null" 2>&1 > > It's possible that the problem would go away if we used MAP_POPULATE for the > allocation. > I'd guess that this is annoying cgroups stuff :( Ah, the fun: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per control group and enforces the limit during page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time implies that, the application will get SIGBUS signal if it tries to fault in HugeTLB pages beyond its limit. Therefore the application needs to know exactly how many HugeTLB pages it uses before hand, and the sysadmin needs to make sure that there are enough available on the machine for all the users to avoid processes getting SIGBUS. but there's also Reservation accounting hugetlb.<hugepagesize>.rsvd.limit_in_bytes hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes hugetlb.<hugepagesize>.rsvd.usage_in_byteshugetlb.<hugepagesize>.rsvd.failcnt The HugeTLB controller allows to limit the HugeTLB reservations per control group and enforces the controller limit at reservation time and at the fault of HugeTLB memory for which no reservation exists. Since reservation limits are enforced at reservation time (on mmap or shget), reservation limits never causes the application to get SIGBUS signal if the memory was reserved before hand. For MAP_NORESERVE allocations, the reservation limit behaves the same as the fault limit, enforcing memory usage at fault time and causing the application to receive a SIGBUS if it’s crossing its limit. Reservation limits are superior to page fault limits described above, since reservation limits are enforced at reservation time (on mmap or shget), and never causes the application to get SIGBUS signal if the memory was reserved before hand. This allows for easier fallback to alternatives such as non-HugeTLB memory for example. In the case of page fault accounting, it’s very hard to avoid processes getting SIGBUS since the sysadmin needs precisely know the HugeTLB usage of all the tasks in the system and make sure there is enough pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommited systems is practically impossible with page fault accounting. So the problem is that the wrong time of cgroup limits are used. I don't know if that's a kubernetes or a postgres-operator issue. Greetings, Andres Freund
On 1/22/23 00:30, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 1/20/23 23:48, PG Bug reporting form wrote: >>> Here is a PR with a possible fix: >>> https://github.com/postgres/postgres/pull/114/files > >> I doubt we want to just go straight to changing the default value for >> everyone. > > Yeah, that proposal is a non-starter. I could see providing an > initdb option to adjust the value applied during initdb, though. > > Ideally, maybe what we want is a generalized switch that could > replace any variable in the sample config, along the lines of > the server's "-c foo=bar". I recall having tried to do that and > having run into quoting hazards, but I did not try very hard. > Yeah, I was looking for something like "-c" in initdb, only to realize there's nothing like that. The main "problem" with adding that is that we're unlikely to backpatch that (I guess), and thus it does not really solve the issue for the OP. I'm not sure we'd be keen to backpatch a change of the default, but maybe we would ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 1/22/23 00:30, Tom Lane wrote: >> Yeah, that proposal is a non-starter. I could see providing an >> initdb option to adjust the value applied during initdb, though. >> Ideally, maybe what we want is a generalized switch that could >> replace any variable in the sample config, along the lines of >> the server's "-c foo=bar". I recall having tried to do that and >> having run into quoting hazards, but I did not try very hard. > Yeah, I was looking for something like "-c" in initdb, only to realize > there's nothing like that. The main "problem" with adding that is that > we're unlikely to backpatch that (I guess), and thus it does not really > solve the issue for the OP. > I'm not sure we'd be keen to backpatch a change of the default, but > maybe we would ... Back-patching a change of default seems like REALLY a non-starter. Perhaps adding a switch (which would break nothing if not used) could be discussed, though. regards, tom lane
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-22 01:55:01 +0100, Tomas Vondra wrote: > I'm not sure we'd be keen to backpatch a change of the default, but > maybe we would ... After figuring out that it's clearly a configuration issue *somewhere* outside of postgres's remit, I'm not that sure it's worth doing something concretely to avoid the SIGBUS issue. But if we end up doing something, I think a parameter triggering use of MAP_POPULATE would be a good idea. It's actually useful outside of the SIGBUS issue, because benchmarks reach a steady state noticably more quickly when using it. OTOH, in a production scenario with large shared_buffers I'd probably not want to use it, because getting up more quickly and and distributing the memory initialization across across cores is more important. I think it'd be ok to explicitly specify such an option in initdb - after all, initdb does do work to determine the correct shared buffers size etc, and MAP_POPULATE will lead to a more reliable determination. Not just with huge pages, but also with "small" pages and system-level memory overcommit. Greetings, Andres Freund
RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
"Sisson, David"
Дата:
I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf"file. huge_pages can be turned on through outside manipulation but it can't be turned off. Not without altering the sample config file. Thanks, David Angel 😊 Internal Use - Confidential -----Original Message----- From: Andres Freund <andres@anarazel.de> Sent: Saturday, January 21, 2023 8:08 PM To: Tomas Vondra Cc: Tom Lane; Sisson, David; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes [EXTERNAL EMAIL] Hi, On 2023-01-22 01:55:01 +0100, Tomas Vondra wrote: > I'm not sure we'd be keen to backpatch a change of the default, but > maybe we would ... After figuring out that it's clearly a configuration issue *somewhere* outside of postgres's remit, I'm not that sure it'sworth doing something concretely to avoid the SIGBUS issue. But if we end up doing something, I think a parameter triggering use of MAP_POPULATE would be a good idea. It's actuallyuseful outside of the SIGBUS issue, because benchmarks reach a steady state noticably more quickly when using it. OTOH, in a production scenario with large shared_buffers I'd probably not want to use it, because getting up more quicklyand and distributing the memory initialization across across cores is more important. I think it'd be ok to explicitly specify such an option in initdb - after all, initdb does do work to determine the correctshared buffers size etc, and MAP_POPULATE will lead to a more reliable determination. Not just with huge pages, butalso with "small" pages and system-level memory overcommit. Greetings, Andres Freund
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Christophe Pettus
Дата:
> On Jan 23, 2023, at 11:26, Sisson, David <David.Sisson@dell.com> wrote: > > I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf"file. We are? I believe the default is "huge_pages = try", not off.
RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
"Sisson, David"
Дата:
The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file. When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdb runs. There is no way to turn it off without altering the sample config file. It is quite difficult to nearly impossible to alter the "postgresql.conf.sample" file using a 3rd party controller. The file is read-only at runtime within Kubernetes. Only some controllers let you modify the sample file without rebuilding their code. You guys are awesome with truly outstanding responses. I certainly didn't expect my initial solution to be used but to help in finding a good solution. 😊 Thanks, David Angel Internal Use - Confidential -----Original Message----- From: Christophe Pettus <xof@thebuild.com> Sent: Monday, January 23, 2023 1:38 PM To: Sisson, David Cc: Andres Freund; Tomas Vondra; Tom Lane; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes [EXTERNAL EMAIL] > On Jan 23, 2023, at 11:26, Sisson, David <David.Sisson@dell.com> wrote: > > I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf"file. We are? I believe the default is "huge_pages = try", not off.
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-23 19:26:09 +0000, Sisson, David wrote: > I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf"file. > huge_pages can be turned on through outside manipulation but it can't be > turned off. It's a fault of the environment if mmap(MAP_HUGETLB) causes a SIGBUS. Normally huge_pages = try is harmless, because it'll just fall back. That source of SIGBUSes needs to be fixed regardless of anything else - plenty allocators try to use huge pages for example, so you'll run into problems regardless of postgres' default. That said, I'm for allowing to specify options to initdb. Greetings, Andres Freund
"Sisson, David" <David.Sisson@dell.com> writes: > The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file. > When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdbruns. What "standard postgresql.conf file"? There is no such thing until initdb creates it. > There is no way to turn it off without altering the sample config file. Yup, that's exactly why we are having this discussion. regards, tom lane
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
"David G. Johnston"
Дата:
On Mon, Jan 23, 2023 at 12:51 PM Sisson, David <David.Sisson@dell.com> wrote:
The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdb runs.
There is no way to turn it off without altering the sample config file.
Right, the present way to control what is seen by initdb is postgresql.conf.sample since that is the template that initdb uses to then produce an actual postgresql.conf for the newly created instance. postgresql.conf is only ever a per-instance configuration file. It doesn't make sense to "change postgresql.conf in hopes of influencing some future initdb run."
David J.
RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
"Sisson, David"
Дата:
That makes sense, the PostgreSQL controllers are calling initdb to create the "postgresql.conf" file before they apply customizationsto it. To the consumer, it is just yaml to be added to the "postgresql.conf" file. That makes it much harder to fix and means it is really the controllers at fault. This probably needs to be explicitly documented when creating a HA cluster or within initdb docs. https://www.postgresql.org/docs/15/app-initdb.html Maybe something about how initdb uses sample and what configuration settings must be pre-configured. Thanks, David Angel Internal Use - Confidential -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Monday, January 23, 2023 1:56 PM To: Sisson, David Cc: Christophe Pettus; Andres Freund; Tomas Vondra; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes [EXTERNAL EMAIL] "Sisson, David" <David.Sisson@dell.com> writes: > The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file. > When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdbruns. What "standard postgresql.conf file"? There is no such thing until initdb creates it. > There is no way to turn it off without altering the sample config file. Yup, that's exactly why we are having this discussion. regards, tom lane
RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
"Sisson, David"
Дата:
A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with "huge_pages=false". Would that be acceptable? Passing in a config setting into initdb would still require a rebuild of all controllers. That could take months to years at best. Thanks, David Angel Internal Use - Confidential -----Original Message----- From: Sisson, David <David_Sisson@Dell.com> Sent: Monday, January 23, 2023 2:12 PM To: Tom Lane Cc: Christophe Pettus; Andres Freund; Tomas Vondra; pgsql-bugs@lists.postgresql.org; Sisson, David; Howell, Stephen Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes That makes sense, the PostgreSQL controllers are calling initdb to create the "postgresql.conf" file before they apply customizationsto it. To the consumer, it is just yaml to be added to the "postgresql.conf" file. That makes it much harder to fix and means it is really the controllers at fault. This probably needs to be explicitly documented when creating a HA cluster or within initdb docs. https://www.postgresql.org/docs/15/app-initdb.html Maybe something about how initdb uses sample and what configuration settings must be pre-configured. Thanks, David Angel Internal Use - Confidential -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Monday, January 23, 2023 1:56 PM To: Sisson, David Cc: Christophe Pettus; Andres Freund; Tomas Vondra; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes [EXTERNAL EMAIL] "Sisson, David" <David.Sisson@dell.com> writes: > The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file. > When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdbruns. What "standard postgresql.conf file"? There is no such thing until initdb creates it. > There is no way to turn it off without altering the sample config file. Yup, that's exactly why we are having this discussion. regards, tom lane
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-23 20:35:17 +0000, Sisson, David wrote: > A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with "huge_pages=false". > Would that be acceptable? This is a kubernetes or postgres-operator bug (setting up the wrong cgroup limit, which the docs explicitly warn against doing). I don't think we want to accumulate workarounds like that in postgres. > Passing in a config setting into initdb would still require a rebuild of all controllers. > That could take months to years at best. Huh. I don't know anything about the controller, but that seems problematic independent of this specific issue. And you'd still need to deploy a new version of postgres to get such changes... > Internal Use - Confidential Hardly. Greetings, Andres Freund
RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
"Sisson, David"
Дата:
The controllers generally always pull in the latest PostgreSQL. It is easy to get the latest version with PostgreSQL updated. Unfortunately, getting a bug fix is a lot harder. One controller currently holding this defect for over a year with no end in sight. Found this: https://github.com/opencontainers/runtime-spec/issues/1050 Looks like a PR exists for it but the solution is invalid. https://github.com/kailun-qin/runtime-spec/commit/a6505339204535150260d8e4f0bc112628f1fa87 More info: https://www.postgresql.org/message-id/flat/20200218093240.jd3lgoxmisyl2tt5%40localhost#61c2c7fc3d3dd80512c9130b6967be16 It would be nice if "try" worked as expected. I totally understand it is not a PostgreSQL issue but any assistance would be very appreciated. Thanks, David Angel Internal Use - Confidential -----Original Message----- From: Andres Freund <andres@anarazel.de> Sent: Monday, January 23, 2023 3:10 PM To: Sisson, David Cc: Tom Lane; Christophe Pettus; Tomas Vondra; pgsql-bugs@lists.postgresql.org; Howell, Stephen Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes [EXTERNAL EMAIL] Hi, On 2023-01-23 20:35:17 +0000, Sisson, David wrote: > A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with "huge_pages=false". > Would that be acceptable? This is a kubernetes or postgres-operator bug (setting up the wrong cgroup limit, which the docs explicitly warn againstdoing). I don't think we want to accumulate workarounds like that in postgres. > Passing in a config setting into initdb would still require a rebuild of all controllers. > That could take months to years at best. Huh. I don't know anything about the controller, but that seems problematic independent of this specific issue. And you'dstill need to deploy a new version of postgres to get such changes... > Internal Use - Confidential Hardly. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > It's a fault of the environment if mmap(MAP_HUGETLB) causes a SIGBUS. Normally > huge_pages = try is harmless, because it'll just fall back. That source of > SIGBUSes needs to be fixed regardless of anything else - plenty allocators try > to use huge pages for example, so you'll run into problems regardless of > postgres' default. That seems likely to me too. > That said, I'm for allowing to specify options to initdb. Yeah, I think that has enough other potential applications to be worth doing. Here's a quick draft patch (sans user-facing docs as yet). It injects any given values into postgresql.auto.conf, not postgresql.conf proper. I did that mainly because the latter looked beyond the abilities of the primitive string-munging code we have in there, but I think it can be argued to be a reasonable choice anyway. regards, tom lane diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 7a58c33ace..e6f1f42cae 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -82,6 +82,13 @@ /* Ideally this would be in a .h file, but it hardly seems worth the trouble */ extern const char *select_default_timezone(const char *share_path); +/* simple list of strings */ +typedef struct _stringlist +{ + char *str; + struct _stringlist *next; +} _stringlist; + static const char *const auth_methods_host[] = { "trust", "reject", "scram-sha-256", "md5", "password", "ident", "radius", #ifdef ENABLE_GSS @@ -142,6 +149,8 @@ static char *pwfilename = NULL; static char *superuser_password = NULL; static const char *authmethodhost = NULL; static const char *authmethodlocal = NULL; +static _stringlist *extra_guc_names = NULL; +static _stringlist *extra_guc_values = NULL; static bool debug = false; static bool noclean = false; static bool noinstructions = false; @@ -253,6 +262,7 @@ static void check_input(char *path); static void write_version_file(const char *extrapath); static void set_null_conf(void); static void test_config_settings(void); +static bool test_specific_config_settings(int test_conns, int test_buffs); static void setup_config(void); static void bootstrap_template1(void); static void setup_auth(FILE *cmdfd); @@ -359,6 +369,27 @@ escape_quotes_bki(const char *src) return result; } +/* + * Add an item at the end of a stringlist. + */ +static void +add_stringlist_item(_stringlist **listhead, const char *str) +{ + _stringlist *newentry = pg_malloc(sizeof(_stringlist)); + _stringlist *oldentry; + + newentry->str = pg_strdup(str); + newentry->next = NULL; + if (*listhead == NULL) + *listhead = newentry; + else + { + for (oldentry = *listhead; oldentry->next; oldentry = oldentry->next) + /* skip */ ; + oldentry->next = newentry; + } +} + /* * make a copy of the array of lines, with token replaced by replacement * the first time it occurs on each line. @@ -873,11 +904,9 @@ test_config_settings(void) 400, 300, 200, 100, 50 }; - char cmd[MAXPGPATH]; const int connslen = sizeof(trial_conns) / sizeof(int); const int bufslen = sizeof(trial_bufs) / sizeof(int); int i, - status, test_conns, test_buffs, ok_buffers = 0; @@ -903,19 +932,7 @@ test_config_settings(void) test_conns = trial_conns[i]; test_buffs = MIN_BUFS_FOR_CONNS(test_conns); - snprintf(cmd, sizeof(cmd), - "\"%s\" --check %s %s " - "-c max_connections=%d " - "-c shared_buffers=%d " - "-c dynamic_shared_memory_type=%s " - "< \"%s\" > \"%s\" 2>&1", - backend_exec, boot_options, extra_options, - test_conns, test_buffs, - dynamic_shared_memory_type, - DEVNULL, DEVNULL); - fflush(NULL); - status = system(cmd); - if (status == 0) + if (test_specific_config_settings(test_conns, test_buffs)) { ok_buffers = test_buffs; break; @@ -940,19 +957,7 @@ test_config_settings(void) break; } - snprintf(cmd, sizeof(cmd), - "\"%s\" --check %s %s " - "-c max_connections=%d " - "-c shared_buffers=%d " - "-c dynamic_shared_memory_type=%s " - "< \"%s\" > \"%s\" 2>&1", - backend_exec, boot_options, extra_options, - n_connections, test_buffs, - dynamic_shared_memory_type, - DEVNULL, DEVNULL); - fflush(NULL); - status = system(cmd); - if (status == 0) + if (test_specific_config_settings(n_connections, test_buffs)) break; } n_buffers = test_buffs; @@ -968,6 +973,48 @@ test_config_settings(void) printf("%s\n", default_timezone ? default_timezone : "GMT"); } +/* + * Test a specific combination of configuration settings. + */ +static bool +test_specific_config_settings(int test_conns, int test_buffs) +{ + PQExpBuffer cmd = createPQExpBuffer(); + int status; + _stringlist *gnames, + *gvalues; + + /* Set up the test postmaster invocation */ + printfPQExpBuffer(cmd, + "\"%s\" --check %s %s " + "-c max_connections=%d " + "-c shared_buffers=%d " + "-c dynamic_shared_memory_type=%s", + backend_exec, boot_options, extra_options, + test_conns, test_buffs, + dynamic_shared_memory_type); + + /* Add any user-given setting overrides */ + for (gnames = extra_guc_names, gvalues = extra_guc_values; + gnames != NULL; /* assume lists have the same length */ + gnames = gnames->next, gvalues = gvalues->next) + { + appendPQExpBuffer(cmd, " -c %s=", gnames->str); + appendShellString(cmd, gvalues->str); + } + + appendPQExpBuffer(cmd, + " < \"%s\" > \"%s\" 2>&1", + DEVNULL, DEVNULL); + + fflush(NULL); + status = system(cmd->data); + + destroyPQExpBuffer(cmd); + + return (status == 0); +} + /* * Calculate the default wal_size with a "pretty" unit. */ @@ -994,7 +1041,10 @@ setup_config(void) char **conflines; char repltok[MAXPGPATH]; char path[MAXPGPATH]; - char *autoconflines[3]; + char **autoconflines; + int numautoconflines; + _stringlist *gnames, + *gvalues; fputs(_("creating configuration files ... "), stdout); fflush(stdout); @@ -1152,15 +1202,32 @@ setup_config(void) if (chmod(path, pg_file_create_mode) != 0) pg_fatal("could not change permissions of \"%s\": %m", path); + free(conflines); + + /* - * create the automatic configuration file to store the configuration - * parameters set by ALTER SYSTEM command. The parameters present in this - * file will override the value of parameters that exists before parse of - * this file. + * Create the automatic configuration file that stores the configuration + * parameters set by the ALTER SYSTEM command. We also place any + * parameter values set with initdb's -c option into this file. */ + numautoconflines = 2; /* fixed header lines */ + for (gnames = extra_guc_names; gnames != NULL; gnames = gnames->next) + numautoconflines++; + autoconflines = (char **) pg_malloc((numautoconflines + 1) * sizeof(char *)); + autoconflines[0] = pg_strdup("# Do not edit this file manually!\n"); autoconflines[1] = pg_strdup("# It will be overwritten by the ALTER SYSTEM command.\n"); - autoconflines[2] = NULL; + numautoconflines = 2; + for (gnames = extra_guc_names, gvalues = extra_guc_values; + gnames != NULL; /* assume lists have the same length */ + gnames = gnames->next, gvalues = gvalues->next) + { + autoconflines[numautoconflines++] = + psprintf("%s = '%s'\n", + gnames->str, + escape_quotes(gvalues->str)); + } + autoconflines[numautoconflines] = NULL; sprintf(path, "%s/postgresql.auto.conf", pg_data); @@ -1168,7 +1235,7 @@ setup_config(void) if (chmod(path, pg_file_create_mode) != 0) pg_fatal("could not change permissions of \"%s\": %m", path); - free(conflines); + free(autoconflines); /* pg_hba.conf */ @@ -2103,6 +2170,7 @@ usage(const char *progname) printf(_(" -A, --auth=METHOD default authentication method for local connections\n")); printf(_(" --auth-host=METHOD default authentication method for local TCP/IP connections\n")); printf(_(" --auth-local=METHOD default authentication method for local-socket connections\n")); + printf(_(" -c NAME=VALUE override default setting for server parameter\n")); printf(_(" [-D, --pgdata=]DATADIR location for this database cluster\n")); printf(_(" -E, --encoding=ENCODING set default encoding for new databases\n")); printf(_(" -g, --allow-group-access allow group read/execute on data directory\n")); @@ -2808,7 +2876,8 @@ main(int argc, char *argv[]) /* process command-line options */ - while ((c = getopt_long(argc, argv, "A:dD:E:gkL:nNsST:U:WX:", long_options, &option_index)) != -1) + while ((c = getopt_long(argc, argv, "A:c:dD:E:gkL:nNsST:U:WX:", + long_options, &option_index)) != -1) { switch (c) { @@ -2831,6 +2900,24 @@ main(int argc, char *argv[]) case 11: authmethodhost = pg_strdup(optarg); break; + case 'c': + { + char *buf = pg_strdup(optarg); + char *equals = strchr(buf, '='); + + if (!equals) + { + pg_log_error("-c %s requires a value", buf); + pg_log_error_hint("Try \"%s --help\" for more information.", + progname); + exit(1); + } + *equals++ = '\0'; /* terminate variable name */ + add_stringlist_item(&extra_guc_names, buf); + add_stringlist_item(&extra_guc_values, equals); + pfree(buf); + } + break; case 'D': pg_data = pg_strdup(optarg); break;
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-23 17:51:46 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > That said, I'm for allowing to specify options to initdb. > > Yeah, I think that has enough other potential applications to be worth > doing. Here's a quick draft patch (sans user-facing docs as yet). > It injects any given values into postgresql.auto.conf, not > postgresql.conf proper. I did that mainly because the latter looked > beyond the abilities of the primitive string-munging code we have in > there, but I think it can be argued to be a reasonable choice anyway. Oh, I had thought we'd just pass them on with -c to the processes that initdb starts. But perhaps just persisting them isn't a bad idea... - Andres
Andres Freund <andres@anarazel.de> writes: > On 2023-01-23 17:51:46 -0500, Tom Lane wrote: >> Yeah, I think that has enough other potential applications to be worth >> doing. Here's a quick draft patch (sans user-facing docs as yet). >> It injects any given values into postgresql.auto.conf, not >> postgresql.conf proper. I did that mainly because the latter looked >> beyond the abilities of the primitive string-munging code we have in >> there, but I think it can be argued to be a reasonable choice anyway. > Oh, I had thought we'd just pass them on with -c to the processes that initdb > starts. But perhaps just persisting them isn't a bad idea... It certainly seems to me that that would be the mainstream use-case, so why not fill in the file as the user probably wants? They can always change it. Also, as I mentioned, the expectation is that initdb will set up a known-working combination of settings; and we don't really know that if we leave off whatever was injected by "-c". In the case at hand, if we don't propagate "huge_pages = off" to the installed configuration, the server still won't work. regards, tom lane
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
От
Andres Freund
Дата:
Hi, On 2023-01-23 19:45:19 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2023-01-23 17:51:46 -0500, Tom Lane wrote: > >> Yeah, I think that has enough other potential applications to be worth > >> doing. Here's a quick draft patch (sans user-facing docs as yet). > >> It injects any given values into postgresql.auto.conf, not > >> postgresql.conf proper. I did that mainly because the latter looked > >> beyond the abilities of the primitive string-munging code we have in > >> there, but I think it can be argued to be a reasonable choice anyway. > > > Oh, I had thought we'd just pass them on with -c to the processes that initdb > > starts. But perhaps just persisting them isn't a bad idea... > > It certainly seems to me that that would be the mainstream use-case, > so why not fill in the file as the user probably wants? They can > always change it. Also, as I mentioned, the expectation is that > initdb will set up a known-working combination of settings; and > we don't really know that if we leave off whatever was injected by > "-c". In the case at hand, if we don't propagate "huge_pages = off" > to the installed configuration, the server still won't work. Yea, makes sense. Greetings, Andres Freund