Обсуждение: BUG #8225: logging options don't change after reload
The following bug has been logged on the website: Bug reference: 8225 Logged by: Jeff Frost Email address: jeff@pgexperts.com PostgreSQL version: 9.1.8 Operating system: various Description: = I've seen this a few times on client servers but still can't seem to boil it down to a nice reproducible test case. = What happens is that we change various logging options in postgresql.conf, then reload, and every so often, the settings don't seem to take effect even though they are logged as being changed. Here's an example from a recent logging run: grep -hi checkpoint *.csv 2013-05-31 12:58:33.051 MDT,,,5398,,5189c940.1516,8,,2013-05-07 21:40:48 MDT,,0,LOG,00000,"parameter ""log_checkpoints"" changed to ""on""",,,,,,,,,"" 2013-05-31 15:18:36.131 MDT,,,5398,,5189c940.1516,22,,2013-05-07 21:40:48 MDT,,0,LOG,00000,"parameter ""log_checkpoints"" removed from configuration file, reset to default",,,,,,,,,"" So, note that log_checkpoints was logged as being changed, then 2hrs 20mins later it's logged as being set back to default, but there's not a single checkpoint logged in that time period. I checked and checkpoint_timeout is indeed set to 300s. = Also, in addition, we change the log_filename to datacollection-%H and while that also shows up in the log during the reload: 2013-05-31 12:58:33.051 MDT,,,5398,,5189c940.1516,9,,2013-05-07 21:40:48 MDT,,0,LOG,00000,"parameter ""log_filename"" changed to ""datacollection-%H""",,,,,,,,,"" and the logfile is created, postgresql continues to log to the old filename for the duration of the logging run. I've seen this on recent 9.0, 9.1 and 9.2 versions. This particular run was on 9.1.8.
On 06/11/13 15:27, jeff@pgexperts.com wrote: > The following bug has been logged on the website: > > Bug reference: 8225 > Logged by: Jeff Frost > Email address: jeff@pgexperts.com > PostgreSQL version: 9.1.8 > Operating system: various > Description: > > I've seen this a few times on client servers but still can't seem to boil it > down to a nice reproducible test case. > > What happens is that we change various logging options in postgresql.conf, > then reload, and every so often, the settings don't seem to take effect even > though they are logged as being changed. > > Here's an example from a recent logging run: > > grep -hi checkpoint *.csv > 2013-05-31 12:58:33.051 MDT,,,5398,,5189c940.1516,8,,2013-05-07 21:40:48 > MDT,,0,LOG,00000,"parameter ""log_checkpoints"" changed to > ""on""",,,,,,,,,"" > 2013-05-31 15:18:36.131 MDT,,,5398,,5189c940.1516,22,,2013-05-07 21:40:48 > MDT,,0,LOG,00000,"parameter ""log_checkpoints"" removed from configuration > file, reset to default",,,,,,,,,"" > > So, note that log_checkpoints was logged as being changed, then 2hrs 20mins > later it's logged as being set back to default, but there's not a single > checkpoint logged in that time period. I checked and checkpoint_timeout is > indeed set to 300s. > > Also, in addition, we change the log_filename to datacollection-%H and while > that also shows up in the log during the reload: > > 2013-05-31 12:58:33.051 MDT,,,5398,,5189c940.1516,9,,2013-05-07 21:40:48 > MDT,,0,LOG,00000,"parameter ""log_filename"" changed to > ""datacollection-%H""",,,,,,,,,"" > > and the logfile is created, postgresql continues to log to the old filename > for the duration of the logging run. > > I've seen this on recent 9.0, 9.1 and 9.2 versions. This particular run was > on 9.1.8. > > > Oh sorry, that should specify "Various linux version" meaning: Ubuntu 12.04, Ubuntu 10.04, CentOS 5/6, Scientific Linux 6. -- Jeff Frost <jeff@pgexperts.com> CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/
jeff@pgexperts.com writes: > What happens is that we change various logging options in postgresql.conf, > then reload, and every so often, the settings don't seem to take effect even > though they are logged as being changed. FWIW, the "parameter changed" messages are logged when the postmaster process updates its values of the GUCs. The particular options you're complaining of here, though, are not actually checked in the postmaster --- they're used in the checkpointer or syslogger processes respectively. So one theory about this would be that those processes aren't absorbing the GUC updates, perhaps because the SIGHUP signals the postmaster should be sending them are getting lost. I'm not sure how we might track down the cause though. How "various" are the platforms you're seeing this on? regards, tom lane
On Jun 13, 2013, at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > jeff@pgexperts.com writes: >> What happens is that we change various logging options in = postgresql.conf, >> then reload, and every so often, the settings don't seem to take = effect even >> though they are logged as being changed. >=20 > FWIW, the "parameter changed" messages are logged when the postmaster > process updates its values of the GUCs. The particular options you're > complaining of here, though, are not actually checked in the = postmaster > --- they're used in the checkpointer or syslogger processes > respectively. So one theory about this would be that those processes > aren't absorbing the GUC updates, perhaps because the SIGHUP signals = the > postmaster should be sending them are getting lost. I'm not sure how = we > might track down the cause though. How "various" are the platforms > you're seeing this on? I've seen it on 9.0, 9.1 and 9.2 recent versions running on Ubuntu = 10.04/12.04, Centos 5/6 and Scientific Linux 6. I've not tried on Windows. Interestingly, it will often pick them up if you wait a few seconds and = send it another reload. I've been seeing it for a while, but haven't reported it since I = couldn't come up with a reproducible test case.=
Jeff Frost <jeff@pgexperts.com> writes: > On Jun 13, 2013, at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... So one theory about this would be that those processes >> aren't absorbing the GUC updates, perhaps because the SIGHUP signals the >> postmaster should be sending them are getting lost. > Interestingly, it will often pick them up if you wait a few seconds and send it another reload. Hmm, that definitely lends some credence to the lost-signal theory, since another reload would cause the postmaster to again signal all its children, and this time the signal might go through. But I still have no idea how we might debug further. You could possibly try something like strace'ing the processes, but it seems fairly likely that the Heisenberg principle would apply if you did. regards, tom lane
Tom Lane wrote: > jeff@pgexperts.com writes: > > What happens is that we change various logging options in postgresql.conf, > > then reload, and every so often, the settings don't seem to take effect even > > though they are logged as being changed. > > FWIW, the "parameter changed" messages are logged when the postmaster > process updates its values of the GUCs. The particular options you're > complaining of here, though, are not actually checked in the postmaster > --- they're used in the checkpointer or syslogger processes > respectively. So one theory about this would be that those processes > aren't absorbing the GUC updates, perhaps because the SIGHUP signals the > postmaster should be sending them are getting lost. Another idea is that postmaster sees the changed file but the other processes see an older version of it -- which would be pretty bizarre, but .. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Jun 13, 2013, at 5:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Frost <jeff@pgexperts.com> writes: >> On Jun 13, 2013, at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> ... So one theory about this would be that those processes >>> aren't absorbing the GUC updates, perhaps because the SIGHUP signals = the >>> postmaster should be sending them are getting lost. >=20 >> Interestingly, it will often pick them up if you wait a few seconds = and send it another reload. >=20 > Hmm, that definitely lends some credence to the lost-signal theory, > since another reload would cause the postmaster to again signal all > its children, and this time the signal might go through. >=20 > But I still have no idea how we might debug further. You could = possibly > try something like strace'ing the processes, but it seems fairly = likely > that the Heisenberg principle would apply if you did. What I don't understand is the new log file being created from the new = log_filename setting but then nothing being logged into it. Is it the = postmaster which creates that file? I would've thought it would be the = logger process?=
Jeff Frost <jeff@pgexperts.com> writes: > What I don't understand is the new log file being created from the new > log_filename setting but then nothing being logged into it. Is it the > postmaster which creates that file? I would've thought it would be > the logger process? Hm, I hadn't focused on that --- that *is* pretty bizarre. The postmaster creates the log file initially before forking the syslogger, but subsequent rotations are handled by the syslogger process. Is it possible that your systems are running on the hairy edge of ENFILE limits? I notice that the syslogger will silently fail to rotate if it gets ENFILE while trying to open the new log file. That doesn't look like it'd explain the lack of log_checkpoint activity, though. Also, usually people notice this state because everything else on the box starts to fall over ... regards, tom lane
On Jun 13, 2013, at 7:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Frost <jeff@pgexperts.com> writes: >> What I don't understand is the new log file being created from the = new >> log_filename setting but then nothing being logged into it. Is it = the >> postmaster which creates that file? I would've thought it would be >> the logger process? >=20 > Hm, I hadn't focused on that --- that *is* pretty bizarre. The > postmaster creates the log file initially before forking the = syslogger, > but subsequent rotations are handled by the syslogger process. >=20 > Is it possible that your systems are running on the hairy edge of = ENFILE > limits? I notice that the syslogger will silently fail to rotate if = it > gets ENFILE while trying to open the new log file. That doesn't look > like it'd explain the lack of log_checkpoint activity, though. Also, > usually people notice this state because everything else on the box > starts to fall over ... These are definitely busy systems, but usually not running close to the = edge, I'll see if I can make a test case using pgbench on 9.2.4.=
On 06/13/13 20:44, Jeff Frost wrote: > These are definitely busy systems, but usually not running close to the edge, I'll see if I can make a test case usingpgbench on 9.2.4. I'm afraid my attempts to reproduce were again unsuccessful. :-(