Discussion: Better way of dealing with pgstat wait timeout during buildfarm runs?
Hi,

We quite regularly have buildfarm failures that are caused by 'WARNING: pgstat wait timeout' at random points during the build. Especially on some of the ARM buildfarm animals those are really frequent, to the point that it's hard to know the actual state of the buildfarm without looking at several animals. The ARM ones are probably affected more frequently because they'll often run on SD cards and such.

So I think a better way to deal with that warning would be a good idea. Besides somehow making the mechanism itself more robust, there are two ways to attack this that I can think of, neither of them awe inspiring:

1) Make that WARNING a LOG message instead. Since those don't get sent to the client with default settings...

2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased it to.

Does anybody have an actually good idea?

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> So I think a better way to deal with that warning would be a good
> idea. Besides somehow making the mechanism itself more robust, there
> are two ways to attack this that I can think of, neither of them awe
> inspiring:
> 1) Make that WARNING a LOG message instead. Since those don't get sent
> to the client with default settings...
> 2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
> it to.

Yeah, I've been getting more annoyed by that too lately. I keep wondering though whether there's an actual bug underneath that behavior that we're failing to see. PGSTAT_MAX_WAIT_TIME is already 10 seconds; it's hard to credit that increasing it still further would be "fixing" anything. The other change would also mainly just sweep the issue under the rug, if there is any issue and it's not just that we're overloading underpowered buildfarm machines. (Maybe a better fix would be to reduce MAX_CONNECTIONS for the tests on these machines?)

I wonder whether, when multiple processes are demanding statsfile updates, there's some misbehavior that causes them to suck CPU away from the stats collector and/or convince it that it doesn't need to write anything. There are odd things in the logs in some of these events. For example, in today's failure on hamster,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
there are two client-visible wait-timeout warnings, one each in the gist and spgist tests. But in the postmaster log we find these in fairly close succession:

[549c38ba.724d:2] WARNING: pgstat wait timeout
[549c39b1.73e7:10] WARNING: pgstat wait timeout
[549c38ba.724d:3] WARNING: pgstat wait timeout

Correlating these with other log entries shows that the first and third are from the autovacuum launcher while the second is from the gist test session.
So the spgist warning failed to get logged, and in any case the big picture is that we had four timeout warnings occurring in a pretty short span of time, in a parallel test set that's not all that demanding (12 parallel tests, well below our max). Not sure what to make of that.

BTW, I notice that in the current state of pgstat.c, all the logic for keeping track of request arrival times is dead code, because nothing is actually looking at DBWriteRequest.request_time. This makes me think that somebody simplified away some logic we maybe should have kept. But if we're going to leave it like this, we could replace the DBWriteRequest data structure with a simple OID list and save a fair amount of code.

regards, tom lane
On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
> I wonder whether, when multiple processes are demanding statsfile updates,
> there's some misbehavior that causes them to suck CPU away from the stats
> collector and/or convince it that it doesn't need to write anything.
> There are odd things in the logs in some of these events. For example,
> in today's failure on hamster,
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
> there are two client-visible wait-timeout warnings, one each in the
> gist and spgist tests. But in the postmaster log we find these in
> fairly close succession:
>
> [549c38ba.724d:2] WARNING: pgstat wait timeout
> [549c39b1.73e7:10] WARNING: pgstat wait timeout
> [549c38ba.724d:3] WARNING: pgstat wait timeout
>
> Correlating these with other log entries shows that the first and third
> are from the autovacuum launcher while the second is from the gist test
> session. So the spgist warning failed to get logged, and in any case the
> big picture is that we had four timeout warnings occurring in a pretty
> short span of time, in a parallel test set that's not all that demanding
> (12 parallel tests, well below our max). Not sure what to make of that.

My guess is that a checkpoint happened at that time. Maybe it'd be a good idea to make pg_regress start postgres with log_checkpoints enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time. This makes me think that
> somebody simplified away some logic we maybe should have kept. But if
> we're going to leave it like this, we could replace the DBWriteRequest
> data structure with a simple OID list and save a fair amount of code.

That's indeed odd.
Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas?

I wondered for a second whether the split could be responsible somehow, but there are reports of that in older back branches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Greetings,

Andres Freund
On 25.12.2014 20:36, Tom Lane wrote:
> Yeah, I've been getting more annoyed by that too lately. I keep
> wondering though whether there's an actual bug underneath that
> behavior that we're failing to see. PGSTAT_MAX_WAIT_TIME is already
> 10 seconds; it's hard to credit that increasing it still further
> would be "fixing" anything. The other change would also mainly just
> sweep the issue under the rug, if there is any issue and it's not
> just that we're overloading underpowered buildfarm machines. (Maybe a
> better fix would be to reduce MAX_CONNECTIONS for the tests on these
> machines?)

I agree that increasing the limit further is not a good solution.

> BTW, I notice that in the current state of pgstat.c, all the logic
> for keeping track of request arrival times is dead code, because
> nothing is actually looking at DBWriteRequest.request_time. This
> makes me think that somebody simplified away some logic we maybe
> should have kept. But if we're going to leave it like this, we could
> replace the DBWriteRequest data structure with a simple OID list and
> save a fair amount of code.

Really? Which part of the code is dead? I see pgstat_recv_inquiry() is updating the request_time after receiving the inquiry, and pgstat_db_requested() is looking at it when writing the files.

If we can simplify the code by keeping just OIDs, let's do that. I think the main reason why we haven't done that in 187492b6 was to keep as much of the existing logic as possible (and maybe change it in a separate patch).

regards
Tomas
Tomas Vondra <tv@fuzzy.cz> writes:
> On 25.12.2014 20:36, Tom Lane wrote:
>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time.

> Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
> updating the request_time after receiving the inquiry, and
> pgstat_db_requested() is looking at it when writing the files.

Where is pgstat_db_requested() looking at request_time?

> If we can simplify the code by keeping just OIDs, let's do that. I think
> the main reason why we haven't done that in 187492b6 was to keep as much
> of the existing logic (and maybe change it in a separate patch).

The real point here is that I think that commit *already* changed the existing logic, because the time-of-receipt used to matter. In particular, there used to be a throttle on how often the stats file could get written, which seems to have vanished. I seriously doubt that that was a good change, especially on write-bandwidth-challenged platforms.

regards, tom lane
On 25.12.2014 21:14, Andres Freund wrote:
> On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
>
> My guess is that a checkpoint happened at that time. Maybe it'd be a
> good idea to make pg_regress start postgres with log_checkpoints
> enabled? My guess is that we'd find horrendous 'sync' times.
>
> Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.

>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time. This
>> makes me think that somebody simplified away some logic we maybe
>> should have kept. But if we're going to leave it like this, we
>> could replace the DBWriteRequest data structure with a simple OID
>> list and save a fair amount of code.
>
> That's indeed odd. Seems to have been lost when the statsfile was
> split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible. IIRC there were "pgstat wait timeout" issues before, and in most cases the conclusion was that it's probably because of overloaded I/O. But maybe there actually was another bug, and it's entirely possible that the split introduced a new one, and that's what we're seeing now.

The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issue. So maybe something changed on that particular animal (a failing SD card causing I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce and analyze the issue locally. But that won't happen until January.

> I wondered for a second whether the split could be responsible
> somehow, but there are reports of that in older back branches as well:
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Tomas
Tomas Vondra <tv@fuzzy.cz> writes:
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?

I think that hamster has basically got a tin can and string for an I/O subsystem. It's not real clear to me whether there's actually been an increase in "wait timeout" failures recently; somebody would have to go through and count them before I'd have much faith in that thesis.

However, I notice that at least the last few occurrences on "hamster" all seem to have been in this parallel block:

test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address

which as recently as 9.4 contained just these tests:

test: privileges security_label collate matview lock replica_identity

I'm fairly sure that the index-related tests in this batch are I/O intensive, and since they were not there at all six months ago, it's not hard to believe that this block of tests has far greater I/O demands than used to exist. Draw your own conclusions ...

regards, tom lane
On 25.12.2014 22:16, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
>> updating the request_time after receiving the inquiry, and
>> pgstat_db_requested() is looking at it when writing the files.
>
> Where is pgstat_db_requested() looking at request_time?

Oh, right. pgstat_db_requested() is not looking at the timestamp - it only checks the OID of the database. But pgstat_recv_inquiry() is checking it, comparing it to cutoff_time etc.

ISTM the main change related to this is that this:

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

got replaced by this:

    if (pgstat_write_statsfile_needed())
        pgstat_write_statsfiles(false, false);

where pgstat_write_statsfile_needed() only checks whether the list is empty (and ignores the request/write timestamps). If I understand that correctly, this would lead to writing the files more often, and we're dealing with a timeout.

>> If we can simplify the code by keeping just OIDs, let's do that. I think
>> the main reason why we haven't done that in 187492b6 was to keep as much
>> of the existing logic (and maybe change it in a separate patch).
>
> The real point here is that I think that commit *already* changed
> the existing logic, because the time-of-receipt used to matter. In
> particular, there used to be a throttle on how often the stats file
> could get written, which seems to have vanished. I seriously doubt
> that that was a good change, especially on
> write-bandwidth-challenged platforms.

Yes - if that change causes the files to be written more frequently, it's not a good change.
But I think the time-of-receipt still matters - the pgstat_recv_inquiry logic remained the same, just applied per database. ISTM the only thing that disappeared is the (last_statwrite < last_statrequest) check. I have to think about this a bit more; I haven't seen this code since the split patch.

Tomas
On 25.12.2014 22:40, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase of this kind of issues. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>
> I think that hamster has basically got a tin can and string for an I/O
> subsystem. It's not real clear to me whether there's actually been an

Yes. It's called "SD card".

> increase in "wait timeout" failures recently; somebody would have to
> go through and count them before I'd have much faith in that thesis.

That's what I did. On hamster I see this (on HEAD):

2014-12-25 16:00:07 yes
2014-12-24 16:00:07 yes
2014-12-23 16:00:07 yes
2014-12-22 16:00:07 yes
2014-12-19 16:00:07 yes
2014-12-15 16:00:11 no
2014-10-25 16:00:06 no
2014-10-24 16:00:06 no
2014-10-23 16:00:06 no
2014-10-22 16:00:06 no
2014-10-21 16:00:07 no
2014-10-19 16:00:06 no
2014-09-28 16:00:06 no
2014-09-26 16:00:07 no
2014-08-28 16:00:06 no
2014-08-12 16:00:06 no
2014-08-05 22:04:48 no
2014-07-19 01:53:30 no
2014-07-06 16:00:06 no
2014-07-04 16:00:06 no
2014-06-29 16:00:06 no
2014-05-09 16:00:04 no
2014-05-07 16:00:04 no
2014-05-04 16:00:04 no
2014-04-28 16:00:04 no
2014-04-18 16:00:04 no
2014-04-04 16:00:04 no

(where "yes" means "pgstat wait timeout" is in the logs). On chipmunk, the trend is much less convincing (but there are far fewer failures, and only 3 of them failed because of the "pgstat wait timeout"). However, it's worth mentioning that all the pgstat failures happened at "16:00:07" and most of the older failures are before that. So it may be that it was failing because of something else, and the pgstat timeout could not happen because of that. OTOH, there's a fair amount of successful runs.
> However, I notice that at least the last few occurrences on "hamster"
> all seem to have been in this parallel block:
>
> test: brin gin gist spgist privileges security_label collate matview
> lock replica_identity rowsecurity object_address
>
> which as recently as 9.4 contained just these tests:
>
> test: privileges security_label collate matview lock replica_identity
>
> I'm fairly sure that the index-related tests in this batch are I/O
> intensive, and since they were not there at all six months ago, it's
> not hard to believe that this block of tests has far greater I/O
> demands than used to exist. Draw your own conclusions ...

Yes, that might be the culprit here. Would be interesting to know what's happening on the machine while the tests are running, to confirm this hypothesis.

regards
Tomas
Tomas Vondra <tv@fuzzy.cz> writes:
> On 25.12.2014 22:40, Tom Lane wrote:
>> I think that hamster has basically got a tin can and string for an I/O
>> subsystem. It's not real clear to me whether there's actually been an
>> increase in "wait timeout" failures recently; somebody would have to
>> go through and count them before I'd have much faith in that thesis.

> That's what I did. On hamster I see this (on HEAD):
> [date list snipped]
> (where "yes" means "pgstat wait timeout" is in the logs). On chipmunk,
> the trend is much less convincing (but there are far fewer failures, and
> only 3 of them failed because of the "pgstat wait timeout").

mereswine's history is also pretty interesting in this context. That series makes it look like the probability of "pgstat wait timeout" took a big jump around the beginning of December, especially if you make the unproven-but-not-unreasonable assumption that the two pg_upgradecheck failures since then were also wait-timeout failures. That's close enough after commit 88fc71926392115c (Nov 19) to make me suspect that that was what put us over the edge: that added a bunch more I/O *and* a bunch more statistics demands to this one block of parallel tests.

But even if we are vastly overstressing the I/O subsystem on these boxes, why is it manifesting like this?
pgstat never fsyncs the stats temp file, so it should not have to wait for physical I/O, I'd think. Or perhaps the file rename() operations get fsync'd behind the scenes by the filesystem?

regards, tom lane
On 26.12.2014 02:59, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> That's what I did. On hamster I see this (on HEAD):
>> [date list snipped]
>> (where "yes" means "pgstat wait timeout" is in the logs). On
>> chipmunk, the trend is much less convincing (but there's much less
>> failures, and only 3 of them failed because of the "pgstat wait
>> timeout").
>
> mereswine's history is also pretty interesting in this context. That
> series makes it look like the probability of "pgstat wait timeout"
> took a big jump around the beginning of December, especially if you
> make the unproven-but-not-unreasonable assumption that the two
> pg_upgradecheck failures since then were also wait-timeout failures.
> That's close enough after commit 88fc71926392115c (Nov 19) to make me
> suspect that that was what put us over the edge: that added a bunch
> more I/O *and* a bunch more statistics demands to this one block of
> parallel tests.

Interesting.
But even if this commit tipped us over the edge, it's not proof that the split patch was perfectly correct.

> But even if we are vastly overstressing the I/O subsystem on these
> boxes, why is it manifesting like this? pgstat never fsyncs the stats
> temp file, so it should not have to wait for physical I/O I'd think.
> Or perhaps the file rename() operations get fsync'd behind the scenes
> by the filesystem?

My guess is that the amount of dirty data in the page cache reaches dirty_ratio/dirty_bytes, effectively forcing the writes to go directly to the disks. Those ARM machines have rather low amounts of RAM (typically 256-512 MB), and the default value for dirty_ratio is ~20% IIRC. So that's ~50-100 MB.

Tomas
On Fri, Dec 26, 2014 at 6:28 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>> My guess is that a checkpoint happened at that time. Maybe it'd be a
>> good idea to make pg_regress start postgres with log_checkpoints
>> enabled? My guess is that we'd find horrendous 'sync' times.
>>
>> Michael: Could you perhaps turn log_checkpoints on in the config?
>
> Logging timestamps (using log_line_prefix) would also be helpful.

Done. I have been wondering about those failures as well but didn't get the time to look around. FYI, hamster is running with an 8GB class 10 SD card able to do 40MB/s in read, a card that I changed 2 weeks ago btw, the former 4GB card having begun to show I/O errors; RIP to it. So this is what is triggering the wait timeouts more frequently... But I have no real explanation why REL9_4_STABLE shows no signs of failures.

For the time being, what about putting pg_stats_tmp on a RAM disk to ease the I/O? Whatever we come up with, I imagine that I/O will still be tight on hamster.
--
Michael
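For reference, pointing the stats temp files at a RAM disk is a supported configuration via the stats_temp_directory GUC; one way to set it up on Linux (the mount point and size here are illustrative, not from the thread):

```shell
# Create a small tmpfs for the stats temp files (lasts until reboot;
# add an /etc/fstab entry to make it permanent).
sudo mkdir -p /var/run/pg_stats_tmp
sudo mount -t tmpfs -o size=32M,mode=0700 tmpfs /var/run/pg_stats_tmp
sudo chown postgres:postgres /var/run/pg_stats_tmp

# Then point the server at it in postgresql.conf and restart/reload:
#   stats_temp_directory = '/var/run/pg_stats_tmp'
```

Losing the temp directory contents at reboot is harmless: the permanent copy is written to the stats directory at shutdown, and the temp files are rebuilt at startup.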
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> So I think a better way to deal with that warning would be a good
>> idea. [...]
>
> Yeah, I've been getting more annoyed by that too lately. I keep wondering
> though whether there's an actual bug underneath that behavior that we're
> failing to see.

I think the first thing to do is reconsider the usage of PGSTAT_RETRY_DELAY instead of PGSTAT_STAT_INTERVAL in autovacuum workers. That decreases the wait time 50-fold, if I recall this correctly, and causes large amounts of extra I/O traffic.

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time. This makes me think that
> somebody simplified away some logic we maybe should have kept.

I will have a look. I remember being confused about this at some point when reviewing that patch.

--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
> instead of PGSTAT_STAT_INTERVAL in autovacuum workers. That decreases
> the wait time 50-fold, if I recall this correctly, and causes large
> amounts of extra I/O traffic.

Yeah --- that means that instead of the normal behavior that a stats file newer than 500 msec is good enough, an autovac worker insists on a stats file newer than 10 msec. I did some experimentation on prairiedog, and found that it's not hard at all to see autovac workers demanding multiple stats writes per second:

2014-12-26 16:26:52.958 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:53.128 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:53.188 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:54.903 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:55.058 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:00.022 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:00.285 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:00.792 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:01.010 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:01.163 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:01.193 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.595 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.673 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.839 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.878 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:05.878 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:06.571 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:07.001 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:07.769 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:07.950 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:10.256 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:11.039 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:11.402 EST 21026 LOG: sending inquiry for database 45116

The argument that autovac workers need fresher stats than anything else seems pretty dubious to start with. Why shouldn't we simplify that down to "they use PGSTAT_STAT_INTERVAL like everybody else"?

regards, tom lane
Tom Lane wrote:
> The argument that autovac workers need fresher stats than anything else
> seems pretty dubious to start with. Why shouldn't we simplify that down
> to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a worker vacuuming a table that some other worker vacuumed more recently than PGSTAT_STAT_INTERVAL. I realize now that the semantics we really want are something like "stats no older than XYZ", where the given value is the timestamp at which we start checking; if we get anything newer than that it would be okay, but we currently reject it for lack of a more appropriate API. (If it takes more than PGSTAT_STAT_INTERVAL to get the stats back, a regular backend would ask for fresher stats, but to an autovac worker they would be good enough as long as they are newer than its recheck start time.)

Nowadays we can probably disregard the whole issue, since starting a new vacuum just after the prior one finished should not cause much stress to the system, thanks to the visibility map.

--
Álvaro Herrera
From: Heikki Linnakangas
On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
> Tom Lane wrote:
>> The argument that autovac workers need fresher stats than anything else
>> seems pretty dubious to start with. Why shouldn't we simplify that down
>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?
>
> The point of wanting fresher stats than that, eons ago, was to avoid a
> worker vacuuming a table that some other worker vacuumed more recently
> than PGSTAT_STAT_INTERVAL. [...]
>
> Nowadays we can probably disregard the whole issue, since starting a new
> vacuum just after the prior one finished should not cause much stress to
> the system thanks to the visibility map.

Vacuuming is far from free, even if the visibility map says that most pages are visible to all: you still scan all indexes, if you remove any dead tuples at all.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
>> The point of wanting fresher stats than that, eons ago, was to avoid a
>> worker vacuuming a table that some other worker vacuumed more recently
>> than PGSTAT_STAT_INTERVAL. ...
>> Nowadays we can probably disregard the whole issue, since starting a new
>> vacuum just after the prior one finished should not cause much stress to
>> the system thanks to the visibility map.

> Vacuuming is far from free, even if the visibility map says that most
> pages are visible to all: you still scan all indexes, if you remove any
> dead tuples at all.

With typical autovacuum settings, I kinda doubt that there's much value in reducing the window for this problem from 500ms to 10ms. As Alvaro says, this was just a partial, kluge solution from the start --- if we're worried about such duplicate vacuuming, we should undertake a real solution that closes the window altogether. In any case, timeouts occurring inside autovacuum are not directly causing the buildfarm failures, since autovacuum's log entries don't reflect into regression outputs. (It's possible that autovacuum's tight tolerance is contributing to the failures by increasing the load on the stats collector, but I'm not sure I believe that.)

To get back to that original complaint about buildfarm runs failing, I notice that essentially all of those failures are coming from "wait timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself has no need to read the stats files. What's actually causing these messages is failure to get a timely response in pgstat_vacuum_stat().
So let me propose a drastic solution: let's dike out this bit in vacuum.c:

    /*
     * Send info about dead objects to the statistics collector, unless we are
     * in autovacuum --- autovacuum.c does this for itself.
     */
    if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
        pgstat_vacuum_stat();

This would have the effect of transferring all responsibility for dead-stats-entry cleanup to autovacuum. For ordinary users, I think that'd be just fine. It might be less fine though for people who disable autovacuum, if there still are any. To analyze the effects on them, let's break down what pgstat_vacuum_stat() actually does:

1. It causes drops of whole databases from pgstat's tables. I claim we don't need to do that at this level. There's no issue unless the stats collector missed the PgStat_MsgDropdb message when the database was dropped; and even if it did, the negative consequences of having a dead DB's stats still lying around are pretty minimal since we split up the stats files. There will be no extra I/O except for one small record in the global stats file. So I think we can well afford to say that with autovac off, such missed databases only get cleaned up the next time an autovac is forced for anti-wraparound.

2. Within the current database, it causes drops of pgstat entries for dropped tables. This would be a tad annoying if you have lots of transient tables and no or minimal autovacuuming. However, lots of transient tables can be a pain point anyway, because we don't currently have any mechanism other than pgstat_vacuum_stat() for cleaning up per-table stats for dropped tables. It seems like it might be time to do something about that. We could probably extend the PgStat_TableXactStatus infrastructure to keep track of whether a table was created or deleted in the current transaction, and solve the problem without very much new code.
Or maybe we could move the responsibility for reporting stale entries into the code that reads the stats files for the stats views.

3. Within the current database, it causes drops of pgstat entries for dropped functions, if you're tracking per-function execution stats. Since that's not the default, maybe we could get away with saying that we don't clean up such dead entries very often unless you're running autovacuum. I don't really want to propose building the infrastructure to support dropping such entries retail.

Or, much more simply, we could conclude that it's not that important if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger the code so that we don't bleat when the file-reading request comes from that source. This should be a back-patchable fix, whereas #2 above might be a bit too complicated for that.

Thoughts?

			regards, tom lane
On Thu, Dec 25, 2014 at 09:14:36PM +0100, Andres Freund wrote: > My guess is that a checkpoint happened at that time. Maybe it'd be a > good idea to make pg_regress start postgres with log_checkpoints > enabled? My guess is that we'd find horrendous 'sync' times. +1, independent of $SUBJECT. How about log_autovacuum_min_duration=0, too?
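For concreteness, both proposed settings could be injected into a regression run through pg_regress's existing --temp-config mechanism (a sketch; the file name here is arbitrary, but the two GUCs are real):

```
# extra.conf -- hypothetical file, passed as: pg_regress --temp-config=extra.conf ...
log_checkpoints = on
log_autovacuum_min_duration = 0
```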
On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: >> On 12/27/2014 12:16 AM, Alvaro Herrera wrote: >>> Tom Lane wrote: >>>> The argument that autovac workers need fresher stats than anything else >>>> seems pretty dubious to start with. Why shouldn't we simplify that down >>>> to "they use PGSTAT_STAT_INTERVAL like everybody else"? > >>> The point of wanting fresher stats than that, eons ago, was to avoid a >>> worker vacuuming a table that some other worker vacuumed more recently >>> than PGSTAT_STAT_INTERVAL. ... >>> Nowadays we can probably disregard the whole issue, since starting a new >>> vacuum just after the prior one finished should not cause much stress to >>> the system thanks to the visibility map. > >> Vacuuming is far from free, even if the visibility map says that most >> pages are visible to all: you still scan all indexes, if you remove any >> dead tuples at all. > > With typical autovacuum settings, I kinda doubt that there's much value in > reducing the window for this problem from 500ms to 10ms. As Alvaro says, > this was just a partial, kluge solution from the start --- if we're > worried about such duplicate vacuuming, we should undertake a real > solution that closes the window altogether. In any case, timeouts > occurring inside autovacuum are not directly causing the buildfarm > failures, since autovacuum's log entries don't reflect into regression > outputs. (It's possible that autovacuum's tight tolerance is contributing > to the failures by increasing the load on the stats collector, but I'm > not sure I believe that.) > > To get back to that original complaint about buildfarm runs failing, > I notice that essentially all of those failures are coming from "wait > timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself > has no need to read the stats files. 
What's actually causing these > messages is failure to get a timely response in pgstat_vacuum_stat(). > So let me propose a drastic solution: let's dike out this bit in vacuum.c: > > /* > * Send info about dead objects to the statistics collector, unless we are > * in autovacuum --- autovacuum.c does this for itself. > */ > if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess()) > pgstat_vacuum_stat(); > > This would have the effect of transferring all responsibility for > dead-stats-entry cleanup to autovacuum. For ordinary users, I think > that'd be just fine. It might be less fine though for people who > disable autovacuum, if there still are any. -1. I don't think it's a good idea to inflict pain on people who want to schedule their vacuums manually (and yes, there are some) to get clean buildfarm runs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This would have the effect of transferring all responsibility for >> dead-stats-entry cleanup to autovacuum. For ordinary users, I think >> that'd be just fine. It might be less fine though for people who >> disable autovacuum, if there still are any. > -1. I don't think it's a good idea to inflict pain on people who want > to schedule their vacuums manually (and yes, there are some) to get > clean buildfarm runs. Did you read the rest of it? regards, tom lane
On Sat, Dec 27, 2014 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> This would have the effect of transferring all responsibility for >>> dead-stats-entry cleanup to autovacuum. For ordinary users, I think >>> that'd be just fine. It might be less fine though for people who >>> disable autovacuum, if there still are any. > >> -1. I don't think it's a good idea to inflict pain on people who want >> to schedule their vacuums manually (and yes, there are some) to get >> clean buildfarm runs. > > Did you read the rest of it? Yeah... since when do I not read your emails? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote: > To get back to that original complaint about buildfarm runs failing, > I notice that essentially all of those failures are coming from "wait > timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself > has no need to read the stats files. What's actually causing these > messages is failure to get a timely response in pgstat_vacuum_stat(). > So let me propose a drastic solution: let's dike out this bit in vacuum.c: > > /* > * Send info about dead objects to the statistics collector, unless we are > * in autovacuum --- autovacuum.c does this for itself. > */ > if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess()) > pgstat_vacuum_stat(); > > This would have the effect of transferring all responsibility for > dead-stats-entry cleanup to autovacuum. For ordinary users, I think > that'd be just fine. It might be less fine though for people who > disable autovacuum, if there still are any. Like Robert, I don't care for that. > Or, much more simply, we could conclude that it's not that important > if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger > the code so that we don't bleat when the file-reading request comes from > that source. This should be a back-patchable fix, whereas #2 above might > be a bit too complicated for that. This, however, feels like a clear improvement. When every VACUUM does pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing entries that became obsolete in the preceding second or so. In fact, rather than merely not bleating when the wait times out, let's not wait at all. I don't favor VACUUM of a small table taking 12s because it spent 2s on the table and 10s waiting to garbage collect a piping-hot stats file.
Noah Misch <noah@leadboat.com> writes: > On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote: >> To get back to that original complaint about buildfarm runs failing, >> I notice that essentially all of those failures are coming from "wait >> timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself >> has no need to read the stats files. What's actually causing these >> messages is failure to get a timely response in pgstat_vacuum_stat(). >> So let me propose a drastic solution: let's dike out this bit in vacuum.c: >> >> /* >> * Send info about dead objects to the statistics collector, unless we are >> * in autovacuum --- autovacuum.c does this for itself. >> */ >> if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess()) >> pgstat_vacuum_stat(); >> >> This would have the effect of transferring all responsibility for >> dead-stats-entry cleanup to autovacuum. For ordinary users, I think >> that'd be just fine. It might be less fine though for people who >> disable autovacuum, if there still are any. > Like Robert, I don't care for that. I obviously failed to make it clear enough that that was meant as a straw man, lets-think-outside-the-box proposal. >> Or, much more simply, we could conclude that it's not that important >> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger >> the code so that we don't bleat when the file-reading request comes from >> that source. This should be a back-patchable fix, whereas #2 above might >> be a bit too complicated for that. > This, however, feels like a clear improvement. When every VACUUM does > pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing > entries that became obsolete in the preceding second or so. In fact, rather > than merely not bleating when the wait times out, let's not wait at all. I > don't favor VACUUM of a small table taking 12s because it spent 2s on the > table and 10s waiting to garbage collect a piping-hot stats file. 
I think that would be going too far: if an autovac wakes up in the dead of night, it should not use an ancient stats file merely because no one else has asked for stats recently. But if it never waits at all, it'll be at the mercy of whatever is already on disk.

After looking at the code, the minimum-change alternative would be more or less as attached: first, get rid of the long-obsolete proposition that autovacuum workers need fresher-than-usual stats; second, allow pgstat_vacuum_stat to accept stats that are moderately stale (the number given below allows them to be up to 50 seconds old); and third, suppress wait-timeout warnings when the call is from pgstat_vacuum_stat. The third point is what we need to avoid unnecessary buildfarm failures. The second point addresses the idea that we don't need to stress the stats collector too much for this.

			regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0cf4988..b030e1c 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** static void pgstat_write_statsfiles(bool
*** 259,265 ****
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  static bool pgstat_write_statsfile_needed(void);
--- 259,265 ----
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(bool for_vacuum_stat);
  static void pgstat_read_current_status(void);
  static bool pgstat_write_statsfile_needed(void);
*************** pgstat_vacuum_stat(void)
*** 951,959 ****
  	/*
  	 * If not done for this transaction, read the statistics collector stats
! 	 * file into some hash tables.
  	 */
! 	backend_read_statsfile();
  
  	/*
  	 * Read pg_database and make a list of OIDs of all existing databases
--- 951,962 ----
  	/*
  	 * If not done for this transaction, read the statistics collector stats
! 	 * file into some hash tables.  We are willing to accept stats that are
! 	 * more stale than usual here.  (Since VACUUM never runs in a transaction
! 	 * block, this cannot result in accepting stale stats that would be used
! 	 * for other purposes later in the transaction.)
  	 */
! 	backend_read_statsfile(true);
  
  	/*
  	 * Read pg_database and make a list of OIDs of all existing databases
*************** pgstat_fetch_stat_dbentry(Oid dbid)
*** 2182,2188 ****
  	 * If not done for this transaction, read the statistics collector stats
  	 * file into some hash tables.
  	 */
! 	backend_read_statsfile();
  
  	/*
  	 * Lookup the requested database; return NULL if not found
--- 2185,2191 ----
  	 * If not done for this transaction, read the statistics collector stats
  	 * file into some hash tables.
  	 */
! 	backend_read_statsfile(false);
  
  	/*
  	 * Lookup the requested database; return NULL if not found
*************** pgstat_fetch_stat_tabentry(Oid relid)
*** 2213,2219 ****
  	 * If not done for this transaction, read the statistics collector stats
  	 * file into some hash tables.
  	 */
! 	backend_read_statsfile();
  
  	/*
  	 * Lookup our database, then look in its table hash table.
--- 2216,2222 ----
  	 * If not done for this transaction, read the statistics collector stats
  	 * file into some hash tables.
  	 */
! 	backend_read_statsfile(false);
  
  	/*
  	 * Lookup our database, then look in its table hash table.
*************** pgstat_fetch_stat_funcentry(Oid func_id)
*** 2265,2271 ****
  	PgStat_StatFuncEntry *funcentry = NULL;
  
  	/* load the stats file if needed */
! 	backend_read_statsfile();
  
  	/* Lookup our database, then find the requested function. */
  	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
--- 2268,2274 ----
  	PgStat_StatFuncEntry *funcentry = NULL;
  
  	/* load the stats file if needed */
! 	backend_read_statsfile(false);
  
  	/* Lookup our database, then find the requested function. */
  	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
*************** pgstat_fetch_stat_numbackends(void)
*** 2350,2356 ****
  PgStat_ArchiverStats *
  pgstat_fetch_stat_archiver(void)
  {
! 	backend_read_statsfile();
  
  	return &archiverStats;
  }
--- 2353,2359 ----
  PgStat_ArchiverStats *
  pgstat_fetch_stat_archiver(void)
  {
! 	backend_read_statsfile(false);
  
  	return &archiverStats;
  }
*************** pgstat_fetch_stat_archiver(void)
*** 2367,2373 ****
  PgStat_GlobalStats *
  pgstat_fetch_global(void)
  {
! 	backend_read_statsfile();
  
  	return &globalStats;
  }
--- 2370,2376 ----
  PgStat_GlobalStats *
  pgstat_fetch_global(void)
  {
! 	backend_read_statsfile(false);
  
  	return &globalStats;
  }
*************** done:
*** 4341,4349 ****
   * If not already done, read the statistics collector stats file into
   * some hash tables.  The results will be kept until pgstat_clear_snapshot()
   * is called (typically, at end of transaction).
   */
  static void
! backend_read_statsfile(void)
  {
  	TimestampTz min_ts = 0;
  	TimestampTz ref_ts = 0;
--- 4344,4357 ----
   * If not already done, read the statistics collector stats file into
   * some hash tables.  The results will be kept until pgstat_clear_snapshot()
   * is called (typically, at end of transaction).
+  *
+  * If for_vacuum_stat is true, we're doing this for pgstat_vacuum_stat,
+  * which does not really need up-to-date stats: it's only trying to flush
+  * obsolete stats entries.  So we accept older stats than we normally would,
+  * and don't bleat about failure to get fresh stats either.
   */
  static void
! backend_read_statsfile(bool for_vacuum_stat)
  {
  	TimestampTz min_ts = 0;
  	TimestampTz ref_ts = 0;
*************** backend_read_statsfile(void)
*** 4376,4385 ****
  		/*
  		 * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
  		 * msec before now.  This indirectly ensures that the collector
! 		 * needn't write the file more often than PGSTAT_STAT_INTERVAL.  In
! 		 * an autovacuum worker, however, we want a lower delay to avoid
! 		 * using stale data, so we use PGSTAT_RETRY_DELAY (since the
! 		 * number of workers is low, this shouldn't be a problem).
  		 *
  		 * We don't recompute min_ts after sleeping, except in the
  		 * unlikely case that cur_ts went backwards.  So we might end up
--- 4384,4392 ----
  		/*
  		 * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
  		 * msec before now.  This indirectly ensures that the collector
! 		 * needn't write the file more often than PGSTAT_STAT_INTERVAL.
! 		 * For vacuum_stat purposes, though, up to 100 times the normal
! 		 * age is considered acceptable.
  		 *
  		 * We don't recompute min_ts after sleeping, except in the
  		 * unlikely case that cur_ts went backwards.  So we might end up
*************** backend_read_statsfile(void)
*** 4390,4398 ****
  		 * actually accept.
  		 */
  		ref_ts = cur_ts;
! 		if (IsAutoVacuumWorkerProcess())
  			min_ts = TimestampTzPlusMilliseconds(ref_ts,
! 												 -PGSTAT_RETRY_DELAY);
  		else
  			min_ts = TimestampTzPlusMilliseconds(ref_ts,
  												 -PGSTAT_STAT_INTERVAL);
--- 4397,4405 ----
  		 * actually accept.
  		 */
  		ref_ts = cur_ts;
! 		if (for_vacuum_stat)
  			min_ts = TimestampTzPlusMilliseconds(ref_ts,
! 												 -100 * PGSTAT_STAT_INTERVAL);
  		else
  			min_ts = TimestampTzPlusMilliseconds(ref_ts,
  												 -PGSTAT_STAT_INTERVAL);
*************** backend_read_statsfile(void)
*** 4441,4447 ****
  			pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
  
! 	if (count >= PGSTAT_POLL_LOOP_COUNT)
  		elog(WARNING, "pgstat wait timeout");
  
  	/*
--- 4448,4454 ----
  			pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
  
! 	if (count >= PGSTAT_POLL_LOOP_COUNT && !for_vacuum_stat)
  		elog(WARNING, "pgstat wait timeout");
  
  	/*
On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote: > >> Or, much more simply, we could conclude that it's not that important > >> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger > >> the code so that we don't bleat when the file-reading request comes from > >> that source. This should be a back-patchable fix, whereas #2 above might > >> be a bit too complicated for that. > > > This, however, feels like a clear improvement. When every VACUUM does > > pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing > > entries that became obsolete in the preceding second or so. In fact, rather > > than merely not bleating when the wait times out, let's not wait at all. I > > don't favor VACUUM of a small table taking 12s because it spent 2s on the > > table and 10s waiting to garbage collect a piping-hot stats file. > > I think that would be going too far: if an autovac wakes up in the dead of > night, it should not use an ancient stats file merely because no one else > has asked for stats recently. But if it never waits at all, it'll be > at the mercy of whatever is already on disk. Whoops; I underestimated the upper bound on time between stats file writes. > After looking at the code, the minimum-change alternative would be more or > less as attached: first, get rid of the long-obsolete proposition that > autovacuum workers need fresher-than-usual stats; second, allow > pgstat_vacuum_stat to accept stats that are moderately stale (the number > given below allows them to be up to 50 seconds old); and third, suppress > wait-timeout warnings when the call is from pgstat_vacuum_stat. The third > point is what we need to avoid unnecessary buildfarm failures. The second > point addresses the idea that we don't need to stress the stats collector > too much for this. Only #3 belongs in a minimum-change patch. 
#1 and #2 solve and/or create different problems and operate independently of #3. A patch covering #3 alone sounds attractive.
Noah Misch <noah@leadboat.com> writes: > On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote: >> After looking at the code, the minimum-change alternative would be more or >> less as attached: first, get rid of the long-obsolete proposition that >> autovacuum workers need fresher-than-usual stats; second, allow >> pgstat_vacuum_stat to accept stats that are moderately stale (the number >> given below allows them to be up to 50 seconds old); and third, suppress >> wait-timeout warnings when the call is from pgstat_vacuum_stat. The third >> point is what we need to avoid unnecessary buildfarm failures. The second >> point addresses the idea that we don't need to stress the stats collector >> too much for this. > Only #3 belongs in a minimum-change patch. #1 and #2 solve and/or create > different problems and operate independently of #3. A patch covering #3 alone > sounds attractive. [ raised eyebrow... ] In your previous message, you were advocating *for* a change that was more invasive than this one. Why the change of heart? In particular, I thought we'd already agreed to point #1. regards, tom lane
On Sun, Jan 18, 2015 at 07:02:54PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote: > >> After looking at the code, the minimum-change alternative would be more or > >> less as attached: first, get rid of the long-obsolete proposition that > >> autovacuum workers need fresher-than-usual stats; second, allow > >> pgstat_vacuum_stat to accept stats that are moderately stale (the number > >> given below allows them to be up to 50 seconds old); and third, suppress > >> wait-timeout warnings when the call is from pgstat_vacuum_stat. The third > >> point is what we need to avoid unnecessary buildfarm failures. The second > >> point addresses the idea that we don't need to stress the stats collector > >> too much for this. > > > Only #3 belongs in a minimum-change patch. #1 and #2 solve and/or create > > different problems and operate independently of #3. A patch covering #3 alone > > sounds attractive. > > [ raised eyebrow... ] In your previous message, you were advocating *for* > a change that was more invasive than this one. Why the change of heart? I phrased that proposal based on a wrong belief that the collector writes the stats file regularly in any case. Vacuuming a 49s-old stats file without even trying to get a fresh one might or might not improve PostgreSQL, but it's dissimilar to what I had in mind. I did not advocate for #1 at any point. > In particular, I thought we'd already agreed to point #1. Nope. You and Alvaro ACKed it, then Heikki NACKed it.
On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After looking at the code, the minimum-change alternative would be more or > less as attached: first, get rid of the long-obsolete proposition that > autovacuum workers need fresher-than-usual stats; second, allow > pgstat_vacuum_stat to accept stats that are moderately stale (the number > given below allows them to be up to 50 seconds old); and third, suppress > wait-timeout warnings when the call is from pgstat_vacuum_stat. The third > point is what we need to avoid unnecessary buildfarm failures. The second > point addresses the idea that we don't need to stress the stats collector > too much for this. I think this is too much of a good thing. I don't see any reason why autovacuum's statistics need to be fresher than normal, but I also don't see any reason why they need to be less fresh. I think suppressing the warning is a good idea, but why only suppress it for autovacuum? How about just knocking the level down to, say, DEBUG1? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
From
Andres Freund
Date:
On 2015-01-18 21:33:25 -0500, Robert Haas wrote: > On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > After looking at the code, the minimum-change alternative would be more or > > less as attached: first, get rid of the long-obsolete proposition that > > autovacuum workers need fresher-than-usual stats; second, allow > > pgstat_vacuum_stat to accept stats that are moderately stale (the number > > given below allows them to be up to 50 seconds old); and third, suppress > > wait-timeout warnings when the call is from pgstat_vacuum_stat. The third > > point is what we need to avoid unnecessary buildfarm failures. The second > > point addresses the idea that we don't need to stress the stats collector > > too much for this. > > I think this is too much of a good thing. I don't see any reason why > autovacuum's statistics need to be fresher than normal, but I also > don't see any reason why they need to be less fresh. I think > suppressing the warning is a good idea, but why only suppress it for > autovacuum? How about just knocking the level down to, say, DEBUG1? +1 for just using LOG - which by default does not end up on client machines. In contrast to WARNING. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 19, 2015 at 7:09 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-18 21:33:25 -0500, Robert Haas wrote: >> On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > After looking at the code, the minimum-change alternative would be more or >> > less as attached: first, get rid of the long-obsolete proposition that >> > autovacuum workers need fresher-than-usual stats; second, allow >> > pgstat_vacuum_stat to accept stats that are moderately stale (the number >> > given below allows them to be up to 50 seconds old); and third, suppress >> > wait-timeout warnings when the call is from pgstat_vacuum_stat. The third >> > point is what we need to avoid unnecessary buildfarm failures. The second >> > point addresses the idea that we don't need to stress the stats collector >> > too much for this. >> >> I think this is too much of a good thing. I don't see any reason why >> autovacuum's statistics need to be fresher than normal, but I also >> don't see any reason why they need to be less fresh. I think >> suppressing the warning is a good idea, but why only suppress it for >> autovacuum? How about just knocking the level down to, say, DEBUG1? > > +1 for just using LOG - which by default does not end up on client > machines. In contrast to WARNING. Yeah, that doesn't seem like a bad idea, either. The message seems much more likely to be of interest to the DBA than the user; the DBA can use the message to diagnose an overloaded I/O subsystem (which I think is the usual cause of this problem) whereas the only point of likely interest to the user is that their query ran slowly (which they know without the message). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 19, 2015 at 7:09 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2015-01-18 21:33:25 -0500, Robert Haas wrote: >>> I think this is too much of a good thing. I don't see any reason why >>> autovacuum's statistics need to be fresher than normal, but I also >>> don't see any reason why they need to be less fresh. I think >>> suppressing the warning is a good idea, but why only suppress it for >>> autovacuum? How about just knocking the level down to, say, DEBUG1? >> +1 for just using LOG - which by default does not end up on client >> machines. In contrast to WARNING. > Yeah, that doesn't seem like a bad idea, either. The message seems > much more likely to be of interest to the DBA than the user; the DBA > can use the message to diagnose an overloaded I/O subsystem (which I > think is the usual cause of this problem) whereas the only point of > likely interest to the user is that their query ran slowly (which they > know without the message). If that were true I'd agree with you, but it's false on its face. A user who is actually examining the statistics might well want to know if they're significantly out of date. A very relevant example is that I've always wondered how come, when we see buildfarm failures in the "stats" regression test, they always appear in the form of output differences that indicate that the session did not see the expected stats update --- but there's never a timeout warning printed, which indicates that whatever the cause is, it ain't that. I'd be fine with changing the warning to LOG level rather than suppressing it entirely for the specific case of pgstat_vacuum_stat; but I do want to distinguish that case, wherein it's fair to conclude that obsolete stats aren't too worrisome, from other cases where no such conclusion is justified. regards, tom lane
On Mon, Jan 19, 2015 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > If that were true I'd agree with you, but it's false on its face. > A user who is actually examining the statistics might well want to > know if they're significantly out of date. A very relevant example > is that I've always wondered how come, when we see buildfarm failures > in the "stats" regression test, they always appear in the form of > output differences that indicate that the session did not see the > expected stats update --- but there's never a timeout warning printed, > which indicates that whatever the cause is, it ain't that. Sure, but nobody who is not a developer is going to care about that. A typical user who sees "pgstat wait timeout", or doesn't, isn't going to be able to make anything at all out of that. > I'd be fine with changing the warning to LOG level rather than > suppressing it entirely for the specific case of pgstat_vacuum_stat; > but I do want to distinguish that case, wherein it's fair to conclude > that obsolete stats aren't too worrisome, from other cases where no > such conclusion is justified. But I can live with this compromise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Sure, but nobody who is not a developer is going to care about that. > A typical user who sees "pgstat wait timeout", or doesn't, isn't going > to be able to make anything at all out of that. Possibly we need to improve the wording of that error message then. When it was written, we really assumed that it was a can't-happen case and so didn't spend much effort on it. Perhaps it should become a translatable ereport phrased like "WARNING: using stale statistics instead of current ones because stats collector is not responding". (I'm also wondering if it'd make sense to expose the stats timestamp as a callable function, so that the case could be dealt with programmatically as well. But that's future-feature territory.) >> I'd be fine with changing the warning to LOG level rather than >> suppressing it entirely for the specific case of pgstat_vacuum_stat; >> but I do want to distinguish that case, wherein it's fair to conclude >> that obsolete stats aren't too worrisome, from other cases where no >> such conclusion is justified. > But I can live with this compromise. Is that OK with everybody? Going once, going twice ... regards, tom lane
On 2015-01-19 11:16:09 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > >> I'd be fine with changing the warning to LOG level rather than > >> suppressing it entirely for the specific case of pgstat_vacuum_stat; > >> but I do want to distinguish that case, wherein it's fair to conclude > >> that obsolete stats aren't too worrisome, from other cases where no > >> such conclusion is justified. > > > But I can live with this compromise. > > Is that OK with everybody? Going once, going twice ... I can live with the compromise as well, but I'd rather change it to always be of LOG priority. LOG is more likely to end up in the log and that's where it's actually likely to be noticed. In most of the cases WARNINGs going to the client won't be noticed as this error is much more likely on servers with a high load caused by programs than during interactive use. > > Sure, but nobody who is not a developer is going to care about that. > > A typical user who sees "pgstat wait timeout", or doesn't, isn't going > > to be able to make anything at all out of that. > > Possibly we need to improve the wording of that error message then. > When it was written, we really assumed that it was a can't-happen case > and so didn't spend much effort on it. Perhaps it should become a > translatable ereport phrased like "WARNING: using stale statistics > instead of current ones because stats collector is not responding". Yes, that seems like a good message improvement. It's not like making it LOG makes the message invisible... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 19, 2015 at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Sure, but nobody who is not a developer is going to care about that.
>> A typical user who sees "pgstat wait timeout", or doesn't, isn't going
>> to be able to make anything at all out of that.
>
> Possibly we need to improve the wording of that error message then.
> When it was written, we really assumed that it was a can't-happen case
> and so didn't spend much effort on it.  Perhaps it should become a
> translatable ereport phrased like "WARNING: using stale statistics
> instead of current ones because stats collector is not responding".

I'm still not completely convinced it deserves to be a WARNING, but I
definitely think turning it into a translatable error message is the
right call.  Calling this a "can't happen" case is clearly ridiculous at
this point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
>> Possibly we need to improve the wording of that error message then.
>> When it was written, we really assumed that it was a can't-happen case
>> and so didn't spend much effort on it.  Perhaps it should become a
>> translatable ereport phrased like "WARNING: using stale statistics
>> instead of current ones because stats collector is not responding".

> Yes, that seems like a good message improvement.

> It's not like making it LOG makes the message invisible...

Uh, yes it does.  So far as the user is concerned anyway.  The entire
point of this discussion is to prevent the message from showing up in
buildfarm output, remember?

			regards, tom lane
Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
From: Andres Freund
Date:
On 2015-01-19 11:28:41 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
> >> Possibly we need to improve the wording of that error message then.
> >> When it was written, we really assumed that it was a can't-happen case
> >> and so didn't spend much effort on it.  Perhaps it should become a
> >> translatable ereport phrased like "WARNING: using stale statistics
> >> instead of current ones because stats collector is not responding".
>
> > Yes, that seems like a good message improvement.
>
> > It's not like making it LOG makes the message invisible...
>
> Uh, yes it does.  So far as the user is concerned anyway.

Sure, but the log isn't invisible.  As mentioned one paragraph above, I
don't think it's likely ever to be noticed on the client side in the
cases where it happens in production.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-19 11:28:41 -0500, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
>> >> Possibly we need to improve the wording of that error message then.
>> >> When it was written, we really assumed that it was a can't-happen case
>> >> and so didn't spend much effort on it.  Perhaps it should become a
>> >> translatable ereport phrased like "WARNING: using stale statistics
>> >> instead of current ones because stats collector is not responding".
>>
>> > Yes, that seems like a good message improvement.
>>
>> > It's not like making it LOG makes the message invisible...
>>
>> Uh, yes it does.  So far as the user is concerned anyway.
>
> Sure, but the log isn't invisible.  As mentioned one paragraph above, I
> don't think it's likely to ever be noticed in the client code in the
> cases where it happens in production.

Yes, that is my feeling as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Sure, but the log isn't invisible.  As mentioned one paragraph above, I
>> don't think it's likely to ever be noticed in the client code in the
>> cases where it happens in production.

> Yes, that is my feeling as well.

Meh.  I still don't like it, but I guess I'm outvoted.  Unless there are
further votes, what we have at this point is just:

-        elog(WARNING, "pgstat wait timeout");
+        ereport(LOG, (errmsg("using stale statistics instead of current ones because stats collector is not responding")));

with no conditionality for pgstat_vacuum_stat vs. other callers.

			regards, tom lane
On Mon, Jan 19, 2015 at 12:05:01PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Sure, but the log isn't invisible.  As mentioned one paragraph above, I
> >> don't think it's likely to ever be noticed in the client code in the
> >> cases where it happens in production.
>
> > Yes, that is my feeling as well.
>
> Meh.  I still don't like it, but I guess I'm outvoted.  Unless there are
> further votes, what we have at this point is just:
>
> -        elog(WARNING, "pgstat wait timeout");
> +        ereport(LOG, (errmsg("using stale statistics instead of current ones because stats collector is not responding")));
>
> with no conditionality for pgstat_vacuum_stat vs. other callers.

That is satisfactory to me, too.
On 25.12.2014 22:28, Tomas Vondra wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>
>> That's indeed odd. Seems to have been lost when the statsfile was
>> split into multiple files. Alvaro, Tomas?
>
> The goal was to keep the logic as close to the original as possible.
> IIRC there were "pgstat wait timeout" issues before, and in most cases
> the conclusion was that it's probably because of overloaded I/O.
>
> But maybe there actually was another bug, and it's entirely possible
> that the split introduced a new one, and that's what we're seeing now.
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?
>
> Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce
> and analyze the issue locally. But that won't happen until January.

I've tried to reproduce this on my Raspberry Pi 'machine' and it's not
very difficult to trigger.  About 7 out of 10 'make check' runs fail
because of 'pgstat wait timeout'.

All the occurrences I've seen were right after some sort of VACUUM
(sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm  %util
mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle
4 MB/s).  If hamster had a faulty SD card, it might have been much
worse, I guess.

This of course does not prove the absence of a bug - I plan to dig into
this a bit more.  Feel free to point out some suspicious scenarios that
might be worth reproducing and analyzing.

--
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 25.12.2014 22:28, Tomas Vondra wrote:
>> On 25.12.2014 21:14, Andres Freund wrote:
>>
>>> That's indeed odd. Seems to have been lost when the statsfile was
>>> split into multiple files. Alvaro, Tomas?
>>
>> The goal was to keep the logic as close to the original as possible.
>> IIRC there were "pgstat wait timeout" issues before, and in most cases
>> the conclusion was that it's probably because of overloaded I/O.
>>
>> But maybe there actually was another bug, and it's entirely possible
>> that the split introduced a new one, and that's what we're seeing now.
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase of this kind of issues. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>>
>> Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce
>> and analyze the issue locally. But that won't happen until January.
>
> I've tried to reproduce this on my Raspberry Pi 'machine' and it's not
> very difficult to trigger. About 7 out of 10 'make check' runs fail
> because of 'pgstat wait timeout'.
>
> All the occurrences I've seen were right after some sort of VACUUM
> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
> looked something like this:
>
> Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm  %util
> mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00 100.00
>
> So pretty terrible (this is a Class 4 SD card, supposedly able to
> handle 4 MB/s). If hamster had a faulty SD card, it might have been
> much worse, I guess.

In my experience, at least a class 10 card is necessary, along with a
minimum amount of memory, to minimize the appearance of those warnings;
hamster now has an 8GB class 10 card.

--
Michael
On 21.1.2015 00:38, Michael Paquier wrote:
> On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
>
>> I've tried to reproduce this on my Raspberry Pi 'machine' and it's not
>> very difficult to trigger. About 7 out of 10 'make check' runs fail
>> because of 'pgstat wait timeout'.
>>
>> All the occurrences I've seen were right after some sort of VACUUM
>> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
>> looked something like this:
>>
>> Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm  %util
>> mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00 100.00
>>
>> So pretty terrible (this is a Class 4 SD card, supposedly able to
>> handle 4 MB/s). If hamster had a faulty SD card, it might have been
>> much worse, I guess.
>
> In my experience, at least a class 10 card is necessary, along with a
> minimum amount of memory, to minimize the appearance of those warnings;
> hamster now has an 8GB class 10 card.

Well, my goal was exactly to produce those warnings ;-) and see if I
could identify some strange cases.  That's why I chose a class 4 card.

But even then it produces a rather low number of those warnings (one or
two per check run), and mostly at the expected places with significant
I/O overload.  So I'm not any wiser :-(

regards
Tomas
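[Editor's note: for readers unfamiliar with the mechanism being debated, here is a rough, hypothetical Python sketch - not PostgreSQL source - of the wait loop in the backend's stats-file read path. The constants match the thread's description of the era (10 ms retry delay, a 10-second cap raised by commit 99b545); the function name and return shape are invented for illustration. It shows why an I/O stall like the ~15.6-second `await` in the iostat output above must exhaust the wait budget and produce the warning.]

```python
# Simplified model of how a backend waits for a fresh stats file:
# poll every PGSTAT_RETRY_DELAY ms until the collector has written a
# fresh file, giving up (and emitting "pgstat wait timeout") after
# PGSTAT_MAX_WAIT_TIME ms and falling back to stale statistics.
PGSTAT_RETRY_DELAY = 10        # ms between polls
PGSTAT_MAX_WAIT_TIME = 10000   # ms before the backend gives up

def read_stats(file_write_latency_ms):
    """Return (number_of_polls, timed_out), assuming the collector needs
    file_write_latency_ms of wall time to produce a fresh stats file."""
    waited = 0
    while waited < PGSTAT_MAX_WAIT_TIME:
        if file_write_latency_ms <= waited:
            return waited // PGSTAT_RETRY_DELAY, False   # fresh file seen
        waited += PGSTAT_RETRY_DELAY                     # nap and retry

    # Budget exhausted: accept stale statistics and complain.
    return waited // PGSTAT_RETRY_DELAY, True

# A healthy disk answers within a few polls; a 15.6-second I/O stall
# (as in the iostat output above) blows through the whole 10 s budget.
assert read_stats(50) == (5, False)
assert read_stats(15633) == (1000, True)
```

This also illustrates why raising PGSTAT_MAX_WAIT_TIME further mostly sweeps the problem under the rug: any stall longer than the cap still times out, and the cap is already far beyond a healthy collector's response time.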
> Sure, but nobody who is not a developer is going to care about that.
> A typical user who sees "pgstat wait timeout", or doesn't, isn't going
> to be able to make anything at all out of that.
As a user, I wholeheartedly disagree.
That warning helped me massively in diagnosing an unhealthy database
server in the past at TripAdvisor (i.e. a high-end server-class box, not
a Raspberry Pi).  I have realtime monitoring that looks at
pg_stat_database at regular intervals, particularly at the velocity of
change of the xact_commit and xact_rollback columns, similar to how
check_postgres does it:
https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl#L4234
When one of those servers was unhealthy, it stopped reporting statistics
for 30+ seconds at a time.  My dashboard, which polled far more
frequently than that, indicated the server was normally processing 0 tps
with intermittent spikes.  I went directly onto the server and sampled
pg_stat_database.  That warning was the only thing that directly
indicated that the statistics collector was not to be trusted.  The
collector was obviously a victim of whatever was going on in the server,
but it's pretty important to know when your methods for measuring server
health are lying to you.  At first glance, the spiky TPS looks like some
sort of live lock, not just an overloaded server.
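[Editor's note: the dashboard check described above - deriving TPS from successive pg_stat_database snapshots - can be sketched as below. This is a hypothetical illustration, not TripAdvisor's or check_postgres's actual code; the `stats_ts` field stands in for the per-snapshot timestamp the author wishes were exposed, and the snapshots are plain dicts rather than live query results.]

```python
# Derive transactions/sec from two polls of pg_stat_database-style
# counters, and refuse to report a rate when the stats snapshot itself
# has not advanced - that is the "0 tps" lie described above.

def tps(prev, curr):
    """Transactions/sec between two snapshots; None if stats are frozen."""
    if curr["stats_ts"] == prev["stats_ts"]:
        return None                      # collector stopped reporting
    dt = curr["ts"] - prev["ts"]         # seconds between our two polls
    commits = curr["xact_commit"] - prev["xact_commit"]
    rollbacks = curr["xact_rollback"] - prev["xact_rollback"]
    return (commits + rollbacks) / dt

healthy = tps({"ts": 0, "stats_ts": 0, "xact_commit": 100, "xact_rollback": 1},
              {"ts": 10, "stats_ts": 9, "xact_commit": 600, "xact_rollback": 1})
frozen = tps({"ts": 0, "stats_ts": 5, "xact_commit": 100, "xact_rollback": 1},
             {"ts": 10, "stats_ts": 5, "xact_commit": 100, "xact_rollback": 1})
assert healthy == 50.0
assert frozen is None   # don't graph this as "0 tps"
```

The key design point is the None return: a stale snapshot is a distinct state from a quiet server, and conflating them is exactly what made the incident above look like a live lock.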
Now, I know: zero change in stats = collector broken.  Rereading the docs:

> Also, the collector itself emits a new report at most once per
> PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building
> the server).

Without context this merely reads: "We sleep for 500ms, plus the time to
write the file, plus whenever the OS decides to enforce the timer
interrupt... so like 550-650ms."  It doesn't read: "When the server is
unhealthy but _still_ serving queries, the stats collector might not be
able to keep up and will just stop reporting stats altogether."
I think the warning is incredibly valuable.  Along those lines, I'd also
love to see a pg_stat_snapshot_timestamp() for monitoring code to use to
determine whether it's using a stale snapshot, as well as to help smooth
graphs of statistics that are based on highly granular snapshotting.
- Matt Kelly
On 2015-01-21 22:43:03 -0500, Matt Kelly wrote:
> > Sure, but nobody who is not a developer is going to care about that.
> > A typical user who sees "pgstat wait timeout", or doesn't, isn't going
> > to be able to make anything at all out of that.
>
> As a user, I wholeheartedly disagree.

Note that the change we discussed wasn't removing the message.  It was
changing the log level from WARNING to LOG, which means the message is
not going to the client anymore, but still to the server log (perhaps
surprisingly, the likelihood of the latter actually increases).

> I think the warning is incredibly valuable.  Along those lines I'd also
> love to see a pg_stat_snapshot_timestamp() for monitoring code to use to
> determine if it's using a stale snapshot, as well as helping to smooth
> graphs of the statistics that are based on highly granular snapshotting.

I can see that being useful.

Greetings,

Andres Freund
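[Editor's note: the "perhaps surprisingly" aside deserves unpacking. In PostgreSQL, the LOG severity sits low in the client_min_messages ordering but high (just below FATAL) in the log_min_messages ordering, so demoting the message from WARNING to LOG hides it from clients while making it *more* robustly logged. A rough Python sketch of those two orderings, with the DEBUG1-5 levels collapsed to one entry for brevity:]

```python
# Severity orderings per the PostgreSQL documentation for
# client_min_messages and log_min_messages.  Note where LOG sits in each.
CLIENT_ORDER = ["DEBUG", "LOG", "NOTICE", "WARNING", "ERROR"]
SERVER_ORDER = ["DEBUG", "INFO", "NOTICE", "WARNING", "ERROR",
                "LOG", "FATAL", "PANIC"]

def sent_to_client(level, client_min_messages="NOTICE"):
    """Defaults: client_min_messages = NOTICE."""
    return CLIENT_ORDER.index(level) >= CLIENT_ORDER.index(client_min_messages)

def sent_to_log(level, log_min_messages="WARNING"):
    """Defaults: log_min_messages = WARNING."""
    return SERVER_ORDER.index(level) >= SERVER_ORDER.index(log_min_messages)

# With default settings, WARNING reaches both destinations...
assert sent_to_client("WARNING") and sent_to_log("WARNING")
# ...while LOG skips the client but still reaches the server log, even
# under stricter log_min_messages settings where WARNING would be dropped.
assert not sent_to_client("LOG") and sent_to_log("LOG")
assert sent_to_log("LOG", log_min_messages="ERROR")
```

That asymmetry is the whole compromise: buildfarm clients (and regression diffs) stop seeing the message, while DBAs diagnosing a struggling collector still find it in the server log.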