Thread: Better way of dealing with pgstat wait timeout during buildfarm runs?


Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Andres Freund
Date:
Hi,

We quite regularly have buildfarm failures that are caused by 'WARNING:
pgstat wait timeout' at random points during the build. Especially on
some of the ARM buildfarm animals those are really frequent, to the
point that it's hard to know the actual state of the buildfarm without
looking at several animals. The ARM ones are probably affected more
frequently because they'll often run on SD cards and such.

So I think a better way to deal with that warning would be a good
idea. Besides somehow making the mechanism itself more robust, there are
two ways to attack this that I can think of, neither of them
awe-inspiring:

1) Make that WARNING a LOG message instead. Since those don't get sent
to the client with default settings...
2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
it to.
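
For concreteness, in pgstat.c's polling loop the two options would amount
to roughly the following (a sketch only, not a patch; the new timeout value
is purely illustrative):

    /* option 1, sketch: demote the message so it stays out of client
     * output under the default client_min_messages */
    if (count >= PGSTAT_POLL_LOOP_COUNT)
        elog(LOG, "pgstat wait timeout");       /* currently WARNING */

    /* option 2, sketch: stretch the poll window even further */
    #define PGSTAT_MAX_WAIT_TIME    20000       /* msec; currently 10000 */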

Does anybody have an actually good idea?

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> So I think a better way to deal with that warning would be a good
> idea. Besides somehow making the mechanism itself more robust, there are
> two ways to attack this that I can think of, neither of them
> awe-inspiring:

> 1) Make that WARNING a LOG message instead. Since those don't get sent
> to the client with default settings...
> 2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
> it to.

Yeah, I've been getting more annoyed by that too lately.  I keep wondering
though whether there's an actual bug underneath that behavior that we're
failing to see.  PGSTAT_MAX_WAIT_TIME is already 10 seconds; it's hard to
credit that increasing it still further would be "fixing" anything.
The other change would also mainly just sweep the issue under the rug,
if there is any issue and it's not just that we're overloading
underpowered buildfarm machines.  (Maybe a better fix would be to reduce
MAX_CONNECTIONS for the tests on these machines?)
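
For reference, the regression makefile already honors a MAX_CONNECTIONS
setting, so capping parallelism on such an animal could be as simple as
(value illustrative):

    make check MAX_CONNECTIONS=5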

I wonder whether when multiple processes are demanding statsfile updates,
there's some misbehavior that causes them to suck CPU away from the stats
collector and/or convince it that it doesn't need to write anything.
There are odd things in the logs in some of these events.  For example in
today's failure on hamster,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
there are two client-visible wait-timeout warnings, one each in the
gist and spgist tests.  But in the postmaster log we find these in
fairly close succession:

[549c38ba.724d:2] WARNING:  pgstat wait timeout
[549c39b1.73e7:10] WARNING:  pgstat wait timeout
[549c38ba.724d:3] WARNING:  pgstat wait timeout

Correlating these with other log entries shows that the first and third
are from the autovacuum launcher while the second is from the gist test
session.  So the spgist failure failed to get logged, and in any case the
big picture is that we had four timeout warnings occurring in a pretty
short span of time, in a parallel test set that's not all that demanding
(12 parallel tests, well below our max).  Not sure what to make of that.

BTW, I notice that in the current state of pgstat.c, all the logic for
keeping track of request arrival times is dead code, because nothing is
actually looking at DBWriteRequest.request_time.  This makes me think that
somebody simplified away some logic we maybe should have kept.  But if
we're going to leave it like this, we could replace the DBWriteRequest
data structure with a simple OID list and save a fair amount of code.
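
As a rough illustration (not a patch; the helper name is invented, and the
list calls are the standard pg_list API), the bookkeeping could shrink to
something like:

    static List *pending_write_requests = NIL; /* OIDs of DBs needing a write */

    /* hypothetical helper called on receipt of an inquiry message */
    static void
    pgstat_mark_db_requested(Oid databaseid)
    {
        if (!list_member_oid(pending_write_requests, databaseid))
            pending_write_requests = lappend_oid(pending_write_requests,
                                                 databaseid);
    }

    /* pgstat_db_requested() then reduces to a membership test */
    static bool
    pgstat_db_requested(Oid databaseid)
    {
        return list_member_oid(pending_write_requests, databaseid);
    }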
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Andres Freund
Date:
On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
> I wonder whether when multiple processes are demanding statsfile updates,
> there's some misbehavior that causes them to suck CPU away from the stats
> collector and/or convince it that it doesn't need to write anything.
> There are odd things in the logs in some of these events.  For example in
> today's failure on hamster,
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
> there are two client-visible wait-timeout warnings, one each in the
> gist and spgist tests.  But in the postmaster log we find these in
> fairly close succession:
> 
> [549c38ba.724d:2] WARNING:  pgstat wait timeout
> [549c39b1.73e7:10] WARNING:  pgstat wait timeout
> [549c38ba.724d:3] WARNING:  pgstat wait timeout
> 
> Correlating these with other log entries shows that the first and third
> are from the autovacuum launcher while the second is from the gist test
> session.  So the spgist failure failed to get logged, and in any case the
> big picture is that we had four timeout warnings occurring in a pretty
> short span of time, in a parallel test set that's not all that demanding
> (12 parallel tests, well below our max).  Not sure what to make of that.

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?
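
Something along these lines in the animal's extra_config would do it
(illustrative; the exact buildfarm-client syntax may differ):

    log_checkpoints = on
    log_line_prefix = '%m [%p] '    # timestamps help too, per the reply below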

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time.  This makes me think that
> somebody simplified away some logic we maybe should have kept.  But if
> we're going to leave it like this, we could replace the DBWriteRequest
> data structure with a simple OID list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was split
into multiple files. Alvaro, Tomas?

I wondered for a second whether the split could be responsible somehow,
but there are reports of that in older backbranches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tomas Vondra
Date:
On 25.12.2014 20:36, Tom Lane wrote:
>
> Yeah, I've been getting more annoyed by that too lately. I keep
> wondering though whether there's an actual bug underneath that
> behavior that we're failing to see. PGSTAT_MAX_WAIT_TIME is already
> 10 seconds; it's hard to credit that increasing it still further
> would be "fixing" anything. The other change would also mainly just
> sweep the issue under the rug, if there is any issue and it's not
> just that we're overloading underpowered buildfarm machines. (Maybe a
> better fix would be to reduce MAX_CONNECTIONS for the tests on these
> machines?)

I agree that increasing the limit further is not a good solution.

> BTW, I notice that in the current state of pgstat.c, all the logic
> for keeping track of request arrival times is dead code, because
> nothing is actually looking at DBWriteRequest.request_time. This
> makes me think that somebody simplified away some logic we maybe
> should have kept. But if we're going to leave it like this, we could
> replace the DBWriteRequest data structure with a simple OID list and
> save a fair amount of code.

Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
updating the request_time after receiving the inquiry, and
pgstat_db_requested() is looking at it when writing the files.

If we can simplify the code by keeping just OIDs, let's do that. I think
the main reason why we haven't done that in 187492b6 was to keep as much
of the existing logic as possible (and maybe change it in a separate patch).

regards
Tomas



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> On 25.12.2014 20:36, Tom Lane wrote:
>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time.

> Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
> updating the request_time after receiving the inquiry, and
> pgstat_db_requested() is looking at it when writing the files.

Where is pgstat_db_requested() looking at request_time?

> If we can simplify the code by keeping just OIDs, let's do that. I think
> the main reason why we haven't done that in 187492b6 was to keep as much
> of the existing logic (and maybe change it in a separate patch).

The real point here is that I think that commit *already* changed the
existing logic, because the time-of-receipt used to matter.  In
particular, there used to be a throttle on how often the stats file
could get written, which seems to have vanished.  I seriously doubt
that that was a good change, especially on write-bandwidth-challenged
platforms.
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tomas Vondra
Date:
On 25.12.2014 21:14, Andres Freund wrote:
> On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
>
> My guess is that a checkpoint happened at that time. Maybe it'd be a 
> good idea to make pg_regress start postgres with log_checkpoints 
> enabled? My guess is that we'd find horrendous 'sync' times.
> 
> Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.

> 
>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time. This
>> makes me think that somebody simplified away some logic we maybe
>> should have kept. But if we're going to leave it like this, we
>> could replace the DBWriteRequest data structure with a simple OID
>> list and save a fair amount of code.
> 
> That's indeed odd. Seems to have been lost when the statsfile was
> split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible.
IIRC there were "pgstat wait timeout" issues before, and in most cases
the conclusion was that it's probably because of overloaded I/O.

But maybe there actually was another bug, and it's entirely possible
that the split introduced a new one, and that's what we're seeing now.
The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase of this kind of issues. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
and analyze the issue locally. But that won't happen until January.

> I wondered for a second whether the split could be responsible
> somehow, but there's reports of that in older backbranches as well:
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41


Tomas



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?

I think that hamster has basically got a tin can and string for an I/O
subsystem.  It's not real clear to me whether there's actually been an
increase in "wait timeout" failures recently; somebody would have to
go through and count them before I'd have much faith in that thesis.
However, I notice that at least the last few occurrences on "hamster"
all seem to have been in this parallel block:

test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address

which as recently as 9.4 contained just these tests:

test: privileges security_label collate matview lock replica_identity

I'm fairly sure that the index-related tests in this batch are I/O
intensive, and since they were not there at all six months ago, it's not
hard to believe that this block of tests has far greater I/O demands than
used to exist.  Draw your own conclusions ...
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tomas Vondra
Date:
On 25.12.2014 22:16, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> On 25.12.2014 20:36, Tom Lane wrote:
>>> BTW, I notice that in the current state of pgstat.c, all the logic
>>> for keeping track of request arrival times is dead code, because
>>> nothing is actually looking at DBWriteRequest.request_time.
> 
>> Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
>> updating the request_time after receiving the inquiry, and
>> pgstat_db_requested() is looking at it when writing the files.
> 
> Where is pgstat_db_requested() looking at request_time?

Oh, right. pgstat_db_requested() is not looking at the timestamp - it
only checks the OID of the database. But pgstat_recv_inquiry() is
checking it, comparing it to cutoff_time etc.

ISTM the main change related to this is that this:
   if (last_statwrite < last_statrequest)
       pgstat_write_statsfile(false);

got replaced by this:
   if (pgstat_write_statsfile_needed())
       pgstat_write_statsfiles(false, false);

where pgstat_write_statsfile_needed() only checks if the list is empty
(and ignores the request/write timestamps).

If I understand that correctly, this would lead to writing the files
more often, and we're dealing with a timeout here.
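
A per-database version of the old throttle might look roughly like this
(a sketch only; the field and variable names are illustrative):

    /* only rewrite a database's stats file if some request postdates the
     * last write we did for that database */
    if (pgstat_db_requested(dbentry->databaseid) &&
        dbentry->stats_timestamp < req->request_time)
        pgstat_write_db_statsfile(dbentry, permanent);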

>> If we can simplify the code by keeping just OIDs, let's do that. I think
>> the main reason why we haven't done that in 187492b6 was to keep as much
>> of the existing logic (and maybe change it in a separate patch).
> 
> The real point here is that I think that commit *already* changed
> the existing logic, because the time-of-receipt used to matter. In 
> particular, there used to be a throttle on how often the stats file 
> could get written, which seems to have vanished. I seriously doubt 
> that that was a good change, especially on
> write-bandwidth-challenged platforms.

Yes - if that change causes the files to be written more frequently,
it's not a good change.

But I think the time-of-receipt still matters - pgstat_recv_inquiry
logic remained the same, just applied per database. ISTM the only thing
that disappeared is the (last_statwrite < last_statrequest) check.

I have to think about this a bit more; I haven't looked at this code
since the split patch.

Tomas



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tomas Vondra
Date:
On 25.12.2014 22:40, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase of this kind of issues. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
> 
> I think that hamster has basically got a tin can and string for an I/O
> subsystem.  It's not real clear to me whether there's actually been an

Yes. It's called "SD card".

> increase in "wait timeout" failures recently; somebody would have to
> go through and count them before I'd have much faith in that thesis.

That's what I did. On hamster I see this (in the HEAD):

2014-12-25 16:00:07 yes
2014-12-24 16:00:07 yes
2014-12-23 16:00:07 yes
2014-12-22 16:00:07 yes
2014-12-19 16:00:07 yes
2014-12-15 16:00:11 no
2014-10-25 16:00:06 no
2014-10-24 16:00:06 no
2014-10-23 16:00:06 no
2014-10-22 16:00:06 no
2014-10-21 16:00:07 no
2014-10-19 16:00:06 no
2014-09-28 16:00:06 no
2014-09-26 16:00:07 no
2014-08-28 16:00:06 no
2014-08-12 16:00:06 no
2014-08-05 22:04:48 no
2014-07-19 01:53:30 no
2014-07-06 16:00:06 no
2014-07-04 16:00:06 no
2014-06-29 16:00:06 no
2014-05-09 16:00:04 no
2014-05-07 16:00:04 no
2014-05-04 16:00:04 no
2014-04-28 16:00:04 no
2014-04-18 16:00:04 no
2014-04-04 16:00:04 no

(where "yes" means "pgstat wait timeout" is in the logs). On chipmunk,
the trend is much less convincing (but there's much less failures, and
only 3 of them failed because of the "pgstat wait timeout").

However, it's worth mentioning that all the pgstat failures happened in
runs started at "16:00:07", while most of the older failures started
slightly earlier. So it may be that those runs were failing because of
something else, and the pgstat timeout never got a chance to happen.
OTOH, there's a fair number of successful runs too.


> However, I notice that at least the last few occurrences on "hamster"
> all seem to have been in this parallel block:
> 
> test: brin gin gist spgist privileges security_label collate matview
> lock replica_identity rowsecurity object_address
> 
> which as recently as 9.4 contained just these tests:
> 
> test: privileges security_label collate matview lock replica_identity
> 
> I'm fairly sure that the index-related tests in this batch are I/O 
> intensive, and since they were not there at all six months ago, it's
> not hard to believe that this block of tests has far greater I/O
> demands than used to exist. Draw your own conclusions ...

Yes, that might be the culprit here. It would be interesting to know
what's happening on the machine while the tests are running, to confirm
this hypothesis.

regards
Tomas



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> On 25.12.2014 22:40, Tom Lane wrote:
>> I think that hamster has basically got a tin can and string for an I/O
>> subsystem.  It's not real clear to me whether there's actually been an
>> increase in "wait timeout" failures recently; somebody would have to
>> go through and count them before I'd have much faith in that thesis.

> That's what I did. On hamster I see this (in the HEAD):

> 2014-12-25 16:00:07 yes
> 2014-12-24 16:00:07 yes
> 2014-12-23 16:00:07 yes
> 2014-12-22 16:00:07 yes
> 2014-12-19 16:00:07 yes
> 2014-12-15 16:00:11 no
> 2014-10-25 16:00:06 no
> 2014-10-24 16:00:06 no
> 2014-10-23 16:00:06 no
> 2014-10-22 16:00:06 no
> 2014-10-21 16:00:07 no
> 2014-10-19 16:00:06 no
> 2014-09-28 16:00:06 no
> 2014-09-26 16:00:07 no
> 2014-08-28 16:00:06 no
> 2014-08-12 16:00:06 no
> 2014-08-05 22:04:48 no
> 2014-07-19 01:53:30 no
> 2014-07-06 16:00:06 no
> 2014-07-04 16:00:06 no
> 2014-06-29 16:00:06 no
> 2014-05-09 16:00:04 no
> 2014-05-07 16:00:04 no
> 2014-05-04 16:00:04 no
> 2014-04-28 16:00:04 no
> 2014-04-18 16:00:04 no
> 2014-04-04 16:00:04 no

> (where "yes" means "pgstat wait timeout" is in the logs). On chipmunk,
> the trend is much less convincing (but there's much less failures, and
> only 3 of them failed because of the "pgstat wait timeout").

mereswine's history is also pretty interesting in this context.  That
series makes it look like the probability of "pgstat wait timeout" took
a big jump around the beginning of December, especially if you make the
unproven-but-not-unreasonable assumption that the two pg_upgradecheck
failures since then were also wait timeout failures.  That's close enough
after commit 88fc71926392115c (Nov 19) to make me suspect that that was
what put us over the edge: that added a bunch more I/O *and* a bunch more
statistics demands to this one block of parallel tests.

But even if we are vastly overstressing the I/O subsystem on these boxes,
why is it manifesting like this?  pgstat never fsyncs the stats temp file,
so it should not have to wait for physical I/O I'd think.  Or perhaps the
file rename() operations get fsync'd behind the scenes by the filesystem?
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tomas Vondra
Date:
On 26.12.2014 02:59, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> On 25.12.2014 22:40, Tom Lane wrote:
>>> I think that hamster has basically got a tin can and string for an I/O
>>> subsystem.  It's not real clear to me whether there's actually been an
>>> increase in "wait timeout" failures recently; somebody would have to
>>> go through and count them before I'd have much faith in that thesis.
> 
>> That's what I did. On hamster I see this (in the HEAD):
> 
>> 2014-12-25 16:00:07 yes
>> 2014-12-24 16:00:07 yes
>> 2014-12-23 16:00:07 yes
>> 2014-12-22 16:00:07 yes
>> 2014-12-19 16:00:07 yes
>> 2014-12-15 16:00:11 no
>> 2014-10-25 16:00:06 no
>> 2014-10-24 16:00:06 no
>> 2014-10-23 16:00:06 no
>> 2014-10-22 16:00:06 no
>> 2014-10-21 16:00:07 no
>> 2014-10-19 16:00:06 no
>> 2014-09-28 16:00:06 no
>> 2014-09-26 16:00:07 no
>> 2014-08-28 16:00:06 no
>> 2014-08-12 16:00:06 no
>> 2014-08-05 22:04:48 no
>> 2014-07-19 01:53:30 no
>> 2014-07-06 16:00:06 no
>> 2014-07-04 16:00:06 no
>> 2014-06-29 16:00:06 no
>> 2014-05-09 16:00:04 no
>> 2014-05-07 16:00:04 no
>> 2014-05-04 16:00:04 no
>> 2014-04-28 16:00:04 no
>> 2014-04-18 16:00:04 no
>> 2014-04-04 16:00:04 no
> 
>> (where "yes" means "pgstat wait timeout" is in the logs). On
>> chipmunk, the trend is much less convincing (but there's much less
>> failures, and only 3 of them failed because of the "pgstat wait
>> timeout").
> 
> mereswine's history is also pretty interesting in this context. That 
> series makes it look like the probability of "pgstat wait timeout"
> took a big jump around the beginning of December, especially if you
> make the unproven-but-not-unreasonable assumption that the two
> pg_upgradecheck failures since then were also wait timeout failures.
> That's close enough after commit 88fc71926392115c (Nov 19) to make me
> suspect that that was what put us over the edge: that added a bunch
> more I/O *and* a bunch more statistics demands to this one block of
> parallel tests.

Interesting. But even if this commit tipped us over the edge, that's not
proof that the split patch was perfectly correct.

> But even if we are vastly overstressing the I/O subsystem on these
> boxes, why is it manifesting like this? pgstat never fsyncs the stats
> temp file, so it should not have to wait for physical I/O I'd think.
> Or perhaps the file rename() operations get fsync'd behind the scenes
> by the filesystem?

My guess is that the amount of dirty data in the page cache reaches
dirty_ratio/dirty_bytes, effectively forcing the writes to go directly
to the disks. Those ARM machines have rather low amounts of RAM
(typically 256-512MB), and the default value for dirty_ratio is ~20%
IIRC. So that's ~50MB-100MB.

Tomas



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Michael Paquier
Date:
On Fri, Dec 26, 2014 at 6:28 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>> On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
>>
>> My guess is that a checkpoint happened at that time. Maybe it'd be a
>> good idea to make pg_regress start postgres with log_checkpoints
>> enabled? My guess is that we'd find horrendous 'sync' times.
>>
>> Michael: Could you perhaps turn log_checkpoints on in the config?
>
> Logging timestamps (using log_line_prefix) would also be helpful.
Done. I have been wondering about those failures as well but didn't
get the time to look around. FYI, hamster is running with an 8GB class
10 SD card able to do 40MB/s in read, a card that I changed 2 weeks ago
btw, as the former 4GB card was beginning to show I/O errors, RIP to it.
So this is what is triggering the wait timeouts more frequently... But I
have no real explanation why REL9_4_STABLE shows no signs of failures.

For the time being, what about putting pg_stat_tmp on a ram disk to
relieve the I/O? Whatever we come up with, I imagine that I/O will
still be tight on hamster.
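
If that seems worth trying, it is only a GUC change, e.g. (the mount point
is illustrative):

    # postgresql.conf: keep the stats temp files on a tmpfs mount
    stats_temp_directory = '/mnt/ramdisk/pg_stat_tmp'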
-- 
Michael



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Alvaro Herrera
Date:
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > So I think a better way to deal with that warning would be a good
> > idea. Besides somehow making the mechanism itself more robust, there
> > are two ways to attack this that I can think of, neither of them
> > awe-inspiring:
> 
> > 1) Make that WARNING a LOG message instead. Since those don't get sent
> > to the client with default settings...
> > 2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
> > it to.
> 
> Yeah, I've been getting more annoyed by that too lately.  I keep wondering
> though whether there's an actual bug underneath that behavior that we're
> failing to see.

I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
instead of PGSTAT_STAT_INTERVAL in autovacuum workers.  That decreases
the wait time 50-fold, if I recall this correctly, and causes large
amounts of extra I/O traffic.

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time.  This makes me think that
> somebody simplified away some logic we maybe should have kept.

I will have a look.  I remember being confused about this at some point
when reviewing that patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Yeah, I've been getting more annoyed by that too lately.  I keep wondering
>> though whether there's an actual bug underneath that behavior that we're
>> failing to see.

> I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
> instead of PGSTAT_STAT_INTERVAL in autovacuum workers.  That decreases
> the wait time 50-fold, if I recall this correctly, and causes large
> amounts of extra I/O traffic.

Yeah --- that means that instead of the normal behavior that a stats file
newer than 500 msec is good enough, an autovac worker insists on a stats
file newer than 10 msec.  I did some experimentation on prairiedog, and
found that it's not hard at all to see autovac workers demanding multiple
stats writes per second:

2014-12-26 16:26:52.958 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:53.128 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:53.188 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:54.903 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:55.058 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:00.022 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:00.285 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:00.792 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:01.010 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:01.163 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:01.193 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.595 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.673 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.839 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.878 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:05.878 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:06.571 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:07.001 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:07.769 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:07.950 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:10.256 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:11.039 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:11.402 EST 21026 LOG:  sending inquiry for database 45116

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with.  Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Alvaro Herrera
Date:
Tom Lane wrote:

> The argument that autovac workers need fresher stats than anything else
> seems pretty dubious to start with.  Why shouldn't we simplify that down
> to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a
worker vacuuming a table that some other worker vacuumed more recently
than PGSTAT_STAT_INTERVAL.  I realize now that the semantics we really
want are something like "stats no older than XYZ", where the given value
is the timestamp at which we start checking; anything newer than that
would be okay, but we currently reject it for lack of a more appropriate
API.  (If it takes more than PGSTAT_STAT_INTERVAL
to get the stats back, a regular backend would ask for fresher stats,
but to an autovac worker they would be good enough as long as they are
newer than its recheck start time.)
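
A hypothetical API shape for that (the names are invented purely for
illustration):

    /* accept any stats snapshot written at or after the caller's cutoff,
     * rather than "no older than N msec before now" */
    static void backend_read_statsfile_not_older_than(TimestampTz cutoff);

    /* an autovac worker would pass its recheck start time as the cutoff */
    backend_read_statsfile_not_older_than(recheck_start_time);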

Nowadays we can probably disregard the whole issue, since starting a new
vacuum just after the prior one finished should not cause much stress to
the system thanks to the visibility map.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Heikki Linnakangas
Date:
On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
> Tom Lane wrote:
>
>> The argument that autovac workers need fresher stats than anything else
>> seems pretty dubious to start with.  Why shouldn't we simplify that down
>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?
>
> The point of wanting fresher stats than that, eons ago, was to avoid a
> worker vacuuming a table that some other worker vacuumed more recently
> than PGSTAT_STAT_INTERVAL.  I realize now that the semantics we really
> want was something like "stats no older than XYZ" where the given value
> is the timestamp at which we start checking; if we get anything newer
> than that it would be okay, but we currently reject it because of lack
> of a more appropriate API.  (If it takes more than PGSTAT_STAT_INTERVAL
> to get the stats back, a regular backend would ask for fresher stats,
> but to an autovac worker they would be good enough as long as they are
> newer than its recheck start time.)
>
> Nowadays we can probably disregard the whole issue, since starting a new
> vacuum just after the prior one finished should not cause much stress to
> the system thanks to the visibility map.

Vacuuming is far from free, even if the visibility map says that most 
pages are visible to all: you still scan all indexes, if you remove any 
dead tuples at all.

- Heikki




Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> The argument that autovac workers need fresher stats than anything else
>>> seems pretty dubious to start with.  Why shouldn't we simplify that down
>>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?

>> The point of wanting fresher stats than that, eons ago, was to avoid a
>> worker vacuuming a table that some other worker vacuumed more recently
>> than PGSTAT_STAT_INTERVAL. ...
>> Nowadays we can probably disregard the whole issue, since starting a new
>> vacuum just after the prior one finished should not cause much stress to
>> the system thanks to the visibility map.

> Vacuuming is far from free, even if the visibility map says that most 
> pages are visible to all: you still scan all indexes, if you remove any 
> dead tuples at all.

With typical autovacuum settings, I kinda doubt that there's much value in
reducing the window for this problem from 500ms to 10ms.  As Alvaro says,
this was just a partial, kluge solution from the start --- if we're
worried about such duplicate vacuuming, we should undertake a real
solution that closes the window altogether.  In any case, timeouts
occurring inside autovacuum are not directly causing the buildfarm
failures, since autovacuum's log entries don't reflect into regression
outputs.  (It's possible that autovacuum's tight tolerance is contributing
to the failures by increasing the load on the stats collector, but I'm
not sure I believe that.)

To get back to that original complaint about buildfarm runs failing,
I notice that essentially all of those failures are coming from "wait
timeout" warnings reported by manual VACUUM commands.  Now, VACUUM itself
has no need to read the stats files.  What's actually causing these
messages is failure to get a timely response in pgstat_vacuum_stat().
So let me propose a drastic solution: let's dike out this bit in vacuum.c:
    /*
     * Send info about dead objects to the statistics collector, unless we are
     * in autovacuum --- autovacuum.c does this for itself.
     */
    if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
        pgstat_vacuum_stat();

This would have the effect of transferring all responsibility for
dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
that'd be just fine.  It might be less fine though for people who
disable autovacuum, if there still are any.  To analyze the effects
on them, let's break down what pgstat_vacuum_stat() actually does:

1. It causes drops of whole databases from pgstat's tables.  I claim
we don't need to do that at this level.  There's no issue unless
the stats collector missed the PgStat_MsgDropdb message when the database
was dropped; and even if it did, the negative consequences of having
a dead DB's stats still lying around are pretty minimal since we split
up the stats files.  There will be no extra I/O except for one small
record in the global stats file.  So I think we can well afford to say
that with autovac off, such missed databases only get cleaned up the
next time an autovac is forced for antiwraparound.

2. Within the current database, it causes drops of pgstat entries for
dropped tables.  This would be a tad annoying if you have lots of
transient tables and no or minimal autovacuuming.  However, lots of
transient tables can be a pain point anyway, because we don't currently
have any mechanism other than pgstat_vacuum_stat() for cleaning up
per-table stats for dropped tables.  It seems like it might be time to do
something about that.  We could probably extend the PgStat_TableXactStatus
infrastructure to keep track of whether a table was created or deleted in
the current transaction, and solve the problem without very much new code.
Or maybe we could move the responsibility for reporting stale entries
into the code that reads the stats files for the stats views.

3. Within the current database, it causes drops of pgstat entries for
dropped functions, if you're tracking per-function execution stats.
Since that's not the default, maybe we could get away with saying that
we don't clean up such dead entries very often unless you're running
autovacuum.  I don't really want to propose building the infrastructure
to support dropping such entries retail.

Or, much more simply, we could conclude that it's not that important
if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
the code so that we don't bleat when the file-reading request comes from
that source.  This should be a back-patchable fix, whereas #2 above might
be a bit too complicated for that.

Thoughts?
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Noah Misch
Date:
On Thu, Dec 25, 2014 at 09:14:36PM +0100, Andres Freund wrote:
> My guess is that a checkpoint happened at that time. Maybe it'd be a
> good idea to make pg_regress start postgres with log_checkpoints
> enabled? My guess is that we'd find horrendous 'sync' times.

+1, independent of $SUBJECT.  How about log_autovacuum_min_duration=0, too?



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Robert Haas
Date:
On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> The argument that autovac workers need fresher stats than anything else
>>>> seems pretty dubious to start with.  Why shouldn't we simplify that down
>>>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?
>
>>> The point of wanting fresher stats than that, eons ago, was to avoid a
>>> worker vacuuming a table that some other worker vacuumed more recently
>>> than PGSTAT_STAT_INTERVAL. ...
>>> Nowadays we can probably disregard the whole issue, since starting a new
>>> vacuum just after the prior one finished should not cause much stress to
>>> the system thanks to the visibility map.
>
>> Vacuuming is far from free, even if the visibility map says that most
>> pages are visible to all: you still scan all indexes, if you remove any
>> dead tuples at all.
>
> With typical autovacuum settings, I kinda doubt that there's much value in
> reducing the window for this problem from 500ms to 10ms.  As Alvaro says,
> this was just a partial, kluge solution from the start --- if we're
> worried about such duplicate vacuuming, we should undertake a real
> solution that closes the window altogether.  In any case, timeouts
> occurring inside autovacuum are not directly causing the buildfarm
> failures, since autovacuum's log entries don't reflect into regression
> outputs.  (It's possible that autovacuum's tight tolerance is contributing
> to the failures by increasing the load on the stats collector, but I'm
> not sure I believe that.)
>
> To get back to that original complaint about buildfarm runs failing,
> I notice that essentially all of those failures are coming from "wait
> timeout" warnings reported by manual VACUUM commands.  Now, VACUUM itself
> has no need to read the stats files.  What's actually causing these
> messages is failure to get a timely response in pgstat_vacuum_stat().
> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>
>     /*
>      * Send info about dead objects to the statistics collector, unless we are
>      * in autovacuum --- autovacuum.c does this for itself.
>      */
>     if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>         pgstat_vacuum_stat();
>
> This would have the effect of transferring all responsibility for
> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
> that'd be just fine.  It might be less fine though for people who
> disable autovacuum, if there still are any.

-1.  I don't think it's a good idea to inflict pain on people who want
to schedule their vacuums manually (and yes, there are some) to get
clean buildfarm runs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This would have the effect of transferring all responsibility for
>> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
>> that'd be just fine.  It might be less fine though for people who
>> disable autovacuum, if there still are any.

> -1.  I don't think it's a good idea to inflict pain on people who want
> to schedule their vacuums manually (and yes, there are some) to get
> clean buildfarm runs.

Did you read the rest of it?
        regards, tom lane



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Robert Haas
Date:
On Sat, Dec 27, 2014 at 8:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This would have the effect of transferring all responsibility for
>>> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
>>> that'd be just fine.  It might be less fine though for people who
>>> disable autovacuum, if there still are any.
>
>> -1.  I don't think it's a good idea to inflict pain on people who want
>> to schedule their vacuums manually (and yes, there are some) to get
>> clean buildfarm runs.
>
> Did you read the rest of it?

Yeah...  since when do I not read your emails?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Noah Misch
Date:
On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
> To get back to that original complaint about buildfarm runs failing,
> I notice that essentially all of those failures are coming from "wait
> timeout" warnings reported by manual VACUUM commands.  Now, VACUUM itself
> has no need to read the stats files.  What's actually causing these
> messages is failure to get a timely response in pgstat_vacuum_stat().
> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
> 
>     /*
>      * Send info about dead objects to the statistics collector, unless we are
>      * in autovacuum --- autovacuum.c does this for itself.
>      */
>     if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>         pgstat_vacuum_stat();
> 
> This would have the effect of transferring all responsibility for
> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
> that'd be just fine.  It might be less fine though for people who
> disable autovacuum, if there still are any.

Like Robert, I don't care for that.

> Or, much more simply, we could conclude that it's not that important
> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
> the code so that we don't bleat when the file-reading request comes from
> that source.  This should be a back-patchable fix, whereas #2 above might
> be a bit too complicated for that.

This, however, feels like a clear improvement.  When every VACUUM does
pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
entries that became obsolete in the preceding second or so.  In fact, rather
than merely not bleating when the wait times out, let's not wait at all.  I
don't favor VACUUM of a small table taking 12s because it spent 2s on the
table and 10s waiting to garbage collect a piping-hot stats file.



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
>> To get back to that original complaint about buildfarm runs failing,
>> I notice that essentially all of those failures are coming from "wait
>> timeout" warnings reported by manual VACUUM commands.  Now, VACUUM itself
>> has no need to read the stats files.  What's actually causing these
>> messages is failure to get a timely response in pgstat_vacuum_stat().
>> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>>
>> /*
>> * Send info about dead objects to the statistics collector, unless we are
>> * in autovacuum --- autovacuum.c does this for itself.
>> */
>> if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>>     pgstat_vacuum_stat();
>>
>> This would have the effect of transferring all responsibility for
>> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
>> that'd be just fine.  It might be less fine though for people who
>> disable autovacuum, if there still are any.

> Like Robert, I don't care for that.

I obviously failed to make it clear enough that that was meant as a
straw-man, let's-think-outside-the-box proposal.

>> Or, much more simply, we could conclude that it's not that important
>> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
>> the code so that we don't bleat when the file-reading request comes from
>> that source.  This should be a back-patchable fix, whereas #2 above might
>> be a bit too complicated for that.

> This, however, feels like a clear improvement.  When every VACUUM does
> pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
> entries that became obsolete in the preceding second or so.  In fact, rather
> than merely not bleating when the wait times out, let's not wait at all.  I
> don't favor VACUUM of a small table taking 12s because it spent 2s on the
> table and 10s waiting to garbage collect a piping-hot stats file.

I think that would be going too far: if an autovac wakes up in the dead of
night, it should not use an ancient stats file merely because no one else
has asked for stats recently.  But if it never waits at all, it'll be
at the mercy of whatever is already on disk.

After looking at the code, the minimum-change alternative would be more or
less as attached: first, get rid of the long-obsolete proposition that
autovacuum workers need fresher-than-usual stats; second, allow
pgstat_vacuum_stat to accept stats that are moderately stale (the number
given below allows them to be up to 50 seconds old); and third, suppress
wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
point is what we need to avoid unnecessary buildfarm failures.  The second
point addresses the idea that we don't need to stress the stats collector
too much for this.

            regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0cf4988..b030e1c 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** static void pgstat_write_statsfiles(bool
*** 259,265 ****
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);

  static bool pgstat_write_statsfile_needed(void);
--- 259,265 ----
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(bool for_vacuum_stat);
  static void pgstat_read_current_status(void);

  static bool pgstat_write_statsfile_needed(void);
*************** pgstat_vacuum_stat(void)
*** 951,959 ****

      /*
       * If not done for this transaction, read the statistics collector stats
!      * file into some hash tables.
       */
!     backend_read_statsfile();

      /*
       * Read pg_database and make a list of OIDs of all existing databases
--- 951,962 ----

      /*
       * If not done for this transaction, read the statistics collector stats
!      * file into some hash tables.  We are willing to accept stats that are
!      * more stale than usual here.  (Since VACUUM never runs in a transaction
!      * block, this cannot result in accepting stale stats that would be used
!      * for other purposes later in the transaction.)
       */
!     backend_read_statsfile(true);

      /*
       * Read pg_database and make a list of OIDs of all existing databases
*************** pgstat_fetch_stat_dbentry(Oid dbid)
*** 2182,2188 ****
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile();

      /*
       * Lookup the requested database; return NULL if not found
--- 2185,2191 ----
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile(false);

      /*
       * Lookup the requested database; return NULL if not found
*************** pgstat_fetch_stat_tabentry(Oid relid)
*** 2213,2219 ****
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile();

      /*
       * Lookup our database, then look in its table hash table.
--- 2216,2222 ----
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile(false);

      /*
       * Lookup our database, then look in its table hash table.
*************** pgstat_fetch_stat_funcentry(Oid func_id)
*** 2265,2271 ****
      PgStat_StatFuncEntry *funcentry = NULL;

      /* load the stats file if needed */
!     backend_read_statsfile();

      /* Lookup our database, then find the requested function.  */
      dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
--- 2268,2274 ----
      PgStat_StatFuncEntry *funcentry = NULL;

      /* load the stats file if needed */
!     backend_read_statsfile(false);

      /* Lookup our database, then find the requested function.  */
      dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
*************** pgstat_fetch_stat_numbackends(void)
*** 2350,2356 ****
  PgStat_ArchiverStats *
  pgstat_fetch_stat_archiver(void)
  {
!     backend_read_statsfile();

      return &archiverStats;
  }
--- 2353,2359 ----
  PgStat_ArchiverStats *
  pgstat_fetch_stat_archiver(void)
  {
!     backend_read_statsfile(false);

      return &archiverStats;
  }
*************** pgstat_fetch_stat_archiver(void)
*** 2367,2373 ****
  PgStat_GlobalStats *
  pgstat_fetch_global(void)
  {
!     backend_read_statsfile();

      return &globalStats;
  }
--- 2370,2376 ----
  PgStat_GlobalStats *
  pgstat_fetch_global(void)
  {
!     backend_read_statsfile(false);

      return &globalStats;
  }
*************** done:
*** 4341,4349 ****
   * If not already done, read the statistics collector stats file into
   * some hash tables.  The results will be kept until pgstat_clear_snapshot()
   * is called (typically, at end of transaction).
   */
  static void
! backend_read_statsfile(void)
  {
      TimestampTz min_ts = 0;
      TimestampTz ref_ts = 0;
--- 4344,4357 ----
   * If not already done, read the statistics collector stats file into
   * some hash tables.  The results will be kept until pgstat_clear_snapshot()
   * is called (typically, at end of transaction).
+  *
+  * If for_vacuum_stat is true, we're doing this for pgstat_vacuum_stat,
+  * which does not really need up-to-date stats: it's only trying to flush
+  * obsolete stats entries.  So we accept older stats than we normally would,
+  * and don't bleat about failure to get fresh stats either.
   */
  static void
! backend_read_statsfile(bool for_vacuum_stat)
  {
      TimestampTz min_ts = 0;
      TimestampTz ref_ts = 0;
*************** backend_read_statsfile(void)
*** 4376,4385 ****
              /*
               * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
               * msec before now.  This indirectly ensures that the collector
!              * needn't write the file more often than PGSTAT_STAT_INTERVAL. In
!              * an autovacuum worker, however, we want a lower delay to avoid
!              * using stale data, so we use PGSTAT_RETRY_DELAY (since the
!              * number of workers is low, this shouldn't be a problem).
               *
               * We don't recompute min_ts after sleeping, except in the
               * unlikely case that cur_ts went backwards.  So we might end up
--- 4384,4392 ----
              /*
               * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
               * msec before now.  This indirectly ensures that the collector
!              * needn't write the file more often than PGSTAT_STAT_INTERVAL.
!              * For vacuum_stat purposes, though, up to 100 times the normal
!              * age is considered acceptable.
               *
               * We don't recompute min_ts after sleeping, except in the
               * unlikely case that cur_ts went backwards.  So we might end up
*************** backend_read_statsfile(void)
*** 4390,4398 ****
               * actually accept.
               */
              ref_ts = cur_ts;
!             if (IsAutoVacuumWorkerProcess())
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
!                                                      -PGSTAT_RETRY_DELAY);
              else
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
                                                       -PGSTAT_STAT_INTERVAL);
--- 4397,4405 ----
               * actually accept.
               */
              ref_ts = cur_ts;
!             if (for_vacuum_stat)
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
!                                                 -100 * PGSTAT_STAT_INTERVAL);
              else
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
                                                       -PGSTAT_STAT_INTERVAL);
*************** backend_read_statsfile(void)
*** 4441,4447 ****
          pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
      }

!     if (count >= PGSTAT_POLL_LOOP_COUNT)
          elog(WARNING, "pgstat wait timeout");

      /*
--- 4448,4454 ----
          pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
      }

!     if (count >= PGSTAT_POLL_LOOP_COUNT && !for_vacuum_stat)
          elog(WARNING, "pgstat wait timeout");

      /*

Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Noah Misch
Date:
On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
> >> Or, much more simply, we could conclude that it's not that important
> >> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
> >> the code so that we don't bleat when the file-reading request comes from
> >> that source.  This should be a back-patchable fix, whereas #2 above might
> >> be a bit too complicated for that.
> 
> > This, however, feels like a clear improvement.  When every VACUUM does
> > pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
> > entries that became obsolete in the preceding second or so.  In fact, rather
> > than merely not bleating when the wait times out, let's not wait at all.  I
> > don't favor VACUUM of a small table taking 12s because it spent 2s on the
> > table and 10s waiting to garbage collect a piping-hot stats file.
> 
> I think that would be going too far: if an autovac wakes up in the dead of
> night, it should not use an ancient stats file merely because no one else
> has asked for stats recently.  But if it never waits at all, it'll be
> at the mercy of whatever is already on disk.

Whoops; I underestimated the upper bound on time between stats file writes.

> After looking at the code, the minimum-change alternative would be more or
> less as attached: first, get rid of the long-obsolete proposition that
> autovacuum workers need fresher-than-usual stats; second, allow
> pgstat_vacuum_stat to accept stats that are moderately stale (the number
> given below allows them to be up to 50 seconds old); and third, suppress
> wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
> point is what we need to avoid unnecessary buildfarm failures.  The second
> point addresses the idea that we don't need to stress the stats collector
> too much for this.

Only #3 belongs in a minimum-change patch.  #1 and #2 solve and/or create
different problems and operate independently of #3.  A patch covering #3 alone
sounds attractive.



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From:
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote:
>> After looking at the code, the minimum-change alternative would be more or
>> less as attached: first, get rid of the long-obsolete proposition that
>> autovacuum workers need fresher-than-usual stats; second, allow
>> pgstat_vacuum_stat to accept stats that are moderately stale (the number
>> given below allows them to be up to 50 seconds old); and third, suppress
>> wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
>> point is what we need to avoid unnecessary buildfarm failures.  The second
>> point addresses the idea that we don't need to stress the stats collector
>> too much for this.

> Only #3 belongs in a minimum-change patch.  #1 and #2 solve and/or create
> different problems and operate independently of #3.  A patch covering #3 alone
> sounds attractive.

[ raised eyebrow... ] In your previous message, you were advocating *for*
a change that was more invasive than this one.  Why the change of heart?

In particular, I thought we'd already agreed to point #1.
        regards, tom lane



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Noah Misch
Дата:
On Sun, Jan 18, 2015 at 07:02:54PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sun, Jan 18, 2015 at 04:09:29PM -0500, Tom Lane wrote:
> >> After looking at the code, the minimum-change alternative would be more or
> >> less as attached: first, get rid of the long-obsolete proposition that
> >> autovacuum workers need fresher-than-usual stats; second, allow
> >> pgstat_vacuum_stat to accept stats that are moderately stale (the number
> >> given below allows them to be up to 50 seconds old); and third, suppress
> >> wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
> >> point is what we need to avoid unnecessary buildfarm failures.  The second
> >> point addresses the idea that we don't need to stress the stats collector
> >> too much for this.
> 
> > Only #3 belongs in a minimum-change patch.  #1 and #2 solve and/or create
> > different problems and operate independently of #3.  A patch covering #3 alone
> > sounds attractive.
> 
> [ raised eyebrow... ] In your previous message, you were advocating *for*
> a change that was more invasive than this one.  Why the change of heart?

I phrased that proposal based on a wrong belief that the collector writes the
stats file regularly in any case.  Vacuuming a 49s-old stats file without even
trying to get a fresh one might or might not improve PostgreSQL, but it's
dissimilar to what I had in mind.  I did not advocate for #1 at any point.

> In particular, I thought we'd already agreed to point #1.

Nope.  You and Alvaro ACKed it, then Heikki NACKed it.



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Robert Haas
Дата:
On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After looking at the code, the minimum-change alternative would be more or
> less as attached: first, get rid of the long-obsolete proposition that
> autovacuum workers need fresher-than-usual stats; second, allow
> pgstat_vacuum_stat to accept stats that are moderately stale (the number
> given below allows them to be up to 50 seconds old); and third, suppress
> wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
> point is what we need to avoid unnecessary buildfarm failures.  The second
> point addresses the idea that we don't need to stress the stats collector
> too much for this.

I think this is too much of a good thing.  I don't see any reason why
autovacuum's statistics need to be fresher than normal, but I also
don't see any reason why they need to be less fresh.  I think
suppressing the warning is a good idea, but why only suppress it for
autovacuum?  How about just knocking the level down to, say, DEBUG1?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Andres Freund
Дата:
On 2015-01-18 21:33:25 -0500, Robert Haas wrote:
> On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > After looking at the code, the minimum-change alternative would be more or
> > less as attached: first, get rid of the long-obsolete proposition that
> > autovacuum workers need fresher-than-usual stats; second, allow
> > pgstat_vacuum_stat to accept stats that are moderately stale (the number
> > given below allows them to be up to 50 seconds old); and third, suppress
> > wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
> > point is what we need to avoid unnecessary buildfarm failures.  The second
> > point addresses the idea that we don't need to stress the stats collector
> > too much for this.
> 
> I think this is too much of a good thing.  I don't see any reason why
> autovacuum's statistics need to be fresher than normal, but I also
> don't see any reason why they need to be less fresh.  I think
> suppressing the warning is a good idea, but why only suppress it for
> autovacuum?  How about just knocking the level down to, say, DEBUG1?

+1 for just using LOG - which by default does not end up on client
machines. In contrast to WARNING.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Robert Haas
Дата:
On Mon, Jan 19, 2015 at 7:09 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-18 21:33:25 -0500, Robert Haas wrote:
>> On Sun, Jan 18, 2015 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > After looking at the code, the minimum-change alternative would be more or
>> > less as attached: first, get rid of the long-obsolete proposition that
>> > autovacuum workers need fresher-than-usual stats; second, allow
>> > pgstat_vacuum_stat to accept stats that are moderately stale (the number
>> > given below allows them to be up to 50 seconds old); and third, suppress
>> > wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
>> > point is what we need to avoid unnecessary buildfarm failures.  The second
>> > point addresses the idea that we don't need to stress the stats collector
>> > too much for this.
>>
>> I think this is too much of a good thing.  I don't see any reason why
>> autovacuum's statistics need to be fresher than normal, but I also
>> don't see any reason why they need to be less fresh.  I think
>> suppressing the warning is a good idea, but why only suppress it for
>> autovacuum?  How about just knocking the level down to, say, DEBUG1?
>
> +1 for just using LOG - which by default does not end up on client
> machines. In contrast to WARNING.

Yeah, that doesn't seem like a bad idea, either.  The message seems
much more likely to be of interest to the DBA than the user; the DBA
can use the message to diagnose an overloaded I/O subsystem (which I
think is the usual cause of this problem) whereas the only point of
likely interest to the user is that their query ran slowly (which they
know without the message).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 19, 2015 at 7:09 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2015-01-18 21:33:25 -0500, Robert Haas wrote:
>>> I think this is too much of a good thing.  I don't see any reason why
>>> autovacuum's statistics need to be fresher than normal, but I also
>>> don't see any reason why they need to be less fresh.  I think
>>> suppressing the warning is a good idea, but why only suppress it for
>>> autovacuum?  How about just knocking the level down to, say, DEBUG1?

>> +1 for just using LOG - which by default does not end up on client
>> machines. In contrast to WARNING.

> Yeah, that doesn't seem like a bad idea, either.  The message seems
> much more likely to be of interest to the DBA than the user; the DBA
> can use the message to diagnose an overloaded I/O subsystem (which I
> think is the usual cause of this problem) whereas the only point of
> likely interest to the user is that their query ran slowly (which they
> know without the message).

If that were true I'd agree with you, but it's false on its face.
A user who is actually examining the statistics might well want to
know if they're significantly out of date.  A very relevant example
is that I've always wondered how come, when we see buildfarm failures
in the "stats" regression test, they always appear in the form of
output differences that indicate that the session did not see the
expected stats update --- but there's never a timeout warning printed,
which indicates that whatever the cause is, it ain't that.

I'd be fine with changing the warning to LOG level rather than
suppressing it entirely for the specific case of pgstat_vacuum_stat;
but I do want to distinguish that case, wherein it's fair to conclude
that obsolete stats aren't too worrisome, from other cases where no
such conclusion is justified.
        regards, tom lane



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Robert Haas
Дата:
On Mon, Jan 19, 2015 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If that were true I'd agree with you, but it's false on its face.
> A user who is actually examining the statistics might well want to
> know if they're significantly out of date.  A very relevant example
> is that I've always wondered how come, when we see buildfarm failures
> in the "stats" regression test, they always appear in the form of
> output differences that indicate that the session did not see the
> expected stats update --- but there's never a timeout warning printed,
> which indicates that whatever the cause is, it ain't that.

Sure, but nobody who is not a developer is going to care about that.
A typical user who sees "pgstat wait timeout", or doesn't, isn't going
to be able to make anything at all out of that.

> I'd be fine with changing the warning to LOG level rather than
> suppressing it entirely for the specific case of pgstat_vacuum_stat;
> but I do want to distinguish that case, wherein it's fair to conclude
> that obsolete stats aren't too worrisome, from other cases where no
> such conclusion is justified.

But I can live with this compromise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Sure, but nobody who is not a developer is going to care about that.
> A typical user who sees "pgstat wait timeout", or doesn't, isn't going
> to be able to make anything at all out of that.

Possibly we need to improve the wording of that error message then.
When it was written, we really assumed that it was a can't-happen case
and so didn't spend much effort on it.  Perhaps it should become a
translatable ereport phrased like "WARNING: using stale statistics
instead of current ones because stats collector is not responding".

(I'm also wondering if it'd make sense to expose the stats timestamp
as a callable function, so that the case could be dealt with
programmatically as well.  But that's future-feature territory.)
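
No such function existed when this was written, but purely as an illustration, a loadable C function along those lines might look like the sketch below.  The function name and the reliance on pgstat_fetch_global()->stats_timestamp are assumptions about the pgstat internals of this era, not an existing API.

#include "postgres.h"
#include "fmgr.h"
#include "pgstat.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pg_stat_snapshot_timestamp);

/*
 * Return the timestamp of the statistics snapshot this backend is using,
 * so monitoring code can decide programmatically whether it is stale.
 */
Datum
pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
{
    /* pgstat_fetch_global() reads the stats file if not already cached. */
    PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
}

A monitoring tool could then compare that value against the current time before deciding how much to trust the counters it just read.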

>> I'd be fine with changing the warning to LOG level rather than
>> suppressing it entirely for the specific case of pgstat_vacuum_stat;
>> but I do want to distinguish that case, wherein it's fair to conclude
>> that obsolete stats aren't too worrisome, from other cases where no
>> such conclusion is justified.

> But I can live with this compromise.

Is that OK with everybody?  Going once, going twice ...
        regards, tom lane



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Andres Freund
Дата:
On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> >> I'd be fine with changing the warning to LOG level rather than
> >> suppressing it entirely for the specific case of pgstat_vacuum_stat;
> >> but I do want to distinguish that case, wherein it's fair to conclude
> >> that obsolete stats aren't too worrisome, from other cases where no
> >> such conclusion is justified.
> 
> > But I can live with this compromise.
> 
> Is that OK with everybody?  Going once, going twice ...

I can live with the compromise as well, but I'd rather change it to
always be of LOG priority. LOG is more likely to end up in the server log,
and that's where it's actually likely to be noticed.  In most cases
WARNINGs going to the client won't be noticed, since this error is much
more likely to occur on servers under high load from automated programs
than during interactive use.

> > Sure, but nobody who is not a developer is going to care about that.
> > A typical user who sees "pgstat wait timeout", or doesn't, isn't going
> > to be able to make anything at all out of that.
> 
> Possibly we need to improve the wording of that error message then.
> When it was written, we really assumed that it was a can't-happen case
> and so didn't spend much effort on it.  Perhaps it should become a
> translatable ereport phrased like "WARNING: using stale statistics
> instead of current ones because stats collector is not responding".

Yes, that seems like a good message improvement.

It's not like making it LOG makes the message invisible...

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Robert Haas
Дата:
On Mon, Jan 19, 2015 at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Sure, but nobody who is not a developer is going to care about that.
>> A typical user who sees "pgstat wait timeout", or doesn't, isn't going
>> to be able to make anything at all out of that.
>
> Possibly we need to improve the wording of that error message then.
> When it was written, we really assumed that it was a can't-happen case
> and so didn't spend much effort on it.  Perhaps it should become a
> translatable ereport phrased like "WARNING: using stale statistics
> instead of current ones because stats collector is not responding".

I'm still not completely convinced it deserves to be a WARNING, but I
definitely think turning it into a translatable error message is the
right call.  Calling this a "can't happen" case is clearly ridiculous
at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
>> Possibly we need to improve the wording of that error message then.
>> When it was written, we really assumed that it was a can't-happen case
>> and so didn't spend much effort on it.  Perhaps it should become a
>> translatable ereport phrased like "WARNING: using stale statistics
>> instead of current ones because stats collector is not responding".

> Yes, that seems like a good message improvement.

> It's not like making it LOG makes the message invisible...

Uh, yes it does.  So far as the user is concerned anyway.  The entire
point of this discussion is to prevent the message from showing up in
buildfarm output, remember?
        regards, tom lane



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Andres Freund
Дата:
On 2015-01-19 11:28:41 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
> >> Possibly we need to improve the wording of that error message then.
> >> When it was written, we really assumed that it was a can't-happen case
> >> and so didn't spend much effort on it.  Perhaps it should become a
> >> translatable ereport phrased like "WARNING: using stale statistics
> >> instead of current ones because stats collector is not responding".
> 
> > Yes, that seems like a good message improvement.
> 
> > It's not like making it LOG makes the message invisible...
> 
> Uh, yes it does.  So far as the user is concerned anyway.

Sure, but the log isn't invisible. As mentioned one paragraph above, I
don't think it's likely to ever be noticed in the client code in the
cases where it happens in production.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Robert Haas
Дата:
On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-19 11:28:41 -0500, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > On 2015-01-19 11:16:09 -0500, Tom Lane wrote:
>> >> Possibly we need to improve the wording of that error message then.
>> >> When it was written, we really assumed that it was a can't-happen case
>> >> and so didn't spend much effort on it.  Perhaps it should become a
>> >> translatable ereport phrased like "WARNING: using stale statistics
>> >> instead of current ones because stats collector is not responding".
>>
>> > Yes, that seems like a good message improvement.
>>
>> > It's not like making it LOG makes the message invisible...
>>
>> Uh, yes it does.  So far as the user is concerned anyway.
>
> Sure, but the log isn't invisible. As mentioned one paragraph above, I
> don't think it's likely to ever be noticed in the client code in the
> cases where it happens in production.

Yes, that is my feeling as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Sure, but the log isn't invisible. As mentioned one paragraph above, I
>> don't think it's likely to ever be noticed in the client code in the
>> cases where it happens in production.

> Yes, that is my feeling as well.

Meh.  I still don't like it, but I guess I'm outvoted.  Unless there are
further votes, what we have at this point is just:

-        elog(WARNING, "pgstat wait timeout");
+        ereport(LOG, (errmsg("using stale statistics instead of current ones because stats collector is not responding")));

with no conditionality for pgstat_vacuum_stat vs. other callers.
        regards, tom lane



Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Noah Misch
Дата:
On Mon, Jan 19, 2015 at 12:05:01PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jan 19, 2015 at 11:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Sure, but the log isn't invisible. As mentioned one paragraph above, I
> >> don't think it's likely to ever be noticed in the client code in the
> >> cases where it happens in production.
> 
> > Yes, that is my feeling as well.
> 
> Meh.  I still don't like it, but I guess I'm outvoted.  Unless there are
> further votes, what we have at this point is just:
> 
> -        elog(WARNING, "pgstat wait timeout");
> +        ereport(LOG, (errmsg("using stale statistics instead of current ones because stats collector is not responding")));
> 
> with no conditionality for pgstat_vacuum_stat vs. other callers.

That is satisfactory to me, too.



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Tomas Vondra
Дата:
On 25.12.2014 22:28, Tomas Vondra wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>
>> That's indeed odd. Seems to have been lost when the statsfile was
>> split into multiple files. Alvaro, Tomas?
> 
> The goal was to keep the logic as close to the original as possible.
> IIRC there were "pgstat wait timeout" issues before, and in most cases
> the conclusion was that it's probably because of overloaded I/O.
> 
> But maybe there actually was another bug, and it's entirely possible
> that the split introduced a new one, and that's what we're seeing now.
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?
> 
> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
> and analyze the issue locally. But that won't happen until January.

I've tried to reproduce this on my Raspberry PI 'machine' and it's not
very difficult to trigger this. About 7 out of 10 'make check' runs fail
because of 'pgstat wait timeout'.

All the occurrences I've seen were right after some sort of VACUUM
(sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
looked something like this:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
mmcblk0           0.00    75.00    0.00    8.00     0.00    36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle
4 MB/s). If hamster had a faulty SD card, it might have been much worse,
I guess.

This of course does not prove the absence of a bug - I plan to dig into
this a bit more. Feel free to point out some suspicious scenarios that
might be worth reproducing and analyzing.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Michael Paquier
Дата:
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 25.12.2014 22:28, Tomas Vondra wrote:
>> On 25.12.2014 21:14, Andres Freund wrote:
>>
>>> That's indeed odd. Seems to have been lost when the statsfile was
>>> split into multiple files. Alvaro, Tomas?
>>
>> The goal was to keep the logic as close to the original as possible.
>> IIRC there were "pgstat wait timeout" issues before, and in most cases
>> the conclusion was that it's probably because of overloaded I/O.
>>
>> But maybe there actually was another bug, and it's entirely possible
>> that the split introduced a new one, and that's what we're seeing now.
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase of this kind of issues. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>>
>> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
>> and analyze the issue locally. But that won't happen until January.
>
> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
> very difficult to trigger this. About 7 out of 10 'make check' runs fail
> because of 'pgstat wait timeout'.
>
> All the occurences I've seen were right after some sort of VACUUM
> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
> looked something like this:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> mmcblk0           0.00    75.00    0.00    8.00     0.00    36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00
>
> So pretty terrible (this is a Class 4 SD card, supposedly able to handle
> 4 MB/s). If hamster had faulty SD card, it might have been much worse, I
> guess.
In my experience, at least a class 10 card, together with a minimum amount
of memory, is necessary to minimize the occurrence of those warnings;
hamster now has an 8GB class 10 card.
-- 
Michael



Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Tomas Vondra
Дата:
On 21.1.2015 00:38, Michael Paquier wrote:
> On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
>
>> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
>> very difficult to trigger this. About 7 out of 10 'make check' runs fail
>> because of 'pgstat wait timeout'.
>>
>> All the occurences I've seen were right after some sort of VACUUM
>> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
>> looked something like this:
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> mmcblk0           0.00    75.00    0.00    8.00     0.00    36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00
>>
>> So pretty terrible (this is a Class 4 SD card, supposedly able to
>> handle 4 MB/s). If hamster had faulty SD card, it might have been
>> much worse, I guess.
>
> By experience, a class 10 is at least necessary, with a minimum
> amount of memory to minimize the apparition of those warnings,
> hamster having now a 8GB class 10 card.

Well, my goal was exactly to produce those warnings ;-) and to see whether
I could identify some strange cases. That's why I chose just a class 4
card. But even then it produces a rather low number of those warnings (one
or two per check run), mostly at the expected places with significant I/O
overload. So I'm none the wiser :-(

regards
Tomas





Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Matt Kelly
Дата:
> Sure, but nobody who is not a developer is going to care about that.
> A typical user who sees "pgstat wait timeout", or doesn't, isn't going
> to be able to make anything at all out of that.

As a user, I wholeheartedly disagree.

That warning helped me massively in diagnosing an unhealthy database server in the past at TripAdvisor (i.e. a high-end, server-class box, not a Raspberry Pi).  I have real-time monitoring that looks at pg_stat_database at regular intervals, particularly at the velocity of change of the xact_commit and xact_rollback columns, similar to how check_postgres does it: https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl#L4234
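
Purely as an illustration of that kind of probe (not the actual tooling; the connection string, the query, and the 5-second interval are placeholders), a minimal libpq sampler might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libpq-fe.h>

/* Sum committed and rolled-back transactions across all databases. */
static long
sample_xacts(PGconn *conn)
{
    PGresult   *res;
    long        total;

    res = PQexec(conn,
                 "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit(1);
    }
    total = atol(PQgetvalue(res, 0, 0));
    PQclear(res);
    return total;
}

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* rely on PG* environment variables */
    long        before,
                after;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    before = sample_xacts(conn);
    sleep(5);                               /* polling interval */
    after = sample_xacts(conn);

    /* A server that is clearly busy yet reports ~0 tps here suggests the
     * statistics snapshot has gone stale, as in the incident below. */
    printf("~%.1f tps over the last 5 seconds\n", (after - before) / 5.0);

    PQfinish(conn);
    return 0;
}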

When one of those servers was unhealthy, it stopped reporting statistics for 30+ seconds at a time.  My dashboard, which polled far more frequently than that, indicated the server was normally processing 0 tps with intermittent spikes. I went directly onto the server and sampled pg_stat_database.  That warning was the only thing that directly indicated that the statistics collector was not to be trusted.  It obviously was a victim of what was going on in the server, but it's pretty important to know when your methods for measuring server health are lying to you.  At first glance the spiky TPS looks like some sort of live lock, not just an overloaded server.

Now, I know: 0 change in stats = collector broken.  Rereading the docs,

 Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the server).

Without context this merely reads: "We sleep for 500ms, plus the time to write the file, plus whenever the OS decides to enforce the timer interrupt... so like 550-650ms."  It doesn't read: "When the server is unhealthy, but _still_ serving queries, the stats collector might not be able to keep up and will just stop reporting stats altogether."

I think the warning is incredibly valuable.  Along those lines I'd also love to see a pg_stat_snapshot_timestamp() for monitoring code to use to determine whether it's using a stale snapshot, as well as to help smooth graphs of statistics that are based on highly granular snapshotting.

- Matt Kelly

Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

От
Andres Freund
Дата:
On 2015-01-21 22:43:03 -0500, Matt Kelly wrote:
> >
> > Sure, but nobody who is not a developer is going to care about that.
> > A typical user who sees "pgstat wait timeout", or doesn't, isn't going
> > to be able to make anything at all out of that.
> 
> 
> As a user, I wholeheartedly disagree.

Note that the change we discussed wasn't removing the message; it was
changing its log level from WARNING to LOG. That means the message no
longer goes to the client but still goes to the server log (perhaps
surprisingly, the likelihood of the latter actually increases).

> I think the warning is incredibly valuable.  Along those lines I'd also
> love to see a pg_stat_snapshot_timestamp() for monitoring code to use to
> determine if its using a stale snapshot, as well as helping to smooth
> graphs of the statistics that are based on highly granular snapshotting.

I can see that being useful.

Greetings,

Andres Freund