Thread: FW: Intermittent Stats Failures: firefly: HEAD

FW: Intermittent Stats Failures: firefly: HEAD

From
"Larry Rosenman"
Date:
Reposting, since it seems to not have made it :(




Larry Rosenman wrote:
> Ever since the stats collector changes, I've seen intermittent
> failures
> on 'firefly' in the buildfarm.  This is my machine.
>
> There is one posted now, and the history has them as well.
>
> Could someone look and tell me if I need to tweak something, or is
> this 'expected'?
>
> http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=firefly&br=HEAD
>
> Thanks!
> LER



--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX  78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com


Re: FW: Intermittent Stats Failures: firefly: HEAD

From
Tom Lane
Date:
"Larry Rosenman" <lrosenman@pervasive.com> writes:
>> Ever since the stats collector changes, I've seen intermittent
>> failures on 'firefly' in the buildfarm.

Yeah, you're not the only one.  We haven't figured out what's causing
them.  But while fooling with Joachim Wieland's pg_sleep patch just
now, I was struck by an idea: on machines where select() is
interruptible by signals, it is possible that the do_sleep() function
won't wait as long as specified.  This could easily cause the observed
regression diff, if the test doesn't wait long enough for the stats
collector to update the stats.
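
To illustrate the failure mode described above: a wait primitive that can
return early (as select() can when a signal arrives) only guarantees the
full delay if the caller loops and re-waits for the remaining time. The
following is a minimal sketch of that general technique in Python, not the
actual pg_sleep or do_sleep() code; the names sleep_at_least, wait, and
clock are hypothetical and chosen only for this example.

```python
import time


def sleep_at_least(seconds, wait=time.sleep, clock=time.monotonic):
    """Block until at least `seconds` have elapsed according to `clock`.

    `wait` is allowed to return early (for example, because a signal
    interrupted it), so we fix a deadline up front and keep waiting for
    whatever time remains. Returns the number of wait calls made.
    """
    deadline = clock() + seconds
    calls = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return calls
        wait(remaining)  # may come back before `remaining` has elapsed
        calls += 1
```

A single call to an interruptible wait corresponds to the suspected bug: if
it returns early, the test proceeds before the stats collector has caught
up. Recomputing the remaining time against a fixed deadline makes the total
delay independent of how often the wait is cut short.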

It's not immediately obvious what signal might be arriving at the
backend, given that there aren't supposed to be any other database
operations going on.  It's barely possible that a SIGUSR1 (sinval
catchup interrupt) could be generated here, if one of the previous
group of tests were still in the process of shutting down its backend.
So I'm not sure about this theory ... but at least it's a theory.

If the theory is correct then the just-committed pg_sleep patch
should provide a permanent solution.  We'll have to wait and see
if we see any more of those errors.

If we don't see any more such errors in HEAD for a while, it might
be worth back-patching the implementation of pg_sleep into the
older branches' regression tests, so we don't keep seeing intermittent
regression failures in them either.
        regards, tom lane