Discussion: There's some sort of race condition with the new FSM stuff


There's some sort of race condition with the new FSM stuff

From
Tom Lane
Date:
Two different buildfarm machines are currently showing the same failure:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=kudu&dt=2008-10-13%2015:30:00
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dragonfly&dt=2008-10-13%2016:30:01

The postmaster log in each case shows

ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
STATEMENT:  DROP TABLESPACE testspace;
ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
STATEMENT:  DROP TABLESPACE testspace;

which looks like an fsync request has been allowed to hang around too
long, ie, after its file is already gone.  Since /1 indicates the FSM
fork, I suppose this has been introduced by the new FSM code.  I haven't
dug any more deeply than that though.
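
(For anyone decoding that message: the path is tablespace OID / database OID /
relfilenode / fork number, and with the fork numbering HEAD uses the FSM fork
is number 1.  A small compilable sketch of the breakdown, using made-up names
rather than the real RelFileNode/ForkNumber types:)

#include <stdio.h>

/* Illustrative stand-ins only, not the real storage-layer types. */
typedef enum SketchForkNumber
{
    SKETCH_MAIN_FORKNUM = 0,        /* main data fork */
    SKETCH_FSM_FORKNUM = 1          /* free space map fork: the trailing "/1" */
} SketchForkNumber;

int
main(void)
{
    unsigned int spcNode = 1663;    /* tablespace OID (1663 = pg_default) */
    unsigned int dbNode = 16384;    /* database OID */
    unsigned int relNode = 29270;   /* relfilenode (identified downthread as bmscantest) */
    SketchForkNumber forknum = SKETCH_FSM_FORKNUM;

    printf("could not fsync segment 0 of relation %u/%u/%u/%d\n",
           spcNode, dbNode, relNode, (int) forknum);
    return 0;
}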
        regards, tom lane


Re: There's some sort of race condition with the new FSM stuff

From
Tom Lane
Date:
I wrote:
> Two different buildfarm machines are currently showing the same failure:
> ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
> ERROR:  checkpoint request failed

Some tests show that when the serial regression tests are run in a
freshly initdb'd installation, HEAD assigns OID 29270 to "bmscantest"
in the bitmapops test.  So that's been dropped some time before the
failure occurs; which means that this isn't a narrow-window race
condition; which raises the question of why we're not seeing it on more
machines.  I notice now that kudu and dragonfly are actually the same
machine ... could this be an OS-specific problem?  Kris, has there been
any system-software change on that machine recently?
        regards, tom lane


Re: There's some sort of race condition with the new FSM stuff

From
Kris Jurka
Date:

On Mon, 13 Oct 2008, Tom Lane wrote:

> I notice now that kudu and dragonfly are actually the same machine ... 
> could this be an OS-specific problem?  Kris, has there been any 
> system-software change on that machine recently?

This is a VM that I haven't touched in some time.  It was turned off after 
a host kernel upgrade no longer allowed vmware to work on it.  I recently 
turned it back on after switching from vmware workstation 5.5 to vmware 
server 2.0.  On the VM itself the only change I've made was to switch the 
timezone from Mountain to Pacific.

http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=dragonfly&br=HEAD

Kris Jurka


Re: There's some sort of race condition with the new FSM stuff

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> I wrote:
>> Two different buildfarm machines are currently showing the same failure:
>> ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
>> ERROR:  checkpoint request failed
> 
> Some tests show that when the serial regression tests are run in a
> freshly initdb'd installation, HEAD assigns OID 29270 to "bmscantest"
> in the bitmapops test.  So that's been dropped some time before the
> failure occurs; which means that this isn't a narrow-window race
> condition; which raises the question of why we're not seeing it on more
> machines.  I notice now that kudu and dragonfly are actually the same
> machine ... could this be an OS-specific problem?  Kris, has there been
> any system-software change on that machine recently?

Must be because of this little missing line here:

--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -1012,6 +1012,7 @@ ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 	}
 	request = &BgWriterShmem->requests[BgWriterShmem->num_requests++];
 	request->rnode = rnode;
+	request->forknum = forknum;
 	request->segno = segno;
 	LWLockRelease(BgWriterCommLock);
 	return true;

Most fsync requests are for the main fork, not the FSM, so I guess what 
usually happens because of that bug is that we skip the fsync on the 
FSM, which is why we haven't noticed it before.
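
(To spell out why that one line matters, here is a minimal stand-alone
sketch using simplified stand-ins rather than the real BgWriterShmem
structures: with the assignment missing, a request forwarded for the FSM
fork keeps whatever forknum value the shared-memory slot already held,
and if that happens to be zero it reads back as the main fork.)

#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the real bgwriter request machinery. */
typedef enum ForkNum { MAIN_FORK = 0, FSM_FORK = 1 } ForkNum;

typedef struct FsyncRequest
{
    unsigned int relNode;       /* which relation */
    ForkNum      forknum;       /* which fork of it */
    unsigned int segno;         /* which segment */
} FsyncRequest;

/* Mimics ForwardFsyncRequest() with the forknum assignment forgotten. */
static void
forward_request_buggy(FsyncRequest *slot, unsigned int relNode,
                      ForkNum forknum, unsigned int segno)
{
    slot->relNode = relNode;
    /* BUG: slot->forknum = forknum; is missing */
    slot->segno = segno;
}

int
main(void)
{
    FsyncRequest queue[1];

    /* Pretend the shared memory segment started out zero-filled. */
    memset(queue, 0, sizeof(queue));

    /* A backend forwards a request about the FSM fork of rel 29270... */
    forward_request_buggy(&queue[0], 29270, FSM_FORK, 0);

    /* ...but the reader sees a request about the main fork instead. */
    printf("queued forknum = %d (0 = main, 1 = FSM)\n", (int) queue[0].forknum);
    return 0;
}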

I still wonder, though, why we're seeing the error consistently on kudu, 
and not on any other animal. Perhaps the forknum field that's left 
uninitialized gets a different value there than on other platforms.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: There's some sort of race condition with the new FSM stuff

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I still wonder, though, why we're seeing the error consistently on kudu, 
> and not on any other animal. Perhaps the forknum field that's left 
> uninitialized gets a different value there than on other platforms.

Hmm ... AFAICS this mistake would mean that no forknum field of the
requests[] array ever gets set at all, so they would stay at whatever
the virgin value in the shmem segment had been.  Perhaps Solaris doesn't
guarantee that a shared memory block starts out as zeroes?   But if
there were random garbage in the forknum fields you'd expect rather more
failures than are observed.
        regards, tom lane


Re: There's some sort of race condition with the new FSM stuff

From
Zdenek Kotala
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I still wonder, though, why we're seeing the error consistently on kudu, 
>> and not on any other animal. Perhaps the forknum field that's left 
>> uninitialized gets a different value there than on other platforms.
> 
> Hmm ... AFAICS this mistake would mean that no forknum field of the
> requests[] array ever gets set at all, so they would stay at whatever
> the virgin value in the shmem segment had been.  Perhaps Solaris doesn't
> guarantee that a shared memory block starts out as zeroes?  

For security reasons, any OS should clean memory pages before a process 
first touches them.
    Zdenek


Re: There's some sort of race condition with the new FSM stuff

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> I still wonder, though, why we're seeing the error consistently on 
>>> kudu, and not on any other animal. Perhaps the forknum field that's 
>>> left uninitialized gets a different value there than on other platforms.
>>
>> Hmm ... AFAICS this mistake would mean that no forknum field of the
>> requests[] array ever gets set at all, so they would stay at whatever
>> the virgin value in the shmem segment had been.  Perhaps Solaris doesn't
>> guarantee that a shared memory block starts out as zeroes?  
> 
> For security reasons, any OS should clean memory pages before a process 
> first touches them.

Yeah. But it doesn't necessarily need to fill them with zeros, any 
garbage will do.
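
(If someone wants to check that on the machine in question, a throwaway
probe along these lines, plain SysV shm and nothing Postgres-specific,
shows whether a freshly created segment reads back as all zeroes; POSIX
says it must, but it costs little to verify on a suspect platform:)

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int
main(void)
{
    size_t          size = 1024 * 1024;
    int             shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    unsigned char  *p;
    size_t          i,
                    nonzero = 0;

    if (shmid < 0)
    {
        perror("shmget");
        return 1;
    }
    p = shmat(shmid, NULL, 0);
    if (p == (void *) -1)
    {
        perror("shmat");
        return 1;
    }
    /* Count any bytes that are not zero in the fresh segment. */
    for (i = 0; i < size; i++)
        if (p[i] != 0)
            nonzero++;
    printf("%lu nonzero bytes out of %lu\n",
           (unsigned long) nonzero, (unsigned long) size);
    shmdt(p);
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}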

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: There's some sort of race condition with the new FSM stuff

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Zdenek Kotala wrote:
>> Tom Lane wrote:
>>> Hmm ... AFAICS this mistake would mean that no forknum field of the
>>> requests[] array ever gets set at all, so they would stay at whatever
>>> the virgin value in the shmem segment had been.  Perhaps Solaris doesn't
>>> guarantee that a shared memory block starts out as zeroes?  
>> 
>> For security reasons, any OS should clean memory pages before a process 
>> first touches them.

> Yeah. But it doesn't necessarily need to fill them with zeros, any 
> garbage will do.

Yeah, but the observed symptoms seem to indicate that the fill is mostly
zeroes with a very occasional one.  This seems less than probable.

The only theory I've thought of that seems to fit the facts is that
someplace we have a wild store that is clobbering that particular word.
Which is a pretty unpleasant thought.
        regards, tom lane


Re: There's some sort of race condition with the new FSM stuff

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Zdenek Kotala wrote:
>>> For security reasons, any OS should clean memory pages before a process 
>>> first touches them.
> 
>> Yeah. But it doesn't necessarily need to fill them with zeros, any 
>> garbage will do.
> 
> Yeah, but the observed symptoms seem to indicate that the fill is mostly
> zeroes with a very occasional one.  This seems less than probable.
> 
> The only theory I've thought of that seems to fit the facts is that
> someplace we have a wild store that is clobbering that particular word.
> Which is a pretty unpleasant thought.

The bug only affected fsync/forget requests that are forwarded from 
backends, not the ones that bgwriter puts into the hash table itself. If 
the fsync request is queued by the bgwriter itself, and the forget 
request comes from a backend, you get the error that we saw, I think. I 
noted that kudu has a small shared_buffers setting, 5.6 MB, compared to 
most buildfarm members, which might explain different behavior wrt. 
which buffers are written by backends and which are written by bgwriter.

(kudu is green now BTW)

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: There's some sort of race condition with the new FSM stuff

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> The bug only affected fsync/forget requests that are forwarded from 
> backends, not the ones that bgwriter puts into the hash table itself.

Oh, of course.  So the actual sequence of events was:
* bgwriter queues an fsync request for a FSM fork, after writing a
  dirty FSM buffer.
* backend drops table, sends forget request for FSM fork, which the
  bug converts into an extra forget for the main fork.
* at checkpoint, we got trouble.
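
(A toy restatement of that sequence, assuming nothing about the real
md.c/bgwriter interplay beyond what is described above: a per-relation
list of pending fsyncs keyed by fork, a forget that clears one entry,
and a checkpoint that fsyncs whatever is still listed.)

#include <stdio.h>

enum { MAIN_FORK = 0, FSM_FORK = 1, NFORKS = 2 };

static int  pending_fsync[NFORKS];          /* pending fsyncs for one relation */
static int  file_exists[NFORKS] = {1, 1};   /* do the fork files still exist? */

int
main(void)
{
    int     fork;

    /* 1. bgwriter writes a dirty FSM buffer and remembers to fsync it. */
    pending_fsync[FSM_FORK] = 1;

    /*
     * 2. backend drops the table: both files are unlinked and a forget
     *    request is sent, but the unset forknum turns it into a forget
     *    for the main fork rather than the FSM fork.
     */
    file_exists[MAIN_FORK] = file_exists[FSM_FORK] = 0;
    pending_fsync[MAIN_FORK] = 0;       /* should have cleared FSM_FORK */

    /* 3. checkpoint: the stale FSM entry's file is gone, so fsync fails. */
    for (fork = 0; fork < NFORKS; fork++)
        if (pending_fsync[fork] && !file_exists[fork])
            printf("could not fsync fork %d: No such file or directory\n", fork);
    return 0;
}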

> I noted that kudu has a small shared_buffers setting, 5.6 MB, compared to 
> most buildfarm members, which might explain different behavior wrt. 
> which buffers are written by backends and which are written by bgwriter.

Yup, that must have been the determining factor.  That would also
explain a certain amount of nondeterminism in the bug's manifestation,
which also squares with the buildfarm observations.

Okay, I feel better now.
        regards, tom lane