Обсуждение: BUG #5915: OldSerXidAdd inflates pg_serial too much

Поиск
Список
Период
Сортировка

BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"YAMAMOTO Takashi"
Дата:
The following bug has been logged online:

Bug reference:      5915
Logged by:          YAMAMOTO Takashi
Email address:      yamt@mwd.biglobe.ne.jp
PostgreSQL version: 9.1devel
Operating system:   NetBSD
Description:        OldSerXidAdd inflates pg_serial too much
Details:

a seemingly wrong math in OldSerXidAdd makes it busy writing zeros
to pg_serial.


diff --git a/src/backend/storage/lmgr/predicate.c
b/src/backend/storage/lmgr/predicate.c
index aa657fa..297508b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -755,7 +755,7 @@ OldSerXidAdd(TransactionId xid, SerCommitSeqNo
minConflictCommitSeqNo)
     {
         page = OldSerXidPage(tailXid);
         oldSerXidControl->tailSegment = OldSerXidSegment(page);
-        page = oldSerXidControl->tailSegment * OLDSERXID_ENTRIESPERPAGE;
+        page = oldSerXidControl->tailSegment * SLRU_PAGES_PER_SEGMENT;
         isNewPage = true;
     }
     else

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"Kevin Grittner"
Дата:
"YAMAMOTO Takashi" <yamt@mwd.biglobe.ne.jp> wrote:

> a seemingly wrong math in OldSerXidAdd makes it busy writing zeros
> to pg_serial.
>
> [patch]

Your fix looks correct to me -- we want to get from a SLRU segment
number to the first page of that segment, so SLRU_PAGES_PER_SEGMENT
is the right multiplier.

Thanks!

While I was looking at it, I noticed some obsolete comment lines
which should be removed.  Trivial patch attached.

-Kevin


Вложения

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
Heikki Linnakangas
Дата:
On 04.03.2011 21:26, Kevin Grittner wrote:
> "YAMAMOTO Takashi"<yamt@mwd.biglobe.ne.jp>  wrote:
>
>> a seemingly wrong math in OldSerXidAdd makes it busy writing zeros
>> to pg_serial.
>>
>> [patch]
>
> Your fix looks correct to me -- we want to get from a SLRU segment
> number to the first page of that segment, so SLRU_PAGES_PER_SEGMENT
> is the right multiplier.
>
> Thanks!
>
> While I was looking at it, I noticed some obsolete comment lines
> which should be removed.  Trivial patch attached.

Hmm, if I'm reading that function correctly, it makes sure that when
headPage < 0 (which implies that the SLRU has not been used since
startup, right? ), it zeroes out the whole SLRU file, not only the
currently active region. At least pg_subtrans doesn't seem to bother
with that, StartupSubTrans only zeroes the active region.

I wonder if we should move the responsibility of truncating the SLRU to
checkpoint. At the moment, it's done in OldSerXidSetActiveSerXmin(),
while the callers are holding SerializableXactHashLock in exclusive
mode. While it'll probably go quickly in practice, it still seems like
there's a risk of stalling all new transactions for a few seconds while
that happens.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"Kevin Grittner"
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Hmm, if I'm reading that function correctly, it makes sure that
> when headPage < 0 (which implies that the SLRU has not been used
> since startup, right? )

No, look at the bottom of OldSerXidSetActiveSerXmin() -- cleanup of
segments is done incrementally, but when it finds it has cleaned up
*everything* it sets headPage = -1.  I believe that should only
happen when the xmin has moved past the end of the segment.

> it zeroes out the whole SLRU file, not only the currently active
> region.

That's not the intent.  If it's doing that, it's accidental.  It is
trying to zero from the start of a segment, if a new one is needed.

I'll look at this some more to see if I can spot something I'm
missing, but if you see something to indicate it's not working as I
describe above, please point me in the right direction.

-Kevin

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
Heikki Linnakangas
Дата:
On 04.03.2011 22:33, Kevin Grittner wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  wrote:
>
>> Hmm, if I'm reading that function correctly, it makes sure that
>> when headPage<  0 (which implies that the SLRU has not been used
>> since startup, right? )
>
> No, look at the bottom of OldSerXidSetActiveSerXmin() -- cleanup of
> segments is done incrementally, but when it finds it has cleaned up
> *everything* it sets headPage = -1.  I believe that should only
> happen when the xmin has moved past the end of the segment.
>
>> it zeroes out the whole SLRU file, not only the currently active
>> region.
>
> That's not the intent.  If it's doing that, it's accidental.  It is
> trying to zero from the start of a segment, if a new one is needed.

Sorry, I was not entirely clear. It clears all pages from the start of
the segment, up to the last currently active page, even if the active
region from tailXid to headXid only spans a couple of pages somewhere in
the middle of the segment. It seems pointless to clear the pages in the
beginning of the segment in that case.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"Kevin Grittner"
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Sorry, I was not entirely clear. It clears all pages from the
> start of the segment, up to the last currently active page, even
> if the active region from tailXid to headXid only spans a couple
> of pages somewhere in the middle of the segment. It seems
> pointless to clear the pages in the beginning of the segment in
> that case.

Oh, I see now.  Somehow I thought I needed to clear pages from the
start of the segment to the active portion.  If that's not needed,
it's easy enough to adjust the calculations there to start with the
first active page.  Do you want me to do that?

-Kevin

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"Kevin Grittner"
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> I wonder if we should move the responsibility of truncating the
> SLRU to checkpoint. At the moment, it's done in
> OldSerXidSetActiveSerXmin(), while the callers are holding
> SerializableXactHashLock in exclusive mode. While it'll probably
> go quickly in practice, it still seems like there's a risk of
> stalling all new transactions for a few seconds while that
> happens.

I don't think it can stall new transactions unless they are
DEFERRABLE, but it might stall the COMMIT of the oldest serializable
transaction -- if I'm thinking it through correctly.  (If not, I
have another energy drink I can crack open.)

Here's my thinking:

The call in RegisterSerializableTransactionInt() to
OldSerXidSetActiveSerXmin() only happens if there are no active
serializable transactions, so it can't fall into the SLRU truncation
code (I think).  I'm not worried about
predicatelock_twophase_recover() because it will only call the
function during startup with no transactions active, so it can't hit
the cleanup code.  That leaves SetNewSxactGlobalXmin(), which is
only called from ReleasePredicateLocks(), and only when the last
transaction with the low xmin is being cleaned up.  Now, that's only
called around completion of a transaction except when starting a
SERIALIZABLE READ ONLY DEFERRABLE transaction; and if you have
explicitly requested a DEFERRABLE transaction you presumably won't
be astonished by a delay in its startup.

So, I'm not arguing that we shouldn't move the truncation to
checkpoint time, but I think what we're protecting against is disk
I/O at COMMIT time, not transaction startup.  Presumably disk I/O at
that point would be less surprising, although perhaps the fact that
it can happen on a read only transaction might surprise someone.  Of
course, moving any possible delay from this to a background process
seems like a good thing in general; I just don't know whether it's
worth doing right now, versus adding to a list of possible
enhancements.  If you're confident it's safe enough, I'm game.

-Kevin

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"Kevin Grittner"
Дата:
I wrote:

> I think what we're protecting against is disk I/O at COMMIT time,
> not transaction startup.

One more thought on this -- on a properly configured server, this
code should rarely be exercised unless there is a long-running READ
WRITE transaction.  The delay, if any, would be on the connection
which is committing that long running transaction.

-Kevin

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
Heikki Linnakangas
Дата:
On 04.03.2011 23:28, Kevin Grittner wrote:
> I wrote:
>
>> I think what we're protecting against is disk I/O at COMMIT time,
>> not transaction startup.
>
> One more thought on this -- on a properly configured server, this
> code should rarely be exercised unless there is a long-running READ
> WRITE transaction.  The delay, if any, would be on the connection
> which is committing that long running transaction.

What worries me most is that the cleanup happens while
SerializableXactHashLock is held. It's probably not a big deal in
practice, but it seems safer and probably more readable too to do the
cleanup at checkpoint.

Here's what I had in mind. Can you review, and do you have something to
test this with?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Вложения

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
"Kevin Grittner"
Дата:
Heikki Linnakangas  wrote:

> Here's what I had in mind. Can you review

The additions and modifications to the comments all look good to me.
I can see why you renamed one field and eliminated another; no
problem there.  I'm really surprised, when you ignore those changes,
how little was needed to move this to checkpoint time.  I hadn't
looked into that, and expected it to be much harder to do, even
though I should know better by now.  :-)

I looked it over and pondered a bit, and have three concerns.

(1)  Why is it safe to assume that checkpoint will occur before this
wraps around?   Is it just that it takes a billion transactions, and
one can reasonably expect a checkpoint before reaching that point, or
is there some other safety net?

(2)  What if there are only occassional clusters of serializable
transactions?  If the one SLRU page is held so that it isn't
repeatedly truncated and re-zeroed, does it pose a risk if
transaction IDs wrap around while that page is held?  (I don't think
this is a new problem with the proposed patch, it just made it more
obvious.)  Should we be using RecentGlobalXmin or something in that
checkpoint function to truncate that last page when it is safe to do
so?

(3)  That unnecessary SLRU flush should probably be conditioned on a
#ifdef or some not-very-visible GUC or something.  With the problems
which some people have with writes glutting the I/O during checkpoint
I'd hate to add to the writes at checkpoint time just for debugging
info -- at least without a specific request for that behavior.

Other than that, it seems like a very good idea.

> do you have something to test this with?

Well, I tested it with the src/test/isolation/ tests with and without
TEST_OLDSERXID defined, so it passes that much[1]; but if Dan can
grab another day on the MIT benchmarking server for a new round of
DBT-2 tests (at least some of which should be run with TEST_OLDSERXID
defined), it would help shake out any problems.  I have to say,
though, that Takashi-san seems to be the master of chasing down
corner condition problems with this patch.

Constructing test scripts for these corner cases which can be run
under the normal PostgreSQL regression testing framework is hard, but
I intend to continue to try to develop some before release.  But as
Dan pointed out recently, some of these issues are really hard to hit
without pounding on the software for hours with high concurrency on a
machine with a lot of CPUs.

-Kevin

[1] Actually, the last of the tests has been showing failure when
running with TEST_OLDSERXID defined, because it gives up on the
transaction set and forces a transaction to roll back a little
earlier due to the summarization of the conflict data.  I'm attaching
another file we may want to add to src/test/isolation/expected to
avoid confusion when people test a build with that defined.



Вложения

Re: BUG #5915: OldSerXidAdd inflates pg_serial too much

От
Heikki Linnakangas
Дата:
On 06.03.2011 20:32, Kevin Grittner wrote:
> Heikki Linnakangas  wrote:
>
>> Here's what I had in mind. Can you review
>
> The additions and modifications to the comments all look good to me.
> I can see why you renamed one field and eliminated another; no
> problem there.  I'm really surprised, when you ignore those changes,
> how little was needed to move this to checkpoint time.  I hadn't
> looked into that, and expected it to be much harder to do, even
> though I should know better by now.  :-)

Ok, committed, so that we get it into the alpha4 release that's wrapped
soon.

> I looked it over and pondered a bit, and have three concerns.
>
> (1)  Why is it safe to assume that checkpoint will occur before this
> wraps around?   Is it just that it takes a billion transactions, and
> one can reasonably expect a checkpoint before reaching that point, or
> is there some other safety net?

Hmm, good question. The maximum checkpoint interval is 30 minutes, so
you would need a sustained transaction rate of over 500000 transactions
per second to bump into that.

If it happens, I don't think there's any harm done. OldSerXidAdd()
initializes any new pages to zero, and the truncation code should never
truncate away files that are still in use.

> (2)  What if there are only occassional clusters of serializable
> transactions?  If the one SLRU page is held so that it isn't
> repeatedly truncated and re-zeroed, does it pose a risk if
> transaction IDs wrap around while that page is held?  (I don't think
> this is a new problem with the proposed patch, it just made it more
> obvious.)  Should we be using RecentGlobalXmin or something in that
> checkpoint function to truncate that last page when it is safe to do
> so?

I didn't quite understand this, but I believe the fact that
OldSerXidAdd() initializes any new pages to zero protects from this too.

> (3)  That unnecessary SLRU flush should probably be conditioned on a
> #ifdef or some not-very-visible GUC or something.  With the problems
> which some people have with writes glutting the I/O during checkpoint
> I'd hate to add to the writes at checkpoint time just for debugging
> info -- at least without a specific request for that behavior.

We have similar unnecessary SLRU flush in CheckpointSUBTRANS too, so I
don't think that's necessary.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com