Обсуждение: old_snapshot_threshold vs indexes

Поиск

Список

Период

Сортировка

old_snapshot_threshold vs indexes

От

Thomas Munro

Дата:

20 июня 2019 г., 13:21:09

Hello,

I ran into someone with a system where big queries scanning 8GB+ of
all-in-cache data took consistently ~2.5x longer on a primary server
than on a replica.  Both servers had concurrent activity on them but
plenty of spare capacity and similar specs.  After some investigation
it turned out that on the primary there were (1) some select()
syscalls waiting for 1ms, which might indicate contended
SpinLockAcquire() back-offs, and (2) a huge amount of time spent in:

+ 93,31% 0,00% postgres postgres [.] index_getnext
+ 93,30% 0,00% postgres postgres [.] index_fetch_heap
+ 81,66% 0,01% postgres postgres [.] heap_page_prune_opt
+ 75,85% 0,00% postgres postgres [.] TransactionIdLimitedForOldSnapshots
+ 75,83% 0,01% postgres postgres [.] RelationHasUnloggedIndex
+ 75,79% 0,00% postgres postgres [.] RelationGetIndexList
+ 75,79% 75,78% postgres postgres [.] list_copy

The large tables in question have around 30 indexes.  I see that
heap_page_prune_opt()'s call to TransactionIdLimitedForOldSnapshots()
acquires a couple of system-wide spinlocks, and also tests
RelationAllowsEarlyPruning() which calls RelationHasUnloggedIndex()
which says:

 * Tells whether any index for the relation is unlogged.
 *
 * Note: There doesn't seem to be any way to have an unlogged index attached
 * to a permanent table, but it seems best to keep this general so that it
 * returns sensible results even when they seem obvious (like for an unlogged
 * table) and to handle possible future unlogged indexes on permanent tables.

It calls RelationGetIndexList() which conses up a new copy of the list
every time, so that we can spin through it looking for unlogged
indexes (and in this user's case there are none).  I didn't try to
poke at this in lab conditions, but from a glance a the code, I guess
heap_page_prune_opt() is running for every index tuple except those
that reference the same heap page as the one before, so I guess it
happens a lot if the heap is not physically correlated with the index
keys.  Ouch.

-- 
Thomas Munro
https://enterprisedb.com

Re: old_snapshot_threshold vs indexes

От

Thomas Munro

Дата:

25 июня 2019 г., 02:21:31

On Fri, Jun 21, 2019 at 1:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I ran into someone with a system where big queries scanning 8GB+ of
> all-in-cache data took consistently ~2.5x longer on a primary server
> than on a replica.  Both servers had concurrent activity on them but
> plenty of spare capacity and similar specs.  After some investigation
> it turned out that on the primary there were (1) some select()
> syscalls waiting for 1ms, which might indicate contended
> SpinLockAcquire() back-offs, and (2) a huge amount of time spent in:
>
> + 93,31% 0,00% postgres postgres [.] index_getnext
> + 93,30% 0,00% postgres postgres [.] index_fetch_heap
> + 81,66% 0,01% postgres postgres [.] heap_page_prune_opt
> + 75,85% 0,00% postgres postgres [.] TransactionIdLimitedForOldSnapshots
> + 75,83% 0,01% postgres postgres [.] RelationHasUnloggedIndex
> + 75,79% 0,00% postgres postgres [.] RelationGetIndexList
> + 75,79% 75,78% postgres postgres [.] list_copy

On my laptop, all prewarmed, no concurrency, the mere existence of 10
brin indexes causes a sequential scan to take ~5% longer and an
uncorrelated index scan to take ~45% longer (correlated index scans
don't suffer).  Here's a draft patch for v13 that fixes that problem
by caching the result of RelationHasUnloggedIndex().

Reproducer scripts also attached.  I ran them with shared_buffers=8GB,
old_snapshot_threshold=10s and pg_prewarm installed.

I didn't try to look into the complaint about suspected spinlock contention.

-- 
Thomas Munro
https://enterprisedb.com

Вложения

Re: old_snapshot_threshold vs indexes

От

Tom Lane

Дата:

26 августа 2019 г., 21:28:57

Thomas Munro <thomas.munro@gmail.com> writes:
> On my laptop, all prewarmed, no concurrency, the mere existence of 10
> brin indexes causes a sequential scan to take ~5% longer and an
> uncorrelated index scan to take ~45% longer (correlated index scans
> don't suffer).  Here's a draft patch for v13 that fixes that problem
> by caching the result of RelationHasUnloggedIndex().

I agree that this code is absolutely horrid as it stands.  However,
it doesn't look to me like caching RelationHasUnloggedIndex is quite
enough to fix it.  The other problem is that the calls in question
seem to be mostly in TestForOldSnapshot, which is called in places
like heapgetpage:

    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    dp = BufferGetPage(buffer);
    TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
    lines = PageGetMaxOffsetNumber(dp);
    ntup = 0;

It is hard to express what a bad idea it is to be asking for complex
catalog searches while holding a buffer lock.  We could easily get
into undetectable deadlocks that way, for example.  We need to refactor
these call sites to arrange that the catalog lookup happens outside
the low-level page access.

Your 0001 patch looks reasonable for the purpose of caching the
result, though.

            regards, tom lane

Re: old_snapshot_threshold vs indexes

От

Thomas Munro

Дата:

26 августа 2019 г., 22:53:02

On Tue, Aug 27, 2019 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that this code is absolutely horrid as it stands.  However,
> it doesn't look to me like caching RelationHasUnloggedIndex is quite
> enough to fix it.  The other problem is that the calls in question
> seem to be mostly in TestForOldSnapshot, which is called in places
> like heapgetpage:
>
>         LockBuffer(buffer, BUFFER_LOCK_SHARE);
>
>         dp = BufferGetPage(buffer);
>         TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
>         lines = PageGetMaxOffsetNumber(dp);
>         ntup = 0;
>
> It is hard to express what a bad idea it is to be asking for complex
> catalog searches while holding a buffer lock.  We could easily get
> into undetectable deadlocks that way, for example.  We need to refactor
> these call sites to arrange that the catalog lookup happens outside
> the low-level page access.

Hmm.  Right.  Perhaps the theory was that it was OK because it's
shared (rather than exclusive), or perhaps the catalog lookup was
sufficiently well hidden and was forgotten.  At first glance it seems
like we need to capture PageGetLSN(page) while we have the lock, and
then later pass that into TestForOldSnapshot() instead of the page.
I'll look into that and write a patch, probably in a day or two.

> Your 0001 patch looks reasonable for the purpose of caching the
> result, though.

Thanks for the review.  I'll wait until we figure out what to do about
the other problem and what needs to be back-patched.

-- 
Thomas Munro
https://enterprisedb.com

Re: old_snapshot_threshold vs indexes

От

Tom Lane

Дата:

26 августа 2019 г., 22:59:38

Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Aug 27, 2019 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It is hard to express what a bad idea it is to be asking for complex
>> catalog searches while holding a buffer lock.  We could easily get
>> into undetectable deadlocks that way, for example.  We need to refactor
>> these call sites to arrange that the catalog lookup happens outside
>> the low-level page access.

> Hmm.  Right.  Perhaps the theory was that it was OK because it's
> shared (rather than exclusive), or perhaps the catalog lookup was
> sufficiently well hidden and was forgotten.

I strongly suspect the latter.  Also, it may well be that the
unlogged-index check was not in the original design but was
added later with insufficient thought about where it'd be called
from.

> At first glance it seems
> like we need to capture PageGetLSN(page) while we have the lock, and
> then later pass that into TestForOldSnapshot() instead of the page.
> I'll look into that and write a patch, probably in a day or two.

Hm, but surely we need to do other things to the page besides
TestForOldSnapshot?  I was imagining that we'd collect the
RelationHasUnloggedIndex flag (or perhaps better, the
RelationAllowsEarlyPruning result) before attempting to lock
the page, and then pass it to TestForOldSnapshot.

            regards, tom lane

Re: old_snapshot_threshold vs indexes

От

Thomas Munro

Дата:

27 августа 2019 г., 00:02:25

On Tue, Aug 27, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > At first glance it seems
> > like we need to capture PageGetLSN(page) while we have the lock, and
> > then later pass that into TestForOldSnapshot() instead of the page.
> > I'll look into that and write a patch, probably in a day or two.
>
> Hm, but surely we need to do other things to the page besides
> TestForOldSnapshot?  I was imagining that we'd collect the
> RelationHasUnloggedIndex flag (or perhaps better, the
> RelationAllowsEarlyPruning result) before attempting to lock
> the page, and then pass it to TestForOldSnapshot.

OK I started writing a patch and realised there were a few ugly
problems that I was about to report here... but now I wonder if, based
on the comment for RelationHasUnloggedIndex(), we shouldn't just nuke
all this code.  We don't actually support unlogged indexes on
permanent tables (there is no syntax to create them, and
RelationHasUnloggedIndex() will never return true in practice because
RelationNeedsWAL() will always return false first).  This is a locking
protocol violation and a performance pessimisation in support of a
feature we don't have.  If we add support for that in some future
release, we can figure out how to do it properly then, no?

-- 
Thomas Munro
https://enterprisedb.com

Re: old_snapshot_threshold vs indexes

От

Tom Lane

Дата:

27 августа 2019 г., 01:54:46

Thomas Munro <thomas.munro@gmail.com> writes:
> OK I started writing a patch and realised there were a few ugly
> problems that I was about to report here... but now I wonder if, based
> on the comment for RelationHasUnloggedIndex(), we shouldn't just nuke
> all this code.  We don't actually support unlogged indexes on
> permanent tables (there is no syntax to create them, and
> RelationHasUnloggedIndex() will never return true in practice because
> RelationNeedsWAL() will always return false first).

Oh!  That explains why the code coverage report shows clearly that
RelationHasUnloggedIndex never returns true ;-)

> This is a locking
> protocol violation and a performance pessimisation in support of a
> feature we don't have.  If we add support for that in some future
> release, we can figure out how to do it properly then, no?

+1.  That fix is also back-patchable, which adding fields to relcache
entries would not be.

It's not really apparent to me that unlogged indexes on logged tables
would ever be a useful combination, so I'm certainly willing to nuke
poorly-thought-out code that putatively supports it.  But perhaps
we should add some comments to remind us that this area would need
work if anyone ever wanted to support that.  Not sure where.

            regards, tom lane

Re: old_snapshot_threshold vs indexes

От

Thomas Munro

Дата:

27 августа 2019 г., 04:29:27

On Tue, Aug 27, 2019 at 1:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > OK I started writing a patch and realised there were a few ugly
> > problems that I was about to report here... but now I wonder if, based
> > on the comment for RelationHasUnloggedIndex(), we shouldn't just nuke
> > all this code.  We don't actually support unlogged indexes on
> > permanent tables (there is no syntax to create them, and
> > RelationHasUnloggedIndex() will never return true in practice because
> > RelationNeedsWAL() will always return false first).
>
> Oh!  That explains why the code coverage report shows clearly that
> RelationHasUnloggedIndex never returns true ;-)
>
> > This is a locking
> > protocol violation and a performance pessimisation in support of a
> > feature we don't have.  If we add support for that in some future
> > release, we can figure out how to do it properly then, no?
>
> +1.  That fix is also back-patchable, which adding fields to relcache
> entries would not be.

There is a fly in the ointment: REL9_6_STABLE's copy of
RelationHasUnloggedIndex() is hardcoded to return true for hash
indexes (see commit 2cc41acd8).

However, I now see that there isn't a buffer content lock deadlock
risk here after all, because we don't reach RelationHasUnloggedIndex()
if IsCatalogRelation(rel).  That reminds me of commit 4fd05bb55b4.  It
still doesn't seem like a great idea to be doing catalog access while
holding the buffer content lock, though.

So I think we need to leave 9.6 as is, and discuss how far back to
back-patch the attached.  It could go back to 10, but perhaps we
should be cautious and push it to master only for now, if you agree
with my analysis of the deadlock thing.

> It's not really apparent to me that unlogged indexes on logged tables
> would ever be a useful combination, so I'm certainly willing to nuke
> poorly-thought-out code that putatively supports it.  But perhaps
> we should add some comments to remind us that this area would need
> work if anyone ever wanted to support that.  Not sure where.

It might make sense for some kind of in-memory index that is rebuilt
from the heap at startup, but then I doubt such a thing would have an
index relation with a relpersistence to check anyway.

-- 
Thomas Munro
https://enterprisedb.com

Вложения

0001-Don-t-perform-catalog-lookups-in-TestForOldSnapshot.patch

Re: old_snapshot_threshold vs indexes

От

Tom Lane

Дата:

27 августа 2019 г., 15:05:49

Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Aug 27, 2019 at 1:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +1.  That fix is also back-patchable, which adding fields to relcache
>> entries would not be.

> There is a fly in the ointment: REL9_6_STABLE's copy of
> RelationHasUnloggedIndex() is hardcoded to return true for hash
> indexes (see commit 2cc41acd8).

True, in 9.6 hash indexes *were* effectively unlogged, so that the code
actually did something in that branch.  Given the lack of bug reports
traceable to this, I wouldn't feel too bad about leaving it alone in 9.6.

> However, I now see that there isn't a buffer content lock deadlock
> risk here after all, because we don't reach RelationHasUnloggedIndex()
> if IsCatalogRelation(rel).  That reminds me of commit 4fd05bb55b4.  It
> still doesn't seem like a great idea to be doing catalog access while
> holding the buffer content lock, though.

Yeah, I'm not convinced that that observation means the problem is
unreachable.  Probably does make it harder to hit a deadlock, but
if you mix a few VACUUMs and untimely cache flushes into the
equation, I feel like one could still happen.

> So I think we need to leave 9.6 as is, and discuss how far back to
> back-patch the attached.  It could go back to 10, but perhaps we
> should be cautious and push it to master only for now, if you agree
> with my analysis of the deadlock thing.

I'd vote for back-patching to 10.  Even if there is in fact no deadlock
hazard, you've clearly demonstrated a significant performance hit that
we're taking for basically no reason.

In the larger picture, the commit this reminds me of is b04aeb0a0.
I'm wondering if we could add some assertions to the effect of
"don't initiate relcache or syscache lookups while holding a buffer
lock".  It would be relatively easy to do that if we could make
the rule be "... while holding any LWLock", but I suspect that
that would break some legitimate cases.

            regards, tom lane

Re: old_snapshot_threshold vs indexes

От

Thomas Munro

Дата:

28 августа 2019 г., 09:02:37

On Wed, Aug 28, 2019 at 3:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd vote for back-patching to 10.  Even if there is in fact no deadlock
> hazard, you've clearly demonstrated a significant performance hit that
> we're taking for basically no reason.

Done.

The second symptom reported in my first email looked like evidence of
high levels of spinlock backoff, which I guess might have been coming
from TransactionIdLimitedForOldSnapshots()'s hammering of
oldSnapshotControl->mutex_threshold and
oldSnapshotControl->mutex_threshold, when running
heap_page_prune_opt()-heavy workloads like the one generated by
test-indexscan.sql (from my earlier message) from many backends at the
same time on a large system.  That's just an observation I'm leaving
here, I'm not planning to chase that any further for now.

-- 
Thomas Munro
https://enterprisedb.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: old_snapshot_threshold vs indexes

Вложения

Вложения