Обсуждение: SPGist "triple parity" concept doesn't work

Поиск
Список
Период
Сортировка

SPGist "triple parity" concept doesn't work

От
Tom Lane
Дата:
I've been looking into the problem reported at
http://www.postgresql.org/message-id/519A5917.40704@qunar.com
and what I find is that we have spgist insertion operations deadlocking
against each other because one is descending from page A to page B while
the other descends from page B to page A.  According to the README file,
the "triple parity" page selection algorithm is supposed to prevent
that:

: While descending the tree, the insertion algorithm holds exclusive lock on
: two tree levels at a time, ie both parent and child pages (parent and child
: pages can be the same, see notes above). There is a possibility of deadlock
: between two insertions if there are cross-referenced pages in different
: branches.  That is, if inner tuple on page M has a child on page N while
: an inner tuple from another branch is on page N and has a child on page M,
: then two insertions descending the two branches could deadlock.  To prevent
: deadlocks we introduce a concept of "triple parity" of pages: if inner tuple
: is on page with BlockNumber N, then its child tuples should be placed on the
: same page, or else on a page with BlockNumber M where (N+1) mod 3 == M mod 3.
: This rule guarantees that tuples on page M will have no children on page N,
: since (M+1) mod 3 != N mod 3.

That would work fine as long as the invariant is maintained accurately.
However, there are at least two cases where the existing code fails to
maintain the invariant:

1. In spgAddNodeAction, if the enlarged inner tuple doesn't fit on the
current page anymore, we do this:
       /*        * obtain new buffer with the same parity as current, since it will be        * a child of same parent
tuple       */       current->buffer = SpGistGetBuffer(index,
GBUF_INNER_PARITY(current->blkno),                                        ...
 

That's fine as long as the parent tuple wasn't also on the current page.
If it was on the current page, we end up re-downlinking the parent to a
page having the same parity it has, not one more as it should be.

I tried to fix this like so:
       /*        * get a new buffer that has the right parity to store a child of        * the current tuple's parent
    */       current->buffer = SpGistGetBuffer(index,
GBUF_INNER_PARITY(parent->blkno+ 1),                                         ...
 

but that just moves the problem somewhere else: the link from the parent
to the new inner tuple is now guaranteed to follow the parity rules, but
the downlinks leading out of it don't follow them anymore.

2. In spgSplitNodeAction, we split an inner tuple into a "prefix" tuple
that replaces that inner tuple, and a "postfix" tuple that contains the
same downlinks the original tuple did.  That's fine as long as we can
fit the postfix tuple on the same page.  If we can't, we assign it to a
page that's one parity level below the current page, and then its
outgoing links violate the parity rules.  (Keeping the postfix tuple
on the current page wouldn't make things better, since we'd still
violate the parity rules with respect to either the incoming or outgoing
links of the prefix tuple, if it has to go to another page.)

With a few repetitions of either of these cases, and some bad luck
about placement of the new tuples, you get into situations where two
pages each contain downlinks leading to the other; and then a deadlock
is just a matter of time.

I don't immediately see any good way to fix this.  I think the "triple
parity" rule as it stands is hopelessly broken, but I don't know what
to replace it with, even granting that we don't need to maintain on-disk
compatibility.  (We'd have to tell people to reindex SPGist indexes
anyway, because of the risk that they already contain circular links;
so switching to a new layout rule doesn't seem to add any more pain.)
Or we could try to modify the insertion algorithm so it doesn't lock
two levels of the tree at once, but that seems pretty risky.

Thoughts?
        regards, tom lane



Re: SPGist "triple parity" concept doesn't work

От
Greg Stark
Дата:
On Thu, Jun 6, 2013 at 10:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  To prevent
> : deadlocks we introduce a concept of "triple parity" of pages: if inner tuple
> : is on page with BlockNumber N, then its child tuples should be placed on the
> : same page, or else on a page with BlockNumber M where (N+1) mod 3 == M mod 3.
> : This rule guarantees that tuples on page M will have no children on page N,
> : since (M+1) mod 3 != N mod 3.

Even if the invariant was maintained why doesn't that just mean you
need three concurrent inserts to create a deadlock?


-- 
greg



Re: SPGist "triple parity" concept doesn't work

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> On Thu, Jun 6, 2013 at 10:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> : This rule guarantees that tuples on page M will have no children on page N,
>> : since (M+1) mod 3 != N mod 3.

> Even if the invariant was maintained why doesn't that just mean you
> need three concurrent inserts to create a deadlock?

Hm, good point.  That reinforces my feeling that the page-number-based
approach isn't workable as a guarantee; though we might want to keep
that layout rule as a heuristic that would help reduce contention.

I thought a little bit about whether we could drop the requirement of
locking two tree levels during insertion descent, or at least recover
from deadlock if it did occur.  One simple fix would be to do a
ConditionalLockBuffer on the child level, and if it fails, just abandon
the insertion attempt and start over.  However that could lead to a lot
of wasted work when there's contention, so it's not terribly attractive
as-is.  Another line of thought is to lock just the single parent tuple,
not its whole page, when descending --- then we can't deadlock unless
there are actual circularities in the index.  We'd probably have to use
a heavyweight lock for that, which might be problematic for performance,
and I'm not exactly sure about timing of when to take that lock relative
to acquiring the page's buffer lock (which we'd still need).  There are
probably some other ways to attack this.
        regards, tom lane



Re: SPGist "triple parity" concept doesn't work

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> I don't immediately see any good way to fix this.  I think the "triple
> parity" rule as it stands is hopelessly broken, but I don't know what
> to replace it with, even granting that we don't need to maintain on-disk
> compatibility.  (We'd have to tell people to reindex SPGist indexes
> anyway, because of the risk that they already contain circular links;
> so switching to a new layout rule doesn't seem to add any more pain.)
> Or we could try to modify the insertion algorithm so it doesn't lock
> two levels of the tree at once, but that seems pretty risky.

Is this the chance to add a metapage?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SPGist "triple parity" concept doesn't work

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Is this the chance to add a metapage?

SPGist indexes do have a metapage already --- you're confusing them with
Gist.
        regards, tom lane



Re: SPGist "triple parity" concept doesn't work

От
Teodor Sigaev
Дата:
> That would work fine as long as the invariant is maintained accurately.
> However, there are at least two cases where the existing code fails to
> maintain the invariant:

Hmm. Didn't catch that during development.

> Thoughts?

Give me some time to play around it. Will think.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: SPGist "triple parity" concept doesn't work

От
Will Crawford
Дата:
On 7 June 2013 02:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hm, good point.  That reinforces my feeling that the page-number-based
> approach isn't workable as a guarantee; though we might want to keep
> that layout rule as a heuristic that would help reduce contention.

Can the locks just be taken in, say, numeric order of the pages involved?



Re: SPGist "triple parity" concept doesn't work

От
Tom Lane
Дата:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> Thoughts?

> Give me some time to play around it. Will think.

I experimented a bit with the idea of taking a heavyweight lock to
represent the right to alter an inner tuple.  The results were pretty
grim: an spgist index build example went from 225 ms to 380 ms, and
a test case involving concurrent insertions (basically the complainant's
original example pgbench-ified) went from 5400 tps to 4300 tps.
I hadn't realized our heavyweight lock code was quite that expensive :-(

Anyway I now think that we might be better off with the other idea of
abandoning an insertion and retrying if we get a lock conflict.  That
would at least not create any performance penalty for non-concurrent
scenarios; and even in concurrent cases, I suspect you'd have to be
rather unlucky to get penalties as bad as the heavyweight-lock solution
is showing.
        regards, tom lane



Re: SPGist "triple parity" concept doesn't work

От
Teodor Sigaev
Дата:
> Anyway I now think that we might be better off with the other idea of
> abandoning an insertion and retrying if we get a lock conflict.  That
> would at least not create any performance penalty for non-concurrent
> scenarios; and even in concurrent cases, I suspect you'd have to be
> rather unlucky to get penalties as bad as the heavyweight-lock solution
> is showing.

Agree, it would be a better workaround for now. I will be able to do this at 
this friday.

I considered the idea to forbid placement of child on the same page as parent, 
but this implementation a) could significantly increase size of index, b) 
doesn't solve Greg's point.

We definetly need new idea of locking protocol and I'll return to this problem 
at autumn (sorry, I havn't time in summer to do this research).
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: SPGist "triple parity" concept doesn't work

От
Teodor Sigaev
Дата:
> Anyway I now think that we might be better off with the other idea of
> abandoning an insertion and retrying if we get a lock conflict.

done, look at the patch.

I was faced with the fact that my mail is considered spam by postgresql.org, so
I repeat some hthoughts from previous mail:

I considered the idea to forbid placement of child on the same page as parent,
but this implementation a) could significantly increase size of index, b)
doesn't solve Greg's point.

We definetly need new idea of locking protocol and I'll return to this problem
at autumn (sorry, I havn't time in summer to do this research).

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: SPGist "triple parity" concept doesn't work

От
Tom Lane
Дата:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> Anyway I now think that we might be better off with the other idea of
>> abandoning an insertion and retrying if we get a lock conflict.

> done, look at the patch.

Looks good, committed with some cosmetic adjustments.

> We definetly need new idea of locking protocol and I'll return to this
> problem at autumn (sorry, I havn't time in summer to do this
> research).

OK.  I think the performance of this way will be okay, actually, in most
cases anyhow.  It'll do till we have a better idea.
        regards, tom lane