Обсуждение: amcheck verification for GiST

Поиск

Список

Период

Сортировка

amcheck verification for GiST

От

Andrey Borodin

Дата:

23 сентября 2018 г., 16:15:09

Hi, hackers!

Here's the patch with amcheck functionality for GiST.

It basically checks two invariants:
1. Every internal tuple need no adjustment by tuples of referenced page
2. Internal page reference or only leaf pages or only internal pages

We actually cannot check all balanced tree invariants due to concurrency reasons some concurrent splits will be visible
astemporary balance violations. 

Are there any other invariants that we can check?

I'd be happy to hear any thought about this.

Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck.patch

Re: amcheck verification for GiST

От

Thomas Munro

Дата:

24 сентября 2018 г., 09:29:38

On Sun, Sep 23, 2018 at 10:15 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here's the patch with amcheck functionality for GiST.

Hi Andrey,

Windows doesn't like it[1]:

contrib/amcheck/verify_gist.c(163): error C2121: '#' : invalid
character : possibly the result of a macro expansion
[C:\projects\postgresql\amcheck.vcxproj]

That's:

+    MemoryContext mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+#if PG_VERSION_NUM >= 110000
+ ALLOCSET_DEFAULT_SIZES);
+#else
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+#endif

Not sure what's gong on there... perhaps it doesn't like you to do
that in the middle of a function-like-macro invocation
(AllocSetContextCreate)?

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.14056

-- 
Thomas Munro
http://www.enterprisedb.com

Re: amcheck verification for GiST

От

Andres Freund

Дата:

24 сентября 2018 г., 09:34:57

On 2018-09-24 15:29:38 +1200, Thomas Munro wrote:
> On Sun, Sep 23, 2018 at 10:15 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > Here's the patch with amcheck functionality for GiST.
> 
> Hi Andrey,
> 
> Windows doesn't like it[1]:
> 
> contrib/amcheck/verify_gist.c(163): error C2121: '#' : invalid
> character : possibly the result of a macro expansion
> [C:\projects\postgresql\amcheck.vcxproj]
> 
> That's:
> 
> +    MemoryContext mctx = AllocSetContextCreate(CurrentMemoryContext,
> + "amcheck context",
> +#if PG_VERSION_NUM >= 110000
> + ALLOCSET_DEFAULT_SIZES);
> +#else
> + ALLOCSET_DEFAULT_MINSIZE,
> + ALLOCSET_DEFAULT_INITSIZE,
> + ALLOCSET_DEFAULT_MAXSIZE);
> +#endif
> 
> Not sure what's gong on there... perhaps it doesn't like you to do
> that in the middle of a function-like-macro invocation
> (AllocSetContextCreate)?

But note that the version dependent code shouldn't be present in
/contrib anyway.

Greetings,

Andres Freund

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

24 сентября 2018 г., 11:12:05

Hi!

> 24 сент. 2018 г., в 8:29, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):
>
> On Sun, Sep 23, 2018 at 10:15 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> Here's the patch with amcheck functionality for GiST.
>
> Hi Andrey,
>
> Windows doesn't like it[1]:

Thanks, Thomas! Yes, I've missed that version-dependent macro. Surely, it's redundant.

Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck-v2.patch

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

07 декабря 2018 г., 01:59:28

On Sun, Sep 23, 2018 at 10:12 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> (0001-GiST-verification-function-for-amcheck-v2.patch)

Thanks for working on this. Some feedback:

* You do this:

> +/* Check of an internal page. Hold locks on two pages at a time (parent+child). */

This isn't consistent with what you do within verify_nbtree.c, which
deliberately avoids ever holding more than a single buffer lock at a
time, on general principle. That isn't necessarily a reason why you
have to do the same, but it's not clear why you do things that way.
Why isn't it enough to have a ShareLock on the relation? Maybe this is
a sign that it would be a good idea to always operate on a palloc()'d
copy of the page, by introducing something equivalent to
palloc_btree_page(). (That would also be an opportunity to do very
basic checks on every page.)

* You need to sprinkle a few CHECK_FOR_INTERRUPTS() calls around.
Certainly, there should be one at the top of the main loop.

* Maybe gist_index_check() should be called gist_index_parent_check(),
since it's rather like the existing verification function
bt_index_parent_check().

* Alternatively, you could find a way to make your function only need
an AccessShareLock -- that would make gist_index_check() an
appropriate name. That would probably require careful thought about
VACUUM.

* Why is it okay to do this?:

> +       if (GistTupleIsInvalid(idxtuple))
> +           ereport(LOG,
> +                   (errmsg("index \"%s\" contains an inner tuple marked as invalid",
> +                           RelationGetRelationName(rel)),
> +                    errdetail("This is caused by an incomplete page split at crash recovery before upgrading to
PostgreSQL9.1."),

> +                    errhint("Please REINDEX it.")));

You should probably mention the gistdoinsert() precedent for this.

* Can we check GIST_PAGE_ID somewhere? I try to be as paranoid as
possible, adding almost any check that I can think of, provided it
hasn't got very high overhead. Note that gistcheckpage() doesn't do
this for you.

* Should we be concerned about the memory used by pushStackIfSplited()?

* How about a cross-check between IndexTupleSize() and
ItemIdGetLength(), like the B-Tree code? It's a bit unfortunate that
we have this redundancy, which wastes space, but we do, so we might as
well get some small benefit from it.

--
Peter Geoghegan

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

01 января 2019 г., 13:11:06

Hi, Peter!

Thank you for the review!

> 7 дек. 2018 г., в 3:59, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Sun, Sep 23, 2018 at 10:12 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> * You do this:
>
>> +/* Check of an internal page. Hold locks on two pages at a time (parent+child). */
>
> This isn't consistent with what you do within verify_nbtree.c, which
> deliberately avoids ever holding more than a single buffer lock at a
> time, on general principle. That isn't necessarily a reason why you
> have to do the same, but it's not clear why you do things that way.
> Why isn't it enough to have a ShareLock on the relation? Maybe this is
> a sign that it would be a good idea to always operate on a palloc()'d
> copy of the page, by introducing something equivalent to
> palloc_btree_page(). (That would also be an opportunity to do very
> basic checks on every page.)
If we unlock parent page before checking child, some insert can adjust tuple on parent, sneak into child and insert new
tuple.
This can trigger false positive. I'll think about it more.

> * You need to sprinkle a few CHECK_FOR_INTERRUPTS() calls around.
> Certainly, there should be one at the top of the main loop.
I've added check into main loop of the scan. All deeper paths hold buffer locks.

> * Maybe gist_index_check() should be called gist_index_parent_check(),
> since it's rather like the existing verification function
> bt_index_parent_check().
>
> * Alternatively, you could find a way to make your function only need
> an AccessShareLock -- that would make gist_index_check() an
> appropriate name. That would probably require careful thought about
> VACUUM.

I've changed lock level to AccessShareLock. IMV scan is just as safe as regular GiST index scan.
There is my patch with VACUUM on CF, it can add deleted pages. I'll update one of these two patches accordingly, if
otheris committed.  

>
> * Why is it okay to do this?:
>
>> +       if (GistTupleIsInvalid(idxtuple))
>> +           ereport(LOG,
>> +                   (errmsg("index \"%s\" contains an inner tuple marked as invalid",
>> +                           RelationGetRelationName(rel)),
>> +                    errdetail("This is caused by an incomplete page split at crash recovery before upgrading to
PostgreSQL9.1."), 
>> +                    errhint("Please REINDEX it.")));
>
> You should probably mention the gistdoinsert() precedent for this.
This invalid tuple will break inserts, but will not affect select. I do not know, should this be error or warning in
amcheck?

> * Can we check GIST_PAGE_ID somewhere? I try to be as paranoid as
> possible, adding almost any check that I can think of, provided it
> hasn't got very high overhead. Note that gistcheckpage() doesn't do
> this for you.
Done. I think that gistcheckpage() should do this too, but I think we should avoid interventions into GiST mechanics
here.
>
> * Should we be concerned about the memory used by pushStackIfSplited()?
Memory is pfree()`d as usual. Tree scan code is almost equivalent to VACUUM`s gistbulkdelete.
>
> * How about a cross-check between IndexTupleSize() and
> ItemIdGetLength(), like the B-Tree code? It's a bit unfortunate that
> we have this redundancy, which wastes space, but we do, so we might as
> well get some small benefit from it.
Done. I'm checking it MAXALIGNED, this rounding seems correct.


Please find attached v3.


Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck-v3.patch

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

01 февраля 2019 г., 02:58:48

On Tue, Jan 1, 2019 at 2:11 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > This isn't consistent with what you do within verify_nbtree.c, which
> > deliberately avoids ever holding more than a single buffer lock at a
> > time, on general principle. That isn't necessarily a reason why you
> > have to do the same, but it's not clear why you do things that way.
> > Why isn't it enough to have a ShareLock on the relation? Maybe this is
> > a sign that it would be a good idea to always operate on a palloc()'d
> > copy of the page, by introducing something equivalent to
> > palloc_btree_page(). (That would also be an opportunity to do very
> > basic checks on every page.)
> If we unlock parent page before checking child, some insert can adjust tuple on parent, sneak into child and insert
newtuple.

> This can trigger false positive. I'll think about it more.

I think that holding a buffer lock on an internal pages for as long as
it takes to check all of the child pages is a non-starter. If you
can't think of a way of not doing that that's race free with a
relation-level AccessShareLock, then a relation-level ShareLock (which
will block VACUUM) seems necessary.

I think that you should duplicate some of what's in
bt_index_check_internal() -- lock the heap table as well as the index,
to avoid deadlocks. We might not do this with contrib/pageinspect, but
that's not really intended to be used in production in the same way
this will be.

> > * Maybe gist_index_check() should be called gist_index_parent_check(),
> > since it's rather like the existing verification function
> > bt_index_parent_check().
> >
> > * Alternatively, you could find a way to make your function only need
> > an AccessShareLock -- that would make gist_index_check() an
> > appropriate name. That would probably require careful thought about
> > VACUUM.
>
> I've changed lock level to AccessShareLock. IMV scan is just as safe as regular GiST index scan.
> There is my patch with VACUUM on CF, it can add deleted pages. I'll update one of these two patches accordingly, if
otheris committed.

Maybe you should have renamed it to gist_index_parent_check() instead,
and kept the ShareLock. I don't think that this business with buffer
locks is acceptable. And it is mostly checking parent/child
relationships. Right?

You're not really able to check GiST tuples against anything other
than their parent, unlike with B-Tree (you can only do very simple
things, like the IndexTupleSize()/lp_len cross-check). Naming the
function gist_index_parent_check() seems like the right way to go,
even if you could get away with an AccessShareLock (which I now tend
to doubt). It was way too optimistic to suppose that there might be a
clever way of locking down race conditions that allowed you to not
couple buffer locks, but also use an AccessShareLock. After all, even
the B-Tree amcheck code doesn't manage to do this when verifying
parent/child relationships (it only does something clever with the
sibling page cross-check).

> > * Why is it okay to do this?:
> >
> >> +       if (GistTupleIsInvalid(idxtuple))
> >> +           ereport(LOG,
> >> +                   (errmsg("index \"%s\" contains an inner tuple marked as invalid",
> >> +                           RelationGetRelationName(rel)),
> >> +                    errdetail("This is caused by an incomplete page split at crash recovery before upgrading to
PostgreSQL9.1."),

> >> +                    errhint("Please REINDEX it.")));
> >
> > You should probably mention the gistdoinsert() precedent for this.
> This invalid tuple will break inserts, but will not affect select. I do not know, should this be error or warning in
amcheck?

It should be an error. Breaking inserts is a serious problem.

-- 
Peter Geoghegan

Re: amcheck verification for GiST

От

Michael Paquier

Дата:

04 февраля 2019 г., 08:16:42

On Thu, Jan 31, 2019 at 03:58:48PM -0800, Peter Geoghegan wrote:
> I think that holding a buffer lock on an internal pages for as long as
> it takes to check all of the child pages is a non-starter. If you
> can't think of a way of not doing that that's race free with a
> relation-level AccessShareLock, then a relation-level ShareLock (which
> will block VACUUM) seems necessary.

(Please be careful to update the status of the patch in the CF
correctly!)
This review is recent, so I have moved the patch to next CF, waiting
for input from the author.
--
Michael

Вложения

signature.asc

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

17 февраля 2019 г., 11:55:05

Hi Peter!

Sorry for the delay. Here's new version.

> 1 февр. 2019 г., в 4:58, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Tue, Jan 1, 2019 at 2:11 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>> This isn't consistent with what you do within verify_nbtree.c, which
>>> deliberately avoids ever holding more than a single buffer lock at a
>>> time, on general principle. That isn't necessarily a reason why you
>>> have to do the same, but it's not clear why you do things that way.
>>> Why isn't it enough to have a ShareLock on the relation? Maybe this is
>>> a sign that it would be a good idea to always operate on a palloc()'d
>>> copy of the page, by introducing something equivalent to
>>> palloc_btree_page(). (That would also be an opportunity to do very
>>> basic checks on every page.)
>> If we unlock parent page before checking child, some insert can adjust tuple on parent, sneak into child and insert
newtuple. 
>> This can trigger false positive. I'll think about it more.
>
> I think that holding a buffer lock on an internal pages for as long as
> it takes to check all of the child pages is a non-starter. If you
> can't think of a way of not doing that that's race free with a
> relation-level AccessShareLock, then a relation-level ShareLock (which
> will block VACUUM) seems necessary.
>
> I think that you should duplicate some of what's in
> bt_index_check_internal() -- lock the heap table as well as the index,
> to avoid deadlocks. We might not do this with contrib/pageinspect, but
> that's not really intended to be used in production in the same way
> this will be.

I've extracted functions amcheck_lock_relation() and amcheck_unlock_relation().
With this patch a little bit complicated, but I do not think that code duplication will be OK here.
Instead of lock\unlock functions I can provide one-function-interface receiving index_checkable() and index_check()
callbacks.

>
>>> * Maybe gist_index_check() should be called gist_index_parent_check(),
>>> since it's rather like the existing verification function
>>> bt_index_parent_check().
>>>
>>> * Alternatively, you could find a way to make your function only need
>>> an AccessShareLock -- that would make gist_index_check() an
>>> appropriate name. That would probably require careful thought about
>>> VACUUM.
>>
>> I've changed lock level to AccessShareLock. IMV scan is just as safe as regular GiST index scan.
>> There is my patch with VACUUM on CF, it can add deleted pages. I'll update one of these two patches accordingly, if
otheris committed. 
>
> Maybe you should have renamed it to gist_index_parent_check() instead,
> and kept the ShareLock. I don't think that this business with buffer
> locks is acceptable. And it is mostly checking parent/child
> relationships. Right?
That's right. Semantically gist_index_parent_check() is correct name, let's use it.
>
>
> You're not really able to check GiST tuples against anything other
> than their parent, unlike with B-Tree (you can only do very simple
> things, like the IndexTupleSize()/lp_len cross-check). Naming the
> function gist_index_parent_check() seems like the right way to go,
> even if you could get away with an AccessShareLock (which I now tend
> to doubt). It was way too optimistic to suppose that there might be a
> clever way of locking down race conditions that allowed you to not
> couple buffer locks, but also use an AccessShareLock. After all, even
> the B-Tree amcheck code doesn't manage to do this when verifying
> parent/child relationships (it only does something clever with the
> sibling page cross-check).

That's true, we cannot avoid locking parent and child page simultaneously to check correctness of tuples.

Currently, we do not check index tuples against heap. Should we do this or leave this for another patch?


Thanks!

Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck-v4.patch

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

21 февраля 2019 г., 03:56:31

On Sun, Feb 17, 2019 at 12:55 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> That's true, we cannot avoid locking parent and child page simultaneously to check correctness of tuples.

Right.

Some further questions/comments:

* I think that this should be an error:

> +       if (GistPageIsLeaf(page))
> +       {
> +           /* should never happen unless it is root */
> +           Assert(stack->blkno == GIST_ROOT_BLKNO);
> +       }

I use assertions in the verify_nbtree.c, but only for things that are
programming errors, and state that amcheck "owns". If they fail, then
it's a bug in amcheck specifically (they're really obvious assertions
about local state). Whereas this case could in theory happen with a
corrupt index, for example due to a page written in the wrong place.
I'm sure that the root block looks very similar to a leaf, so we won't
detect this any other way.

It's good to be paranoid, and to even think adversarially, provided it
doesn't make the design more difficult and has low runtime overhead.
We could debate whether or not this corruption is realistic, but it's
easy to make it an error and the cost is low, so you should just do
it.

* Couldn't leaf-like root pages also get some of the testing that we
do for other pages within check_index_tuple()? Ideally it wouldn't be
too much of a special case, though.

* I think that you need to add an
errcode(ERRCODE_FEATURE_NOT_SUPPORTED) to this:

> +   /*
> +    * Check that it's not a leftover invalid tuple from pre-9.1
> +    * See also gistdoinsert() and gistbulkdelete() handlding of such tuples.
> +    * We do not consider it error here, but warn operator.
> +    */
> +   if (GistTupleIsInvalid(idxtuple))
> +       ereport(ERROR,
> +               (errmsg("index \"%s\" contains an inner tuple marked as invalid",
> +                       RelationGetRelationName(rel)),
> +                errdetail("This is caused by an incomplete page split at crash recovery before upgrading to
PostgreSQL9.1."),

> +                errhint("Please REINDEX it.")));

> Currently, we do not check index tuples against heap. Should we do this or leave this for another patch?

To address this question: I would leave this for now. It can probably
work based on the same principle as nbtree's heapallindexed
verification, with few or no changes, but I don't think that we need
to make that happen in this release.

* You still hold multiple buffer locks at once, starting with the
parent and moving to the child. Only 2 buffer locks. Why is this
necessary, given that you now hold a ShareLock on both heap and index
relations? Couldn't you just copy the parent into local memory, in the
style of verify_nbtree.c?

* Note that this "parent then child" lock order seems to not be
consistent with the general rule for holding concurrent buffer locks
that is described in the GiST README:

"""
Concurrency control
-------------------
As a rule of thumb, if you need to hold a lock on multiple pages at the
same time, the locks should be acquired in the following order: child page
before parent, and left-to-right at the same level. Always acquiring the
locks in the same order avoids deadlocks.
"""

* The main point of entry for GiST verification is
gist_check_parent_keys_consistency(). It would be nice to have
comments that describe what it does, and in what order. I understand
that English is not your first language, which makes it harder, but I
would appreciate it if you made the effort to explain the theory. I
don't want to assume that I understand your intent -- I could be
wrong.

* Suggest that you use function prototypes consistently, even for
static functions. That is the style that we prefer. Also, please make
the comments consistent with project guidelines on indentation and
style.

* It would be nice to know if the code in
gist_check_parent_keys_consistency() finds problems in the parent, or
in the child. The nbtree check has an idea of a "target" page: every
page gets to be the target exactly once, and we only ever find
problems in the current target. Maybe that is arbitrary in some cases,
because the relationship between the two is where the problem actually
is. I still think that it is a good idea to make it work that way if
possible. It makes it easier to describe complicated relationships in
comments.

* Why does it make sense to use range_gist_picksplit()/operator class
"union" support function to verify parent/child relationships? Does it
handle NULL values correctly, given the special rules?

* Why not use the "consistent" support function instead of the "union"
support function? Could you write new code that was based on
gistindex_keytest() to do this?

* The GiST docs ("63.3. Extensibility") say of the "distance" support
function: "For an internal tree node, the distance returned must not
be greater than the distance to any of the child nodes". Is there an
opportunity to test this relationship, too, by making sure that
distance is sane? Perhaps this is a very naive question -- I am not a
GiST expert.

* Is it right that gist_check_internal_page() should both return a
value that says "this internal page has child pages that are
themselves internal", while testing the child pages?

* Do we need to worry about F_DELETED pages? Why or why not?

* Do we need to worry about F_HAS_GARBAGE pages? Why or why not?

--
Peter Geoghegan

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

24 февраля 2019 г., 15:37:09

Hi!


Thanks for this detailed review!

>
> * Note that this "parent then child" lock order seems to not be
> consistent with the general rule for holding concurrent buffer locks
> that is described in the GiST README:

This is correct. I've changed locking order.
When we check target internal page, we make a palloc'ed copy and unlock it (but hold the pin).
If we find discrepancy between parent and child keys we lock parent page again.
Then look for correct downlink. And check keys again.
In case when discrepancy still present we report an error.
Otherwise drop parent lock.

>
> * I think that this should be an error:
>
>> +       if (GistPageIsLeaf(page))
>> +       {
>> +           /* should never happen unless it is root */
>> +           Assert(stack->blkno == GIST_ROOT_BLKNO);
>> +       }
>
Done. If this happens, this looks like a programming error or page header flags were corrupted
concurrently. Let's just report.

> * Couldn't leaf-like root pages also get some of the testing that we
> do for other pages within check_index_tuple()? Ideally it wouldn't be
> too much of a special case, though.
Done.

> * I think that you need to add an
> errcode(ERRCODE_FEATURE_NOT_SUPPORTED) to this:

Done.

> * You still hold multiple buffer locks at once, starting with the
> parent and moving to the child. Only 2 buffer locks. Why is this
> necessary, given that you now hold a ShareLock on both heap and index
> relations? Couldn't you just copy the parent into local memory, in the
> style of verify_nbtree.c?
Done.

> * The main point of entry for GiST verification is
> gist_check_parent_keys_consistency(). It would be nice to have
> comments that describe what it does, and in what order. I understand
> that English is not your first language, which makes it harder, but I
> would appreciate it if you made the effort to explain the theory. I
> don't want to assume that I understand your intent -- I could be
> wrong.

I've added about 10 lines of comments.

>
> * Suggest that you use function prototypes consistently, even for
> static functions. That is the style that we prefer. Also, please make
> the comments consistent with project guidelines on indentation and
> style.
Done.

>
> * It would be nice to know if the code in
> gist_check_parent_keys_consistency() finds problems in the parent, or
> in the child. The nbtree check has an idea of a "target" page: every
> page gets to be the target exactly once, and we only ever find
> problems in the current target. Maybe that is arbitrary in some cases,
> because the relationship between the two is where the problem actually
> is. I still think that it is a good idea to make it work that way if
> possible. It makes it easier to describe complicated relationships in
> comments.
the "target" for gist_check_parent_keys_consistency() is tuples on internal
page and their relation with tuples on referenced page. All other checks
are just additional checks, while I expect that this parent-child relationship
may contain some kind of bug: we had observed that GiST could not find some
tuples after CREATE INDEX CONCURRENTLY, but could not reliably reproduce the
problem before it was gone. That's why I've started this work.

> * Why does it make sense to use range_gist_picksplit()/operator class
> "union" support function to verify parent/child relationships? Does it
> handle NULL values correctly, given the special rules?
Yes, union handles NULLs. I do not use range_gist_picksplit().
By definition parent tuple is union of all child tuples.

> * Why not use the "consistent" support function instead of the "union"
> support function? Could you write new code that was based on
> gistindex_keytest() to do this?
Initially, I've used the consistency function. But it answers the question
"Does the downlinked key-space overlap with query". And query may be "not a
given key". Consistency function is controlled by search strategy. Different
operator classes support different set of strategies. But every opclass
should support union to build a GiST. So, I've used "union" instead of
"consistency".

>
> * The GiST docs ("63.3. Extensibility") say of the "distance" support
> function: "For an internal tree node, the distance returned must not
> be greater than the distance to any of the child nodes". Is there an
> opportunity to test this relationship, too, by making sure that
> distance is sane? Perhaps this is a very naive question -- I am not a
> GiST expert.
If parent tuple is correctly adjusted by child tuples, distance between
them must be 0. With this check We will test opclass, not an index structure.

>
> * Is it right that gist_check_internal_page() should both return a
> value that says "this internal page has child pages that are
> themselves internal", while testing the child pages?
Yes.

>
> * Do we need to worry about F_DELETED pages? Why or why not?
Currently, GiST scan does not check for F_DELETED: there simply is no code
to delete a page in GiST (except my patch on commitfest).
I suspect that there may be deleted pages from previous versions.
But encountering this in a search should not be a problem. Thus, I copy
behavior from index scan: do not complain about deleted pages.


> * Do we need to worry about F_HAS_GARBAGE pages? Why or why not?
This flag is for microvacuum and only hints that there may be some vacuumable
tuples on page. It is used during check for neccesary split to allocate some space.

Thanks!

Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck-v5.patch

Re: amcheck verification for GiST

От

Heikki Linnakangas

Дата:

04 марта 2019 г., 12:53:38

There's a little copy-pasto in gist_check_page_keys():

> +            for (o = FirstOffsetNumber; o <= parent_maxoff; o = OffsetNumberNext(i))

Should be "OffsetNumberNext(o)".

I tested this patch with your testing patch from the other thread (after 
fixing the above), to leave behind incompletely split pages [1]. It 
seems that the amcheck code doesn't expect incomplete splits:

postgres=# SELECT gist_index_parent_check('x_c_idx');
ERROR:  index "x_c_idx" has inconsistent records

[1] 
https://www.postgresql.org/message-id/EB87A69B-EE5E-4259-9EEB-DA9DC1F7E265%40yandex-team.ru

- Heikki

Re: amcheck verification for GiST

От

Heikki Linnakangas

Дата:

04 марта 2019 г., 14:50:34

On 04/03/2019 17:53, Heikki Linnakangas wrote:
> I tested this patch with your testing patch from the other thread (after
> fixing the above), to leave behind incompletely split pages [1]. It
> seems that the amcheck code doesn't expect incomplete splits:
> 
> postgres=# SELECT gist_index_parent_check('x_c_idx');
> ERROR:  index "x_c_idx" has inconsistent records

On closer look, I think that was because that testing patch to leave 
behind incomplete splits really did corrupt the index. It always 
inserted the downlink to the parent, but randomly skipped clearing the 
FOLLOW_RIGHT flag and updating the NSN in the child. That's not a valid 
combination. To test incomplete splits, you need to skip inserting the 
downlink to the parent, too.

- Heikki

Re: amcheck verification for GiST

От

Heikki Linnakangas

Дата:

27 марта 2019 г., 20:29:38

On 27/03/2019 11:51, Andrey Borodin wrote:
> Hi!
> 
> Here's new version of GiST amcheck, which takes into account recently committed GiST VACUUM.
> It tests that deleted pages do not contain any data.

Thanks! I had a look, and refactored it quite a bit.

I found the way the recursion worked confusing. On each internal page, 
it looped through all the child nodes, checking the consistency of the 
downlinks. And then it looped through the children again, to recurse. 
This isn't performance-critical, but visiting every page twice still 
seems strange.

In gist_check_page_keys(), if we get into the code to deal with a 
concurrent update, we set 'parent' to point to a tuple on a parent page, 
then unlock it, and continue to look at remaining tuples, using the 
pointer that points to an unlocked buffer.

I came up with the attached, which fixes the above-mentioned things. I 
also replaced the check that each node has only internal or leaf 
children, with a different check that the tree has the same height in 
all branches. That catches more potential problems, and was easier to 
implement after the refactoring. This still needs at least a round of 
fixing typos and tidying up comments, but it's more straightforward now, 
IMHO.

What have you been using to test this?

- Heikki

Вложения

amcheck-gist-v6-heikki.patch

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

28 марта 2019 г., 02:57:17

On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Thanks! I had a look, and refactored it quite a bit.

I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.

> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.

To be fair, that's actually what bt_index_parent_check() does. OTOH,
it has to work in a way that makes it an extension of
bt_index_check(), which is not a problem for
gist_index_parent_check().

> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent page,
> then unlock it, and continue to look at remaining tuples, using the
> pointer that points to an unlocked buffer.

Why not just copy the page into a local buffer? See my remarks on this below.

> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward now,
> IMHO.

You probably didn't mean to leave this "boom" error behind:

> +   if (GistPageIsDeleted(page))
> +   {
> +       elog(ERROR,"boom");

I see that you have this check for deleted pages:

> +       if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> +           ereport(ERROR,
> +                   (errcode(ERRCODE_INDEX_CORRUPTED),
> +                   errmsg("index \"%s\" has deleted page with tuples",
> +                           RelationGetRelationName(rel))));
> +   }

Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage?

I see various errors like this one:

> +           if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_INDEX_CORRUPTED),
> +                        errmsg("index \"%s\" has inconsistent tuple sizes",
> +                               RelationGetRelationName(rel))));

Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that that kind of stuff could be added in
a number of places.

I see this stuff that's related to concurrent processes:

> +       /*
> +        * It's possible that the page was split since we looked at the parent,
> +        * so that we didn't missed the downlink of the right sibling when we
> +        * scanned the parent. If so, add the right sibling to the stack now.
> +        */

> +               /*
> +                * There was a  discrepancy between parent and child tuples.
> +                * We need to verify it is not a result of concurrent call
> +                * of gistplacetopage(). So, lock parent and try to find downlink
> +                * for current page. It may be missing due to concurrent page
> +                * split, this is OK.
> +                */

Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?

> +               stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);

If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held
like this?

I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory.  Maybe it's worthwhile to do something clever with
concurrently holding buffer locks, but if that's what we're doing here
then I would also expect us to have something weaker than ShareLock as
our relation-level heavyweight lock. And, there should be a prominent
explanation of the theory behind it somewhere.

> What have you been using to test this?

pg_hexedit has full support for GiST.     ;-)

--
Peter Geoghegan

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

28 марта 2019 г., 16:35:07

Thanks for looking into this!

> 27 марта 2019 г., в 22:29, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> On 27/03/2019 11:51, Andrey Borodin wrote:
>> Hi!
>> Here's new version of GiST amcheck, which takes into account recently committed GiST VACUUM.
>> It tests that deleted pages do not contain any data.
>
> Thanks! I had a look, and refactored it quite a bit.
Cool! New scan logic is much easier to read.

> I found the way the recursion worked confusing. On each internal page, it looped through all the child nodes,
checkingthe consistency of the downlinks. And then it looped through the children again, to recurse. This isn't
performance-critical,but visiting every page twice still seems strange. 
>
> In gist_check_page_keys(), if we get into the code to deal with a concurrent update, we set 'parent' to point to a
tupleon a parent page, then unlock it, and continue to look at remaining tuples, using the pointer that points to an
unlockedbuffer. 
Uh, that was a tricky bug.

> I came up with the attached, which fixes the above-mentioned things. I also replaced the check that each node has
onlyinternal or leaf children, with a different check that the tree has the same height in all branches. That catches
morepotential problems, and was easier to implement after the refactoring. This still needs at least a round of fixing
typosand tidying up comments, but it's more straightforward now, IMHO. 
>
> What have you been using to test this?
Please see attached patch with line
//if (false) // THIS LINE IS INTENTIONALLY BROKEN
which breaks GiST consistency. Uncomment it to create logically broken GiST.

Also, I've fixed buffer release and small typo (childblkno->parentblkno).

To test that it does not deadlock with inserts and vacuums I use pgbench the same way is it is used here [0] but also
with
SELECT gist_index_parent_check('gist_check_idx');

Also I've added NOTICE when parent is not refound. To test this, I was removing adjust here
if (stack->parenttup && gistgetadjusted(rel, stack->parenttup, idxtuple, state))

> XXX: shouldn't we rather use gist_consistent?
No, consistency test must use scan strategy, which can be absent from opclass.
Parent tuples must be adjusted with every child. We check this simple invariant, it should cover all corner cases.


> 28 марта 2019 г., в 4:57, Peter Geoghegan <pg@bowt.ie> написал(а):
>
>
>> In gist_check_page_keys(), if we get into the code to deal with a
>> concurrent update, we set 'parent' to point to a tuple on a parent page,
>> then unlock it, and continue to look at remaining tuples, using the
>> pointer that points to an unlocked buffer.
>
> Why not just copy the page into a local buffer? See my remarks on this below.
It already was copied, there was a bug of reusing copy from released buffer. In case, when we suspected error, which
outedto be not error. 

>
> You probably didn't mean to leave this "boom" error behind:
>
>> +   if (GistPageIsDeleted(page))
>> +   {
>> +       elog(ERROR,"boom");
Oops. Sorry.

> I see that you have this check for deleted pages:
>
>> +       if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
>> +           ereport(ERROR,
>> +                   (errcode(ERRCODE_INDEX_CORRUPTED),
>> +                   errmsg("index \"%s\" has deleted page with tuples",
>> +                           RelationGetRelationName(rel))));
>> +   }
>
> Why not have a similar check for non-deleted pages, whose maxoffset
> must be <= MaxIndexTuplesPerPage?
Done.

>
> I see various errors like this one:
>
>> +           if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_INDEX_CORRUPTED),
>> +                        errmsg("index \"%s\" has inconsistent tuple sizes",
>> +                               RelationGetRelationName(rel))));
>
> Can't we say which TID is involved here, so we can find the offending
> corrupt tuple afterwards? Or at least the block number? And maybe even
> the LSN of the page? I think that that kind of stuff could be added in
> a number of places.
I've added block number and offset whenever known. I do not understand point of LSN here...

>
> I see this stuff that's related to concurrent processes:
>
>> +       /*
>> +        * It's possible that the page was split since we looked at the parent,
>> +        * so that we didn't missed the downlink of the right sibling when we
>> +        * scanned the parent. If so, add the right sibling to the stack now.
>> +        */
>
>> +               /*
>> +                * There was a  discrepancy between parent and child tuples.
>> +                * We need to verify it is not a result of concurrent call
>> +                * of gistplacetopage(). So, lock parent and try to find downlink
>> +                * for current page. It may be missing due to concurrent page
>> +                * split, this is OK.
>> +                */
>
> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
There may be concurrent inserts? In GiST they can reorder items on page.

>
>> +               stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);
>
> If the gistplacetopage() stuff is truly necessary, then is it okay to
> call gist_refind_parent() with the original buffer lock still held
> like this?
When we call gist_refind_parent() we hold lock for a child and lock parent.
We exclude concurrent VACUUM, thus parent cannot become a child for current child, because it has to be recycled for
suchcoincidence. 

>
> I still suspect that we should have something like palloc_btree_page()
> for GiST, so that we're always operating on a copy of the page in
> palloc()'d memory.
That's actually is what we are doing now. GistScanItem->parenttup is always a copy. But we are doing more small copies
ofeach individual tuple, because we have to refresh this copies sometimes. 

>  Maybe it's worthwhile to do something clever with
> concurrently holding buffer locks, but if that's what we're doing here
> then I would also expect us to have something weaker than ShareLock as
> our relation-level heavyweight lock. And, there should be a prominent
> explanation of the theory behind it somewhere.
We definitely can run under SharedLock. And I will try to compose up all things that prevent us from using weaker
levelsin next message... 

>
>> What have you been using to test this?
>
> pg_hexedit has full support for GiST.     ;-)
For me it is easier to break GiST in it's code for tests :)

Thanks!

Best regards, Andrey Borodin.


[0]https://www.postgresql.org/message-id/flat/96ec7ebd-42b9-4df5-18a4-42181c8a5a41%40iki.fi#be331694fd0c8f2a575b3e51a4306e72

Вложения

0001-GiST-verification-function-for-amcheck-v7.patch

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

28 марта 2019 г., 20:08:29


> 28 марта 2019 г., в 18:35, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>>
>> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
> There may be concurrent inserts? In GiST they can reorder items on page.

Looks like I've tried to cope with same problem twice:
v3 of the patch used AccessShareLock and many locks with incorrect order.
We could use one of possible solutions: either use ShareLock, or rewrite scan to correct locking order.
But patches v4-v7 use both.
I think we should use AccessShareLock, as long as we implemented tricky logic with gist_refind_parent().


>>> +               stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);
>>
>> If the gistplacetopage() stuff is truly necessary, then is it okay to
>> call gist_refind_parent() with the original buffer lock still held
>> like this?
> When we call gist_refind_parent() we hold lock for a child and lock parent.
> We exclude concurrent VACUUM, thus parent cannot become a child for current child, because it has to be recycled for
suchcoincidence. 
That's merely hard form of paranoia, internal pages are never deleted. gist_index_parent_check() would work just fine
withconcurrent VACUUM, INSERTs and SELECTs. 

Best regards, Andrey Borodin.

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

29 марта 2019 г., 03:35:06

On Thu, Mar 28, 2019 at 10:08 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
> > There may be concurrent inserts? In GiST they can reorder items on page.
>
> Looks like I've tried to cope with same problem twice:
> v3 of the patch used AccessShareLock and many locks with incorrect order.
> We could use one of possible solutions: either use ShareLock, or rewrite scan to correct locking order.
> But patches v4-v7 use both.

It definitely has to be one or the other. The combination makes no sense.

-- 
Peter Geoghegan

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

29 марта 2019 г., 09:30:24

> 29 марта 2019 г., в 5:35, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Thu, Mar 28, 2019 at 10:08 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>>> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
>>> There may be concurrent inserts? In GiST they can reorder items on page.
>>
>> Looks like I've tried to cope with same problem twice:
>> v3 of the patch used AccessShareLock and many locks with incorrect order.
>> We could use one of possible solutions: either use ShareLock, or rewrite scan to correct locking order.
>> But patches v4-v7 use both.
>
> It definitely has to be one or the other. The combination makes no sense.

Here's updated patch with AccessShareLock.
I've tried to stress this with combination of random inserts, vaccuums and checks. This process neither failed, nor
deadlocked.
The patch intentionally contains one superflous line to make gist logically broken. This triggers regression test of
amcheck.

Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck-v8.patch

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

03 апреля 2019 г., 00:26:06

On Thu, Mar 28, 2019 at 11:30 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here's updated patch with AccessShareLock.
> I've tried to stress this with combination of random inserts, vaccuums and checks. This process neither failed, nor
deadlocked.
> The patch intentionally contains one superflous line to make gist logically broken. This triggers regression test of
amcheck.

Any thoughts on this, Heikki?

It would be nice to be able to squeeze this into Postgres 12,
especially given that GiST has been enhanced for 12 already.

-- 
Peter Geoghegan

Re: amcheck verification for GiST

От

Alvaro Herrera

Дата:

04 сентября 2019 г., 00:13:51

On 2019-Mar-29, Andrey Borodin wrote:

> Here's updated patch with AccessShareLock.
> I've tried to stress this with combination of random inserts, vaccuums and checks. This process neither failed, nor
deadlocked.
> The patch intentionally contains one superflous line to make gist logically broken. This triggers regression test of
amcheck.

How close are we to this being a committable patch?  CF bot complains
that it doesn't apply anymore (please rebase), but from past discussion
it seems pretty close to ready.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

06 сентября 2019 г., 09:20:12

Hi!

> 4 сент. 2019 г., в 2:13, Alvaro Herrera <alvherre@2ndquadrant.com> написал(а):
>
> On 2019-Mar-29, Andrey Borodin wrote:
>
>> Here's updated patch with AccessShareLock.
>> I've tried to stress this with combination of random inserts, vaccuums and checks. This process neither failed, nor
deadlocked.
>> The patch intentionally contains one superflous line to make gist logically broken. This triggers regression test of
amcheck.
>
> How close are we to this being a committable patch?  CF bot complains
> that it doesn't apply anymore (please rebase), but from past discussion
> it seems pretty close to ready.

Here's rebased version. Changes in v9:
* adjust to usage of table_open
* update new extension version
* check for main fork presence in GiST check too

Thanks!

Best regards, Andrey Borodin.

Вложения

0001-GiST-verification-function-for-amcheck-v9.patch

Re: amcheck verification for GiST

От

Alvaro Herrera from 2ndQuadrant

Дата:

06 сентября 2019 г., 17:02:05

On 2019-Sep-06, Andrey Borodin wrote:

> Here's rebased version. Changes in v9:
> * adjust to usage of table_open
> * update new extension version
> * check for main fork presence in GiST check too

Cool.  On a quick eyeball, your new amcheck.h does not conform to our
conventions: it should not include postgres.h, and it should only
include other headers as needed in order for it to compile standalone (I
think you just need utils/relcache.h and storage/lockdefs.h).  All the
other headers needed for .c files should be in the corresponding .c
files, not in the .h.

Please don't split error messages (errmsg, errdetail etc) in multiple
lines; just leave the line run long (do put arguments beyond a long
format string into separate lines, though).  This improves greppability.
There are some other minor style violations -- nothing that would not be
fixed by pgindent.

Peter, Heikki, are you going to do [at least] one more round of
design/functional review?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

07 сентября 2019 г., 00:28:40

On Fri, Sep 6, 2019 at 7:02 AM Alvaro Herrera from 2ndQuadrant
<alvherre@alvh.no-ip.org> wrote:
> Peter, Heikki, are you going to do [at least] one more round of
> design/functional review?

I didn't plan on it, but somebody probably should. Are you offering to
commit the patch? If not, I can take care of it.

-- 
Peter Geoghegan

Re: amcheck verification for GiST

От

Alvaro Herrera from 2ndQuadrant

Дата:

07 сентября 2019 г., 00:35:09

On 2019-Sep-06, Peter Geoghegan wrote:

> On Fri, Sep 6, 2019 at 7:02 AM Alvaro Herrera from 2ndQuadrant
> <alvherre@alvh.no-ip.org> wrote:
> > Peter, Heikki, are you going to do [at least] one more round of
> > design/functional review?
> 
> I didn't plan on it, but somebody probably should. Are you offering to
> commit the patch? If not, I can take care of it.

I'd welcome it more if you did it; thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

07 сентября 2019 г., 01:22:03

On Fri, Sep 6, 2019 at 2:35 PM Alvaro Herrera from 2ndQuadrant
<alvherre@alvh.no-ip.org> wrote:
> I'd welcome it more if you did it; thanks.

I'll take care of it, then.

-- 
Peter Geoghegan

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

07 сентября 2019 г., 04:26:04

On Fri, Sep 6, 2019 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'll take care of it, then.

Attached is v10, which has some comment and style fix-ups, including
the stuff Alvaro mentioned. It also adds line pointer sanitization to
match what I added to verify_nbtree.c in commit a9ce839a (we use a
custom PageGetItemIdCareful() for GiST instead of a simple
PageGetItemId()). I also added a new file/TU for the routines that are
now common to both nbtree and GiST verification, which I named
amcheck.c. (I'm not sure about that, but I don't like verify_nbtree.c
having generic/common functions.)

I have only had a few hours to polish this, which doesn't seem like
enough, though was enough to fix the noticeable stuff.

My main concern now is the heavyweight lock strength needed by the new
function. I don't feel particularly qualified to sign off on the
concurrency aspects of the patch. Heikki's v6 used a ShareLock, like
bt_index_parent_check(), but you went back to an AccessShareLock,
Andrey. Why is this safe? I see that you do gist_refind_parent() in
your v9 a little differently to Heikki in his v6, which you seemed to
suggest made this safe in your e-mail on March 28, but I don't
understand that at all.

-- 
Peter Geoghegan

Вложения

v10-0001-Revisions-of-GiST-amcheck-from-Peter.patch

Re: amcheck verification for GiST

От

Andrey Borodin

Дата:

08 сентября 2019 г., 11:21:30

Alvaro, Peter, thanks for working on this!

> 7 сент. 2019 г., в 6:26, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Fri, Sep 6, 2019 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I'll take care of it, then.
>
> Attached is v10, which has some comment and style fix-ups, including
> the stuff Alvaro mentioned. It also adds line pointer sanitization to
> match what I added to verify_nbtree.c in commit a9ce839a (we use a
> custom PageGetItemIdCareful() for GiST instead of a simple
> PageGetItemId()). I also added a new file/TU for the routines that are
> now common to both nbtree and GiST verification, which I named
> amcheck.c. (I'm not sure about that, but I don't like verify_nbtree.c
> having generic/common functions.)
Maybe we should PageGetItemIdCareful() to amcheck.c too?
I think it can be reused later by GIN entry tree and to some extent by SP-GiST.
SP-GiST uses redirect tuples, but do not store this information in line pointer.

> I have only had a few hours to polish this, which doesn't seem like
> enough, though was enough to fix the noticeable stuff.
Cool, thanks!

> My main concern now is the heavyweight lock strength needed by the new
> function. I don't feel particularly qualified to sign off on the
> concurrency aspects of the patch. Heikki's v6 used a ShareLock, like
> bt_index_parent_check(), but you went back to an AccessShareLock,
> Andrey. Why is this safe? I see that you do gist_refind_parent() in
> your v9 a little differently to Heikki in his v6, which you seemed to
> suggest made this safe in your e-mail on March 28, but I don't
> understand that at all.

Heikki's version was reading childblkno instead of parentblkno, thus never refinding parent tuple.
Also, I've added check that internal page did not become leaf. In this case we would be comparing heap tids with index
tids,and could possibly find incorrect match. 
But, in current GiST implementation, inner page should never become leaf. We do not delete inner pages.
I think we could even convert it into sanity check, but I'm afraid that once upon a time we will implement delete for
internalpages and forget to remove this check. 

Current lock mode is AccessShareLock.
Before we find some discrepancy in tuples we follow standard locking protocol of GiST Index Scan.

When we suspect key consistency violation, we hold lock on page with some tuple. Then we take pin and lock on page that
wasparent for current some time before. 
For example of validity see gistfinishsplit(). Comments state "On entry, the caller must hold a lock on stack->buffer",
line1330 acquires LockBuffer(stack->parent->buffer, GIST_EXCLUSIVE); 
This function is used during inserts, but we are not going to modify data and place row locks, thus neither ROW
EXCLUSIVE,not ROW SHARE is necessary. 

PFA v11. There is one small comment in GistScanItem. Also, I've moved PageGetItemIdCareful() to amcheck.c. I'm not sure
thatthis refactoring is beneficial for patch, but it reduces code size. 

Thanks!

Best regards, Andrey Borodin.

Вложения

v11-0001-GiST-amcheck-from-Andrey.patch

Re: amcheck verification for GiST

От

Peter Geoghegan

Дата:

12 сентября 2019 г., 02:10:20

On Sun, Sep 8, 2019 at 1:21 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Maybe we should PageGetItemIdCareful() to amcheck.c too?
> I think it can be reused later by GIN entry tree and to some extent by SP-GiST.
> SP-GiST uses redirect tuples, but do not store this information in line pointer.

Well, the details are slightly different in each case in at least one
way -- we use the size of the special area to determine the exact
bounds that it is safe for a tuple to appear in. The size of the
special area varies based on the access method. (Actually, pg_filedump
relies on this difference when inferring which access method a
particular page image is based on -- it starts out by looking at the
standard pd_special field that appears in page headers. So clearly
they're often different.)

> > My main concern now is the heavyweight lock strength needed by the new
> > function. I don't feel particularly qualified to sign off on the
> > concurrency aspects of the patch. Heikki's v6 used a ShareLock, like
> > bt_index_parent_check(), but you went back to an AccessShareLock,
> > Andrey. Why is this safe? I see that you do gist_refind_parent() in
> > your v9 a little differently to Heikki in his v6, which you seemed to
> > suggest made this safe in your e-mail on March 28, but I don't
> > understand that at all.
>
> Heikki's version was reading childblkno instead of parentblkno, thus never refinding parent tuple.

> When we suspect key consistency violation, we hold lock on page with some tuple. Then we take pin and lock on page
thatwas parent for current some time before.

> For example of validity see gistfinishsplit(). Comments state "On entry, the caller must hold a lock on
stack->buffer",line 1330 acquires LockBuffer(stack->parent->buffer, GIST_EXCLUSIVE);

> This function is used during inserts, but we are not going to modify data and place row locks, thus neither ROW
EXCLUSIVE,not ROW SHARE is necessary.

But gistfinishsplit() is called when finishing a page split -- the
F_FOLLOW_RIGHT bit must be set on the leaf. Are you sure that you can
generalize from that, and safely relocate the parent?

I would be a lot more comfortable with this if Heikki weighed in. I am
also concerned about the correctness of this because of this paragraph
from the GiST README file:

"""
This page deletion algorithm works on a best-effort basis. It might fail to
find a downlink, if a concurrent page split moved it after the first stage.
In that case, we won't be able to remove all empty pages. That's OK, it's
not expected to happen very often, and hopefully the next VACUUM will clean
it up.
"""

Why is this not a problem for the new amcheck checks?  Maybe this is a
very naive question. I don't claim to be a GiST expert.

--
Peter Geoghegan

Re: amcheck verification for GiST

От

Michael Paquier

Дата:

28 ноября 2019 г., 06:57:10

On Wed, Sep 11, 2019 at 04:10:20PM -0700, Peter Geoghegan wrote:
> Why is this not a problem for the new amcheck checks?  Maybe this is a
> very naive question. I don't claim to be a GiST expert.

This thread did not receive any updates after a couple of months, and
visibly input was waited from Andrey, so I am marking it as returned
with feedback in the CF.  Please feel free to update the CF entry or
register a new entry once you have dealt with the comments from Peter
--
Michael

Вложения

signature.asc

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: amcheck verification for GiST

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения