Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Дата
Msg-id CAPpHfdsYBc=vPsF4+5aqrk4DaF5XNvMuDCJkg5DWt-GXXj1qPg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-bugs
( . ()  w On Tue, Jun 22, 2021 at 7:02 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Fri, Jun 18, 2021 at 12:55 AM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> > On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Anyway, I'm punting this to the code authors.  I'd like to see
> > > some comments clarifying what the API really is, not just a
> > > quick-n-dirty code fix.
> >
> > Yep, I'm going to have a closer look at this tomorrow.
>
> Sorry for the delay.  I'm going to take a closer look in the next
> couple of days.

I've closer look at shimTriConsistentFn() function.  It looks to me
like the function itself has inconsistencies.  But the way it's
currently used in GIN shouldn't produce the wrong query answers.

shimTriConsistentFn() is one of implementation of
GinScanKey.triConsistentFn.  In turn, GinScanKey.triConsistentFn have
3 callers:
1) startScanKey() to determine required keys
2) keyGetItem() for lossy item check
3) keyGetItem() for normal item check

Let's see shimTriConsistentFn() inconsistencies and how callers handle them.
1) shimTriConsistentFn() returns result of directBoolConsistentFn()
for zero maybe entries without examining key->recheckCurItem.  That
may result in returning GIN_TRUE instead of GIN_MAYBE
1.1) startScanKey() doesn't care about recheck, just looking for
GIN_FALSE result.
1.2) keyGetItem() for lossy item always implies recheck.
1.3) keyGetItem() for a normal item does the trick.  Whether a current
item is rechecked is controlled by key->recheckCurItem variable (the
same which is set by directBoolConsistentFn().  The following switch
sets key->recheckCurItem for GIN_MAYBE, but leaves it untouched for
GIN_TRUE.  Thus, GIN_TRUE with key->recheckCurItem works here just
like GIN_MAYBE.  I think this is inconsistent by itself, but this
inconsistency compensates for inconsistency in shimTriConsistentFn().
2) shimTriConsistentFn() might call directBoolConsistentFn() with all
false inputs for GIN_SEARCH_MODE_DEFAULT.  The result is intended to
be false, but opclass consistent method isn't intended to handle this
situation correctly.  For instance, ginint4_consistent() returns true
without checking the input array.  That could make
shimTriConsistentFn() return GIN_TRUE instead of GIN_MAYBE, or
GIN_MAYBE instead of GIN_FALSE.
2.1) In principle, this could lead startScanKey() to select less
required entries than possible.  But this is definitely not the case
of ginint4_consistent() when meeting any of entries is enough for
match.
2.2) In principle, keyGetItem() could get false positives for lossy
items.  But that wouldn't lead to wrong query answers.  Again, this is
not the case of ginint4_consistent().
2.3) keyGetItem() for a normal item doesn't call triConsistentFn()
with any GIN_MAYBE or all GIN_FALSE.

To sum up, I don't see how inconsistencies in shimTriConsistentFn()
could lead to wrong query answers, especially in intarray GIN index.
Nevertheless, these inconsistencies should be fixed.  I'm going to
propose a patch soon.

------
Regards,
Alexander Korotkov



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17083: [PATCH] PostgreSQL fails to build with OpenLDAP 2.5.x
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17099: Problem with EXECUTE and JSON