Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

From Peter Geoghegan
Subject Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
Date
Msg-id CAH2-WznJZXUb_4ZN+e_W7U3rCHUne0TCzX4hK0dY6+VoD6onMw@mail.gmail.com
In reply to Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Jan 23, 2019 at 10:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > The fix here must be to normalize index tuples that are compressed
> > within amcheck, both during initial fingerprinting, and during
> > subsequent probes of the Bloom filter in bt_tuple_present_callback().
>
> I happened to talk to Andres about this in person yesterday. He
> thought that there was reason to be concerned about the need for
> logical normalization beyond TOAST issues. Expression indexes were a
> particular concern, because they could in principle have a change in
> the on-disk representation without a change of logical values -- false
> positives could result. He suggested that the long term solution was
> to bring hash operator class hash functions into Bloom filter hashing,
> at least where available.

I think that the best way forward is to normalize to compensate for
inconsistent input datum TOAST state, and leave it at that. ISTM that
logical normalization beyond that (based on hashing, or anything else)
creates more problems than it solves. I am concerned about cases like
INCLUDE indexes (which may have datums that lack even a B-Tree
opclass), and about the logical-though-semantically-relevant facets of
some datatypes such as numeric's display scale. If I can get an
example from Andres of a case where further logical normalization is
necessary to avoid false positives with expression indexes, that may
change things. (BTW, I implemented another amcheck enhancement that
searches indexes from the root to find matches -- the code is a
trivial addition to the new patch series I'm working on, and seems
like a better way to do enhanced logical normalization if that proves
to be truly necessary.)
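
As a rough illustration of the TOAST-state normalization described above (this is not the attached patch, and it is not how amcheck is written), here is a toy model in Python. The `Datum` class, `make_datum()`, `normalize()` and `fingerprint()` are all hypothetical names invented for this sketch; zlib stands in for TOAST compression, and a plain set stands in for the Bloom filter. It only shows why two byte images of one logical value miss each other, and why decompressing on both the fingerprinting side and the probing side removes the miss.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """Hypothetical stand-in for a varlena datum: bytes plus a compressed flag."""
    payload: bytes
    compressed: bool = False


def make_datum(value: bytes, compress: bool) -> Datum:
    # zlib is only a placeholder for whatever compression the server applies.
    if compress:
        return Datum(zlib.compress(value), True)
    return Datum(value, False)


def normalize(d: Datum) -> Datum:
    # Decompress so that one logical value always has one byte image.
    if d.compressed:
        return Datum(zlib.decompress(d.payload), False)
    return d


def fingerprint(datums, normalized: bool) -> bytes:
    # Hash the physical representation of each datum in the tuple.
    h = hashlib.sha256()
    for d in datums:
        if normalized:
            d = normalize(d)
        h.update(b"\x01" if d.compressed else b"\x00")
        h.update(len(d.payload).to_bytes(4, "little"))
        h.update(d.payload)
    return h.digest()


value = b"x" * 2000
index_tuple = [make_datum(value, compress=True)]   # as found in the index
heap_tuple = [make_datum(value, compress=False)]   # as re-formed from the heap

# Without normalization the probe misses, which would be reported as corruption.
seen_raw = {fingerprint(index_tuple, normalized=False)}
raw_miss = fingerprint(heap_tuple, normalized=False) not in seen_raw

# Normalizing on both sides makes the two representations agree.
seen_norm = {fingerprint(index_tuple, normalized=True)}
norm_hit = fingerprint(heap_tuple, normalized=True) in seen_norm

print(raw_miss, norm_hit)
```

Note that the sketch normalizes only physical compression state. It deliberately does nothing about logically equal values with different representations, which is the part argued above to be out of scope.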

Attached draft patch fixes the bug by doing fairly simple
normalization. I think that TOAST compression of datums in indexes is
fairly rare in practice, so I'm not very worried about the fact that
this won't perform as well as it could with indexes that have a lot of
compressed datums. I think that the interface I've added might need to
be expanded for other things in the future (e.g., to make amcheck work
with nbtree-native duplicate compression), and not worrying about the
performance too much helps with that goal.

I'll pick this up next week, and likely commit a fix by Wednesday or
Thursday if there are no objections. I'm not sure if the test case is
worth including.

-- 
Peter Geoghegan
