Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

From: Peter Geoghegan
Subject: Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date:
Msg-id: CAH2-WznXowi-RTs86WPGxgF+K3CCa5_Ab_LB7wSKk_sHTuxO5Q@mail.gmail.com
In reply to: Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Sun, Aug 25, 2019 at 2:55 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I suppose that we'd add something new to CREATE OPERATOR CLASS to make
> this work? My instinct is to avoid adding things that are only
> meaningful for a single AM to interfaces like CREATE OPERATOR CLASS,
> but the system already has numerous dependencies on B-Tree opclasses
> that seem comparable to me.

Another question is whether or not it would be okay to define
"equality is precise"-ness to be "the system's generic equality
function works perfectly as a drop-in replacement for my own equality
operator's function". The system's generic equality function could be
the recently added datum_image_eq() function -- that looks like it
will do exactly what I have in mind. This would be a new way of using
datum_image_eq(), I think, since it wouldn't be okay for it to give an
answer that differed from the equality operator's function. It looks
like existing datum_image_eq() callers can deal with false negatives
(but not false positives, which are impossible).
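
To make the false negative / false positive distinction concrete, here is a minimal standalone sketch. It does not use PostgreSQL's headers or the real datum_image_eq(); ToyDecimal, toy_pow10(), toy_operator_eq() and toy_image_eq() are hypothetical names invented for illustration. The toy type keeps a display scale, so 1.0 and 1.00 are equal according to the operator but have different byte images.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy type: value = unscaled / 10^scale. Not a PostgreSQL type. */
typedef struct ToyDecimal
{
    int32_t unscaled;   /* digits with the decimal point removed */
    int32_t scale;      /* number of digits after the decimal point */
} ToyDecimal;

/* Small power-of-ten helper; the sketch assumes scales in the range 0..9. */
int64_t toy_pow10(int32_t n)
{
    assert(n >= 0 && n <= 9);
    int64_t result = 1;
    while (n-- > 0)
        result *= 10;
    return result;
}

/* Stand-in for the opclass equality operator: compares numeric values. */
bool toy_operator_eq(ToyDecimal a, ToyDecimal b)
{
    return (int64_t) a.unscaled * toy_pow10(b.scale) ==
           (int64_t) b.unscaled * toy_pow10(a.scale);
}

/*
 * Stand-in for a generic "image" equality: byte-for-byte comparison.
 * The struct has two int32_t members, so there are no padding bytes to
 * worry about on common platforms.
 */
bool toy_image_eq(ToyDecimal a, ToyDecimal b)
{
    return memcmp(&a, &b, sizeof(ToyDecimal)) == 0;
}
```

With a = {10, 1} (1.0) and b = {100, 2} (1.00), toy_operator_eq() returns true while toy_image_eq() returns false: a false negative. The reverse cannot happen in this sketch, because identical bytes imply identical fields. A type like this one would not qualify under the "equality is precise" definition proposed above.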

This exceeds what is strictly necessary for the deduplication patch,
but it seems like the patch should make comparisons as fast as
possible in the context of deduplicating items (it would be nice if it
could just use datum_image_eq instead of an insertion scankey when
doing many comparisons to deduplicate items). We can imagine a
datatype with undefined garbage bytes that affect the answer that
datum_image_eq() gives, but could be safe targets for deduplication,
so it's not clear if being this aggressive will work. But maybe that
isn't actually possible among types that aren't inherently unsafe for
deduplication. And maybe we could be more aggressive with
optimizations in numerous other contexts by defining "equality is
precise"-ness as strict binary equality after accounting for TOAST
compression.
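
As a rough illustration of the "many cheap comparisons" idea, the sketch below merges adjacent byte-identical keys in an already sorted run using memcmp() only. It is not taken from the deduplication patch; ToyKey and toy_count_groups() are hypothetical names, and the fixed 8-byte key is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width key; stands in for the key bytes of an index tuple. */
typedef struct ToyKey
{
    uint8_t bytes[8];
} ToyKey;

/*
 * Count the groups that a sorted run collapses into when adjacent keys
 * with identical bytes are merged. Only byte comparisons are used; no
 * type-specific comparator is called.
 */
size_t toy_count_groups(const ToyKey *keys, size_t nkeys)
{
    size_t ngroups = 0;

    assert(keys != NULL || nkeys == 0);

    for (size_t i = 0; i < nkeys; i++)
    {
        /* Start a new group at the first key, or when the bytes change. */
        if (i == 0 ||
            memcmp(keys[i - 1].bytes, keys[i].bytes, sizeof(keys[i].bytes)) != 0)
            ngroups++;
    }
    return ngroups;
}
```

If one of the eight bytes were undefined garbage that the type's own equality operator ignores, two operator-equal keys could land in separate groups here. In this sketch that only costs a missed merge; it never merges keys that the operator would consider different.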

-- 
Peter Geoghegan


