Understanding GIN indexes

From: Jack Orenstein
Subject: Understanding GIN indexes
Msg-id CAGNxcass8pF9H1+19dRO1VKaDxTdOZ-2cu11FCX0_wy16fEi5g@mail.gmail.com
List: pgsql-general

I am building a new type, which will be indexed using a GIN index. Things are starting to work, and I am seeing queries use the index, call the partialMatch(), consistent(), and compare() functions, and return correct results.

However, I am still unclear on some aspects of how partialMatch() and consistent() are supposed to work (so my implementation of consistent() always sets *recheck to true).

1) The recheck logic of consistent() is unclear to me. The docs say (https://www.postgresql.org/docs/12/gin-extensibility.html):

On success, *recheck should be set to true if the heap tuple needs to be rechecked against the query operator, or false if the index test is exact. That is, a false return value guarantees that the heap tuple does not match the query; a true return value with *recheck set to false guarantees that the heap tuple does match the query; and a true return value with *recheck set to true means that the heap tuple might match the query, so it needs to be fetched and rechecked by evaluating the query operator directly against the originally indexed item.

How can it ever be correct to return true and set *recheck to false? My understanding of conventional (btree) indexes is that the row needs to be retrieved and the index condition rechecked, because the table has visibility information and the index does not: a key in the index might correspond to an obsolete row version. I understand visibility map optimizations, and that the trip to the actual data page can sometimes be skipped. But that doesn't seem to be what the consistent() recheck flag is about.

In other words, how can consistent() ever decide that a recheck is not necessary, since the index entry may be from an obsolete row version?  Couldn't returning true and setting *recheck to false result in a false positive?

2) For partial matches, why does consistent() need to be called at all? For a given key (2nd arg), partialMatch() decides whether the key satisfies the index condition. Why is a further check by consistent() required?

I think that my mental model of how GIN works must be way off. Is there a presentation or paper that explains how GIN works?

Jack Orenstein
