Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Дата
Msg-id 2484032b-3e6d-ae6a-61e5-732e2136ad18@iki.fi
обсуждение исходный текст
Ответ на Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On 02/07/2023 19:09, Tomas Vondra wrote:
> Here's an updated version of the patch series.
> 
> I've polished and pushed the first three patches with cleanup, tests to
> improve test coverage and so on. I chose not to backpatch those - I
> planned to do that to make future backpatches simpler, but the changes
> ended up less disruptive than expected.
> 
> The remaining patches are just about adding SK_SEARCHARRAY to BRIN.
> 
> 0001 - adds the optional preprocess procedure, calls it from brinrescan
> 
> 0002 to 0005 - adds the support to the existing BRIN opclasses

Could you implement this completely in the consistent-function, by 
caching the sorted array in fn_extra, without adding the new preprocess 
procedure? On first call, when fn_extra == NULL, sort the array and 
stash it in fn_extra.

I don't think that works, because fn_extra isn't reset when the scan 
keys change on rescan. We could reset it, and document that you can use 
fn_extra for per-scankey caching. There's some precedence for not 
resetting it though, see commit d22a09dc70f. But we could provide an 
opaque per-scankey scratch space like that somewhere else. In BrinDesc, 
perhaps.

The new preprocess support function feels a bit too inflexible to me. 
True, you can store whatever you want in the ScanKey that it returns, 
but since that's the case, why not just make it void * ?It seems that 
the constraint here was that you need to pass a ScanKey to the 
consistent function, because the consistent function's signature is what 
it is. But we can change the signature, if we introduce a new support 
amproc number for it.

> The main open question I have is what exactly does it mean that the
> procedure is optional. In particular, should it be supported to have a
> BRIN opclass without the "preprocess" procedure but using the other
> built-in support procedures?
>
> For example, imagine you have a custom BRIN opclass in an extension (for
> a custom data type or something). This does not need to implement any
> procedures, it can just call the existing built-in ones. Of course, this
> won't get the "preprocess" procedure automatically.
> 
> Should we support such opclasses or should we force the extension to be
> updated by adding a preprocess procedure? I'd say "optional" means we
> should support (otherwise it'd not really optional).
> 
> The reason why this matters is that "amsearcharray" is AM-level flag,
> but the support procedure is defined by the opclass. So the consistent
> function needs to handle SK_SEARCHARRAY keys both with and without
> preprocessing.
> 
> That's mostly what I did for all existing BRIN opclasses (it's a bit
> confusing that opclass may refer to both the "generic" minmax or the
> opclass defined for a particular data type). All the opclasses now
> handle three cases:
> 
> 1) scalar keys (just like before, with amsearcharray=fase)
> 
> 2) array keys with preprocessing (sorted array, array of hashes, ...)
> 
> 3) array keys without preprocessing (for compatibility with old
>     opclasses missing the optional preprocess procedure)
> 
> The current code is a bit ugly, because it duplicates a bunch of code,
> because the option (3) mostly does (1) in a loop. I'm confident this can
> be reduced by refactoring and reusing some of the "shared" code.
> 
> The question is if my interpretation of what "optional" procedure means
> is reasonable. Thoughts?
> 
> The other thing is how to test this "compatibility" code. I assume we
> want to have the procedure for all built-in opclasses, so that won't
> exercise it. I did test it by temporarily removing the procedure from a
> couple pg_amproc.dat entries. I guess creating a custom opclass in the
> regression test is the right solution.

It would be unpleasant to force all BRIN opclasses to immediately 
implement the searcharray-logic. If we don't want to do that, we need to 
implement generic SK_SEARCHARRAY handling in BRIN AM itself. That would 
be pretty easy, right? Just call the regular consistent function for 
every element in the array.

If an opclass wants to provide a faster/better implementation, it can 
provide a new consistent support procedure that supports that. Let's 
assign a new amproc number for new-style consistent function, which must 
support SK_SEARCHARRAY, and pass it some scratch space where it can 
cache whatever per-scankey data. Because it gets a new amproc number, we 
can define the arguments as we wish. We can pass a pointer to the 
per-scankey scratch space as a new argument, for example.

We did this backwards-compatibility dance with the 3/4-argument variants 
of the current consistent functions. But I think assigning a whole new 
procedure number is better than looking at the number of arguments.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: Reducing connection overhead in pg_upgrade compat check phase
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: Preventing non-superusers from altering session authorization