Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Поиск

Список

Период

Сортировка

От	Heikki Linnakangas
Тема	Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Дата	27 ноября 2023 г. 13:39:16
Msg-id	86930045-5df5-494a-b4f1-815bc3fbcce0@iki.fi обсуждение исходный текст
Ответ на	Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Peter Geoghegan <pg@bowt.ie>)
Ответы	Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Список	pgsql-hackers

Дерево обсуждения

On 21/11/2023 04:52, Peter Geoghegan wrote:
> Attached is v7.

First, some high-level reactions before looking at the patch very closely:

- +1 on the general idea. Hard to see any downsides if implemented right.

- This changes the meaning of amsearcharray==true to mean that the 
ordering is preserved with ScalarArrayOps, right? You change B-tree to 
make that true, but what about any out-of-tree index AM extensions? I 
don't know if any such extensions exist, and I don't think we should 
jump through any hoops to preserve backwards compatibility here, but 
probably deserves a notice in the release notes if nothing else.

- You use the term "primitive index scan" a lot, but it's not clear to 
me what it means. Does one ScalarArrayOps turn into one "primitive index 
scan"? Or each element in the array turns into a separate primitive 
index scan? Or something in between? Maybe add a new section to the 
README explain how that works.

- _bt_preprocess_array_keys() is called for each btrescan(). It performs 
a lot of work like cmp function lookups and desconstructing and merging 
the arrays, even if none of the SAOP keys change in the rescan. That 
could make queries with nested loop joins like this slower than before: 
"select * from generate_series(1, 50) g, tenk1 WHERE g = tenk1.unique1 
and tenk1.two IN (1,2);".

- nbtutils.c is pretty large now. Perhaps create a new file 
nbtpreprocesskeys.c or something?

- You and Matthias talked about an implicit state machine. I wonder if 
this could be refactored to have a more explicit state machine. The 
state transitions and interactions between _bt_checkkeys(), 
_bt_advance_array_keys() and friends feel complicated.


And then some details:

> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -4035,6 +4035,19 @@ description | Waiting for a newly initialized WAL file to reach durable storage
>     </para>
>    </note>
>  
> +  <note>
> +   <para>
> +    Every time an index is searched, the index's
> +    <structname>pg_stat_all_indexes</structname>.<structfield>idx_scan</structfield>
> +    field is incremented.  This usually happens once per index scan node
> +    execution, but might take place several times during execution of a scan
> +    that searches for multiple values together.  Only queries that use certain
> +    <acronym>SQL</acronym> constructs to search for rows matching any value
> +    out of a list (or an array) of multiple scalar values are affected.  See
> +    <xref linkend="functions-comparisons"/> for details.
> +   </para>
> +  </note>
> +

Is this true even without this patch? Maybe commit this separately.

The "Only queries ..." sentence feels difficult. Maybe something like 
"For example, queries using IN (...) or = ANY(...) constructs.".

>  * _bt_preprocess_keys treats each primitive scan as an independent piece of
>  * work.  That structure pushes the responsibility for preprocessing that must
>  * work "across array keys" onto us.  This division of labor makes sense once
>  * you consider that we're typically called no more than once per btrescan,
>  * whereas _bt_preprocess_keys is always called once per primitive index scan.

"That structure ..." is a garden-path sentence. I kept parsing "that 
must work" as one unit, the same way as "that structure", and it didn't 
make sense. Took me many re-reads to parse it correctly. Now that I get 
it, it doesn't bother me anymore, but maybe it could be rephrased.

Is there _any_ situation where _bt_preprocess_array_keys() is called 
more than once per btrescan?

>     /*
>      * Look up the appropriate comparison operator in the opfamily.
>      *
>      * Note: it's possible that this would fail, if the opfamily is
>      * incomplete, but it seems quite unlikely that an opfamily would omit
>      * non-cross-type comparison operators for any datatype that it supports
>      * at all. ...
>      */

I agree that's unlikely. I cannot come up with an example where you 
would have cross-type operators between A and B, but no same-type 
operators between B and B. For any real-world opfamily, that would be an 
omission you'd probably want to fix.

Still I wonder if we could easily fall back if it doesn't exist? And 
maybe add a check in the 'opr_sanity' test for that.

In _bt_readpage():
>     /*
>      * Prechecking the page with scan keys required for direction scan.  We
>      * check these keys with the last item on the page (according to our scan
>      * direction).  If these keys are matched, we can skip checking them with
>      * every item on the page.  Scan keys for our scan direction would
>      * necessarily match the previous items.  Scan keys required for opposite
>      * direction scan are already matched by the _bt_first() call.
>      *
>      * With the forward scan, we do this check for the last item on the page
>      * instead of the high key.  It's relatively likely that the most
>      * significant column in the high key will be different from the
>      * corresponding value from the last item on the page.  So checking with
>      * the last item on the page would give a more precise answer.
>      *
>      * We skip this for the first page in the scan to evade the possible
>      * slowdown of point queries.  Never apply the optimization with a scans
>      * that uses array keys, either, since that breaks certain assumptions.
>      * (Our search-type scan keys change whenever _bt_checkkeys advances the
>      * arrays, invalidating any precheck.  Tracking all that would be tricky.)
>      */
>     if (!so->firstPage && !numArrayKeys && minoff < maxoff)
>     {

It's sad to disable this optimization completely for array keys. It's 
actually a regression from current master, isn't it? There's no 
fundamental reason we couldn't do it for array keys so I think we should 
do it.

_bt_checkkeys() is called in an assertion in _bt_readpage, but it has 
the side-effect of advancing the array keys. Side-effects from an 
assertion seems problematic.

Vague idea: refactor _bt_checkkeys() into something that doesn't have 
side-effects, and have a separate function or an argument to 
_bt_checkkeys() to advance to next array key. The prechecking 
optimization and the Assertion could both use the side-effect-free function.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan