Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id CAH2-WznKiUwPtzLOVa7SQVSfurttZ+ULSRqhyseG_XbkYeyiag@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: POC, WIP: OR-clause support for indexes  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: POC, WIP: OR-clause support for indexes  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Mon, Nov 27, 2023 at 1:04 PM Robert Haas <robertmhaas@gmail.com> wrote:
> The use of op_mergejoinable() seems pretty random to me. Why should we
> care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
> transform that to a<any(array[1,2,3,4]) if you want. It might not be a
> good idea, but I think it's a legal transformation.

That kind of transformation is likely to be a very good idea, because
nbtree's _bt_preprocess_array_keys() function knows how to perform
preprocessing that makes the final index qual "a < 1". Obviously that
could be far more efficient.

Further suppose you have a machine generated query  "a<1 or a<2 or a<3
or a<4 AND a = 2" -- same as before, except that I added "AND a = 2"
to the end. Now _bt_preprocess_array_keys() will be able to do the
aforementioned inequality preprocessing, just as before. But this time
_bt_preprocess_keys() (a different function with a similar name) can
see that the quals are contradictory. That makes the entire index scan
end, before it ever really began.

Obviously, this is an example of a more general principle: a great
deal of the benefit of these transformations is indirect, in the sense
that they come from enabling further transformations/optimizations,
that apply in the context of some particular query. So you have to
think holistically.

It's perhaps a little unfortunate that all of this nbtree
preprocessing stuff is totally opaque to the planner. Tom has
expressed concerns about that in the past, FWIW:

https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us
(see the final paragraph for the reference)

There might be some bigger picture to doing all of these
transformations, in a way that maximizes opportunities to apply
further transformations/optimizations. You know much more about the
optimizer than I do, so maybe this is already very obvious. Just
pointing it out.

> Honestly, it seems very hard to avoid the conclusion that this
> transformation is being done at too early a stage. Parse analysis is
> not the time to try to do query optimization. I can't really believe
> that there's a way to produce a committable patch along these lines.
> Ideally, a transformation like this should be done after we know what
> plan shape we're using (or considering using), so that we can make
> cost-based decisions about whether to transform or not. But at the
> very least it should happen somewhere in the planner. There's really
> no justification for parse analysis rewriting the SQL that the user
> entered.

I am sure that there is a great deal of truth to this. The general
conclusion about parse analysis being the wrong place for this seems
very hard to argue with. But I'm much less sure that there needs to be
a conventional cost model.

The planner's cost model is supposed to have some basis in physical
runtime costs, which is not the case for any of these transformations.
Not in any general sense; they're just transformations that enable
finding a cheaper way to execute the query. While they have to pay for
themselves, in some sense, I think that that's purely a matter of
managing the added planner cycles. In principle they shouldn't have
any direct impact on the physical costs incurred by physical
operators. No?

As I keep pointing out, there is a sound theoretical basis to the idea
of normalizing to conjunctive normal form as its own standard step in
query processing. To some extent we do this already, but it's all
rather ad-hoc. Even if (say) the nbtree preprocessing transformations
that I described were something that the planner already knew about
directly, they still wouldn't really need to be costed. They're pretty
much strictly better at runtime (at most you only have to worry about
the fixed cost of determining if they apply at all).

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: logical decoding and replication of sequences, take 2
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: common signal handler protection