Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Дата
Msg-id CAH2-Wzkw8pATO6p13pFHzD5jDBGhMhH6dS3O7vGpFTEN1d_MTQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Alena Rybakina <lena.ribackina@yandex.ru>)
Список pgsql-hackers
On Mon, Jul 31, 2023 at 12:24 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:
> I noticed that you are going to add CNF->DNF transformation at the index
> construction stage. If I understand correctly, you will rewrite
> restrictinfo node,
> change boolean "AND" expressions to "OR" expressions, but would it be
> possible to apply such a procedure earlier?

Sort of. I haven't really added any new CNF->DNF transformations. The
code you're talking about is really just checking that every index
path has clauses that we know that nbtree can handle. That's a big,
ugly modularity violation -- many of these details are quite specific
to the nbtree index AM (in theory we could have other index AMs that
are amsearcharray).

At most, v1 of the patch makes greater use of an existing
transformation that takes place in the nbtree index AM, as it
preprocesses scan keys for these types of queries (it's not inventing
new transformations at all). This is a slightly creative
interpretation, too. Tom's commit 9e8da0f7 didn't actually say
anything about CNF/DNF.

> Otherwise I suppose you
> could face the problem of
> incorrect selectivity of the calculation and, consequently, the
> cardinality calculation?

I can't think of any reason why that should happen as a direct result
of what I have done here. Multi-column index paths + multiple SAOP
clauses are not a new thing. The number of rows returned does not
depend on whether we have some columns as filter quals or not.

Of course that doesn't mean that the costing has no problems. The
costing definitely has several problems right now.

It also isn't necessarily okay that it's "just as good as before" if
it turns out that it needs to be better now. But I don't see why it
would be. (Actually, my hope is that selectivity estimation might be
*less* important as a practical matter with the patch.)

> I can't clearly understand at what stage it is clear that the such a
> transformation needs to be applied?

I don't know either.

I think that most of this work needs to take place in the nbtree code,
during preprocessing. But it's not so simple. There is a mutual
dependency between the code that generates index paths in the planner
and nbtree scan key preprocessing. The planner needs to know what
kinds of index paths are possible/safe up-front, so that it can choose
the fastest plan (the fastest that the index AM knows how to execute
correctly). But, there are lots of small annoying nbtree
implementation details that might matter, and can change.

I think we need to have nbtree register a callback, so that the
planner can initialize some preprocessing early. I think that we
require a "two way conversation" between the planner and the index AM.

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Faster "SET search_path"
Следующее
От: Joe Conway
Дата:
Сообщение: Re: pgsql: Fix search_path to a safe value during maintenance operations.