Re: a path towards replacing GEQO with something better

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: a path towards replacing GEQO with something better
Дата
Msg-id CAFBsxsFYUiRGy5wVfhieKXPKW=ooFpwkpa9rS6qB=Mr4bZZ-8A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: a path towards replacing GEQO with something better  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: a path towards replacing GEQO with something better  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Tue, Jun 15, 2021 at 12:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 9:24 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> > 3) It actually improves the existing exhaustive search, because the complexity of the join order problem depends on the query shape: a "chain" shape (linear) vs. a "star" shape (as in star schema), for the most common examples. The size of the DP table grows like this (for n >= 4):
...
> I don't quite understand the difference between the "chain" case and
> the "star" case. Can you show sample queries for each one? e.g. SELECT
> ... FROM a_1, a_2, ..., a_n WHERE <something>?

There's a very simple example in the optimizer README:

--
SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.col = tab2.col AND
    tab2.col = tab3.col AND
    tab3.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{2 3},{3 4}
{1 2 3},{2 3 4}
{1 2 3 4}
(other possibilities will be excluded for lack of join clauses)

SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.col = tab2.col AND
    tab1.col = tab3.col AND
    tab1.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{1 3},{1 4}
{1 2 3},{1 3 4},{1 2 4}
{1 2 3 4}
--

The first one is chain, and the second is star. Four is the smallest set where we have a difference. I should now point out an imprecision in my language: By "size of the DP table", the numbers in my first email refer to the number of joinrels times the number of possible joins (not paths, and ignoring commutativity). Here are the steps laid out, with cumulative counts:

join_level, # joins,  cumulative # joins:

linear, n=4
 2               3               3
 3               4               7
 4               3              10

star, n=4
 2               3               3
 3               6               9
 4               3              12

And of course, the chain query also has three at the last level, because it tries two left- (or right-) deep joins and one bushy join.

> One idea I just ran across in
> https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
> is to try to economize by skipping consideration of bushy plans.

That's a good paper, and it did influence my thinking.

You likely already know this, but for the archives: If only chain queries could have bushy plans, it wouldn't matter because they are so cheap to enumerate. But, since star queries can introduce a large number of extra joins via equivalence (same join column or FK), making them resemble "clique" queries, bushy joins get excessively numerous.

> We
> could start doing that when some budget is exceeded, similar to what
> you are proposing here, but probably the budget for skipping
> consideration of bushy plans would be smaller than the budget for
> switching to IDP. The idea of skipping bushy plan generation in some
> cases makes sense to me intuitively because most of the plans
> PostgreSQL generates are mostly left-deep, and even when we do
> generate bushy plans, they're not always a whole lot better than the
> nearest equivalent left-deep plan. The paper argues that considering
> bushy plans makes measurable gains, but also that failure to consider
> such plans isn't catastrophically bad.

I think that makes sense. There are a few things we could do within the "grey zone" -- too many rels to finish exhaustive search, but not enough to justify starting directly with the greedy step -- to increase our chances of completing, and that's a very simple one.

--
John Naylor
EDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: disfavoring unparameterized nested loops
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: snapshot too old issues, first around wraparound and then more.