Re: Removing INNER JOINs

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Removing INNER JOINs
Дата
Msg-id CAApHDvroLRWYr+_yoihJ+6_2g_BghBwZDPmJgEg0nRYSANZ-Vw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Removing INNER JOINs  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Ответы Re: Removing INNER JOINs  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Removing INNER JOINs  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 15 January 2015 at 08:36, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/13/15 5:02 AM, David Rowley wrote:

I can't quite get my head around what you mean here, as the idea sounds quite similar to something that's been discussed already and ruled out.
If we're joining relation a to relation b, say the plan chosen is a merge join. If we put some special node as the parent of the merge join then how will we know to skip or not skip any sorts that are there especially for the merge join, or perhaps the planner choose an index scan as a sorted path, where now that the join is removed, could become a faster seqscan. The whole plan switching node discussion came from this exact problem. Nobody seemed to like the non-optimal plan that was not properly optimised for having the relation removed.

In my mind this is the same as a root level Alternative Plan, so you'd be free to do whatever you wanted in the alternate:

-> blah blah
  -> Alternate
    -> Merge join
       ...
    -> SeqScan

I'm guessing this would be easier to code, but that's just a guess. The other advantage is if you can't eliminate the join to table A at runtime you could still eliminate table B, whereas a top-level Alternate node doesn't have that flexibility.

This does have a disadvantage of creating more plan variations to consider. With a single top-level Alternate node there's only one other option. I believe multiple Alternates would create more options to consider.

Ultimately, unless this is easier to code than a top-level alternate, it's probably not worth pursuing.


I think it's probably possible to do this, but I think it would require calling make_one_rel() with every combination of each possibly removable relations included and not included in the join list. I'm thinking this could end up a lot of work as the number of calls to make_one_rel() would be N^2, where N is the number of relations that may be removable.

My line of thought was more along the lines of that the backup/all purpose plan will only be used in very rare cases. Either when a fk has been deferred or if the query is being executed from within a volatile function which has been called by an UPDATE statement which has just modified the table causing a foreign key trigger to be queued. I'm willing to bet someone does this somewhere in the world, but the query that's run would also have to have a removable join. (One of the regression tests I've added exercises this)

For that reason I thought it was best to generate only 2 plans. One with *all* possible removable rels removed, and a backup one with nothing removed which will be executed if there's any FK triggers queued up.
 
It also seems that transitioning through needless nodes comes at a cost. This is why I quite liked the Alternative Plan node idea, as it allowed me to skip over the alternative plan node at plan initialisation.

For init I would expect this to result in a smaller number of nodes than a top-level Alternate, because you wouldn't be duplicating all the stuff above the joins. That said, I rather doubt it's worth worrying about the cost to init; won't it be completely swamped by extra planning cost, no matter how we go about this?


I'm not worried about the cost of the decision at plan init time. I was just confused about what Tom was recommending. I couldn't quite decide from his email if he meant to keep the alternative plan node around so that the executor must transition through it for each row, or to just choose the proper plan at executor init and return the actual root of the selected plan instead of returning the initialised AlternativePlan node (see nodeAlternativePlan.c)

The two ways of doing this have a massively different look in the EXPLAIN output. With the method the patch currently implements only 1 of the 2 alternative plans are seen by EXPLAIN, this is because I've coded ExecInitAlternativePlan() to return the root node only 1 of the 2 plans. If I had kept the AlternativePlan node around then the EXPLAIN output would have 2 plans, both sitting under the AlternativePlan node. 

I've attached a rebased patch which is based on master as of today.

Any comments/reviews are welcome.

Regards

David Rowley
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Voronin
Дата:
Сообщение: Question about TEMP tables
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Removing INNER JOINs