Re: [HACKERS] why not parallel seq scan for slow functions

From: Jeff Janes
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Date:
Msg-id: CAMkU=1ymvFbTCYFgzj45_EMzBg=ddQ_m2j3cObzU=vywqttf-A@mail.gmail.com
In reply to: Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >> >>
> >> >> So because of this high projection cost the seqpath and parallel path
> >> >> both have fuzzily same cost but seqpath is winning because it's
> >> >> parallel safe.
> >> >
> >> > I think you are correct.  However, unless parallel_tuple_cost is set very
> >> > low, apply_projection_to_path never gets called with the Gather path as an
> >> > argument.  It gets ruled out at some earlier stage, presumably because it
> >> > assumes the projection step cannot make it win if it is already behind by
> >> > enough.
> >> >
> >>
> >> I think that is genuine because tuple communication cost is very high.
> >
> > Sorry, I don't know which you think is genuine, the early pruning or my
> > complaint about the early pruning.
> >
>
> Early pruning.  See, currently, we don't have a way to maintain both
> parallel and non-parallel paths till later stage and then decide which
> one is better. If we want to maintain both parallel and non-parallel
> paths, it can increase planning cost substantially in the case of
> joins.  Now, surely it can have benefit in many cases, so it is a
> worthwhile direction to pursue.

If I understand correctly, we do have a way; it can just lead to an exponential explosion, so we are afraid to use it. Correct?  If I lobotomize the path domination code (make the test at pathnode.c line 466 always false):

                if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

then it keeps the parallel plan and later chooses it (after applying your other patch in this thread) as the overall best plan.  It doesn't even slow down "make installcheck-parallel" by very much, which I guess just means the regression tests don't have a lot of complex joins.
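
To make the effect of early pruning concrete, here is a toy model in C. It is only a sketch under stated assumptions, not PostgreSQL's cost model or its add_path() logic: the ToyPath struct, the helper names, and all cost formulas are invented for illustration, and it assumes the function evaluation can be pushed below the Gather so that the workers share it. The keep_all flag plays the role of the JJ_all_paths switch above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a planner path; not PostgreSQL's Path struct. */
typedef struct ToyPath
{
    double base_cost;   /* cost known when paths are first compared */
    double proj_cost;   /* cost of the slow target-list function, added later */
} ToyPath;

/* Sequential scan: one process reads every row and runs the function on it. */
ToyPath
toy_seq_path(double rows, double cpu_per_row, double func_cost)
{
    ToyPath p;

    p.base_cost = rows * cpu_per_row;
    p.proj_cost = rows * func_cost;
    return p;
}

/*
 * Gather over a parallel scan: scan and function work are split evenly
 * among the workers (an assumption), but every row pays tuple_cost to be
 * shipped to the leader.
 */
ToyPath
toy_gather_path(double rows, double cpu_per_row, double func_cost,
                int workers, double tuple_cost)
{
    ToyPath p;

    assert(workers > 0);
    p.base_cost = rows * cpu_per_row / workers + rows * tuple_cost;
    p.proj_cost = rows * func_cost / workers;
    return p;
}

/*
 * Return the index of the chosen path.
 *
 * keep_all == 0: early pruning. Only the path with the lowest base_cost
 * survives, so the projection cost never influences the choice.
 * keep_all != 0: every path survives and the choice is made on
 * base_cost + proj_cost.
 */
size_t
choose_path(const ToyPath *paths, size_t n, int keep_all)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
    {
        double cand = paths[i].base_cost;
        double cur = paths[best].base_cost;

        if (keep_all)
        {
            cand += paths[i].proj_cost;
            cur += paths[best].proj_cost;
        }
        if (cand < cur)
            best = i;
    }
    return best;
}
```

For example, with 10000 rows, a per-row scan cost of 0.1, a function cost of 10 per row, 4 workers and a tuple transfer cost of 0.1, the sequential path is cheaper before the function is costed (1000 vs 1250) but far more expensive afterwards (101000 vs 26250). So choose_path() picks the sequential path with keep_all = 0 and the parallel one with keep_all = 1.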

But what is an acceptable solution?  Is there a heuristic for when retaining a parallel path could be helpful, the same way there is for fast-start paths?  It seems like the best thing would be to include the evaluation costs at this step in the first place.

Why is the path-cost domination code run before the cost of the function evaluation is included?  Is that because the information needed to compute it is not available at that point, or because it would be too slow to include it at that point? Or just because no one thought it important to do?

Cheers,

Jeff
