Re: [DESIGN] ParallelAppend

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [DESIGN] ParallelAppend
Дата
Msg-id CA+TgmoYzGJ208n-pzQCx71e139o5XMQ6TJLaotRPZYm-N7=BPg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [DESIGN] ParallelAppend  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [DESIGN] ParallelAppend
Список pgsql-hackers
On Wed, Nov 18, 2015 at 7:25 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Don't we need the startup cost incase we need to build partial paths for
> joinpaths like mergepath?
> Also, I think there are other cases for single relation scan where startup
> cost can matter like when there are psuedoconstants in qualification
> (refer cost_qual_eval_walker()) or let us say if someone has disabled
> seq scan (disable_cost is considered as startup cost.)

I'm not saying that we don't need to compute it.  I'm saying we don't
need to take it into consideration when deciding which paths have
merit.   Note that consider_statup is set this way:

    rel->consider_startup = (root->tuple_fraction > 0);

And for a joinrel:

    joinrel->consider_startup = (root->tuple_fraction > 0);

root->tuple_fraction is 0 when we expect all tuples to be retrieved,
and parallel query can currently only be used when all tuples will be
retrieved.

> B. I think partial path is an important concept and desrves some
> explanation in src/backend/optimizer/README.
> There is already a good explanation about Paths, so I think it
> seems that it is better to add explanation about partial paths.

Good idea.  In the attached, revised version of the patch, I've added
a large new section to that README.

> 2.
> + *  costs: parallelism is only used for plans that will be run to
> completion.
> + *    Therefore, this
> routine is much simpler than add_path: it needs to
> + *    consider only pathkeys and total cost.
>
> There seems to be some spacing issue in last two lines.

Fixed.

> A.
> This means that for inheritance child relations for which rel pages are
> less than parallel_threshold, it will always consider the cost shared
> between 1 worker and leader as per below calc in cost_seqscan:
> if (path->parallel_degree > 0)
> run_cost = run_cost / (path->parallel_degree + 0.5);
>
> I think this might not be the appropriate cost model for even for
> non-inheritence relations which has pages more than parallel_threshold,
> but it seems to be even worst for inheritance children which have
> pages less than parallel_threshold

Why?  I'm certainly open to patches to improve the cost model, but I
don't see why this patch needs to do that.

> B.
> Will it be possible that if none of the inheritence child rels (or very few
> of them) are big enough for parallel scan, then considering Append
> node for parallelism of any use or in other words, won't it be better
> to generate plan as it is done now without this patch for such cases
> considering current execution model of Gather node?

I think we should instead extend the Append node as suggested by
KaiGai, so that it can have both partial and non-partial children.  I
think we can leave that to another patch, though.  Aside from
questions of what the fastest plan are, it's very bad to have Gather
nodes under the Append, because the Append could have many children
and we could end up with a ton of Gather nodes, each using a DSM and a
bunch of workers.  That's important to avoid.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Using quicksort for every external sort run
Следующее
От: Greg Stark
Дата:
Сообщение: Re: LLVM miscompiles numeric.c access to short numeric var headers