Re: [HACKERS] Parallel Append implementation

From: Robert Haas
Subject: Re: [HACKERS] Parallel Append implementation
Msg-id: CA+TgmoamMxv_ufrdUxxUhAdfuTgs7119JCmguf78KuJC0XM7LA@mail.gmail.com
In reply to: Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses: Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
           Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List: pgsql-hackers
On Thu, Feb 16, 2017 at 1:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> What I was thinking about is something like this:
>>
>> 1. First, take the maximum parallel_workers value from among all the children.
>>
>> 2. Second, compute log2(num_children)+1 and round up.  So, for 1
>> child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
>> for 9-16 children, 5, and so on.
>>
>> 3. Use as the number of parallel workers for the children the maximum
>> of the value computed in step 1 and the value computed in step 2.
>
> Ah, now that I look closely at compute_parallel_worker(), I see what
> you are getting at.
>
> For a plain unpartitioned table, parallel_workers is calculated as
> roughly log(num_pages) (actually log3). So if the table size is n,
> the number of workers will be log(n). If the table is partitioned
> into p partitions of size n/p each, the number of workers should
> still be log(n). Whereas in the patch it is calculated as the total
> of all the child workers, i.e. p * log(n/p) for this case. But
> log(n) != p * log(n/p). For example, log(1000) is much less than
> log(300) + log(300) + log(300).
>
> That means the value calculated in the patch turns out to be much
> larger than if it were calculated as log(total of sizes of all
> children). So I think for step 2 above, a log(total_rel_size)
> formula seems appropriate. What do you think? (For
> compute_parallel_worker(), it is actually log3, by the way.)
>
> BTW this formula is just an extension of how parallel_workers is
> calculated for an unpartitioned table.

log(total_rel_size) would be a reasonable way to estimate workers when
we're scanning an inheritance hierarchy, but I'm hoping Parallel
Append is also going to apply to UNION ALL queries, where there's no
concept of the total rel size.  For that we need something else, which
is why the algorithm that I proposed upthread doesn't rely on it.
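To make that concrete, here is a rough sketch in C of steps 1-3 from
my earlier message.  The function name is mine, not from the patch,
and the caller is assumed to have already taken the maximum of the
children's parallel_workers values (step 1); the result would
presumably still get clamped to max_parallel_workers_per_gather the
way we do elsewhere.

/* Sketch only: combine step 1 (max_child_workers) with step 2. */
static int
append_parallel_workers(int max_child_workers, int num_children)
{
    int     log2_workers = 1;

    /* Step 2: ceil(log2(num_children)) + 1, computed without libm. */
    while (num_children > 1)
    {
        num_children = (num_children + 1) / 2;
        log2_workers++;
    }

    /* Step 3: use the larger of the two estimates. */
    return (max_child_workers > log2_workers) ? max_child_workers
                                              : log2_workers;
}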

>> The decision to use fewer workers for a smaller scan isn't really
>> because we think that using more workers will cause a regression.
>> It's because we think it may not help very much, and because it's not
>> worth firing up a ton of workers for a relatively small scan given
>> that workers are a limited resource.  I think once we've got a bunch
>> of workers started, we might as well try to use them.
>
> One possible side-effect I see is that other sessions might not get
> a fair share of workers. But then again, there is a counter-argument:
> because Append now focuses all the workers on the last remaining
> subplan, it may finish faster, and release *all* of its workers
> earlier.

Right.  I think in general it's pretty clear that there are possible
fairness problems with parallel query.  The first process that comes
along seizes however many workers it thinks it should use, and
everybody else can use whatever (if anything) is left.  In the long
run, I think it would be cool to have a system where workers can leave
one parallel query in progress and join a different one (or exit and
spawn a new worker to join a different one), automatically rebalancing
as the number of parallel queries in flight fluctuates.  But that's
clearly way beyond anything we can do right now.  I think we should
assume that any parallel workers our process has obtained are ours to
use for the duration of the query, and use them as best we can.  Note
that even if the Parallel Append tells one of the workers that there
are no more tuples and it should go away, some higher level of the
query plan could make a different choice anyway; there might be
another Append elsewhere in the plan tree.

> BTW, there is going to be some logic change in the choose-next-subplan
> algorithm if we consider giving extra workers to subplans.

I'm not sure that it's going to be useful to make this logic very
complicated.  I think the most important thing is to give 1 worker to
each plan before we give a second worker to any plan.  In general I
think it's sufficient to assign a worker that becomes available to the
subplan with the fewest workers (or one of them, if there's a tie),
without worrying too much about the target number of workers for that
subplan.
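
In code, that rule might look something like the sketch below.  The
struct and its field names are purely illustrative, not what the
patch actually uses, and I'm assuming PostgreSQL's usual includes for
bool.

typedef struct ParallelAppendInfo
{
    int     pa_nplans;      /* number of Append subplans */
    int    *pa_nworkers;    /* workers currently on each subplan */
    bool   *pa_finished;    /* subplan has returned all its tuples? */
} ParallelAppendInfo;

/* Give an incoming worker the unfinished subplan with fewest workers. */
static int
choose_next_subplan(ParallelAppendInfo *pa)
{
    int     best = -1;
    int     i;

    for (i = 0; i < pa->pa_nplans; i++)
    {
        if (pa->pa_finished[i])
            continue;       /* nothing left to do here */
        if (best < 0 || pa->pa_nworkers[i] < pa->pa_nworkers[best])
            best = i;
    }

    /* Caller treats -1 as "no work left; this worker can exit". */
    return best;
}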

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


