Re: Parallel Append implementation

From: Andres Freund
Subject: Re: Parallel Append implementation
Msg-id: 20170406020323.ef6tyffrg6lzdpvw@alap3.anarazel.de
In reply to: Re: Parallel Append implementation (Amit Khandekar <amitdkhan.pg@gmail.com>)
List: pgsql-hackers
On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
> This is what the earlier versions of my patch had done: just add up
> per-subplan parallel_workers (1 for non-partial subplan and
> subpath->parallel_workers for partial subplans) and set this total as
> the Append parallel_workers.
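The summing rule quoted above can be sketched as follows, together with the idle-worker problem raised in the reply. This is a minimal illustration, not the patch's code: `Subplan`, `summed_parallel_workers` and `utilization` are hypothetical names, and the utilization figure assumes only non-partial subplans, each running on its own worker, so elapsed time equals the cost of the most expensive one.

```python
from dataclasses import dataclass


@dataclass
class Subplan:
    # Hypothetical stand-in for an Append child path; not a PostgreSQL struct.
    cost: float
    is_partial: bool
    parallel_workers: int = 1  # only consulted for partial subplans


def summed_parallel_workers(subplans):
    """Earlier-patch rule: 1 worker per non-partial subplan, plus each
    partial subplan's own parallel_workers."""
    return sum(sp.parallel_workers if sp.is_partial else 1 for sp in subplans)


def utilization(subplans, nworkers):
    """Fraction of worker time spent busy. Assumes non-partial subplans only,
    one per worker, so elapsed time equals the costliest subplan."""
    elapsed = max(sp.cost for sp in subplans)
    return sum(sp.cost for sp in subplans) / (nworkers * elapsed)


# One very expensive subplan plus a few cheap ones.
plans = [Subplan(100.0, False)] + [Subplan(1.0, False) for _ in range(3)]
nworkers = summed_parallel_workers(plans)
print(nworkers, utilization(plans, nworkers))
```

With these made-up costs the rule requests 4 workers, but they are busy for only 103 of 4 × 100 units of worker time (about 26%), because three of them finish almost immediately and then sit idle.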

I don't think that's great. Consider, e.g., the case where you have one
very expensive query and a bunch of cheaper ones: most of those workers
wouldn't do much while waiting for the expensive query. What I'm
basically thinking we should do is something like the following
pythonesque pseudocode:

best_nonpartial_cost = -1
best_nonpartial_nworkers = -1

for numworkers in range(1, max_workers + 1):
    worker_work = [0 for x in range(0, numworkers)]
    nonpartial_cost = startup_cost * numworkers

    # Distribute all non-partial tasks over workers. Assign each task to
    # the worker with the least amount of work already assigned.
    for task in all_nonpartial_subqueries:
        least_busy_worker = worker_work.index(min(worker_work))
        worker_work[least_busy_worker] += task.total_nonpartial_cost

    # The non-partial cost here is the largest amount any single worker
    # has to perform.
    nonpartial_cost += max(worker_work)

    total_partial_cost = 0
    for task in all_partial_subqueries:
        total_partial_cost += task.total_nonpartial_cost

    # Compute resources needed by partial tasks. First compute how much
    # cost we can distribute to workers that finish sooner than the
    # "busiest" worker doing non-partial tasks.
    remaining_avail_work = 0
    for i in range(0, numworkers):
        remaining_avail_work += max(worker_work) - worker_work[i]

    # Equally divide the remaining work over all workers.
    if remaining_avail_work < total_partial_cost:
        nonpartial_cost += (total_partial_cost - remaining_avail_work) / numworkers

    # Check if this is the best number of workers.
    if best_nonpartial_cost == -1 or best_nonpartial_cost > nonpartial_cost:
        best_nonpartial_cost = nonpartial_cost
        best_nonpartial_nworkers = numworkers
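For experimenting with the estimate, here is a self-contained, runnable sketch of the loop above. It is an illustration under assumptions, not planner code: `Task` and `choose_num_workers` are hypothetical names, costs are abstract numbers, and the "least busy worker" lookup uses a `heapq` min-heap instead of a linear scan (same result, ties go to the lowest worker index).

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical: cost of running the whole subplan in one process.
    total_nonpartial_cost: float


def choose_num_workers(nonpartial_tasks, partial_tasks, max_workers, startup_cost):
    """Return (best_cost, best_nworkers) following the estimate above."""
    best_cost, best_nworkers = -1.0, -1
    total_partial_cost = sum(t.total_nonpartial_cost for t in partial_tasks)
    for numworkers in range(1, max_workers + 1):
        # Min-heap of (assigned work, worker index): popping yields the
        # least busy worker, ties broken by the lower index.
        heap = [(0.0, i) for i in range(numworkers)]
        for task in nonpartial_tasks:
            work, i = heapq.heappop(heap)
            heapq.heappush(heap, (work + task.total_nonpartial_cost, i))
        worker_work = [work for work, _ in heap]
        busiest = max(worker_work)
        cost = startup_cost * numworkers + busiest
        # Idle time on the other workers while the busiest one finishes.
        remaining_avail_work = sum(busiest - w for w in worker_work)
        if remaining_avail_work < total_partial_cost:
            # Spread whatever partial work does not fit into that idle time
            # evenly across all workers.
            cost += (total_partial_cost - remaining_avail_work) / numworkers
        if best_cost == -1 or cost < best_cost:
            best_cost, best_nworkers = cost, numworkers
    return best_cost, best_nworkers


nonpartial = [Task(100.0), Task(10.0), Task(10.0), Task(10.0)]
print(choose_num_workers(nonpartial, [Task(30.0)], max_workers=4, startup_cost=1.0))
```

With these made-up costs the sketch settles on 2 workers (estimated cost 102). The three cheap tasks and the partial task all fit into the second worker's idle time while the expensive task runs, so adding workers only adds startup cost. The summing rule would instead request one worker per non-partial task plus the partial subplan's own workers. `total_partial_cost` is hoisted out of the loop because it does not depend on `numworkers`.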

Does that make sense?


> BTW all of the above points apply only for non-partial plans.

Indeed. But I think that's going to be a pretty common type of plan,
especially if we get partitionwise joins.


Greetings,

Andres Freund
