Re: [HACKERS] why not parallel seq scan for slow functions

Поиск
Список
Период
Сортировка
От Marina Polyakova
Тема Re: [HACKERS] why not parallel seq scan for slow functions
Дата
Msg-id afd43869dcb18830118e9bcd813cbb4c@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [HACKERS] why not parallel seq scan for slow functions  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Hello everyone in this thread!

On 29-11-2017 8:01, Michael Paquier wrote:
> Moved to next CF for extra reviews.

Amit, I would like to ask some questions about your patch (and can you 
please rebase it on the top of the master?):

1)
+            path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * 
path->rows;

Here we undo the changes that we make in this function earlier:

path->total_cost += target->cost.startup - oldcost.startup +
    (target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Perhaps we should not start these "reversible" changes in this case from 
the very beginning?

2)
          gpath->subpath = (Path *)
              create_projection_path(root,
                                     gpath->subpath->parent,
                                     gpath->subpath,
                                     target);
...
+        if (((ProjectionPath *) gpath->subpath)->dummypp)
+        {
...
+            path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * 
gpath->subpath->rows;
+        }
+        else
+        {
...
+            path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * 
gpath->subpath->rows;
+        }

As I understand it, here in the if-else block we change the run costs of 
gpath in the same way as they were changed for its subpath in the 
function create_projection_path earlier. But for the startup costs we 
always subtract the cost of the old target:

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
    (target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Should we change the startup costs of gpath in this way if 
((ProjectionPath *) gpath->subpath)->dummypp is false?

3)
      simple_gather_path = (Path *)
          create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
                             NULL, NULL);
+    /* Add projection step if needed */
+    if (target && simple_gather_path->pathtarget != target)
+        simple_gather_path = apply_projection_to_path(root, rel, 
simple_gather_path, target);
...
+        path = (Path *) create_gather_merge_path(root, rel, subpath, 
rel->reltarget,
+                                                 subpath->pathkeys, NULL, NULL);
+        /* Add projection step if needed */
+        if (target && path->pathtarget != target)
+            path = apply_projection_to_path(root, rel, path, target);
...
@@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
...
          gpath->subpath = (Path *)
              create_projection_path(root,
                                     gpath->subpath->parent,
                                     gpath->subpath,
                                     target);

The target is changing so we change it for the gather(merge) node and 
for its subpath. Do not we have to do this work (replace the subpath by 
calling the function create_projection_path if the target is different) 
in the functions create_gather(_merge)_path too? I suppose that the 
target of the subpath affects its costs => the costs of the 
gather(merge) node in the functions cost_gather(_merge) (=> the costs of 
the gather(merge) node in the function apply_projection_to_path).

4)
+         * Consider ways to implement parallel paths.  We always skip
+         * generating parallel path for top level scan/join nodes till the
+         * pathtarget is computed.  This is to ensure that we can account for
+         * the fact that most of the target evaluation work will be performed
+         * in workers.
+         */
+        generate_gather_paths(root, current_rel, scanjoin_target);
+
+        /* Set or update cheapest_total_path and related fields */
+        set_cheapest(current_rel);

As I understand it (if correctly, thank the comments of Robert Haas and 
Tom Lane :), after that we cannot use the function 
apply_projection_to_path for paths in the current_rel->pathlist without 
risking that the cheapest path will change. And we have several calls to 
the function adjust_paths_for_srfs (which uses apply_projection_to_path 
for paths in the current_rel->pathlist) in grouping_planner after 
generating the gather paths:

        /* Now fix things up if scan/join target contains SRFs */
        if (parse->hasTargetSRFs)
            adjust_paths_for_srfs(root, current_rel,
                                  scanjoin_targets,
                                  scanjoin_targets_contain_srfs);
...
            /* Fix things up if grouping_target contains SRFs */
            if (parse->hasTargetSRFs)
                adjust_paths_for_srfs(root, current_rel,
                                      grouping_targets,
                                      grouping_targets_contain_srfs);
...
            /* Fix things up if sort_input_target contains SRFs */
            if (parse->hasTargetSRFs)
                adjust_paths_for_srfs(root, current_rel,
                                      sort_input_targets,
                                      sort_input_targets_contain_srfs);
...
        /* Fix things up if final_target contains SRFs */
        if (parse->hasTargetSRFs)
            adjust_paths_for_srfs(root, current_rel,
                                  final_targets,
                                  final_targets_contain_srfs);

Maybe we should add the appropriate call to the function set_cheapest() 
after this if parse->hasTargetSRFs is true?

5)
+     * If this is a baserel and not the top-level rel, consider gathering 
any
+     * partial paths we may have created for it.  (If we tried to gather
+     * inheritance children, we could end up with a very large number of
+     * gather nodes, each trying to grab its own pool of workers, so don't 
do
+     * this for otherrels.  Instead, we'll consider gathering partial 
paths
+     * for the parent appendrel.).  We can check for joins by counting the
+     * membership of all_baserels (note that this correctly counts 
inheritance
+     * trees as single rels).  The gather path for top-level rel is 
generated
+     * once path target is available.  See grouping_planner.
       */
-    if (rel->reloptkind == RELOPT_BASEREL)
-        generate_gather_paths(root, rel);
+    if (rel->reloptkind == RELOPT_BASEREL &&
+        bms_membership(root->all_baserels) != BMS_SINGLETON)
+        generate_gather_paths(root, rel, NULL);

Do I understand correctly that here there's an assumption about 
root->query_level (which we don't need to check)?

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: [PATCH] session_replication_role = replica with TRUNCATE
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: [HACKERS] [PATCH] Lockable views