Re: Parallel INSERT (INTO ... SELECT ...)

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: Parallel INSERT (INTO ... SELECT ...)
Дата
Msg-id CA+hUKG+D6rA=dbYntrcwoA8kYWB5LAQ0uay+q1u7QLZo98OLJw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Parallel INSERT (INTO ... SELECT ...)
Список pgsql-hackers
On Sun, Oct 11, 2020 at 12:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> + /*
> + * For Parallel INSERT, provided no tuples are returned from workers
> + * to gather/leader node, don't add a cost-per-row, as each worker
> + * parallelly inserts the tuples that result from its chunk of plan
> + * execution. This change may make the parallel plan cheap among all
> + * other plans, and influence the planner to consider this parallel
> + * plan.
> + */
> + if (!(IsA(path->subpath, ModifyTablePath) &&
> + castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
> + castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
> + {
> + run_cost += parallel_tuple_cost * path->path.rows;
> + }
>
> Isn't the last condition in above check "castNode(ModifyTablePath,
> path->subpath)->returningLists != NULL" should be
> "castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
> instead? Because otherwise when there is returning list it won't count
> the cost for passing tuples via Gather node. This might be reason of
> what Thomas has seen in his recent email [2].

Yeah, I think this is trying to fix the problem too late.  Instead, we
should fix the incorrect row estimates so we don't have to fudge it
later like that.  For example, this should be estimating rows=0:

postgres=# explain analyze insert into s select * from t t1 join t t2 using (i);
...
 Insert on s  (cost=30839.08..70744.45 rows=1000226 width=4) (actual
time=2940.560..2940.562 rows=0 loops=1)

I think that should be done with something like this:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root,
RelOptInfo *rel,
                if (lc == list_head(subpaths))  /* first node? */
                        pathnode->path.startup_cost = subpath->startup_cost;
                pathnode->path.total_cost += subpath->total_cost;
-               pathnode->path.rows += subpath->rows;
+               if (returningLists != NIL)
+                       pathnode->path.rows += subpath->rows;
                total_size += subpath->pathtarget->width * subpath->rows;
        }

-       /*
-        * Set width to the average width of the subpath outputs.  XXX this is
-        * totally wrong: we should report zero if no RETURNING, else an average
-        * of the RETURNING tlist widths.  But it's what happened historically,
-        * and improving it is a task for another day.
-        */
        if (pathnode->path.rows > 0)
                total_size /= pathnode->path.rows;
        pathnode->path.pathtarget->width = rint(total_size);



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Parallel INSERT (INTO ... SELECT ...)
Следующее
От: Noah Misch
Дата:
Сообщение: Re: powerpc pg_atomic_compare_exchange_u32_impl: error: comparison of integer expressions of different signedness (Re: pgsql: For all ppc compilers, implement compare_exchange and) fetch_add