Re: [bug?] Missed parallel safety checks, and wrong parallel safety

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Дата	28 июня 2021 г. 09:51:34
Msg-id	CAA4eK1Jwz8xGss4b0-33eyX0i5W_1CnqT16DjB9snVC--DoOsQ@mail.gmail.com обсуждение исходный текст
Ответ на	RE: [bug?] Missed parallel safety checks, and wrong parallel safety ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Ответы	Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Список	pgsql-hackers

Дерево обсуждения

On Mon, Jun 21, 2021 at 4:40 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> To be honest, I didn't find a cheap way to invalidate partitioned table's
> parallel safety automatically.
>

I also don't see the feasibility for doing parallelism checks for
partitioned tables both because it is expensive due to
traversing/locking all the partitions and then the invalidations are
difficult to handle due to deadlock hazards as discussed above.

Let me try to summarize the discussion so far and see if we can have
any better ideas than what we have discussed so far or we want to go
with one of the ideas discussed till now. I think we have broadly
discussed two approaches (a) to automatically decide whether
parallelism can be enabled for inserts, (b) provide an option to the
user to specify whether inserts can be parallelized on a relation.

For the first approach (a), we have evaluated both the partitioned and
non-partitioned relation cases. For non-partitioned relations, we can
compute the parallel-safety of relation during the planning and save
it in the relation cache entry. This is normally safe because we have
a lock on the relation and any change to the relation should raise an
invalidation which will lead to re-computation of parallel-safety
information for a relation. Now, there are cases where the
parallel-safety of some trigger function or a function used in index
expression can be changed by the user which won't register an
invalidation for a relation. To handle such cases, we can register a
new kind of invalidation only when a function's parallel-safety
information is changed. And every backend in the same database then
needs to re-evaluate the parallel-safety of every relation for which
it has cached a value. For partitioned relations, the similar idea
won't work because of multiple reasons (a) We need to traverse and
lock all the partitions to compute the parallel-safety of the root
relation which could be very expensive; (b) Whenever we invalidate a
particular partition, we need to invalidate its parent hierarchy as
well. We can't traverse the parent hierarchy without taking locks on
the parent table which can lead to deadlock. The alternative could be
that for partitioned relations we can rely on the user-specified
information about parallel-safety (like the approach-b mentioned in
the previous paragraph). We can additionally check the parallel safety
of partitions when we are trying to insert into a particular partition
and error out if we detect any parallel-unsafe clause and we are in
parallel-mode. So, in this case, we won't be completely relying on the
users. Users can either change the parallel safe option of the table
or remove/change the parallel-unsafe clause after an error.

For the second approach (b), we can provide an option to the user to
specify whether inserts (or other dml's) can be parallelized for a
relation. One of the ideas is to provide some options like below to
the user:
CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u',
'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.
Additionally, provide a function pg_get_parallel_safety(oid) using
which users can determine whether it is safe to enable parallelism.
Surely, after the user has checked with that function, one can add
some unsafe constraints to the table by altering the table but it will
still be an aid to enable parallelism on a relation.

The first approach (a) has an appeal because it would allow to
automatically parallelize inserts in many cases but might have some
overhead in some cases due to processing of relcache entries after the
parallel-safety of the relation is changed. The second approach (b)
has an appeal because of its consistent behavior for partitioned and
non-partitioned relations.

Among the above options, I would personally prefer (b) mainly because
of the consistent handling for partition and non-partition table cases
but I am fine with approach (a) as well if that is what other people
feel is better.

Thoughts?

-- 
With Regards,
Amit Kapila.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [bug?] Missed parallel safety checks, and wrong parallel safety