Re: Parallel INSERT SELECT take 2
От | Amit Kapila |
---|---|
Тема | Re: Parallel INSERT SELECT take 2 |
Дата | |
Msg-id | CAA4eK1LYbNc2V1KrqqBE4XT7JXPPjkAP9gK2bDrhqVGcBPMpKQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Parallel INSERT SELECT take 2 ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Ответы |
Re: Parallel INSERT SELECT take 2
|
Список | pgsql-hackers |
On Mon, Apr 12, 2021 at 6:52 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > > SOLUTION TO (1) > ======================================== > > The candidate ideas are: > > 1) Caching the result of parallel-safety check > The planner stores the result of checking parallel safety for each relation in relcache, or some purpose-built hash tablein shared memory. > > The problems are: > > * Even if the target relation turns out to be parallel safe by looking at those data structures, we cannot assume it remainstrue until the SQL statement finishes. For instance, other sessions might add a parallel-unsafe index to its descendantrelations. Other examples include that when the user changes the parallel safety of indexes or triggers by runningALTER FUNCTION on the underlying index AM function or trigger function, the relcache entry of the table or index isnot invalidated, so the correct parallel safety is not maintained in the cache. > In that case, when the executor encounters a parallel-unsafe object, it can change the cached state as being parallel-unsafeand error out. > > * Can't ensure fast access. With relcache, the first access in each session has to undergo the overhead of parallel-safetycheck. With a hash table in shared memory, the number of relations stored there would be limited, so thefirst access after database startup or the hash table entry eviction similarly experiences slowness. > > * With a new hash table, some lwlock for concurrent access must be added, which can have an adverse effect on performance. > > > 2) Enabling users to declare that the table allows parallel data modification > Add a table property that represents parallel safety of the table for DML statement execution. Users specify it as follows: > > CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE }; > ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE }; > > This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just like pg_proc's proparallel. The defaultis UNSAFE. > So, in short, if we need to go with any sort of solution with caching, we can't avoid (a) locking all the partitions (b) getting an error while execution because at a later point user has altered the parallel-safe property of a relation. We can't avoid locking all the partitions because while we are executing the statement, the user can change the parallel-safety for one of the partitions by changing a particular partition and if we didn't have a lock on that partition, it will lead to an error during execution. Now, here, one option could be that we document this point and then don't take lock on any of the partitions except for root table. So, the design would be simpler, that we either cache the parallel-safe in relcache or shared hash table and just lock the parent table and perform all parallel-safety checks for the first time. I think if we want to go with the behavior that we will error out during statement execution if any parallel-safe property is changed at run-time, it is better to go with the declarative approach. In the declarative approach, at least the user will be responsible for taking any such decision and the chances of toggling the parallel-safe property will be less. To aid users, as suggested, we can provide a function to determine parallel-safety of relation for DML operations. Now, in the declarative approach, we can either go with whatever the user has mentioned or we can do some checks during DDL to determine the actual parallel-safety. I think even if try to determine parallel-safety during DDL it will be quite tricky in some cases, like when a user performs Alter Function to change parallel-safety of the function used in some constraint for the table or if the user changes parallel-safety of one of the partition then we need to traverse the partition hierarchy upwards which doesn't seem advisable. So, I guess it is better to go with whatever the user has mentioned but if you or others feel we can have some sort of parallel-safety checks during DDL as well. > The planner assumes that all of the table, its descendant partitions, and their ancillary objects have the specified parallelsafety or safer one. The user is responsible for its correctness. If the parallel processes find an object thatis less safer than the assumed parallel safety during statement execution, it throws an ERROR and abort the statementexecution. > > The objects that relate to the parallel safety of a DML target table are as follows: > > * Column default expression > * DOMAIN type CHECK expression > * CHECK constraints on column > * Partition key > * Partition key support function > * Index expression > * Index predicate > * Index AM function > * Operator function > * Trigger function > > When the parallel safety of some of these objects is changed, it's costly to reflect it on the parallel safety of tablesthat depend on them. So, we don't do it. Instead, we provide a utility function pg_get_parallel_safety('table_name')that returns records of (objid, classid, parallel_safety) that represent the parallelsafety of objects that determine the parallel safety of the specified table. The function only outputs objects thatare not parallel safe. > So, users need to check count(*) for this to determine parallel-safety? How about if we provide a wrapper function on top of this function or a separate function that returns char to indicate whether it is safe, unsafe, or restricted to perform a DML operation on the table? > How does the executor detect parallel unsafe objects? There are two ways: > > 1) At loading time > When the executor loads the definition of objects (tables, constraints, index, triggers, etc.) during the first accessto them after session start or their eviction by sinval message, it checks the parallel safety. > > This is a legitimate way, but may need much code. Also, it might overlook necessary code changes without careful inspection. > If we want to go with a declarative approach, then I think we should try to do this because it will be almost free in some cases and we can detect error early. For example, when we decide to insert in a partition that is declared unsafe whereas the root (partitioned) table is declared safe. -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Julien RouhaudДата:
Сообщение: Re: Remove "FROM" in "DELETE FROM" when using tab-completion
Следующее
От: Dilip KumarДата:
Сообщение: Re: Remove "FROM" in "DELETE FROM" when using tab-completion