Re: Parallel copy
From: Amit Kapila
Subject: Re: Parallel copy
Msg-id: CAA4eK1+ANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA@mail.gmail.com
In reply to: Re: Parallel copy (Andres Freund <andres@anarazel.de>)
Replies: Re: Parallel copy (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, Apr 15, 2020 at 11:49 PM Andres Freund <andres@anarazel.de> wrote:
>
> To be clear, I was only thinking of using a ringbuffer to indicate split
> boundaries. And that workers would just pop entries from it before they
> actually process the data (stored outside of the ringbuffer). Since the
> split boundaries will always be read in order by workers, and the
> entries will be tiny, there's no need to avoid copying out entries.
>

I think the binary mode processing will be slightly different because, unlike the text and csv formats, the data is stored in Length, Value format for each column and there are no line markers. I don't think there will be a big difference, but we still need to record somewhere what the format of the data in the ring buffers is. Basically, we can copy the data in Length, Value format, and once the workers know about the format, they will parse the data appropriately. We currently also have a different way of parsing the binary format; see NextCopyFrom. I think we need to be careful about avoiding duplicate work as much as possible.

Apart from this, we have analyzed the other cases, as mentioned below, where we need to decide whether we can allow parallelism for the copy command.

Case-1: Do we want to enable parallelism for a copy when transition tables are involved?

Basically, during the copy, we capture tuples in transition tables for certain cases, like when an after statement trigger accesses the same relation on which we have a trigger. See the example below [1]. We decide this in the function MakeTransitionCaptureState. For such cases, we collect minimal tuples in a tuple store after processing them so that after statement triggers can access them later. Now, if we want to enable parallelism for such cases, we instead need to store and access tuples from a shared tuple store (sharedtuplestore.c/sharedtuplestore.h).
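To make the quoted scheme concrete, here is a minimal single-producer/single-consumer sketch of a ring buffer that carries only split boundaries, with the row data assumed to live outside the ring. All names (SplitBoundary, BoundaryRing, ring_push, ring_pop) are hypothetical; a real patch would place this in shared memory with proper synchronization:

```c
#include <stdint.h>

/*
 * Ring of chunk boundaries: the leader pushes (offset, length) entries for
 * each split-out line, and workers pop them in order before processing the
 * referenced bytes, which are stored outside the ring.  Since entries are
 * tiny, copying them out on pop is cheap, as noted in the quoted mail.
 */
#define RING_SIZE 8                /* must be a power of two */

typedef struct SplitBoundary
{
    uint64_t offset;               /* start of the line in the data area */
    uint32_t length;               /* length of the line in bytes */
} SplitBoundary;

typedef struct BoundaryRing
{
    SplitBoundary entries[RING_SIZE];
    uint64_t head;                 /* next slot the leader writes */
    uint64_t tail;                 /* next slot a worker reads */
} BoundaryRing;

/* Leader side: returns 0 if the ring is full, 1 on success. */
static int
ring_push(BoundaryRing *ring, uint64_t offset, uint32_t length)
{
    if (ring->head - ring->tail == RING_SIZE)
        return 0;
    ring->entries[ring->head % RING_SIZE].offset = offset;
    ring->entries[ring->head % RING_SIZE].length = length;
    ring->head++;
    return 1;
}

/* Worker side: copies the entry out; returns 0 if the ring is empty. */
static int
ring_pop(BoundaryRing *ring, SplitBoundary *out)
{
    if (ring->tail == ring->head)
        return 0;
    *out = ring->entries[ring->tail % RING_SIZE];
    ring->tail++;
    return 1;
}
```

Because workers always consume boundaries in order, a simple head/tail counter pair suffices here; for binary COPY the same ring would work, but each popped entry would point at Length, Value column data rather than at a newline-terminated line.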
However, it doesn't have the facility to store tuples in-memory, so we would always need to store and access them from a file, which could be costly unless we also add a way to keep minimal tuples in shared memory up to work_mem and only then spill to the shared tuple store. It is possible to do all or part of this work to enable parallel copy for such cases, but I am not sure if it is worth it. We can decide not to enable parallelism for such cases and allow it later if we see demand for it; that will also help us avoid introducing additional work/complexity in the first version of the patch.

Case-2: Single Insertion mode (CIM_SINGLE) is used in various scenarios, and whether we can allow parallelism for those has to be decided on a case-by-case basis, as discussed below:

a. When there are BEFORE/INSTEAD OF triggers on the table. We don't allow multi-inserts in such cases because such triggers might query the table we're inserting into and act differently if the tuples that have already been processed and prepared for insertion are not there. Now, if we allow parallelism with such triggers, the behavior would depend on whether the parallel worker has already inserted that particular row or not. I guess such functions should ideally be marked as parallel-unsafe. So, in short, whether to allow parallelism in this case depends upon the parallel-safety marking of the trigger function.

b. For partitioned tables, we can't support multi-inserts when there are any statement-level insert triggers. This is because, as of now, we expect that any before row insert and statement-level insert triggers are on the same relation. Now, there is no harm in allowing parallelism for such cases, but it depends on whether we have the infrastructure (basically, allowing tuples to be collected in a shared tuple store) to support statement-level insert triggers.

c. For inserts into foreign tables.
We can't allow parallelism in this case because each worker would need to establish its own FDW connection and operate in a separate transaction. Unless we have the capability to provide a two-phase commit protocol for "Transactions involving multiple postgres foreign servers" (which is being discussed in a separate thread [2]), we can't allow this.

d. If there are volatile default expressions or the WHERE clause contains a volatile expression. Here, we can check whether the expression is parallel-safe, and if so, allow parallelism.

Case-3: In the copy command, for performing foreign key checks, we take a KEY SHARE lock on primary key table rows, which in turn increments the command counter and updates the snapshot. Now, as we share the snapshot at the beginning of the command, we can't allow it to be changed later. So, unless we do something special for it, I think we can't allow parallelism in such cases.

That said, I couldn't think of many problems if we do allow parallelism here. One inconsistency, if we allow FK checks via workers, would be that at the end of COPY the value of the command counter will not be what we expect, as we wouldn't have accounted for the increments made by workers. Now, if COPY is being done inside a transaction, it will not assign the correct values to the next commands. Also, for executing deferred triggers, we use the transaction snapshot, so if anything is changed in the snapshot via parallel workers, ideally the changed snapshot should be synced back from the worker. The other concern could be that different workers can try to acquire a KEY SHARE lock on the same tuples, which they will be able to acquire due to group locking or otherwise, but I don't see any problem with it. I am not sure if the above leads to any user-visible problem, but I might be missing something here. I think if we can come up with any real problems, we can try to design a better solution to address them.
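The case-by-case reasoning for Case-2 above can be collapsed into a single predicate, sketched below. The struct fields and function name are hypothetical; in PostgreSQL the trigger-function marking would come from pg_proc.proparallel, and expression safety from a check along the lines of max_parallel_hazard():

```c
#include <stdbool.h>

/*
 * Hypothetical summary of the properties that matter for deciding whether
 * a given COPY target can use parallel workers, per cases (a)-(d) above.
 */
typedef struct CopyParallelInfo
{
    bool has_before_or_instead_trigger;   /* case (a) */
    bool trigger_funcs_parallel_safe;     /* e.g. pg_proc.proparallel == 's' */
    bool partitioned_with_stmt_trigger;   /* case (b) */
    bool target_is_foreign_table;         /* case (c) */
    bool has_volatile_expr;               /* case (d): defaults or WHERE */
    bool volatile_expr_parallel_safe;
} CopyParallelInfo;

static bool
copy_allows_parallelism(const CopyParallelInfo *info)
{
    /* (a) BEFORE/INSTEAD OF triggers: rely on the parallel-safety marking. */
    if (info->has_before_or_instead_trigger &&
        !info->trigger_funcs_parallel_safe)
        return false;

    /* (b) would need shared-tuple-store support for statement triggers. */
    if (info->partitioned_with_stmt_trigger)
        return false;

    /* (c) FDW connections are per-backend; no 2PC across workers yet. */
    if (info->target_is_foreign_table)
        return false;

    /* (d) volatile expressions are fine only if marked parallel-safe. */
    if (info->has_volatile_expr && !info->volatile_expr_parallel_safe)
        return false;

    return true;
}
```

The point of writing it this way is that only (b) and (c) are hard "no" answers pending new infrastructure; (a) and (d) reduce to existing parallel-safety markings.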
Case-4: For deferred triggers, it seems we record the CTIDs of tuples (via ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute the deferred triggers at transaction end using AfterTriggerFireDeferred, or at the end of the statement. The challenge in allowing parallelism for such cases is that we need to capture the CTID events in shared memory. For that, we would either need to invent new infrastructure for event capturing in shared memory, which would be a huge task on its own, or pass the CTIDs via shared memory and have the leader add them to the event queues; but in that case we need to ensure the ordering of the CTIDs (basically, it should be the same order in which we processed them).

[1] -
create or replace function dump_insert() returns trigger language plpgsql as
$$
  begin
    raise notice 'trigger = %, new table = %',
                 TG_NAME,
                 (select string_agg(new_table::text, ', ' order by a)
                    from new_table);
    return null;
  end;
$$;

create table test (a int);
create trigger trg1_test after insert on test referencing new table as new_table
  for each statement execute procedure dump_insert();

copy test (a) from stdin;
1
2
3
\.

[2] - https://www.postgresql.org/message-id/20191206.173215.1818665441859410805.horikyota.ntt%40gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com