RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

From: tsunakawa.takay@fujitsu.com
Subject: RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
Msg-id: TYAPR01MB2990BC344B1C307CF5C3BC37FE619@TYAPR01MB2990.jpnprd01.prod.outlook.com
In reply to: Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-committers

From: Robert Haas <robertmhaas@gmail.com>
> On Wed, Mar 24, 2021 at 12:48 AM Andres Freund <andres@anarazel.de>
> wrote:
> > Although this specific hack doesn't seem too terrible to me. If you
> > execute a parallel insert the likelihood to end up not needing an xid is
> > pretty low. Implementing it concurrently does seem like it'd end up
> > needing another lwlock nested around xid assignment, or some more
> > complicated scheme with already holding XidGenLock or retries. But maybe
> > I'm missing an easy solution here.
> 
> I don't think you need to do anything that is known outside the group
> of processes involved in the parallel query. I think you just need to
> make sure that only one of them is trying to acquire an XID at a time,
> and that all the others find out about it. I haven't thought too hard
> about the timing: if one process acquires an XID for the transaction,
> is it OK if the others do an arbitrary amount of work before they
> realize that this has happened? Also, there's the problem that the
> leader has the whole transaction stack and the workers don't, so the
> recursive nature of XID acquisition is a problem. I suspect these are
> all pretty solvable problems; I just haven't put in the energy. But,
> it could also be that I'm missing something.

It doesn't seem easy to make parallel workers allocate an XID and share it among the parallel processes.  When the DML
is run inside a deeply nested subtransaction and the parent transactions have not allocated their XIDs yet, the worker
needs to allocate the XIDs for its parents.  That indeterminate number of XIDs must be stored in shared memory.  The
stack of TransactionState structures must also be passed.  Also, TransactionIdIsCurrentTransactionId() uses an array
ParallelCurrentXids where parallel workers receive sub-committed XIDs from the leader.  This needs to be reconsidered.
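
To make the nested-subtransaction point concrete, here is a toy model in Python.  It is only a sketch and not
PostgreSQL code: TxnState, XidCounter, assign_xid() and make_stack() are hypothetical names, and the starting XID
value is arbitrary.  It assumes that every ancestor without an XID has to receive one before the innermost
subtransaction does, so the number of XIDs handed out by a single request depends on the nesting depth.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TxnState:
    """Hypothetical stand-in for one entry of the transaction state stack."""
    nesting_level: int
    parent: Optional["TxnState"] = None
    xid: Optional[int] = None  # None means no XID has been assigned yet


class XidCounter:
    """Hypothetical stand-in for the global XID generator."""

    def __init__(self, next_xid: int = 100) -> None:
        self.next_xid = next_xid

    def allocate(self) -> int:
        xid = self.next_xid
        self.next_xid += 1
        return xid


def assign_xid(state: TxnState, counter: XidCounter) -> List[int]:
    """Assign an XID to `state`, giving XIDs to XID-less ancestors first.

    Returns the newly allocated XIDs, outermost transaction first.
    """
    # Walk up the stack, collecting every level that still lacks an XID.
    chain = []
    cur = state
    while cur is not None and cur.xid is None:
        chain.append(cur)
        cur = cur.parent

    # Allocate from the outermost level inward, so parents get their XIDs first.
    new_xids = []
    for level in reversed(chain):
        level.xid = counter.allocate()
        new_xids.append(level.xid)
    return new_xids


def make_stack(depth: int) -> TxnState:
    """Build a top-level transaction plus depth-1 nested subtransactions.

    Returns the innermost state.
    """
    state = TxnState(nesting_level=1)
    for level in range(2, depth + 1):
        state = TxnState(nesting_level=level, parent=state)
    return state
```

In this model, a request made at nesting depth N when no level has an XID yet allocates N XIDs at once.  A worker
doing the same would have to publish all of them to the other processes, which is the indeterminate number of XIDs
mentioned above.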
 

Before that, I don't see the need for parallel workers to allocate the XID.  As the following Oracle manual says,
parallel DML will be used in data analytics and OLTP batch jobs.  There should be plenty of source data in those
scenarios.


When to Use Parallel DML

https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-18B2AF09-C548-48DE-A794-86224111549F
--------------------------------------------------
Several scenarios where parallel DML is used include:

Refreshing Tables in a Data Warehouse System

Creating Intermediate Summary Tables

Using Scoring Tables

Updating Historical Tables

Running Batch Jobs
--------------------------------------------------


I don't mean to say we want to use the easy hack because we want to be lazy.  I'd like to know whether we *really* need
the effort. And I want PostgreSQL to provide great competitive features as early as possible without messing up the
design and code.
 

For what kind of realistic conceivable scenarios do we need the sophisticated XID assignment mechanism in parallel
workers?


Regards
Takayuki Tsunakawa


