RE: Implement UNLOGGED clause for COPY FROM

Поиск
Список
Период
Сортировка
От osumi.takamichi@fujitsu.com
Тема RE: Implement UNLOGGED clause for COPY FROM
Дата
Msg-id TY2PR01MB4890DB0C8D1B8C90502266D9ED5A0@TY2PR01MB4890.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Implement UNLOGGED clause for COPY FROM  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Ответы Re: Implement UNLOGGED clause for COPY FROM  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Implement UNLOGGED clause for COPY FROM  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Hello.

Apologies for the delay.
> > When the server crash occurs during data loading of COPY UNLOGGED,
> > it's a must to keep index consistent of course.
> > I'm thinking that to rebuild the indexes on the target table would work.
> >
> > In my opinion, UNLOGGED clause must be designed to guarantee that
> > where the data loaded by this clause is written starts from the end of all other
> data blocks.
> > Plus, those blocks needs to be protected by any write of other transactions
> during the copy.
> > Apart from that, the server must be aware of which block is the first
> > block, or the range about where it started or ended in preparation for the crash.
> >
> > During the crash recovery, those points are helpful to recognize and
> > detach such blocks in order to solve a situation that the loaded data is partially
> synced to the disk and the rest isn't.
> 
> How do online backup and archive recovery work ?
> 
> Suppose that the user executes pg_basebackup during COPY UNLOGGED running,
> the physical backup might have the portion of tuples loaded by COPY UNLOGGED
> but these data are not recovered. It might not be a problem because the operation
> is performed without WAL records. But what if an insertion happens after COPY
> UNLOGGED but before pg_stop_backup()? I think that a new tuple could be
> inserted at the end of the table, following the data loaded by COPY UNLOGGED.
> With your approach described above, the newly inserted tuple will be recovered
> during archive recovery, but it either will be removed if we replay the insertion
> WAL then truncate the table or won’t be inserted due to missing block if we
> truncate the table then replay the insertion WAL, resulting in losing the tuple
> although the user got successful of insertion.
I consider that from the point in time when COPY UNLOGGED is executed,
any subsequent operations to the data which comes from UNLOGGED operation
also cannot be recovered even if those issued WAL.

This is basically inevitable because subsequent operations 
after COPY UNLOGGED depend on blocks of loaded data without WAL,
which means we cannot replay exact operations.

Therefore, all I can do is to guarantee that 
when one recovery process ends, the target table returns to the state
immediately before the COPY UNLOGGED is executed.
This could be achieved by issuing and notifying the server of an invalidation WAL,
an indicator to stop WAL application toward one specific table after this new type of WAL.
I think I need to implement this mechanism as well for this feature.
Thus, I'll take a measure against your concern of confusing data loss.

For recovery of the loaded data itself, the user of this clause,
like DBA or administrator of data warehouse for instance, 
would need to make a backup just after the data loading.
For some developers, this behavior would seem incomplete because of the heavy user's burden.

On the other hand, I'm aware of a fact that Oracle Database has a feature of UNRECOVERABLE clause,
which is equivalent to what I'm suggesting now in this thread.

This data loading without REDO log by the clause is more convenient than what I said above,
because it's supported by a tool named Recovery Manager which enables users to make an incremental backup.
This works to back up only the changed blocks since the previous backup and
remove the manual burden from the user like above.
Here, I have to admit that I cannot design and implement 
this kind of synergistic pair of all features at once for data warehousing.
So I'd like to make COPY UNLOGGED as the first step.

This is the URL of how Oracle database for data warehouse achieves the backup of no log operation while acquiring high
speedof data loading.
 
https://docs.oracle.com/database/121/VLDBG/GUID-42825ED1-C4C5-449B-870F-D2C8627CBF86.htm#VLDBG1578

Best,
    Takamichi Osumi

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Creating a function for exposing memory usage of backend process