Re: Implement UNLOGGED clause for COPY FROM

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Implement UNLOGGED clause for COPY FROM
Дата
Msg-id CAA4eK1+HDqS+1fhs5Jf9o4ZujQT=XBZ6sU0kOuEh2hqQAC+t=w@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Implement UNLOGGED clause for COPY FROM  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Ответы RE: Implement UNLOGGED clause for COPY FROM  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Список pgsql-hackers
On Thu, Aug 20, 2020 at 5:49 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hello.
> > > During the crash recovery, those points are helpful to recognize and
> > > detach such blocks in order to solve a situation that the loaded data is partially
> > synced to the disk and the rest isn't.
> >
> > How do online backup and archive recovery work ?
> >
> > Suppose that the user executes pg_basebackup during COPY UNLOGGED running,
> > the physical backup might have the portion of tuples loaded by COPY UNLOGGED
> > but these data are not recovered. It might not be a problem because the operation
> > is performed without WAL records. But what if an insertion happens after COPY
> > UNLOGGED but before pg_stop_backup()? I think that a new tuple could be
> > inserted at the end of the table, following the data loaded by COPY UNLOGGED.
> > With your approach described above, the newly inserted tuple will be recovered
> > during archive recovery, but it either will be removed if we replay the insertion
> > WAL then truncate the table or won’t be inserted due to missing block if we
> > truncate the table then replay the insertion WAL, resulting in losing the tuple
> > although the user got successful of insertion.
> I consider that from the point in time when COPY UNLOGGED is executed,
> any subsequent operations to the data which comes from UNLOGGED operation
> also cannot be recovered even if those issued WAL.
>
> This is basically inevitable because subsequent operations
> after COPY UNLOGGED depend on blocks of loaded data without WAL,
> which means we cannot replay exact operations.
>
> Therefore, all I can do is to guarantee that
> when one recovery process ends, the target table returns to the state
> immediately before the COPY UNLOGGED is executed.
> This could be achieved by issuing and notifying the server of an invalidation WAL,
> an indicator to stop WAL application toward one specific table after this new type of WAL.
>

I don't think we can achieve what you want by one special invalidation
WAL. Consider a case where an update has happened on the page which
exists before 'Copy Unlogged' operation and while writing that page to
disk, the system crashed and the page is half-written. Without the
special WAL mechanism you are proposing to introduce, during recovery,
we can replay the full-page-image from WAL of such a page and then
perform the required update, so after recovery, the page won't be torn
anymore.

Basically, the idea is that to protect from such torn-writes
(half-written pages), we have a concept called full-page writes which
protects the data from such writes after recovery. Before writing to
any page after a checkpoint, we write its full-page-image in WAL which
helps us in recovering from such situations but with your proposed
mechanism it won't work.

Another concern I have with this idea is that you want to keep writing
WAL for such a relation but don't want to replay in recovery which
sounds like a good idea.

The idea to keep part of the table as logged and other as unlogged
sounds scary to me. Now, IIUC, you are trying to come up with these
ideas because to use Alter Table .. Set Unlogged, one has to rewrite
the entire table and if such a table is large, it will be a very
time-consuming operation. You might want to explore whether we can
avoid rewriting the table for such an operation but I don't think that
is easy either. The two problems I could see immediately are (a) we
have to change BM_PERMANENT marking of exiting buffers of such a
relation which again can be a time-consuming operation especially for
a large value of shread_buffers, (b) create an _init fork of such a
relation in-sync with the commit. You might want to read the archives
to see why at the first place we have decided to re-write the table
for SET UNLOGGED operation, see email [1].

[1] - https://www.postgresql.org/message-id/CAFcNs%2Bpeg3VPG2%3Dv6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Следующее
От: Konstantin Knizhnik
Дата:
Сообщение: INSERT ON CONFLICT and RETURNING