Re: Adding REPACK [concurrently]
| From | Antonin Houska |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Date | |
| Msg-id | 125085.1775827305@localhost |
| In response to | RE: Adding REPACK [concurrently] ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
| List | pgsql-hackers |
Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:

> When testing REPACK concurrently, I noticed that all WALs are retained from
> the moment REPACK begins copying data to the new table until the command
> finishes replaying concurrent changes on the new table and stops the repack
> decoding worker.
>
> I understand the reason: the REPACK command itself starts a long-running
> transaction, and logical decoding does not advance restart_lsn beyond the
> oldest running transaction's start position. As a result, slot.restart_lsn
> remains unchanged, preventing the checkpointer from recycling WALs.

I think you're right, sorry for the omission.

> However, since REPACK can run for a long time (hours or even days), I'd like
> to confirm whether this is expected behavior or if we plan to improve it
> in the future ? And additionally,

Yes, it will be improved. I have a draft patch for it, will rebase and post it
soon. The plan is to:

1) preserve the original xmin/xmax of the tuples when we insert them into the
new heap. Thus, besides achieving MVCC safety, we won't need an XID assigned
most of the time.

2) do catalog changes in separate transactions - an XID is needed here, but
these transactions take a very short time.

3) use a single snapshot only for a limited number of tuples/pages. When more
data needs to be copied, a new snapshot is built, supposedly with a higher
->xmin than the previous one.

> IIUC, REPACK without using concurrent option does not have this issue.

It does not have the WAL recycling issue because it does not need to read
WAL. However, it also runs in a long transaction. Even though it does not need
an XID for the actual heap rewriting, it gets one at the moment it locks the
table using AccessExclusiveLock (which is at the very beginning).

> Given that we do not restart a REPACK, I think the repack decoding worker
> should be able to advance restart_lsn each time after writing changes
> (similar to how a physical slot behaves). To illustrate this, I've written
> a patch (attached) that implements this approach, and it works fine for me.

LGTM, thanks!

> BTW, catalog_xmin also won't advance, but that seems not a big issue as
> the REPACK transaction itself also holds a snapshot that retains catalog tuples,
> so advancing catalog_xmin wouldn't change the situation anyway.

The snapshot "resetting" (mentioned above) should fix this problem too.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
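
The WAL retention effect discussed in this message can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL code: the function names `restart_lsn_candidate` and `retained_wal_bytes`, the plain-integer LSNs, and the sample numbers are all hypothetical. The only rule it models is the one stated in the thread, namely that logical decoding does not advance restart_lsn beyond the start position of the oldest running transaction.

```python
from typing import Iterable


def restart_lsn_candidate(processed_lsn: int,
                          running_xact_start_lsns: Iterable[int]) -> int:
    """Toy rule (assumption, not the real PostgreSQL logic):
    restart_lsn may advance up to the position the decoder has processed,
    but never past the start of the oldest still-running transaction."""
    starts = list(running_xact_start_lsns)
    if not starts:
        # No running transaction holds the position back.
        return processed_lsn
    return min(processed_lsn, min(starts))


def retained_wal_bytes(current_lsn: int, restart_lsn: int) -> int:
    """WAL that cannot be recycled: everything from restart_lsn onward."""
    return max(0, current_lsn - restart_lsn)


if __name__ == "__main__":
    repack_start_lsn = 1_000   # hypothetical: the long REPACK transaction began here
    processed_lsn = 8_500      # hypothetical: the decoding worker has written changes up to here
    current_lsn = 9_000        # hypothetical: current WAL insert position

    # Long-running REPACK transaction pins restart_lsn at its start position.
    pinned = restart_lsn_candidate(processed_lsn, [repack_start_lsn])
    print(pinned, retained_wal_bytes(current_lsn, pinned))

    # Advancing after each write (the behavior proposed in the thread) is
    # modeled here simply by ignoring the running transaction.
    advanced = restart_lsn_candidate(processed_lsn, [])
    print(advanced, retained_wal_bytes(current_lsn, advanced))
```

In this model the retained WAL grows with the duration of the long transaction in the first case, while in the second case it stays bounded by how far the decoding worker lags behind the current position.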