Discussion: pg_archivecleanup should remove WAL files also in pg_xlog?
Hi, pg_archivecleanup removes unnecessary WAL files from the archive, but not from pg_xlog directory. So, after failover, those WAL files might exist in pg_xlog and be archived again later. Re-archiving of unnecessary WAL files seems odd to me. To avoid this problem, how about changing pg_archivecleanup so that it removes WAL files also in pg_xlog or creates .done file in archive_status when removing them from the archive? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Dec 13, 2010 at 3:44 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > pg_archivecleanup removes unnecessary WAL files from the archive, but not > from pg_xlog directory. So, after failover, those WAL files might > exist in pg_xlog > and be archived again later. Re-archiving of unnecessary WAL files seems odd > to me. To avoid this problem, how about changing pg_archivecleanup so that > it removes WAL files also in pg_xlog or creates .done file in > archive_status when > removing them from the archive? Well, we can avoid this problem by specifying pg_xlog directory instead of the archive in recovery_end_command: recovery_end_command = 'pg_archivecleanup pg_xlog %r' Though this sounds like somewhat bad know-how.. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
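To make the workaround above concrete, here is a minimal Python sketch of the kind of cleanup pg_archivecleanup performs when it is pointed at a directory together with the %r restart file name. It is not the tool's actual code: the function names `removable_segments` and `cleanup` are hypothetical, and the sketch assumes that segments are compared by the part of the file name after the 8-character timeline prefix and that history and backup files are left alone.

```python
import os
import re

# A WAL segment file name is 24 hex digits: 8 for the timeline,
# 16 for the segment position. Anything else (history files,
# backup labels) is ignored by this sketch.
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def removable_segments(names, oldest_kept):
    """Return the segment names that sort before oldest_kept.

    Assumption: the timeline prefix (first 8 characters) is ignored
    when deciding whether a segment is still needed.
    """
    cutoff = oldest_kept[8:24]
    return sorted(
        n for n in names
        if WAL_NAME.fullmatch(n) and n[8:] < cutoff
    )


def cleanup(directory, oldest_kept, dry_run=True):
    """List (and, unless dry_run, delete) segments older than oldest_kept."""
    victims = removable_segments(os.listdir(directory), oldest_kept)
    if not dry_run:
        for name in victims:
            os.remove(os.path.join(directory, name))
    return victims
```

Calling `cleanup("pg_xlog", restart_file)` with the default `dry_run=True` only reports what would be removed, which seems the safer default for something run against a live pg_xlog directory.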
On 13.12.2010 08:44, Fujii Masao wrote: > pg_archivecleanup removes unnecessary WAL files from the archive, but not > from pg_xlog directory. So, after failover, those WAL files might > exist in pg_xlog and be archived again later. A file that has already been archived successfully should not be archived again. The server keeps track of which files it has already archived with the .ready/.done files. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
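The .ready/.done bookkeeping mentioned above can be inspected from outside the server. The following is a small Python sketch, assuming the status files live in an `archive_status` subdirectory of pg_xlog and are named after the segment with a `.ready` or `.done` suffix; the function name `archive_state` and the `"untracked"` label are made up for illustration.

```python
import os


def archive_state(pg_xlog_dir, segment):
    """Classify a WAL segment by its archive_status marker file.

    Returns "done" if <segment>.done exists, "ready" if
    <segment>.ready exists, and "untracked" if neither is present.
    """
    status_dir = os.path.join(pg_xlog_dir, "archive_status")
    if os.path.exists(os.path.join(status_dir, segment + ".done")):
        return "done"
    if os.path.exists(os.path.join(status_dir, segment + ".ready")):
        return "ready"
    return "untracked"
```

A segment reported as "untracked" has no record of having been archived on this server, regardless of what another server may have done with the same file.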
On Mon, Dec 13, 2010 at 4:28 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 13.12.2010 08:44, Fujii Masao wrote: >> >> pg_archivecleanup removes unnecessary WAL files from the archive, but not >> from pg_xlog directory. So, after failover, those WAL files might >> exist in pg_xlog and be archived again later. > > A file that has already been archived successfully should not be archived > again. The server keeps track of which files it has already archived with > the .ready/.done files. This seems to require * archiver to save the last archived WAL file name in the shmem * walsender to send it to walreceiver * walreceiver to create .done file when it's arrived * bgwriter not to remove WAL files which don't have .done file in standby Right? One good side effect of this is that we can prevent WAL files from being removed from the standby before the master archives them. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 13.12.2010 09:50, Fujii Masao wrote: > On Mon, Dec 13, 2010 at 4:28 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 13.12.2010 08:44, Fujii Masao wrote: >>> >>> pg_archivecleanup removes unnecessary WAL files from the archive, but not >>> from pg_xlog directory. So, after failover, those WAL files might >>> exist in pg_xlog and be archived again later. >> >> A file that has already been archived successfully should not be archived >> again. The server keeps track of which files it has already archived with >> the .ready/.done files. > > This seems to require > > * archiver to save the last archived WAL file name in the shmem > * walsender to send it to walreceiver > * walreceiver to create .done file when it's arrived > * bgwriter not to remove WAL files which don't have .done file in standby > > Right? One good side effect of this is that we can prevent WAL files from > being removed from the standby before the master archives them. Oh, you said "after failover", I missed that. So the problem is that the standby might try to re-archive files that the master already archived. If the only consequence is that you get some extra WAL files in the archive, until pg_archivecleanup runs again, I think we can just live with it. But don't you have bigger problems when standby tries to archive a file that already exists in the archive, because master already archived it? We advise to write archive_command so that it fails if the file exists already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Dec 13, 2010 at 5:01 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > If the only consequence is that you get some extra WAL files in the archive, > until pg_archivecleanup runs again, I think we can just live with it. We might get some extra WAL files also in pg_xlog. Because the WAL files which don't have .ready/.done file survive two checkpoints after failover. IOW, though first checkpoint right after failover should remove those WAL files, they cannot be removed since they don't have .done file. Then the first checkpoint creates .ready files for them, the archiver performs bulk-archiving (this would hurt performance if there are many such WAL files), and the subsequent checkpoint removes them. > But > don't you have bigger problems when standby tries to archive a file that > already exists in the archive, because master already archived it? We advise > to write archive_command so that it fails if the file exists already. Yep, that's recommended in the document, but I don't want to use that. Because that might make new master fail to archive WAL file because of existence of half-baked file in the archive, after failover. This can happen if the master crashes while its archiver is copying WAL file. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Dec 13, 2010 at 6:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > Yep, that's recommended in the document, but I don't want to use that. > Because that might make new master fail to archive WAL file because of > existence of half-baked file in the archive, after failover. This can happen > if the master crashes while its archiver is copying WAL file. It occurred to me that archive_command should check the size of the existing archived WAL file, and overwrite it with the new file when the size is not equal to 16MB. Of course, if a backup history file or timeline history file is given, it should not take account of the file size. I implemented a pg_archivecopy module to do the above. git://git.postgresql.org/git/users/fujii/postgres.git branch: pg_archivecopy If it's worthwhile, I'll release it in pgFoundry or elsewhere. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
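The size check described above can be sketched in a few lines of Python. This is not the pg_archivecopy module from the branch, only an illustration of the idea under some assumptions: the function names are hypothetical, a WAL segment is recognised purely by its 24-hex-digit name, an existing history or backup file is never overwritten (one possible reading of "should not take account of the file size"), and the copy goes through a temporary file followed by a rename, which is an addition not mentioned in the message.

```python
import os
import shutil

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # the 16MB segment size from the message


def is_wal_segment(name):
    """True for plain segment names (24 hex digits, no suffix)."""
    return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)


def archive_copy(src, dest_dir, seg_size=WAL_SEGMENT_SIZE):
    """Copy src into dest_dir; return True on success, False on refusal.

    An existing destination is overwritten only when it is a WAL
    segment whose size differs from seg_size, i.e. it looks like a
    half-copied file left behind by a crashed archiver.
    """
    name = os.path.basename(src)
    dest = os.path.join(dest_dir, name)
    if os.path.exists(dest):
        complete = os.path.getsize(dest) == seg_size
        if not is_wal_segment(name) or complete:
            return False  # assumption: never clobber these
    # Assumption: copy to a temporary name first so that a crash
    # mid-copy does not leave a partial file under the final name.
    tmp = dest + ".tmp"
    shutil.copyfile(src, tmp)
    os.replace(tmp, dest)
    return True
```

A wrapper script that calls `archive_copy` with the path supplied for %p and exits non-zero when it returns False would give the archiver the failure signal it expects from archive_command.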