[Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
От | Dilip Kumar |
---|---|
Тема | [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Дата | |
Msg-id | CAFiTN-sP6yLVTfjR42mEfvFwJ-SZ2iEtG1t0j=QX09X=BM+KWQ@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
(Julien Rouhaud <rjuju123@gmail.com>)
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Heikki Linnakangas <hlinnaka@iki.fi>) Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints (Andres Freund <andres@anarazel.de>) |
Список | pgsql-hackers |
Currently, CREATE DATABASE forces a checkpoint, then copies all the files, then forces another checkpoint. The comments in the createdb() function explain the reasons for this. The attached patch fixes this problem by making CREATE DATABASE completely WAL-logged so that now we can avoid checkpoints. The patch modifies both CREATE DATABASE and ALTER DATABASE..SET TABLESPACE to be fully WAL-logged. One main advantage of this change is that it will be cheaper. Forcing checkpoints on an idle system is no big deal, but when the system is under heavy write load, it's very expensive. Another advantage is that it makes things better for features like TDE, which might want the pages in the source database to be encrypted using a different key or nonce than the pages in the target database. Design Idea: ----------------- First, create the target database directory along with the version file and WAL-log this operation. Create the "relation map file" in the target database and copy the content from the source database. For this, we can use some modified versions of the write_relmap_file() and WAL-log the relmap create operation along with the file content. Now, read the relmap file to find the relfilenode for pg_class and then we read pg_class block by block and decode the tuples. For reading the pg_class blocks, we can use ReadBufferWithoutRelCache() so that we don't need the relcache. Nothing prevents us from checking visibility for tuples in another database because CLOG is global to the cluster. And nothing prevents us from deforming those tuples because the column definitions for pg_class have to be the same in every database. Then we can get the relfilenode of every file we need to copy, and prepare a list of all such relfilenode. Next, for each relfilenode in the source database, create a respective relfilenode in the target database (for all forks) using smgrcreate, which is already a WAL-logged operation. Now read the source relfilenode block by block using ReadBufferWithoutRelCache() and copy the block to the target relfilenode using smgrextend() and WAL-log them using log_newpage(). For the source database, we can not directly use the smgrread(), because there could be some dirty buffers so we will have to read them through the buffer manager interface, otherwise, we will have to flush all the dirty buffers. WAL sequence using pg_waldump ---------------------------------------------------- 1. (new wal to create db dir and write PG_VERSION file) rmgr: Database desc: CREATE create dir 1663/16394 2. (new wal to create and write relmap file) rmgr: RelMap desc: CREATE database 16394 tablespace 1663 size 512 2. (create relfilenode) rmgr: Storage desc: CREATE base/16394/16384 rmgr: Storage desc: CREATE base/16394/2619 3. (write page data) rmgr: XLOG desc: FPI , blkref #0: rel 1663/16394/2619 blk 0 FPW rmgr: XLOG desc: FPI , blkref #0: rel 1663/16394/2619 blk 1 FPW ............ 4. (create other forks) rmgr: Storage desc: CREATE base/16394/2619_fsm rmgr: Storage CREATE base/16394/2619_vm ............. I have attached a POC patch, which shows this idea, with this patch all basic sanity testing and the "check-world" is passing. Open points: ------------------- - This is a POC patch so needs more refactoring/cleanup and testing. - Might need to relook into the SMGR level API usage. Credits: ----------- Thanks to Robert Haas, for suggesting this idea and the high-level design. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Julien RouhaudДата:
Сообщение: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints