[bug fix] PITR corrupts the database cluster
От | MauMau |
---|---|
Тема | [bug fix] PITR corrupts the database cluster |
Дата | |
Msg-id | F93E42280A9A4A5EB74FC7350C801A20@maumau обсуждение исходный текст |
Ответы |
Re: [bug fix] PITR corrupts the database cluster
|
Список | pgsql-hackers |
Hello, I've encountered a bug of PITR that corrupts the database. I'm willing to submit the patch to fix it, but I'm wondering what approach is appropriate. Could you give me your opinions? [Problem] I cannot connect to the database after performing the following steps: 1. CREATE DATABASE mydb; 2. Take a base backup with pg_basebackup. 3. DROP DATABASE mydb; 4. Shutdown the database server with "pg_ctl stop". 5. Recover the database cluster to the point where the base backup completed, i.e., before dropping mydb. The contents of recovery.conf is: restore_command = 'cp /arc/dir/%f %p' recovery_target_timeline = 'latest' recovery_target_time = 'STOP TIME recorded in the backup history file which was created during base backup' I expected to be able to connect to mydb because I recovered to the point before dropping mydb. However, I cannot connect to mydb because the directory for mydb does not exist. The entry for mydb exists in pg_database. [Cause] DROP DATABASE emits the below WAL records: 1. System catalog changes including deletion of a tuple for mydb in pg_database 2. Deletion of directories for the database 3. Transaction commit During recovery, postgres replays 1 and 2. It ends the recovery when it notices that the time recorded in commit record (3 above) is later than the recovery target time. The commit record is not replayed, thus the system catalog changes are virtually undone. The problem is that 2 is replayed. This deletes the directory for the database although the transaction is not committed. [How to fix] There are two approaches. Which do you think is the way to go? <Approach 1> During recovery, when the WAL record for directory deletion is found, just record that fact for later replay (in a hash table keyed by xid). When the corresponding transaction commit record is found, replay the directory deletion record. <Approach 2> Like the DROP TABLE/INDEX case, piggyback the directory deletion record on the transaction commit record, and eliminate the directory deletion record altogether. I think we need to take approach 1 even when we also does 2, because 1 is necessary when the backup and archive WAL are already taken with the current PostgreSQL anyway. Regards MauMau
В списке pgsql-hackers по дате отправления: