Thread: Make drop database safer


Make drop database safer

From: Alexandra Wang
Date:
Current sequence of operations for drop database (dropdb())
1. Start Transaction
2. Make catalog changes
3. Drop database buffers
4. Forget database fsync requests
5. Checkpoint
6. Delete database directory
7. Commit Transaction

Problem
This sequence is unsafe on a couple of fronts. If DROP DATABASE aborts (for example, the system crashes or shuts down) right after the buffers are dropped in step 3 or step 4, the database will still exist and be fully accessible, but it will lose the data from the dirty buffers. This seems very bad.

The operation can also abort after step 5, in which case the entries remain in the catalog but the database is not accessible. That is bad as well, but not as severe as the case above, where the database exists but some data silently goes missing.

Repro:
```
CREATE DATABASE test;
\c test
CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int); 
\c postgres
DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers() (make sure to issue checkpoint before killing the session)
```
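The two crash windows above can be sketched as a toy simulation. This is illustrative Python, not PostgreSQL code; `catalog`, `dirty_buffers`, and `datadir` are stand-ins for the real structures, and a "crash" before commit is modeled by the uncommitted catalog change never taking effect:

```python
def dropdb_sim(crash_after=None):
    """Walk the dropdb() step sequence, optionally crashing after a step."""
    catalog = {"test"}                      # pg_database entries
    dirty_buffers = {"test": ["t1 rows"]}   # unflushed changes in shared buffers
    datadir = {"test"}                      # on-disk database directories

    catalog_deleted = False
    for step in ["catalog_change",  # 2. transactional, undone on abort
                 "drop_buffers",    # 3. DropDatabaseBuffers()
                 "forget_fsync",    # 4.
                 "checkpoint",      # 5.
                 "delete_dir",      # 6.
                 "commit"]:         # 7.
        if step == "catalog_change":
            catalog_deleted = True
        elif step == "drop_buffers":
            dirty_buffers.pop("test", None)  # dirty data discarded, never written
        elif step == "delete_dir":
            datadir.discard("test")
        elif step == "commit" and catalog_deleted:
            catalog.discard("test")          # catalog change becomes visible
        if step == crash_after:
            break                            # crash: commit never happens
    return catalog, dirty_buffers, datadir

# Crash right after step 3: database still in the catalog and on disk,
# but the dirty buffers are gone -- silent data loss.
cat, bufs, dirs = dropdb_sim(crash_after="drop_buffers")
assert "test" in cat and "test" in dirs and "test" not in bufs

# Crash after step 6: catalog entry survives but the directory is gone.
cat, bufs, dirs = dropdb_sim(crash_after="delete_dir")
assert "test" in cat and "test" not in dirs
```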

Proposed ways to fix
1. CommitTransactionCommand() right after step 2. This makes the operation fully safe, since the catalog will show the database as dropped. The files may still exist on disk in some cases, which is okay. This also makes it consistent with the approach used in movedb().

2. Alternatively, perform the checkpoint (step 5) just before dropping the database buffers, closing the unsafe window. Caveats of this solution:
- It performs I/O for data that, in the success case, will be deleted anyway.
- It still doesn't cover the case where the catalog has the database entry but the files have been removed from disk.

3. A fancier approach is to use the pending-delete mechanism employed by relation drops to perform these non-catalog activities at commit. The pending-delete structure could easily gain a boolean indicating that a database directory, rather than a single file, should be dropped. Since DROP DATABASE can't be run inside a transaction block, it doesn't strictly need to be done this way, but it would give us one consistent approach to on-disk removal.
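The pending-delete idea in fix 3 can be sketched as follows. This is a simplified toy model, not the actual PendingRelDelete code; `is_database` is a hypothetical field name for the proposed boolean, and the point is only that physical removal happens at commit and never on abort:

```python
from dataclasses import dataclass

@dataclass
class PendingDelete:
    path: str
    is_database: bool   # hypothetical extension: drop a whole directory, not a file

pending = []            # per-transaction list, like pendingDeletes

def schedule_db_drop(path):
    """During the transaction: only record the intent, touch nothing on disk."""
    pending.append(PendingDelete(path, is_database=True))

def at_commit(on_disk):
    """After the transaction is durably committed, do the physical removal."""
    for pd in pending:
        on_disk.discard(pd.path)
    pending.clear()

def at_abort():
    """Nothing was removed yet, so there is nothing to undo."""
    pending.clear()

on_disk = {"base/16384"}
schedule_db_drop("base/16384")
at_abort()                       # crash/abort before commit: directory intact
assert "base/16384" in on_disk

schedule_db_drop("base/16384")
at_commit(on_disk)               # commit succeeded: directory removed afterwards
assert "base/16384" not in on_disk
```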

We are passing along a patch implementing fix 1, which seems the most promising to us, but we would love to hear thoughts on the other solutions mentioned.
```diff
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index d207cd899f..af1b1e0896 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -917,6 +917,9 @@ dropdb(const char *dbname, bool missing_ok)
         */
        dropDatabaseDependencies(db_id);

+       CommitTransactionCommand();
+       StartTransactionCommand();
+
        /*
         * Drop db-specific replication slots.
         */
```

Thanks, 
Ashwin and Alex

Re: Make drop database safer

From: Andres Freund
Date:
Hi,

On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
> Current sequence of operations for drop database (dropdb())
> 1. Start Transaction
> 2. Make catalog changes
> 3. Drop database buffers
> 4. Forget database fsync requests
> 5. Checkpoint
> 6. Delete database directory
> 7. Commit Transaction
> 
> Problem
> This sequence is unsafe on a couple of fronts. If DROP DATABASE aborts
> (for example, the system crashes or shuts down) right after the buffers
> are dropped in step 3 or step 4, the database will still exist and be
> fully accessible, but it will lose the data from the dirty buffers. This
> seems very bad.
> 
> The operation can also abort after step 5, in which case the entries
> remain in the catalog but the database is not accessible. That is bad as
> well, but not as severe as the case above, where the database exists but
> some data silently goes missing.
> 
> Repro:
> ```
> CREATE DATABASE test;
> \c test
> CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
> \c postgres
> DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers()
> (make sure to issue checkpoint before killing the session)
> ```
> 
> Proposed ways to fix
> 1. CommitTransactionCommand() right after step 2. This makes the
> operation fully safe, since the catalog will show the database as
> dropped. The files may still exist on disk in some cases, which is okay.
> This also makes it consistent with the approach used in movedb().

To me this seems bad. The current failure mode obviously isn't good, but
the data obviously isn't valuable, and just losing track of an entire
database worth of data seems worse.


> 2. Alternatively, perform the checkpoint (step 5) just before dropping
> the database buffers, closing the unsafe window. Caveats of this
> solution:
> - It performs I/O for data that, in the success case, will be deleted
> anyway.
> - It still doesn't cover the case where the catalog has the database
> entry but the files have been removed from disk.

That seems like an unacceptable slowdown.


> 3. A fancier approach is to use the pending-delete mechanism employed by
> relation drops to perform these non-catalog activities at commit. The
> pending-delete structure could easily gain a boolean indicating that a
> database directory, rather than a single file, should be dropped. Since
> DROP DATABASE can't be run inside a transaction block, it doesn't
> strictly need to be done this way, but it would give us one consistent
> approach to on-disk removal.

ISTM we'd need to do something like this.

Greetings,

Andres Freund


Re: Make drop database safer

From: Ashwin Agrawal
Date:

Thanks for the response and inputs.

On Sat, Feb 9, 2019 at 4:51 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
> Current sequence of operations for drop database (dropdb())
> 1. Start Transaction
> 2. Make catalog changes
> 3. Drop database buffers
> 4. Forget database fsync requests
> 5. Checkpoint
> 6. Delete database directory
> 7. Commit Transaction
>
> Problem
> This sequence is unsafe on a couple of fronts. If DROP DATABASE aborts
> (for example, the system crashes or shuts down) right after the buffers
> are dropped in step 3 or step 4, the database will still exist and be
> fully accessible, but it will lose the data from the dirty buffers. This
> seems very bad.
>
> The operation can also abort after step 5, in which case the entries
> remain in the catalog but the database is not accessible. That is bad as
> well, but not as severe as the case above, where the database exists but
> some data silently goes missing.
>
> Repro:
> ```
> CREATE DATABASE test;
> \c test
> CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
> \c postgres
> DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers()
> (make sure to issue checkpoint before killing the session)
> ```
>
> Proposed ways to fix
> 1. CommitTransactionCommand() right after step 2. This makes the
> operation fully safe, since the catalog will show the database as
> dropped. The files may still exist on disk in some cases, which is okay.
> This also makes it consistent with the approach used in movedb().

To me this seems bad. The current failure mode obviously isn't good, but
the data obviously isn't valuable, and just losing track of an entire
database worth of data seems worse.

So, based on that response, it seems that not losing track of the files associated with the database is the design choice we wish to achieve. Hence the catalog having an entry while the data directory has been deleted is acceptable behavior and doesn't need to be solved.

> 2. Alternatively, perform the checkpoint (step 5) just before dropping
> the database buffers, closing the unsafe window. Caveats of this
> solution:
> - It performs I/O for data that, in the success case, will be deleted
> anyway.
> - It still doesn't cover the case where the catalog has the database
> entry but the files have been removed from disk.

That seems like an unacceptable slowdown.

Given that dropping a database should be an infrequent operation, and the only additional I/O cost is for the buffers of that database itself (a checkpoint is performed in a later step anyway), is it really an unacceptable slowdown compared to the safety it brings?
 

> 3. A fancier approach is to use the pending-delete mechanism employed by
> relation drops to perform these non-catalog activities at commit. The
> pending-delete structure could easily gain a boolean indicating that a
> database directory, rather than a single file, should be dropped. Since
> DROP DATABASE can't be run inside a transaction block, it doesn't
> strictly need to be done this way, but it would give us one consistent
> approach to on-disk removal.

ISTM we'd need to do something like this.

Given the above design choice to retain the link to the database files until they are actually deleted, we don't see why the pending-delete approach is any better than approach 1. It would allow us to track the database OID in the commit transaction's xlog record, but any checkpoint after that still loses the reference to the database. That is the same situation as approach 1, where a separate XLOG_DBASE_DROP record is written just after committing the transaction.
When we proposed approach 3, we thought it was functionally the same as approach 1 and differed only in implementation. But your preference for this approach, while calling approach 1 bad, suggests the pending-deletes approach is functionally different; we would like to hear more about how.

Considering the design choice we must meet, it seems approach 2, moving the checkpoint from step 5 to before step 3, would give us the desired safety and retain the link to the database until we actually delete its files.

Thanks,
Ashwin and Alex

Re: Make drop database safer

From: Ashwin Agrawal
Date:
On Mon, Feb 11, 2019 at 3:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

Still awaiting responses on this thread; any input would be highly appreciated.

Re: Make drop database safer

From: Tomas Vondra
Date:

On 2/12/19 12:55 AM, Ashwin Agrawal wrote:
> 
> Thanks for the response and inputs.
> 
> On Sat, Feb 9, 2019 at 4:51 AM Andres Freund <andres@anarazel.de 
> <mailto:andres@anarazel.de>> wrote:
> 
>     Hi,
> 
>     On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
>      > Current sequence of operations for drop database (dropdb())
>      > 1. Start Transaction
>      > 2. Make catalog changes
>      > 3. Drop database buffers
>      > 4. Forget database fsync requests
>      > 5. Checkpoint
>      > 6. Delete database directory
>      > 7. Commit Transaction
>      >
>      > Problem
>      > This sequence is unsafe on a couple of fronts. If DROP DATABASE
>      > aborts (for example, the system crashes or shuts down) right after
>      > the buffers are dropped in step 3 or step 4, the database will
>      > still exist and be fully accessible, but it will lose the data
>      > from the dirty buffers. This seems very bad.
>      >
>      > The operation can also abort after step 5, in which case the
>      > entries remain in the catalog but the database is not accessible.
>      > That is bad as well, but not as severe as the case above, where
>      > the database exists but some data silently goes missing.
>      >
>      > Repro:
>      > ```
>      > CREATE DATABASE test;
>      > \c test
>      > CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
>      > \c postgres
>      > DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers()
>      > (make sure to issue checkpoint before killing the session)
>      > ```
>      >
>      > Proposed ways to fix
>      > 1. CommitTransactionCommand() right after step 2. This makes the
>      > operation fully safe, since the catalog will show the database as
>      > dropped. The files may still exist on disk in some cases, which is
>      > okay. This also makes it consistent with the approach used in
>      > movedb().
> 
>     To me this seems bad. The current failure mode obviously isn't good,
>     but the data obviously isn't valuable, and just losing track of an
>     entire database worth of data seems worse.
> 
> 
> So, based on that response, it seems that not losing track of the files
> associated with the database is the design choice we wish to achieve.
> Hence the catalog having an entry while the data directory has been
> deleted is acceptable behavior and doesn't need to be solved.
> 

What about adding an 'is dropped' flag to pg_database, setting it to true 
at the beginning of DROP DATABASE, and committing? And ensuring no one can 
connect to such a database, making DROP DATABASE the only allowed operation?

ISTM we could then continue doing the same thing we do today, without 
any extra checkpoints etc.

>      > 2. Alternatively, perform the checkpoint (step 5) just before
>      > dropping the database buffers, closing the unsafe window. Caveats
>      > of this solution:
>      > - It performs I/O for data that, in the success case, will be
>      > deleted anyway.
>      > - It still doesn't cover the case where the catalog has the
>      > database entry but the files have been removed from disk.
> 
>     That seems like an unacceptable slowdown.
> 
> 
> Given that dropping a database should be an infrequent operation, and
> the only additional I/O cost is for the buffers of that database itself
> (a checkpoint is performed in a later step anyway), is it really an
> unacceptable slowdown compared to the safety it brings?
> 

That's probably true, although I do know quite a few systems that create 
and drop databases fairly often. And the implied explicit checkpoints 
are quite painful, so I'd vote not to make this worse.

FWIW I don't recall why exactly we need the checkpoints, except perhaps 
to ensure the file copies see the most recent data (in CREATE DATABASE) 
and to evict stuff for the to-be-dropped database from shared buffers. I 
wonder if we could do that without a checkpoint somehow ...

> 
>      > 3. A fancier approach is to use the pending-delete mechanism
>      > employed by relation drops to perform these non-catalog
>      > activities at commit. The pending-delete structure could easily
>      > gain a boolean indicating that a database directory, rather than
>      > a single file, should be dropped. Since DROP DATABASE can't be
>      > run inside a transaction block, it doesn't strictly need to be
>      > done this way, but it would give us one consistent approach to
>      > on-disk removal.
> 
>     ISTM we'd need to do something like this.
> 
> 
> Given the above design choice to retain the link to the database files
> until they are actually deleted, we don't see why the pending-delete
> approach is any better than approach 1. It would allow us to track the
> database OID in the commit transaction's xlog record, but any checkpoint
> after that still loses the reference to the database. That is the same
> situation as approach 1, where a separate XLOG_DBASE_DROP record is
> written just after committing the transaction.
> When we proposed approach 3, we thought it was functionally the same as
> approach 1 and differed only in implementation. But your preference for
> this approach, while calling approach 1 bad, suggests the pending-deletes
> approach is functionally different; we would like to hear more about how.
> 

Hmmm, I don't see how this is an improvement over option #1 either.

> Considering the design choice we must meet, it seems approach 2, moving
> the checkpoint from step 5 to before step 3, would give us the desired
> safety and retain the link to the database until we actually delete its
> files.
> 

Ummm? That essentially means this order:

1. Start Transaction
2. Make catalog changes
5. Checkpoint
3. Drop database buffers
4. Forget database fsync requests
6. Delete database directory
7. Commit Transaction

I don't see how that actually fixes any of the issues? Can you explain?

Not to mention we might end up doing quite a bit of I/O to checkpoint 
buffers from the database that is going to disappear shortly ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Make drop database safer

From: Ashwin Agrawal
Date:

On Wed, Mar 6, 2019 at 7:56 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 2/12/19 12:55 AM, Ashwin Agrawal wrote:
>
> Thanks for the response and inputs.
>
> On Sat, Feb 9, 2019 at 4:51 AM Andres Freund <andres@anarazel.de
> <mailto:andres@anarazel.de>> wrote:
>
>     Hi,
>
>     On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
>      > Current sequence of operations for drop database (dropdb())
>      > 1. Start Transaction
>      > 2. Make catalog changes
>      > 3. Drop database buffers
>      > 4. Forget database fsync requests
>      > 5. Checkpoint
>      > 6. Delete database directory
>      > 7. Commit Transaction
>      >
>      > Problem
>      > This sequence is unsafe on a couple of fronts. If DROP DATABASE
>      > aborts (for example, the system crashes or shuts down) right after
>      > the buffers are dropped in step 3 or step 4, the database will
>      > still exist and be fully accessible, but it will lose the data
>      > from the dirty buffers. This seems very bad.
>      >
>      > The operation can also abort after step 5, in which case the
>      > entries remain in the catalog but the database is not accessible.
>      > That is bad as well, but not as severe as the case above, where
>      > the database exists but some data silently goes missing.
>      >
>      > Repro:
>      > ```
>      > CREATE DATABASE test;
>      > \c test
>      > CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
>      > \c postgres
>      > DROP DATABASE test; <<====== kill the session after DropDatabaseBuffers()
>      > (make sure to issue checkpoint before killing the session)
>      > ```
>      >
>      > Proposed ways to fix
>      > 1. CommitTransactionCommand() right after step 2. This makes the
>      > operation fully safe, since the catalog will show the database as
>      > dropped. The files may still exist on disk in some cases, which is
>      > okay. This also makes it consistent with the approach used in
>      > movedb().
>
>     To me this seems bad. The current failure mode obviously isn't good,
>     but the data obviously isn't valuable, and just losing track of an
>     entire database worth of data seems worse.
>
>
> So, based on that response, it seems that not losing track of the files
> associated with the database is the design choice we wish to achieve.
> Hence the catalog having an entry while the data directory has been
> deleted is acceptable behavior and doesn't need to be solved.
>

What about adding an 'is dropped' flag to pg_database, setting it to true
at the beginning of DROP DATABASE, and committing? And ensuring no one can
connect to such a database, making DROP DATABASE the only allowed operation?

ISTM we could then continue doing the same thing we do today, without
any extra checkpoints etc.

Sure, adding an 'is dropped' flag column to pg_database and committing that update before dropping the database buffers would give us the safety and also let us keep the link. But it seems like a very heavyweight way to obtain the desired behavior, adding a column to the pg_database catalog specifically to protect against this failure window. If this solution gets at least one more vote as an acceptable route, I am fine implementing it.

I am surprised, though, that keeping the link to a database's worth of data and not losing track of it is considered important for dropdb() but not for movedb(). In movedb(), the transaction commits first and then the old database directory is deleted, which is exactly the first solution we proposed for dropdb().

FWIW I don't recall why exactly we need the checkpoints, except perhaps
to ensure the file copies see the most recent data (in CREATE DATABASE)
and to evict stuff for the to-be-dropped database from shared buffers. I
wonder if we could do that without a checkpoint somehow ...

The checkpoint during CREATE DATABASE is understandable, but yes, during dropdb() it seems unnecessary. The only rationale seems to be for Windows, based on this comment in the code: "On Windows, this also ensures that background procs don't hold any open files, which would cause rmdir() to fail." I think we could avoid the checkpoint in dropdb() on all platforms except Windows. Even on Windows, if we had an easy way to ensure no background procs hold open files, the checkpoint could be avoided there too.

> Considering the design choice we must meet, it seems approach 2, moving
> the checkpoint from step 5 to before step 3, would give us the desired
> safety and retain the link to the database until we actually delete its
> files.
>

Ummm? That essentially means this order:

1. Start Transaction
2. Make catalog changes
5. Checkpoint
3. Drop database buffers
4. Forget database fsync requests
6. Delete database directory
7. Commit Transaction

I don't see how that actually fixes any of the issues? Can you explain?

Since the checkpoint is performed first, all the dirty buffers make it to disk, so we avoid losing the data for this database if the command fails after that point and the database doesn't get dropped, which is the problem we started this thread with. The drop-database-buffers step after it would then only be discarding clean buffers. The forget-fsync-requests step could ideally be removed, as it would no longer be needed with that sequence.
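The checkpoint-first ordering can be sketched in the same toy style as before (illustrative Python, not PostgreSQL code): once dirty pages are flushed, a crash in any later step leaves nothing lost, at the cost of writing data that in the success case is simply deleted with the directory.

```python
def dropdb_checkpoint_first(crash_after=None):
    """Approach 2: checkpoint moved before dropping database buffers."""
    dirty = {"t1 rows"}          # unflushed changes in shared buffers
    on_disk = set()              # pages durably written for this database

    for step in ["catalog_change", "checkpoint", "drop_buffers",
                 "delete_dir", "commit"]:
        if step == "checkpoint":
            on_disk |= dirty     # dirty pages flushed before being dropped
        elif step == "drop_buffers":
            dirty = set()        # now only clean copies are discarded
        elif step == "delete_dir":
            on_disk = set()      # success path: flushed pages deleted anyway
        if step == crash_after:
            return on_disk
    return on_disk

# Crash right after buffers are dropped: the data is already safely on
# disk, so the surviving database loses nothing.
assert "t1 rows" in dropdb_checkpoint_first(crash_after="drop_buffers")

# Success case: everything is deleted, the checkpoint I/O was "wasted".
assert dropdb_checkpoint_first() == set()
```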
 
Not to mention we might end up doing quite a bit of I/O to checkpoint
buffers from the database that is going to disappear shortly ...

Yes, that's the downside it comes with.