Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

From: Andres Freund
Subject: Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
Date:
Message-ID: 20150122185607.GE7148@alap3.anarazel.de
In reply to: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?  (Andres Freund <andres@2ndquadrant.com>)
Replies: Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?  (Stephen Frost <sfrost@snowman.net>)
Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
Hi,

On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> I'm analyzing a problem in which a customer had a pg_basebackup (from
> standby) created 9.2 cluster that failed with "WAL contains references to
> invalid pages". The failed record was a "xlog redo visible"
> i.e. XLOG_HEAP2_VISIBLE.
>
> First I thought there might be another bug along the line of
> 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> seem to have any problems.
>
> Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
> the basebackup was started and finished *before* pg_basebackup finished.
>
> movedb() basically works in these steps:
> 1) lock out users of the database
> 2) RequestCheckpoint(IMMEDIATE|WAIT)
> 3) DropDatabaseBuffers()
> 4) copydir()
> 5) XLogInsert(XLOG_DBASE_CREATE)
> 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> 7) rmtree(src_dbpath)
> 8) XLogInsert(XLOG_DBASE_DROP)
> 9) unlock database
>
> If a basebackup starts while 4) is in progress and continues until 7)
> happens I think a pretty wide race opens: The basebackup can end up with
> a partial copy of the database in the old tablespace because the
> rmtree(old_path) concurrently was in progress.  Normally such races are
> fixed during replay. But in this case, the replay of the
> XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new);,
> fixing nothing.
>
> Besides making AD .. ST use sane WAL logging, which doesn't seem
> backpatchable, I don't see what could be done against this except
> somehow making basebackups fail if an AD .. ST is in progress. Which
> doesn't look entirely trivial either.
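
The replay problem described in the quoted text can be illustrated with a small standalone C model. This is only a sketch, not PostgreSQL code: the Dir type, NFILES, and the model_* functions are all invented for illustration, and a database directory is reduced to a set of files that are either present or missing.

```c
#include <stdbool.h>

#define NFILES 4                /* hypothetical number of files in the database */

/* Hypothetical model of a database directory: which files exist. */
typedef struct Dir
{
    bool        present[NFILES];
} Dir;

/* Model of rmtree(): remove every file in the directory. */
void
model_rmtree(Dir *dir)
{
    for (int i = 0; i < NFILES; i++)
        dir->present[i] = false;
}

/* Model of copydir(): dst receives exactly what src contains right now. */
void
model_copydir(const Dir *src, Dir *dst)
{
    for (int i = 0; i < NFILES; i++)
        dst->present[i] = src->present[i];
}

/*
 * Model of replaying XLOG_DBASE_CREATE as described above:
 * rmtree(new); copydir(old, new);
 */
void
model_redo_dbase_create(const Dir *old_dir, Dir *new_dir)
{
    model_rmtree(new_dir);
    model_copydir(old_dir, new_dir);
}

/* Count the files that exist in the directory. */
int
model_count_files(const Dir *dir)
{
    int         n = 0;

    for (int i = 0; i < NFILES; i++)
        if (dir->present[i])
            n++;
    return n;
}
```

In this model, if the backup captured only two of the four files in the old directory, model_redo_dbase_create() leaves the new directory with the same two files, even when the backup happened to contain a complete copy of the new directory. Replay rebuilds the new location from the damaged old one, so nothing is repaired.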

I basically have two ideas to fix this.

1) Make do_pg_start_backup() acquire a SHARE lock on pg_database. That'll
   prevent it from starting while a movedb() is still in progress. Then
   additionally add a pg_backup_in_progress() function to xlog.c that checks
   (XLogCtl->Insert.exclusiveBackup ||
   XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
   movedb() to error out if a backup is in progress.

   Not pretty, but sounds doable.

   We've discussed trying to sleep instead of erroring out in movedb()
   while a base backup is in progress, but that's nontrivial. I also
   don't think ALTER DATABASE is ever intentionally run at the time of a
   base backup.
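
A minimal standalone sketch of the check proposed in 1), for illustration only. BackupState is a hypothetical stand-in for the two XLogCtl->Insert fields named above, and backup_in_progress() and movedb_allowed() are invented names; the real function would live in xlog.c, would presumably read the fields under whatever lock protects them, and callers would raise an error rather than return a flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the backup-related fields of XLogCtl->Insert.
 * The real structure has many more members and is in shared memory.
 */
typedef struct BackupState
{
    bool        exclusiveBackup;        /* an exclusive backup is running */
    int         nonExclusiveBackups;    /* count of non-exclusive backups */
} BackupState;

/* The proposed condition: true if any base backup is in progress. */
bool
backup_in_progress(const BackupState *state)
{
    /* A negative counter would indicate a bookkeeping bug. */
    assert(state->nonExclusiveBackups >= 0);

    return state->exclusiveBackup || state->nonExclusiveBackups != 0;
}

/*
 * Hypothetical guard for createdb()/movedb(): the operation may only
 * proceed when no backup is running. Real code would error out instead
 * of returning false.
 */
bool
movedb_allowed(const BackupState *state)
{
    return !backup_in_progress(state);
}
```

The check alone only covers a backup that is already running when movedb() starts; the SHARE lock on pg_database in do_pg_start_backup() is what covers the opposite ordering, where the backup starts while movedb() is in progress.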
 

2) Make movedb() (and possibly createdb(), not sure yet) use proper WAL
   logging and log the whole copied data. I think this is the right long
   term fix and would end up being much more reliable. But it either
   requires some ugliness during redo (creating nonexistent database
   directories on the fly during redo) or new WAL records.

   Doable, but probably too large/invasive to backpatch.

Thanks to Alvaro and Petr for discussing the problem.

I lean towards doing 1) in all branches and then doing 2) in master.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


