Re: [GENERAL] Slow PITR restore

From Heikki Linnakangas
Subject Re: [GENERAL] Slow PITR restore
Date
Msg-id 4761AAC9.2050303@enterprisedb.com
In response to Re: [GENERAL] Slow PITR restore  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [GENERAL] Slow PITR restore  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Also, I have not seen anyone provide a very credible argument why
> we should spend a lot of effort on optimizing a part of the system
> that is so little-exercised.  Don't tell me about warm standby
> systems --- they are fine as long as recovery is at least as fast
> as the original transactions, and no evidence has been provided to
> suggest that it's not.

Koichi showed Simon and me graphs of DBT-2 runs in their test lab back in 
May. They had set up two identical systems, one running the benchmark, 
and another one as a warm standby. The standby couldn't keep up; it 
couldn't replay the WAL as quickly as the primary server produced it. 
IIRC, replaying WAL generated in a 1h benchmark run took 6 hours.

It sounds unbelievable at first, but the problem is that our WAL replay 
doesn't scale. On the primary server, you can have (and they did) a huge 
RAID array with dozens of disks, and a lot of concurrent activity 
keeping it busy. On the standby, we do all the same work, but with a 
single process. Every time we need to read in a page to modify it, we 
block. No matter how many disks you have in the array, it won't help, 
because we only issue one I/O request at a time.
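
To make the argument above concrete, here is a rough back-of-the-envelope 
model in Python. It is only a sketch: it assumes every page read is a 
random read with a fixed latency, that requests spread evenly over the 
disks, and it ignores caching and sequential I/O. The function name and 
all the numbers (24 disks, 8 ms per read, one million reads) are made up 
for illustration and are not taken from the benchmark.

```python
def read_time_seconds(random_reads, latency_ms, disks, outstanding_requests):
    """Estimate time spent on random page reads under an idealized model."""
    # Each disk services one request at a time, and we can never use more
    # disks than there are requests in flight.
    parallelism = min(disks, outstanding_requests)
    return random_reads * latency_ms / 1000.0 / parallelism


reads = 1_000_000  # hypothetical number of pages that must be read in

# Primary: many backends keep every disk in the array busy.
primary = read_time_seconds(reads, latency_ms=8, disks=24, outstanding_requests=24)

# Standby: a single replay process blocks on each read, one request at a time.
standby = read_time_seconds(reads, latency_ms=8, disks=24, outstanding_requests=1)

print(f"primary: {primary:.0f} s, standby: {standby:.0f} s")
print(f"standby is {standby / primary:.0f}x slower on the same array")
```

In this model the standby gains nothing from the extra disks: its read 
time depends only on the per-request latency, which is the scaling 
problem described above.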

That said, I think the change we made in the spring to not read in pages 
for full page writes will help a lot with that. It would be nice to see 
some new benchmark results to measure that. However, it didn't fix the 
underlying scalability problem.

One KISS approach would be to just do full page writes more often. It 
would obviously bloat the WAL, but it would make the replay faster.

Another reason you would care about fast recovery is PITR. If you do 
base backups only once a week, for example, when you need to recover 
using the archive, you might have to replay a week's worth of WAL in the 
worst case. You don't want to wait a week for the replay to finish.
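
The arithmetic behind that worry can be sketched in a few lines of 
Python. This is an illustration under simple assumptions, not a 
measurement: `replay_ratio` is a hypothetical parameter meaning "replay 
time divided by the time the primary took to generate the same WAL", and 
it is assumed constant over the whole archive. The value 6.0 is borrowed 
from the 1h-to-6h benchmark figure recalled earlier in this message, 
which came from a heavily loaded system; 0.25 is made up.

```python
def worst_case_replay_days(backup_interval_days, replay_ratio):
    """Days needed to replay all WAL accumulated since the last base backup.

    replay_ratio is replay time divided by the time the primary took to
    generate the same WAL (1.0 means replay is exactly as fast as generation).
    """
    return backup_interval_days * replay_ratio


# Weekly base backups, recovery starting just before the next backup is due.
for ratio in (0.25, 1.0, 6.0):
    days = worst_case_replay_days(7, ratio)
    print(f"replay ratio {ratio}: up to {days} days of recovery")
```

Replay that is merely as fast as the original transactions already means 
a week of recovery here, so for PITR the replay needs to be considerably 
faster than that.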

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

