Re: restoration after crash slowness, any way to improve?

From: Jeff Janes
Subject: Re: restoration after crash slowness, any way to improve?
Date:
Msg-id: CAMkU=1zEJUBFi=PrhqgiMebrimYDmsKF2uz+dxQEdc+C17FoSQ@mail.gmail.com
In response to: restoration after crash slowness, any way to improve?  ("Joshua D. Drake" <jd@commandprompt.com>)
List: pgsql-hackers
On Wed, Aug 31, 2016 at 6:26 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
-hackers,

So this is more of a spitballing thread than anything. As I understand it, if we have a long-running transaction or a large number of WAL segments and we crash, we then have to replay those logs on restart up to the last known good transaction. No problem.

It only has to replay from the start of the last successful checkpoint.  It doesn't matter whether there was a long-running transaction or not; even if a long transaction spans many checkpoints, replay still only has to go back to the start of the last successful checkpoint.  Maybe you just had checkpoint_segments or max_wal_size set way too high, assuming checkpoint_timeout would always kick in first and be the limiting factor, and then your long-running transaction invalidated that assumption?
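
For a rough sense of scale, a back-of-envelope sketch (the max_wal_size figure below is an assumption, not a value from this thread; the replay rate is taken from the 60-97 MB/s you reported):

    # Back-of-envelope sketch: worst-case crash recovery is roughly bounded by
    # how much WAL can pile up since the last completed checkpoint, which
    # max_wal_size (or checkpoint_segments on older releases) indirectly caps.
    max_wal_size_mb = 16 * 1024        # hypothetical setting: 16GB
    replay_rate_mb_per_s = 80          # middle of the observed 60-97 MB/s range

    worst_case_minutes = max_wal_size_mb / replay_rate_mb_per_s / 60
    print("worst-case replay time ~= %.1f minutes" % worst_case_minutes)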
 
I recently ran a very long transaction. I was building up a large number of rows into a two-column table to test index performance. I ended up having to kill the connection, and thus the transaction, after I realized I had an extra zero in my generate_series(). (Side note: amazing the difference a single zero can make ;)). When coming back up, I watched the machine and it was averaging anywhere from 60 MB/s to 97 MB/s on writes.

Was it IO limited?

Killing a session/connection/transaction should not take down the entire server, so there should be no recovery taking place in the first place.  Are you sure you are seeing recovery, and not just the vacuuming of the aborted tuples?
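
A minimal sketch of how to check (my illustration, assuming psycopg2 and sufficient privileges; the connection string and the PID below are placeholders): pg_stat_activity shows who is actually doing the writes, autovacuum workers included, and pg_terminate_backend() aborts just the one backend without taking the server down, so no recovery happens at all.

    # Minimal sketch, assuming psycopg2; the connection string and the PID
    # are placeholders, not values from this thread.
    import psycopg2

    conn = psycopg2.connect("dbname=postgres")
    conn.autocommit = True

    with conn.cursor() as cur:
        # Who is doing the writes?  Autovacuum workers cleaning up aborted
        # tuples show up here alongside ordinary backends.
        cur.execute("""
            SELECT pid, backend_start, state, query
            FROM pg_stat_activity
            WHERE state = 'active'
        """)
        for row in cur.fetchall():
            print(row)

        # Abort only the runaway transaction's backend; the server stays up,
        # so no crash recovery is triggered.
        cur.execute("SELECT pg_terminate_backend(%s)", (12345,))  # placeholder PID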
 


However, since I know this machine can get well over 400 MB/s when using multiple connections, I can't help but wonder if there is anything we can do to make restoration more efficient without sacrificing the purpose of what it is doing?

Can we have multiple readers pull transaction logs into shared_buffers (on recovery only), sort the good transactions and then push them back to the walwriter or bgwriter to the pages?

I don't see how that could work.  Whether a page is consistent or not is orthogonal to whether the transactions on that page have committed or aborted.

There are two possibilities that I've considered, though, for long-running PITR, which could also apply to crash recovery, and which I think have been discussed here before.  One is to have a leading recovery process which would identify pages that will be recovered from an FPI (full-page image), and send word back to the lagging process not to bother applying incremental WAL to those pages.  The other would be for a leading process to asynchronously read into memory (either FS cache or shared_buffers) pages which it sees the lagging process will need to write to.

In the first case, you would want the leading process to be leading by a lot, so that it has the broadest scope to detect FPIs.  Basically you would want it to read all the way to the end of the replay, provided it had enough memory to store the list of FPI pages.  For the second one, you would not want it to run so far ahead that the pages it read in would get pushed out again before the lagging process got to them.  Controlling how far ahead that would be seems like it would be hard.
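
To make the first idea a bit more concrete, here is a toy sketch (my illustration only; the WalRecord shape and the page/payload model are invented, this is not PostgreSQL code): a leading pass reads ahead and collects every page that will eventually be restored from an FPI, and the lagging applier skips incremental records for those pages until the FPI itself has been applied.

    # Toy model of the "leading FPI scanner" idea; WalRecord and the
    # page/payload representation are invented for illustration only.
    from collections import namedtuple

    WalRecord = namedtuple("WalRecord", "lsn page is_fpi payload")

    def leading_scan(wal):
        """Leading process: read ahead to the end of the WAL and report every
        page that will eventually be overwritten by a full-page image."""
        return {rec.page for rec in wal if rec.is_fpi}

    def lagging_replay(wal, fpi_pages):
        """Lagging process: skip incremental records for pages the leader says
        will be restored from an FPI anyway; apply everything after the FPI."""
        pages = {}
        for rec in wal:
            if rec.is_fpi:
                pages[rec.page] = rec.payload    # the FPI replaces the whole page
                fpi_pages.discard(rec.page)      # later records must be applied again
            elif rec.page in fpi_pages:
                continue                         # a later FPI will clobber this anyway
            else:
                pages[rec.page] = pages.get(rec.page, "") + rec.payload
        return pages

    # The record at lsn 1 for page 'A' is never applied, because the leading
    # scan already knows an FPI for 'A' arrives at lsn 3.
    wal = [
        WalRecord(1, "A", False, "x"),
        WalRecord(2, "B", False, "y"),
        WalRecord(3, "A", True, "FULL-PAGE"),
        WalRecord(4, "A", False, "z"),
    ]
    print(lagging_replay(wal, leading_scan(wal)))   # {'B': 'y', 'A': 'FULL-PAGEz'}

The conservative part of the toy is that a page with more than one FPI falls back to normal replay after the first one; a real implementation would presumably track the LSN of the last FPI per page instead.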

Cheers,

Jeff
