Re: Online enabling of checksums

From: Magnus Hagander
Subject: Re: Online enabling of checksums
Msg-id: CABUevEx6o6KfU3WJn7UTiRKWfjvpVhA++uagsmvkmTBNhxJJNw@mail.gmail.com
In reply to: Re: Online enabling of checksums  (Andres Freund <andres@anarazel.de>)
Responses: Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
           Re: Online enabling of checksums  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers


On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> > On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
> > > Is there any sort of locking that guarantees that worker processes see
> > > an up2date value of
> > > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > > there's not. So you can afaict end up with checksums being computed by
> > > the worker, but concurrent writes missing them.  The window is going to
> > > be at most one missed checksum per process (as the unlocking of the page
> > > is a barrier) and is probably not easy to hit, but that's dangerous
> > > enough.
> > >
> >
> > So just to be clear of the case you're worried about. It's basically:
> > Session #1 - sets checksums to inprogress
> > Session #1 - starts dynamic background worker ("launcher")
> > Launcher reads and enumerates pg_database
> > Launcher starts worker in first database
> > Worker processes first block of data in database
> > And at this point, Session #2 has still not seen the "checksums inprogress"
> > flag and continues to write without checksums?
>
> Yes.  I think there are some variations of that, but yes, that's pretty
> much it.
>
>
> > That seems like quite a long time to me -- is that really a problem?
>
> We don't generally build locking models that are only correct based on
> likelihood. Especially not without a lengthy comment explaining that
> analysis.

Oh, that's not my intention either -- I just wanted to make sure I was thinking about the same issue you were.
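
To make the scenario concrete, here is a toy model of it. This is only a sketch and not PostgreSQL code: the `Shared` and `Backend` classes and `run_scenario()` are made up for illustration, and the explicitly cached flag stands in for a process that has simply not yet observed the updated value.

```python
# Toy model of the race discussed above. NOT PostgreSQL code: every name
# here is hypothetical and exists only for illustration.

OFF, INPROGRESS = "off", "inprogress"


class Shared:
    """Stand-in for shared state: the checksum flag plus the data pages."""

    def __init__(self, npages):
        self.checksum_state = OFF
        # For each page, whether its most recent write carried a checksum.
        self.page_has_checksum = [False] * npages


class Backend:
    """A process that works from its own view of the flag."""

    def __init__(self, shared):
        self.shared = shared
        self.cached_state = shared.checksum_state  # view taken at startup

    def refresh(self):
        # Models the process finally observing the current flag value.
        self.cached_state = self.shared.checksum_state

    def write_page(self, pageno):
        # The write decides based on the (possibly stale) local view.
        self.shared.page_has_checksum[pageno] = self.cached_state != OFF


def run_scenario(refresh_before_write):
    shared = Shared(npages=1)
    session2 = Backend(shared)          # started while checksums were off
    shared.checksum_state = INPROGRESS  # session #1 sets "inprogress"
    worker = Backend(shared)            # worker starts later, sees new value
    worker.write_page(0)                # worker checksums the first block
    if refresh_before_write:
        session2.refresh()
    session2.write_page(0)              # session #2 rewrites the same block
    return shared.page_has_checksum[0]
```

With `run_scenario(False)` the block ends up without a checksum even though the worker already processed it; with `run_scenario(True)` it keeps one. The open question is what guarantees the refresh happens before the write.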

You know a lot more about this type of interlock than I do :) We already wait for all running transactions to finish before we start doing anything. Obviously transactions != buffer writes (and we have things like the checkpointer/bgwriter to consider). Is there something else that we could safely just *wait* for? I have no problem whatsoever if this is a long wait (given the total time the operation takes). I mean waiting at the level of "what if we just stick a sleep(10) in there".
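
One shape such a wait could take, as opposed to a fixed sleep, is waiting for an explicit acknowledgement from every process that might write a buffer. The sketch below is a toy model under that assumption, not an existing PostgreSQL mechanism: the `Coordinator` class, its methods, and the process names are all hypothetical.

```python
# Toy acknowledgement scheme. NOT PostgreSQL code: Coordinator and its
# methods are hypothetical names used only to illustrate the idea.


class Coordinator:
    def __init__(self):
        self.generation = 0  # bumped on every announced state change
        self.acked = {}      # process name -> last generation it acknowledged

    def register(self, name):
        # A newly started process begins with the current state.
        self.acked[name] = self.generation

    def announce_change(self):
        # Called by the session that sets checksums to "inprogress".
        self.generation += 1
        return self.generation

    def acknowledge(self, name):
        # Called by a process once it has picked up the new state.
        self.acked[name] = self.generation

    def laggards(self, generation):
        # Processes that have not yet confirmed they saw this generation.
        return sorted(n for n, g in self.acked.items() if g < generation)

    def safe_to_start(self, generation):
        # The launcher would start workers only once nobody is lagging.
        return not self.laggards(generation)


coord = Coordinator()
for proc in ("backend-2", "checkpointer", "bgwriter"):
    coord.register(proc)
gen = coord.announce_change()
coord.acknowledge("backend-2")
still_waiting = coord.laggards(gen)  # processes we would keep waiting for
```

The difference from `sleep(10)` is that the wait ends on a condition that implies correctness (everyone has confirmed the new state) rather than on elapsed time, so it is neither too short under load nor longer than needed.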

Or can that somehow be cleanly solved using some of the new atomic operators? Or is that likely to cause the same kind of overhead as throwing a barrier in there?

