Re: Online enabling of checksums

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: Online enabling of checksums
Дата
Msg-id CABUevEwQYBmFLQ5VYtDhWW37nH5WG85WA5oHMvMD0HtHtNzbSg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers


On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/03/2018 01:38 PM, Robert Haas wrote:
> On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>>> checksums, set some hint bits on the page (which won't be
>>> WAL-logged), enable checksums again and still get a valid
>>> checksum even with the new hint bits? That's possible, albeit
>>> unlikely.
>>
>> No, the problem is if - as is much more likely - the checksum is
>> not still valid.
>
> Hmm, on second thought ... maybe I didn't think this through
> carefully enough. If the checksum matches on the master by chance,
> and the page is the same on the standby, then we're fine, right? It's
> a weird accident, but nothing is actually broken. The failure
> scenario is where the standby has a version of the page with a bad
> checksum, but the master has a good checksum. So for example:
> checksums disabled, master modifies the page (which is replicated),
> master sets some hint bits (coincidentally making the checksum
> match), now we try to turn checksums on and don't re-replicate the
> page because the checksum already looks correct.
>

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

  /*
   * If checksum was not set or was invalid, mark the buffer as dirty
   * and force a full page write. If the checksum was already valid, we
   * can leave it since we know that any other process writing the
   * buffer will update the checksum.
   */
  if (checksum != pagehdr->pd_checksum)
  {
      START_CRIT_SECTION();
      MarkBufferDirty(buf);
      log_newpage_buffer(buf, false);
      END_CRIT_SECTION();
  }

That would mean this optimization - only doing the write when the
checksum does not match - is broken.



If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.


Yes, it definitely does. It's not a dealbreaker, but it's certainly a bit painful not to be able to resume as cheap.

 
--

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Online enabling of checksums
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Online enabling of checksums