Re: Online enabling of checksums

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: Online enabling of checksums
Дата
Msg-id CABUevEwHMHBAL5Msf-kV1BjMgTZPM8KuOUymZuy-bTSjdPNkYg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 03/02/2018 02:35 PM, Magnus Hagander wrote:
>
>
> On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com
> <mailto:robertmhaas@gmail.com>> wrote:
>
>     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     <magnus@hagander.net <mailto:magnus@hagander.net>> wrote:
>     > Also if that wasn't clear -- we only do the full page write if there isn't
>     > already a checksum on the page and that checksum is correct.
>
>     Hmm.
>
>     Suppose that on the master there is a checksum on the page and that
>     checksum is correct, but on the standby the page contents differ in
>     some way that we don't always WAL-log, like as to hint bits, and there
>     the checksum is incorrect.  Then you'll enable checksums when the
>     standby still has some pages without valid checksums, and disaster
>     will ensue.
>
>     I think this could be hard to prevent if checksums are turned on and
>     off multiple times.
>
>
> Do we ever make hintbit changes on the standby for example? If so, it
> would definitely cause problems. I didn't realize we did, actually...
>

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checkums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does
> *not* replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full
> page write.
> * Worker completes and flips checksums on which replicates. At this
> point, if the replica reads the page, boom.
>

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

No, in that case the master will issue full page writes for *all* pages, since they dind't hvae a checksum. The current patch only avoids doing that if the checksum on the master is correct, which it isn't when you start from checksums=off.  So this particular problem only shows up if you iterate between off/on/off multiple times.


 
> I guess we have to remove that optimisation. It's definitely a
> bummer, but I don't think it's an absolute dealbreaker.
>

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

Yes.

 
The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

This is very close to what it does now, except it does not wait for a checkpoint in #2. Why does it need that?


BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

    RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
                      CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not to simply wait for regular spread checkpoint, the way
pg_basebackup does it?

Actually, that was my original idea. I changed it for testing, and shuld go change it back.



> We could say that we keep the optimisation if wal_level=minimal for
> example, because then we know there is no replica. But I doubt
> that's worth it?
>

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.


Yeah, I doubt there are a lot of people using "minimal" these days, not since we changed the default.

//Magnus 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Gould
Дата:
Сообщение: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: prokind column (was Re: [HACKERS] SQL procedures)