Re: [HACKERS] WAL logging problem in 9.4.3?

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: [HACKERS] WAL logging problem in 9.4.3?
Дата
Msg-id 20200330.145611.1603373373605263450.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] WAL logging problem in 9.4.3?  (Noah Misch <noah@leadboat.com>)
Ответы Re: [HACKERS] WAL logging problem in 9.4.3?  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
At Sun, 29 Mar 2020 21:41:01 -0700, Noah Misch <noah@leadboat.com> wrote in 
> I think attached v41nm is ready for commit.  Would anyone like to vote against
> back-patching this?  It's hard to justify lack of back-patch for a data-loss
> bug, but this is atypically invasive.  (I'm repeating the question, since some
> folks missed my 2020-02-18 question.)  Otherwise, I'll push this on Saturday.
> 
> On Mon, Mar 23, 2020 at 05:20:27PM +0900, Kyotaro Horiguchi wrote:
> > At Sat, 21 Mar 2020 15:49:20 -0700, Noah Misch <noah@leadboat.com> wrote in 
> > > The proximate cause is the RelFileNodeSkippingWAL() call that we added to
> > > MarkBufferDirtyHint().  MarkBufferDirtyHint() runs in parallel workers, but
> > > parallel workers have zeroes for pendingSyncHash and rd_*Subid.
> 
> > > Kyotaro, can you look through the affected code and propose a strategy for
> > > good coexistence of parallel query with the WAL skipping mechanism?
> > 
> > Bi-directional communication between leader and workers is too-much.
> > It wouldn't be acceptable to inhibit the problematic operations on
> > workers such like heap-prune or btree pin removal.  If we do pending
> > syncs just before worker start, it won't fix the issue.
> > 
> > The attached patch passes a list of pending-sync relfilenodes at
> > worker start.
> 
> If you were to issue pending syncs and also cease skipping WAL for affected
> relations, that would fix the issue.  Your design is better, though.  I made
> two notable changes:
>
> - The patch was issuing syncs or FPIs every time a parallel worker exited.  I
>   changed it to skip most of smgrDoPendingSyncs() in parallel workers, like
>   AtEOXact_RelationMap() does.

Exactly. Thank you for fixing it.

> - PARALLEL_KEY_PENDING_SYNCS is most similar to PARALLEL_KEY_REINDEX_STATE and
>   PARALLEL_KEY_COMBO_CID.  parallel.c, not execParallel.c, owns those.  I
>   moved PARALLEL_KEY_PENDING_SYNCS to parallel.c, which also called for style
>   changes in the associated storage.c functions.

That sounds better.

Moving the responsibility of creating pending syncs array reduces
copy. RestorePendingSyncs (And AddPendingSync()) looks better.

> Since pendingSyncHash is always NULL under XLogIsNeeded(), I also removed some
> XLogIsNeeded() tests that immediately preceded !pendingSyncHash tests.

Sounds reasonable. In AddPendingSync, don't we put
Assert(!XLogIsNeeded()) instead of "Assert(pendingSyncHash == NULL)"?
The former guarantees the relationship between XLogIsNeeded() and
pendingSyncHash, and the existing latter assertion looks redundant as
it is placed just after "if (pendingSyncHash)".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Следующее
От: Noah Misch
Дата:
Сообщение: Re: backup manifests