Re: Assertion failure when streaming logical changes

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Assertion failure when streaming logical changes
Дата
Msg-id CAMsr+YH_R-x3y8=843wdsnYGS9O-D3QxDbGbEvvfr2_t8E=-ug@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Assertion failure when streaming logical changes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: Assertion failure when streaming logical changes  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers


On 11 February 2015 at 08:51, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

The new xmin tracking code assumes that if a snapshots's regd_count > 0, it has already been pushed to the RegisteredSnapshots heap. That assumption doesn't hold because the logical decoding stuff creates its own snapshots and sets regd_count=1 to prevent snapmgr.c from freeing them, even though they haven't been registered with RegisterSnapshot.

The second paragraph in this comment in snapmgr.c needs fixing:

 * Likewise, any snapshots that have been exported by pg_export_snapshot
 * have regd_count = 1 and are counted in RegisteredSnapshots, but are not
 * tracked by any resource owner.
 *
 * The same is true for historic snapshots used during logical decoding,
 * their lifetime is managed separately (as they life longer as one xact.c
 * transaction).

Besides the spelling, that's incorrect: historic snapshots are *not* counted in RegisteredSnapshots. That was harmless before commit 94028691, but it was always wrong.


I think setting regd_count=1 outside snapmgr.c is a pretty ugly hack. snapbuild.c also abuses active_count as a reference counter, which is similarly ugly. I think regd_count and active_count should both be treated as private to snapmgr.c, and initialized to zero elsewhere.

As a minimal fix, we could change the logical decoding code to not use regd_count to prevent snapmgr.c from freeing its snapshots, but use active_count for that too. Setting regd_count to 1 in SnapBuildBuildSnapshot() seems unnecessary in the first place, as the callers also call SnapBuildSnapIncRefcount(). Patch attached.

It might be a good idea to apply this if nothing better is forthcoming. Logical decoding in WALsenders is broken at the moment.

Otherwise it needs to go on the 9.5 blockers list.

But could we get rid of those active_count manipulations too? Could you replace the SnapBuildSnap[Inc|Dec]Refcount calls with [Un]RegisterSnapshot()?

It would be interesting to know why it was done that way in the first place, rather than using the snapshot management infrastructure. I presume it needed to do something not directly offered by the snapshot manager but I haven't managed to grasp what, exactly.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Shigeru Hanada
Дата:
Сообщение: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Следующее
От: Sawada Masahiko
Дата:
Сообщение: Re: Proposal : REINDEX xxx VERBOSE