Re: SimpleLruTruncate() mutual exclusion

From: Noah Misch
Subject: Re: SimpleLruTruncate() mutual exclusion
Date:
Msg-id: 20190801065117.GA2355684@rfd.leadboat.com
In reply to: Re: SimpleLruTruncate() mutual exclusion  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: SimpleLruTruncate() mutual exclusion  (Thomas Munro <thomas.munro@gmail.com>)
Re: SimpleLruTruncate() mutual exclusion  (Dmitry Dolgov <9erthalion6@gmail.com>)
List: pgsql-hackers
On Mon, Jul 29, 2019 at 12:58:17PM -0400, Tom Lane wrote:
> > On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
> >>> b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
> >>> AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> >>> checkpoint, or in the startup process.  Hence, also arrange for only one
> >>> backend to call SimpleLruTruncate(AsyncCtl) at a time.
> 
> >> More specifically, restrict vac_update_datfrozenxid() to one backend per
> >> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
> >> backend per cluster.  This, attached, was rather straightforward.

> I'm pretty sure it was intentional that multiple backends
> could run truncation directory scans concurrently, and I don't really
> want to give that up if possible.

I want to avoid a design that requires painstaking concurrency analysis.  Such
analysis is worth it for heap_update(), but vac_truncate_clog() isn't enough
of a hot path.  If there's a way to make vac_truncate_clog() easy to analyze
and still somewhat concurrent, fair.

> Also, if I understand the data-loss hazard properly, it's what you
> said in the other thread: the latest_page_number could advance after
> we make our decision about what to truncate, and then maybe we could
> truncate newly-added data.  We surely don't want to lock out the
> operations that can advance latest_page_number, so how does serializing
> vac_truncate_clog help?
> 
> I wonder whether what we need to do is add some state in shared
> memory saying "it is only safe to create pages before here", and
> make SimpleLruZeroPage or its callers check that.  The truncation
> logic would advance that value, but only after completing a scan.
> (Oh ..., hmm, maybe the point of serializing truncations is to
> ensure that we know that we can safely advance that?) 

vac_truncate_clog() already records "it is only safe to create pages before
here" in ShmemVariableCache->xidWrapLimit, which it updates after any unlinks.
The trouble comes when two vac_truncate_clog() run in parallel and you get a
sequence of events like this:

vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to unlink
vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to unlink
vac_truncate_clog() instance 1 unlinks segment ABCD
vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
vac_truncate_clog() instance 1 finishes
some backend calls SimpleLruZeroPage(), creating segment ABCD
vac_truncate_clog() instance 2 unlinks segment ABCD

Serializing vac_truncate_clog() fixes that.

> Can you post whatever script you used to detect/reproduce the problem
> in the first place?

I've attached it, but it's pretty hackish.  Apply the patch on commit 7170268,
make sure your --bindir is in $PATH, copy "conf-test-pg" to your home
directory, and run "make trunc-check".  This incorporates xid-burn
acceleration code that Jeff Janes shared with -hackers at some point.

> PS: also, re-reading this code, it's worrisome that we are not checking
> for failure of the unlink calls.  I think the idea was that it didn't
> really matter because if we did re-use an existing file we'd just
> re-zero it; hence failing the truncate is an overreaction.  But probably
> some comments about that are in order.

That's my understanding.  We'll zero any page before reusing it.  Failing the
whole vac_truncate_clog() (and therefore not advancing the wrap limit) would
do more harm than the bit of wasted disk space.  Still, a LOG or WARNING
message would be fine, I think.

Thanks,
nm

Attachments
