Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Дата
Msg-id 20190214072623.GA1139206@rfd.leadboat.com
обсуждение исходный текст
Ответ на Spurious "apparent wraparound" via SimpleLruTruncate() rounding  (Noah Misch <noah@leadboat.com>)
Ответы Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding  (Noah Misch <noah@leadboat.com>)
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.

That conclusion was incorrect.  On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:

1. The result of the test is valid only until we release the SLRU ControlLock,
   which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
   segments for deletion.  Once we release that lock, latest_page_number can
   advance.  This creates a TOCTOU race condition, allowing excess deletion:

   [local] test=# table trunc_clog_concurrency ;
   ERROR:  could not access status of transaction 2149484247
   DETAIL:  Could not open file "pg_xact/0801": No such file or directory.

2. By the time the "apparent wraparound" test fires, we've already WAL-logged
   the truncation.  clog_redo() suppresses the "apparent wraparound" test,
   then deletes too much.  Startup then fails:

   881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
   881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
   881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1


Fixes are available:

a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
   wrong; I will correct it in a separate message.)

b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
   AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
   checkpoint, or in the startup process.  Hence, also arrange for only one
   backend to call SimpleLruTruncate(AsyncCtl) at a time.

c. Test "apparent wraparound" before writing WAL, and don't WAL-log
   truncations that "apparent wraparound" forces us to skip.

d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
   the TOCTOU race condition, but TransactionIdDidCommit() and other key
   operations would be waiting behind filesystem I/O.

e. Have the SLRU track a "low cutoff" for an ongoing truncation.  Initially,
   the low cutoff is the page furthest in the past relative to cutoffPage (the
   "high cutoff").  If SimpleLruZeroPage() wishes to use a page in the
   truncation range, it would acquire an LWLock and increment the low cutoff.
   Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
   same LWLock and recheck the segment against the latest low cutoff.

With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only.  I don't want (d), which threatens
performance too much.  I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.

Can you see a way to improve on that plan?  Can you see other bugs of this
nature that this plan does not fix?

Thanks,
nm


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: RE: Protect syscache from bloating with negative cache entries
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: replace_text optimization (StringInfo to varlena)