BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions

From marco.nenciarini@2ndquadrant.it
Subject BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions
Msg-id 20150626134310.3876.82768@wrigleys.postgresql.org
Responses Re: BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13473
Logged by:          Marco Nenciarini
Email address:      marco.nenciarini@2ndquadrant.it
PostgreSQL version: 9.4.4
Operating system:   all
Description:

= Symptoms

Consider a simple master -> standby setup with hot_standby_feedback
enabled. If a backend on the standby is holding the cluster xmin and the
master runs VACUUM FREEZE on the database that backend is connected to,
the freeze generates a recovery conflict and the query running on the
standby is canceled.

= How to reproduce it

Run the following operations on an idle cluster.

1) connect to the standby and simulate a long running query:

   select pg_sleep(3600);

2) connect to the master and run the following script:

   create table t(id int primary key);
   insert into t select generate_series(1, 10000);
   vacuum freeze verbose t;
   drop table t;

3) after 30 seconds (the default max_standby_streaming_delay) the pg_sleep
   query on the standby will be canceled.

= Expected output

The hot standby feedback should have prevented the query cancellation.

= Analysis

I've run postgres at DEBUG2 logging level, and I can confirm that the
vacuum correctly sees the OldestXmin propagated by the standby through hot
standby feedback.
The issue is in the heap_xlog_freeze function, which calls
ResolveRecoveryConflictWithSnapshot as its first action, passing the
cutoff_xid value as the first argument.
The cutoff_xid is the OldestXmin that was active when the vacuum ran, so it
represents a running xid.
However, ResolveRecoveryConflictWithSnapshot expects its first argument to
be latestRemovedXid, which represents the highest xid that has actually
been removed, so there is an off-by-one error.
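
The off-by-one can be illustrated with a small self-contained model. This
is only a sketch under stated assumptions, not code from the PostgreSQL
source: the names xid_t, xid_precedes_or_equals, snapshot_conflicts and
freeze_scenario are hypothetical, and the rule "a standby snapshot
conflicts when its xmin is at or before the xid passed in" is assumed to be
how the standby interprets latestRemovedXid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;   /* stand-in for a 32-bit transaction id */

/* Wraparound-aware "a <= b" on 32-bit xids (modulo-2^32 comparison). */
static bool xid_precedes_or_equals(xid_t a, xid_t b)
{
    return (int32_t)(a - b) <= 0;
}

/*
 * Hypothetical model of the standby-side check: a snapshot conflicts
 * when its xmin is at or before the newest xid reported as removed.
 */
static bool snapshot_conflicts(xid_t snapshot_xmin, xid_t latest_removed_xid)
{
    return xid_precedes_or_equals(snapshot_xmin, latest_removed_xid);
}

/* Walks through the scenario from this report for one OldestXmin value. */
static void freeze_scenario(xid_t oldest_xmin)
{
    xid_t standby_xmin = oldest_xmin;  /* fed back via hot_standby_feedback */
    xid_t cutoff_xid = oldest_xmin;    /* value logged by VACUUM FREEZE */

    /* Passing cutoff_xid unchanged: the standby query is in conflict. */
    assert(snapshot_conflicts(standby_xmin, cutoff_xid));

    /* Passing the xid just before cutoff_xid: no conflict. */
    assert(!snapshot_conflicts(standby_xmin, cutoff_xid - 1));
}
```

Calling freeze_scenario(1000) passes both assertions in this model: the
standby's own xmin is reported as if it had been removed, which is what
triggers the cancellation, while the preceding xid does not.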

I've been able to reproduce this issue for every version of postgres since
9.0 (9.0, 9.1, 9.2, 9.3, 9.4 and current master)

= Proposed solution

In heap_xlog_freeze we need to subtract one from the value of cutoff_xid
before passing it to ResolveRecoveryConflictWithSnapshot.
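
One detail to watch when subtracting: xids are 32-bit counters that wrap
around, and the lowest values are reserved, so a plain "cutoff_xid - 1" can
land on a reserved value. The following is a minimal sketch of a
wraparound-aware decrement, assuming that xids below 3 are reserved as
special values; xid_t, FIRST_NORMAL_XID and xid_retreat are hypothetical
names used only for illustration, not the actual patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t xid_t;   /* stand-in for a 32-bit transaction id */

/* Assumed: xids below this value are reserved and never assigned. */
#define FIRST_NORMAL_XID ((xid_t) 3)

/*
 * Returns the normal xid immediately before `xid`, skipping the reserved
 * values when the 32-bit counter wraps around.
 */
static xid_t xid_retreat(xid_t xid)
{
    assert(xid >= FIRST_NORMAL_XID);   /* only defined for normal xids */
    do {
        xid--;
    } while (xid < FIRST_NORMAL_XID);
    return xid;
}
```

With this helper, the value handed to the conflict check would be
xid_retreat(cutoff_xid) rather than cutoff_xid itself.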
