Обсуждение: Replication slots and isolation levels

Поиск

Список

Период

Сортировка

Replication slots and isolation levels

От

Vladimir Borodin

Дата:

27 октября 2015 г., 19:46:06

Hi all.

I’m wondering why do I get conflicts with recovery on hot standby using replication slots and read commited isolation level? And if I start repeatable read transaction I don’t get any errors. Below is some diagnostics.

I’m using 9.4.4 (upgrade is planned) from yum.postgresql.org packages on both master and standby. Configs are the same on both master and standby:

rpopdb01d/postgres M # SELECT name, setting FROM pg_settings

WHERE category LIKE 'Replication%' or category LIKE 'Write-Ahead Log';

name | setting

------------------------------+---------

hot_standby | on

hot_standby_feedback | on

max_replication_slots | 1

max_standby_archive_delay | 30000

max_standby_streaming_delay | 30000

max_wal_senders | 10

synchronous_standby_names |

vacuum_defer_cleanup_age | 200000

wal_keep_segments | 64

wal_receiver_status_interval | 1

wal_receiver_timeout | 60000

wal_sender_timeout | 3000

(12 rows)

Time: 1.583 ms

rpopdb01d/postgres M #

On the master I’ve created a physical replication slot and attached standby to it, I do see changing xmin and restart_lsn fields in pg_replication_slots view.

rpopdb01d/postgres M # select * from pg_replication_slots ;

----------------------+--------+-----------+--------+----------+--------+------------+--------------+---------------

(1 row)

Time: 0.463 ms

rpopdb01d/postgres M #

When I start a read commited transaction on standby (or use autocommit mode, doesn’t matter) I still see that xmin in pg_replication_slots view on master increases. If I do run a heavy SELECT statement, at some point of time (presumably after vacuum_defer_cleanup_age expires) standby starts to lag replication apply and when it hits max_standby_streaming_delay I get 40001 sql code, either ERROR or FATAL:

rpopdb01e/rpopdb R # SHOW transaction_isolation ;

transaction_isolation

-----------------------

read committed

(1 row)

Time: 0.324 ms

rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;

ERROR: 40001: canceling statement due to conflict with recovery

DETAIL: User query might have needed to see row versions that must be removed.

LOCATION: ProcessInterrupts, postgres.c:2990

Time: 199791.339 ms

rpopdb01e/rpopdb R #

rpopdb01e/rpopdb R # SHOW transaction_isolation ;

transaction_isolation

-----------------------

read committed

(1 row)

Time: 0.258 ms

rpopdb01e/rpopdb R # BEGIN;

BEGIN

Time: 0.067 ms

rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;

FATAL: 40001: terminating connection due to conflict with recovery

DETAIL: User was holding a relation lock for too long.

LOCATION: ProcessInterrupts, postgres.c:2857

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Succeeded.

Time: 307864.830 ms

rpopdb01e/rpopdb R #

The behavior is the same as expected to be without using replication slots.

But when I start repeatable read transaction xmin field in pg_replication_slots view on master freezes (while restart_lsn is still increasing) and I don’t get any replication lag and conflicts with recovery. When I end this transaction, xmin starts increasing again.

rpopdb01e/rpopdb R # begin transaction isolation level repeatable read;

BEGIN

Time: 0.118 ms

rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;

count

------------

3106222429

(1 row)

Time: 411944.889 ms

rpopdb01e/rpopdb R # ROLLBACK;

ROLLBACK

Time: 0.269 ms

rpopdb01e/rpopdb R #

And that is what I expect. Am I missing something or is it expected behavior in read commited mode?

Thanks in advance.

May the force be with you…

https://simply.name

Re: Replication slots and isolation levels

От

Vladimir Borodin

Дата:

29 октября 2015 г., 11:43:09

27 окт. 2015 г., в 19:45, Vladimir Borodin <root@simply.name> написал(а):

Hi all.

I’m wondering why do I get conflicts with recovery on hot standby using replication slots and read commited isolation level? And if I start repeatable read transaction I don’t get any errors. Below is some diagnostics.

+hackers@

Could anybody explain, why this is happening?

I’m using 9.4.4 (upgrade is planned) from yum.postgresql.org packages on both master and standby. Configs are the same on both master and standby:

rpopdb01d/postgres M # SELECT name, setting FROM pg_settings
WHERE category LIKE 'Replication%' or category LIKE 'Write-Ahead Log';
name | setting
------------------------------+---------
hot_standby | on
hot_standby_feedback | on
max_replication_slots | 1
max_standby_archive_delay | 30000
max_standby_streaming_delay | 30000
max_wal_senders | 10
synchronous_standby_names |
vacuum_defer_cleanup_age | 200000
wal_keep_segments | 64
wal_receiver_status_interval | 1
wal_receiver_timeout | 60000
wal_sender_timeout | 3000
(12 rows)

Time: 1.583 ms
rpopdb01d/postgres M #

On the master I’ve created a physical replication slot and attached standby to it, I do see changing xmin and restart_lsn fields in pg_replication_slots view.

rpopdb01d/postgres M # select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
----------------------+--------+-----------+--------+----------+--------+------------+--------------+---------------
rpopdb01e_domain_com | [null] | physical | [null] | [null] | t | 2127399287 | [null] | 960B/415C79C8
(1 row)

Time: 0.463 ms
rpopdb01d/postgres M #

When I start a read commited transaction on standby (or use autocommit mode, doesn’t matter) I still see that xmin in pg_replication_slots view on master increases. If I do run a heavy SELECT statement, at some point of time (presumably after vacuum_defer_cleanup_age expires) standby starts to lag replication apply and when it hits max_standby_streaming_delay I get 40001 sql code, either ERROR or FATAL:

rpopdb01e/rpopdb R # SHOW transaction_isolation ;
transaction_isolation
-----------------------
read committed
(1 row)

Time: 0.324 ms
rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;
ERROR: 40001: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
LOCATION: ProcessInterrupts, postgres.c:2990
Time: 199791.339 ms
rpopdb01e/rpopdb R #

rpopdb01e/rpopdb R # SHOW transaction_isolation ;
transaction_isolation
-----------------------
read committed
(1 row)

Time: 0.258 ms
rpopdb01e/rpopdb R # BEGIN;
BEGIN
Time: 0.067 ms
rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;
FATAL: 40001: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
LOCATION: ProcessInterrupts, postgres.c:2857
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Time: 307864.830 ms
rpopdb01e/rpopdb R #

The behavior is the same as expected to be without using replication slots.

But when I start repeatable read transaction xmin field in pg_replication_slots view on master freezes (while restart_lsn is still increasing) and I don’t get any replication lag and conflicts with recovery. When I end this transaction, xmin starts increasing again.

rpopdb01e/rpopdb R # begin transaction isolation level repeatable read;
BEGIN
Time: 0.118 ms
rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;
count
------------
3106222429
(1 row)

Time: 411944.889 ms
rpopdb01e/rpopdb R # ROLLBACK;
ROLLBACK
Time: 0.269 ms
rpopdb01e/rpopdb R #

And that is what I expect. Am I missing something or is it expected behavior in read commited mode?

Thanks in advance.

--
May the force be with you…
https://simply.name

May the force be with you…

https://simply.name

Re: [HACKERS] Replication slots and isolation levels

От

Michael Paquier

Дата:

29 октября 2015 г., 13:12:45

On Thu, Oct 29, 2015 at 9:42 AM, Vladimir Borodin wrote:
> I’m wondering why do I get conflicts with recovery on hot standby using
> replication slots and read commited isolation level? And if I start
> repeatable read transaction I don’t get any errors. Below is some
> diagnostics.

In the case of repeatable read the standby will wait before applying
the VACUUM WAL record cleaning up a relation page. Hence you won't get
conflicts in this case.
--
Michael

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

29 октября 2015 г., 13:33:31

29 окт. 2015 г., в 13:12, Michael Paquier <michael.paquier@gmail.com> написал(а):

On Thu, Oct 29, 2015 at 9:42 AM, Vladimir Borodin wrote:
I’m wondering why do I get conflicts with recovery on hot standby using
replication slots and read commited isolation level? And if I start
repeatable read transaction I don’t get any errors. Below is some
diagnostics.

In the case of repeatable read the standby will wait before applying
the VACUUM WAL record cleaning up a relation page. Hence you won't get
conflicts in this case.

Standby will receive but will not apply? Or master will not vacuum needed by standby pages? It seems that the second one is happening because replication lag on standby does not increase while issuing such repeatable read transaction.

--
Michael

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Да пребудет с вами сила…

https://simply.name/ru

Re: [HACKERS] Replication slots and isolation levels

От

Michael Paquier

Дата:

29 октября 2015 г., 14:03:45

On Thu, Oct 29, 2015 at 11:33 AM, Vladimir Borodin wrote:
> 29 окт. 2015 г., в 13:12, Michael Paquier написал(а):
>> In the case of repeatable read the standby will wait before applying
>> the VACUUM WAL record cleaning up a relation page. Hence you won't get
>> conflicts in this case.
>
> Standby will receive but will not apply? Or master will not vacuum needed by
> standby pages? It seems that the second one is happening because replication
> lag on standby does not increase while issuing such repeatable read
> transaction.

Standby will receive the record but not replay it until the
transaction doing REPEATABLE READ transactions that needs those rows
commits on the standby. The WAL flush position on the standby
continues to move on. This depends of course on
max_standby_streaming_delay which may decide or not to force the
transaction to cancel if it takes too long. Someone feel free to
correct me if I am missing something here.
--
Michael

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

29 октября 2015 г., 14:35:41

29 окт. 2015 г., в 14:03, Michael Paquier <michael.paquier@gmail.com> написал(а):

On Thu, Oct 29, 2015 at 11:33 AM, Vladimir Borodin wrote:
29 окт. 2015 г., в 13:12, Michael Paquier написал(а):
In the case of repeatable read the standby will wait before applying
the VACUUM WAL record cleaning up a relation page. Hence you won't get
conflicts in this case.

Standby will receive but will not apply? Or master will not vacuum needed by
standby pages? It seems that the second one is happening because replication
lag on standby does not increase while issuing such repeatable read
transaction.

Standby will receive the record but not replay it until the
transaction doing REPEATABLE READ transactions that needs those rows
commits on the standby. The WAL flush position on the standby
continues to move on.

By replication lag on standby I mean exactly replay_location, not flush_location.

This depends of course on
max_standby_streaming_delay which may decide or not to force the
transaction to cancel if it takes too long. Someone feel free to
correct me if I am missing something here.

Well, the initial problem is that in read commited mode heavy SELECT-statement hits max_standby_streaming_delay but in repeatable read mode doesn’t. My question is if it is expected behavior? If yes, why is it so?

Thanks for your response!

--
Michael

Да пребудет с вами сила…

https://simply.name/ru

Re: [HACKERS] Replication slots and isolation levels

От

Michael Paquier

Дата:

29 октября 2015 г., 15:29:25

On Thu, Oct 29, 2015 at 12:35 PM, Vladimir Borodin wrote:
> 29 окт. 2015 г., в 14:03, Michael Paquier написал(а):
>> Standby will receive the record but not replay it until the
>> transaction doing REPEATABLE READ transactions that needs those rows
>> commits on the standby. The WAL flush position on the standby
>> continues to move on.
>
> By replication lag on standby I mean exactly replay_location, not
> flush_location.
> Well, the initial problem is that in read commited mode heavy
> SELECT-statement hits max_standby_streaming_delay but in repeatable read
> mode doesn’t. My question is if it is expected behavior? If yes, why is it
> so?

Er, well. If I enforce on master the deletion then VACUUM-cleanup of a
page with a REPEATABLE READ transaction on standby still expecting to
have this page items visible until its commit the startup process puts
itself in waiting state when trying to replay the cleanup record, and
the replay_location does not move on, still the wal receiver gets WAL
in parallel, so it continues to flush things and flush_position
progresses. With a READ COMMITTED transaction running on the standby,
this transaction considers as visible stuff that has been committed,
so WAL replay can move on, and indeed there is a risk to face a
recovery conflict. So this behavior as-is is correct, based on how
isolation levels should behave when a node performs recovery.
--
Michael

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

29 октября 2015 г., 16:39:33

29 окт. 2015 г., в 15:29, Michael Paquier <michael.paquier@gmail.com> написал(а):

On Thu, Oct 29, 2015 at 12:35 PM, Vladimir Borodin wrote:
29 окт. 2015 г., в 14:03, Michael Paquier написал(а):
Standby will receive the record but not replay it until the
transaction doing REPEATABLE READ transactions that needs those rows
commits on the standby. The WAL flush position on the standby
continues to move on.

By replication lag on standby I mean exactly replay_location, not
flush_location.
Well, the initial problem is that in read commited mode heavy
SELECT-statement hits max_standby_streaming_delay but in repeatable read
mode doesn’t. My question is if it is expected behavior? If yes, why is it
so?

Er, well. If I enforce on master the deletion then VACUUM-cleanup of a
page with a REPEATABLE READ transaction on standby still expecting to
have this page items visible until its commit the startup process puts
itself in waiting state when trying to replay the cleanup record, and
the replay_location does not move on, still the wal receiver gets WAL
in parallel, so it continues to flush things and flush_position
progresses. With a READ COMMITTED transaction running on the standby,
this transaction considers as visible stuff that has been committed,
so WAL replay can move on, and indeed there is a risk to face a
recovery conflict. So this behavior as-is is correct, based on how
isolation levels should behave when a node performs recovery.

Everything you describe is exactly true for setups without replication slots. And the ability to run heavy SELECT statements on hot standby without replication lag and recovery conflicts was the reason why I tried to use them. And the documentation [0] directly says that «Replication slots provide an automated way to ensure ... that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected». My question is why is it true for REPEATABLE READ transactions but it doesn’t work for READ COMMITED queries? Seems, that «even when the standby is disconnected» is much stronger limitation and READ COMMITED should work fine, but it doesn’t.

If I understand right, with hot_standby_feedback = on standby tells the master xmin of the earliest transaction on standby. And autovacuum worker on master takes it into account when doing vacuum cleanup (because it can see it from pg_replications_slots view), AFAIK with GetOldestXmin function. But I don’t understand why with READ COMMITED transactions xmin in pg_replication_slots view on master continues to increase while with REPEATABLE READ xmin freezes until this transaction finishes.

[0] http://www.postgresql.org/docs/9.4/static/warm-standby.html#STREAMING-REPLICATION-SLOTS

--
Michael

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Да пребудет с вами сила…

https://simply.name/ru

Re: [HACKERS] Replication slots and isolation levels

От

Oleksii Kliukin

Дата:

29 октября 2015 г., 17:29:52

On 29 Oct 2015, at 14:39, Vladimir Borodin <root@simply.name> wrote:

f I understand right, with hot_standby_feedback = on standby tells the master xmin of the earliest transaction on standby. And autovacuum worker on master takes it into account when doing vacuum cleanup (because it can see it from pg_replications_slots view), AFAIK with GetOldestXmin function. But I don’t understand why with READ COMMITED transactions xmin in pg_replication_slots view on master continues to increase while with REPEATABLE READ xmin freezes until this transaction finishes.

Could it be a consequence of how REPEATABLE READ transactions handle snapshots? With REPEATABLE READ the snapshot is acquired only once at the beginning of a transaction; a READ COMMITTED transaction re-evaluates its snapshot with each new command.

http://www.postgresql.org/docs/current/static/transaction-iso.html

Kind regards,

Oleksii

Re: [HACKERS] Replication slots and isolation levels

От

Robert Haas

Дата:

30 октября 2015 г., 14:30:13

On Thu, Oct 29, 2015 at 3:29 PM, Oleksii Kliukin <alexk@hintbits.com> wrote:
> Could it be a consequence of how REPEATABLE READ transactions handle
> snapshots? With REPEATABLE READ the snapshot is acquired only once at the
> beginning of a transaction; a READ COMMITTED transaction re-evaluates its
> snapshot with each new command.

I bet that's exactly it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

30 октября 2015 г., 14:40:34

30 окт. 2015 г., в 14:30, Robert Haas <robertmhaas@gmail.com> написал(а):

On Thu, Oct 29, 2015 at 3:29 PM, Oleksii Kliukin <alexk@hintbits.com> wrote:
Could it be a consequence of how REPEATABLE READ transactions handle
snapshots? With REPEATABLE READ the snapshot is acquired only once at the
beginning of a transaction; a READ COMMITTED transaction re-evaluates its
snapshot with each new command.

I bet that's exactly it.

I still don’t fully understand why is it so (the problem occurs while running only one SELECT-statement in READ COMMITED so only one snapshot is taken), but if is expected behavior shouldn’t the documentation mention that using READ COMMITED (which is the default) you may still get conflicts with recovery while using replication slots?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

May the force be with you…

https://simply.name

Re: [HACKERS] Replication slots and isolation levels

От

Michael Paquier

Дата:

30 октября 2015 г., 15:42:23

On Fri, Oct 30, 2015 at 12:40 PM, Vladimir Borodin wrote:
> On Thu, Oct 29, 2015 at 3:29 PM, Oleksii Kliukin wrote:
>> Could it be a consequence of how REPEATABLE READ transactions handle
>> snapshots? With REPEATABLE READ the snapshot is acquired only once at the
>> beginning of a transaction; a READ COMMITTED transaction re-evaluates its
>> snapshot with each new command.
>
> I still don’t fully understand why is it so (the problem occurs while
> running only one SELECT-statement in READ COMMITED so only one snapshot is
> taken), but if is expected behavior shouldn’t the documentation mention that
> using READ COMMITTED (which is the default) you may still get conflicts with
> recovery while using replication slots?

Replication slots and hot_standby_feedback are two different unrelated
concepts, slots being aimed at retaining WAL. I guess that's the
origin of your confusion:
http://www.postgresql.org/message-id/20150616192141.GD2626@alap3.anarazel.de
--
Michael

Re: [HACKERS] Replication slots and isolation levels

От

Andres Freund

Дата:

30 октября 2015 г., 16:00:10

On 2015-10-30 13:42:19 +0100, Michael Paquier wrote:
> On Fri, Oct 30, 2015 at 12:40 PM, Vladimir Borodin wrote:
> > On Thu, Oct 29, 2015 at 3:29 PM, Oleksii Kliukin wrote:
> >> Could it be a consequence of how REPEATABLE READ transactions handle
> >> snapshots? With REPEATABLE READ the snapshot is acquired only once at the
> >> beginning of a transaction; a READ COMMITTED transaction re-evaluates its
> >> snapshot with each new command.
> >
> > I still don’t fully understand why is it so (the problem occurs while
> > running only one SELECT-statement in READ COMMITED so only one snapshot is
> > taken), but if is expected behavior shouldn’t the documentation mention that
> > using READ COMMITTED (which is the default) you may still get conflicts with
> > recovery while using replication slots?
>
> Replication slots and hot_standby_feedback are two different unrelated
> concepts, slots being aimed at retaining WAL.

Uh. Slots also retain the xmin horizon if hot_standby_feedback is
enabled on the standby?

> I guess that's the origin of your confusion:
> http://www.postgresql.org/message-id/20150616192141.GD2626@alap3.anarazel.de

That just says what I said above, I don't see how this makes replication
slots and hs feedback unrelated?

Greetings,

Andres Freund

Re: [HACKERS] Replication slots and isolation levels

От

Robert Haas

Дата:

30 октября 2015 г., 16:04:38

On Fri, Oct 30, 2015 at 12:40 PM, Vladimir Borodin <root@simply.name> wrote:
> I still don’t fully understand why is it so (the problem occurs while
> running only one SELECT-statement in READ COMMITED so only one snapshot is
> taken), but if is expected behavior shouldn’t the documentation mention that
> using READ COMMITED (which is the default) you may still get conflicts with
> recovery while using replication slots?

Are you doing BEGIN / one or more SELECT statements / END?

Or just a bare SELECT with no explicit transaction control?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

30 октября 2015 г., 16:49:17

30 окт. 2015 г., в 16:04, Robert Haas <robertmhaas@gmail.com> написал(а):

On Fri, Oct 30, 2015 at 12:40 PM, Vladimir Borodin <root@simply.name> wrote:
I still don’t fully understand why is it so (the problem occurs while
running only one SELECT-statement in READ COMMITED so only one snapshot is
taken), but if is expected behavior shouldn’t the documentation mention that
using READ COMMITED (which is the default) you may still get conflicts with
recovery while using replication slots?

Are you doing BEGIN / one or more SELECT statements / END?

Or just a bare SELECT with no explicit transaction control?

I’ve tried two ways - bare SELECT in autocommit mode and BEGIN; SELECT; ROLLBACK. I first described the problem in thread on pgsql-admin@ [0], there is copy-paste from psql there, but during conversation initial description was lost.

[0] http://www.postgresql.org/message-id/7F74C5EA-6741-44FC-B6C6-E96F18D761FB@simply.name

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Да пребудет с вами сила…

https://simply.name/ru

Re: [HACKERS] Replication slots and isolation levels

От

Robert Haas

Дата:

02 ноября 2015 г., 23:38:03

On Fri, Oct 30, 2015 at 9:49 AM, Vladimir Borodin <root@simply.name> wrote:
> I’ve tried two ways - bare SELECT in autocommit mode and BEGIN; SELECT;
> ROLLBACK. I first described the problem in thread on pgsql-admin@ [0], there
> is copy-paste from psql there, but during conversation initial description
> was lost.
>
> [0]
> http://www.postgresql.org/message-id/7F74C5EA-6741-44FC-B6C6-E96F18D761FB@simply.name

Hmm.  That behavior seems unexpected to me, but I might be missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

03 ноября 2015 г., 09:58:04

2 нояб. 2015 г., в 23:37, Robert Haas <robertmhaas@gmail.com> написал(а):

On Fri, Oct 30, 2015 at 9:49 AM, Vladimir Borodin <root@simply.name> wrote:
I’ve tried two ways - bare SELECT in autocommit mode and BEGIN; SELECT;
ROLLBACK. I first described the problem in thread on pgsql-admin@ [0], there
is copy-paste from psql there, but during conversation initial description
was lost.

[0]
http://www.postgresql.org/message-id/7F74C5EA-6741-44FC-B6C6-E96F18D761FB@simply.name

Hmm. That behavior seems unexpected to me, but I might be missing something.

Me too. That’s why I started the thread. One small detail that might have a value is that the big table being queried is partitioned into 64 inhereted tables. Now I’m trying to write a simple script to reproduce the problem, but that is not so easy because AFAIK VACUUM on master should happen while single query on standby is running and it should vacuum those rows that have not been accessed by the query on standby yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

May the force be with you…

https://simply.name

Re: [HACKERS] Replication slots and isolation levels

От

Andres Freund

Дата:

03 ноября 2015 г., 11:38:49

On 2015-11-02 15:37:57 -0500, Robert Haas wrote:
> On Fri, Oct 30, 2015 at 9:49 AM, Vladimir Borodin <root@simply.name> wrote:
> > I’ve tried two ways - bare SELECT in autocommit mode and BEGIN; SELECT;
> > ROLLBACK. I first described the problem in thread on pgsql-admin@ [0], there
> > is copy-paste from psql there, but during conversation initial description
> > was lost.
> >
> > [0]
> > http://www.postgresql.org/message-id/7F74C5EA-6741-44FC-B6C6-E96F18D761FB@simply.name
>
> Hmm.  That behavior seems unexpected to me, but I might be missing something.

The conflict is because of a relation lock, not because of
visibility. Hot-Standby feedback changes nothing about that.

I presume all the other conflicts are all because of relation level
locks? Check pg_stat_database_conflicts and the server logs to verify.

Andres

Re: [HACKERS] Replication slots and isolation levels

От

Vladimir Borodin

Дата:

03 ноября 2015 г., 12:54:21

3 нояб. 2015 г., в 11:38, Andres Freund <andres@anarazel.de> написал(а):

On 2015-11-02 15:37:57 -0500, Robert Haas wrote:
On Fri, Oct 30, 2015 at 9:49 AM, Vladimir Borodin <root@simply.name> wrote:
I’ve tried two ways - bare SELECT in autocommit mode and BEGIN; SELECT;
ROLLBACK. I first described the problem in thread on pgsql-admin@ [0], there
is copy-paste from psql there, but during conversation initial description
was lost.

[0]
http://www.postgresql.org/message-id/7F74C5EA-6741-44FC-B6C6-E96F18D761FB@simply.name

Hmm. That behavior seems unexpected to me, but I might be missing something.

The conflict is because of a relation lock, not because of
visibility. Hot-Standby feedback changes nothing about that.

I presume all the other conflicts are all because of relation level
locks? Check pg_stat_database_conflicts and the server logs to verify.

Oh, good point, thank you, it gives the answer. Actually I’ve already done a switchover in this cluster, so pg_stat_database_conflicts started from scratch :( But the logs haven’t been rotated yet:

root@rpopdb01e ~ # fgrep -e 562f9ef0.23df,6 -e 562fa107.451a -e 562fa1d9.5146 -e 562f9ef0.23df,10 -e 562fa259.56d1 /var/lib/pgsql/9.4/data/pg_log/postgresql-2015-10-27_185736.csv

2015-10-27 19:06:28.656 MSK,,,9183,,562f9ef0.23df,6,,2015-10-27 18:57:36 MSK,,0,LOG,00000,"parameter ""hot_standby_feedback"" changed to ""off""",,,,,,,,,""

2015-10-27 19:10:05.039 MSK,"postgres","rpopdb",17690,"[local]",562fa107.451a,1,"",2015-10-27 19:06:31 MSK,12/54563,0,ERROR,40001,"canceling statement due to conflict with recovery","User query might have needed to see row versions that must be removed.",,,,,"select count(*) from rpop.rpop_imap_uidls;",,,"psql"

2015-10-27 19:10:05.995 MSK,"monitor","rpopdb",20806,"localhost:51794",562fa1d9.5146,1,"",2015-10-27 19:10:01 MSK,15/24192,0,ERROR,40001,"canceling statement due to conflict with recovery","User was holding shared buffer pin for too long.",,,,"SQL function ""to_timestamp"" statement 1","select cnt from monitor.bad_rpop_total",,,""

2015-10-27 19:12:06.878 MSK,,,9183,,562f9ef0.23df,10,,2015-10-27 18:57:36 MSK,,0,LOG,00000,"parameter ""hot_standby_feedback"" changed to ""on""",,,,,,,,,""

2015-10-27 19:17:57.056 MSK,"postgres","rpopdb",22225,"[local]",562fa259.56d1,1,"",2015-10-27 19:12:09 MSK,3/35442,0,FATAL,40001,"terminating connection due to conflict with recovery","User was holding a relation lock for too long.",,,,,"select count(*) from rpop.rpop_imap_uidls;",,,"psql"

root@rpopdb01e ~ #

So FATAL is due to relation lock and one ERROR is due to pinned buffers (this is actually from another user) but there is also one ERROR due to old snapshots (first line). But I actually turned off hs_feedback before first ERROR and turned it on after it. So it seems to work expectedly.

Does it actually mean that I could get such conflicts (due to relation locks, for example) even in repeatable read or serializable? I mean, is there any dependency between transaction isolation level on standby and conflicts with recovery?

And am I right that the only way not to have confl_lock is to increase max_standby_streaming_delay?

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

May the force be with you…

https://simply.name

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Replication slots and isolation levels