Обсуждение: Hot Standby Conflict on pg_attribute

Поиск

Список

Период

Сортировка

Hot Standby Conflict on pg_attribute

От

Erik Jones

Дата:

09 мая 2019 г., 20:03:50

Hello,

A client has recently had a couple of hot standby query conflict pile-ups around AccessShare lock waits on pg_attribute. Here is an example from the log:

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-1] sql_error_code = 00000 LOG: process 118946 still waiting for AccessShareLock on relation 1249 of database 16401 after 1000.127 ms at character 92

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-2] sql_error_code = 00000 DETAIL: Process holding the lock: 9. Wait queue: 118948, 118950, 118708, 118818, 118886, 118961, 118960, 118806, 118963, 118959, 118881, 118887, 118878, 118896, 118964, 118965, 118945, 118949, 118946, 118743, 118966, 118947, 118967, 118968.

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-3] sql_error_code = 00000 STATEMENT: SELECT uc.id,

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-4] uc.some_id,

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-5] uc.utr_id,

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-6] utr.name

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-7] FROM usertable1 uc

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-8] INNER JOIN usertable2 utr

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-9] ON uc.utr_id = utr.id

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-10] WHERE uc.some_id = $1

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-11] ORDER BY name

Relation 1249 is pg_attribute and process 9 that was holding the lock was RecoveryWalAll process. I've confirmed that autovacuum had removed some pages from pg_attribute shortly before this, which happens somewhat regularly since this client runs a couple thousand REFERSH MATARIALIZED VIEW queries per day which look to cause inserts and deletes there so it having an exclusive lock on pg_attribute makes sense.

The question then is: Why would these user queries be waiting on an AccessShare lock on pg_attribute? Thus far we've been unable to recreate any transacitons with the above query (and others) that show any pg_attribute locks. There is no ORM in play here and these queries are being sent as single query transactions via this Node.js postgres adapter: https://github.com/brianc/node-postgres which is pretty bare bones.

Erik Jones
mage2k@gmail.com

Re: Hot Standby Conflict on pg_attribute

От

Andres Freund

Дата:

10 мая 2019 г., 19:14:37

Hi,

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
> The question then is: Why would these user queries be waiting on an
> AccessShare lock on pg_attribute?  Thus far we've been unable to recreate
> any transacitons with the above query (and others) that show any
> pg_attribute locks.  There is no ORM in play here and these queries are
> being sent as single query transactions via this Node.js postgres adapter:
> https://github.com/brianc/node-postgres which is pretty bare bones.

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

You can reproduce that fairly easily:

S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;

S1 could execute the select, because it has a cached view of the way the
relation looks.

S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;

Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.

Greetings,

Andres Freund

Re: Hot Standby Conflict on pg_attribute

От

Erik Jones

Дата:

10 мая 2019 г., 19:23:38

Hi Andres,

Thank you very much! That's exactly what I needed.

On Fri, May 10, 2019 at 12:14 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
> The question then is: Why would these user queries be waiting on an
> AccessShare lock on pg_attribute? Thus far we've been unable to recreate
> any transacitons with the above query (and others) that show any
> pg_attribute locks. There is no ORM in play here and these queries are
> being sent as single query transactions via this Node.js postgres adapter:
> https://github.com/brianc/node-postgres which is pretty bare bones.

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

You can reproduce that fairly easily:

S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;

S1 could execute the select, because it has a cached view of the way the
relation looks.

S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;

Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.

Greetings,

Andres Freund

Erik Jones
mage2k@gmail.com

Re: Hot Standby Conflict on pg_attribute

От

Tom Lane

Дата:

10 мая 2019 г., 19:40:55

Andres Freund <andres@anarazel.de> writes:
> On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> The question then is: Why would these user queries be waiting on an
>> AccessShare lock on pg_attribute?

> Queries that access a table for the *first* time after DDL happened
> (including truncating the relation), need an AccessShareLock on
> pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity).  That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

            regards, tom lane

Re: Hot Standby Conflict on pg_attribute

От

Erik Jones

Дата:

10 мая 2019 г., 19:46:58

On Fri, May 10, 2019 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:
> On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> The question then is: Why would these user queries be waiting on an
>> AccessShare lock on pg_attribute?

> Queries that access a table for the *first* time after DDL happened
> (including truncating the relation), need an AccessShareLock on
> pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

regards, tom lane

Right, that part I understood after checking out pg_attribute's insert/delete counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW runs on an otherwise idle server. With them running 2k+ refreshes per day autovac is regularly working on their catalog tables.

Thanks!

Erik Jones
mage2k@gmail.com

Re: Hot Standby Conflict on pg_attribute

От

Jeremy Schneider

Дата:

11 мая 2019 г., 16:15:07

Just a quick footnote: If autovac truncations are frequently causing replica lag, and if this is a problem for you, IIUC one way you can stop autovac from doing the truncations even on older versions is setting old_snapshot_threshold to any value at all besides zero. (On 12+ you can directly control the truncation behavior.)

-Jeremy

Sent from my TI-83

On May 10, 2019, at 12:46, Erik Jones <mage2k@gmail.com> wrote:

On Fri, May 10, 2019 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> The question then is: Why would these user queries be waiting on an
>> AccessShare lock on pg_attribute?

> Queries that access a table for the *first* time after DDL happened
> (including truncating the relation), need an AccessShareLock on
> pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

regards, tom lane

Right, that part I understood after checking out pg_attribute's insert/delete counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW runs on an otherwise idle server. With them running 2k+ refreshes per day autovac is regularly working on their catalog tables.

Thanks!
--
Erik Jones
mage2k@gmail.com

Re: Hot Standby Conflict on pg_attribute

От

Erik Jones

Дата:

11 мая 2019 г., 17:19:08

Thanks for the ti

On Sat, May 11, 2019 at 9:15 AM Jeremy Schneider <schneider@ardentperf.com> wrote:

Just a quick footnote: If autovac truncations are frequently causing replica lag, and if this is a problem for you, IIUC one way you can stop autovac from doing the truncations even on older versions is setting old_snapshot_threshold to any value at all besides zero. (On 12+ you can directly control the truncation behavior.)

-Jeremy

Thanks for the tip!

Erik Jones
mage2k@gmail.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Hot Standby Conflict on pg_attribute

Hot Standby Conflict on pg_attribute

Re: Hot Standby Conflict on pg_attribute

Re: Hot Standby Conflict on pg_attribute

Re: Hot Standby Conflict on pg_attribute

Re: Hot Standby Conflict on pg_attribute

Re: Hot Standby Conflict on pg_attribute

Re: Hot Standby Conflict on pg_attribute