Thread: BDR - DDL Locking


BDR - DDL Locking

From:
Will McCormick
Date:
First-time user here, new to PostgreSQL and BDR, so I hope I have the right place.

I attempted to issue a TRUNCATE TABLE without the CASCADE option on a parent table that had a child FK constraint.
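
For reference, a minimal version of what I ran (the table names are the ones that appear in the replay errors below; the column definitions are just my illustrative guess):

CREATE TABLE conflict (
    id integer PRIMARY KEY
);
CREATE TABLE conflict_child (
    id integer PRIMARY KEY,
    conflict_id integer REFERENCES conflict (id)
);

TRUNCATE TABLE conflict;   -- no CASCADE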

After this I received:

ERROR:  Database is locked against DDL operations
HINT:  Node (6203352813534641995,1,16387) in the cluster is already performing DDL

I'm trying to remove the lock, so I ran:

bms=#  select * from pg_locks;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |      mode       | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+-----------------+---------+----------
 relation   |    16387 |    11189 |      |       |            |               |         |       |          | 6/58468            | 3049 | AccessShareLock | t       | t
 virtualxid |          |          |      |       | 6/58468    |               |         |       |          | 6/58468            | 3049 | ExclusiveLock   | t       | t
 
 
bms=# SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
bms-#     ON pl.virtualtransaction = '-1/' || ppx.transaction;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |      mode       | granted | fastpath | transaction | gid | prepared |
owner | database
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------+----------+-------------+-----+----------+-
------+----------
 virtualxid |          |          |      |       | 5/861      |               |         |       |          | 5/861              | 20010 | ExclusiveLock   | t       | t        |             |     |          |
      |
 relation   |    16387 |    11201 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | t        |             |     |          |
      |
 relation   |    16387 |    11189 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | t        |             |     |          |
      |
 virtualxid |          |          |      |       | 6/58456    |               |         |       |          | 6/58456            |  3049 | ExclusiveLock   | t       | t        |             |     |          |
      |
 relation   |        0 |     2671 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |
      |
 relation   |        0 |     1262 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |
      |
 relation   |        0 |     1260 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |
      |
 relation   |        0 |     2672 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |
      |
 relation   |        0 |     2677 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |
      |
 relation   |        0 |     2676 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |
      |
(10 rows)
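
To see what those backends are actually running, pg_locks can also be joined against pg_stat_activity (standard catalog views, nothing BDR-specific):

SELECT l.pid, a.usename, a.application_name, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.granted
ORDER BY l.pid;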

bms=# select pg_cancel_backend(20010);
 pg_cancel_backend
-------------------
 t
 
bms=#  select pg_cancel_backend(3049);
ERROR:  canceling statement due to user request

3049 never dies. Even if I restart PostgreSQL, the lock persists.

Server details:

[postgres@klw1129 ~/product/9.4.4/logs] psql -U bms -d bms
Password for user bms:
psql (9.4.4)
Type "help" for help.

bms=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                         node_local_dsn                          |                       node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------
 6203352813534641995 |             1 |      16387 | r           | KLW1128   | port=5432 dbname=bms user=bms password=bms host=192.168.180.228 |
 6203352897032163158 |             1 |      16387 | r           | KLW1129   | port=5432 dbname=bms user=bms password=bms host=192.168.180.229 | port=5432 dbname=bms user=bms password=bms host=192.168.180.228


[postgres@klw1128 ~/product/9.4.4/logs] psql -U bms -d bms
Password for user bms:
psql (9.4.4)
Type "help" for help.

bms=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                         node_local_dsn                          |                       node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------
 6203352813534641995 |             1 |      16387 | r           | KLW1128   | port=5432 dbname=bms user=bms password=bms host=192.168.180.228 |
 6203352897032163158 |             1 |      16387 | r           | KLW1129   | port=5432 dbname=bms user=bms password=bms host=192.168.180.229 | port=5432 dbname=bms user=bms password=bms host=192.168.180.228
(2 rows)


My log file on klw1129 (Target of DDL):

2015-10-20 15:28:11 GMT LOG:  starting background worker process "bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1,"
2015-10-20 15:28:11 GMT ERROR:  cannot truncate a table referenced in a foreign key constraint
2015-10-20 15:28:11 GMT DETAIL:  Table "conflict_child" references "conflict".
2015-10-20 15:28:11 GMT HINT:  Truncate table "conflict_child" at the same time, or use TRUNCATE ... CASCADE.
2015-10-20 15:28:11 GMT CONTEXT:  during DDL replay of ddl statement: TRUNCATE TABLE ONLY bms.conflict
2015-10-20 15:28:11 GMT LOG:  worker process: bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1, (PID 28543) exited with exit code 1
2015-10-20 15:28:16 GMT LOG:  starting background worker process "bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1,"
2015-10-20 15:28:16 GMT ERROR:  cannot truncate a table referenced in a foreign key constraint
2015-10-20 15:28:16 GMT DETAIL:  Table "conflict_child" references "conflict".
2015-10-20 15:28:16 GMT HINT:  Truncate table "conflict_child" at the same time, or use TRUNCATE ... CASCADE.
2015-10-20 15:28:16 GMT CONTEXT:  during DDL replay of ddl statement: TRUNCATE TABLE ONLY bms.conflict
2015-10-20 15:28:16 GMT LOG:  worker process: bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1, (PID 28546) exited with exit code 1
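
For what it's worth, the two alternatives the hint suggests would look like this (schema-qualified to match the replayed statement):

TRUNCATE TABLE bms.conflict, bms.conflict_child;   -- truncate both tables in one statement
TRUNCATE TABLE bms.conflict CASCADE;               -- or cascade to all referencing tables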

My log file on klw1128 (Source of DDL):

2015-10-20 15:26:55 GMT LOG:  starting logical decoding for slot "bdr_16387_6203352897032163158_1_16387__"
2015-10-20 15:26:55 GMT DETAIL:  streaming transactions committing after 0/808BBC8, reading WAL from 0/808BB00
2015-10-20 15:26:55 GMT LOG:  logical decoding found consistent point at 0/808BB00
2015-10-20 15:26:55 GMT DETAIL:  There are no running transactions.
2015-10-20 15:26:55 GMT LOG:  could not receive data from client: Connection reset by peer
2015-10-20 15:26:55 GMT LOG:  unexpected EOF on standby connection

[postgres@klw1129 ~/product/9.4.4/logs] psql -U bms -d bms
Password for user bms:
psql (9.4.4)
Type "help" for help.

bms=# select * from pg_extension;
  extname   | extowner | extnamespace | extrelocatable | extversion |                           extconfig                           |          extcondition
------------+----------+--------------+----------------+------------+---------------------------------------------------------------+---------------------------------
 plpgsql    |       10 |           11 | f              | 1.0        |                                                               |
 btree_gist |    16385 |         2200 | t              | 1.0        |                                                               |
 bdr        |    16385 |           11 | f              | 0.9.2.0    | {16919,16934,16947,16983,17013,17023,17032,17039,17052,17121} | {"","","","","","","","","",""}
(3 rows)

How do I recover from this scenario without a rebuild?
And if recovery isn't possible, what caused this in the first place?

I'm currently testing PostgreSQL and BDR for productization.

Thanks,

Will McCormick

Re: BDR - DDL Locking

From:
Craig Ringer
Date:
What's the *exact* BDR version?
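
One way to check, if the reporting function is available in your install:

SELECT bdr.bdr_version();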

When you say you "attempted to" - what was the outcome? Presumably an
ERROR from the TRUNCATE, right? That would roll back the transaction,
and in the process abort the DDL lock acquisition attempt.
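
Roughly this sequence - a sketch of the expected behaviour, not taken
from your logs:

BEGIN;
TRUNCATE TABLE conflict;   -- an ERROR here aborts the transaction
ROLLBACK;                  -- and the in-progress DDL lock acquisition is abandoned with it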

Are you sure replication was working normally prior to this point,
with no issues?

The global DDL lock isn't a true heavyweight lock, so it doesn't show up
in pg_locks and the like. If you roll back the transaction trying to
acquire it, or terminate the backend attempting to acquire it - your
TRUNCATE, in this case - using pg_terminate_backend(...), it will be
released automatically. If for any reason that doesn't happen (and it
shouldn't fail to), restarting the nodes will clear it.
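
For example, with the stuck PID from your pg_locks output (note that
pg_terminate_backend sends SIGTERM, whereas pg_cancel_backend only sends
SIGINT to cancel the current statement):

SELECT pg_terminate_backend(3049);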


Re: BDR - DDL Locking

From:
Will McCormick
Date:
Hey Craig, thank you very much for your response.

> When you say you "attempted to" - what was the outcome?

I tried a TRUNCATE without the CASCADE option. After that I tried it with the CASCADE option. The session just hung indefinitely at that point. There was no rollback, and I was testing on an empty table.

Replication was in a ready state on both nodes, and both DDL and DML were replicating.

BDR 0.9.2.0.

As for restarting the nodes: I did restart PostgreSQL, and it didn't help.


On Wed, Oct 21, 2015 at 8:31 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> What's the *exact* BDR version?
>
> When you say you "attempted to" - what was the outcome? Presumably an
> ERROR from the TRUNCATE, right? That would roll back the transaction,
> and in the process abort the DDL lock acquisition attempt.
>
> Are you sure replication was working normally prior to this point,
> with no issues?
>
> The global DDL lock isn't a true heavyweight lock, so it doesn't show up
> in pg_locks and the like. If you roll back the transaction trying to
> acquire it, or terminate the backend attempting to acquire it - your
> TRUNCATE, in this case - using pg_terminate_backend(...), it will be
> released automatically. If for any reason that doesn't happen (and it
> shouldn't fail to), restarting the nodes will clear it.

Re: BDR - DDL Locking

From:
Craig Ringer
Date:
Will,

I saw after replying that there's more detail I missed in your mail,
so please see the more detailed reply inline below.

On 20 October 2015 at 23:31, Will McCormick <wmccormick@gmail.com> wrote:
> First time user here and new to PostgreSQL and BDR so I hope I have the
> right place.

You do.

> I attempted to issues a TRUNCATE TABLE without the cascade option on a
> Parent table that had a child FK constraint.

I've looked at your logs, and it looks like the TRUNCATE succeeded on
the node that was doing the DDL and was queued for replication. Then,
when it was applied on another node, it failed because a foreign key
relationship references the target table.

This is odd, because the way BDR captures TRUNCATEs should prevent
that from happening: it uses triggers to capture TRUNCATEs and enqueue
them for execution on the other nodes. However, on inspection I can see
that this approach just isn't sufficient to handle FK relationships,
and that the current test suite doesn't cover this case.
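
If you want to see the capture mechanism on your side, the TRUNCATE
triggers are visible in pg_trigger; this filters on the standard tgtype
bit for TRUNCATE triggers (the trigger and function names you'll see are
BDR-internal and may differ between versions):

SELECT tgname,
       tgrelid::regclass AS table_name,
       tgfoid::regproc   AS trigger_function
FROM pg_trigger
WHERE tgrelid = 'bms.conflict'::regclass
  AND (tgtype & 32) <> 0;   -- bit 5 of tgtype = fires on TRUNCATE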

I'm going to write a test to confirm what I think is going on, then follow up.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services