Обсуждение: ALTER TABLE DETACH PARTITION violates serializability

Поиск
Список
Период
Сортировка

ALTER TABLE DETACH PARTITION violates serializability

От
Tom Lane
Дата:
I wasn't aware of $SUBJECT ... were you?

Here's a demonstration:

drop table if exists pk, fk, pk1, pk2;
create table pk (f1 int primary key) partition by list(f1);
create table pk1 partition of pk for values in (1);
create table pk2 partition of pk for values in (2);
insert into pk values(1);
insert into pk values(2);
create table fk (f1 int references pk);
insert into fk values(1);
insert into fk values(2);

In session 1, next do

regression=# begin isolation level serializable;
BEGIN
regression=*# select * from unrelated_table;   -- to set the xact snapshot
...

Now in session 2, do

regression=# delete from fk where f1=2;
DELETE 1
regression=# alter table pk detach partition pk2;             
ALTER TABLE

Back at session 1, we now see what's not only a serializability
violation, but a not-even-self-consistent view of the database:

regression=*# select * from fk;
 f1 
----
  1
  2
(2 rows)

regression=*# select * from pk;
 f1 
----
  1
(1 row)

This is slightly ameliorated by the fact that if session 1 has
already touched either pk or fk, locking considerations will
block the DETACH.  But only slightly.

(Dropping a partition altogether has the same issue, of course.)

AFAICS, the only real way to fix this is to acquire lock on
the target partition and then wait out any snapshots that are
older than the lock, just in case those transactions would look
at the partitioned table later.  I'm not sure if we want to go
there, but if we don't, we at least have to document this gotcha.

            regards, tom lane



Re: ALTER TABLE DETACH PARTITION violates serializability

От
Alvaro Herrera
Дата:
On 2021-Nov-12, Tom Lane wrote:

> I wasn't aware of $SUBJECT ... were you?

Yeah, I remember pointing out that DETACH and DROP and not fully correct
for serializability, but I can't find any thread where I said it in so
many words.  At the time I had no ideas on how to fix it; the idea of
waiting for snapshots to go away didn't occur to anybody, or at least it
didn't reach me.

> AFAICS, the only real way to fix this is to acquire lock on
> the target partition and then wait out any snapshots that are
> older than the lock, just in case those transactions would look
> at the partitioned table later.  I'm not sure if we want to go
> there, but if we don't, we at least have to document this gotcha.

ALTER TABLE DETACH PARTITION CONCURRENTLY already has a
wait-for-snapshots phase.  It doesn't fix this problem, because it
doesn't wait for all snapshots, just the ones holding any lock on the
parent table.  I'm not sure how upset would people be if we made it wait
on *all* snapshots, and worse, to make it in all cases and not just
CONCURRENTLY.

I understand that the behavior is not fully correct, but given the way
most people are going to use this (which is that they're no longer
terribly interested in the data of the partition being detached/dropped)
and the severity of the penalties if we implement a full solution, I
lean towards documenting it rather than fixing it.

Another option might be to play with the trade-offs in the CONCURRENTLY
mode vs. the regular one.  If we make the CONCURRENTLY mode wait for all
snapshots, we would only be making worse a mode that's already
documented to potentially take a long time.  So people who want safety
can use that mode, and the others are aware of the risk.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)



Re: ALTER TABLE DETACH PARTITION violates serializability

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I understand that the behavior is not fully correct, but given the way
> most people are going to use this (which is that they're no longer
> terribly interested in the data of the partition being detached/dropped)
> and the severity of the penalties if we implement a full solution, I
> lean towards documenting it rather than fixing it.

> Another option might be to play with the trade-offs in the CONCURRENTLY
> mode vs. the regular one.  If we make the CONCURRENTLY mode wait for all
> snapshots, we would only be making worse a mode that's already
> documented to potentially take a long time.  So people who want safety
> can use that mode, and the others are aware of the risk.

Meh ... "wrong by default" doesn't seem like it fits the Postgres ethos.
How about adding an option UNSAFE (orthogonal to CONCURRENTLY) that
activates the current behavior, and without it we wait for everything?

            regards, tom lane



Re: ALTER TABLE DETACH PARTITION violates serializability

От
Robert Haas
Дата:
On Fri, Nov 12, 2021 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wasn't aware of $SUBJECT ... were you?

I wasn't aware of this particular example, but I didn't spend much
time worrying about it, because DDL interleaved with DML generally
isn't serializable anyway. If you issue "SELECT * FROM
some_previously_unacessed_table" in a transaction, it will find
some_table even if the XID that created that table is not visible to
your snapshot. And conversely if that table has been dropped by some
XID that is not visible to your snapshot, you will get an error that
the table doesn't exist. There's also ATRewriteTable(), which is not
MVCC-aware, and thus if someone applies that to a table before you
first access it, you may see rows you shouldn't see and fail to see
rows which you should see. I don't see those kinds of cases as being
all that different from this one.

My judgement is that outwaiting other XIDs would make far more people
unhappy than it would make happy. If we want to have that as an
optional behavior, I guess that would be OK, but I sure wouldn't make
it the default. I feel like I went to quite a bit of effort to keep
the lock levels as low as reasonably possible here and to allow as
much concurrency as I could, and I bet there are still a LOT more
people who are unhappy about things that take more locks than they'd
like than there are people who are sad about these kinds of MVCC
issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: ALTER TABLE DETACH PARTITION violates serializability

От
Alvaro Herrera
Дата:
On 2021-Nov-12, Tom Lane wrote:

> Meh ... "wrong by default" doesn't seem like it fits the Postgres ethos.

I certainly agree with this, yeah.

> How about adding an option UNSAFE (orthogonal to CONCURRENTLY) that
> activates the current behavior, and without it we wait for everything?

Something like that might be the best compromise.  I don't like having
to grab AEL on the parent table (which the non-concurrent version does)
for as long as every old snapshot is gone, so changing the default
behavior without any escape hatch doesn't sound very pallatable.

I think the best behavior is the CONCURRENTLY option with your
additional wait phase -- if only it worked in a transaction block, it
would be perfect.  Users doing DDL as strong as DETACH should be willing
to wait potentially long time for it to finish, but only if it doesn't
disrupt concurrent load involving other partitions.

I don't feel a need to add UNSAFE as an option to CONCURRENTLY, though.
Users are supposed to be already aware that it takes a long time, and it
doesn't block queries using the remaining partitions.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: ALTER TABLE DETACH PARTITION violates serializability

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Nov-12, Tom Lane wrote:
>> How about adding an option UNSAFE (orthogonal to CONCURRENTLY) that
>> activates the current behavior, and without it we wait for everything?

> Something like that might be the best compromise.  I don't like having
> to grab AEL on the parent table (which the non-concurrent version does)
> for as long as every old snapshot is gone, so changing the default
> behavior without any escape hatch doesn't sound very pallatable.

> I think the best behavior is the CONCURRENTLY option with your
> additional wait phase -- if only it worked in a transaction block, it
> would be perfect.  Users doing DDL as strong as DETACH should be willing
> to wait potentially long time for it to finish, but only if it doesn't
> disrupt concurrent load involving other partitions.

I grant that a lot of people will be less worried about the transactional
semantics problem than about how strong a lock they need.  I'd be willing
to settle for "DETACH CONCURRENTLY is safe and plain DETACH isn't", as
long as that's very prominently documented.

However ... it appears to me that focusing on the lock strength for
the bare DETACH is misleading.  When I was experimenting with this
stuff yesterday, I observed that if I had any foreign keys on the
partitioned table, DETACH would insist on AEL both on the partition
*and* on the table that's the other end of the FK constraint.
I suppose that's needed to get rid of triggers and pg_constraint
entries.  But it seems to put a large hole in the argument that
you're going to be able to avoid strong locks and corresponding
waits/blockages when you do a DETACH.

            regards, tom lane