Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943

Поиск
Список
Период
Сортировка
От Tender Wang
Тема Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Дата
Msg-id CAHewXNk125iiK4MgygmdwqGP=CQ4ZOMWLRr5+t9hAkXX8xd1ag@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Список pgsql-bugs


Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年5月21日周二 00:16写道:
On 2024-May-13, Alvaro Herrera wrote:

> Evidently there's something I'm missing in how this plancache
> invalidation works.

Actually, what I was missing is that I had forgotten how
PartitionDirectory is actually used.  In the original code I wrote for
DETACH CONCURRENTLY for 2ndQPostgres a few years ago, I set things up so
that both planner and executor used the same PartitionDesc for a
partitioned table.  But when this was reworked for real Postgres by
Robert to implement concurrent ATTACH, he instead introduced
PartitionDirectory which keeps a PartitionDesc unchanged -- but planner
and executor use different PartitionDirectories.  I probably reworked
some of DETACH CONCURRENTLY to cope with that, but evidently must have
missed this scenario.

So the problem is that we read the partition descriptor during
re-planning with N partitions, and then if we receive the corresponding
invalidation message (at executor time) exactly before
CreatePartitionPruneState() creates its PartitionDirectory, then we read
N-1 partitions, and everything explodes.  (If the invalidation is
received at some other time, then the plan is remade before
ExecInitAppend by plancache, and everything works correctly.)

So one fix for this problem seems to be to patch
CreatePartitionPruneState() to accept the possibility that one partition
has been detached concurrently, and construct the planmap/partmap arrays
with a marking that the partition in question has been pruned away.


With that fix in place, I was now getting errors from
RelationBuildPartitionDesc() that some partition in the partdesc had
NULL relpartbound.  If I understand correctly, the problem here is that
we have a snapshot that still sees that partition as being in the
hierarchy, but in reality its relpartbound has been nullified by the
concurrent detach.  I can fix this symptom by doing
AcceptInvalidationMesages() and starting back at the top, where the list
of partitions is obtained by find_inheritance_children_extended().

With these two things in place , the pgbench scripts run without errors.
That's in patch 0001, which deserves a commit message that distills the
above.


I'm not convinced that this fix is correct or complete, but hey, it's
progress.  Here's some things I'm worried about:

1. this code assumes that a single partition can be in this "detach
concurrently" mode.  This sounds correct, because this operation needs
share update exclusive lock on the table, and we only allow one
partition to be in pending-detach state.  However, what if we have a
very slow process doing the query-side mode, and two processes manage to
detach two different tables in between?  I think this isn't possible
because of the wait-for-snapshots phase, but I need to convince myself
that this is the right explanation.

2. The new code in CreatePartitionPruneState assumes that if nparts
decreases, then it must be a detach, and if nparts increases, it must be
an attach.  Can these two things happen together in a way that we see
that the number of partitions remains the same, so we don't actually try
to construct planmap/partmap arrays by matching their OIDs?  I think the
only way to handle a possible problem here would be to verify the OIDs
every time we construct a partition descriptor.  I assume (without
checking) this would come with a performance cost, not sure.

3. The new stanza in CreatePartitionPruneState() should have more
sanity-checks that the two arrays we scan really match.

4. I made RelationBuildPartitionDesc retry only once.  I think this
should be enough, because a single AcceptInvalidationMessages should
process everything that we need to read to bring us up to current
reality, considering that DETACH CONCURRENTLY would need to wait for our
snapshot.

I would like to spend some more time on this, but I'm afraid I'll have
to put it off for a little while.

PS -- yes, the self-contradictory comment in CreatePartitionPruneState()
is bogus.  Patch 0002 fixes that.

I have tested this patch locally and did not encounter any failures.
I will take time to look the patch in detail and consider the issues you mentioned.
--
Tender Wang
OpenPie:  https://en.openpie.com/

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Kamil ADEM
Дата:
Сообщение: RE: Postgresql 16.3 installation error (setup file) on Windows 11
Следующее
От: Rahul Pandey
Дата:
Сообщение: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607