Re: BUG #18559: Crash after detaching a partition concurrently from another session
От | Junwang Zhao |
---|---|
Тема | Re: BUG #18559: Crash after detaching a partition concurrently from another session |
Дата | |
Msg-id | CAEG8a3+qEsMW_R502GB9szReSGnhUvoNuvLci1wvw07+Lbegiw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #18559: Crash after detaching a partition concurrently from another session (Kuntal Ghosh <kuntalghosh.2007@gmail.com>) |
Ответы |
Re: BUG #18559: Crash after detaching a partition concurrently from another session
|
Список | pgsql-bugs |
On Tue, Jul 30, 2024 at 9:52 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Tue, Jul 30, 2024 at 7:18 PM PG Bug reporting form > <noreply@postgresql.org> wrote: > Adding Alvaro. > > > > The following code assumes that an pg_class entry for the detached partition > > will always be available which is wrong. > > > Referring to the following code. > /* > * Two problems are possible here. First, a concurrent ATTACH > * PARTITION might be in the process of adding a new partition, but > * the syscache doesn't have it, or its copy of it does not yet have > * its relpartbound set. We cannot just AcceptInvalidationMessages(), > * because the other process might have already removed itself from > * the ProcArray but not yet added its invalidation messages to the > * shared queue. We solve this problem by reading pg_class directly > * for the desired tuple. > * > * The other problem is that DETACH CONCURRENTLY is in the process of > * removing a partition, which happens in two steps: first it marks it > * as "detach pending", commits, then unsets relpartbound. If > * find_inheritance_children_extended included that partition but we > * below we see that DETACH CONCURRENTLY has reset relpartbound for > * it, we'd see an inconsistent view. (The inconsistency is seen > * because table_open below reads invalidation messages.) We protect > * against this by retrying find_inheritance_children_extended(). > */ > if (boundspec == NULL) > { > Relation pg_class; > SysScanDesc scan; > ScanKeyData key[1]; > Datum datum; > bool isnull; > > pg_class = table_open(RelationRelationId, AccessShareLock); > ScanKeyInit(&key[0], > Anum_pg_class_oid, > BTEqualStrategyNumber, F_OIDEQ, > ObjectIdGetDatum(inhrelid)); > scan = systable_beginscan(pg_class, ClassOidIndexId, true, > NULL, 1, key); > tuple = systable_getnext(scan); > datum = heap_getattr(tuple, Anum_pg_class_relpartbound, > RelationGetDescr(pg_class), &isnull); > if (!isnull) > boundspec = stringToNode(TextDatumGetCString(datum)); > systable_endscan(scan); > table_close(pg_class, AccessShareLock); > > IIUC, a simple fix would be to retry if an entry is not found. Attached a patch. I can reproduce the issue, and the patch LGTM. > > -- > Thanks & Regards, > Kuntal Ghosh -- Regards Junwang Zhao
В списке pgsql-bugs по дате отправления: