Re: BUG #18559: Crash after detaching a partition concurrently from another session
От | Kuntal Ghosh |
---|---|
Тема | Re: BUG #18559: Crash after detaching a partition concurrently from another session |
Дата | |
Msg-id | CAGz5QC+yTwKXAYgvvmiPAwez4CVkwO==ECDK+3fMAJ8j9Lcbkw@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: BUG #18559: Crash after detaching a partition concurrently from another session
Re: BUG #18559: Crash after detaching a partition concurrently from another session |
Список | pgsql-bugs |
On Tue, Jul 30, 2024 at 7:18 PM PG Bug reporting form <noreply@postgresql.org> wrote: Adding Alvaro. > > The following code assumes that an pg_class entry for the detached partition > will always be available which is wrong. > Referring to the following code. /* * Two problems are possible here. First, a concurrent ATTACH * PARTITION might be in the process of adding a new partition, but * the syscache doesn't have it, or its copy of it does not yet have * its relpartbound set. We cannot just AcceptInvalidationMessages(), * because the other process might have already removed itself from * the ProcArray but not yet added its invalidation messages to the * shared queue. We solve this problem by reading pg_class directly * for the desired tuple. * * The other problem is that DETACH CONCURRENTLY is in the process of * removing a partition, which happens in two steps: first it marks it * as "detach pending", commits, then unsets relpartbound. If * find_inheritance_children_extended included that partition but we * below we see that DETACH CONCURRENTLY has reset relpartbound for * it, we'd see an inconsistent view. (The inconsistency is seen * because table_open below reads invalidation messages.) We protect * against this by retrying find_inheritance_children_extended(). */ if (boundspec == NULL) { Relation pg_class; SysScanDesc scan; ScanKeyData key[1]; Datum datum; bool isnull; pg_class = table_open(RelationRelationId, AccessShareLock); ScanKeyInit(&key[0], Anum_pg_class_oid, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(inhrelid)); scan = systable_beginscan(pg_class, ClassOidIndexId, true, NULL, 1, key); tuple = systable_getnext(scan); datum = heap_getattr(tuple, Anum_pg_class_relpartbound, RelationGetDescr(pg_class), &isnull); if (!isnull) boundspec = stringToNode(TextDatumGetCString(datum)); systable_endscan(scan); table_close(pg_class, AccessShareLock); IIUC, a simple fix would be to retry if an entry is not found. Attached a patch. -- Thanks & Regards, Kuntal Ghosh
Вложения
В списке pgsql-bugs по дате отправления: