Обсуждение: VACUUM FULL, CLUSTER, and REPACK block on other sessions' temp tables
Hi, While testing another patch [1], I noticed that REPACK is blocked when a temporary table is locked in another session. It also turns out that the same behaviour occurs with VACUUM FULL and CLUSTER: == session 1 == $ psql postgres psql (19devel) Type "help" for help. postgres=# CREATE TEMPORARY TABLE tmp (id int); CREATE TABLE postgres=# BEGIN; LOCK TABLE tmp IN SHARE MODE; BEGIN LOCK TABLE postgres=*# == session 2 == $ psql postgres psql (19devel) Type "help" for help. postgres=# REPACK; ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: waiting for AccessExclusiveLock on relation 38458 of database 5 postgres=# VACUUM FULL; ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: waiting for AccessExclusiveLock on relation 38458 of database 5 Skipping temporary relations in get_tables_to_repack() and get_all_vacuum_rels() before they're appended to the list seems to do the trick -- see attached draft. I can reproduce the same behaviour with CLUSTER and VACUUM FULL in PG14-PG18. I took a quick look at the code in PG17 and PG18 and the fix appears to be straightforward, but before I start working on it, I'd like to hear your thoughts. Is it worth the effort? Best, Jim 1 - https://www.postgresql.org/message-id/13637.1774342137%40localhost
Вложения
> On Mar 24, 2026, at 23:35, Jim Jones <jim.jones@uni-muenster.de> wrote: > > Hi, > > While testing another patch [1], I noticed that REPACK is blocked when a > temporary table is locked in another session. It also turns out that the > same behaviour occurs with VACUUM FULL and CLUSTER: > > == session 1 == > > $ psql postgres > psql (19devel) > Type "help" for help. > > postgres=# CREATE TEMPORARY TABLE tmp (id int); > CREATE TABLE > postgres=# BEGIN; > LOCK TABLE tmp IN SHARE MODE; > BEGIN > LOCK TABLE > postgres=*# > > == session 2 == > > $ psql postgres > psql (19devel) > Type "help" for help. > > postgres=# REPACK; > ^CCancel request sent > ERROR: canceling statement due to user request > CONTEXT: waiting for AccessExclusiveLock on relation 38458 of database 5 > postgres=# VACUUM FULL; > ^CCancel request sent > ERROR: canceling statement due to user request > CONTEXT: waiting for AccessExclusiveLock on relation 38458 of database 5 > > Skipping temporary relations in get_tables_to_repack() and > get_all_vacuum_rels() before they're appended to the list seems to do > the trick -- see attached draft. > > I can reproduce the same behaviour with CLUSTER and VACUUM FULL in > PG14-PG18. I took a quick look at the code in PG17 and PG18 and the fix > appears to be straightforward, but before I start working on it, I'd > like to hear your thoughts. Is it worth the effort? > > Best, Jim > > 1 - https://www.postgresql.org/message-id/13637.1774342137%40localhost<v1-0001-Skip-other-sessions-temp-tables-in-REPACK-CLUSTER.patch> I think skipping temp tables of another session is reasonable, because anyway they are not accessible from the current session,though visible via pg_class. Looking at the patch: ``` + /* Skip temp relations belonging to other sessions */ + if (class->relpersistence == RELPERSISTENCE_TEMP && + isOtherTempNamespace(class->relnamespace)) ``` It uses isOtherTempNamespace(), but I noticed that the header comment of the function says: ``` * isOtherTempNamespace - is the given namespace some other backend's * temporary-table namespace (including temporary-toast-table namespaces)? * * Note: for most purposes in the C code, this function is obsolete. Use * RELATION_IS_OTHER_TEMP() instead to detect non-local temp relations. ``` Then looking at RELATION_IS_OTHER_TEMP(): ``` #define RELATION_IS_OTHER_TEMP(relation) \ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP && \ !(relation)->rd_islocaltemp) ``` It takes a relation as parameter and check relation->rd_islocaltemp, however in the context of this patch, we have only Form_pg_class. Then checking how rd_islocaltemp is set: ``` case RELPERSISTENCE_TEMP: if (isTempOrTempToastNamespace(relation->rd_rel->relnamespace)) { relation->rd_backend = ProcNumberForTempRelations(); relation->rd_islocaltemp = true; } else { /* * If it's a temp table, but not one of ours, we have to use * the slow, grotty method to figure out the owning backend. * * Note: it's possible that rd_backend gets set to * MyProcNumber here, in case we are looking at a pg_class * entry left over from a crashed backend that coincidentally * had the same ProcNumber we're using. We should *not* * consider such a table to be "ours"; this is why we need the * separate rd_islocaltemp flag. The pg_class entry will get * flushed if/when we clean out the corresponding temp table * namespace in preparation for using it. */ relation->rd_backend = GetTempNamespaceProcNumber(relation->rd_rel->relnamespace); Assert(relation->rd_backend != INVALID_PROC_NUMBER); relation->rd_islocaltemp = false; } break; ``` It uses isTempOrTempToastNamespace(relation->rd_rel->relnamespace) to decide relation->rd_islocaltemp. So, I think this patch should also use "!isTempOrTempToastNamespace(classForm->relnamespace)" instead of isOtherTempNamespace(class->relnamespace).I tried that locally, and it works for me. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Hi Chao,
Thanks for the thorough review.
On 25/03/2026 02:55, Chao Li wrote:
> It uses isTempOrTempToastNamespace(relation->rd_rel->relnamespace) to decide relation->rd_islocaltemp.
>
> So, I think this patch should also use "!isTempOrTempToastNamespace(classForm->relnamespace)" instead of
isOtherTempNamespace(class->relnamespace).I tried that locally, and it works for me.
I agree. In get_all_vacuum_rels() and the pg_class scan branch of
get_tables_to_repack(), relpersistence == RELPERSISTENCE_TEMP already
establishes the relation is a temp table, so
!isTempOrTempToastNamespace() alone is sufficient to identify
other-session temp tables.
In the usingindex path of get_tables_to_repack(), Form_pg_class is not
available, so there is no relpersistence to use as a pre-filter. The
explicit isAnyTempNamespace() check is required to avoid incorrectly
skipping permanent tables. This is pretty much what
isOtherTempNamespace() does internally -- the only change is inlining it
to avoid the obsolete wrapper.
* Skip temp relations belonging to other sessions */
{
Oid nsp = get_rel_namespace(index->indrelid);
if (!isTempOrTempToastNamespace(nsp) && isAnyTempNamespace(nsp))
{
UnlockRelationOid(index->indrelid, AccessShareLock);
continue;
}
}
v2 attached.
Thanks!
Best, Jim
Вложения
I forgot to create a CF entry. Here it is: https://commitfest.postgresql.org/patch/6616/ Best, Jim
Hello!
Shouldn't the patch also include a tap test to verify that the change
works / fails without it?
+ /* Skip temp relations belonging to other sessions */
+ {
+ Oid nsp = get_rel_namespace(index->indrelid);
+
+ if (!isTempOrTempToastNamespace(nsp) && isAnyTempNamespace(nsp))
+ {
Doesn't this result in several repeated syscache lookups?
There's already a SearchSysCacheExsists1 directly above this, then a
get_rel_namespace, then an isAnyTempNamespace. While this probably
isn't performance critical, this should be doable with a single
SearchSysCache1(RELOID...) and then a few conditions, similarly to the
else branch below this?
Hi
On 25/03/2026 21:38, Zsolt Parragi wrote:
> Shouldn't the patch also include a tap test to verify that the change
> works / fails without it?
Definitely. I just didn't want to invest much time on tests before
getting feedback on the issue itself.
> + /* Skip temp relations belonging to other sessions */
> + {
> + Oid nsp = get_rel_namespace(index->indrelid);
> +
> + if (!isTempOrTempToastNamespace(nsp) && isAnyTempNamespace(nsp))
> + {
>
> Doesn't this result in several repeated syscache lookups?
>
> There's already a SearchSysCacheExsists1 directly above this, then a
> get_rel_namespace, then an isAnyTempNamespace. While this probably
> isn't performance critical, this should be doable with a single
> SearchSysCache1(RELOID...) and then a few conditions, similarly to the
> else branch below this?
You're right. Although it is not performance critical we can solve it
with a single SearchSysCache1.
PFA v3 with the improved fix (0001) and tests (0002).
Thanks for the review!
Best, Jim
Вложения
> On Mar 26, 2026, at 07:00, Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> Hi
>
> On 25/03/2026 21:38, Zsolt Parragi wrote:
>> Shouldn't the patch also include a tap test to verify that the change
>> works / fails without it?
>
> Definitely. I just didn't want to invest much time on tests before
> getting feedback on the issue itself.
>
>> + /* Skip temp relations belonging to other sessions */
>> + {
>> + Oid nsp = get_rel_namespace(index->indrelid);
>> +
>> + if (!isTempOrTempToastNamespace(nsp) && isAnyTempNamespace(nsp))
>> + {
>>
>> Doesn't this result in several repeated syscache lookups?
>>
>> There's already a SearchSysCacheExsists1 directly above this, then a
>> get_rel_namespace, then an isAnyTempNamespace. While this probably
>> isn't performance critical, this should be doable with a single
>> SearchSysCache1(RELOID...) and then a few conditions, similarly to the
>> else branch below this?
>
> You're right. Although it is not performance critical we can solve it
> with a single SearchSysCache1.
>
> PFA v3 with the improved fix (0001) and tests (0002).
>
> Thanks for the review!
>
> Best,
Jim<v3-0001-Skip-other-sessions-temp-tables-in-REPACK-CLUSTER.patch><v3-0002-Test-VACUUM-FULL-CLUSTER-and-REPACK-with-locked-t.patch>
I don't think such a TAP test is necessary.
One reason is that, if we look at the check right above the new one:
```
/*
* We include partitioned tables here; depending on which operation is
* to be performed, caller will decide whether to process or ignore
* them.
*/
if (classForm->relkind != RELKIND_RELATION &&
classForm->relkind != RELKIND_MATVIEW &&
classForm->relkind != RELKIND_PARTITIONED_TABLE)
continue;
```
I don't see a test specifically for that check either. So I don't think we need a test for every individual path.
Second, based on [1] and [2], I got the impression that adding new tests is not always welcome considering overall test
runtime.Anyway, maybe I’m wrong, let the committers judge that.
[1] https://postgr.es/m/mtkrkkcn2tlhytumitpch5ubxiprv2jzvprf5r5m3mjeczvq4q@p6wkzbfxuyv2 <https://postgr.es/m/>
[2] https://postgr.es/m/1449781.1773948276@sss.pgh.pa.us
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Chao Li <li.evan.chao@gmail.com> wrote: > I don't think such a TAP test is necessary. +1 -- Antonin Houska Web: https://www.cybertec-postgresql.com
On 26/03/2026 12:25, Antonin Houska wrote: > Chao Li <li.evan.chao@gmail.com> wrote: > >> I don't think such a TAP test is necessary. > +1 I've kept the tests in a separate file so the committer can easily skip them if needed. Thanks for the feedback, everyone! Best, Jim