Discussion: Lock tag of relation extend lock
Hi all,

In a recent debugging session I found two processes conflicting on the relation extension lock: one was holding it for MAIN fork extension, while the other was trying to do FSM extension. It seems that the extension lock uses the logical relid of a table as its lock tag, but smgrextend is independent for each fork.

LockRelationForExtension is used to lock out concurrent extension so that smgrnblocks (of the MAIN fork, mostly) gives an accurate answer for where to extend the fork from. The exception is bufmgr.c, where the forknum is passed in as a parameter, so main/fsm/vm extension shares the code.

Would it be more reasonable to use a physical identifier as the lock tag, like rlocator + fork? In that case, smgr*extend would not block across separate forks. It would also be easier to share code between recovery and normal operation (see what the definition of struct BufferManagerRelation says), because currently the relation extension lock needs a relcache entry to be passed in, and we have to build a fake relcache entry during xlog recovery. The lockinfo of the fake relcache entry may actually be wrong, although it's not a problem. If we used the physical information as the lock tag, the lockinfo of the fake relcache entry would be less of a hack.

The good thing is that different forks of the same relfile could be extended concurrently by different processes. I'm not sure about any side effects.

Any thoughts?

--
Regards,
Jingtang
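The difference between the two tagging schemes can be sketched as below. This is a minimal, self-contained illustration and not PostgreSQL code: LogicalExtTag, PhysicalExtTag, ForkKind and the helper functions are hypothetical stand-ins for the real LOCKTAG machinery, and the field layout is only assumed to resemble a (database, relation) tag versus a (tablespace, database, relfile number, fork) tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified fork identifiers (not PostgreSQL's ForkNumber). */
typedef enum { FORK_MAIN = 0, FORK_FSM = 1, FORK_VM = 2 } ForkKind;

/* Assumed shape of a logical tag: one lock per relation, whatever the fork. */
typedef struct
{
    uint32_t db_oid;
    uint32_t rel_oid;
} LogicalExtTag;

/* Assumed shape of a physical tag: one lock per (relfile, fork) pair. */
typedef struct
{
    uint32_t spc_oid;
    uint32_t db_oid;
    uint32_t rel_number;
    ForkKind fork;
} PhysicalExtTag;

/* Build a physical tag; the fork must be one of the known kinds. */
PhysicalExtTag
make_physical_tag(uint32_t spc, uint32_t db, uint32_t relnum, ForkKind fork)
{
    PhysicalExtTag tag = { spc, db, relnum, fork };

    assert(fork >= FORK_MAIN && fork <= FORK_VM);
    return tag;
}

/* Two exclusive lock requests conflict when their tags are equal. */
bool
logical_tags_conflict(LogicalExtTag a, LogicalExtTag b)
{
    return a.db_oid == b.db_oid && a.rel_oid == b.rel_oid;
}

bool
physical_tags_conflict(PhysicalExtTag a, PhysicalExtTag b)
{
    return a.spc_oid == b.spc_oid &&
           a.db_oid == b.db_oid &&
           a.rel_number == b.rel_number &&
           a.fork == b.fork;
}
```

With the logical tag, a MAIN fork extender and an FSM extender of the same table build the same tag and therefore serialize. With the physical tag, they differ in the fork field and would only conflict when extending the same fork of the same relfile.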
Hi,

On 2025-10-06 19:39:18 +0800, Jingtang Zhang wrote:
> In a recent debugging session I found two processes conflicting on the
> relation extension lock: one was holding it for MAIN fork extension, while
> the other was trying to do FSM extension. It seems that the extension lock
> uses the logical relid of a table as its lock tag, but smgrextend is
> independent for each fork.
>
> LockRelationForExtension is used to lock out concurrent extension so that
> smgrnblocks (of the MAIN fork, mostly) gives an accurate answer for where to
> extend the fork from. The exception is bufmgr.c, where the forknum is passed
> in as a parameter, so main/fsm/vm extension shares the code.

What workload actually has significant enough extension workload on the VM/FSM
to make this a problem?

Greetings,

Andres Freund
Hi~

> What workload actually has significant enough extension workload on the VM/FSM
> to make this a problem?

The workload I was running is concurrent INSERTs into the same table. I'm running PostgreSQL with direct I/O on a distributed file system, where the latency of an extend is significantly higher than on local storage, which makes extension lock conflicts really serious (this can be mimicked by adding a pg_usleep at the end of smgrzeroextend), at about 200us per extend.

Most of the conflicts happen on main fork extension. However, sometimes the conflict happens on the FSM, because the bulk-extended pages need to be added to the FSM. That FSM extension may conflict with either the main fork extension or the FSM fork extension of other processes. But does the FSM fork extension actually need to conflict with the main fork extension?

    #5  0x00000000008a717e in WaitOnLock
    #6  0x00000000008a7d4b in LockAcquireExtended
    #7  0x00000000008a876e in LockAcquire
    #8  0x00000000008a584f in LockRelationForExtension
    #9  0x000000000087fd6b in ExtendBufferedRelShared
    #10 0x00000000008816ab in ExtendBufferedRelCommon
    #11 ExtendBufferedRelTo
    #12 0x000000000088f538 in fsm_extend
    #13 fsm_readbuf
    #14 0x000000000088f627 in fsm_set_and_search
    #15 0x0000000000501e81 in RelationAddBlocks

--
Regards,
Jingtang