Discussion: ProcArrayLock contention
I've been playing with the attached patch, which adds an additional
light-weight lock mode, LW_SHARED2. LW_SHARED2 conflicts with LW_SHARED
and LW_EXCLUSIVE, but not with itself. The patch changes
ProcArrayEndTransaction() to use this new mode. IOW, multiple processes
can commit at the same time, and multiple processes can take snapshots
at the same time, but nobody can take a snapshot while someone else is
committing.

Needless to say, I don't think we'd really want to apply this, because
adding an LW_SHARED2 mode that's probably only useful for ProcArrayLock
would be a pretty ugly wart. But the results are interesting.

pgbench, scale factor 100, unlogged tables, Nate Boley's 32-core AMD
box, shared_buffers = 8GB, maintenance_work_mem = 1GB,
synchronous_commit = off, checkpoint_segments = 300, checkpoint_timeout
= 15min, checkpoint_completion_target = 0.9, wal_writer_delay = 20ms;
results are the median of three five-minute runs:

#clients  tps(master)   tps(lwshared2)
1          657.984859     683.251582
8         4748.906750    4946.069238
32       10695.160555   17530.390578
80        7727.563437   16099.549506

That's a pretty impressive speedup, but there's trouble in paradise.
With 80 clients (but not 32 or fewer), I occasionally get the following
error:

ERROR: t_xmin is uncommitted in tuple to be updated

So it seems that there's some way in which this locking is actually
incorrect, though I'm not seeing what it is at the moment. Either that,
or there's some bug in the existing code that happens to be exposed by
this change.

The patch also produces a (much smaller) speedup with regular tables,
but it's hard to know how seriously to take that until the locking
issue is debugged.

Any ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
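The conflict behavior described above (LW_SHARED2 conflicts with LW_SHARED and LW_EXCLUSIVE, but not with itself) can be captured as a small conflict matrix. This is only an illustrative sketch: the enum ordering, table, and `modes_conflict` function are mine, not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical three-mode conflict check mirroring the LW_SHARED2 idea. */
typedef enum { LW_EXCLUSIVE, LW_SHARED, LW_SHARED2 } LWLockMode;

/* conflicts[held][req]: does a holder in mode `held` block a request in mode `req`? */
static const bool conflicts[3][3] = {
    /*                 EXCLUSIVE  SHARED  SHARED2 */
    /* EXCLUSIVE */  { true,      true,   true  },
    /* SHARED    */  { true,      false,  true  },
    /* SHARED2   */  { true,      true,   false },  /* SHARED2 is self-compatible */
};

static bool modes_conflict(LWLockMode held, LWLockMode req)
{
    return conflicts[held][req];
}
```

With this table, many committers (SHARED2) can hold the lock together, and many snapshot-takers (SHARED) can hold it together, but the two groups exclude each other.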
On Tue, Nov 8, 2011 at 4:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> With 80 clients (but not 32 or fewer), I occasionally get the
> following error:
>
> ERROR: t_xmin is uncommitted in tuple to be updated
>
> So it seems that there's some way in which this locking is actually
> incorrect, though I'm not seeing what it is at the moment. Either
> that, or there's some bug in the existing code that happens to be
> exposed by this change.

The semantics of shared locks are that they jump the existing queue, so
this patch allows locks to be held in sequences not previously seen when
using exclusive locks.

For me, the second kind of lock should queue up normally, but then be
released en masse when possible. So queue like an exclusive, but wake
like a shared. That can then produce flip-flop lock parties.

A slight problem there is that when shared locks queue they don't all
queue together, a problem which the other patch, written long ago,
addresses. I vaguely remember shared_queued.v1.patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
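Simon's "queue like an exclusive, wake like a shared" idea amounts to: enqueue every waiter in arrival order, but on release wake the maximal run of mutually compatible waiters at the head of the queue at once. A minimal sketch of that wake-up decision, with made-up names (this is not lwlock.c code):

```c
#include <stddef.h>

/* Illustrative lock modes; EXCLUSIVE conflicts with everything,
 * SHARED and SHARED2 are each compatible only with themselves. */
typedef enum { EXCLUSIVE, SHARED, SHARED2 } Mode;

/* Given a FIFO queue of waiter modes, return how many waiters from the
 * head should be woken together: a lone EXCLUSIVE waiter, or the whole
 * run of waiters in the head's self-compatible mode.  A run of SHARED2
 * committers is thus granted en masse instead of jumping the queue. */
static size_t waiters_to_wake(const Mode *queue, size_t n)
{
    if (n == 0)
        return 0;
    if (queue[0] == EXCLUSIVE)
        return 1;

    size_t k = 1;
    while (k < n && queue[k] == queue[0])
        k++;
    return k;
}
```

Because arrival order is preserved, the lock alternates between a batch of committers and a batch of snapshot-takers: the "flip-flop lock parties" described above.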
On Tue, Nov 8, 2011 at 2:24 AM, YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp> wrote:
> latestCompletedXid got backward due to concurrent updates
> and it fooled TransactionIdIsInProgress?

Ah ha! I bet that's it.

I think this could be avoided by a more sophisticated locking scheme.
Instead of waking up all the people trying to do
ProcArrayEndTransaction() and letting them all run simultaneously, wake
up one of them. That one guy goes and clears all the XID fields and
updates latestCompletedXid, and then wakes up all the others (who now
don't even need to reacquire the spinlock to "release" the lock, because
they never really held it in the first place, but yet the work they
needed done is done). The trick is to make something like that work
within the confines of the LWLock mechanism.

It strikes me that we have a number of places in the system where it
would be useful to leverage the queuing and error handling facilities
that the lwlock mechanism provides, but have different rules for
handling lock conflicts - either different lock modes, or request
combining, or whatever. lwlock.c is an awfully big chunk of code to
cut-and-paste if you need an lwlock with three modes, or some primitive
that has behavior similar to an lwlock overall but with some differences
in detail. I wonder if there's a way that we could usefully refactor
things to make that sort of thing easier.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
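The "wake up one of them and let it do everyone's work" scheme sketched above is a group-leader pattern: backends push themselves onto a pending list, and whoever finds the list empty becomes the leader, clears every queued XID under a single lock acquisition, and the rest merely wait. A single-threaded sketch of the list mechanics, with illustrative names that are not the eventual PostgreSQL API:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct PendingProc {
    int xid_slot;                 /* stand-in for the PGPROC whose xid needs clearing */
    struct PendingProc *next;
} PendingProc;

static _Atomic(PendingProc *) pending_head = NULL;

/* Lock-free push onto the pending list.  Returns true if this caller
 * pushed onto an empty list and therefore becomes the group leader. */
static bool push_pending(PendingProc *p)
{
    PendingProc *head = atomic_load(&pending_head);
    do {
        p->next = head;
    } while (!atomic_compare_exchange_weak(&pending_head, &head, p));
    return head == NULL;
}

/* The leader detaches the entire list in one atomic swap and clears
 * every entry under a single (notional) lock acquisition.  Setting
 * xid_slot to -1 stands in for clearing the proc's xid fields and
 * advancing latestCompletedXid.  Returns the number handled. */
static int drain_pending(void)
{
    PendingProc *p = atomic_exchange(&pending_head, NULL);
    int n = 0;
    for (; p != NULL; p = p->next) {
        p->xid_slot = -1;
        n++;
    }
    return n;
}
```

The non-leaders never touch the lock at all; they only need to be woken once the leader has drained the list, which matches the observation that their "release" requires no spinlock reacquisition.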
hi,

> I've been playing with the attached patch, which adds an additional
> light-weight lock mode, LW_SHARED2. LW_SHARED2 conflicts with
> LW_SHARED and LW_EXCLUSIVE, but not with itself. The patch changes
> ProcArrayEndTransaction() to use this new mode. IOW, multiple
> processes can commit at the same time, and multiple processes can take
> snapshots at the same time, but nobody can take a snapshot while
> someone else is committing.
>
> Needless to say, I don't think we'd really want to apply this, because
> adding a LW_SHARED2 mode that's probably only useful for ProcArrayLock
> would be a pretty ugly wart. But the results are interesting.
> pgbench, scale factor 100, unlogged tables, Nate Boley's 32-core AMD
> box, shared_buffers = 8GB, maintenance_work_mem = 1GB,
> synchronous_commit = off, checkpoint_segments = 300,
> checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
> wal_writer_delay = 20ms, results are median of three five-minute runs:
>
> #clients tps(master) tps(lwshared2)
> 1 657.984859 683.251582
> 8 4748.906750 4946.069238
> 32 10695.160555 17530.390578
> 80 7727.563437 16099.549506
>
> That's a pretty impressive speedup, but there's trouble in paradise.
> With 80 clients (but not 32 or fewer), I occasionally get the
> following error:
>
> ERROR: t_xmin is uncommitted in tuple to be updated
>
> So it seems that there's some way in which this locking is actually
> incorrect, though I'm not seeing what it is at the moment. Either
> that, or there's some bug in the existing code that happens to be
> exposed by this change.
>
> The patch also produces a (much smaller) speedup with regular tables,
> but it's hard to know how seriously to take that until the locking
> issue is debugged.
>
> Any ideas?

latestCompletedXid got backward due to concurrent updates
and it fooled TransactionIdIsInProgress?

YAMAMOTO Takashi

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
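The hypothesis above, that latestCompletedXid can move backward when multiple committers update it concurrently under a self-compatible lock mode, suggests one possible repair: only ever advance the value, via a compare-and-swap loop. This is a sketch under my own assumptions, not the patch's code, and it ignores xid wrap-around for brevity (a real fix must compare with modulo-2^31 ordering, as TransactionIdPrecedes does):

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Advance *latest to xid only if xid is newer; a plain store here would
 * let a slow committer overwrite a newer value with an older one, which
 * is exactly how latestCompletedXid could appear to move backward. */
static void advance_latest_completed(_Atomic TransactionId *latest,
                                     TransactionId xid)
{
    TransactionId cur = atomic_load(latest);
    while (cur < xid &&
           !atomic_compare_exchange_weak(latest, &cur, xid))
        ;   /* cur is refreshed on failure; retry until it is stable */
}
```

A stale committer calling this with an older xid simply becomes a no-op, so TransactionIdIsInProgress never sees the value regress.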