Discussion: lwlock contention with SSI
About a month ago, I told Kevin Grittner in an off-list conversation that I'd work on providing him with some statistics about lwlock contention under SSI. I then ran a benchmark on a 16-core, 64-hardware-thread IBM server, testing read-only pgbench performance at scale factor 300 with 1, 8, and 32 clients (and an equal number of client threads). The following non-default configuration settings were used:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
log_line_prefix = '%t [%p] '
default_transaction_isolation = 'serializable'
max_pred_locks_per_transaction = 1000

After running the test, I dropped the ball for a month. But, picking it back up again, here are the results. I've taken the data produced by LWLOCK_STATS and filtered it down by consolidating entries for the same lock across all PIDs that show up in the log file. Then, I've omitted all entries where blk == 0 and spindelay == 0, because those locks were never contended, so they're boring; and also because including them makes the results too long to make sense of. The results for the remaining locks are attached as three files, based on the number of clients.

The really revealing entries, IMHO, are these results from the 32-client test:

lwlock 28: shacq 86952211 exacq 257812441 blk 35212093 spindelay 40811
lwlock 29: shacq 0 exacq 87516792 blk 31177203 spindelay 10038
lwlock 30: shacq 227960353 exacq 0 blk 0 spindelay 10711

These locks are all SSI-related and they're all really hot. Lock 28 is SerializableXactHashLock and lock 29 is SerializableFinishedListLock; both are acquired an order of magnitude more often than any non-SSI lock, and cause two orders of magnitude more blocking than any other lock whatsoever. Lock 30 is SerializablePredicateLockListLock, which has no exclusive lock acquisitions at all in this test, but whose shared acquisitions result in significant spinlock contention. This latter problem could probably be ameliorated with a reader/writer lock (a primitive we don't currently have in Postgres, though one could be built up using lwlocks), but that is unlikely to make much difference without doing something about SerializableXactHashLock and SerializableFinishedListLock first. Once you get past these big three, there's also a ton of blocking on the PredicateLockMgrLocks, which seem only ever to be acquired in exclusive mode, but it's not nearly as bad due to the 16-way partitioning.

[Obligatory disclaimer: This has nothing to do with 9.3 and is not intended to distract attention therefrom.]

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
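[Editor's note: the consolidation described above can be sketched as a small script. The LWLOCK_STATS log line format assumed here (a PID plus per-lock shacq/exacq/blk/spindelay counters) is an approximation; adjust the regex for the actual output of the patch.]

```python
# Sketch of the filtering described above: sum LWLOCK_STATS counters for
# the same lock across all PIDs, then drop locks that were never
# contended (blk == 0 and spindelay == 0).  The log line format matched
# here is an assumption, not the exact LWLOCK_STATS output.
import re
from collections import defaultdict

LINE = re.compile(
    r"PID (?P<pid>\d+) lwlock (?P<lock>\d+): "
    r"shacq (?P<shacq>\d+) exacq (?P<exacq>\d+) "
    r"blk (?P<blk>\d+) spindelay (?P<spindelay>\d+)"
)

def consolidate(lines):
    totals = defaultdict(lambda: dict(shacq=0, exacq=0, blk=0, spindelay=0))
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        t = totals[int(m.group("lock"))]
        for field in ("shacq", "exacq", "blk", "spindelay"):
            t[field] += int(m.group(field))
    # Keep only locks that showed some contention.
    return {lock: t for lock, t in sorted(totals.items())
            if t["blk"] or t["spindelay"]}
```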
On Tue, Apr 09, 2013 at 07:49:51PM -0400, Robert Haas wrote:
> These locks are all SSI-related and they're all really hot. Lock 28
> is SerializableXactHashLock and lock 29 is
> SerializableFinishedListLock; both are acquired an order of magnitude
> more often than any non-SSI lock, and cause two orders of magnitude
> more blocking than any other lock whatsoever. Lock 30 is
> SerializablePredicateLockListLock, which has no exclusive lock
> acquisitions at all on this test, but the shared acquisitions result
> in significant spinlock contention.

This matches what I saw when I looked into this a while ago. I even started sketching out some plans for how we might deal with it, but unfortunately I never had much time to work on it, and that seems unlikely to change any time soon. :-\

As it is, pretty much any operation involving SSI requires acquiring SerializableXactHashLock (usually exclusive), except for checking whether a read or write indicates a conflict. That includes starting and ending a transaction. Two things make this hard to fix:

- SSI is about checking for rw-conflicts, which are inherently about *pairs* of transactions. This makes it hard to do fine-grained locking, because a lot of operations involve looking at or modifying the conflict lists of more than one transaction.

- SerializableXactHashLock protects many things. Besides the SERIALIZABLEXACT structures themselves, there's also the free lists for SERIALIZABLEXACTs and RWConflicts, the SerializableXidHash table, the latest SxactCommitSeqno and SxactGlobalXmin, etc.

I'm trying to swap back in my notes about how to address this. It is bound to be a substantial project, however.

Dan

--
Dan R. K. Ports  UW CSE  http://drkp.net/
Robert Haas <robertmhaas@gmail.com> wrote:
> About a month ago, I told Kevin Grittner in an off-list conversation
> that I'd work on providing him with some statistics about lwlock
> contention under SSI. I then ran a benchmark on a 16-core,
> 64-hardware thread IBM server, testing read-only pgbench performance
> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
> client threads).

I hate to say this when I know how much work benchmarking is, but I don't think any benchmark of serializable transactions has very much value unless you set any transactions which don't write to READ ONLY. I guess it shows how a naive conversion by someone who doesn't read the docs, or chooses to ignore the advice on how to get good performance, will perform, but how interesting is that?

It might be worth getting TPS numbers from the worst-looking test from this run, but with the read-only run done after changing default_transaction_read_only = on. Some shops using serializable transactions set that in the postgresql.conf file and require that any transaction which will be modifying data override it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
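[Editor's note: the setup described above, sketched with standard PostgreSQL syntax; the table and statements are illustrative, not from the thread.]

```sql
-- In postgresql.conf (or via ALTER SYSTEM): make READ ONLY the default.
-- default_transaction_read_only = on

-- A read-only transaction then needs no override:
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = 1;
COMMIT;

-- A transaction that modifies data must explicitly override the default:
BEGIN TRANSACTION READ WRITE;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1;
COMMIT;
```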
On Tue, Oct 7, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> About a month ago, I told Kevin Grittner in an off-list conversation
>> that I'd work on providing him with some statistics about lwlock
>> contention under SSI. I then ran a benchmark on a 16-core,
>> 64-hardware thread IBM server, testing read-only pgbench performance
>> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
>> client threads).
>
> I hate to say this when I know how much work benchmarking is, but I
> don't think any benchmark of serializable transactions has very
> much value unless you set any transactions which don't write to
> READ ONLY. I guess it shows how a naive conversion by someone who
> doesn't read the docs or chooses to ignore the advice on how to get
> good performance will perform, but how interesting is that?
>
> It might be worth getting TPS numbers from the worst-looking test
> from this run, but with the read-only run done after changing
> default_transaction_read_only = on. Some shops using serializable
> transactions set that in the postgresql.conf file, and require that
> any transaction which will be modifying data override it.

Well, we could do that. But I'm not sure it's very realistic. The pgbench workload is either 100% write or 100% read, but most real workloads are mixed; say, 95% read, 5% write. If the client software has to be responsible for flipping default_transaction_read_only for every write transaction, or for doing BEGIN TRANSACTION READ WRITE and COMMIT around each otherwise-single-statement write transaction, that's a whole bunch of extra server round trips and complexity that most people are not going to want to bother with. We can tell them that they have to do it anyway, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 7, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> About a month ago, I told Kevin Grittner in an off-list conversation
>>> that I'd work on providing him with some statistics about lwlock
>>> contention under SSI. I then ran a benchmark on a 16-core,
>>> 64-hardware thread IBM server, testing read-only pgbench performance
>>> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
>>> client threads).
>>
>> I hate to say this when I know how much work benchmarking is, but I
>> don't think any benchmark of serializable transactions has very
>> much value unless you set any transactions which don't write to
>> READ ONLY. I guess it shows how a naive conversion by someone who
>> doesn't read the docs or chooses to ignore the advice on how to get
>> good performance will perform, but how interesting is that?
>>
>> It might be worth getting TPS numbers from the worst-looking test
>> from this run, but with the read-only run done after changing
>> default_transaction_read_only = on. Some shops using serializable
>> transactions set that in the postgresql.conf file, and require that
>> any transaction which will be modifying data override it.
>
> Well, we could do that. But I'm not sure it's very realistic. The
> pgbench workload is either 100% write or 100% read, but most real
> work-loads are mixed; say, 95% read, 5% write. If the client software
> has to be responsible for flipping default_transaction_read_only for
> every write transaction, or just doing BEGIN TRANSACTION READ WRITE
> and COMMIT around each otherwise-single-statement write transaction,
> that's a whole bunch of extra server round trips and complexity that
> most people are not going to want to bother with.

Well, people using serializable transactions have generally opted to deal with that rather than using SELECT ... FOR UPDATE, LOCK TABLE, etc. There's no free lunch, and changing BEGIN to BEGIN TRANSACTION READ WRITE for those transactions which are expected to write data is generally a lot less bother than the alternatives. In fact, most software I have seen doing this has a transaction manager in the Java code which pays attention to the definition of each type of transaction, so you override a default in a declaration.

> We can tell them that they have to do it anyway, of course.

The docs already recommend it. I really would like to see the LW locking issues in SSI brought up to date with the rest of the code, but I would rather focus on the bottlenecks hit by people who are fundamentally using good technique than on cases where they are not following the advice in the docs[1], when following that advice would massively boost performance without any change to PostgreSQL.

A paper from the University of Sydney[2] found that in their tests the bottleneck was the linked lists which track read-write dependencies, reporting that at a concurrency of 128, "Our profiling showed that PostgreSQL spend 2.3% of the overall runtime in traversing these list, plus 10% of its runtime waiting on the corresponding kernel mutexes." This list is covered by SerializableXactHashLock, so converting it to something which is not O(N^2), or using lock-free access (I think we may be able to do one or the other, but not both), would probably make a big difference in contention at higher concurrency. Further tests may identify other bottlenecks in reasonable workloads, but this one seems sure to need attention.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://www.postgresql.org/docs/current/interactive/transaction-iso.html#XACT-SERIALIZABLE
[2] Hyungsoo Jung, Hyuck Han, Alan Fekete, Uwe Röhm, and Heon Y. Yeom. Performance of Serializable Snapshot Isolation on Multicore Servers. Technical Report 693, University of Sydney School of Information Technologies, December 2012.
http://sydney.edu.au/engineering/it/research/tr/tr693.pdf (Quote is from section 5.2, Shared System Data Structures, subsection PostgreSQL.)
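[Editor's note: a toy model of the quadratic growth described above. This illustrates the complexity argument only; it is not PostgreSQL's actual data structures.]

```python
# Toy model: N concurrent serializable transactions, each tracking its
# rw-conflicts with the others in a flat per-transaction list.  In the
# worst case every transaction conflicts with every other, so any pass
# that walks each transaction's list touches O(N^2) entries.

def build_conflict_lists(n):
    # Worst case: each transaction holds a conflict entry for every other.
    return {txn: [other for other in range(n) if other != txn]
            for txn in range(n)}

def total_traversal_steps(lists):
    # Work done by one full pass over all conflict lists (as a
    # commit-time cleanup must perform).
    return sum(len(l) for l in lists.values())

steps_32 = total_traversal_steps(build_conflict_lists(32))    # 32 * 31 = 992
steps_128 = total_traversal_steps(build_conflict_lists(128))  # 128 * 127 = 16256
# 4x the concurrency costs roughly 16x the traversal work.
```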
On October 7, 2014 10:06:25 PM CEST, Kevin Grittner <kgrittn@ymail.com> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Oct 7, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> Robert Haas <robertmhaas@gmail.com> wrote:
>>>> About a month ago, I told Kevin Grittner in an off-list conversation
>>>> that I'd work on providing him with some statistics about lwlock
>>>> contention under SSI. I then ran a benchmark on a 16-core,
>>>> 64-hardware thread IBM server, testing read-only pgbench performance
>>>> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
>>>> client threads).
>>>
>>> I hate to say this when I know how much work benchmarking is, but I
>>> don't think any benchmark of serializable transactions has very
>>> much value unless you set any transactions which don't write to
>>> READ ONLY. I guess it shows how a naive conversion by someone who
>>> doesn't read the docs or chooses to ignore the advice on how to get
>>> good performance will perform, but how interesting is that?
>>>
>>> It might be worth getting TPS numbers from the worst-looking test
>>> from this run, but with the read-only run done after changing
>>> default_transaction_read_only = on. Some shops using serializable
>>> transactions set that in the postgresql.conf file, and require that
>>> any transaction which will be modifying data override it.
>>
>> Well, we could do that. But I'm not sure it's very realistic. The
>> pgbench workload is either 100% write or 100% read, but most real
>> work-loads are mixed; say, 95% read, 5% write. If the client software
>> has to be responsible for flipping default_transaction_read_only for
>> every write transaction, or just doing BEGIN TRANSACTION READ WRITE
>> and COMMIT around each otherwise-single-statement write transaction,
>> that's a whole bunch of extra server round trips and complexity that
>> most people are not going to want to bother with.
> Well, people using serializable transactions have generally opted
> to deal with that rather than using SELECT ... FOR UPDATE, LOCK
> TABLE, etc. There's no free lunch, and changing BEGIN to BEGIN
> TRANSACTION READ WRITE for those transactions which are expected to
> write data is generally a lot less bother than the other.

Then it really shouldn't have supplanted the old pseudo-serializable (aka repeatable read). There's software where something like this is easy. But I don't think that set overlaps much with the set of devs for whom serializable is the easier way.

Andres

--
Please excuse brevity and formatting - I am writing this on my mobile phone.