Re: CSStorm occurred again by postgreSQL8.2
From | ITAGAKI Takahiro
Subject | Re: CSStorm occurred again by postgreSQL8.2
Date |
Msg-id | 20060807114525.5266.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
In response to | Re: CSStorm occurred again by postgreSQL8.2. (Re: poor performance with Context Switch Storm at TPC-W.) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: CSStorm occurred again by postgreSQL8.2
| Re: CSStorm occurred again by postgreSQL8.2
List | pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It does not solve, even if it increases the number of NUM_SUBTRANS_BUFFERS.
> > The problem was only postponed.
>
> Can you provide a reproducible test case for this?

This is the reproducible test case:
- Occurs on both 8.1.4 and HEAD.
- On an SMP machine. I used dual Opterons. The CSStorm becomes worse on dual Xeons with hyper-threading.
- Tuning parameters are the defaults. The whole data set is cached in shared buffers. (shared_buffers=32MB; the pgbench data (scale=1) are less than 15MB.)
- Using custom pgbench scripts: one client doing UPDATEs with index scans, and multiple clients doing SELECTs with seqscans/index scans.

$ pgbench -i
$ pgbench -n -c 1 -t 100000 -f cs_update.sql &
$ pgbench -n -c 50 -t 100000 -f cs_indexscan.sql &
$ pgbench -n -c 35 -t 100000 -f cs_seqscan.sql &

(The scripts are attached at the end of this message.)

In the above workload, context switches are 2000-10000/sec and CPU usage is user=100%. Then, start a long open transaction on another connection:

$ psql
# begin; -- long open transaction

After 30-60 seconds, context switches rise above 50000/sec (above 120000 on the Xeons) and CPU usage is user=66% / sys=21% / idle=13%. If we increase the frequency of the UPDATEs, the duration becomes shorter.

This is a human-induced workload, but I can see the same condition in TPC-W, even though it is a benchmark. TPC-W requires full-text search, and it is implemented using "LIKE %foo%" in my implementation (DBT-1, too). It also requires periodic aggregations. These might behave as long transactions.

The cause seems to be lock contention. The numbers of locks on SubtransControlLock and SubTransBuffer are significantly increased in comparison with BufMappingLock.
# Before starting a long transaction
 kind | lwlock              | sh_call  | sh_wait | ex_call | ex_wait
------+---------------------+----------+---------+---------+---------
   13 | SubtransControlLock |    28716 |       2 |      54 |       0
   22 | BufMappingLock      | 11637884 |       0 |    2492 |       0
   27 | SubTransBuffer      |        0 |       0 |      11 |       0

# After
 kind | lwlock              | sh_call  | sh_wait | ex_call | ex_wait
------+---------------------+----------+---------+---------+---------
   13 | SubtransControlLock |  4139111 |   65059 | 3926691 |  390838
   22 | BufMappingLock      | 32348073 |       0 |    2509 |       0
   27 | SubTransBuffer      |   939646 |  960341 | 1419152 |      61

The invokers of the SubTrans module are the two SubTransGetTopmostTransaction() calls in HeapTupleSatisfiesSnapshot(). When I disabled those calls, the CSStorm did not occur. SubTransGetTopmostTransaction() returns its argument unchanged when we don't use SAVEPOINTs. If we optimize for the non-subtransaction case, we can avoid locking SubTrans when checking the visibility of tuples inserted by top-level transactions. If we want to resolve the problem fundamentally, we might have to improve SubTrans with a better buffer management algorithm or the like.

Do you have any ideas for avoiding such a problem?

-- cs_update.sql
\set naccounts 100000 * :tps
\setrandom aid 1 :naccounts
\setrandom delta -5000 5000
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT pg_sleep(0.1);

-- cs_seqscan.sql
\set naccounts 100000 * :tps
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid::int8 = :aid; -- cast to force seqscan

-- cs_indexscan.sql
\set naccounts 100000 * :tps
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid = :aid;

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center