Re: CSStorm occurred again by postgreSQL8.2

Поиск
Список
Период
Сортировка
От ITAGAKI Takahiro
Тема Re: CSStorm occurred again by postgreSQL8.2
Дата
Msg-id 20060807114525.5266.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
обсуждение исходный текст
Ответ на Re: CSStorm occurred again by postgreSQL8.2. (Re: poor performance with Context Switch Storm at TPC-W.)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: CSStorm occurred again by postgreSQL8.2
Re: CSStorm occurred again by postgreSQL8.2
Список pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It does not solve, even if it increases the number of NUM_SUBTRANS_BUFFERS.
> > The problem was only postponed.
> 
> Can you provide a reproducible test case for this?

This is the reproducible test case:
- Occurs on both 8.1.4 and HEAD.
- On smp machine. I used dual opterons. CSStrom becomes worse on dual xeon with hyper-threading.
- Tuning parameters are default. Whole data are cached in shared buffers. (shared_buffers=32MB, data of pgbench
(scale=1)are less than 15MB.)
 
- Using custom pgbench. One client doing UPDATE with indexscan and multiple clients doing SELECT with
seqscan/indexscan.

$ pgbench -i
$ pgbench -n -c  1 -t 100000 -f cs_update.sql    &
$ pgbench -n -c 50 -t 100000 -f cs_indexscan.sql &
$ pgbench -n -c 35 -t 100000 -f cs_seqscan.sql   &
(The scripts are attached at end of this message.)

In above workload, context switches are 2000-10000/sec and cpu usage is
user=100%. Then, start a long open transaction on another connection.

$ psql
# begin; -- Long open transaction

After a lapse of 30-60 seconds, context switches become 50000/sec over
(120000 over on xeons) and cpu usage is user=66% / sys=21% / idle=13%.
If we increase the frequency of UPDATE, the duration becomes shorter.


This is a human-induced workload, but I can see the same condition in
TPC-W -- even though it is a benchmark. TPC-W requires full-text search
and it is implementd using "LIKE %foo%" in my implementation (DBT-1, too).
Also, it requires periodical aggregations. They might behave as long
transactions.


The cause seems to be a lock contention. The number of locks on
SubtransControlLock and SubTransBuffer are significantly increased
by comparison with BufMappingLocks.

# Before starting a long transaction.kind |        lwlock       | sh_call  | sh_wait | ex_call | ex_wait
------+---------------------+----------+---------+---------+---------  13 | SubtransControlLock |    28716 |       2 |
   54 |       0  22 | BufMappingLock      | 11637884 |       0 |    2492 |       0  27 | SubTransBuffer      |        0
|      0 |      11 |       0
 

# Afterkind |        lwlock       | sh_call  | sh_wait | ex_call | ex_wait
------+---------------------+----------+---------+---------+---------  13 | SubtransControlLock |  4139111 |   65059 |
3926691|  390838  22 | BufMappingLock      | 32348073 |       0 |    2509 |       0  27 | SubTransBuffer      |
939646|  960341 | 1419152 |      61
 



The invokers of SubTrans module are two SubTransGetTopmostTransaction()
in HeapTupleSatisfiesSnapshot(). When I disabled the calls, CSStorm did
not occur. SubTransGetTopmostTransaction returns the argument without
change when we don't use SAVEPOINTs.

If we optimize for non-subtransactions, we can avoid to lock SubTrans
for check visiblities of tuples inserted by top transactions.
If we want to resolve the probmen fundamentally, we might have to
improve SubTrans using a better buffer management algorithm or so.

Do you have any idea to avoid such a problem?



-- cs_update.sql
\set naccounts 100000 * :tps
\setrandom aid 1 :naccounts
\setrandom delta -5000 5000
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT pg_sleep(0.1);

-- cs_seqscan.sql
\set naccounts 100000 * :tps
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid::int8 = :aid; -- cast to force seqscan

-- cs_indexscan.sql
\set naccounts 100000 * :tps
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid = :aid;


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: 'startup waiting' status message
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: 8.2 features status