Re: MultiXact\SLRU buffers configuration

From: Andrey M. Borodin
Subject: Re: MultiXact\SLRU buffers configuration
Msg-id: 801D899E-9CC5-4FEE-AAA4-16D87853FD45@yandex-team.ru
In reply to: MultiXact\SLRU buffers configuration  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Responses: Re: MultiXact\SLRU buffers configuration  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List: pgsql-hackers

> On 8 May 2020, at 21:36, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> *** The problem ***
> I'm investigating some cases of reduced database performance due to MultiXactOffsetLock contention (80% MultiXactOffsetLock, 20% IO DataFileRead).
> The problem manifested itself during index repack and constraint validation, both of which are effectively full table scans.
> The database workload contains a lot of select for share\select for update queries. I've tried to construct a synthetic workload generator and could not achieve a similar lock configuration: I see a lot of different locks in wait events, in particular a lot more MultiXactMemberLocks. But in my experiments with the synthetic workload, contention on MultiXactOffsetLock can be reduced by increasing NUM_MXACTOFFSET_BUFFERS=8 to bigger numbers.
>
> *** Question 1 ***
> Is it safe to increase the number of buffers of MultiXact\all SLRUs, recompile, and run the database as usual?
> I cannot experiment much on production, but I'm mostly sure that bigger buffers would solve the problem.
>
> *** Question 2 ***
> Probably we could add GUCs for SLRU sizes? Are there any reasons not to make them configurable? I think multis, clog, subtransactions and others would benefit from bigger buffers. But, probably, too many knobs can be confusing.
>
> *** Question 3 ***
> The MultiXact offset lock is always taken as an exclusive lock, which makes the MultiXact offset subsystem single-threaded. If someone has a good idea how to make it more concurrency-friendly, I'm willing to put some effort into this.
> Probably I could just add LWLocks for each offset buffer page. Is it something worth doing? Or are there any hidden caveats and difficulties?

I've created a benchmark[0] imitating MultiXact pressure on my laptop: 7 clients concurrently run "select * from table where primary_key = ANY ($1) for share", where $1 is an array of identifiers chosen so that each tuple in the table is locked by a different set of XIDs. During this benchmark I observe contention on MultiXactOffsetControlLock in pg_stat_activity

                                    Friday, 8 May 2020 15:08:37 (every 1s)

  pid  |         wait_event         | wait_event_type | state  |                       query
-------+----------------------------+-----------------+--------+----------------------------------------------------
 41344 | ClientRead                 | Client          | idle   | insert into t1 select generate_series(1,1000000,1)
 41375 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41377 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41378 |                            |                 | active | select * from t1 where i = ANY ($1) for share
 41379 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41381 |                            |                 | active | select * from t1 where i = ANY ($1) for share
 41383 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
 41385 | MultiXactOffsetControlLock | LWLock          | active | select * from t1 where i = ANY ($1) for share
(8 rows)

Finally, the benchmark measures the time it takes to execute the select for update 42 times.
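For reference, the shape of this workload can be sketched as follows (a minimal illustration, not the actual scripts from [0]; the table and column names match the pg_stat_activity output above, and `next_batch` is a hypothetical helper):

```python
import random

# Sketch of the benchmark workload: each client repeatedly shared-locks a
# random, overlapping batch of rows. Successive lockers of a tuple differ,
# so every tuple ends up locked by a changing set of XIDs, which forces
# MultiXact creation and growth on each new FOR SHARE round.
QUERY = "select * from t1 where i = ANY ($1) for share"

def next_batch(n_rows: int, batch_size: int, rng: random.Random) -> list[int]:
    """Pick a random batch of distinct row ids for one FOR SHARE round."""
    return rng.sample(range(1, n_rows + 1), batch_size)

rng = random.Random(0)
params = next_batch(n_rows=1_000_000, batch_size=100, rng=rng)
# Each client would then repeatedly run QUERY with `params` bound to $1.
```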

I went ahead and created 3 patches:
1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
2. Reduce locking level to shared on read of MultiXactId members
3. Configurable cache size
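With patch 1 applied, tuning could look something like this (the GUC names and values here are illustrative guesses for the sketch, not necessarily what the patch uses; the values match the optimum found below):

```
# postgresql.conf sketch -- hypothetical GUC names
multixact_offsets_slru_buffers = 32    # default 8 pages
multixact_members_slru_buffers = 64    # default 16 pages
```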

I've found out that:
1. When the MultiXact working set does not fit into the buffers, benchmark times grow very high. Yet very big buffers slow the benchmark down too. For this benchmark the optimal SLRU size is 32 pages for offsets and 64 pages for members (defaults are 8 and 16 respectively).
2. The lock optimisation improves performance by 5% at default SLRU sizes. The benchmark does not explicitly read MultiXactId members, but when it replaces one multixact with another it has to read the previous member set. I understand that we can construct a benchmark to demonstrate the dominance of any algorithm, and 5% on a synthetic workload is not a very big number. But it just makes sense to take a shared lock for reading.
3. Changing the cache size does not affect the benchmark at all. This is somewhat expected: the benchmark is designed to defeat the cache; otherwise OffsetControlLock would not be stressed.
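The non-monotonic effect in finding 1 is consistent with SLRU page lookup being a linear scan over all buffer slots under the control lock, so oversizing the pool taxes every access. A toy model of that scan (pure Python; a simplification of the slot search that the real SLRU code performs, with an assumed scan-all-slots lookup):

```python
def slru_lookup_cost(num_buffers: int, resident: list[int], page: int) -> int:
    """Count slots inspected to find `page` in an SLRU with `num_buffers` slots.

    Models a linear scan over the per-slot page numbers, as done while
    holding the SLRU control lock: a hit stops at the matching slot, while
    a miss inspects every slot before falling through to I/O.
    """
    cost = 0
    for slot in range(num_buffers):
        cost += 1
        if slot < len(resident) and resident[slot] == page:
            return cost  # hit
    return cost  # miss: scanned every slot; page read would follow

# Larger pools help only until the working set fits; past that point,
# misses (and worst-case hits) still pay O(num_buffers) per lookup.
hit_small = slru_lookup_cost(8, list(range(8)), page=7)
miss_large = slru_lookup_cost(1024, list(range(8)), page=5000)
```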

For our workload, I think we will just increase the SLRU sizes. But the patch set may be useful for tuning and as a performance optimisation of MultiXact.

Also, MultiXacts seem not to be a very good fit for the SLRU design. I think it would be better to use a B-tree as a container, or at least make MultiXact members extendable in place (reserve some space when a multixact is created).
When we want to extend the set of locks on a tuple, currently we will:
1. Iterate through all SLRU buffers for offsets to read the current offset (holding the exclusive offsets lock)
2. Iterate through all buffers for members to find the current members (holding the exclusive members lock)
3. Create a new members array with one more xid
4. Iterate through all cache members to check whether an entry equal to the one we are about to create already exists
5. Repeat step 1, this time for write
6. Repeat step 2, this time for write

Obviously this does not scale well: we cannot keep increasing SLRU sizes for long.
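The steps above can be turned into a rough cost model (a back-of-the-envelope sketch; the real code short-circuits some of these scans, but the exclusive-lock linear scans dominate, and they grow with the very buffer counts we would like to raise):

```python
def lock_extension_cost(offset_buffers: int, member_buffers: int,
                        cache_entries: int) -> int:
    """Slots/entries touched to add one locker to an existing MultiXact,
    following the six steps above."""
    read_offset = offset_buffers    # step 1: scan offsets SLRU for current offset
    read_members = member_buffers   # step 2: scan members SLRU for current members
    # step 3: building the new array (+1 xid) is O(members), ignored here
    cache_scan = cache_entries      # step 4: look for an equal cached multixact
    write_offset = offset_buffers   # step 5: offsets scan again, for write
    write_members = member_buffers  # step 6: members scan again, for write
    return (read_offset + read_members + cache_scan
            + write_offset + write_members)

# Doubling the SLRU sizes roughly doubles the scan cost of every extension,
# which is why growing the buffers only postpones the contention problem.
assert lock_extension_cost(16, 32, 256) > lock_extension_cost(8, 16, 256)
```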

Thanks! I'd be happy to hear any feedback.

Best regards, Andrey Borodin.

[0] https://github.com/x4m/multixact_stress
