Re: Built-in connection pooling

From: Konstantin Knizhnik
Subject: Re: Built-in connection pooling
Date:
Msg-id: d8d92459-b9e8-b89a-6a9e-2f497adf2a69@postgrespro.ru
In response to: Re: Built-in connection pooling (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses: Re: Built-in connection pooling
List: pgsql-hackers


On 01.02.2018 16:33, Vladimir Sitnikov wrote:
Konstantin> I have not built YCSB myself; I used an existing installation.

Which pgjdbc version was in use?

postgresql-9.4.1212.jar

Konstantin> One of the main problems of Postgres is significant degradation of performance in the case of concurrent write access by multiple transactions to the same rows.

I would consider that a workload "problem" rather than a PostgreSQL problem.
That is, if an application (e.g. YCSB) is trying to update the same rows in multiple transactions concurrently, then the outcome of such updates is likely to be unpredictable. Does that make sense?

I can't agree with you.
Yes, there are workloads where updates are more or less local: clients mostly update their own private data.
But there are many systems with "shared" resources that are concurrently accessed by different users. They may just increment an access count or perform a deposit/withdrawal...
Just a simple example: suppose you have something like an AppStore, and there is a popular application which is bought by a lot of users.
From the DBMS point of view, a lot of clients perform concurrent updates of the same record.
So performance on such a workload is also very important. And unfortunately, here Postgres loses the competition with MySQL and most other DBMSes.


At least, I do not see why Mongo would degrade in a different way there. Oleg's charts suggest that Mongo does not degrade there, so I wonder if we compare apples to apples in the first place.

Postgres locks tuples in a very inefficient way in the case of high contention.
It first locks the buffer and checks whether the tuple is locked by some other backend.
Then it tries to take a heavy-weight lock on the tuple's ctid. If several processes are trying to update this tuple, all of them will be queued on this heavy-weight tuple lock.
After getting this tuple lock, the backend waits on the XID of the transaction that updated the tuple.
Once the transaction that updated this tuple completes, Postgres unblocks the backends waiting for it: it checks the status of the tuple, releases the tuple lock, and wakes one of the waiting clients.
Since Postgres uses MVCC, it creates a new version of the tuple on each update.
So the tuple version all clients were waiting on is no longer the latest version of the tuple.
Depending on the isolation level, they either have to report an error (in the case of repeatable read) or update the snapshot and repeat the search with the new snapshot...
and perform all the checks and locks mentioned above once again.
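To make the cost of this retry loop concrete, here is a minimal sketch of the optimistic, version-based update pattern described above. This is not Postgres code: all names (VersionedRow, try_update, etc.) are illustrative, and a threading.Lock stands in for the tuple lock. The point is that every waiter whose snapshot goes stale must redo its work, so total attempts grow with contention.

```python
import threading

class VersionedRow:
    """One logical row; each update creates a new version (MVCC-style)."""
    def __init__(self, value=0):
        self.version = 0
        self.value = value
        self._lock = threading.Lock()   # stands in for the tuple lock

    def try_update(self, expected_version, new_value):
        # Succeed only if nobody committed a newer version in the meantime.
        with self._lock:
            if self.version != expected_version:
                return False            # snapshot is stale: caller must retry
            self.version += 1
            self.value = new_value
            return True

def optimistic_increment(row, attempts):
    while True:
        attempts.append(1)              # count every (re)try
        v, val = row.version, row.value # "take a snapshot"
        if row.try_update(v, val + 1):
            return                      # committed; stale readers retry

row = VersionedRow()
attempts = []
threads = [threading.Thread(target=optimistic_increment, args=(row, attempts))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(row.value)  # -> 8: all increments applied, possibly after many retries
```

Under contention, len(attempts) can exceed the number of successful updates: that surplus is the redundant work the paragraph above describes.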

I hope it is clear from this brief and not-so-precise explanation that Postgres has to do a lot of redundant work if several clients are competing for the same tuple.
There is a well-known rule that pessimistic locking is more efficient than optimistic locking in the case of high contention.
So Postgres could provide better performance on this workload if it were more pessimistic:
take the lock not on the ctid (the identifier of a particular tuple version), but on the tuple's PK (primary key), and hold it until the end of the transaction (because until the transaction completes, nobody else will be
able to update this tuple). This trick with locking the PK really helps to improve performance on this workload, but unfortunately it cannot reverse the trend of performance degradation with an increasing number of competing transactions.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
