Re: Postgres with pthread

From: Pavel Stehule
Subject: Re: Postgres with pthread
Date:
Msg-id: CAFj8pRB39Hyeu-1wohOzx6icH8+dS0dNdqc5Pi-JRYS=aj50Pw@mail.gmail.com
In response to: Re: Postgres with pthread (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers


2017-12-21 14:25 GMT+01:00 Konstantin Knizhnik <k.knizhnik@postgrespro.ru>:
I continue experiments with my pthread prototype.
Latest results are the following:

1. I have eliminated (I hope) all calls of non-reentrant functions (getopt, setlocale, setitimer, localtime, ...), so the parallel tests now pass (see the first sketch after this list).

2. I have implemented deallocation of the top memory context (at thread exit) and cleanup of all opened file descriptors; see the second sketch after this list.
I had to replace several places where malloc is used with top_malloc: allocation in the top context.

3. My prototype now passes all regression tests, but error handling is still far from complete.

4. I have experimented with replacing the synchronization primitives used in Postgres with their pthread analogues (see the third sketch after this list).
Unfortunately this has almost no influence on performance.

5. Handling a large number of connections.
The maximal number of postgres connections is almost the same: 100k.
But the memory footprint with pthreads was significantly smaller: 18GB vs 38GB.
And the difference in performance was much larger: 60k TPS vs. 600k TPS.
Compare that with the performance for 10k clients: 1300k TPS.
This is a read-only pgbench -S test; since pgbench doesn't allow specifying more than 1000 clients per instance, I spawned several instances of pgbench.
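
For item 1, a minimal sketch of the kind of change involved (the function names are invented for illustration, not taken from the patch): localtime() returns a pointer to static storage shared by the whole process, so it has to give way to its reentrant counterpart localtime_r():

    #include <time.h>

    /* Non-reentrant: localtime() fills a static struct tm shared by
     * every thread in the process. */
    struct tm *checkpoint_time_unsafe(time_t t)
    {
        return localtime(&t);    /* data race once backends are threads */
    }

    /* Reentrant: localtime_r() writes into caller-provided storage. */
    void checkpoint_time_safe(time_t t, struct tm *out)
    {
        localtime_r(&t, out);    /* safe in a multi-threaded server */
    }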
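
For item 2, one plausible mechanism for such per-thread cleanup is a TLS key destructor; this is only a guess at how the prototype does it, and the real cleanup calls are shown as comments to keep the sketch self-contained:

    #include <pthread.h>

    static pthread_key_t backend_cleanup_key;

    /* Runs automatically when a backend thread exits. */
    static void backend_thread_cleanup(void *arg)
    {
        (void) arg;  /* unused */
        /* MemoryContextDelete(TopMemoryContext);  free this thread's top context */
        /* closeAllVfds();                         close its file descriptors     */
    }

    /* Called once at server start. */
    static void backend_threads_init(void)
    {
        pthread_key_create(&backend_cleanup_key, backend_thread_cleanup);
    }

    /* Called as each backend thread starts: the key's value must be
     * non-NULL, otherwise the destructor is not invoked at thread exit. */
    static void backend_thread_start(void)
    {
        pthread_setspecific(backend_cleanup_key, (void *) 1);
    }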
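
And for item 4, the kind of one-to-one substitution meant - e.g. an LWLock taken in shared or exclusive mode replaced by a pthread_rwlock_t - might look like this (names are illustrative):

    #include <pthread.h>

    static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void cache_lookup(void)
    {
        pthread_rwlock_rdlock(&cache_lock);  /* was LWLockAcquire(lock, LW_SHARED) */
        /* ... read shared state ... */
        pthread_rwlock_unlock(&cache_lock);  /* was LWLockRelease(lock) */
    }

    static void cache_update(void)
    {
        pthread_rwlock_wrlock(&cache_lock);  /* was LWLockAcquire(lock, LW_EXCLUSIVE) */
        /* ... modify shared state ... */
        pthread_rwlock_unlock(&cache_lock);
    }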

Why is handling a large number of connections important?
It allows applications to access postgres directly, without pgbouncer or any other external connection pooling tool.
In this case an application can use prepared statements, which can make simple queries almost twice as fast.
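
The saving comes from parsing and planning a query once per session and then only executing it; a minimal libpq sketch (connection string and query are invented for illustration):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        /* Parse and plan the statement once per session... */
        PGresult *res = PQprepare(conn, "get_abalance",
                                  "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                                  1, NULL);
        PQclear(res);

        /* ...then execute it many times, skipping parse/plan each time. */
        const char *paramValues[1] = { "42" };
        res = PQexecPrepared(conn, "get_abalance", 1, paramValues, NULL, NULL, 0);
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
            printf("abalance = %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);

        PQfinish(conn);
        return 0;
    }

A pooler in transaction-pooling mode breaks this pattern, because the named statement lives in one server backend while later executions may be routed to another.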

As far as I know, MySQL does not have good experience with a high number of threads - and there is a thread pool in the enterprise (and now in MariaDB) versions.

Regards

Pavel


Unfortunately Postgres sessions are not lightweight. Each backend maintains its private catalog and relation caches, prepared statement cache, ...
For a real database, these caches can occupy several megabytes of memory, and warming them can take a significant amount of time.
So if we really want to support a large number of connections, we should rewrite the caches to be global (shared).
That would save a lot of memory but add synchronization overhead. Also, on NUMA systems private caches may be more efficient than one global cache.
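
A common way to bound that synchronization overhead is to partition the shared cache and stripe the locks, so that lookups hashing to different partitions never contend; a hypothetical sketch (none of these names are from the prototype):

    #include <pthread.h>
    #include <stddef.h>

    #define CACHE_PARTITIONS 16

    typedef struct CacheEntry CacheEntry;            /* opaque in this sketch */

    static pthread_rwlock_t part_lock[CACHE_PARTITIONS];
    static CacheEntry *part_table[CACHE_PARTITIONS]; /* one hash table per partition */

    /* Called once at startup. */
    static void shared_cache_init(void)
    {
        for (int i = 0; i < CACHE_PARTITIONS; i++)
            pthread_rwlock_init(&part_lock[i], NULL);
    }

    /* Readers taking different partition locks proceed in parallel. */
    static CacheEntry *shared_cache_lookup(unsigned hash)
    {
        unsigned    p = hash % CACHE_PARTITIONS;
        CacheEntry *entry;

        pthread_rwlock_rdlock(&part_lock[p]);
        entry = part_table[p];                       /* real code: walk a hash chain */
        pthread_rwlock_unlock(&part_lock[p]);
        return entry;
    }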

My prototype can be found at: git://github.com/postgrespro/postgresql.pthreads.git


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


