Re: Postgres with pthread

From: Konstantin Knizhnik
Subject: Re: Postgres with pthread
Date:
Msg-id: 8c9212eb-cb6f-1cfd-9fce-84ec01390b20@postgrespro.ru
In response to: Re: Postgres with pthread  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: Postgres with pthread  (Pavel Stehule <pavel.stehule@gmail.com>)
           Re: Postgres with pthread  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
I am continuing the experiments with my pthread prototype.
The latest results are the following:

1. I have eliminated all (I hope) calls to non-reentrant functions
(getopt, setlocale, setitimer, localtime, ...), so the parallel tests
now pass (see the first sketch after this list).

2. I have implemented deallocation of the top memory context (at thread
exit) and cleanup of all opened file descriptors.
I had to replace several places where malloc is used with top_malloc:
allocation in the top context (see the second sketch below).

3. My prototype now passes all regression tests, but error handling is
still far from complete.

4. I have experimented with replacing the synchronization primitives
used in Postgres with their pthread analogues (see the third sketch below).
Unfortunately this has almost no influence on performance.

5. Handling a large number of connections.
The maximal number of Postgres connections is almost the same in both
cases: 100k.
But the memory footprint with pthreads was significantly smaller: 18GB
vs 38GB.
And the difference in performance was much larger: 60k TPS vs 600k TPS.
Compare this with the performance for 10k clients: 1300k TPS.
This is a read-only pgbench -S test with 1000 connections per pgbench
instance; since pgbench doesn't allow specifying more than 1000 clients,
I spawned several instances of pgbench.
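
Below are three small sketches in C illustrating items 1, 2 and 4.
First, the kind of substitution item 1 required: replacing a
non-reentrant libc call with its reentrant counterpart (the wrapper
function here is hypothetical, only the libc calls are real):

    #include <stdio.h>
    #include <time.h>

    /* localtime() returns a pointer to static storage shared by all
     * threads, so concurrent calls can overwrite each other's result.
     * localtime_r() instead writes into caller-provided storage. */
    static void
    format_timestamp(time_t now, char *buf, size_t buflen)
    {
        struct tm tmbuf;

        localtime_r(&now, &tmbuf);  /* thread-safe replacement for localtime() */
        strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tmbuf);
    }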
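
Second, one way the per-thread cleanup of item 2 can be arranged:
registering a destructor with pthread_key_create() so the top context is
released automatically at thread exit. This is only a sketch of the idea;
top_malloc matches the prototype's name, but create_context,
context_alloc and free_context are hypothetical stand-ins:

    #include <pthread.h>
    #include <stdlib.h>

    /* hypothetical context API, assumed to exist in the prototype */
    extern void *create_context(void);
    extern void *context_alloc(void *ctx, size_t size);
    extern void  free_context(void *ctx);

    static pthread_key_t top_context_key;

    /* Runs automatically at thread exit, releasing everything that
     * was allocated in this thread's top memory context. */
    static void
    top_context_destroy(void *ctx)
    {
        free_context(ctx);               /* hypothetical: release whole context */
    }

    static void
    threads_init(void)
    {
        pthread_key_create(&top_context_key, top_context_destroy);
    }

    /* Allocate in the calling thread's top context, creating it on demand. */
    void *
    top_malloc(size_t size)
    {
        void *ctx = pthread_getspecific(top_context_key);

        if (ctx == NULL)
        {
            ctx = create_context();      /* hypothetical: per-thread context */
            pthread_setspecific(top_context_key, ctx);
        }
        return context_alloc(ctx, size); /* hypothetical: allocate in context */
    }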
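
Third, for item 4, the flavour of replacement that was tried: a pthread
spinlock standing in for Postgres's hand-rolled s_lock (the critical
section shown is a made-up example, not code from the patch):

    #include <pthread.h>

    static pthread_spinlock_t lock;

    static void
    lock_init(void)
    {
        /* PTHREAD_PROCESS_PRIVATE is enough once all backends are
         * threads of one process: the lock no longer has to live in
         * process-shared memory. */
        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    }

    static void
    increment_counter(volatile int *counter)
    {
        pthread_spin_lock(&lock);
        (*counter)++;                    /* protected shared state */
        pthread_spin_unlock(&lock);
    }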

Why is handling a large number of connections important?
It allows applications to access Postgres directly, without pgbouncer or
any other external connection pooling tool (poolers in transaction mode
typically cannot be used with session-level prepared statements).
An application can then use prepared statements, which can almost halve
the execution time of simple queries.
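
For reference, a minimal libpq sketch of the prepared-statement pattern
that transaction-level pooling usually rules out (the connection string
and statement name are placeholders, and error handling is omitted):

    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres"); /* placeholder conninfo */
        const char *paramValues[1] = {"1"};

        /* Parse and plan once per session... */
        PQclear(PQprepare(conn, "get_abalance",
                          "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                          1, NULL));

        /* ...then execute many times, skipping parse/plan overhead:
         * this is what roughly halves simple-query time. */
        for (int i = 0; i < 1000; i++)
            PQclear(PQexecPrepared(conn, "get_abalance",
                                   1, paramValues, NULL, NULL, 0));

        PQfinish(conn);
        return 0;
    }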

Unfortunately, Postgres sessions are not lightweight. Each backend
maintains its private catalog cache, relation cache, prepared statement
cache, ...
For a real database these caches can occupy several megabytes of memory
per backend, and warming them can take a significant amount of time.
So if we really want to support a large number of connections, we should
rewrite these caches to be global (shared).
This would save a lot of memory but add synchronization overhead.
Also, on NUMA systems private caches may be more efficient than one
global cache.
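
To make that trade-off concrete, here is an entirely hypothetical sketch
(not from the prototype) of a shared cache lookup guarded by a
reader-writer lock; the lock acquisition on every access is the
synchronization overhead mentioned above, and on NUMA machines the
shared cache lines add remote-memory traffic:

    #include <pthread.h>
    #include <string.h>

    #define CACHE_SIZE 1024

    typedef struct CacheEntry
    {
        unsigned int key;
        int          valid;
        char         payload[64];
    } CacheEntry;

    static CacheEntry       cache[CACHE_SIZE];
    static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Readers can proceed in parallel, but every backend now pays
     * for the lock on each cache access. */
    static int
    cache_lookup(unsigned int key, char *out, size_t outlen)
    {
        CacheEntry *e = &cache[key % CACHE_SIZE];
        int         found = 0;

        pthread_rwlock_rdlock(&cache_lock);
        if (e->valid && e->key == key)
        {
            strncpy(out, e->payload, outlen);
            found = 1;
        }
        pthread_rwlock_unlock(&cache_lock);
        return found;
    }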

My prototype can be found at:
git://github.com/postgrespro/postgresql.pthreads.git


-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


