Re: Built-in connection pooling

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: Built-in connection pooling
Дата
Msg-id a1fdf4eb-4ee0-b35e-34e9-36fadfcbceff@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Built-in connection pooling  (Bruce Momjian <bruce@momjian.us>)
Ответы Re: Built-in connection pooling
Список pgsql-hackers


On 28.01.2018 03:40, Bruce Momjian wrote:
On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:
Yes, external connection pooling is more flexible. It allows to 
perform pooling either at client side either at server side (or even 
combine two approaches).>
Also external connection pooling for PostgreSQL is not limited by 
pgbouncer/pgpool.>
There are many frameworks maintaining their own connection pool, for 
example J2EE, jboss, hibernate,...>
I have a filling than about 70% of enterprise systems working with 
databases are written in Java and doing connection pooling in their 
own way.>
True, but that does not really mean we don't need "our" connection
pooling (built-in or not). The connection pools are usually built into
the application servers, so each application server has their own
independent pool. With larger deployments (a couple of application
servers) that quickly causes problems with max_connections.
I found this thread and the pthread thread very interesting.  Konstantin,
thank you for writing prototypes and giving us very useful benchmarks
for ideas I thought I might never see.

As much as I would like to move forward with coding, I would like to
back up and understand where we need to go with these ideas.

First, it looks like pthreads and a builtin pooler help mostly with
1000+ connections.  It seems like you found that pthreads wasn't
sufficient and the builtin pooler was better.  Is that correct?

Brief answer is yes.
Pthreads allows to minimize per-connection overhead and make it possible to obtain better results for large number of connections.
But there is a principle problem: Postgres connection is "heave weight"  object: each connection maintains it own private cache of catalog, relations, temporary
table pages, prepared statements,... So even through pthreads allows to minimize per-connection memory usage, it is negligible comparing with all this connection
private memory resources.  It means that we still need to use connection pooling.

Pthreads provides two main advantages:
1. Simplify interaction between different workers: on need to use shared memory with it's fixed size limitation and
impossibility to use normal pointer for dynamic shared memory. Also no need to implement specialized memory allocator for shared memory.
It makes implementation of parallel query execution and built-on connection pooling much easier.
2. Optimize virtual-to-physical address translation. There is no need to maintain separate address space for each backend, so TLB
(translation lookaside buffercan) becomes more efficient.

So it is not completely correct to consider session pooling as alternative to pthreads.
Ideally this two approaches should be combined.


Is there anything we can do differently about allowing long-idle
connections to reduce their resource usage, e.g. free their caches? 
Remove from PGPROC?  Could we do it conditionally, e.g. only sessions
that don't have open transactions or cursors?
I think that the best approach is to switch to global (shared) caches for execution plans, catalog,...
Most of the time this metadata caches are used to be identical for all clients. So it is just waste of memory and time to maintain them separately in each backend.
Certainly shared cached requires some synchronization when can be a point of contention and cause significant degrade of performance.
But taking in account that metadata is updated much rarely than data, I hope using copy-on-write and atomic operations can help to solve this problems.
And in can give a lot of different advantages. For example it will be possible to spend more time in optimizer for detecting optimal execution plan and store manually plans for
future use.

It feels like user and db mismatches are always going to cause pooling
problems.  Could we actually exit and restart connections that have
default session state?

Well, combining multiuser access and connection pooling is really a challenged problem.
I do not know the best solution for it now. It will be much simpler to find solution with pthreads model...

Most of enterprise systems are using pgbouncer or similar connection pooler. In pgbouncer in statement/transaction pooling mode access to the database is performed under the same user. So it means that many existed statements are built in the assumption that database is accessed in this manner.

Concerning "default session state": one of the main drawbacks of pgbouncer and other external poolers is that them do not allow to use prepared statements.
And it leads to up to two times performance penalty on typical OLTP queries. One of the main ideads of built-on session pooling was to eliminate such limitation.


Right now, if you hit max_connections, we start rejecting new
connections.  Would it make sense to allow an option to exit idle
connections when this happens so new users can connect?
It will require changes in client applications, will not it? Them should be ready that connection can be dropped by server at any moment of time.
I do not know it is possible to drop idle connection and hide this fact from the client. In my implementation each session keeps minimal necessary information requires for interaction with client (session context).  It includes socket, struct Port and session memory context which should be used instead of TopMemoryContext for session specific data.


I know we have relied on external connection poolers to solve all the
high connection problems but it seems there might be simple things we
can do to improve matters.  FYI, I did write a blog entry comparing
external and internal connection poolers:
https://momjian.us/main/blogs/pgblog/2017.html#April_21_2017


I completely agree with your arguments in this post.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: PATCH: pgbench - option to build using ppoll() for largerconnection counts