Re: Built-in connection pooling

Поиск

Список

Период

Сортировка

От	Ryan Pedela
Тема	Re: Built-in connection pooling
Дата	10 февраля 2018 г. 02:41:31
Msg-id	CACu89FSasoBzL7vaCq7nxDUB=DDzoXfBy_AXq6DgTojhbAsuzA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Built-in connection pooling (Shay Rojansky <roji@roji.org>)
Список	pgsql-hackers

Дерево обсуждения

On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky <roji@roji.org> wrote:

Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd like to go back to the basic on this.

Unless I'm mistaken, at least in the Java and .NET world, clients are almost always expected to have their own connection pooling, either implemented inside the driver (ADO.NET model) or as a separate modular component (JDBC). This approach has a few performance advantages:

1. "Opening" a new pooled connection is virtually free - no TCP connection needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of synchronization).
2. Important client state can be associated to physical connections. For example, prepared statements can be tracked on the physical connection, and persisted when the connection is returned to the pool. The next time the physical connection is returned from the pool, if the user tries to server-prepare a statement, we can check on the connection if it has already been prepared in a "previous lifetime", and if so, no need to prepare again. This is vital for scenarios with short-lived (pooled) connections, such as web. Npgsql does this.

Regarding the problem of idle connections being kept open by clients, I'd argue it's a client-side problem. If the client is using a connection pool, the pool should be configurable to close idle connections after a certain time (I think this is relatively standard behavior). If the client isn't using a pool, it seems to be the application's responsibility to release connections when they're no longer needed.

The one drawback is that the pooling is application-specific, so it can't be shared by multiple applications/hosts. So in some scenarios it may make sense to use both client pooling and proxy/server pooling.

To sum it up, I would argue that connection pooling should first and foremost be considered as a client feature, rather than a proxy feature (pgpool) or server feature (the PostgreSQL pooling being discussed here). This isn't to say server-side pooling has no value though.

Recently, I did a large amount of parallel data processing where the results were stored in PG. I had about 1000 workers each with their own PG connection. As you pointed out, application pooling doesn't make sense in this scenario. I tried pgpool and pgbouncer, and both ended up as the bottleneck. Overall throughput was not great but it was highest without a pooler. That aligns with Konstantin's benchmarks too. As far as I know, server pooling is the only solution to increase throughput, without upgrading hardware, for this use case.

I hope this PR gets accepted!

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Built-in connection pooling