Re: One process per session lack of sharing

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: One process per session lack of sharing
Дата
Msg-id CAMsr+YH9tGpknN5H-T=9=vK8yP-M0jTrB_a5BQj6GeY2K2pTaA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: One process per session lack of sharing  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: One process per session lack of sharing  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Список pgsql-hackers
On 14 July 2016 at 03:59, Robert Haas <robertmhaas@gmail.com> wrote:
 
I agree that there's not really a plan to implement this, but I don't
agree that connection pooling solves the whole problem.  Most people
can't get by with statement pooling, so in practice you are looking at
transaction pooling or session pooling.  And that means that you can't
really keep the pool size as small as you'd like because backends can
be idle in transaction for long enough to force the pool size to be
pretty large.  Also, pooling causes the same backends to get reused
for different sessions which touch different relations and different
functions so that, for example, the relcache and the PL/pgsql function
caches grow until every one of those sessions has everything cached
that any client needs.  That can cause big problems.

So, I actually think it would be a good idea to think about this.

I agree. It's been on my mind for a while, but I've been assuming it's likely to involve such architectural upheaval as to be impractical.

Right now PostgreSQL conflates "user session state" and "execution engine" into one process. This means we need an external connection pooler to handle things if we want more user connections than we can efficiently handle in terms of number of executors. Current poolers don't do much to keep track of user state, they just arbitrate access to executors and expect applications to re-establish any needed state (SET vars, LISTEN, etc) or not use features that require persistence across the current pooling level.

This leaves users in the hard position of using very high, inefficient max_connections values to keep track of application<->DB state or jump through awkward hoops to use transaction pooling, either at the application level (Java appserver pools, etc) or through a proxy. 

If using the high max_connections approach the user must also ensure that they don't have all those max_connections actually doing work at the same time using some kind of external coordination. Otherwise they'll thrash the server and face out of memory issues (especially with our rather simplistic work_mem management, etc) and poor performance.

The solution, to me, is to separate "user state" and "executor". Sounds nice, but we use global variables _everywhere_ and it's assumed throughout the code that we have one user session for the life of a backend, though with some exceptions for SET SESSION AUTHORIZATION. It's not likely to be fun.

The
problem, of course, is that as long as we allow arbitrary parts of the
code - including extension code - to declare global variables and
store arbitrary stuff in them without any coordination, it's
impossible to imagine hibernating and resuming a session without a
risk of things going severely awry.

Yeah. We'd definitely need a session state management mechanism with save and restore functionality.

There's also stuff like:

* LISTEN
* advistory locking at the session level
* WITH HOLD cursors

that isn't simple to just save and restore. Those are some of the same things that are painful with transaction pooling right now.
 
This was a major issue for
parallel query, but we've solved it, mostly, by designating the things
that rely on global variables as parallel-restricted, and there
actually aren't a ton of those.  So I think it's imaginable that we
can get to a point where we can, at least in some circumstances, let a
backend exit and reconstitute its state at a later time.  It's not an
easy project, but I think it is one we will eventually need to do.

I agree on both points, but I think "not easy" is rather an understatement.

Starting with a narrow scope would help. Save/restore GUCs and the other easy stuff, and disallow sessions that are actively LISTENing, hold advisory locks, have open cursors, etc from being saved and restored.

BTW, I think this would also give us a useful path toward allowing connection poolers to change the active user and re-authenticate on an existing backend. Right now you have to use SET ROLE or SET SESSION AUTHORIZATION (ugh) and can't stop the client you hand the connection to from just RESETing back to the pooler's user and doing whatever it wants.
 
Insisting that the current model is working is just sticking our head
in the sand.  It's mostly working, but there are workloads where it
fails badly - and competing database products survive a number of
scenarios where we just fall on our face.

Yep, and like parallel query it's a long path, but it's one we've got to face sooner or later.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Fix typo in postgres_fdw/deparse.c
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: One process per session lack of sharing