Thread: Separate connection handling from backends

Separate connection handling from backends

From: Jim Nasby

max_connections is a frequent point of contention between users and 
developers. Users want to set it high so they don't have to deal with 
Yet More Software (pgpool or pgBouncer); PG developers freak out because 
backends are pretty heavyweight, there's some very hot code that's 
sensitive to the size of ProcArray, lock contention, etc.

One solution to this would be to segregate connection handling from 
actual backends, somewhere along the lines of separating the main loop 
from the switch() that handles libpq commands. Benefits:

- External connections become very cheap
- Authentication is not an issue (unlike with external poolers)
- This is similar to what's necessary for some of the "Async 
Transaction" scenarios being discussed
- This is somewhat related to parallel query processes, though obviously 
those need a lot of extra shared state
- This could pave the way for transaction-controlling stored procedures 
(though certainly extra work would need to be done)

Downsides:
- Would presumably require at least one new parent process for spawning 
either backends or connection handlers.
- Lots of changes necessary to untangle backend and connection handling 
in all the code above PostgresMain (as well as some of the code in 
PostgresMain itself)
- Need a good way to pass errors back to the connection handler; it 
might be best not to put them in the same "pipe" as query results.
- Similarly, need a way to handle notifications... though maybe those 
don't need a full backend.
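
To make the shape of this more concrete, here's a toy sketch -- for
illustration only: every name is invented, threads stand in for what
would really be processes, and none of this resembles actual PostgreSQL
code. The point is just that connections become cheap objects that only
occupy a "backend" while a statement is running:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N_BACKENDS 2            /* far fewer backends than connections */

typedef struct Work
{
    int         conn_id;        /* which connection submitted this */
    char        stmt[64];       /* statement text to run */
    struct Work *next;
} Work;

static Work *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
static int  shutting_down;

/* "Backend": runs one statement at a time, keeps no per-client state */
static void *
backend_main(void *arg)
{
    long        id = (long) arg;

    for (;;)
    {
        Work       *w;

        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL && !shutting_down)
            pthread_cond_wait(&queue_cond, &queue_lock);
        if (queue_head == NULL)
        {
            pthread_mutex_unlock(&queue_lock);
            return NULL;        /* queue drained and we're shutting down */
        }
        w = queue_head;
        queue_head = w->next;
        pthread_mutex_unlock(&queue_lock);

        printf("backend %ld runs \"%s\" for connection %d\n",
               id, w->stmt, w->conn_id);
        free(w);
        /* this backend is immediately available to any other connection */
    }
}

/* "Connection handler": cheap, one per client; it only queues work */
static void
submit(int conn_id, const char *stmt)
{
    Work       *w = malloc(sizeof(Work));

    w->conn_id = conn_id;
    snprintf(w->stmt, sizeof(w->stmt), "%s", stmt);
    pthread_mutex_lock(&queue_lock);
    w->next = queue_head;
    queue_head = w;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

int
main(void)
{
    pthread_t   backends[N_BACKENDS];

    for (long i = 0; i < N_BACKENDS; i++)
        pthread_create(&backends[i], NULL, backend_main, (void *) i);

    /* idle connections cost nothing; only statements occupy a backend */
    submit(1, "SELECT 1");
    submit(2, "SELECT 2");
    submit(3, "SELECT 3");
    sleep(1);

    pthread_mutex_lock(&queue_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
    for (int i = 0; i < N_BACKENDS; i++)
        pthread_join(backends[i], NULL);
    return 0;
}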

IMHO, the authentication issues alone make this very attractive from a
user standpoint. There's no good way for an external pooler to use
Postgres credentials for authentication, and even if it could, there's
still the problem of the pool safely switching to a non-privileged
user and back again.

To be clear, I won't be able to work on this myself (unless someone 
steps up to sponsor it). But I wanted to put the idea out there because 
of the potential overlap with some of the other features.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: Separate connection handling from backends

From: David Fetter

On Mon, Dec 05, 2016 at 01:48:03PM -0600, Jim Nasby wrote:
> max_connections is a frequent point of contention between users and
> developers. Users want to set it high so they don't have to deal with Yet
> More Software (pgpool or pgBouncer); PG developers freak out because
> backends are pretty heavyweight, there's some very hot code that's sensitive
> to the size of ProcArray, lock contention, etc.
> 
> One solution to this would be to segregate connection handling from actual
> backends, somewhere along the lines of separating the main loop from the
> switch() that handles libpq commands. Benefits:

[interesting stuff elided]

What do you see as the relationship between this proposal and the
earlier one for admission control?

https://www.postgresql.org/message-id/4B38C1C5020000250002D9A5@gw.wicourts.gov

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Separate connection handling from backends

From: Jim Nasby

On 12/5/16 2:14 PM, David Fetter wrote:
>> One solution to this would be to segregate connection handling from actual
>> backends, somewhere along the lines of separating the main loop from the
>> switch() that handles libpq commands. Benefits:
> [interesting stuff elided]
>
> What do you see as the relationship between this proposal and the
> earlier one for admission control?
>
> https://www.postgresql.org/message-id/4B38C1C5020000250002D9A5@gw.wicourts.gov

Without having read the paper referenced in that email or the rest of
the thread...

I think my proposal would completely eliminate the need for what Kevin 
proposed as long as the "connection" layer released the backend that it 
was using as soon as possible (namely, as soon as the backend was no 
longer in a transaction). This does assume that the connection layer is 
keeping a copy of all user/session settable GUCs. I don't think we need 
that ability in the first pass, but it would be very high on the desired 
feature list (because it would allow "transaction-level" pooling).
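
Roughly, the per-client state the connection layer would have to keep
looks something like this (a toy sketch, every name invented; it just
prints the SQL it would replay onto whatever backend it grabs next):

#include <stdio.h>

#define MAX_GUCS 32

/* Per-client state the connection layer keeps across backend switches */
typedef struct SessionState
{
    char        role[64];                   /* session authorization */
    int         nguc;
    char        guc_name[MAX_GUCS][64];     /* SETs issued by the client */
    char        guc_value[MAX_GUCS][64];
} SessionState;

/* Record a SET the client issued so it survives losing the backend */
static void
session_remember(SessionState *s, const char *name, const char *value)
{
    if (s->nguc < MAX_GUCS)
    {
        snprintf(s->guc_name[s->nguc], 64, "%s", name);
        snprintf(s->guc_value[s->nguc], 64, "%s", value);
        s->nguc++;
    }
}

/* Replay the session onto whichever backend we just grabbed */
static void
session_replay(const SessionState *s)
{
    printf("SET SESSION AUTHORIZATION %s;\n", s->role);
    for (int i = 0; i < s->nguc; i++)
        printf("SET %s = '%s';\n", s->guc_name[i], s->guc_value[i]);
}

int
main(void)
{
    SessionState s = {.role = "app_user"};

    session_remember(&s, "work_mem", "64MB");
    session_remember(&s, "search_path", "app, public");
    session_replay(&s);     /* done when attaching a backend for a new xact */
    return 0;
}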

Actually, we could potentially do one better... if a backend sat idle in 
transaction for long enough, we could "save" that transaction state and 
free up the backend to do something else. I'm thinking this would be 
similar to a prepared transaction, but presumably there'd be some 
differences to allow for picking the transaction back up.

One big difference from what Kevin described, though: I don't think it
makes sense for the connection layer to be able to parse queries. I
suspect it would take a very large amount of work to allow something
that's not a full-blown backend to parse, because parsing needs access
to the catalogs. *Maybe* it'd be possible if we used a method other
than the ProcArray to register the snapshot that parsing would require,
but you'd still have to duplicate all the relcache stuff.

BTW, it just occurred to me that having this separation would make it 
relatively easy to support re-directing DML queries from a replica to 
the master; if the backend throws the error indicating you tried to 
write data, the connection layer could re-route that.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: Separate connection handling from backends

From: Kevin Grittner

On Mon, Dec 5, 2016 at 6:54 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 12/5/16 2:14 PM, David Fetter wrote:

>> What do you see as the relationship between this proposal and the
>> earlier one for admission control?
>>
>> https://www.postgresql.org/message-id/4B38C1C5020000250002D9A5@gw.wicourts.gov
>
> Without having read the paper reference in that email or the rest of the
> thread...

> One big difference from what Kevin describe though: I don't think it makes
> sense for the connection layer to be able to parse queries. I suspect it
> would take a very large amount of work to allow something that's not a
> full-blown backend to parse, because it needs access to the catalogs.
> *Maybe* it'd be possible if we used a method other than ProcArray to
> register the snapshot that required, but you'd still have to duplicate all
> the relcache stuff.

I don't recall ever, on the referenced thread or any other,
suggesting what you describe.  Basically, I was suggesting that we
create a number of hooks which an admission control policy (ACP)
could tie into, and we could create pluggable ACPs.  One ACP that I
think would be useful would be one that ties into a hook placed at
the point(s) where a transaction is attempting to acquire its first
"contentious resource" -- which would include at least snapshots and
locks.  If the user was a superuser it would allow the transaction
to proceed; otherwise it would check whether the number of
transactions which were holding contentious resources had reached
some (configurable) limit.  If allowing the transaction to proceed
would put it over the limit, the transaction would be blocked and
put on a queue behind any other transactions which had already been
blocked for this reason, and a transaction from the queue would be
unblocked whenever the count of transactions holding contentious
resources fell below the threshold.

I don't see where parsing even enters into this.
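
For illustration, the policy above boils down to a counting gate of
roughly this shape (toy code: the names are invented, it ignores the
strict FIFO ordering described above, and a real ACP would use
PostgreSQL's own cross-process primitives rather than pthreads):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static int  active_xacts;           /* xacts holding contentious resources */
static int  acp_limit = 50;         /* the configurable limit */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gate_wait = PTHREAD_COND_INITIALIZER;

/* Hook: called when a transaction tries to acquire its first
 * contentious resource (snapshot, lock, ...). */
static void
acp_admit(bool is_superuser)
{
    if (is_superuser)
        return;                     /* superusers always proceed */
    pthread_mutex_lock(&gate_lock);
    while (active_xacts >= acp_limit)
        pthread_cond_wait(&gate_wait, &gate_lock);  /* block, queued */
    active_xacts++;
    pthread_mutex_unlock(&gate_lock);
}

/* Hook: called at commit/abort, once contentious resources are released. */
static void
acp_release(void)
{
    pthread_mutex_lock(&gate_lock);
    active_xacts--;
    pthread_cond_signal(&gate_wait);    /* unblock one waiting transaction */
    pthread_mutex_unlock(&gate_lock);
}

int
main(void)
{
    acp_admit(false);                   /* transaction starts real work */
    printf("admitted; %d active\n", active_xacts);
    acp_release();                      /* transaction commits */
    return 0;
}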

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Separate connection handling from backends

From: Adam Brusselback

> BTW, it just occurred to me that having this separation would make it
> relatively easy to support re-directing DML queries from a replica to
> the master; if the backend throws the error indicating you tried to
> write data, the connection layer could re-route that.

This also sounds like it would potentially allow re-routing the other
way: where you know the replica contains up-to-date data, couldn't you
re-direct read-only queries to your replicas?

Re: Separate connection handling from backends

From: Greg Stark

On 5 December 2016 at 19:48, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> One solution to this would be to segregate connection handling from actual
> backends, somewhere along the lines of separating the main loop from the
> switch() that handles libpq commands. Benefits:

I'm kind of mystified how a simple code restructuring could solve the
fundamental problems with a large number of backends. It sounds like
what you're describing would just push the problem around: you would
end up with some other maximum instead (max_backends,
max_active_backends, or something like that) with the same problems.
At best it would help people who have connection pooling or only a few
connections active at any given time.

Heikki's work with CSN would actually address the main fundamental
problem. Instead of having to scan PGPROC, taking a snapshot would be
O(1). There might need to be scans of the list of active transactions,
but never of all connections, whether they're in a transaction or not.

-- 
greg



Re: Separate connection handling from backends

From: Tom Lane

Greg Stark <stark@mit.edu> writes:
> On 5 December 2016 at 19:48, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> One solution to this would be to segregate connection handling from actual
>> backends, somewhere along the lines of separating the main loop from the
>> switch() that handles libpq commands. Benefits:

> I'm kind of mystified how a simple code restructuring could solve the
> fundamental problems with a large number of backends. It sounds like
> what you're describing would just push the problem around, you would
> end up with some other maximum instead, max_backends, or
> max_active_backends, or something like that with the same problems.

What it sounds like to me is building a connection pooler into the
backend.  I'm not really convinced we ought to go there.

> Heikki's work with CSN would actually address the main fundamental
> problem. Instead of having to scan PGPROC when taking a snapshot
> taking a snapshot would be O(1).

While that would certainly improve matters, I suspect there are still
going to be bottlenecks arising from too many backends.
        regards, tom lane



Re: Separate connection handling from backends

From: Jim Nasby

On 12/6/16 1:46 PM, Adam Brusselback wrote:
>     BTW, it just occurred to me that having this separation would make
>     it relatively easy to support re-directing DML queries from a
>     replica to the master; if the backend throws the error indicating
>     you tried to write data, the connection layer could re-route that.
>
>
> This also sounds like it would potentially allow re-routing the other
> way where you know the replica contains up-to-date data, couldn't you
> potentially re-direct read only queries to your replicas?

That's a lot more complicated, so I don't see that happening anytime soon.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: Separate connection handling from backends

From: Craig Ringer

On 7 December 2016 at 10:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> What it sounds like to me is building a connection pooler into the
> backend.  I'm not really convinced we ought to go there.

If we do, it probably needs to be able to offer things that
out-of-tree ones can't.

The main things I see that you can't do sensibly with an out-of-tree pooler are:

* Re-use a backend for different session users. You can SET SESSION
AUTHORIZATION, but once you hand the connection off to the client they
can just do it again or RESET SESSION AUTHORIZATION and whammo,
they're a superuser. Same issue applies for SET ROLE and RESET ROLE.

* Cope with session-level state when transaction pooling. We probably
can't do anything much about WITH HOLD cursors, advisory locks, etc,
but we could save and restore GUC state and a few other things, and we
could detect whether or not we can save and restore state so we could
switch transparently between session and transaction pooling.

* Know, conclusively, whether a query is safe to reroute to a
read-only standby, without hard coded lists of allowed functions, iffy
SQL parsers, etc. Or conversely, transparently re-route queries from
standbys to a read/write master.

In other words, we could start to separate session state from executor
state in a limited manner. That'd definitely be valuable, IMO; it's a
real shame that Pg's architecture so closely couples the two.

So - is just doing "PgInCoreBouncer" a good idea? No, I don't think
so. But there are potentially good things to be done in the area.

What I don't see here is a patch, or a vague proposal for a patch, so
I'm not sure how this can go past the hot-air stage.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Separate connection handling from backends

From: Jim Nasby

On 12/6/16 6:19 PM, Tom Lane wrote:
>> I'm kind of mystified how a simple code restructuring could solve the
>> fundamental problems with a large number of backends. It sounds like
>> what you're describing would just push the problem around, you would
>> end up with some other maximum instead, max_backends, or
>> max_active_backends, or something like that with the same problems.
> What it sounds like to me is building a connection pooler into the
> backend.  I'm not really convinced we ought to go there.

The way I'm picturing it, backends would no longer be directly tied to
connections. The code that directly handles connections would grab an
available backend when a statement actually came in (and certainly it'd
need to worry about transactions and session GUCs).

So in a way it's like a pooler, except it'd be able to do things that 
poolers simply can't (like safely switch the user the backend is using).

I think there might be other uses as well, since there are several
other places where we need something that's kind of like a backend.
But if Heikki's work radically shifts the expense of running many
thousands of backends, then it's probably not worth doing.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: Separate connection handling from backends

From: Jim Nasby

On 12/6/16 10:34 PM, Craig Ringer wrote:
> In other words, we could start to separate session state from executor
> state in a limited manner. That'd definitely be valuable, IMO; it's a
> real shame that Pg's architecture so closely couples the two.
>
> So - is just doing "PgInCoreBouncer" a good idea? No, I don't think
> so. But there are potentially good things to be done in the area.

Right.

> What I don't see here is a patch, or a vague proposal for a patch, so
> I'm not sure how this can go past the hot-air stage.

Yeah, I brought it up because I think there's potential tie-in with 
other things that have been discussed (notably async transactions, but 
maybe BG workers and parallel query could benefit too). Maybe it would 
make sense as part of one of those efforts.

Though, this is something that's asked about often enough that it'd 
probably be possible to round up a few companies to fund it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: Separate connection handling from backends

From: Kevin Grittner

On Wed, Dec 7, 2016 at 12:36 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> The way I'm picturing it backends would no longer be directly
> tied to connections. The code that directly handles connections
> would grab an available backend when a statement actually came in
> (and certainly it'd need to worry about transactions and session
> GUCs).

If we're going to consider that, I think we should consider going
all the way to the technique used by many (most?) database
products, which is to have a configurable number of "engines" that
pull work requests from queues.  We might have one queue for disk
writes, one for disk reads, one for network writes, etc.
Traditionally, each engine spins over attempts to read from the
queues until it finds a request to process; blocking only if
several passes over all queues come up empty.  It is often possible
to bind each engine to a particular core.  Current process-local
state would be passed around, attached to queued requests, in a
structure associated with the connection.

I don't know how that execution model would compare to what we use
now in terms of performance, but its popularity makes it hard to
ignore as something to consider.
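
For anyone not familiar with that model, here is a toy sketch of the
control flow (invented names; a real implementation would bind each
engine to a core and block on the queues rather than napping):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define N_QUEUES      3     /* e.g. disk reads, disk writes, network writes */
#define QUEUE_DEPTH  16
#define EMPTY_PASSES 1000   /* spin this many empty passes before napping */

/* A work request; in the full model it carries the connection's state */
typedef struct
{
    int         conn_id;
    int         kind;
} Request;

static Request queues[N_QUEUES][QUEUE_DEPTH];
static int  q_len[N_QUEUES];
static int  draining;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;

/* One engine; the full model runs one per core, bound to that core */
static void *
engine_main(void *arg)
{
    int         passes = 0;

    (void) arg;
    for (;;)
    {
        Request     r;
        int         found = 0;
        int         quit = 0;

        pthread_mutex_lock(&q_lock);
        for (int q = 0; q < N_QUEUES && !found; q++)
        {
            if (q_len[q] > 0)
            {
                r = queues[q][--q_len[q]];
                found = 1;
            }
        }
        if (!found && draining)
            quit = 1;
        pthread_mutex_unlock(&q_lock);

        if (quit)
            return NULL;
        if (found)
        {
            passes = 0;
            printf("engine: request kind %d for connection %d\n",
                   r.kind, r.conn_id);
        }
        else if (++passes >= EMPTY_PASSES)
        {
            passes = 0;
            usleep(1000);   /* stand-in for blocking until work arrives */
        }
    }
}

int
main(void)
{
    pthread_t   engine;

    pthread_create(&engine, NULL, engine_main, NULL);

    pthread_mutex_lock(&q_lock);
    queues[0][q_len[0]++] = (Request) {.conn_id = 1, .kind = 0};
    queues[2][q_len[2]++] = (Request) {.conn_id = 2, .kind = 2};
    draining = 1;           /* process what's queued, then exit */
    pthread_mutex_unlock(&q_lock);

    pthread_join(engine, NULL);
    return 0;
}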

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Separate connection handling from backends

From: Craig Ringer

On 7 December 2016 at 22:27, Kevin Grittner <kgrittn@gmail.com> wrote:

> I don't know how that execution model would compare to what we use
> now in terms of performance, but its popularity makes it hard to
> ignore as something to consider.

Those engines also tend to be threaded. They can stash state in memory
and hand it around between executors in ways we cannot really do.

I'd love to see a full separation of executor from session in
postgres, but I can't see how it could be at all practical. The use of
globals for state and the assumption that session == backend is baked
in way too deep.

At least, I think it'd be a slow and difficult thing to change, and
would need many steps. Something like what was proposed upthread would
possibly make sense as a first step.

But again, I don't see anyone who's likely to actually do it.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services