Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

Поиск
Список
Период
Сортировка
От Todd Hubers
Тема Re: Feature Proposal: Connection Pool Optimization - Change the Connection User
Дата
Msg-id CABO3BC2XW24yLGAsBNQzEbJUx7JbSojWAUKUmKs79A3YOPa-tQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Feature Proposal: Connection Pool Optimization - Change the Connection User  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Feature Proposal: Connection Pool Optimization - Change the Connection User  (Daniel Gustafsson <daniel@yesql.se>)
Список pgsql-hackers
Hi Tom, Justin, and Andrey,

Thanks everybody for your feedback so far! I agree, there are a few unknowns for the design and impact and there are many details to iron out.

Benchmarking - Overall I think it's best to explore improvements with benchmarking. The key goal of this proposal pertains to performance: "fast". That means benchmarking is essential in the design phase, to ensure there are measurable improvements, and reward for effort. This should also primarily influence the selection of Solution Option (of which there are 6 or 7 listed). I have added the proposed benchmarks to the document.

1) Andrey said: "And these expenses would be not necessary if we could just send a new Startup message after the Terminate ('X') message." - Answer: I have added the "Terminate/Restartup" benchmark in the document accordingly.

2) Justin said: "However, this doesn't attempt to address the similar issue if there's hundreds or thousands of DBs.  Which I don't think could work at all, since switching to a new DB requires connecting to a new backend process."

Good point. I do have some loose thinking around that. I do have some comments throughout the proposal document to tackle that in the future. As you point out, user switching seems to be a simpler well-trodden path. For now, my proposal is that the middleware can assert a database change, but that should create an error for now. In the future, it might be a supported capability.

I have also added database-switching as a consideration for the benchmarking.

3) Justin said: "You'd need to make sure that a client connected to the connection pooler cannot itself run the PQ "set role".  The connection pooler would need to reject the request (or maybe ignore requests to switch to the same/current user)." - Answer: Agreed.

4) Justin said: "Maybe you'd have two protocol requests "begin switch user" and "end switch to user", and then the server-side could enforce that "begin switch" is not nested. Maybe the "begin switch" could return some kind of "nonce" to the connection pooler, and "end switch" would require the same nonce to be validated."

Agreed, something like this should be suitable. For performance, I would prefer to not use a Nonce because that would not be a 0-RTT approach. The PQ level ImpersonateDatabaseUser is effectively the "begin". I don't think an "end" is necessary, because it is expected to switch the user very often. The next "begin" is effectively ending the last Impersonation. The middleware could switch back to its own username, but as per my draft proposal, that user should only be allowed (as a connection bound authorisation for ImpersonateDatabaseUser) to impersonate and nothing else.

And then there are yet 5 other solution options which have an explicit Begin/End pattern.

5) Justin said: "It'd be interesting to hear if you've tested with postgresql 14, which improves scalability to larger number of connections."

Agreed. I will include a baseline in the benchmarks. Regardless of improvements, it won't be infinite and there will still be a need to "enable" pooling where there are MAX_CONNECTIONS users using the system.

6) Tom said: "It's not really apparent to me how this mechanism wouldn't be a giant security hole."

I have added your important concerns to a Security Considerations section of the document:
  • "What is the difference between granting the proposed "impersonate" privilege and granting superuser?"
  • "Is this merely transferring the entire responsibility for user authentication onto the middleware, with the server expected to just believe that the session should now be allowed to act as user X?"
7) Tom said: "I think by the time you got to something where "impersonate" wasn't a security hole, it would be pretty much equivalent to SET ROLE"

I don't think so. This is already contemplated in the draft. See Solution Option 4. SET ROLE comes with RESET ROLE. The frontend client could call RESET ROLE then change to whatever role they like. That means that feature is not currently suitable to be used for the context of this proposal.

8) Tom said: "Another issue is that the proposed implementation seems pretty far short of being an invisible substitute for actually logging in as the target user"

The goal of this proposal is performance and to enable shared connection pooling. Direct logging in doesn't allow the reuse of existing connections in the pool.

9) Tom said: "For starters, what of instantiating ALTER ROLE SET parameter values that apply to the target user, and getting rid of such settings that applied to the original user ID?"

I think you are suggesting to modify the already-logged-in user. That's interesting. However, many systems audit the logged in user by username - who they are. Furthermore, the modification of user privileges would not help to enable connection pooling as rehashed in my answer to [8].

10) Tom said: "How would this interact with the "on login" triggers that people keep asking for?

That's a good point. I would imagine that SET ROLE (which is currently unsuitable) would have the same requirement. The answer is Shared Functions. SET ROLE calls a function like "SetSessionUserId". Our implementation should call the same function(s). If OnLogin functionality is implemented they should trigger from there.

11) Tom said: "Also, you'd still have to do DISCARD ALL (at least) when switching users, or else integrate all that cache-flushing right into the switching mechanism.  So I'm not entirely convinced about how big the performance benefits are compared to a fresh session."

Agreed, nor am I convinced. We should only be guided by benchmarks and tests, not subjective assumptions. See the Benchmarking section in the document for details.

12 Tom said: "...[Upon failure, no further commands will be processed until ImpersonateDatabaseUser succeeds.] seems to require adding a huge amount of complication on the server side, and complication in the protocol spec itself, to save a rather minimal amount of complication in the middleware.  Why can't we just say that a failed "impersonate" command leaves the session in the same state as before, and it's up to the pooler to do something about it?  We are in any case trusting the pooler not to send commands from user A to a session logged in as user B. We are in any case trusting the pooler not to send commands from user A to a session logged in as user B."

I think you are overstating the complexity. It only requires a LastUserSwitchFailed boolean which is cleared to false when a UserSwitch succeeds. If LastUserSwitchFailed is true, tcop ignores the messages and sends back errors. This detail has been added to the proposal document.

It's important that the implementation is objectively faster. The 0-RTT design is proposed for efficiency. The Middleware might be able to fit BOTH the UserSwitch AND the Query within a 1500 MTU. If not, it shouldn't wait for a confirmation - for efficiency. The middleware might be on localhost, or it might be 1-5ms away on the LAN. Effectively, the UserSwitch is a sort of "Header" before a series of commands. Performance is the goal. Therefore, the connection cannot be left in the same state as before, or else the pending Query will run in the incorrect context. This is a rare failure mode, so failure is ideal.

Ultimately these are assumptions and benchmarking results should drive decisions around the implementation of every aspect.

13. Tom said: "I wonder how we test such a feature meaningfully without incorporating a pooler right into the Postgres tree."

We can benchmark without a pooler - see the Benchmark section for details. (Furthermore, I propose that general benchmark tooling does belong in Postgres for the benefit of the ecosystem of connection poolers. I have included such a remark in the Benchmarking section "PostgreSQL is not planning to incorporate Connection Pooling...".)

Thanks again everyone for the tough questions and the ideas!

Regards,

Todd

On Sun, 21 Nov 2021 at 08:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sun, Nov 21, 2021 at 03:11:03AM +1100, Todd Hubers wrote:
>> - Google Document with Commenting turned on
>> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing.

> You proposed a PQ protocol version of SET ROLE/SET SESSION authorization.
> You'd need to make sure that a client connected to the connection pooler cannot
> itself run the PQ "set role".

It's not really apparent to me how this mechanism wouldn't be a giant
security hole.  In particular, I see no meaningful difference between
granting the proposed "impersonate" privilege and granting superuser.
You could restrict it to not allow impersonating superusers, which'd
make it a little better, but what of all the predefined roles we keep
inventing that have privileges we don't want to be accessible to Joe
User?  I think by the time you got to something where "impersonate"
wasn't a security hole, it would be pretty much equivalent to SET ROLE,
ie you could only impersonate users you'd been specifically given the
privlege for.

It also seems like this is transferring the entire responsibility for
user authentication onto the middleware, with the server expected to
just believe that the session should now be allowed to act as user X.
Again, that seems more like a security problem than a good design.

Another issue is that the proposed implementation seems pretty far
short of being an invisible substitute for actually logging in as
the target user.  For starters, what of instantiating ALTER ROLE SET
parameter values that apply to the target user, and getting rid of
such settings that applied to the original user ID?  How would this
interact with the "on login" triggers that people keep asking for?

Also, you'd still have to do DISCARD ALL (at least) when switching
users, or else integrate all that cache-flushing right into the
switching mechanism.  So I'm not entirely convinced about how big
the performance benefits are compared to a fresh session.

One more point is that the proposed business about

* ImpersonateDatabaseUser will either succeed silently (0-RTT), or
  fail. Upon failure, no further commands will be processed until
  ImpersonateDatabaseUser succeeds.

seems to require adding a huge amount of complication on the server side,
and complication in the protocol spec itself, to save a rather minimal
amount of complication in the middleware.  Why can't we just say that
a failed "impersonate" command leaves the session in the same state
as before, and it's up to the pooler to do something about it?  We are
in any case trusting the pooler not to send commands from user A to
a session logged in as user B.

                        regards, tom lane

PS: I wonder how we test such a feature meaningfully without
incorporating a pooler right into the Postgres tree.  I don't
want to do that, for sure.


--
--
Todd Hubers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Paul A Jungwirth
Дата:
Сообщение: Re: SQL:2011 application time
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir