Re: Re: [pgjdbc] XADataSource support for resource sharing & interleaving. (#47)

From Bryan Varner
Subject Re: Re: [pgjdbc] XADataSource support for resource sharing & interleaving. (#47)
Msg-id 514392D4.2030702@polarislabs.com
In reply to Re: Re: [pgjdbc] XADataSource support for resource sharing & interleaving. (#47)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Re: [pgjdbc] XADataSource support for resource sharing & interleaving. (#47)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-jdbc
 > I was thinking of
> a wrapper that calls the native XADataSource implementation, which
> doesn't need to support suspend/resume and interleaving, and presents a
> fully-compliant XADataSource implementation to the outside, using the
> technique you used.

To me, if you're trying to solve a problem of an insufficient or
non-functioning XA implementation, you don't do that by relying upon
said insufficient XA implementation.

The way the code in the POLARIS_XA branch works would have been a pain
to wrap around the existing PGXAConnection, and wrapping the existing
pgjdbc implementation would have required _more_ physical connections to
handle interleaving. IIRC, the existing pgjdbc implementation
did not allow a connection to switch global transactions after
prepare but before commit or rollback. GlassFish hits this use case
frequently: as soon as a connection is done with an Xid, it's
aggressively reused for another Xid, and an XAResource returning true for
isSameRM is used to commit / rollback the prepared TX.
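
To make that call sequence concrete, here is a minimal sketch of it against
the standard javax.transaction.xa interfaces. DemoXid, RecordingXAResource
and InterleavingSketch are hypothetical names made up for illustration; the
recording resource only logs calls and is not pgjdbc or GlassFish code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical Xid, for illustration only. */
final class DemoXid implements Xid {
    private final byte[] gtrid;
    DemoXid(String name) { this.gtrid = name.getBytes(StandardCharsets.UTF_8); }
    @Override public int getFormatId() { return 1; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return new byte[0]; }
    @Override public String toString() { return new String(gtrid, StandardCharsets.UTF_8); }
}

/** Hypothetical stand-in for a driver's XAResource; it only records calls. */
final class RecordingXAResource implements XAResource {
    private final String name;
    private final List<String> log;
    RecordingXAResource(String name, List<String> log) { this.name = name; this.log = log; }
    @Override public void start(Xid xid, int flags) { log.add(name + ".start(" + xid + ")"); }
    @Override public void end(Xid xid, int flags) { log.add(name + ".end(" + xid + ")"); }
    @Override public int prepare(Xid xid) { log.add(name + ".prepare(" + xid + ")"); return XA_OK; }
    @Override public void commit(Xid xid, boolean onePhase) { log.add(name + ".commit(" + xid + ")"); }
    @Override public void rollback(Xid xid) { log.add(name + ".rollback(" + xid + ")"); }
    @Override public void forget(Xid xid) { }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    // Assumption: every recording resource talks to the same resource manager.
    @Override public boolean isSameRM(XAResource other) { return other instanceof RecordingXAResource; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

final class InterleavingSketch {
    /** Replays the sequence described above and returns the recorded calls. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        XAResource a = new RecordingXAResource("a", log);
        XAResource b = new RecordingXAResource("b", log);
        Xid xid1 = new DemoXid("xid1");
        Xid xid2 = new DemoXid("xid2");
        try {
            a.start(xid1, XAResource.TMNOFLAGS);
            a.end(xid1, XAResource.TMSUCCESS);
            a.prepare(xid1);
            // The connection is reused for a new Xid before xid1 is resolved.
            a.start(xid2, XAResource.TMNOFLAGS);
            // A different XAResource that reports the same RM finishes xid1.
            if (b.isSameRM(a)) {
                b.commit(xid1, false);
            }
            a.end(xid2, XAResource.TMSUCCESS);
            a.commit(xid2, true);
        } catch (XAException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }
}
```

The step to look at is a.start(xid2, ...) arriving while xid1 is still
prepared but unresolved, followed by b.commit(xid1, false) on a different
resource.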

If you were to implement sharing properly atop the old PGXAConnection
that our patch replaces, it would take a lot more physical connections
(PGXAConnections dedicated to xids which have yet to commit / rollback)
than the implementation in our pull request needs.

>> Hibernate has a system for providing dialect-specific SQL, which we'll
>> need to handle prepare() commit() rollback() recover() of 2pc operations
>> on a per-dialect basis.
>
> I didn't understand that part. What does Hibernate have to do with this?

Nothing. I'm pointing out that if I really want to make the wrapper
generic, and not rely upon the idiosyncrasies of faulty XADataSource
implementations, I'll need to account for dialect in the wrapper
-- otherwise I wouldn't need the wrapper, because the XADataSource from
the driver would be sufficient as implemented.

I was merely pointing out that there are other projects that:
  * wrap DataSource or Driver into a ConnectionPoolDataSource
  * implement different RDBMS requirements via specialized Dialects.

Refer to my above comments regarding resource sharing and the impact it
would have if you just wrapped the existing PGXAConnection.

>>> Sure, that would
>>> be slow, so you'd definitely still want to configure the TM to not do
>>> that, but at least it would work.
>>
>> We evaluated what it would take to get pgjdbc to work with GlassFish's
>> TM as is. There are cases in our application where it does SUSPEND. What
>> you are proposing will not fix the missing support for SUSPEND / JOIN.
>
> Did you evaluate what it would take to add the option to Glassfish?

No.

The GlassFish code base is not one I could familiarize myself with in a
reasonable amount of time. Given the choice between running custom
builds of an entire container or just a database driver in a production
environment, I'm going to pick the smallest, easiest to understand, and
most clearly documented piece of code to customize. In this case, that
was clearly pgjdbc.

>> Honestly, I don't understand why all these drivers did such a half-baked
>> job of implementing the spec, or why you're so vocally defending a
>> half-implementation when it took two people bouncing ideas, code, and
>> testing a week to get it workable, and a few additional weeks to
>> stabilize it.
>
> The JTA specification is crap. It imposes requirements on the drivers
> that have nothing to do with the real goal of the spec: supporting
> two-phase commit. It's silly that drivers have to be complicated to
> support those things.

Agreed. Which is why, if I were going to write this as a generic wrapper,
I'd do it against DataSource instead of XADataSource, removing the
obvious weak link: the quality of a driver-specific XA implementation.

There was zero reason they couldn't have added callbacks to the JDBC
interfaces to obtain statements to execute the 2pc invocations in a
driver-specific manner and implemented the rest of the X/Open spec as
.... A Wrapper! I think this pull request proves that to some extent.
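
As a rough sketch of what such a callback could look like: the interface
below is hypothetical (TwoPhaseDialect and PostgresTwoPhaseDialect are names
I made up, not part of JDBC, Hibernate, or our patch). The SQL it returns is
PostgreSQL's own two-phase commit syntax, and the quoting is deliberately
simplistic.

```java
/** Hypothetical per-database callback supplying the SQL for 2pc operations. */
interface TwoPhaseDialect {
    String prepareSql(String gid);
    String commitPreparedSql(String gid);
    String rollbackPreparedSql(String gid);
    String recoverSql();
}

/** PostgreSQL flavour, built on PREPARE TRANSACTION and friends. */
final class PostgresTwoPhaseDialect implements TwoPhaseDialect {
    // Simplistic literal quoting: doubles single quotes, nothing more.
    private static String quote(String gid) {
        return "'" + gid.replace("'", "''") + "'";
    }

    @Override public String prepareSql(String gid) {
        return "PREPARE TRANSACTION " + quote(gid);
    }

    @Override public String commitPreparedSql(String gid) {
        return "COMMIT PREPARED " + quote(gid);
    }

    @Override public String rollbackPreparedSql(String gid) {
        return "ROLLBACK PREPARED " + quote(gid);
    }

    @Override public String recoverSql() {
        // pg_prepared_xacts lists transactions that are prepared but unresolved.
        return "SELECT gid FROM pg_prepared_xacts";
    }
}
```

A generic wrapper would then own the start / end / suspend / resume
bookkeeping and only ask the dialect for these four statements.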

> If you look at the javadocs for XADataSource, it says for
> getXAConnection():
>
>> Attempts to establish a *physical* database connection that can be
>> used in a distributed transaction.
>
> (emphasis mine). It's pretty clear what the intention of the authors is.
> "physical" is a vague term, but you'd expect that typically to be a
> single TCP connection, wouldn't you?

Yes. That's why this implementation aggressively closes those extra
connections, why it allows you to disable the behavior completely, and
why it allows you to set a 'wait for a free connection before opening a
new physical connection' timeout.

> It's not expected that the driver
> needs a multiplexing layer, mapping a single XAConnection to multiple
> physical connections.

It's also not expected that the driver won't work or will lead to data
loss when a Transaction Manager written to spec tries to use the other
methods in the API you chose not to copy / paste for reference. It's not
expected that someone arbitrarily decide they don't like the spec, so
they simply don't implement the functionality at all.

> Now, maybe it works fine with the multiplexing layer, but even if that's
> the way to meet the letter of the JTA spec, I don't think it's in the
> spirit of the spec that you'd have to do that.

I'm pretty certain it wasn't the spirit of the spec to have insufficient
implementations defended by strawman arguments about a bad spec. While I
agree that the spirit of the spec expects a single connection, I think
it's far more dangerous to completely ignore other parts of the
specification on the basis of 'I don't like it', which can lead to people
who try to use your software suffering data loss in production
environments.

We can agree there is no optimal solution here. The JTA spec is bad.

You can say what you want about our approach with multiplexing, but at
least it works consistently, reliably, under load, and hasn't lost data
in a production environment.

That's more than I can say for the implementation in upstream/master.

> Regarding the patch itself:
>
> Does it work correctly if you prepare a PreparedStatement in one
> transaction, suspend, and try to use the prepared statement in another
> transaction? I'm not sure what "correctly" is here, I don't think the
> JTA spec says anything explicit about that scenario (it's crap, after
> all ;-)), but I'd expect that to work.

If it hasn't hit the prepare threshold I'd expect it to work.

If it has hit the prepare threshold, I'd expect it to fail.

This is certainly a consideration that would need to be taken into
account, especially if one were to bother with implementing JDBC3
statement pooling. PostgreSQL (due to the way it prepares statements) is
going to have to violate the 'spirit' of the JDBC 3 API and
specification in this regard too, if anyone ever adds statement pooling.
The C3P0 connection pool does statement pooling per-connection rather
than globally for non-XA datasource pools and works marvelously. There
is precedent for going outside the 'spirit' of an API if the result is a
robust implementation that covers the rest of the API.

> It's a bit worrying that you can exceed your connection pool size with
> this. In the worst case, a single logical connection can use an
> arbitrary number of physical connections. That makes it more difficult
> to set max_connections, so that you don't run out of slots. To an admin,
> it's pretty surprising to see 50 connections from a host, when you've
> configured the max connection pool size to 40.

We did consider putting constraint properties in place to allow setting
maximum overhead connections, and even potentially pooling those -- it
was just easier and faster for us to measure the average connection
usage while running the application, and size the pool sufficiently
large to keep it from exceeding that limit under normal conditions,
while allowing it to go bonkers (if necessary) to service load in
abnormal conditions.

The choice here is between allowing extra short-lived connections to
handle load (which _may_ go over your pool size) and losing customer data.

Keep in mind that when the driver is trying to service a logical
connection, it will wait (for a time) for an existing connection to free
up before opening a new physical connection. That wait period is
configurable, and has the side effect of throttling the applications
using the pool, allowing for a more stable pool size if you set the
delay high enough.
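
The wait-then-open behaviour can be sketched roughly as below. This is not
the code from the pull request: WaitThenOpenPool, its opener and waitMillis
are hypothetical names, and the sketch leaves out closing the extra
connections again.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical pool: wait for a free connection, then open a new one. */
final class WaitThenOpenPool<C> {
    private final BlockingQueue<C> idle = new LinkedBlockingQueue<>();
    private final Supplier<C> opener;   // opens a new physical connection
    private final long waitMillis;      // how long to wait before opening

    WaitThenOpenPool(Supplier<C> opener, long waitMillis) {
        this.opener = opener;
        this.waitMillis = waitMillis;
    }

    /** Returns a freed connection if one shows up in time, else a new one. */
    C acquire() {
        try {
            C free = idle.poll(waitMillis, TimeUnit.MILLISECONDS);
            return free != null ? free : opener.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    /** Hands a connection back so a waiting caller can reuse it. */
    void release(C connection) {
        idle.offer(connection);
    }
}
```

A larger waitMillis makes callers block longer before an extra connection is
opened, which is the throttling effect described above.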

Regards,
-Bryan Varner


---------------
The views expressed in this communication are the sole responsibility of
the author and do not reflect those of POLARIS Laboratories.

