Thread: PostgreSQL XAResource & GlassFish 3.1.2.2

PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
Greetings all,

I'm having a nightmare of a time using the PGXADataSource in a production GlassFish environment. It worked fine in our lower-volume test environments.

Is anyone else using the XADataSource with GlassFish? How about other JEE containers?

The problems we're having seem to be specific to the XA implementation, or possibly the GlassFish connection pool.

Things I've tracked down so far:

  * Race conditions as multiple threads are participating in the same transaction, invoking XAResource methods. Status checks in PGXAConnection.java are throwing exceptions (if (state == ACTIVE) throw ...), but by the time it invokes the throw, the state is != ACTIVE. Before you start telling me I shouldn't be using threads in a JEE environment, let me remind you that EJBs by default are served out of thread pools, and allow for concurrent threads to participate within a single TX scope. This is outlined as part of the transaction context in the JTS spec (sections 2.2 and 2.3), and synchronized thread-safe access to XAResources is described (without being explicitly called out) by the JTA 1.0.1 spec.

 * It appears that a second thread attempting to join an existing XAResource's scope with start(XID, TMJOIN) is going to be refused, even if it's attempting to participate in the same XID. The exception thrown is one complaining about interleaving, even though it's the -same- XID, not a sub-set of work in another TX.

It seems as though the PG XAResource implementation is a single-association implementation that will only work properly in a single-threaded environment. GlassFish appears to be expecting it to work as a multiple-association (but without TX interleaving) resource. Am I missing some sort of magical configuration setting, or is this a known limitation (the whole single-threaded, non-synchronized, and single-association design) of the current driver?

Regards,
-Bryan Varner

Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Heikki Linnakangas
Date:
On 12.02.2013 05:36, Bryan Varner wrote:
> * Race conditions as multiple threads are participating in the same
> transaction, invoking XAResource methods. Status checks in
> PGXAConnection.java are throwing exceptions (if (state == ACTIVE)
> throw ...), but by the time it invokes the throw, the state is
> != ACTIVE. Before you start telling me I shouldn't be using threads in
> a JEE environment, let me remind you that EJBs by default are served
> out of thread pools, and allow for concurrent threads to participate
> within a single TX scope. This is outlined as part of the transaction
> context in the JTS spec (sections 2.2 and 2.3), and synchronized
> thread-safe access to XAResources is described (without being
> explicitly called out) by the JTA 1.0.1 spec.

We could fairly easily just add "synchronized" to all the public
methods. I wonder how sane it is for Glassfish to be doing that in the
first place, though. AFAICS, in any combination of two XAResource
methods, called concurrently, one of the threads will get an error
anyway. For example, if two threads try to call start() at the same
time, one of them has to fail because an XAResource can only be
associated with one transaction at a time.

> * It appears that a second thread attempting to join an existing
> XAResource's scope with start(XID, TMJOIN) is going to be refused,
> even if it's attempting to participate in the same XID. The exception
> thrown is one complaining about interleaving, even though it's the
> -same- XID, not a sub-set of work in another TX.

Hmm, so the application server is trying to do something like this:

xares.start(1234, 0);
xares.start(1234, TMJOIN);

We could easily allow that in the driver (ie. do nothing), but that
doesn't seem like a valid sequence of calls to begin with. You must
disassociate the XAResource from the current transaction with end(),
before re-associating it with start().

If you have a simple stand-alone test application that reproduces the
problems, I could take a closer look.

- Heikki


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
>> * Race conditions as multiple threads are participating in the same
>> transaction, invoking XAResource methods. Status checks in
>> PGXAConnection.java are throwing exceptions (if (state == ACTIVE)
>> throw ...), but by the time it invokes the throw, the state is
>> != ACTIVE. Before you start telling me I shouldn't be using threads in
>> a JEE environment, let me remind you that EJBs by default are served
>> out of thread pools, and allow for concurrent threads to participate
>> within a single TX scope. This is outlined as part of the transaction
>> context in the JTS spec (sections 2.2 and 2.3), and synchronized
>> thread-safe access to XAResources is described (without being
>> explicitly called out) by the JTA 1.0.1 spec.
>
> We could fairly easily just add "synchronized" to all the public
> methods. I wonder how sane it is for Glassfish to be doing that in the
> first place, though. AFAICS, in any combination of two XAResource
> methods, called concurrently, one of the threads will get an error
> anyway. For example, if two threads try to call start() at the same
> time, one of them has to fail because an XAResource can only be
> associated with one transaction at a time.

I think there's some confusion between a thread and a logical
transaction (represented by a physical connection to the db), with an
XID managed by a Transaction Manager.

In a JEE container, it's expected that multiple threads will do work on
behalf of a single XAResource, managed by the transaction manager. A
single XID (XAResource) will have multiple threads doing work on its
behalf. This does not necessitate interleaving, but it does mean that
multiple threads can be invoking start() and end() on an XAResource.

>> * It appears that a second thread attempting to join an existing
>> XAResource's scope with start(XID, TMJOIN) is going to be refused,
>> even if it's attempting to participate in the same XID. The exception
>> thrown is one complaining about interleaving, even though it's the
>> -same- XID, not a sub-set of work in another TX.
>
> Hmm, so the application server is trying to do something like this:
>
> xares.start(1234, 0);
> xares.start(1234, TMJOIN);
>
> We could easily allow that in the driver (ie. do nothing), but that
> doesn't seem like a valid sequence of calls to begin with. You must
> disassociate the XAResource from the current transaction with end(),
> before re-associating it with start().

You're correct; after doing a bunch more reading, I see that the code
path above is invalid.

What should be valid (and is not considered interleaving), is:

Thread A                       Thread B
--------                       ---------
xares.start(1234, TMNOFLAGS);
doStuff();
xares.end(1234);
                                xares.start(1234, TMJOIN);
                                doStuff();
                                xares.end(1234);
xares.start(1234, TMJOIN);
doStuff();
xares.end(1234);

This holds so long as the TM is serializing execution of A and B and
not allowing branch interleaving.
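
The schedule above can be sketched as a toy state machine. This is only a minimal illustration under assumptions, not the PGXAConnection code: the class ToyXaResource, its State enum, the string xids, and the small integer flag constants are all hypothetical stand-ins for the real javax.transaction.xa types.

```java
import java.util.Objects;

/** Toy single-association resource. Hypothetical; not the pgjdbc implementation. */
final class ToyXaResource {
    // Stand-in flags; the real constants live on javax.transaction.xa.XAResource.
    static final int TMNOFLAGS = 0;
    static final int TMJOIN = 1;

    private enum State { IDLE, ACTIVE, ENDED }

    private State state = State.IDLE;
    private String currentXid;

    /** Associates the resource with xid; rejects nested or interleaved use. */
    synchronized void start(String xid, int flags) {
        if (state == State.ACTIVE) {
            throw new IllegalStateException("already associated; call end() first");
        }
        if (flags == TMJOIN) {
            // Joining is only accepted for the branch that was just ended.
            if (state != State.ENDED || !Objects.equals(xid, currentXid)) {
                throw new IllegalStateException("transaction interleaving not supported");
            }
        } else if (state == State.ENDED) {
            // A fresh start while another branch is ended would interleave.
            throw new IllegalStateException("transaction interleaving not supported");
        }
        currentXid = xid;
        state = State.ACTIVE;
    }

    /** Disassociates the resource from xid so another thread may join later. */
    synchronized void end(String xid) {
        if (state != State.ACTIVE || !Objects.equals(xid, currentXid)) {
            throw new IllegalStateException("end() without a matching start()");
        }
        state = State.ENDED;
    }
}
```

In this sketch, start/end/start(TMJOIN)/end on the same xid is accepted no matter which thread makes each call, while start() immediately followed by another start() is rejected, which matches the distinction drawn in this message.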

In this case, the XAResource is performing work on behalf of more than
one thread, but in the same XID context. The problem I think I'm seeing
at this point (still trying to coordinate with the glassfish devs) is
that the XAResource isn't fully completing execution of end() prior to
the other thread invoking start(), even though the method invocation
appears to be happening 'in order'. This would manifest as a classic
race condition, and would not constitute transaction interleaving, since
the XID in use is the same TX branch.

I'm working on a test case as part of the XAResource test suite in the
driver codebase. As I'm doing this, I'm trying to nail down how
GlassFish is synchronizing access to XAResources, so this is taking me
some time.

What I can tell you is that I'm seeing exception cases in my prod
environment where currentXid.equals(xid), but where the state field
in XAConnection hasn't been updated by concurrent calls to
start()/end() in time to pass the interleaving pre-condition checks. My
current hypothesis is that GF isn't trying to do interleaving, but the
internal state field isn't being updated 'fast enough' (thread-safely)
to avoid race conditions in non-interleaved, but multi-threaded
environments.

Regards,
-Bryan


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Florent Guillaume
Date:
Maybe in PGXAConnection the field state (and maybe others) should be volatile?

Florent


--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
> Maybe in PGXAConnection the field state (and maybe others) should be volatile?

That might be appropriate for Java 5+, but before the memory model
changes introduced in 1.5, I don't think volatile access to state or
currentXid is going to be enough to enforce the semantics that the
XAResource implementation's precondition checks rely on in a 'more than
one thread' environment.

Regards,
-Bryan


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Heikki Linnakangas
Date:
On 12.02.2013 18:01, Bryan Varner wrote:
> What should be valid (and is not considered interleaving), is:
>
> Thread A                       Thread B
> --------                       ---------
> xares.start(1234, TMNOFLAGS);
> doStuff();
> xares.end(1234);
>                                xares.start(1234, TMJOIN);
>                                doStuff();
>                                xares.end(1234);
> xares.start(1234, TMJOIN);
> doStuff();
> xares.end(1234);
>
> So long as the TM is serializing execution of A and B and not allowing
> branch interleaving.
>
> In this case, the XAResource is performing work on behalf of more than
> one thread, but in the same XID context. The problem I think I'm seeing
> at this point (still trying to coordinate with the glassfish devs) is
> that the XAResource isn't fully completing execution of end() prior to
> the other thread invoking start(), even though the method invocation
> appears to be happening 'in order'. This would manifest as a classic
> race condition, and would not constitute transaction interleaving, since
> the XID in use is the same TX branch.

I see. I think there's yet another potential explanation: even if
glassfish is careful to invoke the end() only after start() has fully
finished in the other thread, the java memory model does not guarantee
that the effects of the end() in one thread are visible to the other
thread doing the start(). The update of the PGXAConnection's
state-variable might be sitting in the CPU cache of the CPU running the
first thread, not yet visible to the second thread. However, I'm not
sure if glassfish could guarantee that the start() has really finished
before end() is called, without using some operation that would also
force the CPU cache to be flushed, making the effects visible.
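
The visibility question can be illustrated with a small sketch. It assumes a hypothetical PlainResource whose state field is neither volatile nor synchronized, and it uses Thread.join() as the handoff between threads. Under the Java 5+ memory model, join() (like releasing and then acquiring a lock) establishes a happens-before edge, so the second thread is guaranteed to see the first thread's end(). Whether GlassFish's own handoff provides such an edge is not established in this thread.

```java
/** Hypothetical resource with a plain (non-volatile, unsynchronized) state field. */
final class PlainResource {
    String state = "IDLE";

    void start() {
        if (state.equals("ACTIVE")) {
            throw new IllegalStateException("start() while ACTIVE");
        }
        state = "ACTIVE";
    }

    void end() {
        if (!state.equals("ACTIVE")) {
            throw new IllegalStateException("end() while not ACTIVE");
        }
        state = "ENDED";
    }
}

final class HandoffDemo {
    /** Runs start()/end() on thread A, then on thread B; returns the state B saw on entry. */
    static String run() {
        PlainResource r = new PlainResource();
        String[] seenByB = new String[1];

        Thread a = new Thread(() -> { r.start(); r.end(); });
        Thread b = new Thread(() -> { seenByB[0] = r.state; r.start(); r.end(); });

        a.start();
        join(a);   // happens-before: everything A wrote is visible once join() returns
        b.start(); // happens-before: writes made before start() are visible to B
        join(b);
        return seenByB[0];
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

If the handoff used no synchronizing operation at all, the memory model would give thread B no such guarantee, which is the scenario described above.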

In any case, it seems like we should add "synchronized" to all the
public methods in PGXAConnection. The performance penalty should be
minimal, and it would fix any race conditions of that sort.

Can you test the attached patch?

- Heikki

Attachments

Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Florent Guillaume
Date:
On Tue, Feb 12, 2013 at 7:36 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I see. I think there's yet another potential explanation: even if glassfish
> is careful to invoke the end() only after start() has fully finished in the
> other thread, the java memory model does not guarantee that the effects of
> the end() in one thread are visible to the other thread doing the start().
> The update of the PGXAConnection's state-variable might be sitting in the
> CPU cache of the CPU running the first thread, not yet visible to the second
> thread.

That's the whole point of volatile since Java 5, it enforces a barrier
and "happens-before" semantics.

> However, I'm not sure if glassfish could guarantee that the start()
> has really finished before end() is called, without using some operation
> that would also force the CPU cache to be flushed, making the effects
> visible.
>
> In any case, it seems like we should add "synchronized" to all the public
> methods in PGXAConnection. The performance penalty should be minimal, and it
> would fix any race conditions of that sort.

For Java 5+ this is really overkill.

Florent




--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Dave Cramer
Date:
We could patch it so that Java 5+ doesn't see this; the driver already builds different versions.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Dave Cramer
Date:
Bryan,

Can you test Heikki's patch with your code? Florent, any chance you can give me a Java 5 patch or, better yet, a git pull request?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca



Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Heikki Linnakangas
Date:
On 12.02.2013 20:41, Florent Guillaume wrote:
> On Tue, Feb 12, 2013 at 7:36 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> I see. I think there's yet another potential explanation: even if glassfish
>> is careful to invoke the end() only after start() has fully finished in the
>> other thread, the java memory model does not guarantee that the effects of
>> the end() in one thread are visible to the other thread doing the start().
>> The update of the PGXAConnection's state-variable might be sitting in the
>> CPU cache of the CPU running the first thread, not yet visible to the second
>> thread.
>
> That's the whole point of volatile since Java 5, it enforces a barrier
> and "happens-before" semantics.
>
>> However, I'm not sure if glassfish could guarantee that the start()
>> has really finished before end() is called, without using some operation
>> that would also force the CPU cache to be flushed, making the effects
>> visible.
>>
>> In any case, it seems like we should add "synchronized" to all the public
>> methods in PGXAConnection. The performance penalty should be minimal, and it
>> would fix any race conditions of that sort.
>
> For Java 5+ this is really overkill.

Well, volatile will only fix the issue if it's indeed "just" because of
weak memory ordering issues. However, if the cause is a genuine race
condition where glassfish calls end() in one thread before the start()
in another thread has run to completion, as Bryan suspected, then
volatile is not enough. Besides, it would be good for PGXAConnection to
be thread-safe in general, like the rest of the JDBC driver.
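
The distinction drawn here can be sketched as follows. A volatile field would make each write visible to other threads, but the precondition check and the state update would still be two separate steps; a synchronized method turns them into a single atomic step. The AssociationState and RaceDemo classes below are hypothetical illustrations, not driver code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical state holder; the names are illustrative, not taken from the driver. */
final class AssociationState {
    enum State { IDLE, ACTIVE, ENDED }

    private State state = State.IDLE;

    /** Checks the precondition and updates the state as one atomic step. */
    synchronized boolean transition(State expected, State next) {
        if (state != expected) {
            return false;
        }
        state = next;
        return true;
    }
}

final class RaceDemo {
    /** Lets several threads race for IDLE -> ACTIVE and counts how many succeed. */
    static int winners(int threads) {
        AssociationState shared = new AssociationState();
        AtomicInteger wins = new AtomicInteger();
        CountDownLatch go = new CountDownLatch(1);
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                try {
                    go.await(); // hold every thread until all are ready
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (shared.transition(AssociationState.State.IDLE,
                                      AssociationState.State.ACTIVE)) {
                    wins.incrementAndGet();
                }
            });
            pool[i].start();
        }
        go.countDown();
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wins.get();
    }
}
```

Because the check and the update happen under one lock, exactly one of the racing threads can move the state from IDLE to ACTIVE; with only a volatile field and an unsynchronized check-then-set, more than one thread could pass the check.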

- Heikki


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Florent Guillaume
Date:
Agreed, it's better to be thread-safe in general than just fix the
barrier issue.

Florent



--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
> Bryan,
>
> Can you test Heikki's patch with your code? Florent, any chance you can
> give me a Java 5 patch or, better yet, a git pull request?

Sorry Dave, I'm not seeing a patch from Heikki in any email. Is this all
happening on GitHub?

Since this is a production issue for me, I'm having to be a little more
careful than just ramming test code into my environment and hoping it
all shakes out. I've also been talking with other developers from the
GlassFish project, attempting to understand better what their JTA TM
expects from a resource.

Regards,
-Bryan


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
So, in our testing, this has eliminated one source of error. We do see -some- improvement.

However, I'm -very- confused about why the XAResource implementation for postgres has so many condition checks, and why
it's tracking what xid was being serviced by the resource (these are global). It seems like the XAResource
implementation isn't trusting the global transaction manager to actually track xids to resources.

Is this due to the overly simplistic isSameRM method, where it's not actually comparing if the resource is the same
resource rather than the same rmid (pointer to an XAResource)?

I'm not an XA expert, but I've been doing some comparison / contrasting to other open source implementations, and it
seems like other implementations are merely tracking some simple state (are we in a global tx or not?) but none of them
are enforcing the restrictions the PG resource is regarding currentxid.

So I guess my question is, does anyone know why the pg XAResource was implemented in this fashion? I'm not trying to
say it's wrong; it would just be very beneficial to understand why it's like this, and what the motivation was.


Regards,
-Bryan

Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Heikki Linnakangas
Date:
On 13.02.2013 04:28, Bryan Varner wrote:
> So, in our testing, this has eliminated one source of error. We do see -some- improvement.
>
> However, I'm -very- confused about why the XAResource implementation for postgres has so many condition checks, why
> it's tracking what xid was being serviced by the resource (these are global). It seems like the XAResource
> implementation isn't trusting the global transaction manager to actually track xids to resources.

That's one reason. Bugs in transaction managers are not unheard of.
Getting useful error messages instead of strange undefined behavior
if you call the methods in a wrong sequence is useful in those
scenarios. It's also highly useful for debugging purposes, if you're
developing a transaction manager.

Another reason is that because the implementation doesn't support
transaction interleaving and suspend/resume, it checks that the
transaction manager doesn't try to do that. If it does, you get a
meaningful error, "Transaction interleaving not implemented". That's a
clue to the user to configure the transaction manager to not do that.

> Is this due to the overly simplistic isSameRM method, where it's not actually comparing if the resources is the same
> resource rather than the same rmid (pointer to an XAResource)?

I didn't fully understand that sentence, but no, it's not related to the
fact that we have one XAResource instance per connection.

> I'm not an XA expert, but I've been doing some comparison / contrasting to other open source implementations, and it
> seems like other implementations are merely tracking some simple state (are we in a global tx or not?) but none of them
> are enforcing the restrictions the PG resource is regarding currentxid.

I guess it depends on the underlying DBMS. Many drivers just pass on the
start/end calls to the backend, and the backend handles tracking the
state. Also, some drivers are simply not as strict on sanity-checking
the incoming calls, and will fail silently if the transaction manager
does something goofy.

- Heikki


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Simon Riggs
Date:
On 13 February 2013 12:58, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 13.02.2013 04:28, Bryan Varner wrote:
>>
>> So, in our testing, this has eliminated one source of error. We do see
>> -some- improvement.
>>
>> However, I'm -very- confused about why the XAResource implementation for
>> postgres has so many condition checks, why it's tracking what xid was being
>> serviced by the resource (these are global). It seems like the XAResource
>> implementation isn't trusting the global transaction manager to actually
>> track xids to resources.
>
>
> That's one reason. Bugs in transaction managers are not unheard of. Getting
> useful error messages instead of strange undefined behavior if you call
> the methods in a wrong sequence is useful in those scenarios. It's also
> highly useful for debugging purposes, if you're developing a transaction
> manager.
>
> Another reason is that because the implementation doesn't support
> transaction interleaving and suspend/resume, it checks that the transaction
> manager doesn't try to do that. If it does, you get a meaningful error,
> "Transaction interleaving not implemented". That's a clue to the user to
> configure the transaction manager to not do that.

Rightly so.

Even if we supported interleaving we'd still want those checks so we
can tell the difference between various types of request. It isn't
cool to assume that any TMJOIN request works with whatever the last
xid used was.
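
To make the kind of check being discussed concrete, here is a minimal sketch of an association tracker that rejects interleaving and validates TMJOIN against the xid it last serviced. It is not the driver's actual code: the class name, the three states, the helper methods and the choice of error codes are all assumptions made for illustration. Only the javax.transaction.xa types and constants are real.

```java
import java.util.Arrays;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical, simplified association tracker; NOT the PostgreSQL driver's code. */
final class XaStateSketch {
    enum State { IDLE, ACTIVE, ENDED }

    private State state = State.IDLE;
    private Xid currentXid;

    /** Synchronized so the check and the state change are atomic across threads. */
    synchronized void start(Xid xid, int flags) throws XAException {
        if (flags == XAResource.TMNOFLAGS) {
            // A new branch may only begin on a connection with no association.
            if (state != State.IDLE) {
                throw new XAException(XAException.XAER_RMERR); // would be interleaving
            }
        } else if (flags == XAResource.TMJOIN) {
            // Join only the xid whose work was ended here, never "whatever came last".
            if (state != State.ENDED || !sameXid(xid, currentXid)) {
                throw new XAException(XAException.XAER_RMERR);
            }
        } else {
            // TMRESUME (suspend/resume) and unknown flags are rejected in this sketch.
            throw new XAException(XAException.XAER_INVAL);
        }
        currentXid = xid;
        state = State.ACTIVE;
    }

    /** The flags argument is accepted but ignored in this sketch. */
    synchronized void end(Xid xid, int flags) throws XAException {
        if (state != State.ACTIVE || !sameXid(xid, currentXid)) {
            throw new XAException(XAException.XAER_PROTO);
        }
        state = State.ENDED;
    }

    static boolean sameXid(Xid a, Xid b) {
        return a != null && b != null
            && a.getFormatId() == b.getFormatId()
            && Arrays.equals(a.getGlobalTransactionId(), b.getGlobalTransactionId())
            && Arrays.equals(a.getBranchQualifier(), b.getBranchQualifier());
    }

    /** Minimal Xid for experiments; the byte contents are made up. */
    static Xid xid(int n) {
        final byte[] gtrid = { (byte) n };
        return new Xid() {
            public int getFormatId() { return 1; }
            public byte[] getGlobalTransactionId() { return gtrid.clone(); }
            public byte[] getBranchQualifier() { return new byte[] { 0 }; }
        };
    }

    public static void main(String[] args) throws XAException {
        XaStateSketch r = new XaStateSketch();
        r.start(xid(1), XAResource.TMNOFLAGS);
        try {
            r.start(xid(2), XAResource.TMNOFLAGS); // second branch on a busy connection
        } catch (XAException e) {
            System.out.println("rejected, errorCode=" + e.errorCode);
        }
    }
}
```

Keeping the check and the state update inside one synchronized method also addresses the race mentioned at the start of the thread, where the state changed between the test and the throw.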

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
Thanks Heikki for your responses.

>> So, in our testing, this has eliminated one source of error. We do see
>> -some- improvement.
>>
>> However, I'm -very- confused about why the XAResource implementation
>> for postgres has so many condition checks, why it's tracking what xid
>> was being serviced by the resource (these are global). It seems like
>> the XAResource implementation isn't trusting the global transaction
>> manager to actually track xids to resources.
>
> That's one reason. Bugs in transaction managers are not unheard of.
> Getting useful error messages instead of strange undefined behavior
> if you call the methods in a wrong sequence is useful in those
> scenarios. It's also highly useful for debugging purposes, if you're
> developing a transaction manager.

I've been doing a lot (a LOT) of catching up on JTA over the last few
days, and I have some concerns about some of the sanity checks in the
driver.

Last night I was reading a thread from this list dating back to 2006,
where it seemed a lot of XA work was going on at the time.

> Another reason is that because the implementation doesn't support
> transaction interleaving and suspend/resume, it checks that the
> transaction manager doesn't try to do that. If it does, you get a
> meaningful error, "Transaction interleaving not implemented". That's a
> clue to the user to configure the transaction manager to not do that.

Fair enough, if the TM allows for that configuration.

I've been logging every XA call from GlassFish 3.1.2.2 over the last 24
hours (including our heavy load testing where the XAResource refused to
do some things the TM told it to do), and I've been reconciling what the
TM is telling the XAResource to do against the JTA 1.0.1 spec [0], and
what the PG implementation claims it can and cannot do.

I hope you don't mind my questions.

>> Is this due to the overly simplistic isSameRM method, where it's not
>> actually comparing whether the resource is the same resource rather than
>> the same rmid (pointer to an XAResource)?
>
> I didn't fully understand that sentence, but no, it's not related to the
> fact that we have one XAResource instance per connection.

My understanding of isSameRM comes from section 3.4.9 of the JTA spec.
Paraphrasing in my own words, it looks like the TM expects this to return
true if the XAResource passed as a method parameter is a connection to the same
resource as the one the method is being invoked upon. The TM uses this
to determine if it should invoke start with TMJOIN, or begin a new TX
branch.
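
The decision described above could be sketched from the TM's side roughly as follows. This is an illustration only, not GlassFish's logic: the class, the startFlags helper and the name-comparing stub resources are hypothetical, and only the XAResource interface and its TMJOIN / TMNOFLAGS constants come from JTA.

```java
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

/** Hypothetical TM-side helper; a real transaction manager does far more. */
final class EnlistSketch {
    /** Flags a TM might pass to start() when enlisting the candidate resource. */
    static int startFlags(XAResource candidate, List<XAResource> enlisted) throws XAException {
        for (XAResource r : enlisted) {
            if (r.isSameRM(candidate)) {
                return XAResource.TMJOIN; // same resource manager: join the existing branch
            }
        }
        return XAResource.TMNOFLAGS; // unknown resource manager: begin a new branch
    }

    /** Stub that answers only isSameRM, by comparing a made-up resource manager name. */
    static XAResource stub(String rmName) {
        return (XAResource) Proxy.newProxyInstance(
            XAResource.class.getClassLoader(),
            new Class<?>[] { XAResource.class },
            (proxy, method, methodArgs) -> {
                switch (method.getName()) {
                    case "toString":
                        return rmName;
                    case "isSameRM":
                        // The other stub's toString() yields its resource manager name.
                        return rmName.equals(String.valueOf(methodArgs[0]));
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws XAException {
        XAResource first = stub("pg-main");
        XAResource second = stub("pg-main");
        int flags = startFlags(second, Arrays.asList(first));
        System.out.println(flags == XAResource.TMJOIN ? "TMJOIN" : "TMNOFLAGS");
    }
}
```

A resource whose isSameRM only returns true for the identical instance would always take the TMNOFLAGS path in this sketch, which fits the observation that the current implementation "works" as long as joins across connections are never attempted.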

Since interleaving isn't implemented, I can see why the current
implementation 'works'.

>> I'm not an XA expert, but I've been doing some comparison /
>> contrasting to other open source implementations, and it seems like
>> other implementations are merely tracking some simple state (are we in
>> a global tx or not?) but none of them are enforcing the restrictions
>> the PG resource is regarding currentxid.
>
> I guess it depends on the underlying DBMS. Many drivers just pass on the
> start/end calls to the backend, and the backend handles tracking the
> state. Also, some drivers are simply not as strict on sanity-checking
> the incoming calls, and will fail silently if the transaction manager
> does something goofy.

You're not going to see me complain about defensive programming. In
general, it's a good practice and habit to get into.

After synchronizing the methods, the 'failures' we're getting with regard
to TM invocation of the XAResources seem to all be centered around
section 3.4.6 of the JTA spec.

Specifically, we're seeing commit() invoked with xid's that don't match
the XAResource's currentXid, as well as commit() called on connections
which have no currentXid. This appears to be behavior that's within spec...

I understand that XA isn't easy to do (or else everyone would), but it's
almost like the PG implementation is missing a layer of indirection
between the physical connections (pegged to currently serviced xids) and
the logical connections in use by the client application. The best way
I can describe it, from what I'm seeing, is that JTA expects the
XAResource returned by an XAConnection not to be pegged to (or required
to represent) a single physical connection, but to (potentially) operate
upon one or more physical connections, dispatching each TM invocation
to the appropriate one.

Am I completely off the mark here?

Regards,
-Bryan


[0]: http://download.oracle.com/otndocs/jcp/7286-jta-1.0.1-spec-oth-JSpec/



Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Heikki Linnakangas
Date:
On 13.02.2013 17:20, Bryan Varner wrote:
> I've been doing a lot (a LOT) of catching up on JTA over the last few
> days, and I have some concerns about some of the sanity checks in the
> driver.

I'm all ears.

>> Another reason is that because the implementation doesn't support
>> transaction interleaving and suspend/resume, it checks that the
>> transaction manager doesn't try to do that. If it does, you get a
>> meaningful error, "Transaction interleaving not implemented". That's a
>> clue to the user to configure the transaction manager to not do that.
>
> Fair enough, if the TM allows for that configuration.

It's not required by the spec, but in practice all the TMs have an
option for that, because many JDBC implementations don't support those
features; the PostgreSQL driver isn't alone in that.

> After synchronizing the methods, the 'failures' we're getting in regards
> to TM invocation of the XAResources seem to all be centered around
> section 3.4.6 of the JTA spec.
>
> Specifically, we're seeing commit() invoked with xid's that don't match
> the XAResource's currentXid, as well as commit() called on connections
> which have no currentXid. This appears to be behavior that's within spec...

Right, assuming they are 2nd phase commits (onePhase parameter ==
false), the TM is free to use any XAResource instance. The PG driver
requires the XAResource to not be associated with a transaction, however.

> I understand that XA isn't easy to do (or else everyone would), but it's
> almost like the PG implementation is missing a layer of indirection
> between the physical connections (pegged to currently serviced xids) and
> the logical connections in use by the client application. I think the
> best way I can describe this is, from what I'm seeing, it's like JTA
> expects the XAResource being returned by an XAConnection isn't pegged to
> (or required to represent) a physical connection, but (potentially)
> operates upon one or more physical connections to service the
> invocations of the TM upon the appropriate physical connection.
>
> Am I completely off the mark here?

I think it's more like the JTA authors expect there to be methods in the
underlying FE/BE protocol to associate/disassociate a transaction from
the physical connection.

The JTA spec is based on the X/Open spec, which was written for quite
a different world. I think the X/Open spec assumes a single operating
system process running a single thread; being able to suspend/resume a
transaction makes a lot more sense in that context. It didn't translate
well to the Java world.

- Heikki


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Kevin Grittner
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> I think it's more like the JTA authors expect there to be methods
> in the underlying FE/BE protocol to associate/disassociate a
> transaction from the physical connection.

It does seem like we could implement more of the JTA features if an
XAConnection actually managed its own small pool of database
connections when necessary.  Whether there is enough demand to
merit the work involved is a whole different question.

If someone wanted it enough to pay for the work, I bet it could be
done.  :-)

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
>> I think it's more like the JTA authors expect there to be methods
>> in the underlying FE/BE protocol to associate/disassociate a
>> transaction from the physical connection.
>
> It does seem like we could implement more of the JTA features if an
> XAConnection actually managed its own small pool of database
> connections when necessary.  Whether there is enough demand to
> merit the work involved is a whole different question.

Why would each XAConnection need its own pool?

> If someone wanted it enough to pay for the work, I bet it could be
> done.  :-)

This is where I'm supposed to mention that the views expressed in my
emails do not represent the views of POLARIS Laboratories. Right?

Regards,
-Bryan Varner


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
>> Why would each XAConnection need its own pool?
>
> Because one PostgreSQL connection can't interleave transactions,
> and you can't commit or roll back a prepared transaction in a
> connection which has a transaction open.  I thought you wanted to
> be able to do such things.  They could be done if one XAConnection
> could map to more than one PostgreSQL connection.

Assuming that each logical XAConnection is backed by exactly one
physical PGPooledConnection (and all connections are busy servicing an
XID) then the situation you've described is completely accurate, and no
different than the situation posed by the current XA implementation.

Adding one physical connection to the data source, for use by the
XAResource control signals (commit / rollback / recover / etc.) should
be sufficient to avoid a deadlock in a client app. (you'd have to be
able to queue the control statements and synchronously respond)

I don't think you need a 'pool' per XAConnection, but you may need a
number of extra physical connections in order to dispatch / handle
non-xa invocations.

Regards,
-Bryan Varner


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Kevin Grittner
Date:
Bryan Varner <bvarner@polarislabs.com> wrote:

> Why would each XAConnection need its own pool?

Because one PostgreSQL connection can't interleave transactions,
and you can't commit or roll back a prepared transaction in a
connection which has a transaction open.  I thought you wanted to
be able to do such things.  They could be done if one XAConnection
could map to more than one PostgreSQL connection.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Florent Guillaume
Date:
Ah I didn't realize that PG could issue a COMMIT PREPARED from a
different connection than the one that issued the PREPARE TRANSACTION
but thinking about it it's logical.
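
For readers who want to see this behavior directly, the following JDBC sketch prepares a transaction on one connection and commits it from another. The URL, credentials, the demo_log table and the gid are placeholders, and the server needs max_prepared_transactions set above zero for PREPARE TRANSACTION to be accepted. It is plain SQL over two ordinary connections, not a description of how the XA driver is structured.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical demo; connection details and table name are assumptions. */
final class PreparedTxSketch {
    /** Builds e.g. COMMIT PREPARED 'gid'; these utility statements take no bind parameters. */
    static String sql(String verb, String gid) {
        return verb + " '" + gid.replace("'", "''") + "'";
    }

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/test"; // assumed database
        String gid = "demo-gid-1";                            // made-up global identifier
        try (Connection worker = DriverManager.getConnection(url, "test", "test");
             Connection control = DriverManager.getConnection(url, "test", "test")) {
            // Do the work and prepare it on the first connection.
            worker.setAutoCommit(false);
            try (Statement st = worker.createStatement()) {
                st.executeUpdate("INSERT INTO demo_log(msg) VALUES ('hello')");
                st.execute(sql("PREPARE TRANSACTION", gid));
            }
            // The second connection stays in autocommit mode, so COMMIT PREPARED
            // runs outside any transaction block, as the server requires.
            try (Statement st = control.createStatement()) {
                st.execute(sql("COMMIT PREPARED", gid));
            }
        }
    }
}
```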

So I believe that the setup you propose (having one additional control
connection) could be useful. You could probably avoid using it (and
therefore synchronizing work) if the physical connection is not
already reassociated to another XID.

Florent

On Wed, Feb 13, 2013 at 11:20 PM, Bryan Varner <bvarner@polarislabs.com> wrote:
>>> Why would each XAConnection need its own pool?
>>
>>
>> Because one PostgreSQL connection can't interleave transactions,
>> and you can't commit or roll back a prepared transaction in a
>> connection which has a transaction open.  I thought you wanted to
>> be able to do such things.  They could be done if one XAConnection
>> could map to more than one PostgreSQL connection.
>
>
> Assuming that each logical XAConnection is backed by exactly one physical
> PGPooledConnection (and all connections are busy servicing an XID) then the
> situation you've described is completely accurate, and no different than the
> situation posed by the current XA implementation.
>
> Adding one physical connection to the data source, for use by the XAResource
> control signals (commit / rollback / recover / etc.) should be sufficient to
> avoid a deadlock in a client app. (you'd have to be able to queue the
> control statements and synchronously respond)
>
> I don't think you need a 'pool' per XAConnection, but you may need a number
> of extra physical connections in order to dispatch / handle non-xa
> invocations.
>
> Regards,
> -Bryan Varner



--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87


Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
rhaman
Date:
Kevin Grittner-5 wrote
> It does seem like we could implement more of the JTA features if an
> XAConnection actually managed its own small pool of database
> connections when necessary.  Whether there is enough demand to
> merit the work involved is a whole different question.

Kevin - how would you determine if there is enough demand and how do we get
the requests to you? With some informal discussions to this point, it sounds
like there is interest.






Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From
Bryan Varner
Date:
On 02/14/2013 09:59 AM, Florent Guillaume wrote:
> Ah I didn't realize that PG could issue a COMMIT PREPARED from a
> different connection than the one that issued the PREPARE TRANSACTION
> but thinking about it it's logical.

Yep, it sure does.

> So I believe that the setup you propose (having one additional control
> connection) could be useful. You could probably avoid using it (and
> therefore synchronizing work) if the physical connection is not
> already reassociated to another XID.

As I've been thinking about this, in theory unless all the physical
connections are consumed servicing an XID and have been suspended by the
TM, at some indeterminate point in time, at least -one- of them will
have an end() invoked, and become available for use to service the
XAResource invocation.

Only if all the connections created by the datasource are in a suspended
TX would there be a potential for a 'deadlock' needing an extra control
connection -- and even then, that would be an extreme corner-case -- one
that I'd expect would recover at some point with a TM-driven TX timeout,
or the TM deciding to resume a suspended TX.
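
The routing rule being reasoned about here can be sketched as a small bookkeeping class: use any physical connection that is not currently associated with an xid, and fall back to the extra control connection only when every worker is pinned. Connection names and xids are plain strings and the class itself is hypothetical; nothing like it exists in the driver as discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bookkeeping for choosing where to run XA control statements. */
final class ControlRoutingSketch {
    /** Worker connection name -> xid it is servicing, or null when unassociated. */
    private final Map<String, String> association = new LinkedHashMap<>();
    private final String controlConnection;

    ControlRoutingSketch(String controlConnection, String... workers) {
        this.controlConnection = controlConnection;
        for (String w : workers) {
            association.put(w, null);
        }
    }

    synchronized void associate(String worker, String xid) {
        association.put(worker, xid);
    }

    /** Called once end() has released the worker from its xid. */
    synchronized void release(String worker) {
        association.put(worker, null);
    }

    /** Picks the connection that should run commit / rollback / recover work. */
    synchronized String connectionForControl() {
        for (Map.Entry<String, String> e : association.entrySet()) {
            if (e.getValue() == null) {
                return e.getKey(); // an unassociated worker will do
            }
        }
        return controlConnection; // every worker is pinned to an xid
    }

    public static void main(String[] args) {
        ControlRoutingSketch router = new ControlRoutingSketch("control", "c1", "c2");
        router.associate("c1", "xid-A");
        System.out.println(router.connectionForControl());
        router.associate("c2", "xid-B");
        System.out.println(router.connectionForControl());
    }
}
```

In this sketch the control connection is only ever chosen in the corner case described above, when all workers are held by suspended or active transactions at the same time.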

I know this would play out in a case where there's a lot of REQUIRES_NEW
branching going on, but I haven't quite nailed down (mentally) a case
where this could occur without the TM exhausting a pool (if it's even
using one).

Regards,
-Bryan Varner