Thread: Re: Two Phase Commit WAS: Re: Two weeks to feature freeze

Re: Two Phase Commit WAS: Re: Two weeks to feature freeze

From
Josh Berkus
Date:
Tom,

> No.  I want to know what the subordinate does when it's promised to
> commit and the co-ordinator never responds.  AFAICS the subordinate
> is screwed --- it can't commit, and it can't abort, and it can't expect
> to make progress indefinitely on other work while it's holding locks
> for the not-quite-committed transaction.

AFAIK, MS SQL Server's two-phase commit works like this ... if both servers 
prepare, and one crashes, the transaction is screwed up.  Somewhat unreliable 
considering the frequency with which MSSQL crashes, yet it seems to be good 
enough for several companies to sell "solutions" based on it. (performance is 
also appalling, but that's a different issue)

Anybody have a grasp of Oracle internals for 2PC?

Anyway, I would vote for a first implementation for 2PC which addressed the 
commit-then-crash issue in some expedient-but-not-reliable way, and putting 
2PC in /contrib with a "not for production use" warning.  Some people will 
use it in production anyway, and hopefully one or more of them will put in 
the dozens of hours required to make it reliable.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Two Phase Commit WAS: Re: Two weeks to feature freeze

From
Bruce Momjian
Date:
Agreed.

---------------------------------------------------------------------------

Josh Berkus wrote:
> Tom,
> 
> > No.  I want to know what the subordinate does when it's promised to
> > commit and the co-ordinator never responds.  AFAICS the subordinate
> > is screwed --- it can't commit, and it can't abort, and it can't expect
> > to make progress indefinitely on other work while it's holding locks
> > for the not-quite-committed transaction.
> 
> AFAIK, MS SQL Server's two-phase commit works like this ... if both servers 
> prepare, and one crashes, the transaction is screwed up.  Somewhat unreliable 
> considering the frequency with which MSSQL crashes, yet it seems to be good 
> enough for several companies to sell "solutions" based on it. (performance is 
> also appalling, but that's a different issue)
> 
> Anybody have a grasp of Oracle internals for 2PC?
> 
> Anyway, I would vote for a first implementation for 2PC which addressed the 
> commit-then-crash issue in some expedient-but-not-reliable way, and putting 
> 2PC in /contrib with a "not for production use" warning.  Some people will 
> use it in production anyway, and hopefully one or more of them will put in 
> the dozens of hours required to make it reliable.
> 
> -- 
> Josh Berkus
> Aglio Database Solutions
> San Francisco
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Two Phase Commit WAS: Re: Two weeks to feature freeze

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> No.  I want to know what the subordinate does when it's promised to
>> commit and the co-ordinator never responds.  AFAICS the subordinate
>> is screwed --- it can't commit, and it can't abort, and it can't expect
>> to make progress indefinitely on other work while it's holding locks
>> for the not-quite-committed transaction.

> Anyway, I would vote for a first implementation for 2PC which addressed the 
> commit-then-crash issue in some expedient-but-not-reliable way, and putting 
> 2PC in /contrib with a "not for production use" warning.  Some people will 
> use it in production anyway, and hopefully one or more of them will put in 
> the dozens of hours required to make it reliable.

Putting in "dozens of hours" is not the issue here --- the problem is
that there isn't any solution in sight, and I'm not eager to go down a
path that has an obvious dead end.
        regards, tom lane
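
The blocking problem described in the quoted paragraph can be made concrete with a small state machine. This is only a minimal sketch under stated assumptions, not PostgreSQL code: the `Subordinate` class, its method names, and the `locks_held` flag are all hypothetical, invented for illustration. It shows that a timeout before the vote can be handled by aborting unilaterally, while a timeout after the vote leaves the transaction prepared with its locks still held.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PREPARED = auto()   # voted yes; the outcome now belongs to the coordinator
    COMMITTED = auto()
    ABORTED = auto()


class Subordinate:
    """Hypothetical 2PC participant, for illustration only."""

    def __init__(self):
        self.state = State.ACTIVE
        self.locks_held = True  # locks taken by the transaction's work

    def prepare(self):
        # Phase 1: promise to commit if the coordinator asks for it.
        if self.state is not State.ACTIVE:
            raise RuntimeError("can only prepare an active transaction")
        self.state = State.PREPARED

    def on_decision(self, commit):
        # Phase 2: the coordinator's verdict arrives and releases the locks.
        if self.state is not State.PREPARED:
            raise RuntimeError("no prepared transaction")
        self.state = State.COMMITTED if commit else State.ABORTED
        self.locks_held = False

    def on_timeout(self):
        # The coordinator has gone silent.
        if self.state is State.ACTIVE:
            # No promise was made yet, so a unilateral abort is safe.
            self.state = State.ABORTED
            self.locks_held = False
        # Once PREPARED, neither outcome can be guessed safely:
        # the state and the locks stay exactly as they are.
        return self.state
```

Calling `prepare()` and then `on_timeout()` on such an object returns `State.PREPARED` with `locks_held` still true, which is the stuck situation the paragraph objects to.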


Re: Two Phase Commit WAS: Re: Two weeks to feature freeze

From
The Hermit Hacker
Date:
I second the agreement ... a 'reference implementation', of sorts, at
least gives someone something to build on, rather than starting right
from scratch ...



On Mon, 23 Jun 2003, Bruce Momjian wrote:

>
> Agreed.
>
> ---------------------------------------------------------------------------
>
> Josh Berkus wrote:
> > Tom,
> >
> > > No.  I want to know what the subordinate does when it's promised to
> > > commit and the co-ordinator never responds.  AFAICS the subordinate
> > > is screwed --- it can't commit, and it can't abort, and it can't expect
> > > to make progress indefinitely on other work while it's holding locks
> > > for the not-quite-committed transaction.
> >
> > AFAIK, MS SQL Server's two-phase commit works like this ... if both servers
> > prepare, and one crashes, the transaction is screwed up.  Somewhat unreliable
> > > considering the frequency with which MSSQL crashes, yet it seems to be good
> > enough for several companies to sell "solutions" based on it. (performance is
> > also appalling, but that's a different issue)
> >
> > Anybody have a grasp of Oracle internals for 2PC?
> >
> > > Anyway, I would vote for a first implementation for 2PC which addressed the
> > commit-then-crash issue in some expedient-but-not-reliable way, and putting
> > 2PC in /contrib with a "not for production use" warning.  Some people will
> > use it in production anyway, and hopefully one or more of them will put in
> > the dozens of hours required to make it reliable.
> >
> > --
> > Josh Berkus
> > Aglio Database Solutions
> > San Francisco
> >
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: Two Phase Commit WAS: Re: Two weeks to feature freeze

From
Jan Wieck
Date:
Josh Berkus wrote:
> Anyway, I would vote for a first implementation for 2PC which addressed the 
> commit-then-crash issue in some expedient-but-not-reliable way, and putting 
> 2PC in /contrib with a "not for production use" warning.  Some people will 
> use it in production anyway, and hopefully one or more of them will put in 
> the dozens of hours required to make it reliable.
> 

Josh,

you cannot put something that requires an FE/BE protocol change, ON 
COMMIT extra work plus ON RESTART extra work into contrib.

The interim solution to Tom's concern is "ask the operator". His entire 
point is based on the fact that there is no way to let the systems 
figure out what's right in the case they lost communication and don't 
know why. And for a system that just misses IP packets, there is no way 
to figure out if it's just that someone tripped over the cable or if the 
other building got nuked.
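
As a rough illustration of "ask the operator", the sketch below settles prepared transactions from a human decision instead of letting the system guess. Everything in it is an assumption made for the example: the `resolve_in_doubt` function, the string states, and the `ask_operator` callback are hypothetical and do not correspond to any existing PostgreSQL interface.

```python
def resolve_in_doubt(prepared, ask_operator):
    """Settle prepared transactions using a human decision.

    prepared:     dict mapping a transaction id to its state string;
                  only entries in state "prepared" are considered.
    ask_operator: callable taking a transaction id and returning
                  "commit", "abort", or None to leave it in doubt.
    Returns a new dict with the resulting state of each transaction.
    """
    outcome = {}
    for txid, state in prepared.items():
        # Only in-doubt transactions are ever put to the operator.
        decision = ask_operator(txid) if state == "prepared" else None
        if decision == "commit":
            outcome[txid] = "committed"
        elif decision == "abort":
            outcome[txid] = "aborted"
        else:
            # No answer: stay in doubt rather than guess.
            outcome[txid] = state
    return outcome


# Example: the operator knows the fate of tx1 and tx2 but not tx3.
answers = {"tx1": "commit", "tx2": "abort"}
result = resolve_in_doubt(
    {"tx1": "prepared", "tx2": "prepared", "tx3": "prepared"},
    answers.get,
)
```

The design choice worth noting is the default: a transaction with no operator answer stays prepared, because the system itself has no safe way to pick an outcome.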

To figure out what happened was never the goal for the ARPA project. 
Their goal was to continue communication as long as there is a possible 
path. If that's gone, you're on your own ... sorry!

I think 2PC is of no use for things like replication with takeover on 
failure in mind. At least it'd cause a major hiccup in the system, and 
since failures tend to oscillate, I don't want to be anywhere close when 
that collaborative throwup starts. But I do think that there is value in 
distributed transactions. Well ... I *know* that there is.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #