Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

From Bruce Momjian
Subject Re: Re: AW: Re: MySQL and BerkleyDB (fwd)
Msg-id 200101250336.WAA00898@candle.pha.pa.us
In reply to Re: Re: AW: Re: MySQL and BerkleyDB (fwd)  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Added to TODO.detail/replication.

> > >   I had thought that the pre-commit information could be stored in an
> > > auxiliary table by the middleware program; we would then have
> > > to re-implement some sort of higher-level WAL (I thought of the list
> > > of the commands performed in the current transaction, with a sequence
> > > number for each of them that would guarantee correct ordering between
> > > concurrent transactions in case of a REDO). But I fear I am missing
> > 
> > This wouldn't work for READ COMMITTED isolation level.
> > But why do you want to log commands into WAL where each modification
> > is already logged in, hm, correct order?
> > Well, it makes sense if you're looking for async replication, but
> > you don't need two-phase commit for that, and you should be aware of
> > the problems with the READ COMMITTED isolation level.
> > 
> 
> I believe the issue here is that while SERIALIZABLE isolation means all
> transactions can be treated as if they ran serially, our default is READ
> COMMITTED, meaning that each command in an open transaction sees the
> changes of committed transactions, even if they committed after our
> transaction started.  (FYI, see my chapter on transactions for help,
> http://www.postgresql.org/docs/awbook.html.)
> 
> To do higher-level WAL, you would have to record not only the queries,
> but also which other transactions had committed at the start of each
> command in your transaction.
> 
> Ideally, you could number every commit by its XID in your log, and then
> when processing a query, pass the "committed" transaction ids that
> were visible at the time each command began.
> 
> In other words, you can replay the queries in transaction commit order,
> except that you have to have some transactions committed at specific
> points while other transactions are open, i.e.:
> 
> XID    Open XIDs    Query
> 500                 UPDATE t SET col = 3;
> 501    500          BEGIN;
> 501    500          UPDATE t SET col = 4;
> 501                 UPDATE t SET col = 5;
> 501                 COMMIT;
> 
> This is a silly example, but it shows that 500 must commit after the
> first command in transaction 501, but before the second command in the
> transaction.  This is because UPDATE t SET col = 5 actually sees the
> changes made by transaction 500 in READ COMMITTED isolation level.
> 
> I am not advocating this.  I think WAL is a better choice.  I just
> wanted to outline how replaying the queries in commit order is 
> insufficient.
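
A minimal sketch of the idea above, assuming each logged command carries the set of XIDs that were still open when it began. The names `LogEntry` and `replay_schedule` are hypothetical, and the sketch assumes that a single-statement transaction such as 500 has no explicit COMMIT in the log. It only works out where each commit has to fall during replay; it does not execute any SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    xid: int              # transaction that issued the command
    open_xids: frozenset  # other transactions still uncommitted when the command began
    query: str


def replay_schedule(log):
    """Return (xid, action) pairs with forced commits placed where they are needed."""
    seen = []          # xids in order of first appearance
    committed = set()
    schedule = []
    for entry in log:
        # An earlier transaction that this command no longer lists as open
        # must be committed before the command is replayed.
        for xid in seen:
            if xid != entry.xid and xid not in entry.open_xids and xid not in committed:
                schedule.append((xid, "COMMIT"))
                committed.add(xid)
        if entry.xid not in seen:
            seen.append(entry.xid)
        schedule.append((entry.xid, entry.query))
        # An explicit COMMIT in the log marks its own transaction as committed.
        if entry.query.strip().rstrip(";").upper() == "COMMIT":
            committed.add(entry.xid)
    return schedule


# The example log from the table above.
log = [
    LogEntry(500, frozenset(), "UPDATE t SET col = 3;"),
    LogEntry(501, frozenset({500}), "BEGIN;"),
    LogEntry(501, frozenset({500}), "UPDATE t SET col = 4;"),
    LogEntry(501, frozenset(), "UPDATE t SET col = 5;"),
    LogEntry(501, frozenset(), "COMMIT;"),
]
schedule = replay_schedule(log)
for xid, action in schedule:
    print(xid, action)
```

In the resulting schedule the commit of 500 lands between the two UPDATEs of 501, which a replay in plain commit order cannot express.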
> 
> > Back to two-phase commit - it's the easiest part of the work required
> > for distributed transaction processing.
> > Currently we place a single commit record in the log, and the
> > transaction is committed when this record (and so all other
> > transaction records) is on disk.
> > Two-phase commit:
> > 
> > 1. For the 1st phase we'll place a "prepared-to-commit" record into
> >    the log, and this phase is complete once the record is flushed to
> >    disk. At this point the transaction may be committed at any time,
> >    because all its modifications are logged. But it may still be
> >    rolled back if this phase fails on other sites of the distributed
> >    system.
> > 
> > 2. When all sites are prepared to commit we'll place a "committed"
> >    record into the log. No need to flush it, because in the event of
> >    a crash the recoverer will have to communicate with the other
> >    sites to learn the status of all "prepared" transactions anyway.
> > 
> > That's all! It is really hard to implement distributed lock and
> > communication managers, but there is no problem with logging two
> > records instead of one. Period.
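
The two log records described above can be sketched as a toy model. Everything here is an assumption made for illustration: the `SiteLog` class, the `can_prepare` switch used to simulate a failing site, the explicit "aborted" record, and the in-memory lists standing in for the log buffer and the disk. It says nothing about how the real WAL code would be structured, and it leaves out the lock and communication managers entirely.

```python
class SiteLog:
    """Toy log for one site; `durable` models records that have reached disk."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # hypothetical switch to simulate a failing site
        self.buffer = []                # records written but not yet flushed
        self.durable = []               # records flushed to disk

    def flush(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def prepare(self, xid):
        if not self.can_prepare:
            return False
        self.buffer.append(("prepared", xid))
        self.flush()  # phase 1 is complete only once the record is on disk
        return True

    def commit(self, xid):
        self.buffer.append(("committed", xid))  # phase 2: no flush required

    def abort(self, xid):
        self.buffer.append(("aborted", xid))    # assumed record type, not from the thread

    def in_doubt_after_crash(self):
        """XIDs durably prepared with no durable outcome; recovery must ask other sites."""
        prepared = {x for kind, x in self.durable if kind == "prepared"}
        resolved = {x for kind, x in self.durable if kind in ("committed", "aborted")}
        return prepared - resolved


def two_phase_commit(sites, xid):
    """Prepare on every site, then commit everywhere; roll back if any prepare fails."""
    prepared = []
    for site in sites:
        if not site.prepare(xid):
            for p in prepared:
                p.abort(xid)
            return False
        prepared.append(site)
    for site in sites:
        site.commit(xid)
    return True


a, b = SiteLog("a"), SiteLog("b")
ok = two_phase_commit([a, b], 500)

c, d = SiteLog("c"), SiteLog("d", can_prepare=False)
failed = two_phase_commit([c, d], 501)
```

After the first call, the "committed" record on site `a` is still only in the buffer, so a crash at that moment would leave XID 500 in doubt and recovery would have to ask the other sites for the outcome, as the message describes.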
> 
> Great.
> 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

