Re: Re: AW: Re: MySQL and BerkleyDB (fwd)
From | Bruce Momjian |
---|---|
Subject | Re: Re: AW: Re: MySQL and BerkleyDB (fwd) |
Date | |
Msg-id | 200101250336.WAA00898@candle.pha.pa.us |
In reply to | Re: Re: AW: Re: MySQL and BerkleyDB (fwd) (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
Added to TODO.detail/replication.

> > > I had thought that the pre-commit information could be stored in an
> > > auxiliary table by the middleware program ; we would then have
> > > to re-implement some sort of higher-level WAL (I thought of the list
> > > of the commands performed in the current transaction, with a sequence
> > > number for each of them that would guarantee correct ordering between
> > > concurrent transactions in case of a REDO). But I fear I am missing
> >
> > This wouldn't work for READ COMMITTED isolation level.
> > But why do you want to log commands into WAL where each modification
> > is already logged in, hm, correct order?
> > Well, it has sense if you're looking for async replication but
> > you need not in two-phase commit for this and should aware about
> > problems with READ COMMITTED isolevel.
> >
>
> I believe the issue here is that while SERIALIZABLE ISOLATION means all
> queries can be run serially, our default is READ COMMITTED, meaning that
> open transactions see committed transactions, even if the transaction
> committed after our transaction started. (FYI, see my chapter on
> transactions for help, http://www.postgresql.org/docs/awbook.html.)
>
> To do higher-level WAL, you would have to record not only the queries,
> but the other queries that were committed at the start of each command
> in your transaction.
>
> Ideally, you could number every commit by its XID your log, and then
> when processing the query, pass the "committed" transaction ids that
> were visible at the time each command began.
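
The bookkeeping described in the quoted paragraph just above can be sketched in a few lines. This is only an illustration under stated assumptions, not how PostgreSQL or any replication tool works: the `LoggedCommand` record, the `replay_plan` function, the XIDs 10 and 11, and the tables `t` and `log_t` are all hypothetical. Each logged command carries the set of commits that were visible when it began, and the replayer inserts each commit just before the first command that saw it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedCommand:
    xid: int              # transaction that issued the command
    visible: frozenset    # XIDs of all commits visible when the command began
    query: str


def replay_plan(log):
    """Turn a command log into an ordered list of replay steps.

    Each command is preceded by exactly the commits it originally saw,
    so a commit can land in the middle of another open transaction.
    """
    committed = set()
    plan = []
    for cmd in log:
        # Anything already replayed as committed must also have been
        # visible originally; otherwise the log is out of order.
        unexpected = committed - cmd.visible
        if unexpected:
            raise ValueError(f"commits {sorted(unexpected)} replayed too early")
        for xid in sorted(cmd.visible - committed):
            plan.append(("commit", xid))
            committed.add(xid)
        plan.append(("run", cmd.xid, cmd.query))
    return plan


# Hypothetical log: transaction 11 runs the same statement twice and
# sees transaction 10's change only the second time.
log = [
    LoggedCommand(10, frozenset(), "UPDATE t SET col = 3"),
    LoggedCommand(11, frozenset(), "INSERT INTO log_t SELECT col FROM t"),
    LoggedCommand(11, frozenset({10}), "INSERT INTO log_t SELECT col FROM t"),
]
for step in replay_plan(log):
    print(step)
```

The plan places the commit of transaction 10 between the two commands of transaction 11, which plain commit-order replay cannot express. Commits that no later command observes (here, that of transaction 11) are left out of the sketch for brevity.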
>
> In other words, you can replay the queries in transaction commit order,
> except that you have to have some transactions committed at specific
> points while other transactions are open, i.e.:
>
> XID     Open XIDS       Query
> 500                     UPDATE t SET col = 3;
> 501     500             BEGIN;
> 501     500             UPDATE t SET col = 4;
> 501                     UPDATE t SET col = 5;
> 501                     COMMIT;
>
> This is a silly example, but it shows that 500 must commit after the
> first command in transaction 501, but before the second command in the
> transaction. This is because UPDATE t SET col = 5 actually sees the
> changes made by transaction 500 in READ COMMITTED isolation level.
>
> I am not advocating this. I think WAL is a better choice. I just
> wanted to outline how replaying the queries in commit order is
> insufficient.
>
> > Back to two-phase commit - it's easiest part of work required for
> > distributed transaction processing.
> > Currently we place single commit record to log and transaction is
> > committed when this record (and so all other transaction records)
> > is on disk.
> > Two-phase commit:
> >
> > 1. For 1st phase we'll place into log "prepared-to-commit" record
> >    and this phase will be accomplished after record is flushed on disk.
> >    At this point transaction may be committed at any time because of
> >    all its modifications are logged. But it still may be rolled back
> >    if this phase failed on other sites of distributed system.
> >
> > 2. When all sites are prepared to commit we'll place "committed"
> >    record into log. No need to flush it because of in the event of
> >    crash for all "prepared" transactions recoverer will have to
> >    communicate other sites to know their statuses anyway.
> >
> > That's all! It is really hard to implement distributed lock- and
> > communication- managers but there is no problem with logging two
> > records instead of one. Period.
>
> Great.
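
The two-record logging scheme in the quoted text above can be illustrated with a small in-memory model. It is a sketch under assumptions, not PostgreSQL code: `ParticipantLog`, `prepare`, `commit`, `recover`, and the `ask_coordinator` callback are hypothetical names, and "disk" is just a Python list. The "prepared" record is flushed, the "committed" record is not, and recovery asks the other sites about any transaction that is still only "prepared".

```python
class ParticipantLog:
    """Toy write-ahead log for one site (hypothetical, in-memory only)."""

    def __init__(self):
        self.durable = []    # records that have reached disk
        self.buffered = []   # records still in memory, lost on a crash

    def append(self, record, flush):
        self.buffered.append(record)
        if flush:
            self.durable.extend(self.buffered)
            self.buffered.clear()

    def crash(self):
        self.buffered.clear()


def prepare(log, xid):
    # Phase 1: the record must be on disk before the site votes "prepared".
    log.append(("prepared", xid), flush=True)


def commit(log, xid):
    # Phase 2: no flush needed; recovery re-checks prepared transactions.
    log.append(("committed", xid), flush=False)


def recover(log, ask_coordinator):
    """Return {xid: outcome} for every transaction in the durable log."""
    outcome = {}
    for kind, xid in log.durable:
        outcome[xid] = kind
    in_doubt = [xid for xid, kind in outcome.items() if kind == "prepared"]
    for xid in in_doubt:
        # The local log cannot decide; the other sites have to be asked.
        outcome[xid] = ask_coordinator(xid)
    return outcome


log = ParticipantLog()
prepare(log, 42)
commit(log, 42)
log.crash()                        # the unflushed "committed" record is lost
decisions = {42: "committed"}      # what the other sites would report
print(recover(log, decisions.get))
```

After the simulated crash only the "prepared" record survives, so the outcome of transaction 42 comes from the lookup that stands in for contacting the other sites. The distributed lock and communication managers that the quoted text calls the hard part are not modelled here.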
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026