Re: [HACKERS] RE: [GENERAL] Transaction logging
From:        Bruce Momjian
Subject:     Re: [HACKERS] RE: [GENERAL] Transaction logging
Date:
Msg-id:      199907070217.WAA00377@candle.pha.pa.us
In reply to: Re: [HACKERS] RE: [GENERAL] Transaction logging  (jwieck@debis.com (Jan Wieck))
List:        pgsql-general
Added to TODO:

	* Transaction log, so re-do log can be on a separate disk by
	  logging SQL queries, or before/after row images

> Michael Davis wrote:
> >
> > What would it take to have transaction logging added to Postgres.  I am a
> > c/c++ programmer and will consider contributing to the Postgres development
> > effort.  I really like everything I see and read about Postgres.  As a
>
>     I spent some time on transaction logging since it's a feature
>     I'm missing too.  There are mainly two different transaction
>     log mechanisms out there:
>
>     1.  Log queries sent to the backend.
>
>     2.  Log images of inserted/updated rows and the row IDs of
>         deleted ones.
>
>     Query-level logging will write less information if queries
>     usually affect a large number of rows.  Unfortunately, the
>     extensibility of Postgres works against this approach.  There
>     could be any number of user-written functions whose results
>     aren't reproducible during recovery.  And core features of
>     Postgres itself introduce the same problem.  Consider a
>     sequence that is used to create default values for multiple
>     tables, so that one ID is unique across them.  Now two
>     backends insert (with INSERT ... SELECT) concurrently into
>     different tables using the same sequence.  It's a classic race
>     condition, and it depends on context switching and page faults
>     which backend will get which sequence numbers.  You cannot
>     foresee it and you cannot reproduce it, unless you hook into
>     the sequence generator and log this too.  Later, when
>     recovering, another hook into the sequence generator must
>     reproduce the logged results on a per-backend/transaction/
>     command basis, and the same must be done for each function
>     that usually returns unreproducible results (anything dealing
>     with time, PIDs, etc.).
>
>     As said, this must also cover user functions.  So at the very
>     least there must be a general log API that provides such
>     functionality for user-written functions.
>
>     The image logging approach also has problems.
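To make the "general log API" idea concrete, a record/replay wrapper might look roughly like this.  None of these names exist in the backend; `replayable_long`, `set_recovery_mode`, and the one-value-per-line log format are all invented for illustration:

```c
#include <stdio.h>
#include <stdbool.h>

/* Rough sketch of a general record/replay log API: any function
 * whose result is not reproducible (sequence numbers, time(),
 * PIDs, user functions) routes its value through this wrapper.
 * During normal operation the computed value is appended to the
 * log; during recovery the logged value is replayed instead, so
 * a re-run of the queries sees the same values as the original
 * run did. */

static bool in_recovery = false;

void set_recovery_mode(bool on)
{
    in_recovery = on;
}

/* compute() produces the live, unreproducible value */
long replayable_long(FILE *log, long (*compute)(void))
{
    long val;

    if (in_recovery)
    {
        /* replay what the original run recorded, in order */
        if (fscanf(log, "%ld", &val) != 1)
            return -1;          /* log exhausted */
    }
    else
    {
        val = compute();
        fprintf(log, "%ld\n", val);
    }
    return val;
}

/* example unreproducible producer: a counting stand-in
 * for a sequence generator */
static long demo_counter = 0;

long demo_next(void)
{
    return ++demo_counter;
}
```

With a hook like this inside nextval(), the interleaving of sequence values between two concurrent backends would not need to be foreseen, only recorded and replayed per backend.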
>     First, the only thing given to the heap access methods to
>     outdate a tuple on update/delete is the current tuple ID
>     (information that tells which tuple in which block is meant).
>     So you need to save the database files in binary format,
>     because during the existing dump/restore procedure this could
>     change, and the logged CTIDs would hit the wrong tuples.
>
>     Second, you must remember in the log which transaction ID this
>     information came from, and later whether the transaction
>     committed or not, so that recovery can set this commit/abort
>     information in pg_log too.  pg_log is a shared system file,
>     and transaction IDs are unique only for one server.  Using
>     this information for online replication of a single database
>     to another Postgres installation will not work.
>
>     Third, there are still some system catalogs shared across all
>     databases (pg_database, pg_group, pg_log!!!, pg_shadow and
>     pg_variable).  Due to that, it would be impossible (or at
>     least very, very tricky) to restore/recover (maybe
>     point-in-time) one single database.  If you destroy one
>     database and restore it from the binary backup, these shared
>     catalogs cannot be restored with it, so they're out of sync
>     with the backup time.  How should the recovery now hit the
>     right things (which probably must not be there at all)?
>
>     All this is really a mess.  I think the architecture of
>     Postgres will only allow something on the query level, with
>     some general API for things that must reproduce the same
>     result during recovery.  For example time().  Inside the
>     backend, time() should never be called directly.  Instead,
>     another function is to be called that logs, during normal
>     operation, which time gets returned by this particular
>     function call, and, if the backend is in recovery mode,
>     returns the value from the log.
>
>     And again, this all means trouble.  Usually, most queries sent
>     to the database don't change any data because they are
>     SELECTs.
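Before going on: the image-log record in Jan's first and second points could be pictured like this.  This is only a simplified sketch, not the real on-disk layout (the real tuple ID is ItemPointerData in the sources; `RedoRecord` and the field names are invented here):

```c
/* A tuple ID as the heap access methods see it: a physical
 * address, block number plus line-pointer offset within that
 * block.  This is why a logged CTID is only meaningful against
 * a byte-identical binary copy of the database files. */
typedef struct
{
    unsigned int   block;    /* which block of the relation file */
    unsigned short offset;   /* which line pointer in that block */
} TupleId;

typedef enum { REDO_INSERT, REDO_UPDATE, REDO_DELETE } RedoOp;

/* One row-image log record.  The xid is unique only within one
 * server installation, which is why such records cannot simply
 * be shipped to another Postgres installation for replication:
 * the target's pg_log knows nothing about this xid. */
typedef struct
{
    unsigned int xid;        /* transaction that wrote the record  */
    RedoOp       op;
    TupleId      target;     /* tuple outdated by UPDATE/DELETE    */
    /* for INSERT/UPDATE, the new row image bytes would follow */
} RedoRecord;

int same_tuple(TupleId a, TupleId b)
{
    return a.block == b.block && a.offset == b.offset;
}
```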
>     It would dramatically blow up the log volume if you log ALL
>     queries instead of only those that modify things.  But when
>     the query begins, you don't know this, because a SELECT might
>     call a function that uses SPI to UPDATE something else.  So
>     the decision whether the query must be logged can only be made
>     when the query is done (by having some global variable where
>     the heap access methods set a flag that something got
>     written).  Now you have to log function calls like time() even
>     if the query will not modify a single row in the database,
>     because the query is a
>
>         SELECT 'now'::datetime - updtime FROM ...
>
>     Doing this on a table with thousands of rows will definitely
>     waste much logging space and slow down the whole thing with
>     unnecessary logging.
>
>     Maybe it's a compromise if, at each query start, the actual
>     time and other such information is remembered by the backend,
>     all time() calls return this remembered value instead of the
>     real one (which wouldn't be bad anyway, IMHO), and this
>     information is logged only if the query is to be logged.
>
>     Finally, I think I must have missed some more problems, but
>     aren't these enough already to frustrate you :-?
>
>
> Jan
>
> --
>
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me.                                  #
> #======================================== jwieck@debis.com (Jan Wieck) #


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  + If your life is a hard drive,      |  830 Blythe Avenue
  + Christ can be your backup.         |  Drexel Hill, Pennsylvania 19026
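P.S.  Jan's query-start compromise could be sketched something like the following.  Again, purely an illustration; none of these names exist in the backend:

```c
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

/* Sketch of the compromise: snapshot time() once at query start,
 * hand that remembered value to every in-query caller, and write
 * it to the log only if the heap access methods flagged that the
 * query actually changed data.  A SELECT over thousands of rows
 * then costs one time() call and zero log records. */

static time_t query_start_time;
static bool   query_wrote_data;

void query_begin(void)
{
    query_start_time = time(NULL);
    query_wrote_data = false;
}

/* heap access methods call this whenever they write a tuple,
 * including writes done via SPI from inside a SELECT */
void note_data_written(void)
{
    query_wrote_data = true;
}

/* what the backend's time() replacement would return: the same
 * value for every call within one query */
time_t current_query_time(void)
{
    return query_start_time;
}

/* at query end, log the remembered time only for modifying
 * queries; returns true if a log record was written */
bool query_end(FILE *log)
{
    if (!query_wrote_data)
        return false;
    fprintf(log, "%ld\n", (long) query_start_time);
    return true;
}
```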