Re: [HACKERS] RE: [GENERAL] Transaction logging

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: [HACKERS] RE: [GENERAL] Transaction logging
Дата
Msg-id 199907070217.WAA00377@candle.pha.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] RE: [GENERAL] Transaction logging  (jwieck@debis.com (Jan Wieck))
Список pgsql-general
Added to TODO:

    * Transaction log, so re-do log can be on a separate disk by
      logging SQL queries, or before/after row images



> Michael Davis wrote:
>
> >
> > What would it take to have transaction logging added to Postgres.  I am a
> > c/c++ programmer and will consider contributing to the Postgres development
> > effort.  I really like everything I see and read about Postgres.  As a
>
>     I spent some time on transaction logging since it's a feature
>     I'm missing too. There are mainly two  different  transaction
>     log mechanisms out.
>
>     1.  Log queries sent to the backend.
>
>     2.  Log  images  of  inserted/updated  rows  and  row ID's of
>         deleted ones.
>
>     The query  level  logging  will  write  less  information  if
>     queries usually affect a large number of rows.  Unfortunately
>     the extensibility of Postgres work's against  this  approach.
>     There  could  be  any  number of user written functions who's
>     results  aren't  reproduceable  during  recovery.  And   core
>     features of Postgres itself would introduce the same problem.
>     Have a sequence which is used to create  default  values  for
>     multiple  tables,  so  that one ID is unique across them. Now
>     two backends insert (with  INSERT  ...  SELECT)  concurrently
>     into  different  tables  using  the  same  sequence.   It's a
>     classic race condition and it depends  on  context  switching
>     and  page  faults  which  backend  will  get  which  sequence
>     numbers. You cannot foresee and you cannot reproduce,  except
>     you  hook into the sequence generator and log this too. Later
>     when recovering, another hook  into  the  sequence  generator
>     must    reproduce    the    logged   results   on   the   per
>     backend/transaction/command base, and the same must  be  done
>     for   each  function  that  usually  returns  unreproduceable
>     results (anything dealing with time, pid's, etc.).
>
>     As said, this must also cover user  functions.  So  at  least
>     there  must  be  a  general  log  API  that  provides  such a
>     functionality for user written functions.
>
>     The image logging approach also has problems. First, the only
>     thing  given to the heap access methods to outdate a tuple on
>     update/delete is the current tuple ID (information that tells
>     which  tuple  in  which block is meant).  So you need to save
>     the database files  in  binary  format,  because  during  the
>     actually  existing  dump/restore  this  could  change and the
>     logged CTID's would hit the wrong tuples.
>
>     Second, you must remember in the  log  which  transaction  ID
>     these  informations  came  from  and later if the transaction
>     committed or not, so the recovery can set  this  commit/abort
>     information in pg_log too. pg_log is a shared system file and
>     the transaction ID's are unique only for one  server.   Using
>     this  information for online replication of a single database
>     to another Postgres installation will not work.
>
>     Third, there are still some shared system catalogs across all
>     databases  (pg_database,  pg_group,  pg_log!!!, pg_shadow and
>     pg_variable). Due to that it would be impossible (or at least
>     very,  very  tricky) to restore/recover (maybe point in time)
>     one single database. If you destroy one database and  restore
>     it  from  the  binary backup, these shared catalogs cannot be
>     restored too, so they're out of sync with  the  backup  time.
>     How  should  the  recovery  now  hit  the right things (which
>     probably must not be there at all)?.
>
>     All this is really  a  mess.  I  think  the  architecture  of
>     Postgres  will  only allow something on query level with some
>     general API for things that must reproduce  the  same  result
>     during  recovery.  For  example  time().  Inside the backend,
>     time() should  never  be  called  directly.  Instead  another
>     function  is  to be called that log's during normal operation
>     which time get's returned by this  particular  function  call
>     and  if  the  backend  is in recovery mode, returns the value
>     from the log.
>
>     And again, this all means trouble. Usually, most queries sent
>     to  the  database  don't  change  any  data  because they are
>     SELECT's. It would dramatically blow up the log amount if you
>     log ALL queries instead of only those that modify things. But
>     when the query begins, you don't know this, because a  SELECT
>     might call a function that uses SPI to UPDATE something else.
>     So the decision if the query must be logged or not  can  only
>     be  made  when  the  query  is  done  (by  having some global
>     variable where the  heap  access  methods  set  a  flag  that
>     something  got written).  Now you have to log function call's
>     like time() even if the query will not modify any single  row
>     in the database because the query is a
>
>     SELECT 'now'::datetime - updtime FROM ...
>
>     Doing  this on a table with thousands of rows will definitely
>     waste much logging space and  slowdown  the  whole  thing  by
>     unnecessary logging.
>
>     Maybe  it's  a  compromise  if at each query start the actual
>     time and other such information is remembered by the backend,
>     all  time() calls return this remembered value instead of the
>     real one (wouldn't be bad anyway IMHO), and this  information
>     is logged only if the query is to be logged.
>
>     Finally  I  think  I must have missed some more problems, but
>     aren't these enough already to frustrate you :-?
>
>
> Jan
>
> --
>
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me.                                  #
> #======================================== jwieck@debis.com (Jan Wieck) #
>
>
>
>


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

В списке pgsql-general по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] inet and cidr type problems
Следующее
От: Juan Grigera
Дата:
Сообщение: JoinClauseSelecivity: bad value