redolog - for discussion

Поиск
Список
Период
Сортировка
От jwieck@debis.com (Jan Wieck)
Тема redolog - for discussion
Дата
Msg-id m0zjVGt-000EBjC@orion.SAPserv.Hamburg.dsh.de
обсуждение исходный текст
Список pgsql-hackers
Hi,

    here  are  some details I would like to discuss before beeing
    too deep in the implementation that it's hard to change  them
    later.

    Point  1  is an extension of the pg_database relation that is
    required to see the actual redolog state of a  database.  New
    fields are:

        lastbackup    datetime
        redomode      int4
        redoseq1      int4

    lastbackup  is  the time, the last successful full backup was
    taken.  More precise, the  time  when  pg_dump  switched  the
    backend into online backup mode.

    redomode  is  defined  as  0=none,  1=async-logging,  2=sync-
    logging, 4=restore, 5=recover, 6=error.

    redoseq1 is the sequence number of  the  redolog  file  began
    when  pg_dump  switched  to  online backup mode (this command
    implies a logfile switch).

    Point 2 is  the  extension  of  the  querylanguage.  All  the
    statements  are  restricted to superusers or the owner of the
    database. The SQL statements to  control  the  whole  process
    are:

    ALTER DATABASE REDOMODE {NONE | ASYNCHRONOUS | SYNCHRONOUS};

        Turns  logging  for the database on or off. Database must
        be in normal  operation  mode  for  it  (not  restore  or
        recover mode).

    ALTER DATABASE BEGIN BACKUP;

        Issued by pg_dump before doing anything else.

        The  command stops ALL other activity in the database, so
        pg_dump has time to pull out  at  least  the  information
        about  sequences  (actually  it  does  this while getting
        tables, might require some changes there so the  database
        get's back accessible soon).

    ALTER DATABASE ONLINE BACKUP;

        Issued  by  pg_dump  when  it  finished  the  things that
        require total exclusive database access.

        At this time, a logfile  switch  is  done  (only  if  the
        actual database is really logged) and the sequence number
        of the new logfile plus the current datetime  remembered.
        The behaviour of pg_dump's backend changes. It will see a
        snapshot of this time (implemented in tqual code) in  any
        subsequent  command  and  it  is  totally  unable  to  do
        anything that would update the database.

        Until the final END BACKUP is given, no  VACUUM  or  DROP
        TABLE  etc.  commands can be run. If they are issued, the
        command will be delayed until pg_dump finished.

    ALTER DATABASE END BACKUP;

        This  turns  back  the  special  behaviour  of  pg_dump's
        backend.  Additionally  the  remembered  time and redolog
        sequence are stored in  pg_database.   pg_dump  can  read
        them  out for the final statement in the dump output (see
        below).

    ALTER DATABASE BEGIN RESTORE;

        This command checks that  the  actual  database  is  just
        created  and  not  one  single  command has been executed
        before. It is the first command in  pg_dump's  output  if
        the database beeing dumped is a logged one.

        It switches the database into restore mode. In this mode,
        the first command on a new database  connection  must  be
        the special command

            RECOVER DATABASE AS USER 'uname'

        or an

            ALTER DATABASE END RESTORE ...;

        When doing the ACL stuff, pg_dump must output a reconnect
        (\c) to the database without the additional username  and
        then issue the special command.

    ALTER DATABASE END RESTORE [RECOVERY FROM redolog_seq];

        This  ends the restore mode. The additional RECOVERY FROM
        is put into by pg_dump  for  logged  databases  only.  It
        reads  out  this  information  after  END  BACKUP. If not
        given, the database is  switched  into  normal  operation
        mode  without  logging. But if given, the sequence number
        is stored in pg_database and the  database  is  put  into
        recover  mode. In that mode, only RECOVER commands can be
        issued.

    RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};

        The database must be in recover mode. If RESET is  given,
        the  mode  is  switched  to ASYNC logging, The lastbackup
        field is set to 'epoch' and redoseq1 set to 0. It  resets
        the database to the state at the backup snapshot time.

        For  the  others, the backend starts the recovery program
        which  reads  the  redolog  files,  establishes  database
        connections  as  required  and reruns all the commands in
        them. If a required logfile isn't  found,  it  tells  the
        backend  and  waits for the reply.  The backend tells the
        user what happened on error  (redolog  file  with  seq  n
        required but not found ...). So the user can put back the
        required redolog files and let recover  resume  (actually
        the user pgsql or root must put them back :-).

        If  the  recovery  is  interrupted  (controlling  backend
        terminates), the database is set into error mode and only
        a RECOVER DATABASE RESET will help.

        If  the  recovery finally succeeds, the same as for RESET
        happens.  The database is online in async logmode.

        Since the "destroydb" is also remembered in the  redolog,
        recovery  will  stop  at  least  if it hit's that for the
        database actually recoverd.  This is  to  prevent  faulty
        recovery  which  could  occure  if  someone destroy's one
        database, creates a  new  one  with  the  same  name  but
        different contents that is logged, destroy's it again and
        then want's to restore and recover the first.

    RECOVER DATABASE CONTINUE;

        After beeing told to restore  some  more  redolog  files,
        this command let's the recovery resume.

    RECOVER DATABASE AS USER 'uname';

        A  special command used in restore and recover mode only.
        This is restricted to  superusers  with  usecatupd  right
        (not  db  owner) and modifies the current username in the
        backend. It's ugly, but  the  problem  is  that  ordinary
        users  should not be able to use the database while it is
        in restore or recover mode.  So the connection cannot  be
        established like with '\c - username'.

        For  the  restore  and  recover  this  means, that a real
        superuser with unrestricted access is needed to restore a
        database  that  was  dumped with ACL info.  But otherwise
        one with createdb but not superuser rights  could  put  a
        CREATE  USER  into  a  file,  create  a  new database and
        "restore" that as user pgsql.  I'm  sure  we  don't  want
        this.

    ###

    Whow - hopefully I didn't forgot anything.

    All  that  might look very complicated, but the only commands
    someone would really use manually will be

        ALTER DATABASE REDOMODE ...;

    and

        RECOVER DATABASE ...;

    Anything else is used by pg_dump and the recovery program.

    What I'm trying to implement with  it  is  a  behaviour  that
    makes  it  possible to backup, restore and recover a database
    on a production system without running closed shop. Doing  so
    for  one database will not affect the others in the instance,
    since no modification of the hba conf or anything  else  will
    be  required.  Only  the database actually beeing restored is
    closed for normal usage.

    Hmmm - just have the next idea right now. One more  field  in
    pg_database could tell that the db is shut down or restricted
    Someone could disable a single database or restrict usage  to
    superusers/dbowner for some time, make database readonly etc.

    Anyway - does anybody see problems with the above? Do we need
    more  functionality?  Oh  yeah - another utility that follows
    log and replicates databases onto other systems on  the  fly.
    But let me get this all running first please :-).


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Ivar Helbekkmo
Дата:
Сообщение: Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql
Следующее
От: Vadim Mikheev
Дата:
Сообщение: HeapTuple changed