Re: [Retrieved]RE: backup and recovery

Поиск
Список
Период
Сортировка
От Murthy Kambhampaty
Тема Re: [Retrieved]RE: backup and recovery
Дата
Msg-id 3554BFA5FAE4B14FB5459D36E4F4E36203D93B@io.goeci.com
обсуждение исходный текст
Список pgsql-admin
I think you can get both benefits of "multi-statement transactions for INSERT dumps" by doing "subset copies"  ...
withoutany changes in postgresql!  The method I use is developed for handling single table "loads", but is still
relativelypainless even for database dumps; however, it is limited to text dumps. 

Let's say you want to unload-load table "tbl_foo" from schema "sch_bar" from database "db_baz" and reload
"sch_bar.tbl_foo"to database "db_quux".  Try the following: 

1.) Dump-restore the table schema so you create an empty table in the destination database. e.g.:
    /usr/local/pgsql-7.4/bin/pg_dump -s -t tbl_foo --schema sch_bar db_baz | \
        /usr/local/pgsql-7.4/bin/psql -d db_quux
This can be adjusted for different hosts, etc.

2.) COPY the records to a file:
    /usr/local/pgsql-7.4/bin/psql -d db_bar \
     -c "copy sch_bar.tbl_foo to stdout" > sch_bar.tbl_foo.dat
OR
    /usr/local/pgsql-7.4/bin/psql -d db_bar -Aqt \
     -c "select * from sch_bar.tbl_foo where <some condition>" > sch_bar.tbl_foo.dat
The latter is slower, but selective.  You can also use the -p option to set col and row separators to whatever you like
(aswith copy options). If your source data came from a dump file, rather than a COPY, you can strip sql commands to
leavedata only, or modify the commands below. 

3. Pipe the data from sch_bar.tbl_foo.dat to psql, with copy commands spliced in at chosen intervals (in numbers of
lines)depending on your preferences for speed versus "recoverability".  In the example below, the subset size is 2000
lines:
awk \
  -v SubSize=2000 \
  -v COPYSTMT="copy sch_bar.tbl_foo from stdin;" \
 'BEGIN{ print COPYSTMT } \
  { print $0 } \
  FNR % SubSize == 0 { \
  print "\\.\n\n" ; \
  print "\n"; \
  print COPYSTMT }' "sch_bar.tbl_foo.dat" | \
        /usr/local/pgsql-7.4/bin/psql -U gouser -d airfrance -f -

The awk command specifies the chosen subset size ("2000") and a copy statement for putting stdin in the selected table;
atthe "BEGIN"ning, a copy statmenet is issued and lines are streamed in from the text file containing table rows; after
eachSubSize number of lines the copy stream is ended (as in text dumps, with a "\."), and a new copy statment inserted. 

For a 220,000 row table, times for the simple copy versus the subset copy were:

    Simple copy:
    real    0m21.704s
    user    0m3.790s
    sys     0m0.880s

    Subset copy:
    real    0m24.233s
    user    0m5.710s
    sys     0m1.090s

Over 10% more wall clock time, but the savings from not having to rerun the entire "load" if errors are found could be
tremendous.


3a.  Alternately, you can generate a log so you easily know which subset failed (if any):
# LogFile="/home/postgres/load.log"; \
 awk \
  -v SubSize=2000 \
  -v COPYSTMT="copy S2.air from stdin;" \
  -v LogF="$LogFile" \
 'BEGIN{ print "Block Size: " SubSize > LogF; \
   print "Copy Statment: " COPYSTMT > LogF; \
   print "\n\n" > LogF; \
   close(LogF) ; \
   print COPYSTMT } \
  { print $0 } \
  FNR % SubSize == 0 { \
  print "\\.\n\n" ; \
  printf("select \047Processed %d records from line no. %d to line no. %d\047;\n", SubSize, FNR -SubSize +1, FNR) ; \
  print "\n"; \
  print COPYSTMT }
  END{ \
  print "\\.\n\n" ; \
  printf("select \047Processed a grand total of %d lines from %s\047;\n", NR, FILENAME ) }' \
  "sch_bar.tbl_foo.dat" | \
        /usr/local/pgsql-7.4/bin/psql -U gouser -d airfrance -Atnq -f - >> "$LogFile" 2>&1

Errors can be located with:

[postgres@mybox postgres]$ cat load.log | grep -B 3 -A 3 "ERROR:"
Processed 2000 records from line no. 192001 to line no. 194000
Processed 2000 records from line no. 194001 to line no. 196000
Processed 2000 records from line no. 196001 to line no. 198000
ERROR:  invalid input syntax for integer: "My0"
CONTEXT:  COPY tbl_foo, line 2000, column oct 02: "My0"
Processed 2000 records from line no. 198001 to line no. 200000
Processed 2000 records from line no. 200001 to line no. 202000


HTH
    Murthy




> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Naomi Walker
> Sent: Wednesday, March 24, 2004 5:58 PM
> To: Tom Lane
> Cc: Bruce Momjian; Tsirkin Evgeny; Naomi Walker; Mark M. Huber;
> pgsql-admin@postgresql.org
> Subject: Re: [Retrieved]RE: [ADMIN] backup and recovery
>
>
> At 03:54 PM 3/24/2004, Tom Lane wrote:
> >Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Added to TODO:
> > >       * Have pg_dump use multi-statement transactions for
> INSERT dumps
> >
> > > For simple performance reasons, it would be good.  I am
> not sure about
> > > allowing errors to continue loading.   Anyone else?
> >
> >Of course, anyone who actually cares about reload speed shouldn't be
> >using INSERT-style dumps anyway ... I'm not sure why we should expend
> >effort on that rather than just telling people to use the COPY mode.
>
> Understood.  I would still love this feature for when in the
> COPY mode.
>
>
> >                         regards, tom lane
>
> --------------------------------------------------------------
> -----------------------------------------------------------
> Naomi Walker                         Chief Information Officer
>                                                Eldorado
> Computing, Inc.
nwalker@eldocomp.com           602-604-3100
> --------------------------------------------------------------
> -----------------------------------------------------------
> Forget past mistakes. Forget failures. Forget everything
> except what you're
> going to do now and do it.
> - William Durant, founder of General Motors
> --------------------------------------------------------------
> ----------------------------------------------------------
>
> -- CONFIDENTIALITY NOTICE --
>
> This message is intended for the sole use of the individual
> and entity to whom it is addressed, and may contain
> information that is privileged, confidential and exempt from
> disclosure under applicable law. If you are not the intended
> addressee, nor authorized to receive for the intended
> addressee, you are hereby notified that you may not use,
> copy, disclose or distribute to anyone the message or any
> information contained in the message. If you have received
> this message in error, please immediately advise the sender
> by reply email, and delete the message. Thank you.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

В списке pgsql-admin по дате отправления:

Предыдущее
От: Hemapriya
Дата:
Сообщение: postgres copy command very slow.
Следующее
От: Bruno Wolff III
Дата:
Сообщение: Re: backup and recovery