Re: pg_dump additional options for performance

From: Bruce Momjian
Subject: Re: pg_dump additional options for performance
Msg-id: 200803040133.m241X8r08435@momjian.us
In reply to: Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: pg_dump additional options for performance  ("Joshua D. Drake" <jd@commandprompt.com>)
List: pgsql-hackers
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I've not been advocating improving pg_restore, which is where the -Fc
> > tricks come in.
> > ...
> > I see you thought I meant pg_restore. I don't think extending
> > pg_restore in that way is of sufficiently generic use to make it
> > worthwhile. Extending psql would be worth it, since not all psql scripts
> > come from pg_dump.
> 
> OK, the reason I didn't grasp what you are proposing is that it's insane.
> 
> We can easily, and backwards-compatibly, improve pg_restore to do
> concurrent restores.  Trying to make psql do something like this will
> require a complete rewrite, and there is no prospect that it will work
> for any input that didn't come from (an updated version of) pg_dump
> anyway.  Furthermore you will have to write a whole bunch of new code
> just to duplicate what pg_dump/pg_restore already do, ie store/retrieve
> the TOC and dependency info in a program-readable fashion.
> 
> Since the performance advantages are still somewhat hypothetical,
> I think we should reach for the low-hanging fruit first.  If concurrent
> pg_restore really does prove to be the best thing since sliced bread,
> *then* would be the time to start thinking about whether it's possible
> to do the same thing in less-constrained scenarios.

Added to TODO based on this discussion:
       o Allow pg_dump to utilize multiple CPUs and I/O channels by dumping
         multiple objects simultaneously

         The difficulty with this is getting multiple dump processes to
         produce a single dump output file.
         http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php

       o Allow pg_restore to utilize multiple CPUs and I/O channels by
         restoring multiple objects simultaneously

         This might require a pg_restore flag to indicate how many
         simultaneous operations should be performed.  Only pg_dump's
         -Fc format has the necessary dependency information.

       o To better utilize resources, restore data, primary keys, and
         indexes for a single table before restoring the next table

         Hopefully this will allow the CPU-I/O load to be more uniform
         for simultaneous restores.  The idea is to start data restores
         for several objects, and once the first object is done, to move
         on to its primary keys and indexes.  Over time, simultaneous
         data loads and index builds will be running.

       o To better utilize resources, allow pg_restore to check foreign
         keys simultaneously, where possible

       o Allow pg_restore to create all indexes of a table
         concurrently, via a single heap scan

         This requires a pg_dump -Fc file because that format contains
         the required dependency information.
         http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php

       o Allow pg_restore to load different parts of the COPY data
         simultaneously
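To make the dependency-driven idea concrete, here is a minimal Python sketch of how a concurrent restore could schedule archive entries. It is not pg_restore code. It assumes the -Fc TOC has already been reduced to a hypothetical `deps` mapping from each entry name to the entries it depends on, and `restore_one` is a hypothetical callback that restores one entry. `max_jobs` stands in for the "how many simultaneous operations" flag mentioned above, and threads here only stand in for separate database connections.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def parallel_restore(deps, restore_one, max_jobs):
    """Call restore_one(entry) for every key of deps, running at most
    max_jobs calls at once and starting an entry only after every entry
    it depends on has finished. Returns entries in completion order.

    deps: hypothetical mapping {entry: [entries it depends on]}; every
    dependency must itself be a key of deps.
    """
    # Number of unfinished dependencies per entry, plus reverse edges.
    pending = {entry: len(needs) for entry, needs in deps.items()}
    dependents = {entry: [] for entry in deps}
    for entry, needs in deps.items():
        for need in needs:
            dependents[need].append(entry)

    ready = [entry for entry, n in pending.items() if n == 0]
    finished = []
    running = {}  # future -> entry
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        while ready or running:
            # Fill every free slot with an entry whose dependencies are done.
            while ready and len(running) < max_jobs:
                entry = ready.pop(0)
                running[pool.submit(restore_one, entry)] = entry
            # Block until at least one running entry completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                entry = running.pop(fut)
                fut.result()  # propagate any exception from restore_one
                finished.append(entry)
                # Release dependents that were waiting only on this entry.
                for child in dependents[entry]:
                    pending[child] -= 1
                    if pending[child] == 0:
                        ready.append(child)

    if len(finished) != len(deps):
        raise ValueError("dependency cycle: some entries never became ready")
    return finished


if __name__ == "__main__":
    # Hypothetical TOC for two tables; the names are illustrative only.
    toc = {
        "TABLE a": [],
        "TABLE b": [],
        "TABLE DATA a": ["TABLE a"],
        "TABLE DATA b": ["TABLE b"],
        "CONSTRAINT a_pkey": ["TABLE DATA a"],
        "INDEX a_idx": ["TABLE DATA a"],
        "FK CONSTRAINT b_a_fkey": ["TABLE DATA b", "CONSTRAINT a_pkey"],
    }
    for name in parallel_restore(toc, lambda e: None, max_jobs=2):
        print("restored", name)
```

Because a table's primary key and indexes become ready as soon as that table's own data has loaded, this ordering naturally interleaves data loads and index builds across tables, which is the effect the per-table item above is after; foreign key checks likewise start as soon as both referenced pieces are in place.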

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

