Re: pg_dump additional options for performance
| From | Simon Riggs |
|---|---|
| Subject | Re: pg_dump additional options for performance |
| Date | |
| Msg-id | 1217151108.3894.1218.camel@ebony.2ndQuadrant |
| In response to | Re: pg_dump additional options for performance (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-patches |
On Sat, 2008-07-26 at 13:56 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I want to dump tables separately for performance reasons. There are
> > documented tests showing 100% gains using this method. There is no gain
> > adding this to pg_restore. There is a gain to be had - parallelising
> > index creation, but this patch doesn't provide parallelisation.
>
> Right, but the parallelization is going to happen sometime, and it is
> going to happen in the context of pg_restore.

I honestly think there is less benefit that way than if we consider
things more as a whole:

To do data dump quickly we need to dump different tables to different
disks simultaneously. By its very nature, that cannot end with just a
single file. So the starting point for any restore must be potentially
more than one file.

There are two ways of dumping: either multi-thread pg_dump, or allow
multiple pg_dumps to work together. Second option much less work, same
result. (Either way we also need a way for multiple concurrent sessions
to share a snapshot.)

When restoring, we can then just use multiple pg_restore sessions to
restore the individual data files. Or again we can write a
multi-threaded pg_restore to do the same thing - why would I bother
doing that when I already can? It gains us nothing.

Parallelising the index creation seems best done using concurrent psql.
We've agreed some mods to psql to put multi-sessions in there. If we do
that right, then we can make pg_restore generate a psql script with
multi-session commands scattered appropriately throughout.

Parallel pg_restore is a lot of work for a narrow use case. Concurrent
psql provides a much wider set of use cases.
So fully parallelising dump/restore can be achieved by

* splitting dump into pieces (this patch)
* allowing sessions to share a common snapshot
* concurrent psql
* changes to pg_restore/psql/pg_dump to allow commands to be inserted
  which will use concurrent psql features

If we do things this way then we have some useful tools that can be used
in a range of use cases, not just restore.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
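
The "multiple pg_dumps working together" idea from the message can be sketched roughly as below. This is only an illustration, not the patch under discussion: the helper names (`build_dump_commands`, `build_restore_commands`, `run_all`), the database name, the table names and the directories are all hypothetical. It relies only on the existing `pg_dump` options `--format`, `--table` and `--file` and the `pg_restore` option `--dbname`. Note that it does nothing about the shared snapshot the message calls for, so the separate dumps are only mutually consistent if nothing writes to the database while they run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_dump_commands(dbname, table_to_dir):
    """Return one pg_dump argv per table, each writing to its own file.

    table_to_dir maps a table name to the directory (ideally on a
    separate disk) that should receive that table's dump.
    """
    commands = []
    for table, out_dir in table_to_dir.items():
        out_file = Path(out_dir) / f"{table}.dump"
        commands.append([
            "pg_dump",
            "--format=custom",        # archive format readable by pg_restore
            f"--table={table}",       # dump only this table
            f"--file={out_file}",     # one output file per table
            dbname,
        ])
    return commands


def build_restore_commands(dbname, dump_files):
    """Return one pg_restore argv per dump file."""
    return [["pg_restore", f"--dbname={dbname}", str(f)] for f in dump_files]


def run_all(commands, runner=subprocess.run):
    """Run all commands concurrently; return their exit codes in order.

    Each worker thread just waits on one external process, so threads
    are enough to keep several pg_dump or pg_restore sessions busy.
    """
    if not commands:
        return []
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(lambda argv: runner(argv).returncode, commands))
```

With a hypothetical database `sales`, calling `run_all(build_dump_commands("sales", {"orders": "/mnt/disk1", "customers": "/mnt/disk2"}))` would start two `pg_dump` sessions at once, one per disk, and the resulting files could later be fed to `build_restore_commands` in the same way. The `runner` parameter exists only so the scheduling can be exercised without a live server.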