Re: why postgresql over other RDBMS
From | Chris Browne
---|---
Subject | Re: why postgresql over other RDBMS
Date |
Msg-id | 60d50omdvr.fsf@dba2.int.libertyrms.com
In response to | Re: why postgresql over other RDBMS (Tom Allison <tom@tacocat.net>)
Responses | Re: why postgresql over other RDBMS (Ron Johnson <ron.l.johnson@cox.net>)
List | pgsql-general
alvherre@commandprompt.com (Alvaro Herrera) writes:
> Erik Jones wrote:
>
>> And, to finish up, is there any reason that pg_restore couldn't
>> already work with separate processes working in parallel?
>
> The problem is that the ordering of objects in the dump is the only
> thing that makes the dump consistent with regards to the dependencies
> of objects. So pg_restore cannot make any assumptions about the
> parallelisability of the restoration process of the objects in the dump.
>
> pg_dump is the only one who has the dependency information.
>
> If that information were to be saved in the dump, then maybe pg_restore
> could work in parallel. But it seems a fairly non-trivial thing to do.
>
> Mind you, while I am idling at this idea, it seems that just having
> multiple processes generating a dump is not such a hot idea by itself,
> because you then have no clue how to order the restoration of the
> multiple files that are going to result.

I think it's less bad than you think. The really time-consuming bits of
pg_restore are:

1. loading the table data
2. creating indices on those tables
3. [a distant third] setting up R/I constraints

If you look at the present structure of pg_dump output, those are all
pretty visibly separate steps. pg_dump output [loosely] consists of:

- type definitions & such
- table definitions
- loading of table data (i.e. 1)
- stored function definitions
- indices (i.e. parts of 2)
- primary keys (i.e. the rest of 2)
- triggers + rules (including 3)

Thus, a "parallel load" would start by doing some things serially,
namely creating types and tables. That step isn't parallelizable, but
so what? It shouldn't take very long.

The parallel load can then load as many tables concurrently as you
choose; since there are no indices or R/I triggers yet, those are
immaterial factors. Generating indices and primary keys could, again,
be parallelized pretty heavily, and have potentially heavy benefit.
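The staging described above amounts to scheduling a dependency graph in
waves: a serial schema step, then concurrent table loads, then concurrent
index builds, then constraints. Here is a minimal sketch in Python of that
scheduling idea; the task names and dependency edges are invented for
illustration and are not pg_dump's actual TOC format. (Later PostgreSQL
releases added `pg_restore -j`, which implements essentially this.)

```python
# Sketch of wave-by-wave parallel restore scheduling (Python 3.9+ for graphlib).
from graphlib import TopologicalSorter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical restore plan: names and edges are illustrative only.
DEPS = {
    "schema": [],                        # types + table definitions (serial step)
    "data:orders": ["schema"],           # table loads depend only on the schema
    "data:items": ["schema"],
    "index:orders_pk": ["data:orders"],  # index builds wait for their table's data
    "index:items_pk": ["data:items"],
    "constraints": ["index:orders_pk", "index:items_pk"],  # R/I triggers last
}

def restore_in_parallel(deps, run, max_workers=4):
    """Dispatch every task whose dependencies are satisfied; tasks that
    become ready together (e.g. the two table loads) run concurrently."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while ts.is_active():
            ready = list(ts.get_ready())
            list(pool.map(run, ready))   # wait for the whole wave to finish
            ts.done(*ready)

order = []
restore_in_parallel(DEPS, order.append)
```

After running, `order` starts with "schema", ends with "constraints", and
every table's data load precedes its index build, while independent loads
share a wave.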
Furthermore, an interesting thing to do might be to use the same approach
that Slony-I currently uses for subscriptions: it temporarily deactivates
triggers and indices while loading the data, then reactivates them and
requests a re-index. That would permit loading the *entire* schema, save
for the data, and then loading and indexing with pretty much the maximum
possible efficiency.

That seems like a not-completely-frightening "SMOP" (Simple Matter Of
Programming). Not completely trivial, but not frighteningly
non-trivial...
--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://linuxdatabases.info/info/linuxdistributions.html
Developmental Psychology
"Schoolyard behavior resembles adult primate behavior because 'Ontogeny
Recapitulates Phylogeny' doesn't stop at birth."  -- Mark Miller
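For a single table, the Slony-I-style sequence sketched above could be
approximated with plain SQL along these lines. The table, index, and file
names are hypothetical, and this is a rough sketch of the idea, not
Slony-I's actual mechanism (which manipulates the catalogs more directly):

```sql
-- Hypothetical names throughout; requires superuser to disable ALL triggers.
ALTER TABLE orders DISABLE TRIGGER ALL;       -- suspend user and FK triggers
DROP INDEX IF EXISTS orders_customer_idx;     -- avoid per-row index maintenance
COPY orders FROM '/tmp/orders.copy';          -- bulk-load the data
CREATE INDEX orders_customer_idx
    ON orders (customer_id);                  -- rebuild the index in one pass
ALTER TABLE orders ENABLE TRIGGER ALL;        -- restore normal behavior
```

Building the index once after the load, rather than maintaining it row by
row during the COPY, is where most of the win comes from.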