parallel pg_restore design issues

From Andrew Dunstan
Subject parallel pg_restore design issues
Msg-id 48E957C4.8060008@dunslane.net
Replies Re: parallel pg_restore design issues (Philip Warner <pjw@rhyme.com.au>)
List pgsql-hackers
There are a couple of open questions for parallel pg_restore.

First, we need a way to decide the boundary between the serially run 
"pre-data" section and the remainder of the items in the TOC. Currently 
the code uses the first TABLEDATA item as the boundary. That's not 
terribly robust (what if there aren't any?). Also, people have wanted to 
steer clear of hardcoding much knowledge of archive member types into 
pg_restore as a way of future-proofing it somewhat. I'm wondering if we 
should have pg_dump explicitly mark items as pre-data, data or post-data. 
For legacy archives we could still check for either a TABLEDATA item or 
something known to sort after those (i.e. a BLOB, BLOB COMMENT, 
CONSTRAINT, INDEX, RULE, TRIGGER or FK CONSTRAINT item).
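
To make the boundary rule concrete, here is a minimal sketch in Python rather than pg_restore's C. It assumes each TOC entry is a dict with a "desc" key and, for archives written by a pg_dump that marks sections, a "section" key; both field names and the function name are hypothetical, and the type names are the ones used in this message.

```python
# Hypothetical sketch: find where the serially run "pre-data" section ends.
# The "section" key models the proposed explicit marking by pg_dump;
# legacy archives are assumed not to carry it.

# Item types named in the message: TABLEDATA or anything sorting after it.
POST_BOUNDARY_TYPES = {
    "TABLEDATA", "BLOB", "BLOB COMMENT", "CONSTRAINT",
    "INDEX", "RULE", "TRIGGER", "FK CONSTRAINT",
}


def pre_data_boundary(toc):
    """Return the index of the first TOC entry that is not pre-data.

    Returns len(toc) when every entry belongs to the pre-data section.
    """
    for i, entry in enumerate(toc):
        section = entry.get("section")
        if section is not None:
            # Explicitly marked archive: trust the marking.
            if section != "pre-data":
                return i
        elif entry["desc"] in POST_BOUNDARY_TYPES:
            # Legacy archive: fall back to the known item types.
            return i
    return len(toc)
```

With the fallback set, a legacy archive that has no TABLEDATA item still gets a boundary at its first INDEX or CONSTRAINT entry instead of none at all.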

Another item we have already discussed is how to prevent concurrent 
processes from trying to take conflicting locks. Here we really can't 
rely on pg_dump to help us out, as lock requirements might change (a 
little bird has already whispered in my ear about reducing the strength 
of FK CONSTRAINT locks taken). I haven't got a really good answer here.

Last, there is the question of what algorithm to use in choosing the next 
item to run. Currently, I am using "next item in the queue whose 
dependencies have been met", with no queue reordering.
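
As an illustration only, the current rule can be sketched like this in Python. The queue is assumed to be a list of (item_id, deps) pairs in TOC order and `done` a set of finished item ids; these shapes and the function name are hypothetical, not the actual pg_restore data structures.

```python
def next_ready_item(queue, done):
    """Return the id of the first queued item whose dependencies are all done.

    The chosen item is removed from the queue; the order of the remaining
    items is left untouched. Returns None when nothing is runnable yet.
    """
    for pos, (item_id, deps) in enumerate(queue):
        if all(dep in done for dep in deps):
            del queue[pos]
            return item_id
    return None
```

A worker that becomes free would call this once, run the returned item, and add its id to `done` when it finishes.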

Another possible algorithm would reorder the queue by elevating any item 
whose dependencies have been met. This will mean all the indexes for a 
table will tend to be grouped together, which might well be a good 
thing, and will tend to limit the tendency to do all the data loading at 
once.
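
One possible reading of this reordering, sketched in Python under the same hypothetical queue shape of (item_id, deps) pairs: when an item finishes, the items it has just unblocked move to the front of the queue, ahead of items that were already runnable. This is an interpretation of the paragraph above, not a description of existing code.

```python
def elevate_newly_ready(queue, done, finished):
    """Record `finished` as done and move the items it unblocked to the front.

    An item counts as unblocked when it depends on `finished` and all of its
    dependencies are now done. Relative order is kept within both groups.
    """
    done.add(finished)
    unblocked_ids = {
        item_id
        for item_id, deps in queue
        if finished in deps and all(dep in done for dep in deps)
    }
    front = [item for item in queue if item[0] in unblocked_ids]
    rest = [item for item in queue if item[0] not in unblocked_ids]
    queue[:] = front + rest
```

Under this reading, the indexes of a table whose data has just loaded jump ahead of other tables' data items, which is what groups them together.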

Both of these could be modified by explicitly limiting TABLEDATA items 
to a certain proportion (say, one quarter) of the processing slots 
available, if other items are available.
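
A rough Python sketch of such a cap, layered on either selection rule. It assumes `ready` is the list of already runnable (item_id, desc) pairs in queue order and `running_data` is the number of TABLEDATA items currently executing; the names, the rounding down, and the floor of one data slot are all assumptions made for illustration.

```python
def pick_with_data_cap(ready, running_data, slots, fraction=0.25):
    """Pick the next item to start while capping concurrent TABLEDATA items.

    The cap applies only if some non-TABLEDATA item is runnable; otherwise
    the first runnable item is started regardless. Returns None if `ready`
    is empty.
    """
    # Assumption: round down, but always allow at least one data slot.
    cap = max(1, int(slots * fraction))
    data_allowed = running_data < cap
    for item_id, desc in ready:
        if desc != "TABLEDATA" or data_allowed:
            return item_id
    # Only TABLEDATA items are runnable, so the cap is waived.
    return ready[0][0] if ready else None
```

For example, with 8 slots and the one-quarter default, a third TABLEDATA item is passed over while two are running and an INDEX item is runnable.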

I'm actually somewhat inclined to make provision for all of these 
possibilities via a command line option, with the first being the 
default. One size doesn't fit all, I suspect, and if it does we'll need 
lots of data before deciding what that size is. The extra logic won't 
really involve all that much code, and it will all be confined to a 
couple of functions.

Thoughts?

cheers

andrew

