Re: Testing of parallel restore with current snapshot

From: Andrew Dunstan
Subject: Re: Testing of parallel restore with current snapshot
Msg-id: 4A0DBBC9.7040001@dunslane.net
In reply to: Re: Testing of parallel restore with current snapshot (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Testing of parallel restore with current snapshot (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> I don't want to mess with it right now either, but perhaps we should
>>> have a TODO item to improve the intelligence of parallel restore so that
>>> it really does try to do things this way.
>>>       
>
>   
>> Other things being equal it schedules things in TOC order, which often 
>> works as we want anyway. I think there's a good case for altering the 
>> name sort order of pg_dump to group sub-objects of a table (indexes, 
>> constraints etc.) together, ie. instead of sorting by <objectname>, we'd 
>> sort by <tablename, objectname>. This would possibly improve the effect 
>> seen in parallel restore without requiring any extra intelligence there.
>>     
>
> I'm not at all excited about substituting one arbitrary ordering rule
> for another one ...
>
> What is probably happening that accounts for Josh's positive experience
> is that all the indexes of a particular table "come free" from the
> dependency restrictions at the same instant, namely when the data load
> for that table ends.  So if there's nothing else to do they'll get
> scheduled together.  However, if the data load for table B finishes
> before all the indexes of table A have been scheduled, then B's indexes
> will start competing with A's for scheduling slots.  The performance
> considerations suggest that we'd be best advised to finish out all of
> A's indexes before scheduling any of B's, but I'm not sure that that's
> what it will actually do.
>
> Based on this thought, what seems to make sense as a quick-and-dirty
> answer is to make sure that items get scheduled in the same order they
> came free from dependency restrictions.  I don't recall whether that
> is true at the moment, or how hard it might be to make it true if it
> isn't already.

AIUI, pg_dump sorts items by <object-type, schema, objectname> and then 
does a topological sort to permute this order to reflect dependencies. 
This is the TOC order parallel restore starts with (unless the order is 
mucked with by the user via the --use-list option).  Each time it needs 
to schedule an item from the list, it chooses the first one yet to run 
that meets both these criteria:
   * all its dependencies have already been restored
   * it has no locking conflicts with a currently running item.
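
The selection rule above can be sketched roughly as follows. This is only an illustration in Python, not the pg_restore code: the `TocItem` class, the `pick_next` function, and the idea of modelling lock conflicts as overlapping sets of table names are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class TocItem:
    name: str
    deps: frozenset = frozenset()   # names of items that must be restored first
    locks: frozenset = frozenset()  # tables needing exclusive access (assumed model)

def pick_next(toc, done, running):
    """Return the first pending item, in TOC order, that is ready to run.

    toc     -- list of TocItem in TOC order
    done    -- set of names of items already restored
    running -- list of TocItem currently being restored
    """
    busy = set()
    for item in running:
        busy |= item.locks
    for item in toc:
        if item.name in done or item in running:
            continue  # already restored or already started
        # Criterion 1: every dependency has been restored.
        # Criterion 2: no lock conflict with a running item.
        if item.deps <= done and not (item.locks & busy):
            return item
    return None  # nothing is runnable right now

# Hypothetical TOC: two data loads, two indexes, one foreign key.
toc = [
    TocItem("a_data"),
    TocItem("b_data"),
    TocItem("a_idx", deps=frozenset({"a_data"})),
    TocItem("a_fk", deps=frozenset({"a_data", "b_data"}),
            locks=frozenset({"a", "b"})),
    TocItem("b_idx", deps=frozenset({"b_data"}), locks=frozenset({"b"})),
]
```

Because the scan always restarts from the top of the list, an item that sits earlier in TOC order wins whenever it is runnable, regardless of when its dependencies were satisfied.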

Now, it is common practice to use the table name as a prefix of an index 
name, and this will actually cause indexes for a table to be grouped 
together in the TOC list. I think that's why Josh is seeing what he's 
seeing. If this holds, then all of the index creations for A will be 
started before any of the indexes for B, even if B's table data finishes 
restoring half way through restoring A's indexes. So your speculation 
about B's indexes contending with A's is incorrect unless their names 
sort intermingled.
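
A small Python sketch of the naming effect, for illustration only. The `TYPE_PRIORITY` ranks, the helper functions, and the index names are made up for the example; the real pg_dump ordering of object types is not reproduced here.

```python
# Illustrative ranks only; these are not pg_dump's actual type priorities.
TYPE_PRIORITY = {"TABLE": 0, "TABLE DATA": 1, "INDEX": 2}

def toc_sort(items):
    """Sort (object_type, schema, name) tuples by type rank, schema, name."""
    return sorted(items, key=lambda it: (TYPE_PRIORITY[it[0]], it[1], it[2]))

def index_names(items):
    """Names of the INDEX entries, in sorted TOC order."""
    return [name for kind, _schema, name in toc_sort(items) if kind == "INDEX"]

# Index names prefixed with the table name: A's indexes sort before B's.
grouped = index_names([
    ("INDEX", "public", "b_created_idx"),
    ("INDEX", "public", "a_pkey"),
    ("INDEX", "public", "b_pkey"),
    ("INDEX", "public", "a_name_idx"),
])

# One table name is a prefix of the other ("order" vs "order_item"),
# so the two tables' indexes sort intermingled.
mixed = index_names([
    ("INDEX", "public", "order_pkey"),
    ("INDEX", "public", "order_item_pkey"),
    ("INDEX", "public", "order_date_idx"),
])
```

Here `grouped` keeps each table's indexes adjacent, while in `mixed` the `order_item` index lands between the two `order` indexes, which is the intermingled case mentioned above.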

During development, I did play with changing the TOC order some, but 
abandoned it, as testing didn't show any obvious gain - if anything the 
reverse. There are some remnants of this in the code.

cheers

andrew

