Discussion: pg_dump object sorting

pg_dump object sorting

From
Andrew Dunstan
Date:
I have been looking at refining the sorting of objects in pg_dump to 
make it take advantage of buffering and synchronised scanning, and 
possibly make parallel restoration simpler and more efficient.

My first thought was to sort indexes by <namespace, tablename, 
indexname> instead of by <namespace, indexname>. However, that doesn't 
go far enough, I think. Is there any reason we can't do all of a table's 
indexes and non-FK constraints together? Will that affect anything other 
than PK and UNIQUE constraints, as NOT NULL and CHECK constraints are
included in table definitions?
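
A minimal sketch of the two sort rules being compared, for illustration only. It is not pg_dump's actual code (pg_dump is written in C), and the `IndexEntry` type and the sample index names are hypothetical. It shows how keying on <namespace, indexname> can interleave indexes of different tables, while adding the table name to the key keeps each table's indexes adjacent.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical stand-in for a dumpable index object."""
    namespace: str
    table: str
    index: str


indexes = [
    IndexEntry("public", "orders", "orders_pkey"),
    IndexEntry("public", "customers", "customers_pkey"),
    IndexEntry("public", "orders", "date_idx"),
    IndexEntry("public", "customers", "email_idx"),
]

# Rule described as the existing one: <namespace, indexname>.
current_order = sorted(indexes, key=lambda e: (e.namespace, e.index))

# Proposed rule: <namespace, tablename, indexname>.
proposed_order = sorted(indexes, key=lambda e: (e.namespace, e.table, e.index))

for label, order in (("current", current_order), ("proposed", proposed_order)):
    print(label, [f"{e.table}.{e.index}" for e in order])
```

With these sample names the first ordering alternates between `customers` and `orders`, and the second lists both `customers` indexes before both `orders` indexes.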

cheers

andrew


Re: pg_dump object sorting

From
Jeff Davis
Date:
On Mon, 2008-04-14 at 11:18 -0400, Andrew Dunstan wrote:
> I have been looking at refining the sorting of objects in pg_dump to 
> make it take advantage of buffering and synchronised scanning, and 
> possibly make parallel restoration simpler and more efficient.
> 

Synchronized scanning is explicitly disabled in pg_dump. That was a
last-minute change to answer Greg Stark's complaint about dumping a
clustered table:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php

That hopefully won't be a permanent solution, because I think
synchronized scans are useful for pg_dump.

However, I'm not clear on how the pg_dump order would be able to better
take advantage of synchronized scans anyway. What did you have in mind?

Regards,
	Jeff Davis



Re: pg_dump object sorting

From
Andrew Dunstan
Date:

Jeff Davis wrote:
> On Mon, 2008-04-14 at 11:18 -0400, Andrew Dunstan wrote:
>   
>> I have been looking at refining the sorting of objects in pg_dump to 
>> make it take advantage of buffering and synchronised scanning, and 
>> possibly make parallel restoration simpler and more efficient.
>
> Synchronized scanning is explicitly disabled in pg_dump. That was a
> last-minute change to answer Greg Stark's complaint about dumping a
> clustered table:
>
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php
>
> That hopefully won't be a permanent solution, because I think
> synchronized scans are useful for pg_dump.
>
> However, I'm not clear on how the pg_dump order would be able to better
> take advantage of synchronized scans anyway. What did you have in mind?

I should have expressed it better. The idea is to have pg_dump emit the 
objects in an order that allows the restore to take advantage of sync 
scans. So sync scans being disabled in pg_dump would not at all matter.

cheers

andrew


Re: pg_dump object sorting

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I should have expressed it better. The idea is to have pg_dump emit the 
> objects in an order that allows the restore to take advantage of sync 
> scans. So sync scans being disabled in pg_dump would not at all matter.

Unless you do something to explicitly parallelize the operations,
how will a different ordering improve matters?

I thought we had a paper design for this, and it involved teaching
pg_restore how to use multiple connections.  In that context it's
entirely up to pg_restore to manage the ordering and ensure dependencies
are met.  So I'm not seeing how it helps to have a different sort rule
at pg_dump time --- it won't really make pg_restore's task any easier.

			regards, tom lane
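
To make the point about restore-side ordering concrete, here is a rough sketch of dependency-driven scheduling. It is not the paper design mentioned above and not pg_restore code; the entry labels and the `schedule` helper are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module. Each batch holds entries whose dependencies are already satisfied, so the entries in a batch could in principle be handed to separate connections.

```python
from graphlib import TopologicalSorter

# Hypothetical archive entries, each mapped to the entries it depends on.
deps = {
    "INDEX date_idx": {"TABLE DATA orders"},
    "CONSTRAINT orders_pkey": {"TABLE DATA orders"},
    "TABLE DATA orders": {"TABLE orders"},
    "TABLE orders": set(),
}


def schedule(deps):
    """Return batches of entries; every entry in a batch has all
    of its dependencies satisfied by earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = schedule(deps)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

The batches depend only on the dependency graph, not on the order in which the entries were listed, which is the sense in which a different sort rule at dump time does not change what the restore side has to work out.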


Re: pg_dump object sorting

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I should have expressed it better. The idea is to have pg_dump emit the 
>> objects in an order that allows the restore to take advantage of sync 
>> scans. So sync scans being disabled in pg_dump would not at all matter.
>
> Unless you do something to explicitly parallelize the operations,
> how will a different ordering improve matters?
>
> I thought we had a paper design for this, and it involved teaching
> pg_restore how to use multiple connections.  In that context it's
> entirely up to pg_restore to manage the ordering and ensure dependencies
> are met.  So I'm not seeing how it helps to have a different sort rule
> at pg_dump time --- it won't really make pg_restore's task any easier.

Well, what actually got me going on this initially was that I got 
annoyed by having indexes not grouped by table when I dumped out the 
schema of a database, because it seemed a bit illogical. Then I started 
thinking about it and it seemed to me that even without synchronised 
scanning or parallel restoration, we might benefit from building all the 
indexes of a given table together, especially if the whole table could 
fit in either our cache or the OS cache.
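
The grouping idea can be sketched as follows. This is an illustration under assumptions, not pg_dump behaviour: the entry tuples, the `group_by_table` helper, and the choice to place foreign keys after everything else are all hypothetical. Each table's indexes and non-FK constraints are emitted consecutively, so their builds would run back to back; whether that actually helps cache usage is the open question raised above.

```python
from collections import defaultdict

# Hypothetical (table, kind, name) entries in arbitrary order.
entries = [
    ("orders", "INDEX", "date_idx"),
    ("customers", "PRIMARY KEY", "customers_pkey"),
    ("orders", "FOREIGN KEY", "orders_customer_fk"),
    ("orders", "PRIMARY KEY", "orders_pkey"),
    ("customers", "INDEX", "email_idx"),
]


def group_by_table(entries):
    """Emit each table's indexes and non-FK constraints together,
    then all foreign keys at the end."""
    per_table = defaultdict(list)
    foreign_keys = []
    for entry in entries:
        table, kind, _name = entry
        if kind == "FOREIGN KEY":
            foreign_keys.append(entry)
        else:
            per_table[table].append(entry)

    ordered = []
    for table in sorted(per_table):
        ordered.extend(sorted(per_table[table], key=lambda e: e[2]))
    return ordered + sorted(foreign_keys)


restore_order = group_by_table(entries)
for table, kind, name in restore_order:
    print(f"{table}: {kind} {name}")
```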

cheers

andrew