Re: [PERFORM] Slow BLOBs restoring

From Tom Lane
Subject Re: [PERFORM] Slow BLOBs restoring
Date
Msg-id 6530.1291907115@sss.pgh.pa.us
In response to Re: [PERFORM] Slow BLOBs restoring  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PERFORM] Slow BLOBs restoring  (Robert Haas <robertmhaas@gmail.com>)
Re: [PERFORM] Slow BLOBs restoring  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
I wrote:
> One fairly simple, if ugly, thing we could do about this is skip calling
> reduce_dependencies during the first loop if the TOC object is a blob;
> effectively assuming that nothing could depend on a blob.  But that does
> nothing about the point that we're failing to parallelize blob
> restoration.  Right offhand it seems hard to do much about that without
> some changes to the archive representation of blobs.  Some things that
> might be worth looking at for 9.1:

> * Add a flag to TOC objects saying "this object has no dependencies",
> to provide a generalized and principled way to skip the
> reduce_dependencies loop.  This is only a good idea if pg_dump knows
> that or can cheaply determine it at dump time, but I think it can.

I had further ideas about this part of the problem.  First, there's no
need for a file format change to fix this: parallel restore is already
groveling over all the dependencies in its fix_dependencies step, so it
could count them for itself easily enough.  Second, the real problem
here is that reduce_dependencies processing is O(N^2) in the number of
TOC objects.  Skipping it for blobs, or even for all dependency-free
objects, doesn't make that very much better: the kind of people who
really need parallel restore are still likely to bump into unreasonable
processing time.  I think what we need to do is make fix_dependencies
build a reverse lookup list of all the objects dependent on each TOC
object, so that the searching behavior in reduce_dependencies can be
eliminated outright.  That will take O(N) time and O(N) extra space,
which is a good tradeoff because you won't care if N is small, while if
N is large you have got to have it anyway.
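
A rough sketch of the reverse-lookup idea described above, written in Python for brevity rather than in pg_restore's C. The data layout (a dict mapping each TOC dump ID to the IDs it depends on) and the helper names build_reverse_deps and restore_order are assumptions made for illustration only; reduce_dependencies borrows its name from the message, but this is not the actual pg_restore code. The point it shows is that once each object carries a list of its dependents, finishing an object only touches that list instead of rescanning every TOC entry, so the total work is proportional to the number of objects plus the number of dependency links.

```python
from collections import deque


def build_reverse_deps(toc):
    """Build reverse-dependency lists and per-object dependency counts.

    toc is a hypothetical stand-in for the archive's TOC: a dict mapping
    each dump ID to the list of dump IDs that object depends on.
    """
    rev = {obj_id: [] for obj_id in toc}
    dep_count = {}
    for obj_id, deps in toc.items():
        # Ignore dependencies on objects that are not in the archive.
        live = [d for d in deps if d in toc]
        dep_count[obj_id] = len(live)
        for d in live:
            rev[d].append(obj_id)
    return rev, dep_count


def reduce_dependencies(done_id, rev, dep_count, ready):
    """Mark done_id as restored, visiting only the objects that depend on it."""
    for dependent in rev[done_id]:
        dep_count[dependent] -= 1
        if dep_count[dependent] == 0:
            ready.append(dependent)


def restore_order(toc):
    """Return one valid restore order (serial here; a real restore would
    hand the ready entries to parallel workers)."""
    rev, dep_count = build_reverse_deps(toc)
    ready = deque(i for i in toc if dep_count[i] == 0)
    order = []
    while ready:
        cur = ready.popleft()
        order.append(cur)
        reduce_dependencies(cur, rev, dep_count, ready)
    return order


# Objects 1 and 4 have no dependencies; 5 depends only on an ID that is
# absent from the archive, so it is treated as dependency-free.
example_toc = {1: [], 2: [1], 3: [1, 2], 4: [], 5: [99]}
print(restore_order(example_toc))
```

The dependency counts computed in the same pass also identify dependency-free objects without any flag in the archive format, which matches the first observation above.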

Barring objections, I will do this and back-patch into 9.0.  There is
maybe some case for trying to fix 8.4 as well, but since 8.4 didn't
make a separate TOC entry for each blob, it isn't as exposed to the
problem.  We didn't back-patch the last round of efficiency hacks in
this area, so I'm thinking it's not necessary here either.  Comments?
        regards, tom lane

