Re: 5 minutes to pg_dump nothing

Поиск
Список
Период
Сортировка
От Nikita The Spider The Spider
Тема Re: 5 minutes to pg_dump nothing
Дата
Msg-id 35e76ac10709231337x2a93b38eu1b48b0e491b222e9@mail.gmail.com
обсуждение исходный текст
Ответ на Re: 5 minutes to pg_dump nothing  ("Nikita The Spider The Spider" <nikitathespider@gmail.com>)
Ответы Re: 5 minutes to pg_dump nothing  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
On 9/21/07, Nikita The Spider The Spider <nikitathespider@gmail.com> wrote:
> On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
> > > I'm seeing a problem where pg_dump takes at least 5 minutes to execute
> > > no matter what I ask it to dump -- even a non-existent or empty table.
> > > One possible red flag is that pg_type contains 56508 rows. This
> > > doesn't seem excessive to me, but perhaps it should.
> >
> > That does seem like a lot --- what sort of types are they?  Scalar,
> > composite, what?  It's fairly likely that no one has tried to optimize
> > pg_dump for such a case.
>
> Aha, thanks. Didn't realize I was pushing the bounds of what was
> reasonable. Here's the type counts:
>  typrelkind | the_count
> ------------+-----------
>             |       114
>  sequence   |     11496
>  composite  |     12290
>  ordinary   |     13844
>  TOAST      |      9215
>  view       |      9699
> (6 rows)

I think I found the problem. getTypes() calls findFuncByOid() twice
for each item it sees. A comment at the top of findFuncByOid() says,
"should hash this, but just do linear search for now". So that funtion
is O(n)/2 where n = the # of functions found by the query in
getFuncs(), and since it's called twice for each item in getTypes, the
overall performance is O(m * n) where n = # of functions and m = # of
types. As I said before, I have ~56000 items in pg_type and the query
at the top of getFuncs returns ~98000 rows so for me performance is
O(5.4 billion)-ish.

Some experimentation backs up my assertion that this is the
performance bottleneck. I created another database on the same machine
with 10k each of tables, views and custom types. The tables contained
a SERIAL PRIMARY KEY and a text column so that PG created indexes and
TOAST tables to go along with them. The result was ~50000 items in
pg_type. When I run pg_dump against this database, it completes in 4
seconds so simply having a lot of items in pg_type is not enough to
recreate my problem.

I added 100k functions to that database and pg_dump took several
minutes to run (again, trying to dump a non-existent table), which is
what I'm seeing on my production database.

My database design aimed to make each of my 1600 or so schemas its own
little self-sufficent world, hence the proliferation of types and
functions. I didn't realize that at some level they're all effectively
in the same namespace. Now I know.

Sorry I can't offer a patch. It'd be fun to write C code again but my
skills are pretty rusty and I'm afraid I'd butcher it. :-P

Thanks for your help! Given that this problem seems to be triggered by
a sort of edge case and the fix is non-trivial, I guess I should not
expect a new version of pg_dump soon?

Cheers

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: zero value in statistics collector's result
Следующее
От: Tom Lane
Дата:
Сообщение: Re: 5 minutes to pg_dump nothing