Re: pg_dump performance

From Heikki Linnakangas
Subject Re: pg_dump performance
Date
Msg-id 4772C93F.703@enterprisedb.com
In response to Re: pg_dump performance  (Jared Mauch <jared@puck.nether.net>)
Responses Re: pg_dump performance
List pgsql-performance
Jared Mauch wrote:
> On Wed, Dec 26, 2007 at 10:52:08PM +0200, Heikki Linnakangas wrote:
>> Jared Mauch wrote:
>>>     pg_dump is utilizing about 13% of the cpu and the
>>> corresponding postgres backend is at 100% cpu time.
>>> (multi-core, multi-cpu, lotsa ram, super-fast disk).
>>> ...
>>>     Any tips on getting pg_dump (actually the backend) to perform much closer
>>> to 500k/sec or more?  This would also aide me when I upgrade pg versions
>>> and need to dump/restore with minimal downtime (as the data never stops
>>> coming.. whee).
>> I would suggest running oprofile to see where the time is spent. There
>> might be some simple optimizations that you could do at the source level
>> that would help.
>>
>> Where the time is spent depends a lot on the schema and data. For example,
>> I profiled a pg_dump run on a benchmark database a while ago, and found
>> that most of the time was spent in sprintf, formatting timestamp columns.
>> If you have a lot of timestamp columns that might be the bottleneck for you
>> as well, or something else.
>>
>> Or if you can post the schema for the table you're dumping, maybe we can
>> make a more educated guess.
>
>     here's the template table that they're all copies
> of:
>
> CREATE TABLE template_flowdatas (
>     routerip inet,
>     starttime integer,
>     srcip inet,
>     dstip inet,
>     srcifc smallint,
>     dstifc smallint,
>     srcasn integer,
>     dstasn integer,
>     proto smallint,
>     srcport integer,
>     dstport integer,
>     flowlen integer,
>     tcpflags smallint,
>     tosbit smallint
> );

I did a quick oprofile run on my laptop, with a table like that, filled
with dummy data. It looks like ~30% of the CPU time is indeed spent in
sprintf, converting the integers and inets to string format. I think you
could speed that up by replacing the sprintf calls in the int2, int4 and
inet output functions with faster, customized functions. We don't need
all the bells and whistles of sprintf, which gives us an opportunity to
optimize.
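
To illustrate the kind of customized function meant here, below is a
minimal sketch of an integer-to-text routine that avoids sprintf
entirely. The name fast_int32_to_str and its buffer contract are
hypothetical, made up for this example; this is not the code in the
actual int4 output function, just one plausible shape for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL's actual code): write the decimal
 * form of value into buf and return the number of characters written,
 * not counting the terminating NUL.  buf must hold at least 12 bytes:
 * sign + 10 digits + NUL.
 */
size_t
fast_int32_to_str(int32_t value, char *buf)
{
    char        tmp[10];        /* an int32 has at most 10 digits */
    size_t      ndigits = 0;
    size_t      len = 0;
    uint32_t    u;

    assert(buf != NULL);

    /* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
    if (value < 0)
    {
        buf[len++] = '-';
        u = 0u - (uint32_t) value;
    }
    else
        u = (uint32_t) value;

    /* Generate digits least-significant first. */
    do
    {
        tmp[ndigits++] = (char) ('0' + u % 10);
        u /= 10;
    } while (u != 0);

    /* Copy them to the output in the right order. */
    while (ndigits > 0)
        buf[len++] = tmp[--ndigits];

    buf[len] = '\0';
    return len;
}
```

A caller would pass a 12-byte buffer, e.g. fast_int32_to_str(-42, buf)
leaves "-42" in buf and returns 3. There is no format string to parse
and no width, padding or locale handling, which is where the saving
over sprintf would come from. Whether it actually helps would need to
be confirmed with another profiling run.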


A binary mode dump should go a lot faster, because it doesn't need to do
those conversions, but binary dumps are not guaranteed to work across
versions.
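
For comparison, here is a rough sketch of what emitting one integer
column costs in binary mode. It assumes the documented COPY BINARY
layout, where each non-null field is a 4-byte length word followed by
the value, and an int4 value is sent as 4 bytes in network byte order.
The helper names put_be32 and encode_int4_field are hypothetical, and
the file header, per-tuple field count and NULL handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store v at out in network (big-endian) order. */
void
put_be32(uint32_t v, unsigned char *out)
{
    out[0] = (unsigned char) (v >> 24);
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
}

/*
 * Hypothetical helper: encode one non-null int4 field as a 4-byte
 * length word (always 4) followed by the 4-byte value.  out must have
 * room for 8 bytes.  Returns the number of bytes written.
 */
size_t
encode_int4_field(int32_t value, unsigned char *out)
{
    assert(out != NULL);

    put_be32(4, out);                       /* field length in bytes */
    put_be32((uint32_t) value, out + 4);    /* the value itself */
    return 8;
}
```

There is no division loop and no text formatting at all, just a few
shifts and stores per value, which is why the binary path should be
much cheaper for a table made up of integers like this one.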

BTW, the profiling I did earlier led me to think this should be
optimized in the compiler. I started a thread about that on the gcc
mailing list but got busy with other stuff and didn't follow through
on that idea: http://gcc.gnu.org/ml/gcc/2007-10/msg00073.html

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
