Re: 8.3 / 8.2.6 restore comparison

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: 8.3 / 8.2.6 restore comparison
Дата
Msg-id 47C13B5E.3020604@enterprisedb.com
обсуждение исходный текст
Ответ на Re: 8.3 / 8.2.6 restore comparison  ("Joshua D. Drake" <jd@commandprompt.com>)
Ответы Re: 8.3 / 8.2.6 restore comparison  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Re: 8.3 / 8.2.6 restore comparison  (andy <andy@squeakycode.net>)
Re: 8.3 / 8.2.6 restore comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: 8.3 / 8.2.6 restore comparison  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: 8.3 / 8.2.6 restore comparison  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
Joshua D. Drake wrote:
> On Sun, 24 Feb 2008 00:43:18 +0000
> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>  
>> Incidentally, I've been working on a patch to speed up CopyReadLine.
>> I was going to run some more tests first, but since we're talking
>> about it, I guess I should just post the patch. I'll post to
>> pgsql-patches shortly.
> 
> On your post to patches you mentioned only about a 5% improvement.
> Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
> can tell we are about 300% behind the curve.

Yeah. Looking at the profile, the time is spent really all over the 
place. There's no one clear bottleneck to focus on. I think we could do 
a few more ~5% improvements, but

At some point, I think we have to bite the bullet and find a way to use 
multiple CPUs for a single load. I don't have any good ideas or plans 
for that, but hopefully someone does.

> My tests were maxing out at ~22G an hour. On hardware that can do 
> in 360G an hour and that is assuming > 50% overhead between OS, libs,
> etc... I have no choice but to conclude we have a much, much deeper and
> fundamental issue going on with COPY. I am inspired by Itagaki Takahiro
> and his batch update of indexes which should help greatly overall but
> doesn't help my specific issue.

Yep, the index build idea is an I/O improvement, not a CPU one.

> Forgive me for not being a C programmer and Alvaro is not online so I
> would vet these questions with him first.
> 
> I know that copy is in theory a bulk loader but, when performing the
> readline how many lines are we reading?  Do we read up to 8192? Or do we
> shove in say 8megs of data before we invoke DoCopy?

We read 64 KB at a time, and then CopyReadLineText returns one line at a 
time from that buffer.

Looking at your profile more, and after the memchr patch, the "raw input 
side" of copy, consisting of reading the data from disk in 64KB blocks, 
splitting that into lines, and splitting lines into columns, still takes 
~20% of the CPU time. I suspect CopyReadAttributesText is the biggest 
culprit there.

You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by 
creating the table (or truncating it) in the same transaction with the COPY.

After that, heap_formtuple is high on the list. I wonder if we could do 
something about that.

> I am just curious if there is some simple low hanging fruit that is
> possibly missing.

I don't see any piece of code that's causing problems. We can shave off 
a few percents here and there I think, but don't expect a 300% 
improvement anytime soon. A few ideas I've thought about are:

- use a specialized version of strtol, for base 10. That won't help on 
your table, but I've seen strtol consume a significant amount of time on 
tables with numeric/integer columns.

- Instead of pallocing and memcpying the text fields, leave a little bit 
of room between fields in the attribute_buf, and write the varlen header 
there directly. This might help you since your table has a lot of text 
fields.

- Instead of the normal PG function calling conventions, provide 
specialized fastpath input functions for the most common data types. 
InputFunctionCall consumed 4.5% of the CPU time in your profile.

- Use a simpler memory context implementation, that's like a stack with 
no pfree support, for the per-tuple context.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: 8.3 / 8.2.6 restore comparison
Следующее
От: "Heikki Linnakangas"
Дата:
Сообщение: Re: 8.3 / 8.2.6 restore comparison