Re: Making "COPY partitioned_table FROM" faster

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Making "COPY partitioned_table FROM" faster
Дата
Msg-id CAAKRu_ZvbATWE_22mMdT6ms2cEOQR7018Au12O5Q8C_xKBjigA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Making "COPY partitioned_table FROM" faster  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Making "COPY partitioned_table FROM" faster
Список pgsql-hackers

On Wed, Jul 25, 2018 at 7:24 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:

On patched code line 2564, there is a missing parenthesis. It might be better to remove the parentheses entirely and, while you're at it, there is a missing comma.

/*
 * For partitioned tables we can't support multi-inserts when there
 * are any statement level insert triggers.  (It might be possible to
 * allow partitioned tables with such triggers in the future, but for
 * now CopyFromInsertBatch expects that any before row insert and
 * statement level insert triggers are on the same relation.
 */

Should be

/*
 * For partitioned tables we can't support multi-inserts when there
 * are any statement level insert triggers. It might be possible to
 * allow partitioned tables with such triggers in the future, but for
 * now, CopyFromInsertBatch expects that any before row insert and
 * statement level insert triggers are on the same relation.
 */

Regarding the performance improvement and code diff from v2 to v3:
The rearranging of the code in v3 makes sense and improves the flow of the code, however, I stared at it for awhile and couldn't quite work out in my head which part caused the significant improvement from v2.
I would have thought that a speedup from patch v2 to v3 would have come from doing multi-inserts less often when the target partition is switching a lot, however, looking at the v2 to v3 diff, I can't see how any of the changes would have caused a decrease in the number of multi-inserts given the same data and target table ddl.
So, I thinking I'm missing something. Which of the changes would cause the performance improvement from patch v2 to v3? My understanding is:

a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
test that sets leafpart_use_multi_insert.

This would short-circuit checking the other conditions when deciding how to set leafpart_use_multi_insert
 
b) Added an unlikely() when testing of the partition's resultRelInfo
has set to be initialised. This is only true the first time a
partition is inserted into during the COPY.

 The addition of "unlikely" here seems like a good idea because there is a function call (ExecInitPartitionInfo) inside the if statement. However, would that cause such a difference in performance from v2 to v3? What would be the reason? Cache misses? Some kind of pre-evaluation of the expression?

c) Added unlikely() around lastPartitionSampleLineNo test.  This will
only be true 1 in 1000, so pretty unlikely.

Even though this makes sense based on the frequency with which this condition will evaluate to true, I don't understand what the performance benefit would be for adding a compiler hint here around such a trivial calculation

d) Moved the nBufferedTuples > 0 test under the insertMethod ==
CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
doing CIM_SINGLE.

This is a good catch. It makes this part of the code more clear, as well. However, it doesn't seem like it would affect performance at all.

Let me know what I'm missing.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Online enabling of checksums
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Online enabling of checksums