Re: Making "COPY partitioned_table FROM" faster

Поиск

Список

Период

Сортировка

От	Melanie Plageman
Тема	Re: Making "COPY partitioned_table FROM" faster
Дата	26 июля 2018 г. 23:30:22
Msg-id	CAAKRu_ZvbATWE_22mMdT6ms2cEOQR7018Au12O5Q8C_xKBjigA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Making "COPY partitioned_table FROM" faster (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: Making "COPY partitioned_table FROM" faster
Список	pgsql-hackers

Дерево обсуждения

On Wed, Jul 25, 2018 at 7:24 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:

On patched code line 2564, there is a missing parenthesis. It might be better to remove the parentheses entirely and, while you're at it, there is a missing comma.

/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. (It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/

Should be

/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now, CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/

Regarding the performance improvement and code diff from v2 to v3:

The rearranging of the code in v3 makes sense and improves the flow of the code, however, I stared at it for awhile and couldn't quite work out in my head which part caused the significant improvement from v2.

I would have thought that a speedup from patch v2 to v3 would have come from doing multi-inserts less often when the target partition is switching a lot, however, looking at the v2 to v3 diff, I can't see how any of the changes would have caused a decrease in the number of multi-inserts given the same data and target table ddl.

So, I thinking I'm missing something. Which of the changes would cause the performance improvement from patch v2 to v3? My understanding is:

a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
test that sets leafpart_use_multi_insert.

This would short-circuit checking the other conditions when deciding how to set leafpart_use_multi_insert

b) Added an unlikely() when testing of the partition's resultRelInfo
has set to be initialised. This is only true the first time a
partition is inserted into during the COPY.

The addition of "unlikely" here seems like a good idea because there is a function call (ExecInitPartitionInfo) inside the if statement. However, would that cause such a difference in performance from v2 to v3? What would be the reason? Cache misses? Some kind of pre-evaluation of the expression?

c) Added unlikely() around lastPartitionSampleLineNo test. This will
only be true 1 in 1000, so pretty unlikely.

Even though this makes sense based on the frequency with which this condition will evaluate to true, I don't understand what the performance benefit would be for adding a compiler hint here around such a trivial calculation

d) Moved the nBufferedTuples > 0 test under the insertMethod ==
CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
doing CIM_SINGLE.

This is a good catch. It makes this part of the code more clear, as well. However, it doesn't seem like it would affect performance at all.

Let me know what I'm missing.

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Robert Haas
Дата: 26 июля 2018 г., 23:03:39
Сообщение: Re: Online enabling of checksums

Следующее

От: Andres Freund
Дата: 26 июля 2018 г., 23:35:16
Сообщение: Re: Online enabling of checksums

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Making "COPY partitioned_table FROM" faster

Предыдущее

Следующее