Re: Batch insert in CTAS/MatView code

Поиск

Список

Период

Сортировка

От	Paul Guo
Тема	Re: Batch insert in CTAS/MatView code
Дата	10 марта 2019 г. 16:55:35
Msg-id	CAEET0ZEnwWpx98qfnf5w12_AxnRdCnWH09rFprx8bc-qc9-UAA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Batch insert in CTAS/MatView code (Heikki Linnakangas <hlinnaka@iki.fi>)
Список	pgsql-hackers

Дерево обсуждения

Sorry for the late reply.

To Michael. Thank you. I know this commitfest is ongoing and I'm not targeting for this.

On Thu, Mar 7, 2019 at 4:54 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/03/2019 22:06, Paul Guo wrote:
> The patch also modifies heap_multi_insert() a bit to do a bit further
> code-level optimization by using static memory, instead of using memory
> context and dynamic allocation.

If toasting is required, heap_prepare_insert() creates a palloc'd tuple.
That is still leaked to the current memory context.

Thanks. I checked the code for that but apparently, I missed that one. I'll see what proper context can be used for CTAS. For copy code maybe just revert my change.

Leaking into the current memory context is not a bad thing, because
resetting a memory context is faster than doing a lot of pfree() calls.
The callers just need to be prepared for that, and use a short-lived
memory context.

> By the way, while looking at the code, I noticed that there are 9 local
> arrays with large length in toast_insert_or_update() which seems to be a
> risk of stack overflow. Maybe we should put it as static or global.

Hmm. We currently reserve 512 kB between the kernel's limit, and the
limit we check in check_stack_depth(). See STACK_DEPTH_SLOP. Those
arrays add up to 52800 bytes on a 64-bit maching, if I did my math
right. So there's still a lot of headroom. I agree that it nevertheless
seems a bit excessive, though.

I was worried about some recursive calling of it, but probably there should be no worry for toast_insert_or_update().

> With the patch,
>
> Time: 4728.142 ms (00:04.728)
> Time: 14203.983 ms (00:14.204)
> Time: 1008.669 ms (00:01.009)
>
> Baseline,
> Time: 11096.146 ms (00:11.096)
> Time: 13106.741 ms (00:13.107)
> Time: 1100.174 ms (00:01.100)

Nice speedup!

> While for toast and large column size there is < 10% decrease but for
> small column size the improvement is super good. Actually if I hardcode
> the batch count as 4 all test cases are better but the improvement for
> small column size is smaller than that with current patch. Pretty much
> the number 4 is quite case specific so I can not hardcode that in the
> patch. Of course we could further tune that but the current value seems
> to be a good trade-off?

Have you done any profiling, on why the multi-insert is slower with
large tuples? In principle, I don't see why it should be slower.

Thanks for the suggestion. I'll explore a bit more on this.

- Heikki

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Ramanarayana
Дата: 10 марта 2019 г., 16:26:57
Сообщение: Re: Add missing CREATE TABLE IF NOT EXISTS table_name AS EXECUTE query;

Следующее

От: Heikki Linnakangas
Дата: 10 марта 2019 г., 17:09:15
Сообщение: Re: Making all nbtree entries unique by having heap TIDs participatein comparisons

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Batch insert in CTAS/MatView code

Предыдущее

Следующее